Data has a better idea

Sandeep Murthy, Monish Pathare
7th June 2022

We believe in a hybrid future of a data-driven venture capital fund, with our products like The Gallery and Cerebro. We are building towards a future where fund managers and technology can work seamlessly together, driving greater value. As we look forward to applying machine learning algorithms to the data we have collected, data will work hand in hand with our traditional venture capital process, making end-to-end processes more efficient.

In one line, venture capital is primarily investors supporting early and growth stage companies that are building innovative and disruptive businesses. It is an industry that has been largely driven by qualitative signals and network effects, but over the past decade we have seen a gradual transformation in how venture capital funds operate. Advancements in technology and data analytics over the past two decades have paved the way for “disruption” to find its way back into the venture capital industry. The abundance of private company data from platforms like PitchBook, Tracxn and Crunchbase (which didn’t exist 20 years ago) has enabled VCs to build models that integrate data into the various stages of the venture capital investment process, from sourcing and screening to supporting portfolio companies after the investment.

Globally, some funds are already diving into this trend. Veronica Wu (ex-managing partner, Hone Capital), in her interview with McKinsey & Co, described how she and her team built a machine learning model to identify 20 characteristics for seed deals that are most predictive of future success; this model fed into their screening process to recommend deals that showed promise. Social Capital built an algorithmic process to invest in startups without meeting them: companies filled out a questionnaire from anywhere in the world, and if the firm’s algorithm scored the company well, the fund would write cheques of up to $250,000. Tribe Capital built a report called the Magic 8 Ball report to objectively measure product-market fit, and has developed frameworks to do so using retrospective company data. And the list goes on… Of the venture capital funds that take a data-driven approach to investing, most generate top-quartile returns against the Cambridge venture capital benchmarks for their respective vintage years, which, albeit too early to be conclusive, is a remarkable feat.

At Lightbox, we agree that venture capital investing is long overdue for disruption. As strong promoters of using tech at our portfolio companies, we believe in practicing what we preach. Now, do we think we will have an AI like J.A.R.V.I.S that manages everything for us while we are off building a suit of armor? Maybe someday. But today, given that we don’t have Tony Stark on our team, we have identified key areas where data and automation can help us improve our operations, keeping in mind our core philosophy of building and not betting.

Let us walk you through some of the areas we’ve identified and how our in-house products help us make smarter and faster decisions every day. We have built each product with use cases very specific to us and how we function as a team.


Sourcing and screening


“You miss 100 percent of the shots you don't take.”

- Hockey Hall of Famer, Wayne Gretzky


The VC equivalent of that would be “You miss 100% of the deals you don’t evaluate.” In 2021-22, the number of newly recognized startups in India was over 14,000, up from 733 in 2016-17 (based on a survey by DPIIT). That’s approximately 40 new startups each day! Each one a potential billion-dollar idea!

At an early stage in the fund, we anticipated this explosion in companies and business ideas, but we were faced with one conundrum: how do we tap into this pool of ideas once they start gaining some traction?

We could connect with known founders through introductions from mutual connections, but we had to solve for founders who did not have an introduction to us, all while ensuring each one of us at the fund had an opportunity to evaluate the deal and weigh in on the decision to invest. That was the origin of our very first product, “The Gallery”, our insights and deal flow management platform. The process works in two parts: first, Pitch Us, and second, The Gallery.

No words could explain Pitch Us better than this video with Sid Talwar, Partner at Lightbox:


The Pitch Us form helps us get the key information about the business in a uniform and structured manner, so we can evaluate each company purely on its business idea, team and relevant KPIs, while ensuring the idea is not lost in somebody’s inbox at the fund.

And the cool app you saw in the video? Yes, that’s “The Gallery”. We believe each member of the team brings a unique perspective to every deal, so everyone, across all functions of the fund, votes on every company that applies to us. If you get through the first stage, you get to meet all of us during your pitch. This gives us a preliminary understanding of each company and helps us mitigate any biases that we may unknowingly have.


“The venture capital business is a 100% game of outliers- it’s extreme exceptions.”

– Marc Andreessen


The process doesn’t just help us structure our deal flow; the data collected also helps us understand the startup ecosystem better via market and ecosystem analytics. Combining the data we get from Gallery with data from third-party platforms like Twitter, LinkedIn and app stores, we get a holistic view of how various trends are shaping up across India, while flagging outliers that are performing exceptionally well compared with industry peers that have historically applied for similar rounds of funding.

A testament to why we believe the process works is “Cityflo”, one of our portfolio companies that approached us through our Pitch Us form. It was also a deal that was voted “yes” on Gallery by everyone at the fund other than the partners. Of course, Cityflo won our hearts during their pitch, which eventually resulted in us investing. We would have missed out on the deal had it not been for the collective evaluation at the initial stages.
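To make the idea of flagging outliers against peers more concrete, here is a minimal sketch in Python. It is an illustration only and not the code behind The Gallery: the function name `flag_outliers`, the sample companies and the growth figures are all hypothetical, and the modified z-score (based on the median absolute deviation) is just one common way to compare a company against its peers.

```python
from statistics import median


def flag_outliers(metric_by_company, threshold=3.5):
    """Return companies whose metric sits far above the peer median.

    Uses the modified z-score, which relies on the median absolute
    deviation (MAD) and is less distorted by the outliers themselves
    than a mean/standard-deviation z-score. Only the upside is flagged.
    """
    values = list(metric_by_company.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # peers are too similar for anything to stand out
    flagged = []
    for name, value in metric_by_company.items():
        score = 0.6745 * (value - med) / mad
        if score > threshold:
            flagged.append(name)
    return flagged


# Hypothetical month-on-month revenue growth (%) for peers at a similar stage.
peers = {"alpha": 4.0, "bravo": 6.0, "charlie": 5.0, "delta": 7.0, "echo": 48.0}
outliers = flag_outliers(peers)  # only "echo" stands out here
```

A threshold of 3.5 is a commonly quoted default for the modified z-score; in practice it would need tuning against the kind of data a fund actually sees.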


Monitoring and reporting

Once a company has been funded by us, it is generally expected to share monthly updates in the form of a Management Information System (MIS). The MIS is a repository of financial and key operational metrics on how the business is performing, currently and historically. It is THE file that every analyst and associate slices and dices to assess and report how the business has performed. Each company has its own template for an MIS; it can be as simple as a profit and loss statement, or extremely exhaustive, with granular data across product type, customer type and geographies. In both cases, all the numbers shared are crucial to assessing the health of a business.

We saw an opportunity to make this process more efficient, given that the analysis consists of many repeated tasks followed by a qualitative assessment by the teams. We created a centralized monthly database structure to store all our portfolio metrics. With the data from our portfolio centralized, we can apply it across various use cases, namely monitoring, visualization, anomaly detection, comparing and contrasting metrics across companies, and reporting, all at the click of a single button. A couple of hundred lines of Python scripts and countless data quality checks later, we built “Cerebro” (yes, just like the one Professor X uses in the X-Men franchise: he uses it to find mutants in the world, and we use ours to find trends and anomalies in the data). The platform uses data analytics and data visualization technology to collect, process and represent data from varied sources in a structured manner, which helps streamline performance monitoring and reporting for Lightbox. Here’s an example of how it looks.


Figure 1: Customer cohorts / Revenue cohorts


Using Python, VBA and Tableau, we have been able to transform the way we process and visualize data from our portfolio, and also the way we monitor our portfolio companies. This means every member of the team can focus their efforts on working with the founders and teams to assist them on their journey.

Figure 2: Bird's-eye view of the processes
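As a rough illustration of what anomaly detection over a monthly metrics database can look like, here is a minimal Python sketch. It is not Cerebro's actual code: the function `find_anomalies`, the six-month window, the three-standard-deviation rule and the revenue figures are all assumptions made for the example.

```python
from statistics import mean, stdev


def find_anomalies(series, window=6, k=3.0):
    """Flag months whose value is more than k standard deviations
    away from the mean of the preceding `window` months.

    `series` is a list of (month, value) pairs in chronological order.
    """
    anomalies = []
    for i in range(window, len(series)):
        month, value = series[i]
        history = [v for _, v in series[i - window:i]]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(value - mu) > k * sigma:
            anomalies.append(month)
    return anomalies


# Hypothetical monthly revenue for one portfolio company.
revenue = [
    ("2021-07", 100.0), ("2021-08", 104.0), ("2021-09", 101.0),
    ("2021-10", 106.0), ("2021-11", 103.0), ("2021-12", 107.0),
    ("2022-01", 105.0), ("2022-02", 52.0), ("2022-03", 108.0),
]
flagged_months = find_anomalies(revenue)  # flags the February 2022 drop
```

A flagged month is only a prompt for a closer look: it could be a data entry error in the MIS or a genuine change in the business. Note that once an anomalous month enters the trailing window it inflates the standard deviation, so a production version would likely need a more robust baseline.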


Some real-world use cases for Cerebro are:

Monthly portfolio updates

For obvious reasons, we can’t share the actual dashboards we use for our updates, but to give you an idea of what they look like, here’s a sample P&L statement.

Figure 3: Example of a dashboard similar to what we use in our portfolio updates (Credits : Link)


Each company in our portfolio has a unique dashboard of KPIs based on its business focus and initiatives that month. During our internal meetings, we pull up these numbers, along with their historical trends, to add context to the discussions and highlight how the metrics have been affected by the business decisions the companies made across different timelines (months, quarters, or years). This enables us to accurately assess how the business has transformed over various time periods, adding more flavor to the discussions about the qualitative aspects of the business.



With great money comes great responsibility, especially when it comes to reporting on fund performance and individual company performance. This is usually in response to monthly or quarterly data requests from our Limited Partners (individuals and organizations that have given us money to invest in companies) and other stakeholders. These reports are extremely detailed, in some cases requiring historical data spanning multiple financial years.

Traditionally, each team sits with the company MIS, consolidates the data across multiple Excel sheets and, once that is done, begins calculating the numbers requested in the report. This leads to long turnaround times and increases the probability of human error creeping into the process. Cerebro took over this process: picking, processing, and inserting the relevant values in the relevant format for each report across all funds and companies. This significantly reduces the man-hours required to generate these reports, enabling us to focus more of our time on the portfolio.
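The picking-and-processing step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than how Cerebro is implemented: the record layout (one row per company, month and metric), the `RECORDS` sample and the `period_report` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical centralized store: one row per company, month and metric.
RECORDS = [
    {"company": "alpha", "month": "2022-01", "metric": "revenue", "value": 120.0},
    {"company": "alpha", "month": "2022-02", "metric": "revenue", "value": 130.0},
    {"company": "alpha", "month": "2022-03", "metric": "revenue", "value": 150.0},
    {"company": "alpha", "month": "2022-03", "metric": "costs", "value": 90.0},
    {"company": "bravo", "month": "2022-01", "metric": "revenue", "value": 80.0},
    {"company": "bravo", "month": "2022-02", "metric": "revenue", "value": 90.0},
    {"company": "bravo", "month": "2022-03", "metric": "revenue", "value": 95.0},
    {"company": "bravo", "month": "2022-04", "metric": "revenue", "value": 99.0},
]


def period_report(records, metric, months):
    """Sum one metric per company over the requested months."""
    wanted = set(months)
    totals = defaultdict(float)
    for row in records:
        if row["metric"] == metric and row["month"] in wanted:
            totals[row["company"]] += row["value"]
    # Sort by company name so the report layout is stable between runs.
    return dict(sorted(totals.items()))


q1_months = ["2022-01", "2022-02", "2022-03"]
q1_revenue = period_report(RECORDS, "revenue", q1_months)
```

Because every metric lives in one long table, a new report request becomes a matter of changing the metric and the list of months instead of re-consolidating spreadsheets by hand.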



Today, venture capital is largely run in its traditional way, but we believe that data will play a bigger role going forward. As funds build their own proprietary data sets and find new ways to leverage publicly available data, we will see more “data-driven” venture funds. The day might not be far away when artificial intelligence algorithms are part of the investment team, sourcing deals and weighing in on which ones to go forward with. But with companies at this early a stage, qualitative aspects are as important as quantitative ones. In particular, the founders and the team carry a disproportionate amount of weight in the final decision, as they are the ones fighting in the trenches, driving success and value at their company. By no means will data replace the human touch, but it can help us get better.

At Lightbox, we believe in a hybrid future of a data-driven venture capital fund, with our products like The Gallery and Cerebro. We are building towards a future where fund managers and technology can work seamlessly together, driving greater value. As we look forward to applying machine learning algorithms to the data we have collected, data will work hand in hand with our traditional venture capital process and make the end-to-end process more efficient, enabling us to focus our efforts on strategy and the softer aspects of the industry.

We will keep building on and sharing our thoughts around tech in venture capital, along with deeper insights and learnings from our existing products, through posts like these. If you want to chat about data analytics and technology in venture capital, please reach out to