Jitendra Mudhol: Thank you for this interview. Let’s get to the beginning. I regard you as a Mathematician investing in AI and Machine Learning. So, how did you get into Math?
Lisha Li: Appreciate the question! I hold being a mathematician dear to my identity! When I started college, I initially tried International Relations, because I was interested in history and politics, but also perhaps to “rebel”. My parents are both very technical and were engineering professors in China, so I wanted to do something different. I realized quickly though, that Math was the hardest thing that I could train my mind on and it would be the most difficult to learn on my own compared to other subjects. I also knew, that to understand anything in Theoretical Physics properly (which I always had an interest in) I would have to learn more Math. For instance, to understand Gauge Theory, Quantum Physics, I would have to use Math, the formal language it is communicated in.
JM: Now, that was about the time when Geoff Hinton’s work was getting recognized. I am curious … was there any interaction? Any influence or pull?
LL: Yes, it was an exciting time to be in Toronto! Not through Hinton specifically, but I did know about Neural Networks. I had friends in CS taking courses on it (the Math department sat on one floor of the CS Department, this beautiful open concept building). If I were to rewind again, I would still specialize in Pure Math with a minor in Philosophy. What I was optimizing for then was flexibility of thinking. What I studied gave me the flexibility to reason and carry me in any direction, and since I did not have a stable kind of interest (academia or industry), it seemed like the most sensible way to train my mind. With that said, I think I should have probably listened to my CS friends about learning how to code earlier … it wouldn’t have hurt. Especially in hindsight, it was such an opportune time to be in Toronto!
JM: I take it, you are still in regular touch with the Toronto folks who are active in Machine Learning?
LL: Yes. There are a lot of U of T graduates who are doing cutting-edge work in Deep Learning in both industry and academia here. And also a great amount of work is still going on back in Toronto. I’m really proud of what the Canadian government nurtured back there!
JM: After University of Toronto, you came to UC Berkeley for your PhD. Again, what drove you to choose Deep Learning and Probability applied to clustering in Graph Theory?
LL: I always wonder about my mindset back then … I was very much married to the idea of Academia and studying Pure Math was such a luxury! When you are studying Math versus doing research in it, you get the privilege of absorbing everything that was developed over a long period of time; it’s in a pristine state and you’re very much a consumer of this body of work, which is quite different from research.
For my Ph.D., I had the choice of either going to Oxford for Philosophy in Physics or Berkeley for Pure Math and Logic Programming. Knowing myself and my varied other interests outside of Academia, I thought Berkeley was the best choice in case I wanted to do anything else. To be fair, I think Oxford has it going for Deep Learning as well ... so, had I chosen it, I wouldn’t have fallen too far from the tree; it is funny how life plays out. I wanted to optimize on the subject I cared about then with the possibility of extending … a little later I realized I did not want to stay in Academia, and Berkeley has an absolutely fantastic Statistics Department. So, I transferred formally to Statistics to ensure that Probability and Machine Learning were part of my study.
JM: Since you brought up Optimization, allow me to jump a little ahead. Is there a crude analogy to the Attention Modeling (Soft Attention vs. Hard Attention)?
LL: I wish I could tell such a cohesive story … to be honest, I knew I needed to train my mind with a tool that was flexible, and that was Math. In another vein, I wanted to enjoy Academia, and doing a Ph.D. allowed me to explore intellectual interests. I would not have continued if it had fallen outside my realm of interest.
Maybe it was both short-term and long-term optimization. I don’t want to claim that I had an overly clear picture, though it’s easy to argue so in retrospect. I had a lot of rich interactions with friends after arriving in Berkeley, some of whom graduated and did not choose Academia: they were having a great time doing applied work in the industry. After my internships at Stitch Fix and Pinterest, it was clear to me that I enjoy being in a place where there is a faster cadence to problem-solving, and more exposure to solving problems that have more impact.
One of the things I tend to optimize for is interesting interactions with intellectually engaging folks – optimize for learning and interesting people!
JM: Are you still in touch with Prof. David Aldous and Prof. Joan Bruna?
LL: Yes, I was very fortunate to have Prof. David Aldous and Prof. Joan Bruna as my mentors. I was David’s last student; he was a legend in probability theory! Joan also recently organized a conference on Deep Learning Techniques at IPAM that I attended, and we still have a paper in submission to conferences.
JM: Why did you choose Pinterest for your internship?
LL: At that time, I was just getting into Statistics and coding; I had two choices – Pinterest and Uber. It was just a toss-up, as both the teams that I was going to work with were fantastic and the work was interesting. Pinterest had a reputation for being a very interesting product consumer-wise, and with a visual dataset not found anywhere else. I wanted to optimize on something that was not so research-oriented: How do data-science-driven decisions impact people, their product choices, and strategy?
I was the first intern in Pinterest’s … The person I chose to work with there was on the same wavelength as me in terms of the product-heavy focus I wanted from the internship. My mentor John McDonnell was just fantastic. I wanted to understand the product and the business. This fell under ‘client algorithms’ – if there’s a segment that’s very valuable to the business, then how do I identify the causal levers to get there, and how do I communicate these insights to the Marketing and Strategy arms of the company? I heard this is being revived again internally because growth is high-priority.
JM: How much does Math influence your life and daily decision making? Does being a Mathematician help you look at the world differently?
LL: I absolutely think so. There are so many ways it touches upon day-to-day life. On a basic level, it gives me the endurance to work on a problem that I don’t know has a solution. Being able to grind through tough things transfers well to solving other problems.
It also gives you the ability to think with structures that are not learned in other subjects. It stretches your mind in that sense because in math the pure form of the idea is what makes novel work. On the point of how to transfer this to my day-to-day life, I learned the following from a longtime friend, Eric Weinstein: use the same metaphors and scientific structures learned in subjects like Math and Theoretical Physics to enrich the conversation. In some sense, it means to never talk down. If I have spent all this time acquiring a rich vocabulary, then why shy away from using those tools to organize my thoughts? Put another way, if you have a problem, how do you frame that problem with views from different fields to solve it?
JM: Could you clarify that more? Math seems to provide such a solid grounding through theorems, axioms and concrete proof, while business problems are fuzzy. Is there a disconnect there?
LL: By Math, I don’t mean just formal axioms … etc. It’s more a case of reasoning about structures in a formal way and exploring that space with logical inference tools. This sometimes lends itself to quantifying uncertainty (and, allowing for fuzziness). I really mean Math in this very broad sense.
JM: How about Investing? How do you view investing in Machine Learning using your Math lens?
LL: I think we have a pretty long view on Machine Learning startups. There is a lot that needs to be built out in this area in terms of companies and infrastructure.
I don’t know if I want to comment in general on the VC model. I will make do with the current capital incentives we have. In terms of how I approach investing, there are a lot of risks here, so you can’t have a very deterministic view. Nor can you look back and pat yourself on the back for every decision that led to a successful portfolio … To what extent it is due to luck and how much to merit is hard to quantify. That said, I think there are some pattern matching and rules that I have picked up that have been helpful to map to the profile of risk I am willing to assume.
I do very early-stage investing – so, in these startups, you don’t have the typical revenue metrics. They are oftentimes pre-product and thus pre-product-market fit. You have to evaluate the team and the market the team plans to target. Some things worth assessing:
Is this the best team to tackle this?
Not just technically speaking but also regarding its empathy for customer needs and the ability to think through use cases.
Also is the market big enough or interesting enough?
I think the most value I add as an investor is to help first-time entrepreneurs make connections with potential customers, help them spec out the right product-market fit and get them to an investable state.