The good, the bad, and the ugly of data science and ML in tech/biotech/academia -- version 1.0

About the co-author

Tommaso Dreossi is a Staff ML Scientist at insitro, specializing in phenotyping and disease modeling with a focus on neurology. Before joining insitro, Tommaso worked at Amazon Search and AI, where he developed large ML models for ranking. He completed postdoctoral research at UC Berkeley, focusing on ML for computer vision. Tommaso obtained his Ph.D. in computer science from Joseph Fourier University in Grenoble and the University of Udine, Italy.

Introduction

Navigating your career as a computational scientist requires more than just technical knowledge. Well-funded big tech, mission-oriented biotech, and independence-focused academia each offer a distinct experience.

In this post, we will cover the crucial differences in project ownership, functional and cross-functional collaboration, managerial roles, data generation, and day-to-day activities across these environments. We will also cover wet/dry lab resourcing and how much impact, and what kind, your models might have.

Pro Tip: The views expressed in this post are the opinions of the authors. Please note that there are likely exceptions to each of these topics in each area. Your mileage will vary.

Massive table of many things:

Instead of a long-format comparison between big tech, biotech, and academia, we’ve distilled a table organized around four key themes that capture what you can expect in each of these environments:

  1. Collaboration and roles: Who will you collaborate with and what roles will you play? This theme explores team dynamics, project ownership and cross-functional collaboration.
  2. Data management: What type of data will you handle and what tools will you use? Here we cover how you will wrangle data and what type of support you will receive.
  3. Modeling and tasks: What type of models will you deploy and what impact will they have? This theme focuses on the nature of problems you’ll be solving, the methods you’ll employ, and how your work will drive value.
  4. Resources and infrastructure: What computing and financial resources will support your work? In this theme we examine how your organization's infrastructure, computing resources, and financial stability affect your projects.

By breaking down these themes, we aim to provide a clear framework for understanding how your career might unfold across different sectors.

| Theme | Topic | Big Tech | Tech Bio/Biotech | Academia |
| --- | --- | --- | --- | --- |
| Collaboration and Roles | Project Ownership & Impact: Do I own the project and will it have a large impact across the organization? | Low ownership; impact depends on project novelty. Teams have redundancy. | High ownership; responsibility spans data generation to modeling and inference. | High ownership; impact is narrower. |
| | Cross-functional Collaboration: Who do I work with on a day-to-day basis besides my teammates? | PMs | Lab scientists, SWEs, BD, PMs, PPMs | PhD candidates, postdocs in other labs |
| | Role of the Manager: What should I expect my manager to be doing at a bare minimum? | Career growth, cross-functional collaboration, process management. | Career growth; trust in small teams, scientific strategy in larger ones. | Scientific output management, fundraising. |
| Data Management | Data Generation: Can I generate custom datasets for my models? | Limited custom dataset generation. | High, but focused on specific assay types. | Limited; dependent on external lab collaborations. |
| | Data Pipeline & Tooling: How much data munging should I be expecting to do? | SWE support; high-quality internal/paid tools. | Internal SWE teams; some public tools; possible DIY solutions. | DIY or external APIs; free-tier tools. |
| Modeling and Tasks | Types of Problems: What drives what types of problems we work on? What are the team mandates and timelines for delivery? | Revenue-driven; clear KPIs. | Biological/chemical exploration; KPIs rarely tied to model performance. | Maximize in silico performance; regular publishing. |
| | Day-to-Day Modeling Efforts: What will my day-to-day modeling work look like? | Fine-tuning internal models; some R&D. | Fine-tuning on lab datasets; incorporate experimental nuances; validate predictions. | Novel methods development on fixed datasets. |
| | Impact of Model Improvement: How much does a 1% increase in performance metric improve things? | High; small gains can significantly impact revenue. | Low to medium; small gains have limited downstream effects. | High; can lead to state-of-the-art results and publications. |
| Resources and Infrastructure | Funding & Financial Stability: How much of a runway does my company have? What internal/external factors such as central bank rate, stock performance, catalysts might be upcoming? | Large, stable. | Moderate; stability varies. | Limited; grant-dependent. |
| | Compute Resources & Infrastructure: How much compute resources/infrastructure should I expect? | High; cutting-edge resources. | Moderate; varies by company. | Limited; shared resources. |

Summary

Jeff Bezos has a famous framework on reversible and irreversible decisions, arguing that reversible decisions should be made quickly, while irreversible ones require careful consideration. Choosing the right place to work falls somewhere in between. We hope that this post serves as a guide to help computational scientists navigate the various facets of career choices in big tech, biotech, and academia. While these comparisons might be helpful, remember that careers are long, and it’s normal to make mistakes or pick the wrong path at times. There will always be new opportunities, and in the end, you will be okay.

Mohammad Muneeb Sultan
Bio/Chem ML Researcher

My research interests include computational chemistry, protein design, generative models, and artificial intelligence.