Beyond Benchmarks: Building & Assessing Generative AI Products in High-Stakes Domains by Zachary Lipton, Will Reed

September 10

4:30-5:00 PM PDT

Summary

Traditional AI lived on simple benchmarks: accuracy, precision, BLEU scores. Generative AI broke that mold. Now, outputs are open-ended, there is no unique gold standard, and as GenAI has been industrialized, neither datasets nor evaluation suites are shared across vendors. In this session, we’ll look at the science and strategy of evaluation in this new era: how to balance human adjudication with automated metrics, how Goodhart’s Law plays out in practice, and how evaluation itself shapes product development. Drawing on examples from healthcare, we’ll show why getting evaluation right isn’t just an academic concern: it’s the foundation for building products that customers can trust.

Speakers

Zachary Lipton

Co-founder and CTO @ Abridge

Zachary Lipton is Co-founder and CTO at Abridge, the leading platform for AI-based ambient listening technology in healthcare. Abridge’s industry-leading product listens to doctor-patient conversations and ingests reams of content from the EHR, leveraging its intelligent reasoning engine to generate high-quality drafts of after-visit notes and other artifacts. This automation frees up doctors to focus on their patients. He is also the Raj Reddy Associate Professor of Machine Learning at Carnegie Mellon University, where he directs the Approximately Correct Machine Intelligence (ACMI) lab. His research focuses include the theoretical and engineering foundations of robust and adaptive machine learning algorithms, applications to both prediction and decision-making problems in clinical medicine, natural language processing, and the impact of machine learning systems on society.

A key theme in his research is to take advantage of causal structure underlying the observed data while producing algorithms that are compatible with the modern deep learning power tools that dominate practical applications. He is the founder of the Approximately Correct blog and a co-author of Dive Into Deep Learning, an interactive open-source book drafted entirely through Jupyter notebooks that has reached millions of readers. He can be found on X (@zacharylipton), GitHub (@zackchase), or his lab's website (acmilab.org).

Will Reed

General Partner @ Spark Capital

Will is a General Partner at Spark Capital, where he focuses on investing in emerging category leaders at the Series B and Series C stages. His investments at Spark include Abridge, Baseten, Scale AI, Discord, Benchling, Handshake, and Mercury. He joined Spark in 2015 shortly after the firm raised $375M for its first Growth fund, a strategy that has gone on to raise $5B in the subsequent decade. Before Spark, Will was an investor at Welsh, Carson, Anderson & Stowe, a NY-based private equity firm, and an investment banker at BofA Merrill Lynch focused on high-yield credit.