AI models are powerful, but production systems built on them aren't predictable. Deploying LLMs at scale can be maddening, yet transformative when you get it right. In this session, we'll learn how to monitor and respond to incidents where traditional observability falls short: non-deterministic outputs, semantic drift, and quality regressions that standard health checks miss. We'll explore how metrics, alerting, and incident response must be reimagined for LLM-powered systems. You'll find these LLM observability challenges surprisingly universal and applicable to your own production systems.
Manish is Sr. Director of Engineering at BILL, where he oversees engineering operations for the Core Accounts Payable Product, a platform that streamlines AP automation and financial workflows for SMBs. During his tenure at BILL, he scaled the Accounts Payable engineering organization 10x and successfully launched key product initiatives such as Procurement, Multi Entity, 1099 tax filing, and AI-assisted automation features. Manish progressed through engineering management positions at BILL, leading the "Leadership & Career Development" mentorship program and multiple business initiatives along the way, including the migration of 80K customers to a modern frontend tech stack during the company's IPO preparation. He previously served as a Software Engineer at Cisco Systems, working on smart licensing solutions.