The Engineering Leadership Podcast · Episode 47

Building Technology That Endures

with Melissa Binde

Jul 13, 2021
Melissa Binde, VP Engineering, Cloud @ Splunk, shares lessons from her time building Amazon Apollo. You’ll hear the origin story of Amazon Apollo, how to define the right problem, identify the right solution, and what makes technology endure 20+ years. Plus how to improve your engineering storytelling and pitch the right product features to the right stakeholders!

SPEAKER

Melissa Binde - VP of Engineering, Cloud @ Splunk

Melissa Binde previously served as Splunk’s VP of Platform and Observability. Prior to joining the company, Melissa led Google’s GCP Site Reliability organization for almost five years, supporting GCP’s growth from 250M revenue in 2015 to almost 9B in 2019. Before that she led engineering teams at Nordstrom, helping them transition to online and cloud, a cloud startup providing business continuity as a service to SMBs, and several other startups. Ms. Binde began her career as one of Amazon’s first 1000 employees, spending almost ten years there developing tools and technologies that are still part of the company’s core AWS stack. Ms. Binde holds a B.A. from Swarthmore College.

"We were heavily influenced by an early project manager I worked with who called it, the 'JEDI principle' - You make just enough decisions to implement! And so anytime we hit something. We would actually stop, if we were arguing, we'd stop and go, 'Well, wait, hang on. Do we actually have to decide this now? Or can we kick this down the road?' And so that helped us avoid getting too tied up in philosophical arguments."

- Melissa Binde


Special thanks to our exclusive accessibility partner Mesmer!

Mesmer's AI-bots automate mobile app accessibility testing to ensure your app is always accessible to everybody.

To jump start your accessibility and inclusion initiative, visit mesmerhq.com/ELC


Show Notes

  • The origin story behind Amazon Apollo (2:44)
  • Pitching the right features to the right audience (10:48)
  • Solving the right problem vs. what you were asked & the value of owning outdated projects (12:15)
  • Selling, pitching and building buy-in for new projects (16:21)
  • How do you know the problem defined isn't the problem you should solve? (20:56)
  • How to determine if you need a technical, process or organizational solution (27:35)
  • Building enduring tools & technology (30:23)
  • How to become a better storyteller (32:44)
  • Other examples of determining the right type of solution (36:08)
  • Lessons on project naming (38:44)
  • Rapid Fire Questions (40:10)
  • Takeaways (44:50)

Transcript

Melissa, welcome to the Engineering Leadership Podcast. We are so excited to have you here. Thank you so much for joining us.

Melissa Binde: Thank you. I'm really excited to be here! Apologies for the mess behind me. This is also the studio where I make my pens. Which is my main out of work hobby outside of my family. So lots of chaos back there, but it's all in the pursuit of creativity.

Patrick Gallagher: I think that the creativity is inspiring. And it's really cool to see some of the behind the scenes work behind your craft, which is really exciting.

The origin story behind Amazon Apollo

We're really excited to have our conversation, Melissa, because when we first started talking, this all began because I couldn't help but notice a LinkedIn post that you shared about some of the technology that you built at Amazon and it being still critical now almost 15 years later.

And to me that is like a staggering timescale. Because if you think about agile and scrum processes and methodologies, like, I feel like code is revolving all the time and constantly being updated. And so to think about something that has been built and is enduring for 15 years...

Melissa Binde: 20 years at this point!

Patrick Gallagher: Oh, my

Melissa Binde: It was 2001.

Patrick Gallagher: I can't even, I can't even comprehend that! And I believe that was the Amazon Apollo project. And so I was hoping we can kind of open up this conversation to learn a little bit more about the story behind your involvement with Amazon Apollo and why that's been important to you. And how the heck does something like that endure even today, 20 years later?

So bring us into the story. What was the beginning of that project like?

Melissa Binde: Yeah, absolutely. So Apollo started because Amazon at the time, had a script called "Website Push" and it was a shell script or a Perl script. There was a comment at the top that said this won't scale past four onlines. And I'm not sure how many we had at the time, 50, a hundred. One of those numbers at the time seemed very big and today seems very small.

And what it did was it picked up the website binaries and copied them from the staging server off to websites dot A or websites dot B. And then once it was fully uploaded, there was a symlink, "websites," that flipped between the two.

And this script was duplicated for the supply chain team. It was called "supply chain." Or for the fulfillment center teams, which were called distribution centers back then.

And that was the deployment. And it was getting slower and there was this... when we pushed a bug, you had only one chance to fix it. Because if the new version had been uploaded to website dot A... the old version was in dot B. And if you tried to upload a fix, you stomped your working copy of the website.
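
The A/B flip described above can be sketched roughly as follows. This is a minimal illustration in Python, not the original Website Push script: the directory names, the BUILD marker file, and the helper functions are all assumptions made for the example. It shows why there was only one chance to fix a bad push: the third upload lands in the slot that still held the last good build.

```python
import os
import tempfile


def inactive_slot(link_path, slot_a, slot_b):
    """Return the slot directory the symlink is NOT currently pointing at."""
    current = os.path.realpath(link_path)
    return slot_b if current == os.path.realpath(slot_a) else slot_a


def flip(link_path, new_target):
    """Repoint the symlink by creating a new link and renaming it over the old one."""
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_target, tmp_link)
    os.replace(tmp_link, link_path)


def push(link_path, slot_a, slot_b, build_id):
    """Upload a build into the inactive slot, then flip the symlink to it."""
    target = inactive_slot(link_path, slot_a, slot_b)
    with open(os.path.join(target, "BUILD"), "w") as f:
        f.write(build_id)  # hypothetical stand-in for copying the real binaries
    flip(link_path, target)


def read_build(directory):
    with open(os.path.join(directory, "BUILD")) as f:
        return f.read()


def demo():
    root = tempfile.mkdtemp()
    slot_a = os.path.join(root, "websites.A")
    slot_b = os.path.join(root, "websites.B")
    link = os.path.join(root, "websites")
    os.mkdir(slot_a)
    os.mkdir(slot_b)
    os.symlink(slot_a, link)

    history = []
    for build in ["good-1", "buggy-2", "fix-3"]:
        push(link, slot_a, slot_b, build)
        history.append(read_build(link))

    # Only two slots exist, so pushing the fix overwrote the last good build.
    surviving = [read_build(slot_a), read_build(slot_b)]
    return history, surviving
```

After the three pushes, the only builds left on disk are the buggy one and the attempted fix. If the fix is also bad, there is nothing good left to flip back to.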

So there would be con calls for 12 hours where people debated whether or not to roll back or roll forwards. And it wasn't that simple back then, because for example, let's say Oprah had just done her book launch. Well, the content had gone out. And so we would actually violate contracts if we did a roll back to the old version. But yet what were the odds of being able to fix it?

So it was very fraught and the fact that it took a long time to move the bits out to the site because the websites... I mean, this was all the content and everything that ran the Amazon website was all getting pushed out. The time to do that push was measured in hours. And the time to get it out to all the onlines and so should we add more staging servers? But the sheer limit of networks... you know, this was 2001... was in our way.

And so what do we do? And so my boss came to me and said, "You know, I'd like you to, first of all, take over the script."

Our team was the infrastructure tools team. We'd previously done systems automation, we called it "Disco."

We had kind of cleaned up how we did password management you know, all sorts of like, kind of uncool things like that.

Patrick Gallagher: Disco is a pretty fun name.

Melissa Binde: It is, it is. I justified that it stood for distributed software configuration. Or systems configuration. But really, I just wanted to call it Disco. And we had a disco ball as our mascot and stuff.

So I got asked to speed up the pushes, and what they wanted me to do was to make Website Push use BitTorrent.

And this seemed like a fine solution if you define the problem as "the bits aren't moving fast enough."

But I looked at it and I said, I don't think this is a bits moving problem. The issue really is this websites dot A and dot B thing. Like really, we should have more flexibility. And the analogy I used when explaining to people was that we should treat our compiled binaries like we treat our source code. That we love them. We care for them. We understand where each line came from. We have revisions, we understand there are checkpoints, and you know, you can look. We were using Perforce; I think Amazon may still.

But you know, we had actually fairly recently migrated from CVS. But we took good care of these. And maybe we should treat our binaries the same way.

And my management told me, "No. Don't go build something new. We really just want you to use BitTorrent."

And I said, "Oh, okay...!"

And then what happened was myself and two other engineers locked ourselves into a room.... it was Christmas. And at Amazon, no one pays attention to you at Christmas. You're either getting orders out the door or no one's paying attention to you.

So we actually had like this three month period where we could just be heads down. We locked ourselves in the conference room, covered the whiteboard with designs, and developed this whole set of primitives. Like, you know, okay: the binaries of libraries, they have versions. We want to put these in collections. Those collections have versions... The whole idea of Gurupa and services was very new at the time. The code base was migrating away from Obidos. And so we had this advent late. We could use this idea of major versions and minor versions, and things were compatible or not. We could declare compatibility, and it was this really exciting world! That wasn't just, this is the date stamped binary...
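
To make those primitives concrete, here is a minimal Python sketch of versioned packages grouped into versioned collections. The class names, the field names, and especially the compatibility rule (same major version, minor version at least as new) are assumptions chosen for illustration; the conversation only says that major and minor versions existed and that compatibility could be declared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageVersion:
    name: str
    major: int
    minor: int

    def satisfies(self, required):
        """Assumed rule: same package, same major, and at least the required minor."""
        return (
            self.name == required.name
            and self.major == required.major
            and self.minor >= required.minor
        )


@dataclass(frozen=True)
class Collection:
    """A versioned set of package versions that is deployed as one unit."""
    name: str
    version: int
    packages: tuple

    def missing(self, requirements):
        """Return the requirements that no package in this collection satisfies."""
        return [
            req for req in requirements
            if not any(pkg.satisfies(req) for pkg in self.packages)
        ]


# Hypothetical package names, used only to exercise the sketch.
collection = Collection(
    name="supply-chain",
    version=7,
    packages=(PackageVersion("libfoo", 2, 3), PackageVersion("libbar", 1, 0)),
)
unmet = collection.missing(
    [PackageVersion("libfoo", 2, 1), PackageVersion("libbar", 2, 0)]
)
```

Here `unmet` contains only the `libbar` 2.0 requirement: the collection carries `libbar` 1.0, which has a different major version, while `libfoo` 2.3 satisfies the request for 2.1.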

So we put all that together. January comes around, the company wakes up and discovers that I'm not in fact doing BitTorrent... and I said, "But we made it so the transport is pluggable. There's an API. So someone else could write a BitTorrent client for it..."
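
A pluggable transport of the kind described here might look something like the following Python sketch. The `Transport` interface, the `Deployer` class, and the in-memory transport are hypothetical names invented for the example, not Apollo's actual API. The point is only that the deployment logic depends on an interface, so a different way of moving bits can be dropped in without touching it.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical interface: anything that can move an artifact to a host."""

    @abstractmethod
    def send(self, artifact: bytes, host: str) -> None:
        ...


class InMemoryTransport(Transport):
    """Stand-in transport for the example; a real one might copy files over SSH."""

    def __init__(self):
        self.delivered = {}

    def send(self, artifact: bytes, host: str) -> None:
        self.delivered[host] = artifact


class Deployer:
    """Deployment logic that only knows about the Transport interface."""

    def __init__(self, transport: Transport):
        self.transport = transport

    def deploy(self, artifact: bytes, hosts):
        for host in hosts:
            self.transport.send(artifact, host)
        return len(hosts)


transport = InMemoryTransport()
count = Deployer(transport).deploy(b"build-42", ["host-1", "host-2"])
```

Swapping in a peer-to-peer transport would mean writing one more `Transport` subclass, which is why the team could plausibly say the option was open to anyone who wanted it.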

I still actually have in my mind sitting in that meeting and getting the look from management and they said, "Okay, well, here's the deal like you have..."

I don't remember if it was two weeks or four weeks, probably four weeks... "so you have four more weeks and then you need to like ship or shut it down."

I said, "Oh, absolutely!"

Knowing completely in my head there's no way we could get it done in four weeks. But you know, forgiveness, permission, all that.

So we worked another four weeks. It still wasn't quite ready. I think we pulled something together and pretended that it was like kind of an alpha version.

And we went after the team that was most ignored.

Oh, I'm sorry. I forgot another bit. All of the pushes, because Website Push was a single thing, had to go through a central team called Houston. As in, "Houston, we have a problem" with the launch. This also confused new hires who thought that the team was actually in Houston, Texas.

So if you weren't supported... if you weren't the website, no one would pay attention to you and you wouldn't get good support from Houston. You wouldn't get features. You need to blah, blah, blah.

So in order to launch it, we went after THE MOST ignored team in the company, which was a supply chain team. And we said, "Hey, we've got this new thing. We will cater to you. We will do this for you. We will help this work for you... if you'll be willing to be our first customers."

And they said yes, because they were getting ignored by the rest of the company. And so we got them on board, got them successful. And I ended up meeting probably every single person at Amazon because I pitched this to... I did training for every engineer. I pitched this to project managers... TPMs, whatever we called them back then... which may still be what they call them... Pitched it to VPs and sold everyone on this. And then I left Amazon about five years later, and it was still going strong. And then I realized it was still going as the years passed.

And it completely supplanted Houston. And as far as I know, they never did write a BitTorrent client for it, because the problem was not moving bits. The problem was configuration management.

Jerry Li: Thanks for sharing all that, because I joined Amazon in 2008, out of college. And with no prior industry experience, I was taking for granted all the nice things Apollo offered. And not until I left Amazon four years later did I realize, "Wait a minute... This is not something that every company has?!"

At the next company, we had to do all of that, all the heavy lifting, to SSH into the server and do deployment manually and write scripts. It was so old-fashioned. And then I realized how advanced Apollo is and how much easier it makes life for engineers for deployments. This is the first time I've heard the origin story of how Apollo, the system, came about. That's really fascinating.

Pitching the right features to the right audience

I think back then, if we zoom out a little bit, after Amazon started the service-oriented architecture, that created a need for a lot more deployments.

I guess that's the context before you were asked to work on that project.

Melissa Binde: Yeah. And if I had been smarter probably would have tried to sell it that way. Right? Which is that this would make services oriented architecture better. But you know, I'm not good at lying. Honestly, we weren't thinking in those terms.

I suspect that it was that we were inspired by the atmosphere around us. Of people breaking the binary into pieces. I mean, there were giant plotter graphs of the hairball that was Obidos, and they would update it as they made progress in the pieces that were being pulled into Gurupa. And Apollo was somewhat, you know, a good idea at the time.

It certainly could have been just as useful in the Obidos era. But it really was powerful to help get the services oriented architecture out there as teams were reducing their dependencies on each other and being able to deploy individually.

And actually, your comment about traceability: that was one of the things we used to sell. So we sold it to devs as, "You don't have to block on Houston to get your code out." We sold it to TPMs as, "Who did what and when?"

And we sold it to the execs as, "Who broke the website, and at what time, and with what code?" Cause there was full traceability all the way back to the lines of code.

So some of it was also you know, pitching the right set of features to the right audience.

Solving the right problem vs. what you were asked & the value of owning outdated projects

Jerry Li: Yeah, I think there are at least two things that are very fascinating. And one is... you were asked to do one thing. And that's the one thing that never happened!

And you were able to navigate through that objection and go in a very different direction. How were you able to stick to your own idea and find out that the real problem was not the one you were asked to solve? And how did you work through those kinds of objections and still make progress, and eventually demonstrate it to folks and convince other people to jump on board?

Melissa Binde: I think it would be easy for me to tell this as a hero's story. But it's not. And I think that treating things as hero stories makes it harder for other people to actually learn how to get stuff done. So there were a lot of elements.

One was, I went with my intuition that this wasn't the right answer. But I did it by bringing others into the idea and being willing to share just a really fuzzy idea with some folks first. So it's not like I sat in my head and developed this brilliant idea of what we should build and then, birthed it on the world.

I sat down with two good brainstorming friends and we figured out what should we be doing together? What kind of problem are we actually trying to solve? I think the lone genius is very common in our environment.

The other thing is we didn't try to justify a big team. We built the first Apollo with the three of us and a handful of other coders in our team. And we did it by me, I was the manager at the time, choosing to stop other work.

So I think also people have a tendency to want to build everything at once. They want to build this massive framework, they're thinking too big at the start. And we built something that solved a very clear isolated problem. We had all of the theory worked out, but we didn't build things we didn't have to build.

We were heavily influenced by an early project manager I worked with who called it, the "Jedi principle." You make just enough decisions to implement!

And so anytime we hit something, we would actually stop. If we were arguing, we'd stop and go, "Well, wait, hang on. Do we actually have to decide this now? Or can we kick this down the road?"

And so that helped us avoid getting too tied up in philosophical arguments.

We also had very different personalities. So I have a tendency to take any problem and expand it to include the known universe and theorize about the unknown universe. And I had someone on the project who was very good at going "yea but no, no, no. Let's focus on this one thing to solve."

And so that combination of personalities helped keep us on a good track.

And finally we didn't try to ever mandate the use of it. So we didn't run into the kinds of fighting you sometimes get. We just went after the customers who wanted it.

And there's one other element that I didn't realize at the time that took me a while to learn.

I was initially asked to own website push, while solving the problem. And I thought, "What a terrible idea... why should I own the old thing? I am in charge of the brave new world!"

And there were a few things I learned from that. One, the old icky was once the new shiny. And recognizing that this thing that you're replacing was once humanity's greatest hope. It was the thing that was going to be amazing. Even if the creators knew that there were warts and stuff. This was a cool thing once and respecting that history matters.

And also, if you don't own the old thing, you can't learn the ways in which it's successful or not successful. And you can get too caught up in your own head. Solutions we build to last are not based on things we come up with in our head. They're based on actual observation and implementing solutions to that.

And finally we weren't precious about our code. I cannot imagine there are any original lines of code left in Apollo. Which is also now externalized as AWS CodeDeploy.

If there are, I'm very sorry that there are any original lines of code left. As it needed to evolve, we evolved it. As it needed to change, we changed it. And we had some spectacular outages. We designed for as much resiliency as we could. We technically took down the website during code freeze one year over Christmas...

but yeah, so I think that covers it.

Selling, pitching and building buy-in for new projects

Patrick Gallagher: I was hoping we could talk a little bit about the process of selling the idea a little bit more and some of the first conversations and how that evolved into greater adoption.

Because I feel like, especially for huge internal tools for a company at the scale of Amazon, that adoption curve seems like it could be a long process and there's seems like a lot of stakeholders and a lot of hurdles, and a lot of opinions, that you have to navigate throughout that whole process.

So can you tell us a little bit more about the whole selling and pitching the idea and building buy-in for the Amazon Apollo solution?

Melissa Binde: Yeah.

I think one key thing we did was to not make it dependent on broad adoption. So one team could use Apollo and it didn't force other teams to. I think possibly due to sort of social networks and network effects, people often model software these days as, "Oh, it gets better the more people who use it!"

But that also means that your initial few customers who are going to be the people who are telling other people how awesome it is... don't get the full experience. So Apollo worked even if it was just a single team.

And we went after not the most shiny team politically, that would have been going after the website.

We could have had the attitude of, "we got to get the website on board or we'll never be successful."

Instead we went after the team that was completely ignored. The team that was struggling, the team that was desperate for someone to help them.

I think there was also an element of Amazon culture that helped us.

First of all, you could not mandate anything. So we never even had the option of saying, "Oh, well, we'll just get Rick Dalzell to go tell everyone to do it." That was never an option.

But also no one had time. Also, this wasn't like that cool a problem. How many people really want to write a piece of software that pushes software? So it's not like we had a ton of competition, and it wasn't that cool a problem. And no one had time to be writing other solutions. This was largely an area that just needed to kind of work.

And then the other thing is we put huge time into it. I mean I did nothing else for months.

Everyone was in Seattle back then, but I traveled to every single building we had. I gave presentations. I would do one-on-ones. I would do whatever was necessary. And I tuned my pitch to the audience. So rather than talking about why I thought Apollo was good. I talked about what they would get out of Apollo.

So as a dev, you didn't care about your manager being able to trace why you had an outage. But you really cared about getting your code pushed.

And this was before all of this agile stuff was quite as cool. DevOps wasn't even a word. The first time I encountered it, it was a friend saying, "Hey, this DevOps thing, I think that's what you were doing at Amazon."

Like, "Oh yeah! Yeah! That is what we were doing!" So you know, the idea of a dev being able to control their own future and push their code. That was so exciting!

And then of course, like, everyone else panics. And so you point out, "Oh, but it's okay, because you can roll it back with a push of a button too."

And I gather that Apollo has huge integrations and stuff internally. We didn't try to build all that on day one. We focused on a problem that needed to be solved and we didn't spend energy building frameworks or integrations. The one thing we did was make the transport swappable... so that we could plausibly say that if someone wanted to build a BitTorrent client they could. But the default was using SSH and SCP.

So yeah, we didn't overbuild. And we sold focused on the people, focused on what value they would get out of it.

You know, there's an element of marketing that I think tech folks can easily fall into... "I have the superior technical solution, therefore people should use it."

And this was, "Let me go solve a problem for you and show you how I'm solving your problem."

Jerry Li: A big lesson I hear from this is being egoless, and also being an owner of the company. Like, you are really doing the right thing. Not trying to get more credit, just focusing on solving the problem.

And helping your early customers to be successful. So I think that mindset has a lot to do with the results. That this is still an ongoing project, a really important tool, and a very successful initiative...

Melissa Binde: We never dreamed it would be like this. We were optimistic. We picked the name Apollo. We had a little contest in the team for picking the name. And we wanted to pick one that we thought that people could pronounce, even if they weren't native English speakers. That would be recognizable, that wouldn't conflict with any internal names.

It turns out Apollo is a hard thing to spell. Which we did not anticipate. People got confused about how many P's and how many L's. But it did create a fun, additional mascot, which was that it is a slightly bastardized Spanish for "to the chicken": "a pollo." Technically it's "al pollo," but you know...

How do you know the problem defined isn't the problem you should solve?

Patrick Gallagher: I'm really fascinated by the, "how the problem defined doesn't always equal the problem that you should solve." And so I wanted to ask about, is there a way to get down to a binary of when do you solve the problem defined? Or when do you have to then solve a different problem?

And I guess maybe when you're reflecting on the experience, are there sort of questions or a framework for when you solve the defined problem? Or when you have to go out and actually redefine the problem that you should actually be solving.

Melissa Binde: First of all, why I love people management is because you never know if you're solving the right problem. I think that it is easy to have computers convince you that you always know, like debugging, right? It was working previously. It can work again. My job is to make it work. And I think that can lead us to think that design problems and people problems can be similarly solved and they can't.

Anytime that you're in a space that is not debugging, you may not have the right problem identified. And stepping back, and... this is where I think, for example, owning the previous solution helps. And being willing to admit that you may not have had the right solution. I didn't start Apollo going, "Yes! I'm going to go build this configuration management system."

I dug into it and was like, "Wait a minute. I don't think... like, yeah, we can replace this with BitTorrent, but then all we're doing is moving bits faster. I don't think this is a bit moving problem."

So digging and looking at the characteristics what's failing today? Looking beyond the stuff in front of you, talking to people, actually engaging with the process.

How many engineers are writing code for problems that they have not themselves experienced?

Have you actually done the job that you were trying to write the tool for, I think is often missed. We're all very confident that we're these really analytical beings and we can conclude things. But the problem space is so big. So engaging with it and understanding it and identifying a problem that you can solve and then solving only that problem.

We could have been wrong. Right? And the good news is we had only spent like four months of development on it for you know, at most five or six people. That's a pretty cheap investment for something that ended up replacing all of Amazon software deployment in the space of... so we did 2001... less than seven years. Certainly it was ubiquitous before I left in 2006...

So I think it's being open and being curious, and this is why you know, sometimes I talk about big D diversity and little d diversity, and this is why our little d diversity comes in. People who think big, people who think small, people who come from different sorts of backgrounds, if you have everyone hired from the same college. Like if you're a startup in the Bay Area, you only have Stanford grads... you're probably missing a huge solution space just because people had different experiences and come at it with a different perspective.

You know, and also putting off any decisions you don't need to make. I think not boxing ourselves in, at the start was really key. And just saying, yeah we can kick that one down the road.

Patrick Gallagher: I'm going to remember Jedi, the acronym, forever...

Melissa Binde: Jedi was so powerful! The woman's name is Laura Cordon, and it was just so, so game-changing to me to think about that. You know, again, I think as engineers, we have a tendency to want to be right.

I used to joke that not only was there no hill I wouldn't die on, but also few valleys and even some slopes. I got into naming arguments constantly. And learning the power of going, "Yeah, we don't have to decide. It's okay! Someone might be wrong on the internet, but let's put it off for another day and figure it out" was very powerful.

Jerry Li: Your ability to see beyond the ask, and actually see the end users and their pain, and to see the full scope of the problem and start from that: I think that helped you navigate to the right solution. A lot of people may start with the ask and then just go directly from there. So they're missing a step.

Do you see that often? How do you combat or how do you help other people combat that?

Melissa Binde: Well, I think there's also an element of you know, one of the things I say at work is that "if I want a technical solution, I'll apply a software developer. If I want a process solution, I'll apply a project manager. And if I want an organizational solution, I'll apply a manager."

You don't always know what kind of solution you want upfront. When you look at Apollo, Apollo was not a technical solution. It was implemented in code. But it's almost a process solution, right? It's configuration management. It's changing the fundamental nouns and verbs we're using for how we distribute software from "copy and change a symlink" to version, configure, roll back, roll forwards, opening up a whole bunch of different things.
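
One way to picture that change of vocabulary is a deployment history in which every version is kept, rolling back or forward just moves a pointer, and every action is recorded with who did it. The sketch below is a minimal Python illustration under those assumptions; the `Environment` class and its method names are invented for the example and are not Apollo's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    name: str
    history: list = field(default_factory=list)  # versions deployed, in order
    cursor: int = -1                             # index of the live version
    audit: list = field(default_factory=list)    # (who, action, version) records

    def live(self):
        return self.history[self.cursor] if self.cursor >= 0 else None

    def deploy(self, version, who):
        # A new deployment drops any versions that were rolled back past.
        del self.history[self.cursor + 1:]
        self.history.append(version)
        self.cursor = len(self.history) - 1
        self.audit.append((who, "deploy", version))

    def roll_back(self, who):
        if self.cursor <= 0:
            raise ValueError("no earlier version to roll back to")
        self.cursor -= 1
        self.audit.append((who, "roll_back", self.live()))

    def roll_forward(self, who):
        if self.cursor >= len(self.history) - 1:
            raise ValueError("no later version to roll forward to")
        self.cursor += 1
        self.audit.append((who, "roll_forward", self.live()))


# Hypothetical names and versions, used only to exercise the sketch.
env = Environment("supply-chain")
env.deploy("v1", who="alice")
env.deploy("v2", who="bob")
env.roll_back(who="carol")
live_after_rollback = env.live()
env.roll_forward(who="carol")
```

Unlike the two-slot symlink flip, nothing is overwritten here: `v1` is still available after `v2` ships, and the audit list answers "who did what and when" for each step.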

Engaging your own brain in learning to find non-technical solutions to problems, I think first of all, makes you an incredibly powerful leader. It's why... I love hiring people who have had multiple different kinds of roles. Or who have double majors in interesting things. Or who have gone back and forth between tech lead and manager multiple times. Because that has trained your brain to look for solutions other than technical solutions.

BitTorrent was a technical solution. And the problem defined the solution, the solution defined the problem.

How to determine if you need a technical, process or organizational solution

Is there any way to determine the type of solution that you need? Like, are there certain questions that you ask, that you introduce at the beginning of the process, to help you seek out... is it a process solution? Technical solution? Or organizational solution? I'm just curious to know, like, if there is something that somebody could adopt to help them develop that thought process.

Melissa Binde: Sometimes I will force myself to come up with multiple different types of solution.

So right now at work we're trying to sort through a particularly thorny org question. And I was speaking with our fellow. And I actually asked him, can you write down for me what is the technical ideal world? What is the process and realistic ideal world? And what is the business ideal world?

And it's a very supposedly technical question: who should own the protocol definition for a problem? But there's a whole bunch of different ways of looking at this and different sets of considerations. So laying those out.

When I'm coaching managers on learning how to design orgs, and how to do reorgs. I tell them "You should always produce at least three to four different solutions."

Because it's very easy... and look at all the different dimensions on which you can solve something.

Deliberately force yourself to... you know, we call it brainstorming and we picture people at a whiteboard throwing out dumb ideas. But I guess maybe it's structured brainstorming? It's deliberate looking for other dimensions in the problem.

I also play around with constraints. My manager gives me a hard time. He says, "Melissa, if you spent less time in the theoretical world, you'd solve things faster."

And I say, "Do I get practical when I need to?"

He said, "Yes. But you can be assured by the time you're getting practical, it is time to get practical."

And I said, "Okay, then I'm not going to change." Because if I can sit in theoretical world and then identify my constraints, then I can see which of those constraints I can change. Often we are being held into a thing by a constraint we don't realize we can change.

So people often design a reorg based on who they have in their team. And not realize, well maybe some of those folks can flex into different roles? Maybe some of them would love to do something different? Like there are constraints you're inherently imposing on yourself.

So when you're looking at a problem, trying to identify those and figure out which ones you can remove. With Apollo, one of my constraints was "My management told me how to solve it."

And I solved this by not telling them I was going to ignore them. So "ask forgiveness rather than permission" is a nice constraint to remove.

Jerry Li: I think the concept of identifying the constraints and trying to remove some of them... like doing assumption testing... I think that's a really practical tool for people to use on an ongoing basis.

Do you have other examples, other than Apollo, where applying that exercise helped to get a better result?

Melissa Binde: Let me think on that as we take another question and come back to it.

Patrick Gallagher: I have another question to jump in with, Jerry.

Building enduring tools & technology

One of the things you had mentioned, Melissa, that really stood out to me was there's probably no code left on Amazon Apollo. And you mentioned that enduring tools are built based on observation.

So the big question that stems from that really is how do you build tools that solve an enduring problem and evolve with the company? And so I was wondering, if there were any, when you reflect on that experience or other projects, different elements that have helped tools endure these long timescales and to be able to evolve with the company and how it changes and at different scales and different problem sets.

Tell us more about that.

Melissa Binde: I think one element was that we didn't write more code than we needed to.

I think as engineers, we tend to want to build very general solutions. And, you know, this is part of making code that's easy to refactor, rather than making code that is infinitely flexible.

When someone needed to change how things worked or change what it was doing, there was not a bunch of general purpose code they had to sort through. We knew we didn't know enough on day one. So we built for what we did know; we didn't try to build for what we didn't know.

The abstract concepts that we had that were there from day one, you know, the package, the collection of packages, things like that... Amazing. I forget what word we settled on for collection of packages, package group maybe?

Those were kind of inherent to the space. And so those were abstractions we could make, but we didn't try ahead of time to decide whether one team could roll back another team's code. We didn't try to make general purpose, "could you delegate permissions?"

We kept it as very, very sort of outcome-oriented as we could, rather than trying to design generally.

And in terms of why it endured... man, I'm not entirely sure! Like, maybe I shouldn't have quit Amazon? My best guess is that it's because it was able to shift organically rather than ever having to be a top-to-bottom rewrite; because it was at the right time for the services migration and helped solve a very clear problem. Plus Amazon's culture of frugality and keeping everyone so busy they couldn't write competing things as easily. I mean, I've heard from plenty of folks, "Oh yeah, our team tried to rewrite Apollo!"

Or "Our team runs their own Apollo!"

Or so, I mean, there's certainly elements of that. But, you know, it has endured.

How to become a better storyteller

Patrick Gallagher: Another question, Melissa, to go in a little bit of a different direction, because when I'm sitting and hearing the stories that you share... you have an incredible gift of storytelling. Especially for things that can be highly technical and there's a lot of detail around it.

So I think that the question becomes, how did you develop that gift of storytelling? And for engineering leaders who want to better develop that skill and advocate for the projects that they're working on for adoption...

What would your advice be for both how to acquire storytelling skills and to become a better engineering story-teller?

Melissa Binde: I will try to answer that. But as with many things, what I will be answering is from me personally. So it may not resonate with everybody.

One is, I think, recognizing the value at all. Engineers, especially in Silicon Valley, tend to see code and engineering as the solution to every problem out there. And recognizing that the human element matters, storytelling matters, marketing matters... marketing, it's a four letter word, but it still matters. And not dismissing those as ways of getting things done.

I sometimes get asked what publications I read to keep up to speed on my job. And the answer was I don't. I read fiction in my spare time.

I need to disconnect from work, and I suppose that may also sort of help me keep in cadence with storytelling. Some of it may just be personality, right? Maybe, if you're just not a storyteller, it's attaching to someone else. I suspect taking improv classes or doing something like that would be helpful.

Because then you're learning how to construct things, especially on the fly.

I do a lot of practicing. So this is why you know, Jerry's question of something else... I realized this framing that we're discussing here, isn't something that I've often framed. I don't have an answer to this because I haven't thought in this framing before. You can guarantee like a year from now, I'd be able to answer Jerry's question really quickly.

And I'll probably start saying it more. Some of it is just practicing. You know, Apollo didn't spring from the ground. Our recruiting pitches, they weren't perfect on day one.

When I was at Google I felt like I was terrible at writing presentations. And I was working with an exec coach and she said, "Okay, well, who writes good presentations?"

And I named a fellow down in Australia who I work with. She goes, "Okay, how long do you think it took him to write that?"

I'm like, "Okay, I'm going to sandbag here." And I said, "Two months!"

She was like, "Okay, why don't you go ask him?"

I asked him. Six years! And that was such an 'in your face' realization: that we see the end of everyone else's journeys. And if we get any stories, we hear the highlights. And realizing that everything goes through way more evolutions, and we shouldn't give up on ourselves just because we're early on in that.

Apollo had a lot of places to fail. It succeeded. Was it my incredible brilliance? Was it perfect timing? Was it luck? You know, some combination of the three... and Rick Dalzell not firing me when I lied to him about when it would launch.

Patrick Gallagher: As a student of storytelling, one thing I wanted to point out, as you've shared throughout this whole conversation, Melissa, is that almost every point or every insight that you've shared with us has been laced with a really powerful personal story. And it's made it so that the point really resonates, so that I've integrated it almost immediately.

And so, like one thing in my observation is your ability to link stories to those key moments or insights is really, really well done and artfully done.

So as an admirer and student of storytelling, I just wanted to share that with you, because I think it's great!

Melissa Binde: Awesome. I love it.

Other examples of determining the right type of solution

This is... and this kind of spontaneous talking is what makes me happy. So... you know, I want to come back to Jerry's question about other things like Apollo...

You know, I think part of it is that I've been a manager for so long that so much is organizational at this point.

But I think one thing recently from my current job is we've been trying to figure out how to make a high compliance environment work. And treating it as disjoint from our commercial cloud, you know, or AWS running thing. And realizing that rather than treating those as two different problems, what if we set up a stepwise progression?

What if we said, "Okay, we run pure DevOps here. When we talk IL6, we have to run pure admin ops."

But there's actually a middle step, which is essentially private realms for customers. So what if we treated this as a continuum, rather than as a jump from A to B? And we said, "Okay, in the middle there, we have both devs and admins supporting it. And we can figure out the tooling as we go."

And one of the questions that came up was, "Well, okay. If we're writing all this tooling, that's going to cost us money that we can't use to develop features. Is it worth the cost?"

And I said, "I would posit that that's an irrelevant question. Because it is a fundamental business decision of who we sell to. We will sell to them unless the cost is like, I dunno, 10x our dev cost or something, which it isn't! So there's no value in figuring out what the cost is. We just decide, yes, this is the business model we're going to pursue."

Another example, although a little tangential is... people often worry about whether they're doing the right project. I would posit that all you have to do is make sure you're doing the top 25%. Something in the top 25% of the things you need to do.

And that every bit of energy you spend trying to figure out the most important one is probably irrelevant. You know, maybe it's one of the top 10, one of the top five... you know, it depends on how long your list is. But you know, as engineers, we care a lot about precision. And sometimes it's false precision.

So those are unrelated to Apollo, but two other examples of where, if you sort of dig into the dimensions and the question of what you're doing, there may be other ways to look at the problem.

Jerry Li: Really minimize the number of decisions you've got to make and only make the ones that make sense at the moment.

Melissa Binde: And is it really worth it? I mean, maybe sometimes it is, right? Maybe you have time to do only exactly one thing and it's gotta be the right thing. Most of the time, you'll spend more energy trying to figure it out than you will ever gain back by picking the magical right one. And you might not even be right about what's the right one.

Lessons on Project Naming

The other thing I learned off the Apollo project was finally naming something that was not embarrassing to say in front of C-levels. I also wrote "Old Fart" at Amazon, which was just a script running in my home directory. And before that I had written the first monitoring for the fulfillment centers. Which, because I was reading Dante at the time and had just come out of college, I called "The Fiery Pits of Hell."

And there was a time, about three or four years, when it was still the only monitoring that we had for our fulfillment centers. I was actually told not to build "The Fiery Pits of Hell," that we were going to do this whole big monitoring project.

I'm like, "But I'm getting paged and I want to know what's going on."

So I built this stupid little MRTG based system that just ran out of my home directory. And I had this fabulous call with a bunch of VPs and C-levels where they're like, "So this 'Fiery Pits of Hell' thing... could you explain it to us?"

'Cause it was the only monitoring we had for order volumes that year going through the fulfillment centers. And one of them got very mad, thinking I was calling the fulfillment centers 'Hell,' and I had to point out, no, no, no. It's the Circles of Hell of the software. There's one circle for each piece of software.

Patrick Gallagher: I just really admire the flair and just the literary depth of the naming, the nomenclature of all the projects you've been working on. It's so cool.

Jerry, we need to be more creative.

Melissa Binde: Well, we also called something LSD, for Lightweight Software Distribution, and then needed to get the DBAs to stop using it. And decided to call that Project Meth. So, um, I think a lot of it is just making sure you're entertaining yourself.

Rapid Fire Questions

Patrick Gallagher: Melissa, we have five rapid fire questions for you as we wrap up. So if you're ready...

Melissa Binde: Yes.

Patrick Gallagher: Okay. Perfect.

Number one, what are you reading or listening to right now?

Melissa Binde: I have finally gotten around to listening to the Vorkosigan books by Lois McMaster Bujold, which have been recommended to me for years. And I had never read them, so I'm listening to them.

Patrick Gallagher: What's the quick plot line for...

Melissa Binde: Oh, sci-fi!

Patrick Gallagher: Oh, I love sci-fi. I'm going to look it up. I just read the whole Three Body Problem trilogy. And a couple other ones. So...

Melissa Binde: There's like 14 of them. So it's great if you like to go through series.

Patrick Gallagher: Yeah, there was a period of time where I think I read like 10 sci-fi books in the summer. Anyway, this is supposed to be rapid fire. So I'm sorry for interrupting. I get excited about sci-fi.

Number two. What tool or methodology has had a big impact on you?

Melissa Binde: Learning that methodologies are essentially religions. And lots of people think they will solve every problem and they won't. You need to understand what problem it's solving and whether you have that problem.

A lot of my teams use Scrum. That's fine. I don't care. One of them is dying from trying to use Scrum with a heavy ops team. I suggest they look at Kanban. Honestly, there's a lot of religion and most of the time it's not justified. Be flexible. You know, eventually you learn that there's not only one programming language. Methodologies are the same way. Different ones for different situations.

Patrick Gallagher: I'm going to print out that answer and put it above our board. That was an incredible way to question the methodologies when they're not serving you.

Number three, Melissa, what is a trend you're seeing or following that's really interesting to you right now, or hasn't quite hit the mainstream yet?

Melissa Binde: So I don't have a fabulous answer to that because, as I said, I actually don't follow a lot of tech news, 'cause I spend my evenings reading or with my family or making pens.

I will say, and I think it's going to sound contrary... Cloud is overblown! And I say that not because there aren't a million Netflix stories out there. But I work at Splunk. I work in a heavily B2B company. I also worked at a cloud provider. As I sometimes say, your cloud sales rep will tell you lift and shift, but lift and shift is bullshit.

And we are going to have on-prem private clouds, all sorts of other models for the indefinite future. So companies that can work across that and not just assume their customers are entirely in AWS or in a handful of commercial clouds I think will be more successful.

Patrick Gallagher: Wow... that seems like a big trend to call out. Thank you!

Melissa Binde: Maybe I'm wrong! But I worked at a cloud provider. I work at a company that does the Fortune 100 now. And I'll tell you, the commercial cloud is nice, but it is not one size fits all.

I mean, data's a black hole, right? Data sucks everything towards it. And these companies have a lot of data. And especially if you talk, for example, manufacturing or anyone who has facilities that are not on the end of a massive pipe.

You know, so many things are in rural areas. And they need to get at that data. And assuming that that's just going to all run off AWS on some tiny little network pipe that isn't sufficiently redundant... like, that's not going to happen. Kiosks are not easy to update the software in.

So yeah, I think we have a lot that folks out of Stanford assume is true about the world that just isn't in the rest of the world. And that's not even getting into developing countries which have a very different infrastructure problem.

Patrick Gallagher: Oh, man, we're going to have to have a future trends conversation, I think, as a follow-up, because those are some really, really awesome takes. Two more quick questions, about questions and quotes.

What's your favorite or most powerful question to ask or be asked?

Melissa Binde: I don't have an answer. I really love to talk. And I really love to think by talking. So honestly, my favorite is probably anything I haven't thought about before. Which is why this podcast has been great!

Because I love an opportunity to think out loud with smart people and to just talk around things and to figure out stuff together.

Patrick Gallagher: Is there a quote or mantra that you live by? Or a quote that's really resonating with you right now?

Melissa Binde: Yes. "Reality is that which, when you stop believing in it, doesn't go away."

We often want our own beliefs and our own views of the world to trump what's actually happening. But it doesn't matter if you believe it or not. Reality is reality. And the sooner you come to terms with that, the better.

Patrick Gallagher: Wow.

Melissa Binde: Oh, that and "Don't be a good example. Be a horrible warning!"

Would probably be the other one.

Patrick Gallagher: That's great! That one's inspiring. I love that.

Melissa, thank you so much for spending the time with us and for an incredible conversation. With everything that we covered, you are an incredible storyteller.

So this was an absolute joy to be a part of.

Melissa Binde: I'd love to do this again. This was great. Thank you. And this is my happy place: talking to smart people.
