Optimizing Software Engineering Pipelines

Applying "The Goal" to the business of software

The Goal

I recently finished “The Goal” by Eliyahu Goldratt. For a business book, it’s extremely entertaining. “The Goal” is written like a novel, with some business nuggets awkwardly dropped in. If you can get past the corny setup questions that exist only so a business definition can be explained, the book is an excellent read. The audiobook is also very well done.

“The Goal” is about maximizing profits for a factory. The factory has raw materials and needs to use its internal pipeline to generate revenue. Along the way are lots of lessons about bottlenecks, inventory, and efficiency. While interesting, it doesn’t immediately translate to software engineering or startups in general. After all, startups aren’t always about maximizing profits. Oftentimes, startups are actually losing money, on purpose. There also isn’t really any equipment to manage, since software development is basically all humans converting caffeine into code. Still, “The Goal” can teach us a lot. We just need to do some translation. For an established company, we can keep “profit” as the main goal. For a startup, we might pick a different metric: active users. For the rest of our thought experiment, let us assume active users are our main goal, and what we are actually doing is converting capital investments into active users.

In “The Goal”, one of the first lessons is about dependencies in your pipeline. The book uses a story about boy scouts walking in a single-file line to explain The Accordion Effect. But I find traffic to be much more relatable:

What is actually happening in this shockwave traffic jam? One car has to slow down for whatever reason, and every subsequent car must slow down too. Each car can only speed up as fast as the car in front of it. It’s very difficult to return to the evenly spaced formation the cars started in, because every car is limited to the speed of the car in front of it, and no car perfectly matches the acceleration of the car ahead. If you are the first car behind the car that slowed down, you are only slowed down a little. The second car, however, is slowed down by both the slow car and the first car behind it, which increases its slowdown time. The third car gets the compounded effects from the slow car, the first car, and the second car. Every car thereafter sees a compounding effect until there is a literal standstill.

So what are some solutions to the accordion effect? “The Goal” solves this problem by putting the slowest cars in front. At first, that seems counterintuitive. In traffic, putting the slowest cars in front would make everyone go slower. However, in traffic, every car has its own destination, and so every car needs to optimize for its own speed. But what about a software development pipeline? We are all on a team, and we aren’t done until the slowest vehicle gets to its destination. If we put the sports cars in front of the semi-trucks, the sports cars will get to their destination the fastest, for sure. But sports cars are not perfect. They will make mistakes that inevitably cause slowdowns for the semi-trucks. The sports cars will recover quickly and zip off, but the semi-trucks at the back will be fighting the accordion effect. Remember, we are only done when every vehicle gets where it needs to go. We have created a system where the slowest vehicles, which have the worst acceleration and reaction times, have to deal with the most shockwave effects, and they make the accordion effect worse for every vehicle behind them. Instead, we should put the semi-trucks in front and let the sports cars handle the accordion effect. Sports cars are much more suited for the job. In summary: by putting the semi-trucks in front, we minimize the amount of acceleration and braking they have to do. By putting the fastest cars in the back, we allow them to do what they do best: tailgate the crap out of the car in front of them. Ultimately a sports car will be the last to arrive, but the whole team will get there much faster.
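To make the ordering argument concrete, here is a toy model of my own, not something from the book. It assumes every vehicle causes exactly one slowdown, and that each vehicle behind it pays its own "recovery time" to get back up to speed. The recovery numbers are made up purely for illustration.

```python
def total_recovery_cost(recovery_times):
    """Toy model: every vehicle causes exactly one slowdown, and each vehicle
    behind it pays its own recovery time to get back up to speed."""
    cost = 0
    for position, recovery in enumerate(recovery_times):
        # A vehicle at `position` has `position` vehicles ahead of it, so it
        # has to recover from `position` slowdowns.
        cost += position * recovery
    return cost


# Hypothetical recovery times (arbitrary units): bigger means slower to re-accelerate.
SEMI, BUS, PICKUP, SPORTS = 5, 4, 3, 1

trucks_in_front = [SEMI, BUS, PICKUP, SPORTS]
sports_in_front = [SPORTS, PICKUP, BUS, SEMI]

print(total_recovery_cost(trucks_in_front))  # 13
print(total_recovery_cost(sports_in_front))  # 26
```

Under these assumptions, putting the slow-to-recover vehicles in front halves the total time spent recovering. The exact ratio depends entirely on the invented numbers, but the direction does not: the vehicle at the back absorbs the most slowdowns, so that is where you want the cheapest recovery.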

Translation to software

First, we need to acknowledge some facts about software development:

  1. Once a feature is in production, even if most users don’t like it, it’s very hard to remove.

  2. Features do have some overhead in terms of maintenance and integration with the rest of the system. So, every feature you implement will slow the whole pipeline down over time.

  3. Not all features lead to growth or active users, but shipping features is usually the best way to get more users.

Imagine that the cars are actually people working on features, and the feature isn’t done until all the cars are at their destination! Sorry, I mean the feature isn’t done until everyone has done their work on the feature! Maybe there are some benefits we can get here. Let’s use a typical pipeline for getting a feature out as our example. I know, some of this is parallelizable, but for this example let’s assume there are direct dependencies at each step.

  1. Product - research and develop the idea for a feature

  2. Design - digest the research into something tangible, that backend and frontend developers can make into reality.

  3. Backend Developers - Do whatever hidden API, database, etc… work needs to be done so that a frontend developer can make a UI for end-users.

  4. Frontend Developers - Take the designs, create a GUI, and hook the GUI up to the system the backend developers created.

  5. QA - Check that everything works well before it’s released to customers.

How can we apply what we learned about traffic? It doesn’t seem like we can re-arrange our pipeline at all. How can we put semi-trucks in the front of the line?

Optimizing the pipeline

There are several strategies we can use to optimize our pipeline, but let’s start with the most obvious, and probably most difficult. Transform your organization to match our traffic pattern:

  1. Product - Our semi-trucks. Product should be the slowest part of the process. If your product team is churning out ideas faster than the rest of the pipeline can keep up, they are moving too fast, or the rest of your pipeline is moving too slow.

  2. Design - Our buses. Design should be right on the heels of Product, but it is still the second slowest part of your pipeline. You want just enough design capacity to turn everything the product team comes out with into a design ASAP. Remember the accordion effect: if Design gets behind, everyone else will feel the compounded effect.

  3. Backend developers - Our pickup trucks. They need to be faster than the design stage so that they can hopefully make up for any slowdowns caused by design.

  4. Frontend developers - Our sports cars. They should be starving for everything backend declares to be done. At this stage, it might actually be OK to have people sitting on their hands, working on bugs, or whatever. Toward the end of the line is where you want extra capacity in order to make up for inefficiencies caused by our semi-trucks, buses, and pickup trucks.

  5. QA - Greased Lightning. QA needs to be blazing fast. If a feature is ready to go, but QA doesn’t have time to check how well it works, we have done a terrible job. We have a feature that customers want, sitting on the shelf waiting for someone to get to it. Think of all the time invested by the whole pipeline. Remember, we want growth! The faster this gets out the door, the sooner we get growth. If QA is not greased lightning, or is actually the semi-truck, we have got problems.

Let’s take a moment to examine the effects of a perfect pipeline like this. The stars align, and we have this great pipeline. Let’s follow a single feature through the flow. We should also keep in mind that features are done in parallel, so as soon as a team is done with one feature, they can move on to another feature.

  1. Day 0: Given that features ultimately have a cost, and we have a perfect pipeline, we can afford to have product people spend a lot of time, making sure the features we introduce are the high impact, glorious features that are sure to increase our growth. Product spends 10 days on research.

  2. Day 10: Design is faster than Product, and had finished their last feature a few days ago. As soon as the product team finishes a feature, the design team pounces on it and creates a great design while Product starts its slow, meticulous grind on the next feature. They are able to create a beautiful design in 5 days.

  3. Day 15: Backend is even faster than Design, so as soon as the design team finishes with the feature, the backend team is all but prying it from the design team’s hands. They had long since finished the last feature, and start in on this feature immediately. They finish in 3 days.

  4. Day 18: Backend finishes their work, and the frontend team was so hungry to start that they have been doing prep work, fixing bugs, and cleaning up their codebase while waiting for the backend team to finish. Because the frontend team had extra time to prep and see what’s coming, they finish in record time: 2 days.

  5. Day 20: QA is mostly automated and lives up to its team name, “greased lightning”. QA is done in less than a day.

  6. Day 20: Feature is released

If you can create a high-quality, high-impact feature every 20 business days, I’d like to invest in your startup. In reality, that timeline is clearly very exaggerated, and the product team should more likely be spending a few weeks, instead of 10 days. It’s good enough for an example though.
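The arithmetic behind that timeline is just a running total, because in the perfect pipeline no stage ever waits for a busy team. A minimal sketch, using the durations from the walkthrough and rounding QA's "less than a day" down to 0 (my simplification):

```python
from itertools import accumulate

# (stage, working days) from the walkthrough; QA's "less than a day" is rounded to 0.
stages = [("Product", 10), ("Design", 5), ("Backend", 3), ("Frontend", 2), ("QA", 0)]

# Each stage starts the moment the previous one hands off, so the handoff day
# is simply the running total of the durations.
handoff_days = list(accumulate(days for _, days in stages))

for (name, _), day in zip(stages, handoff_days):
    print(f"Day {day}: {name} done")
# Day 10: Product done
# Day 15: Design done
# Day 18: Backend done
# Day 20: Frontend done
# Day 20: QA done
```

The running total only works because every downstream team is idle when the work arrives. As soon as a team is still busy with the previous feature, waiting time has to be added on top.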

We should examine a single flow of a bad pipeline to contrast, where the product team is greased lightning, and QA is a semi-truck. Hopefully, I have already convinced you this is a bad idea, but many organizations do operate this way. Let me know if this feels familiar: @ericwooley, or leave a comment.

  1. Day 0: The product team doesn’t think about the cost of maintaining features, and takes more of a spaghetti-on-the-wall approach, churning out lots of mediocre, so-so ideas. They spend 2 days deciding if a feature is a good idea or not.

  2. Day 2: Design can’t start on the product design for 2 days, because they are still working on the last design. Finally, Design takes the feature and slaps a design together pretty quickly. Design often doesn’t take enough time to really do a great job, because in a day or 2, the product team will have a new thing for the design team to do, and they need to keep up! The design is mediocre but passable. With the 2 days of delays, they finish this design after 6 days.

  3. Day 8: The design team finishes with the feature, and so far, the feature is only 2 days behind due to delays from the previous feature. The backend team is also working on the previous feature since the design team got the design to the backend team late, and in this pipeline, the design team is faster than the backend team. With the delay of 4 days because of the previous feature and an additional 6 days to work on this feature, the backend team finishes in 10 days.

  4. Day 18: The frontend team is overwhelmed with features. Not only has the frontend team not finished the last ticket from the backend, but they are also just finishing the feature before that. It takes them 6 days to catch up on the earlier work before they can start on this feature, which takes an additional 6 days. The feature is stuck at this stage for 12 days total.

  5. Day 30: QA, the semi-truck of our pipeline, has been backed up for a long time now. This product has tons of mediocre features that lead to a lot of quality issues. They are behind, QAing the two features that frontend delivered late, and ultimately it takes about 2 weeks before this latest feature is cleared for release.

  6. Day 44: The feature is released

Ooof, more than double the time to get a crappy feature out. Yes, I know, clearly a contrived example, but the effect is clear. You want your slowest teams at the front of the queue. The second pipeline will lead to technical debt and burnout. This is what people are talking about when they refer to a “Feature Factory”. Endless features to implement, never enough time to clean up or breathe. Ultimately, your work will stop feeling like it’s high impact and start feeling pointless, because it probably is a waste of time. The goal was to throw spaghetti at the wall, right? Mission accomplished.
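The bigger problem with the second pipeline is that the delay keeps growing with every feature. Here is a rough sketch of that, assuming each team works on one feature at a time, in order, and can't start until the previous team hands off. The working days for the bad pipeline are loosely taken from the walkthrough. QA's 7 days is a guess on my part, since the walkthrough only gives QA's total delay.

```python
def lead_times(stage_days, feature_count):
    """Days from Product starting a feature until its release, per feature.

    Assumption: each stage handles one feature at a time, in order, and cannot
    start until the previous stage hands the feature over.
    """
    free_on = [0] * len(stage_days)  # day each stage next becomes idle
    result = []
    for _ in range(feature_count):
        started_on = free_on[0]  # Product starts as soon as it is idle
        ready_on = started_on
        for stage, days in enumerate(stage_days):
            # Wait for the handoff AND for the team to be free.
            begin = max(ready_on, free_on[stage])
            ready_on = begin + days
            free_on[stage] = ready_on
        result.append(ready_on - started_on)
    return result


# Order: Product, Design, Backend, Frontend, QA.
slow_in_front = [10, 5, 3, 2, 0]  # the "perfect" pipeline; QA rounded down to 0
slow_in_back = [2, 4, 6, 6, 7]    # hypothetical working days; QA's 7 is a guess

print(lead_times(slow_in_front, 4))  # [20, 20, 20, 20]
print(lead_times(slow_in_back, 4))   # [25, 30, 35, 40]
```

With the slow team in front, every feature takes the same 20 days. With the slow team in back, each new feature waits longer than the one before it, because Product starts features faster than QA can release them.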

You probably can’t create the perfect pipeline

Unfortunately, humans are messy, imperfect, variable, and unpredictable. Creating that perfect pipeline is hard. Maybe someone quit, someone is sick, there is a shortage of talent, or upper management just philosophically disagrees. Either way, your pipeline is wack, and you can’t just rearrange it. Well, at least we can mitigate the damage. The first thing you need to do is model out your pipeline. The columns in your ticket board are probably a good start. You need a clear picture of what’s happening before you can optimize. Now you can identify the order of your cars. Find your semi-trucks. Now you can start mitigating the damage:

  1. You can’t move the semi-trucks to the front of the line for whatever reason, but you can optimize the cars around them. Make sure that those semi-trucks get slowed down as little as possible. In our example pipeline, perhaps we identify backend development as our semi-truck. What we can do is make sure that nothing slows them down. Design should always have something ready for backend development.

  2. Make sure backend development isn’t working on unnecessary things. Maybe your backend team is also creating e2e tests, analytics, or any other tangential thing. Move that work to a team with more capacity. Perhaps you can have the frontend look at the designs for any issues before it gets to the backend, to make sure the backend doesn’t work on anything that might change.

  3. Be on the lookout for ways to soup up your semi-truck. Maybe hire more people if that makes sense. Maybe you can hire juniors, or frontend developers that your current team can get to do some of the easier work. Try to automate some of their work away. Remove red tape, etc… If you try, maybe you can convert your semi-truck into Optimus Prime. Think outside the box!
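Finding your semi-truck can start as a very small script. This sketch assumes you can export, for each ticket, how many days it sat in each column of your board. The column names and numbers below are invented for illustration.

```python
from statistics import mean

# Hypothetical export from a ticket board: days each ticket spent in each column.
days_in_column = {
    "Product": [2, 3, 2],
    "Design": [4, 5, 4],
    "Backend": [9, 12, 10],
    "Frontend": [5, 6, 4],
    "QA": [1, 1, 2],
}


def find_semi_truck(days_by_column):
    """Return the column with the highest average days per ticket, plus all averages."""
    averages = {column: mean(days) for column, days in days_by_column.items()}
    slowest = max(averages, key=averages.get)
    return slowest, averages


semi_truck, averages = find_semi_truck(days_in_column)
print(semi_truck)  # Backend
```

Keep in mind that time in a column includes time spent waiting, not just working, so a big number can also mean a team is being starved or interrupted. Treat the result as a place to start asking questions, not a verdict.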

Declaring Bankruptcy

Maybe you were having a good day, started reading this, and realized you have been destroying your velocity for a long time and your team moves at a snail’s pace. Your pipeline is backward and thinking about how much work it will be to reverse the damage is like thinking about how much work it would take to move a mountain.

Just declare bankruptcy. Start over. Back to square 1. I know, I know, never rewrite blah blah blah. There are times when you should rewrite, and this might be one of them. This could be its own blog post, but DHH already did a much better talk on this subject than I ever could. I’d highly recommend watching it before you decide that rewriting is a bad idea based on an old Jedi philosophy or an anecdote about Netscape.

If you do rewrite, development speed should be your number 1 goal. It’s much easier to make fast development teams if the code base is built for speed, and QA is automated from the beginning.

Bonus Lessons from “The Goal”

Throughout the story, the protagonist is having all kinds of issues with his wife. He is obsessing about work and starts to lose his family. At first, I thought it was corny, but by the end of the book, I really appreciated the author for adding that storyline. It is important to look at the big picture, not of your business, but of your life.

On your deathbed, will you look back on your obsessions with efficiency and your business as time well spent? I think I’ll look back on all the great times I had and wish I had more. Doing well in business is just a means to an end to having a better life. Don’t trade in your family or your happiness for an endless amount of work, which may not even contribute to your well-being.