by Eric Ries, published 2008
Before he got famous for the Lean Startup book, Eric was already writing one of the most engaging and valuable blogs for small-team leaders (http://startuplessonslearned.com/).
His coverage of Toyota’s Five Whys approach isn’t the first or only writing on it, but it was the piece that made it a popular and important tool for a lot of startup leaders I know.
I’ve used it personally when things go wrong and I need to ensure that we not only fix the problem, but also identify and improve our culture, training, decision making, responsibilities and rules at other levels of the operation, to guard against future issues.
By following this framework, you’ll regularly improve your team at many levels, iterating consistently towards a more polished, mature, reliable operation. And damned if that doesn’t feel good and make you look good 🙂
Here’s how it works. Let’s say you notice that your website is down. Obviously, your first priority is to get it back up. But as soon as the crisis is past, you have the discipline to have a post-mortem in which you start asking why:
- why was the website down? The CPU utilization on all our front-end servers went to 100%
- why did the CPU usage spike? A new bit of code contained an infinite loop!
- why did that code get written? So-and-so made a mistake
- why did his mistake get checked in? He didn’t write a unit test for the feature
- why didn’t he write a unit test? He’s a new employee, and he was not properly trained in TDD
So far, this isn’t much different from the kind of analysis any competent operations team would conduct for a site outage. The next step is this: you have to commit to make a proportional investment in corrective action at every level of the analysis. So, in the example above, we’d have to take five corrective actions:
- bring the site back up
- remove the bad code
- help so-and-so understand why his code doesn’t work as written
- train so-and-so in the principles of TDD
- change the new engineer orientation to include TDD
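If it helps to see the pairing written down, here is a minimal Python sketch of the example above: each “why” carries its answer and the corrective action for that level. The `Why` class and the `corrective_actions` helper are hypothetical names made up for illustration; nothing like them appears in Eric’s article. The one rule the helper enforces is the commitment described above, that no level of the analysis is left without an action.

```python
from dataclasses import dataclass


@dataclass
class Why:
    """One level of a Five Whys analysis (hypothetical structure)."""
    question: str
    answer: str
    corrective_action: str


def corrective_actions(chain):
    """Return one corrective action per level, refusing to skip any level."""
    missing = [w.question for w in chain if not w.corrective_action.strip()]
    if missing:
        raise ValueError(f"no corrective action for: {missing}")
    return [w.corrective_action for w in chain]


# The website outage example, level by level.
outage = [
    Why("Why was the website down?",
        "CPU utilization on all front-end servers went to 100%",
        "Bring the site back up"),
    Why("Why did the CPU usage spike?",
        "A new bit of code contained an infinite loop",
        "Remove the bad code"),
    Why("Why did that code get written?",
        "So-and-so made a mistake",
        "Help so-and-so understand why the code doesn't work as written"),
    Why("Why did the mistake get checked in?",
        "No unit test was written for the feature",
        "Train so-and-so in the principles of TDD"),
    Why("Why was no unit test written?",
        "New employee, not properly trained in TDD",
        "Change the new engineer orientation to include TDD"),
]

actions = corrective_actions(outage)
for depth, (why, action) in enumerate(zip(outage, actions), start=1):
    print(f"{depth}. {why.question} -> {action}")
```

Leaving any `corrective_action` blank makes the helper raise a `ValueError`, which is a rough stand-in for the discipline of not stopping at “bring the site back up”.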
Go read the full article; it goes into a lot more depth and gives more examples:
Also check out Buffer’s post on how they use it, including more detailed process and meeting info: