by David Kirkpatrick @ FirstRound, interviewing Dave Zwieback
Dave Zwieback is the head of engineering at Next Big Sound, and has published Beyond Blame: Learning from Failure and Success. He was interviewed by FirstRound, whose blog is always high-value reading.
I’ve pulled out what I see as the most important parts below (he goes into much more detail), and I’ve noted where I lost interest in favour of more straightforward, less meeting- and time-heavy approaches.
Also check out John Allspaw, who gets credit for coining the ‘Blameless Post Mortem’ used here: https://codeascraft.com/2012/05/22/blameless-postmortems/
Identifying blame is the least valuable part of any post-mortem; it signals a failure to understand the situation, and a failure to improve after it:
“Say there’s an incident and five minutes into the postmortem, we find out what happened and who’s responsible: Bobby and Susan screwed up. That feels good because there’s an unambiguous explanation: the so-called ‘root cause’. In this case, we’ve found our ‘bad apples,’ and can deal with them punitively so that such failures will never happen again. …
The truth is that the most critical learning has been left on the table because we’ve overlooked the deeper context of the incident.
We need deeper learning, and he lays out his approach to this (there’s another one I posted today in The Five Whys). But before moving forward you need to create safety in your culture: people won’t share the details if doing so puts them at risk. You need to allow accountability without blame or punishment. You won’t get honesty without safety.
Choose reconciliation over retribution when something goes wrong. You’re less likely to lose people and lessons.
Zwieback lays out a three-step framework to replace post-mortems with “learning reviews”.
Set the context
Learning reviews can be conducted after each experiment or iteration and are designed to facilitate learning from both failures and successes. “If we only wait for death and destruction — as the macabre ‘postmortem’ implies — we are grossly limiting our opportunities to learn. Failures just don’t happen frequently enough to learn at the rate that’s needed to really thrive in technology.”
They have tenets for the context, which I love, and I want to read some of the referenced material in the article. I’ve selected a few of the best tenets here:
Failure is a normal part of the functioning of complex systems. All systems fail—it’s just a matter of time. (See How Complex Systems Fail by Richard I. Cook, MD.)
Human error is a symptom—never the cause—of trouble deeper within the system (e.g., the organization). We accept that no person wants to do a bad job, and we reject the “few bad apples” theory. We seek to understand why it made sense for people to do what they did, given the information they had at the time. (From The Field Guide to Understanding Human Error by Sidney Dekker)
While conducting the learning review, we will fall under the influence of cognitive biases. The most common ones are hindsight, outcome, and availability biases; and fundamental attribution error. We may not notice that we’re under the influence, so we request help from participants in becoming aware of biases during the review. (Read Thinking, Fast and Slow by Daniel Kahneman)
This is where I lose interest…
Construct a timeline
They construct a timeline, often by compiling different points of view:
A good timeline doesn’t just show what happened; it serves as the backbone of the conversation — a reference point — that keeps the review on track.
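As a loose illustration (not from the interview), compiling a timeline from several people’s accounts amounts to merging their timestamped notes and sorting chronologically, so everyone argues from the same ordered record. Every event, name, and timestamp below is hypothetical:

```python
from datetime import datetime

# Hypothetical per-person notes: (timestamp, observer, observation).
# These events are invented purely to show the merge; they don't
# come from the interview.
alice = [
    ("2016-03-01 14:02", "alice", "deploy of build 412 started"),
    ("2016-03-01 14:10", "alice", "error-rate alert fired"),
]
bob = [
    ("2016-03-01 14:05", "bob", "noticed slow DB queries"),
    ("2016-03-01 14:15", "bob", "rolled back to build 411"),
]

def build_timeline(*accounts):
    """Merge each participant's notes into one chronologically
    ordered timeline, keeping track of who saw what."""
    merged = [event for account in accounts for event in account]
    return sorted(merged,
                  key=lambda e: datetime.strptime(e[0], "%Y-%m-%d %H:%M"))

if __name__ == "__main__":
    for when, who, what in build_timeline(alice, bob):
        print(when, who, what)
```

The point of keeping the observer attached to each event is that the review can ask why each action made sense to that person at the time, rather than flattening everything into one omniscient narrative.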
My concern is that, at least as described in this interview, it’s a heavy, time-consuming approach compared to the Five Whys, which should arrive at similar answers.
I could be splitting hairs, and both take meeting time. But if, as they say, the approach should be used frequently, on iterations and successes as well as big failures, then it needs to stay fast and effective, and this version drifts away from that.
Closing the loop
This is straightforward, and mirrors Five Whys.
1. Determine and prioritize remediation items.
2. Publish the review writeup as broadly as possible. ( <— I’d use our internal blog for this )