Pawel Brodzinski on Leadership in Technology

Whatever it takes to lead a team, build a product, and run a business

Tag: flow

Limit Work in Progress Early

We’ve just started another project. One of things we’ve set up at the very beginning was a Kanban board. It wouldn’t be a real Kanban board if we haven’t had work in progress limits. There are two common approaches I see out there in terms of setting WIP limits.

One is to work for some time without WIP limits just to see how things go, get some metrics and get a decent idea what work in progress limits are suitable in the context.

The other approach is to start with pretty much anything that makes sense, like one task per person plus one additional to have some flexibility.

Personally, I like the latter approach. First, no matter which approach you choose your WIP limits are bound to change. There’s no freaking chance that you get all the limits right in the first approach. And even if you did the situation would change and so should the WIP limits.

Second, you can get much value thanks to limiting work in progress early even if the limits are set in a quick and dirty mode. In fact, this is exactly why I’m biased toward this approach.

In our case one last stage before “done” has been approval. We’ve had an internal client (me), thus we haven’t expected any issues with this part. We weren’t far into the project when first tasks started stacking up as ready to approval.

As I initiated the discussion about us hitting the limit in testing (the stage just before approval) I was quickly rebutted “maybe you, dear client, do your goddamn job and approve the work that is already done, so we can continue with our stuff.” Ouch. That hurt. Yet I couldn’t ask for a better reaction.

Lucky me, I asked about staging environment where I could verify all the stuff that we’ve built. And you know what? There was none. Somehow we just forgot about that. So I eagerly attached blockers to all the tasks that were waiting for me and the team could focus on setting up the staging environment instead of building more stuff.

An interesting twist in this story is that setting up the staging has proven to be way trickier than we thought and we found out a couple of flaws in the way we managed our demo servers. The improvements we’ve done in our infrastructure go way beyond the scope of the project.

There are two lessons here. One is that implementing WIP limits is a great knowledge discovery tool that works even if the limits aren’t yet right. Well-tuned WIP limits are awesome as they enable slack time generation, but even if you aren’t there yet, any reasonable limits should help you to discover any major problems with your process.

There’s another thing too. It’s about task sizing. In general the smaller the tasks are the more liquid the flow is and higher liquidity means that you discover problems faster. If it took a couple of weeks to complete first tasks we’d find out problem only after that time. With small tasks it took a couple of days.

So if you’re considering whether to start with limiting work in progress on a day 1 of a new project, consider this post as an encouragement to do so. Also you may want to size first few tasks small and make sure that they go throughout the whole process so you quickly have a test run. Treat it as a way of testing completeness of the process.

August 21, 2013
(Sub)Optimizing Cycle Time

There is one thing we take almost for granted whenever analyzing how the work is done. It is Little’s Law. It says that:

Average Cycle Time = Work in Progress / Throughput

This simple formula tells us a lot about ways of optimizing work. And yes, there are a few approaches to achieve this. Obviously, there is more than the standard way, used so commonly, which is attacking throughput.

A funny thing is that, even though it is a perfectly viable strategy to optimize work, the approach to improve throughput is often very, very naive and boils down just to throwing more people into a project. Most of the time it is plain stupid as we know from Brook’s Law that:

Adding manpower to a late software project makes it later.

By the way, reading Mythical Man-Month (the title essay) should be a prerequisite to get any project management-related job. Seriously.

Anyway, these days, when we aim to optimize work, we often focus either on limiting WIP or reducing average cycle time. They both have a positive impact on the team’s results. Especially cycle time often looks appealing. After all, the faster we deliver the better, right?

Um, not always.

It all depends on how the work is done. One realization I had when I was cooking for the whole company was that I was consciously hurting my cycle time to deliver pizzas faster. Let me explain. The interesting part of the baking process looked like this:

Considering that I’ve had enough ready-to-bake pizzas the first setp was putting a pizza into the oven, then it was baked, then I was pulling it out from the oven and serving. Considering that it was almost a standardized process we can assume standard times needed for each stage: half a minute for stuffing the oven with a pizza, 10 minutes of baking and a minute to serve the pizza.

I was the only cook, but I wasn’t actively involved in the baking step, which is exactly what makes this case interesting. At the same time the oven was a bottleneck. What I ended up doing was protecting the bottleneck, meaning that I was trying to keep a pizza in the oven at all times.

My flow looked like this: putting a pizza into the oven, waiting till it’s ready, taking it out, putting another pizza into the oven and only then serving the one which was baked. Basically the decision-making point was when a pizza was baked.

One interesting thing is that a decision not to serve a pizza instantly after it was taken out of the oven also meant increasing work in progress. I pulled another pizza before making the first one done. One could say that I was another bottleneck as my activities were split between protecting the original bottleneck (the oven) and improving cycle time (serving a pizza). Anyway, that’s another story to share.

Now, let’s look at cycle times:

What we see on this picture is how many minutes elapsed since the whole thing started. You can see that each pizza was served a minute and a half after it was pulled out from the oven even though the serving part was only a minute long. It was because I was dealing with another pizza in the meantime. Average cycle time was 12 minutes.

Now, what would happen if I tried to optimize cycle time and WIP? Obviously, I would serve pizza first and only then deal with another one.

Again, the decision-making point is the same, only this time the decision is different. One thing we see already is that I can keep a lower WIP, as I get rid of the first pizza before pulling another one in. Would it be better? In fact, cycle times improve.

This time, average cycle time is 11.5 minutes. Not a surprise since I got rid of a delay connected to dealing with the other pizza. So basically I improved WIP and average cycle time. Would it be better this way?

No, not at all.

In this very situation I’ve had a queue of people waiting to be fed. In other words the metric which was more interesting for me was lead time, not cycle time. I wanted to optimize people waiting time, so the time spent from order to delivery (lead time) and not simply processing time (cycle time). Let’s have one more look at the numbers. This time with lead time added.

This is the scenario with protecting the bottleneck and worse cycle times.

And this is one with optimized cycle times and lower WIP.

In both cases lead time is counted as time elapsed from first second, so naturally with each consecutive pizza lead times are worse over time. Anyway, in the first case after four pizzas we have better average lead time (27.75 versus 28.75 minutes). This also means that I was able to deliver all these pizzas 2.5 minutes faster, so throughput of the system was also better. All that with worse cycle times and bigger WIP.

An interesting observation is that average lead time wasn’t better from the very beginning. It became so only after the third pizza was delivered.

When you think about it, it is obvious. Protecting a bottleneck does make sense when you operate in continuous manner.

Anyway, am I trying to convince you that the whole thing with optimizing cycle times and reducing WIP is complete bollocks and you shouldn’t give a damn? No, I couldn’t be further from this. My point simply is that understanding how the work is done is crucial before you start messing with the process.

As a rule of thumb, you can say that lower WIP and shorter cycle times are better, but only because so many companies have so ridiculous amounts of WIP and such insanely long cycle times that it’s safe advice in the vast majority of cases.

If you are, however, in the business of making your team working efficiently, you had better start with understanding how the work is being done, as a single bottleneck can change the whole picture.

One thought I had when writing this post was whether it translates to software projects at all. But then, I’ve recalled a number of teams that should think about exactly the same scenario. There are those which have the very same people dealing with analysis (prior to development) and testing (after development) or any other similar scenario. There are those that have a Jack-of-all-trades on board and always ask what the best thing to put his hands on is. There also are teams that are using external people part-time to cover for areas they don’t specialize in both upstream and downstream. Finally, there are functional teams juggling with many endeavors, trying to figure out which task is the most important to deal with at any given moment.

So as long as I keep my stance on Kanban principles I urge you not to take any advice as a universal truth. Understand why it works and where it works and why it is (or it is not) applicable in your case.

Because, after all, shorter cycle times and lower WIP limits are better. Except then they’re not.

December 10, 2012
Kitchen Kanban, or WIP Limits, Pull, Slack and Bottlenecks Explained

Have you ever cooked for twenty people? If you have you know how different the process is when compared to preparing a dinner just for you and your spouse. A few days ago I was preparing lunch for folks in my company and I’m still amazed how naturally we use concepts of pull, WIP limits, bottlenecks and slack when we are in situations like this.

I can’t help but wonder: why the hell can’t we use the same approach when dealing with our professional work?

OK, so here I am, cooking 15 pizzas for a small crowd.

Bottlenecks

If you read Eli Goldratt’s The Goal you know that if you want to make the whole flow efficient you need to identify and protect bottlenecks. Having some experience with preparing pizzas for a few people, I easily guessed that the bottleneck would be an oven.

The more interesting part is how, knowing what is the bottleneck, we automatically start protecting it. The very next thing I was doing after taking a pizza out from the oven was putting another one in. If I decided to serve the pizza first I would be making my bottleneck resource (the oven) idle, which would affect the whole process and its length.

Interestingly enough, protecting the bottleneck in this case resulted in longer cycle time and, with the first delivery, worse lead time too. That’s the subject for another story though.

The lesson here is about dealing with bottlenecked parts of our processes. One of the recent conversations I’ve had was about bringing more developers into a project where business analysis was a bottleneck. It would be like hiring waiters to help me serving pizzas and expecting it would make the whole process faster.

It’s even worse if you don’t know what your bottleneck is. In the story with business analysis I’ve mentioned the team learned where the problem is only after some time into the project. Before that they would actually be willing to hire more waiters and would expect that would improve the situation.

WIP Limits

Fifteen pizzas and one cook. If I acted as an average software development team I would prepare dough for all pizzas, then move to a tomato sauce, then to other ingredients. Three hours later I would realize that, because of a system constraint, I can’t bake in batch. I would switch my efforts to deal with a hungry and angry crowd focusing more on dealing with their dissatisfaction than on delivering value. Fortunately, eventually I would run a retrospective where I would learn that I made a mistake with the baking part so I would file a retro summary into a place no one ever looks again and congratulate myself a heroic effort of dealing with hungry clients.

Instead I limited amount of work invested into preparing dough and ingredients. I prepared enough to keep the oven running all the time.

Well, actually I prepared more. I started with WIP limit of 6 pizzas, meaning that I had 6 ready-to-bake pizzas when the oven was ready. Very soon I realized one obvious and two more, less obvious, issues with such a big WIP limit.

First, 6 pizzas take up a lot of space. Space which was limited. Even more so, when more people popped up in the kitchen waiting for their share. This is basically a cost of inventory. Unfortunately, in the software industry we deal with code so we don’t really see how it stacks up and take up space, until it’s too late and fixing a bug becomes a dreadful task no one is willing to undertake.

If only we had to find a place to store tiny physical zeros and ones for each bit of our code… The industry would rock and roll.

The other two issues weren’t that painful. If you keep unbaked pizza too long it’s not that good as it’s a bit too dry after baking. I also realized that I could easily manage to prepare new pizzas at a pace that doesn’t require such a big queue in front of the oven. I could prepare better (fresher) product and it still wouldn’t affect the flow.

So I quickly reduced my queue of pizzas in front of the oven to 4, 3 and eventually 2. Sure, it changed how I worked, but it also made me more flexible. I didn’t need so much space and could react to special requests pretty flexibly.

Surprisingly enough, WIP limits in a kitchen seem so intuitive. It’s often more convenient to work in small batches. Such an approach helps to focus on the bottleneck. If you’re dealing with physical inventory you also virtually see the cost of excessive inventory. Unfortunately, with code it’s not that visible even though it’s a liability too.

It doesn’t mean that the whole mechanism changes dramatically. Much unfinished work increases multitasking, inflicts a cost of context switching, lengthens feedback loops. It just isn’t that visible, which is why we don’t naturally limit work in progress.

Slack Time

When we are talking about WIP limits we can’t forget about slack time. Technically I could prepare an infinite queue of ready-to-bake pizzas in front of the oven. Of course no mentally healthy cook would do this.

Anyway, when I started limiting my pizzas in progress I was facing moments when, in theory, I could have been preparing another one but didn’t. I didn’t, even when there was nothing else to be done at the moment.

A canonical example of slack time.

I used my slack time to chat with people, so I was happier (and we know that happy people do a better job). I got myself a coffee so I improved my energy level. I also used slack to rearrange the process a bit so my work was about to become more efficient. Finally, slack time was an occasion to check remaining ingredients to learn what pizzas I can still deliver.

In short I was doing anything but pushing more work to the system. It wouldn’t help anyways as I was bottlenecked by the oven and knew my pace well enough to come up with reasonable, yet safe, WIP limits which were telling me when I should start preparing the next pizza.

There are two lessons for us here. First, learn how the work is being done in your case. This knowledge is a prerequisite to do reasonable work with setting WIP limits. And yes, the best way to learn what WIP limits are reasonable in a specific case is experimenting to see what works and what allows to keep the pace of the flow.

Second, slack time doesn’t mean idle time. Most of the time it is used to do meaningful stuff, very often system improvements that result in better efficiency. When all people hear from my argument for slack time is “sometimes it’s better to sip coffee than to write code” I don’t know whether I should laugh or cry. It seems they don’t even try to understand, let alone measure, the work their teams do.

Pull

And, finally, pull principle. As we already know the critical part of the whole process was the oven, let’s start there. Whenever I took out a pizza from the oven it was a signal to pull another pizza into the oven. Doing this I was freeing one space in my queue of pizzas in front of the oven. It was a pull signal to prepare another one. To do this I pulled dough, tomato sauce and the rest of ingredients. When I ran out of any of these I pulled more of them from fridge.

Pull all over the place. Getting everything on demand. I was chopping vegetables or opening the next pack of salami only when I needed them. There were almost no leftovers.

Assuming that I could replenish almost every ingredient during the time a pizza was being baked, I was safe. I could even base on an assumption that it’s close to impossible that I run out of all the ingredients at the same time. And even then I had a buffer of ready-to-bake pizzas.

The only exception was dough as preparing dough took more time. Dough was my epic story. This part of the work was common for a bunch of pizzas all derived from the same batch of dough. Same like stories derived from an epic. In this case I was just monitoring the inventory of dough so I could start preparing the next batch soon enough. Again, there was a pull signal but it was a bit different: there are only two pieces of dough left; I should start preparing another batch so it would be ready once I run out of the current one.

The lesson about pull is that we should think about the work we do “from right to left.” We should start with work items that are closest to being done and consider how we can get them closer to completion. Then, going from there, we’ll be able to pull work throughout the whole process as with each pull we’ll be creating free space upstream.

Once we deploy something we create free space so we can pull something to acceptance testing. As a result we free some space in testing and pull features that are developed, which makes it possible to pull more work from a backlog, etc.

When using this approach along with WIP limits we don’t introduce extensive amount of work to the system and we keep our flow efficient.

Once we learn that earlier stages of work may require more time than later ones we may adjust pull signals and WIP limits accordingly so we keep the pace of the flow.

Summary

I hope the story makes it easier to understand basic concepts introduced by Kanban. Actually, I’d say that if software was physical people would understand concepts of flow, WIP limits, pull or protecting bottlenecks way easier. They would see how their undelivered code clutter their workspace, impact their pace and affects their flow of work.

So how about this: ask yourself following questions.
Where is the oven in your team?
Who is working on this part of the process?
Do you protect them?
How many ready-to-bake pizzas do you have typically?
How many of these do you really need?
What do you do when you can’t put another pizza into the oven?
What kind of space do your pizzas occupy?
Do your pizzas taste the same, no matter how long they are queued?
Do you need all the ingredients prepared up front?
How much of ingredients do you typically have prepared?
How do you know whether you need dough and when you should start preparing it?

Look at your work from this perspective and tell me whether it helps you to understand your work better. It definitely does in my case, so do expect further pizza stories in the future.

November 15, 2012