Tag: measures

  • Hyperproductivity Myth

    If you are anywhere in the broadly understood context of Agile, you must have heard about being hyperproductive. Sources reporting a few hundred, or even more than a thousand, percent of productivity improvement aren’t unusual. In fact, a 200% improvement seems to be “guaranteed” by some.

    That’s great! Good for them! They’re going to get a hyperproductivity badge or something. Yay!

    What Hyperproductivity Is

    Let’s start with the basics though. What is this whole thing? When does a team become hyperproductive? How much do they have to improve? Oh, and by the way, if a super-crappy team improves threefold and a team that was already great improves only by 20%, has hyperproductivity been reached by the former, the latter, or both?

    The most common metric I hear about in the context of hyperproductivity is velocity. Actually, I consider using velocity to measure productivity to be either evil or dumb. How much should my team improve? By a factor of three? Nothing easier. Oh, and by the way, we don’t use our estimation poker card worth 1 that often anymore.
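    To see how easily the velocity yardstick bends, consider a toy sketch (the story names and point values below are made up for illustration):

      # A toy illustration of gaming velocity: the same team delivers the same
      # stories, but estimates quietly drift one size up ("we don't use the
      # 1-point card anymore"). All numbers are invented.
      honest = {"login": 2, "search": 3, "checkout": 5, "reports": 3}
      inflated = {story: points * 2 for story, points in honest.items()}

      velocity_before = sum(honest.values())    # 13 points per sprint
      velocity_after = sum(inflated.values())   # 26 points per sprint

      print(f"New velocity is {velocity_after / velocity_before:.0%} of the old one")
      # -> New velocity is 200% of the old one, with zero extra scope delivered.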

    Note: I don’t deny that teams improve. I merely point out that stating so purely on the basis of velocity improvement is naive at best. There are so many potential dysfunctions of such an approach that I don’t even know where to start. How is the scope of work split into individual tasks? What is the distribution of point estimates? How has it changed over time? What do we understand as a task in the first place? How do we account for rework?

    In other words, without understanding a specific context, mentioning hyperproductivity is meaningless. It is just a marketing fad, which it might have been in the first place.

    Efficiency As a Goal

    Even if we agreed on a reasonable proxy for measuring productivity, there’s a bigger problem ahead. We are not in the business of writing the most code, delivering the most features, or achieving the best velocity. If you think you are, go talk to your clients, but this time try to actually listen to them.

    If you spend about 5 minutes looking for sources pointing out how notorious the software industry is for not building the right stuff, you may change your mind. Is half of the stuff we build utterly useless? How about two thirds? Oh, and by the way, the rise of methods that are literally aimed at avoiding building things unless we know we’re going to need them tells a story as well.

    So yeah, focus purely on productivity and you’re going to achieve your goal:

    Processing waste more effectively is cheaper, neater, faster waste.

    Stephen Parry

    The most painful problem of the software industry is not efficiency. If it were, we’d already be in heaven, given how much easier it is to build a software app these days than it was a couple of decades ago. The problem is that we are building the wrong stuff.

    We may as well be efficient, but unless we are effective in the first place, i.e. doing the right thing, there’s no glory waiting for us.

    How We Create Value

    This brings me to the utter failure of pursuing hyperproductivity. Let’s (safely) assume that our goal is to deliver value to our clients. We do that by building stuff. Except that the value is almost never clearly defined. In almost every case software development is a knowledge discovery process.

    This has some serious consequences. If we go by this assumption, we may take all the functional specifications with a grain of salt. A specification is just a sketch of a map, and most of the time not even an accurate sketch. This also means that the amount of artifacts we produce, like code and features, is not nearly as important as figuring out where exactly the value is: which bits and pieces we should build and which should be ignored.

    It happens when we discuss features, look for solutions, research options, prototype, A/B test, and change stuff back and forth to see what works. It happens exactly when we don’t score on velocity or any other productivity metric.

    But wait, to become hyperproductive we should rather avoid that…

    That’s exactly why I don’t give a damn about hyperproductivity.

    I like to say that software development is a happiness industry. We thrive only as long as our clients are continuously happy. We don’t make them happy by delivering more stuff. We make them happy by delivering stuff that has value for them and their customers.

  • Maturity of Kanban Implementation and Kanban Kata

    One interesting bit of work happening in the Lean Kanban community is Hakan Forss’ idea of Kanban Kata. Kanban Kata is an attempt to translate the ideas of Toyota Kata to Kanban land.

    A simplified teaser of Kanban Kata: we set a general goal, a kind of perfect situation we will likely never reach. Then we set a short-term, well-defined, achievable step that brings us closer to the goal. Finally, we deliberately work to take that step, verify how it went, and decide on another step. Learn more about Kanban Kata from Hakan’s blog.

    Honestly, I was a bit skeptical about the approach. One thing that seemed very artificial to me was the advice on how we should define the short-term steps that lead us toward the ultimate goal. “Improve lead time by 10% in a month.” What kind of goal is that? Why 10%? Why in a month? How should we feel if we manage to improve it only by 8%? Should we cease further improvements when reaching the goal after a week?

    I know that these questions assume treating the goal literally and without much common sense, but you get what you measure. If you set such measurements, expect people to behave accordingly.

    I think the missing bit for me was applying some sort of relativity to Kanban Kata. Something that would address my aversion to orthodoxy. Something that would make the application context broader. I found the missing link in David Anderson’s keynote at London Lean Kanban Day.

    Interestingly enough, the missing link is my own work on the maturity of Kanban implementations. Yes, it seems I needed David to point out to me the usefulness of stuff that I did myself.

    The context of my work on the depth of Kanban implementation is that, instead of trying to use some sort of general benchmark, I simply used “where we would like to be” as a reference point for judging where we are right now. In short: I’m not going to compare any of my teams to, e.g., David Anderson’s team at Corbis. Instead, I want each team to understand where their own gaps are and work toward closing them.

    Such an approach perfectly suits setting the goal of Kanban Kata, doesn’t it?

    I mean, instead of having this artificial measure of improvements, we have an internally set end state that is the resultant of the opinions of all the team members. On one hand, this approach lets us avoid absolute assessments, which rarely, if ever, help, as they ignore the context. On the other, it helps to set meaningful goals for Kanban Kata-like improvements.

    Relativity requires a team to understand the method they are trying to apply, but I would argue that if the team doesn’t understand their tools they’re doomed anyway.

  • (Sub)Optimizing Cycle Time

    There is one thing we take almost for granted whenever analyzing how work is done: Little’s Law. It says that:

    Average Cycle Time = Work in Progress / Throughput

    This simple formula tells us a lot about ways of optimizing work. And yes, there are a few approaches to achieve this. Obviously, there is more than the standard, most commonly used way, which is attacking throughput.
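    As a quick sanity check of the formula, here is a minimal sketch with made-up numbers (the only requirement is that the units are consistent):

      # Little's Law with illustrative numbers; units must be consistent.
      work_in_progress = 12   # items currently in the system
      throughput = 3.0        # items finished per day

      avg_cycle_time = work_in_progress / throughput
      print(f"Average cycle time: {avg_cycle_time:.1f} days")  # 4.0 days

      # The same law exposes the other lever: with throughput unchanged,
      # halving WIP halves the average cycle time.
      print(f"With WIP of 6: {6 / throughput:.1f} days")       # 2.0 days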

    A funny thing is that, even though improving throughput is a perfectly viable strategy to optimize work, the approach is often very, very naive and boils down to just throwing more people at a project. Most of the time this is plain stupid, as we know from Brooks’ Law that:

    Adding manpower to a late software project makes it later.

    By the way, reading The Mythical Man-Month (the title essay) should be a prerequisite for getting any project management-related job. Seriously.

    Anyway, these days, when we aim to optimize work, we often focus either on limiting WIP or on reducing average cycle time. Both have a positive impact on a team’s results. Cycle time in particular often looks appealing. After all, the faster we deliver the better, right?

    Um, not always.

    It all depends on how the work is done. One realization I had when I was cooking for the whole company was that I was consciously hurting my cycle time to deliver pizzas faster. Let me explain. The interesting part of the baking process looked like this:

    Considering that I had enough ready-to-bake pizzas, the first step was putting a pizza into the oven; then it was baked; then I pulled it out of the oven and served it. Since it was an almost standardized process, we can assume standard times for each stage: half a minute for stuffing the oven with a pizza, 10 minutes of baking, and a minute to serve the pizza.

    I was the only cook, but I wasn’t actively involved in the baking step, which is exactly what makes this case interesting. At the same time the oven was a bottleneck. What I ended up doing was protecting the bottleneck, meaning that I was trying to keep a pizza in the oven at all times.

    My flow looked like this: putting a pizza into the oven, waiting till it was ready, taking it out, putting another pizza into the oven, and only then serving the one that was baked. Basically, the decision-making point was when a pizza was baked.

    One interesting thing is that the decision not to serve a pizza instantly after it was taken out of the oven also meant increasing work in progress: I pulled in another pizza before the first one was done. One could say that I was another bottleneck, as my activities were split between protecting the original bottleneck (the oven) and improving cycle time (serving a pizza). Anyway, that’s another story to share.

    Now, let’s look at cycle times:

    What we see in this picture is how many minutes elapsed since the whole thing started. You can see that each pizza was served a minute and a half after it was pulled out of the oven, even though the serving part was only a minute long. That was because I was dealing with another pizza in the meantime. The average cycle time was 12 minutes.

    Now, what would happen if I tried to optimize cycle time and WIP? Obviously, I would serve a pizza first and only then deal with another one.

    Again, the decision-making point is the same; only this time the decision is different. One thing we see already is that I keep WIP lower, as I get rid of the first pizza before pulling another one in. Would it be better? In fact, cycle times improve.

    This time, the average cycle time is 11.5 minutes. Not a surprise, since I got rid of the delay connected to dealing with the other pizza. So basically I improved both WIP and average cycle time. Would it be better this way?

    No, not at all.

    In this very situation I had a queue of people waiting to be fed. In other words, the metric that was more interesting for me was lead time, not cycle time. I wanted to optimize people’s waiting time, i.e. the time from order to delivery (lead time), and not simply the processing time (cycle time). Let’s have one more look at the numbers, this time with lead time added.

    This is the scenario with protecting the bottleneck and worse cycle times.

    And this is one with optimized cycle times and lower WIP.

    In both cases lead time is counted as time elapsed from the first second, so naturally lead times get worse with each consecutive pizza. Anyway, in the first case, after four pizzas, we have a better average lead time (27.75 versus 28.75 minutes). This also means that I was able to deliver all these pizzas 2.5 minutes faster, so the throughput of the system was also better. All that with worse cycle times and bigger WIP.
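    For those who prefer numbers to pictures, here is a minimal simulation sketch of both policies. It assumes a steady supply of ready-to-bake pizzas and the stage times given earlier (load 0.5, bake 10, serve 1 minute); the function and variable names are mine, not from the original post.

      # Times in minutes. Assumes an endless queue of ready-to-bake pizzas.
      LOAD, BAKE, SERVE = 0.5, 10.0, 1.0

      def protect_bottleneck(n):
          """Load the next pizza into the oven before serving the baked one."""
          served, load_start = [], 0.0
          for _ in range(n):
              baked = load_start + LOAD + BAKE
              served.append(baked + LOAD + SERVE)  # load next, then serve
              load_start = baked                   # next pizza goes in now
          return served

      def optimize_cycle_time(n):
          """Serve the baked pizza first; the oven sits idle meanwhile."""
          served, t = [], 0.0
          for _ in range(n):
              t += LOAD + BAKE + SERVE
              served.append(t)
          return served

      for name, lead_times in [("protect the oven", protect_bottleneck(4)),
                               ("optimize cycle time", optimize_cycle_time(4))]:
          avg = sum(lead_times) / len(lead_times)
          print(f"{name}: served at {lead_times}, average lead time {avg} min")
      # protect the oven: [12.0, 22.5, 33.0, 43.5], average 27.75 min
      # optimize cycle time: [11.5, 23.0, 34.5, 46.0], average 28.75 min

    The 12 versus 11.5 minute cycle times fall out of the same arithmetic: the first policy adds the extra half-minute of loading the next pizza before serving the baked one.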

    An interesting observation is that average lead time wasn’t better from the very beginning. It became so only after the third pizza was delivered.

    When you think about it, it is obvious. Protecting a bottleneck makes sense when you operate in a continuous manner.

    Anyway, am I trying to convince you that the whole thing with optimizing cycle times and reducing WIP is complete bollocks and you shouldn’t give a damn? No, nothing could be further from my point. My point is simply that understanding how the work is done is crucial before you start messing with the process.

    As a rule of thumb, you can say that lower WIP and shorter cycle times are better, but only because so many companies have such ridiculous amounts of WIP and such insanely long cycle times that it’s safe advice in the vast majority of cases.

    If you are, however, in the business of making your team work efficiently, you had better start with understanding how the work is being done, as a single bottleneck can change the whole picture.

    One thought I had when writing this post was whether it translates to software projects at all. But then I recalled a number of teams that should think about exactly the same scenario. There are those that have the very same people dealing with analysis (prior to development) and testing (after development), or any other similar setup. There are those that have a Jack-of-all-trades on board and always ask what the best thing for him to put his hands on is. There are also teams that use external people part-time to cover areas they don’t specialize in, both upstream and downstream. Finally, there are functional teams juggling many endeavors, trying to figure out which task is the most important to deal with at any given moment.

    So, in line with my stance on Kanban principles, I urge you not to take any advice as a universal truth. Understand why it works and where it works, and why it is (or is not) applicable in your case.

    Because, after all, shorter cycle times and lower WIP limits are better. Except when they’re not.

  • Refactoring: Value or Waste?

    Almost every time I talk about measuring how much time we spend on value-adding tasks, a.k.a. value, and non-value-adding stuff, a.k.a. waste, someone brings up the example of refactoring. Should it be considered value, as while we refactor we basically improve the code, or rather waste, as it’s just cleaning up the mess we introduced into the code in the first place, and the activity itself doesn’t add new value for a customer?

    It seems the question bothers others as well, as this thread comes back in Twitter discussions repeatedly. Some time ago it was launched by Al Shalloway with his quick classification of refactoring:

    The three types of refactoring are: to simplify, to fix, and to extend design.

    By the way, if you want to read a longer version, here’s the full post.

    Obviously, such an invitation to discuss value and waste couldn’t have been ignored. Stephen Parry shared an opinion:

    One is value, and two are waste. Maybe all three are waste? Not sure.

    Not a very strong one, is it? Actually, this is where I’d like to pick it up. Stephen’s conclusion defines the whole problem: “not sure.” For me, deciding whether refactoring is or is not value-adding is very contextual. Let me give you a few examples:

    1. You build your code according to TDD and the old pattern: red, green, refactor. Refactoring is basically an inherent part of your code-building effort. Can it be waste then?
    2. You change an old part of a bigger system and have little idea what is happening in the code there, as it’s not a state-of-the-art type of software. You start by refactoring the whole thing so you actually know what you’re doing while changing it. Does it add value for the client?
    3. You make a quick fix to the code and, as you go, you refactor all the parts you touch to improve them; maybe you even fix something along the way. At the same time you know you could have applied just a quick-and-dirty fix and the task would be done too. How to account for such work?
    4. Your client orders refactoring of a part of the system you work on. The functionality isn’t supposed to change at all. It’s just that the client supposes the system will be better after it, whatever that means exactly. They pay for it, so it must have some value, mustn’t it?

    As you can see, there are many layers to consider. One is when the refactoring is done – whether it’s an integral part of development or not. Another is whether it improves anything that can be perceived by a client, e.g. fixing something. Then we can ask: does the client consider it valuable for themselves? And of course the same question can be asked of the folks maintaining the software – lower maintenance cost or fewer future bugs can also be considered valuable, even when the client isn’t really aware of it.

    To make it even more interesting, there’s another piece of advice on how to account for refactoring. David Anderson points us to Donald Reinertsen:

    Donald Reinertsen would define valuable activity as discovery of new (useful) information.

    From this perspective, if I learn new, useful information during refactoring, e.g. how this darn code works, it adds value. The question is: for whom? I mean, I’ll definitely know more about this very system, but does the client get anything of any value thanks to this?

    If you are with me up to this point, you already know that there’s no clear answer that helps to decide whether refactoring should be considered value or waste. Does that mean you shouldn’t try sorting this out in your team? Well, not exactly.

    Something you definitely need if you want to measure value and waste in your team (because you do refactor, don’t you?) is clear guidance for the team: which kind of refactoring is treated in which way. In other words, it doesn’t matter whether you think that all refactoring is waste, all of it is value, or anything in between; you want the whole team to understand value and waste in the same way. Otherwise, don’t even bother measuring it, as your data will be incoherent and useless.

    This guidance is even more important because at the end of the day, as Tobias Mayer advises:

    The person responsible for doing the actual work should decide

    The problem is that sometimes the person responsible for doing the actual work can look at things quite differently than their colleagues or the rest of the team. I know people who’d see a lot of value in refactoring the whole system, a.k.a. rewriting it from scratch, only because they allegedly know better how to write the whole thing.

    The guidance that often helps me to decide is answering the question:

    Could we have got it right in the first place? If so, then fixing it now is likely waste.

    Actually, a better question might start with “should we…”, although the way of thinking is similar. Yes, I know it is very subjective and prone to individual interpretations, yet surprisingly often it helps to sort out different edge cases.

    An example: oh, our system has performance problems. Is fixing it value or waste? Well, if we knew the expected workload and failed to deliver software handling it, we screwed this one up. We could have done better and we should have done better, thus fixing it will be waste. On the other hand, the workload may have exceeded the initial plans or whatever we agreed on with the client, so, knowing what we knew back then, performance was good. In this case improving it will be value.

    By the way, using such an approach means accounting for most refactoring as waste, because most of the time we could have, and should have, done better. And this is aligned with my thinking about refactoring, value, and waste.

    Anyway, as the problem is pretty open-ended, feel invited to join the discussion.

  • Splitting Huge Tasks

    On occasion I deal with an issue small enough that it barely deserves a full-blown blog post, yet too big to pack into the 140 characters of a tweet. However, when I find myself advising on such an issue yet another time, it is a clear signal that sharing an idea for dealing with it might be useful. So, following an experimentation mindset, I’m going to try short posts addressing these kinds of issues and see how they are received.

    One pretty common problem is splitting tasks. For example, a typical task for a team takes something between 4 hours and a couple of days. And then there is this gargantuan task that takes 3 months. Actually, 3 months, 5 days, and 3 hours. It is, however, quite a coherent work item. On its merits it does make sense to treat it as a single task.

    On a side note: for whatever reason, this happens more often in non-software development teams.

    The problem is that it heavily affects any metrics you gather. Sometimes it affects them to the point where analyzing them doesn’t make much sense anymore. If you include this huge task in your metrics, they all go mad. If you don’t, you basically hide the fact that part of the team was working on something that isn’t accounted for at all. So the question is: should you accept it and move on, or do something about the task?
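    A toy calculation shows the scale of the distortion (the task durations below are invented for illustration):

      # Typical tasks take anything from half a day to three days;
      # the monster takes roughly three months (~95 working days).
      typical = [0.5, 1, 2, 1.5, 3, 1, 2.5, 2]   # days per task
      with_monster = typical + [95]

      avg_typical = sum(typical) / len(typical)
      avg_all = sum(with_monster) / len(with_monster)
      print(f"average cycle time without the monster: {avg_typical:.1f} days")
      print(f"average cycle time with it: {avg_all:.1f} days")
      # 1.7 days versus 12.1 days: one task inflates the average sevenfold,
      # yet dropping it hides real work from the data entirely.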

    I’m not orthodox about it, but I’d rather split the task into smaller ones. Usually this is the point when new issues arise – for example, the task can be split, but into pieces so small that measuring them separately adds way too much hassle. An alternative is grouping these tiny pieces into batches of a size that makes sense for the team.

    Anyway, I’d still go with splitting the task, even if the division is artificial to some degree. The knowledge you gain from the metrics is worth the effort.

    In short: when in doubt – split.

  • Ball Flow Game

    If you visit Software Project Management on occasion, you likely know that 100% utilization is a myth. A nice and simple experiment that shows the impact of full utilization on effectiveness, and at the same time presents the value of WIP limits, is the ball flow game.

    The rules are simple:

    • You get a group of people to process 20 balls.
    • Processing is just throwing a ball from one person to another.
    • The person who starts throwing the balls in is also the last person to touch each ball.
    • The ball should have at least minimal air time when changing its owner, i.e. it should be thrown, not passed.
    • Everyone in the group should touch each ball.
    • The ball shouldn’t be thrown to either of the two closest persons (the one on the left and the one on the right).

    The rest is pretty much up to the team’s self-organization.

    The goal of the team is to process 20 balls as fast as possible.

    The team can arrange themselves in a way that is convenient for them, which most of the time means standing in a circle. Then they set up the sequence: who throws to whom. And then the fun begins.

    The following data comes from a game I ran with a group of 10 people.

    The first approach was no WIP limits at all, meaning that balls were thrown in as soon as the first person was idle. With this approach it’s not even the data that is most interesting, but the look of what’s happening. Balls are flying all over the place. People barely cope with coordinating passing the ball to the next person while coming back to the previous one to receive the next ball. Pretty often balls are dropped and left forgotten as new balls are already waiting. It’s all chaos.

    And the clock is ticking.

    Simply by looking at the situation you may safely guess the organization is suboptimal. However, it is how many teams still work these days. We decided to use it as a reference point.

    Cycle time for each processed ball looked like this.

    One thing is that it could take as much as 32 seconds to process a ball. Almost 3 seconds per person for something as simple as passing a ball. Another thing is that the variability of cycle times was high – anything between 13 and 32 seconds meant that the worst case was 2.5 times longer than the best case. This left us in a place where we were hardly predictable in terms of the time needed to process the next ball.

    One quick look at the Cumulative Flow Diagram (measurements were taken every 10 seconds) shows one of the typical problems I see in teams: as the project goes on, cycle times become worse (the green part of the diagram becomes wider).
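    If you’ve never assembled a CFD yourself, the mechanics are trivial; here is a sketch with invented per-ball timestamps (the original data isn’t reproduced here). At each sample you count items in each state and stack the counts.

      samples = range(0, 90, 10)   # seconds, one sample every 10s
      # Illustrative per-ball timestamps: (first touch, last touch), seconds.
      balls = [(0, 13), (2, 18), (5, 24), (9, 27), (12, 32)]

      for t in samples:
          done = sum(1 for _, end in balls if end <= t)
          todo = sum(1 for start, _ in balls if start > t)
          in_progress = len(balls) - todo - done
          print(f"t={t:2d}s  to-do={todo}  in progress={in_progress}  done={done}")

    Plot the three counts stacked over time and you get the familiar bands; a widening middle band is exactly the worsening cycle time described above.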

    Processing time of all the balls was 83 seconds.

    In the second round the team decided to limit work in progress. Having no idea what WIP limit they should go with, they decided to try a WIP limit of 5 balls for the team of 10 people. Considering that processing time was very short – no one was expected to do anything special with the balls – the crucial thing was handoffs. In the ideal case each handoff requires 2 people, one passing the ball and another receiving it, thus the limit of 5. It also meant that, during processing, one of every two persons would be idle.

    First, the whole task was done in 63 seconds. Almost 25% improvement.

    Second, the way the group worked looked way better. Little chaos, no dropped balls, no collisions in flight, etc.

    Third, cycle time went down and became more predictable. This time it was anything between 9 and 15 seconds. Yay! We shortened our time to market.

    The Cumulative Flow Diagram also looked better. Steeper curves mean better throughput, and the width of the green part (cycle times) is kept under control.

    With such good results, a natural consequence is a discussion about the optimum. We know that a WIP limit of 5 is better than infinity, but should we test a WIP limit of 4 or rather 6? The group decided to go with a WIP limit of 4 in round 3, and the results were interesting.

    The end-to-end time was 61 seconds. Basically no change at all, as I could plausibly attribute the 2-second difference to fluency in throwing balls.

    Was it simply the same as in round 2 then? Pretty much the opposite.

    The most interesting thing was what happened with cycle times.

    Only one ball was processed faster than in 9 seconds, which was the best result in round 2. However, this time the variability of cycle times was hugely reduced (8 to 10 seconds). The team became highly predictable.

    Considering that we were processing identical tasks, this was something we should have expected, but it didn’t happen until we introduced a strict WIP limit. By the way, this predictability is neatly shown in the CFD, which now looks very stable.

    There’s one more thing hidden here too. With a stricter WIP limit we introduced more slack time. This time, even in the ideal situation when every ball is being passed, we still have two people idle. Yet the end effect is still the same. The difference is that this additional slack time can be invested in improving the process or automating part of it, so that eventually the team becomes even more effective.

    In short: considering that we get the same end-to-end team performance, a stricter WIP limit is better than a looser one, as it sets us on a better path toward improvement.

    A natural next step would probably be trying a WIP limit of 3. However, having the chance to play in a controlled environment, the team experimented with a limit of 2 to see what would happen.

    Based on the results so far, the outcome of round 4 was somewhat predictable. The best cycle times went down even more, with a top result of 6 seconds.

    However, it came at the cost of bigger variability, as the worst cycle time remained the same (10 seconds). We drove predictability down.

    The Cumulative Flow Diagram, again, looked neat – nothing to worry about.

    From the CFD you can guess the key statistic here: overall processing time went up to 85 seconds. A tiny bit worse than with no limits at all.

    However, again, comparing only rounds 1 and 4, I believe that the one with WIP limits is the way to go. Considering that the whole task was completed in the same time, we had a lot of slack time that could be invested in improvements; there was less pressure and significantly less chaos. In other words: short-term results are similar, long-term ones should be better with WIP limits.

    Now, why am I telling you all this? First, to show you the mechanism. You should be doing exactly the same thing with your real Kanban implementation: tweaking WIP limits and measuring the outcome of these changes to find a local optimum.

    Second, there’s an underlying assumption made here, one that is super-important. You need to measure how you’re doing; otherwise you won’t be able to tell whether, after the changes, you are doing any better than before. If you don’t have meaningful measures already in use, then start with that before you play with your WIP limits.
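    The measuring itself doesn’t need to be fancy. Here is a sketch of the kind of per-round summary that suffices (the cycle times below are illustrative, not the full data from the game):

      from statistics import mean

      # Per-ball cycle times in seconds, recorded for each round.
      rounds = {
          "no WIP limit": [13, 18, 24, 27, 32],
          "WIP limit 5":  [9, 11, 12, 14, 15],
          "WIP limit 4":  [8, 9, 9, 10, 10],
      }

      for label, times in rounds.items():
          spread = max(times) / min(times)
          print(f"{label}: avg {mean(times):.1f}s, "
                f"range {min(times)}-{max(times)}s, spread x{spread:.1f}")

    Comparing average, range, and spread across rounds is exactly the judgment made above: round 3 wins not on raw speed but on predictability.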

    And now that you asked, no, I don’t consider your gut feeling “a measure.”

    Big thanks to Karl Scotland, from whom I learned the idea. If you want to play the game, I strongly recommend using the spreadsheets Karl kindly shared.