Are estimations overrated?
When I started working (not in IT), productivity was often measured in man-hours, as in the classic example of one person taking two hours to finish a task (two man-hours). Years later, when I moved into IT support, productivity was measured by work orders (WO) per person. This raised the classic quality-versus-quantity dilemma. To balance it, other metrics such as SLA (Service Level Agreement) and NPS (Net Promoter Score) were added.
Later, I entered the fascinating world of software development and Agile. At my company, we applied Agile principles beyond IT—spreading them to marketing, HR, finance, and even support. We started with basics like sprints, daily standups, reviews, retrospectives, and planning meetings.
During the planning sessions, two questions always came up:
1. What activities become “user stories”?
2. How do we know how many user stories we can fit into a sprint?
For the first, we used the INVEST criteria, initially focusing on "Small". For the second, we relied on trial and error, since our early sprints had no metrics.
Initially, without metrics or data from previous sprints, it was hard to estimate realistically how many items a sprint could hold, and that is how our conversation about estimates began. As the teams matured in the basics of agility, we kept introducing small changes into their daily routines so that each one could be tested (kaizen-style continuous improvement rather than a kaikaku, or radical, transformation).
To improve our estimates we tried T-shirt sizes (S, M, L). I remember that one of the teams that struggled most with estimation built the classic conversion table: a small activity takes at most 4 hours, a medium one takes 5 to 12 hours, and so on. To break this link between size and working hours, we replaced T-shirt sizes with story points.
The first mention of story points comes from Ron Jeffries in 1999, in XP (eXtreme Programming). He had originally suggested estimating the time needed to complete a user story in "ideal days", but since this caused a lot of confusion for stakeholders, he proposed the idea of "points" instead. The concept became widely used in Scrum, even though it is not part of the Scrum Guide.
To top it all off, there is the concept of "velocity": the number of story points completed in a given period, usually a sprint. Implicitly, velocity suggests that "the more story points we deliver, the better". However, when we focus on velocity, we tend to forget what really matters: are we delivering value, or just rushing to deliver as many story points as possible? In the end, velocity becomes just a way to pressure a team to deliver more. Even if you never use it to compare different teams, comparing a team with its own previous sprints already implies that it must deliver more.
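To make the definition concrete, here is a minimal Python sketch that computes velocity per sprint from a list of stories. The `Story` record, its fields, and the sample numbers are hypothetical, chosen only to show the arithmetic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Story:
    # Hypothetical record: the sprint the story belonged to, its estimate,
    # and whether it was finished by the end of that sprint.
    sprint: int
    points: int
    done: bool


def velocity_by_sprint(stories: list[Story]) -> dict[int, int]:
    """Sum the story points of completed stories, grouped by sprint."""
    totals: dict[int, int] = defaultdict(int)
    for story in stories:
        if story.done:  # unfinished work does not count toward velocity
            totals[story.sprint] += story.points
    return dict(totals)


stories = [
    Story(sprint=1, points=5, done=True),
    Story(sprint=1, points=8, done=False),
    Story(sprint=1, points=3, done=True),
    Story(sprint=2, points=8, done=True),
    Story(sprint=2, points=5, done=True),
]
print(velocity_by_sprint(stories))  # {1: 8, 2: 13}
```

Note that the result says nothing about the value delivered: a 13-point sprint is not necessarily better than an 8-point one, which is exactly the trap described above.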
Now imagine that we are estimating in hours. A sprint lasts 14 days (counting weekends), so you have a limited number of hours to dedicate to work, and no one expects you to deliver 10 hours of work in 8 hours. But in a team that works with story points, the sky is the limit. Questions start to arise, such as "Why doesn't the team challenge itself to deliver more?" or "Why isn't the team working hard enough?" In this model, velocity is expected to behave like the universe: always expanding.
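The fixed ceiling of an hour-based plan can be made explicit with a little arithmetic. This minimal Python sketch assumes a five-day work week, 8 working hours per day, and a hypothetical team of five; none of these figures come from the original text.

```python
from datetime import date, timedelta


def working_days(start: date, sprint_days: int = 14) -> int:
    """Count weekdays (Monday to Friday) in a sprint of `sprint_days` calendar days."""
    return sum(
        1
        for offset in range(sprint_days)
        if (start + timedelta(days=offset)).weekday() < 5  # 0-4 are Mon-Fri
    )


def hour_capacity(start: date, people: int, hours_per_day: float = 8.0) -> float:
    """Upper bound on the work hours a team can put into one sprint."""
    return working_days(start) * people * hours_per_day


sprint_start = date(2024, 1, 1)  # hypothetical start date (a Monday)
print(working_days(sprint_start))             # 10
print(hour_capacity(sprint_start, people=5))  # 400.0
```

Any 14-day window contains exactly 10 weekdays, so the start date does not change the result. Whatever the exact inputs, the total is fixed: no amount of "challenging the team" turns 400 available hours into 500. Story points have no such built-in ceiling.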
When we work with story points, estimates, and velocity, we are essentially trying to predict the future. No one knows exactly how long an activity will take; we may have a rough idea, but everyone is "guessing" a number. Requirements are often unclear or likely to change. So, what is the point of arguing whether an activity is a 3, a 5, or an 8? Wouldn't that time be better spent understanding the requirements? Furthermore, imagine you are a developer about to pull an activity: what difference does it make to know its number of story points? Will you refuse to work on something because it is an 8? Ultimately, the number is irrelevant to the act of developing software.
So, who uses story points? What is the purpose of this number? Common answers are:
1. To know how much I can allocate in a sprint
2. To understand when a story is not yet detailed enough
In the first case, we are guessing how much fits into a sprint using a subjective, relative metric. There is no definitive method for assigning story points to an activity: for me, execution time may carry more weight; for you, the complexity of the subject may. Notice how many variables go into predicting what will happen in the next sprint, and how many possible points of failure that creates.
Now imagine that during planning you commit to a velocity for the sprint and, for some reason, you do not deliver that "magic number". What went wrong? Did we underestimate or overestimate an activity? How do we correct it so that next time we are more accurate? With that experience in mind, at the next planning poker we may end up "inflating" the story points, just as people pad hourly estimates. Over time, this skews the estimates in one direction or the other and inflates the numbers.
In the second case, where estimates are said to reveal when a story needs more refinement, imagine a team with experienced people, people with some experience, and newcomers. While scoring a story, the seniors give it a high value (say, an 8), the people with some experience give it a low value (say, a 2), and the newcomers try to guess what the others will vote. In a scenario like this, the story very likely lacks the details needed to be executed, so it needs more refinement. Great, the estimate did its job. The issue is that we reach that conclusion only after a long technical discussion. Wouldn't it be more efficient to refine the requirements beforehand, involving only the team members who are actually needed?
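The planning-poker scenario above boils down to a simple check: if the votes for a story are too far apart, the story probably needs more refinement. This is a minimal sketch; the ratio-based rule and the default threshold of 2 are hypothetical choices, not a standard.

```python
def needs_refinement(votes: list[int], max_ratio: float = 2.0) -> bool:
    """Flag a story when the highest and lowest estimates differ too much.

    Hypothetical rule: if the largest vote is more than `max_ratio` times
    the smallest, the team probably does not share an understanding of the story.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    low, high = min(votes), max(votes)
    if low <= 0:
        # A zero-point vote cannot be used as a divisor;
        # treat any spread above it as divergent.
        return high > low
    return high / low > max_ratio


# Seniors vote 8, mid-level people vote 2, newcomers guess something in between.
print(needs_refinement([8, 8, 2, 2, 5]))  # True
print(needs_refinement([3, 5, 5, 3]))     # False
```

As noted above, the signal only arrives after the whole team has voted, which is usually after the long discussion has already happened.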
Back to the story: we switched to story points... and what was the result? Out of all the teams that adopted story points, only 2 or 3 achieved good predictability. Most struggled, questioning team dynamics and processes instead of focusing on delivering value.
The teams that did achieve predictability had a slightly different problem: "This team always delivers the same number (great), but why aren't they challenging themselves? Are they complacent?"
To make things clearer, I wrote down the pros and cons of story points:
Pros:
1. Relative sizing – items are sized relative to one another rather than in absolute terms, which removes the concern of knowing exactly how long an item will take to complete.
2. Collaboration – Creates an environment of collaboration and makes the team work together to reach a consensus. This helps to ensure that the entire team is on the same page.
3. Flexibility – Since it is relative, story points adapt to the needs of the team, encouraging discussion of what should be done.
4. It is better than nothing – when you are starting and do not have anything to work with, at least you have a guideline or an idea of how to start.
Cons:
1. Inconsistency – since they are relative and subjective, different people can give different sizes to the same story, even if they are aware of the scope.
2. Time-consuming – estimating with story points can consume a lot of the team's time, especially when we are talking about relatively large teams.
3. Lack of standard – there is no standard for sizing a story in story points. There are some guidelines, but each team or individual has their own interpretation, which completely distorts any comparison between teams.
4. Misuse – since the concept is so widespread, it did not take long for many people to misuse it, comparing teams and demanding a specific performance from them based on story points.
5. Mushroom management reversed – mushroom management is a style in which management deliberately withholds information and communicates as little as possible, so no one except management knows what is going on. With story points, it is the developers who keep management in the dark, making it less transparent how each person invests their time.
The moral of the story is that story points are a complex concept that is difficult to understand. Complexity, uncertainty, execution time, and risk can all influence the score, but none of them alone is enough to determine how much time an activity will take. In my opinion, story points lead us into discussions that are irrelevant, or even unnecessary, for that audience or at that particular moment.
Then you might ask me: "So what should I do instead?" Estimates are useful and relevant in software development, but they are not a measure of efficiency or effectiveness. They are tools that can give you a direction, not an absolute truth.
Not long after adopting story points, I realized that we needed to change our approach. We gave up on story points and started working with metrics more closely tied to Kanban, such as throughput, burn-up/burn-down charts, lead time, and cycle time, among others, and we are doing very well. When I arrived at Philips, most teams worked with story points; we phased them out gradually, tracking both kinds of metrics for a while until the teams were stable enough to rely on Kanban metrics alone. Metrics based on historical data remove from developers the burden of estimating and predicting the future, creating an environment with less blame and less pressure. But that's a topic for another conversation.
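To show why these metrics need no upfront estimate, here is a minimal Python sketch that derives them from work-item history. It assumes each item records when it was requested, when work started, and when it was finished (the field names are hypothetical), and it uses common definitions: lead time runs from request to completion, cycle time from start of work to completion, and throughput counts items finished per week. Teams draw these boundaries differently, so treat them as adjustable assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class WorkItem:
    # Hypothetical fields; real tools name and record these differently.
    created: date   # request entered the backlog
    started: date   # work began ("In progress")
    finished: date  # item reached "Done"


def lead_times(items: list[WorkItem]) -> list[int]:
    """Days from request to completion, one value per item."""
    return [(i.finished - i.created).days for i in items]


def cycle_times(items: list[WorkItem]) -> list[int]:
    """Days from start of work to completion, one value per item."""
    return [(i.finished - i.started).days for i in items]


def weekly_throughput(items: list[WorkItem]) -> dict[tuple[int, int], int]:
    """Number of items finished per ISO (year, week), in chronological order."""
    counts = Counter(i.finished.isocalendar()[:2] for i in items)
    return dict(sorted(counts.items()))


items = [
    WorkItem(date(2024, 3, 1), date(2024, 3, 4), date(2024, 3, 6)),
    WorkItem(date(2024, 3, 1), date(2024, 3, 5), date(2024, 3, 8)),
    WorkItem(date(2024, 3, 4), date(2024, 3, 11), date(2024, 3, 12)),
]
print(lead_times(items))           # [5, 7, 8]
print(median(cycle_times(items)))  # 2
print(weekly_throughput(items))    # {(2024, 10): 2, (2024, 11): 1}
```

Every number here comes from items that are already finished, so nobody has to guess during planning; a burn-up chart can be drawn from the same finish dates as a running count of completed items.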
Scrum Master @Philips | Agile Methodology, PSM I, Kanban System Design, SAFe, Non-Violent Communication