Technical Debt: Naughty or Nice?
On January 7, 2013 at 12:46 pm
Many years ago I was working with a very large customer, large in both user base and traffic, with a pretty interesting business model for the time. Their MO was that “time to market” is everything. And I mean everything. With the vast majority of their initiatives, they would rather launch today, knowing that the release might fail for 20% of the users, than launch tomorrow, knowing that those issues would be fixed by then. As a software developer, that mentality drove me insane. As I learned more about their business, though, I understood that there can be tremendous value in that approach. It translates to this: you take 80% of the revenue today and 100% tomorrow (when the issues are fixed), instead of just 100% tomorrow. Granted, you run the risk of losing 20% of users forever, but hey, they were a marketing company; they knew how to incentivize users to return. In essence, they were willingly accruing technical debt (and most of that large system was just that) in order to maximize revenue.
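The arithmetic behind that trade-off can be sketched in a few lines of Python. The function name and the revenue figure are hypothetical, and the sketch assumes that revenue scales linearly with the share of users for whom the launch works; it deliberately ignores the risk of losing some users for good.

```python
def two_day_revenue(daily_revenue, working_percent_day_one):
    """Compare two-day revenue for launching today vs. launching tomorrow.

    daily_revenue: hypothetical revenue of a fully working day.
    working_percent_day_one: percent of users for whom an early launch works.
    """
    # Launch today: partial revenue on day one, full revenue on day two.
    launch_today = daily_revenue * working_percent_day_one / 100 + daily_revenue
    # Launch tomorrow: nothing on day one, full revenue on day two.
    launch_tomorrow = daily_revenue
    return launch_today, launch_tomorrow


# Hypothetical figures: 1000 per day, launch works for 80% of users on day one.
today, tomorrow = two_day_revenue(daily_revenue=1000, working_percent_day_one=80)
print(today, tomorrow)  # prints: 1800.0 1000
```

Under these assumptions the early launch comes out ahead by the 80% of a day's revenue that would otherwise never be collected.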
Presently, that model is called “agile-like” and has proved to be pretty successful in the online world. However, returning to the developer’s perspective, that model translates to a system with thousands of lines of “corner-cutting” code. Over the years, some aspects of the system were significantly refactored and cleaned up (in parallel with adding features), but a lot of the core initiatives still run on the original, spaghetti-like code. So, is the company running the risk of letting the accrued technical debt outweigh the actual ROI? Or is there a statute of limitations on technical debt, after which it becomes “a feature, not a bug?”
Let’s examine a simple example taken from my experience with the same customer. Like most companies running large online applications, the company had a pretty extensive FAQ page, describing its operation and guiding people to the information they were seeking. Since the original application was built in the ‘90s, the FAQ page was a static page and, consequently, any update to it had to follow the code deployment procedure. As the business grew, the application became increasingly complex, requiring a larger architecture and, with it, a more rigorous and complex deployment process. Unfortunately, because of the company’s growth and rapid feature deployment, the FAQ page had to be updated much more often than normal. So, when asked to update and redeploy the FAQ page for the second time in a week, I did what any lazy programmer would do. I built an admin page, moving the content to the centralized database and allowing the customer to make changes to the FAQ page at will. Because I was in the middle of yet another “has to be done yesterday” initiative, I spent no more than 30 minutes on the new FAQ, quickly tested it for basic use cases (in the spirit of the customer’s philosophy) and deployed it for the customer to use.
Fast forward to the present. In the office, we have a whiteboard with a list of people who “broke” production for that customer in the past month. It can be anything from a bug found in deployed code, to a mis-deployment, to an update to the production database without a “WHERE” clause (yes, I’ve done it in the past). To my surprise, I saw my name on that board. Normally, with my mantra on development, it wouldn’t be a surprise, except that I hadn’t worked on that customer’s code in many years. So I had to ask about it. What I learned was that, when I was creating the database schema for the FAQ page, I forgot to add a primary key on the question id, and in some weird edge-case scenario, when the customer updated some questions, the update overwrote every question with the same text. The developers currently working on the code had to restore the content from backups and spend an hour finding and fixing the problem. And they found out that it was my code thanks to the trusty svn blame command, which showed that the code in question was last modified by me on August 12th, 2004. Eight years ago. For eight years, the code (that happened to break pretty gloriously) had been working as intended in production on a high-traffic site. The hour it took to fix the problem in 2012 seems to be a more than acceptable return on what was considered to be 30 minutes of technical debt in 2004.
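The original schema and the exact edge case are not recorded here, so the following is only a hypothetical reconstruction of how this class of bug can happen. It uses Python's built-in sqlite3 module, and the table name `faq`, its columns and the helper functions are all made up for illustration. Without a primary key, nothing stops several rows from sharing one question id, and an update meant for a single question then rewrites all of them.

```python
import sqlite3


def make_faq_table(with_primary_key):
    """Create a hypothetical in-memory FAQ table, with or without a primary key."""
    conn = sqlite3.connect(":memory:")
    id_type = "INTEGER PRIMARY KEY" if with_primary_key else "INTEGER"
    conn.execute(f"CREATE TABLE faq (question_id {id_type}, question TEXT)")
    return conn


def add_question(conn, question_id, text):
    """Insert one FAQ entry with an explicit id."""
    conn.execute(
        "INSERT INTO faq (question_id, question) VALUES (?, ?)",
        (question_id, text),
    )


def update_question(conn, question_id, new_text):
    """Update a question by id and return the number of rows changed."""
    cursor = conn.execute(
        "UPDATE faq SET question = ? WHERE question_id = ?",
        (new_text, question_id),
    )
    return cursor.rowcount


# Without a primary key, duplicate ids slip in silently...
unsafe = make_faq_table(with_primary_key=False)
for text in ("How do I sign up?", "How do I log in?", "How do I cancel?"):
    add_question(unsafe, 0, text)  # assumed bug: every row ends up with id 0

# ...and an update aimed at one question overwrites all of them.
print(update_question(unsafe, 0, "How do I reset my password?"))  # prints: 3

# With a primary key, the duplicate is rejected at insert time instead.
safe = make_faq_table(with_primary_key=True)
add_question(safe, 0, "How do I sign up?")
try:
    add_question(safe, 0, "How do I log in?")
except sqlite3.IntegrityError as error:
    print("rejected:", error)
```

The point of the sketch is only that the key constraint moves the failure from a silent overwrite at update time to a loud error at insert time.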
So is there a statute of limitations on technical debt? And if there is, what is a reasonable time frame? The best answer I’ve heard to date is two years, since that is the average time a programmer spends working on a particular project. And even though that answer is as good as any (and probably much better), I would argue that the correct answer, in most situations, is “that depends.” It depends upon the complexity of the technical debt, the time invested in maintaining it, and hundreds of other factors. And even though the example above does not cover nearly enough of those factors, it does help to dismiss some of the common myths about technical debt.
1. The maintenance cost of technical debt will increase at a rate that will eventually outrun the value it delivers to customers.
As the example above shows, this is not an axiom. The main factors that come into play with this assumption are the complexity of the code in question and how frequently it changes. “If it ain’t broke, don’t fix it” is appropriate here.
2. The cost of fixing technical debt increases the longer it remains in the system.
This is actually untrue more often than not. Unless the technical debt component is actively modified or expanded, the cost of fixing it will not grow. As it was eloquently put in response to the example, the time (to fix the problem) in 2012 is much cheaper than the extra time (needed to add a primary key) in 2004, during the rush to market for the latest initiatives.
3. All technical debt should be avoided at all cost.
The key word is “cost.” If the cost of avoiding technical debt exceeds the ROI minus the cost of managing that debt, you’re losing money.
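One way to read that last rule is as a single inequality. The sketch below is an interpretation, not a formula taken from any accounting standard: the function name, the parameter names and the sample numbers are all hypothetical, and all three quantities are assumed to be expressed in the same unit (hours, dollars, whatever you track).

```python
def avoiding_debt_loses_money(avoidance_cost, roi, management_cost):
    """Return True when avoiding the debt costs more than the debt is worth.

    avoidance_cost: what it takes to build the feature without cutting corners.
    roi: the return expected from the feature.
    management_cost: what it takes to live with the debt if you do cut corners.
    """
    return avoidance_cost > roi - management_cost


# Hypothetical numbers, all in the same unit.
print(avoiding_debt_loses_money(avoidance_cost=10, roi=12, management_cost=5))  # prints: True
print(avoiding_debt_loses_money(avoidance_cost=2, roi=12, management_cost=5))   # prints: False
```

In the first case, doing it "properly" costs more than the return that remains after paying to manage the shortcut, so the shortcut wins; in the second, the clean solution is cheap enough to be worth taking.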
The bottom line: not all technical debt is bad. It’s not good either. Sometimes amortization of the technical debt over time makes it irrelevant. As I’ve said before, technical debt should be weighed against the actual ROI, and managed accordingly. This way you’ll get the best bang for your buck. Literally.