For the last several years, I’ve worked on an inherited microservices system. My early and continued contribution to this effort has been to champion incremental refactoring, which has led to large-scale emergent improvements to the structure of the business logic in our core service. Code quality may once have been a major item of technical debt, but I no longer consider it one.

It turns out that we inherited not only the code, but also the cultural norms and thought processes behind it. Although the original development team is no longer with the organization, we still work within the same organization that allowed this mess to be created in the first place, and that makes some of the items below more difficult to solve. The biggest items of technical debt that I see are:

  1. For some of our services, we depend on another development team to update the code (or at least to merge our pull requests) and to deploy the services. We live with every conceivable permutation of code ownership and service ownership. In the ideal case, we would serve our entire business-domain concern with services that our team owns, built from code that our team owns. Any deviation from this is a huge waste of time.
  2. Our deployment pipelines are not fully automated. In most cases, we have a CI build, but we don’t do any CD. I have created a local script for deployment, from which I have personally benefited immensely, as have several new joiners. But I have been unable to convince my management to invest more in this. My understanding is that this system is designed for longevity, and even if it is not, it has already been in production for years. It’s a huge waste of time not to automate deployments.
    1. Somewhere on this list belongs the non-technical “red tape” around production deployments. We have to go through a heavy-handed “change management process” in order to deploy to production, and I have to call this out as a big-ticket item of technical debt. Because production deployments are not routine, we cannot close the feedback loop quickly on small issues (such as improving logging). The pain of abiding by the “change management process” leads to batching production deployments. The highest deployment frequency I have seen is once a month. Everything I’ve read about high-performing teams indicates that deploying to production once a month at best may be our number one technical debt item.
  3. We do not have a comprehensive (within each service), consistently implemented (across all services), and reliable feedback mechanism for technical exceptions (errors) in production. For any system deployed in production, we must be aware of every technical exception. We have failed if we only find out about a technical exception because users report a failing behavior that, unbeknownst to them, sits downstream of it. Beyond being a waste of time, it is lunacy to build systems without this entry-level monitoring of technical exceptions in place. This is paramount in any microservices deployment.
  4. We have an incomplete understanding of the domain and blurry lines of responsibility and ownership. We have done a lot on this front, but the poorly drawn lines of ownership in the inherited system make it difficult for us to fully understand the domain. The lesson here is to organize teams around clearly defined concerns within the domain.
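
To make the third item more concrete, here is a minimal sketch of what a consistently implemented exception feedback mechanism could look like inside a single service. It is an illustration only, not what our system does: the names `report_exceptions` and `build_error_record`, the record fields, and the `"exceptions"` logger name are all assumptions made up for this example. The idea is that every service wraps its entry points the same way, so each technical exception yields one structured record before it propagates.

```python
import json
import logging
import traceback
from datetime import datetime, timezone
from functools import wraps

# Hypothetical logger name; a real setup would route this to central monitoring.
logger = logging.getLogger("exceptions")


def build_error_record(service, operation, exc):
    """Build one structured record describing a technical exception."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "service": service,
        "operation": operation,
        "exception_type": type(exc).__name__,
        "message": str(exc),
        "stack_trace": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    }


def report_exceptions(service, sink=None):
    """Decorator factory: report any exception from the wrapped function, then re-raise it.

    `sink` is any callable that accepts the record; by default the record
    is written as one JSON line to the "exceptions" logger.
    """
    def log_as_json(record):
        logger.error(json.dumps(record))

    send = sink if sink is not None else log_as_json

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                send(build_error_record(service, func.__name__, exc))
                raise  # reporting must not change the failure behaviour
        return wrapper
    return decorator
```

Applied as `@report_exceptions("orders-service")` on each handler (the service name is again hypothetical), this gives the same record shape across services. Re-raising keeps the original failure behaviour intact, so the wrapper only adds visibility.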