tech debt – Chris Dwan

May 30, 2023

Overcoming Ops Debt

I would like to talk about tech debt’s sneaky sibling, something that I think of as “operations debt.”

Tech debt is the accumulated burden of shortcuts, approximations, defects, and hacks that creep into a product or system as the team focuses on production rather than perfection. A little bit of tech debt is a good thing. Shipping is a feature and your product needs it. There is no virtue in spending additional sprints putting a high shine on a cannonball. In excess, though, tech debt creates an ongoing burden on the team, both because brittle and imperfect code is harder to support and because of a cumulative duct-tape-on-duct-tape effect that eventually necessitates the dreaded and much-maligned rewrite.

Operations debt is a related concept, and might even be a subset of tech debt. It’s the ongoing burden that a team or individual endures when the broader organization continues to rely on them for help with stuff that either never was or is no longer supposed to be their job. The reward for making something useful is an endless parade of people who seek you out for just one little tip, trick, tweak, or bit of advice or support. If you make a useful open source tool, you wind up supporting the whole world unless you are -spectacularly- good at setting boundaries.

The debt metaphor is a good one: Let’s say, for example, that you want to have friends over to watch the big game on Saturday, but you don’t have a TV and you don’t have cash. You -really- want to have a party, so you buy the TV on credit. Everybody has a great time, but when you wake up on Sunday you are left holding a big TV, good memories, and also a monthly payment that cuts into your future finances until you pay it off.

I once had a report who – despite being a director and manager of managers by the time we met – would still come back to his office from time to time and find senior members of other teams sitting on his desk, wanting updates to that spreadsheet he made back when he was “Andy from the lab.”

Operations debt is insidious. Without a deliberate effort to surface it, ops debt doesn’t show up on the backlog, doesn’t get estimated or assigned story points, and can’t be distributed or re-assigned across the team. Your best contributors – the authors of the stuff that gets broad use, and the ones that everybody likes because they are so nice – simply become slower and less productive over time.

Most managers underestimate how little distraction it takes to bring an individual’s or a team’s velocity to zero.

Curing operations debt takes time and effort, but transparency and accountability – coupled with a commitment to keeping the team moving forward without abandoning the broader organization – can get the job done.

The virtue of having one (1) front door

My starting point for managing ops debt (as well as a host of related challenges) is fourfold:

Create a single ticket system for all requests.

Jira, RT, or even ServiceNow are all fine. I’ve even seen integrations between Slack and Google Forms work – though I question the wisdom of creating yet another chunk of bespoke software that will eventually need to be supported. The particular technology matters less than the organizational and management commitment behind it.

Assign an ops lead / quarterback who is responsible for triage, dispatch, and communication. This person should be technically competent, trusted by the organization, and comfortable holding their ground and pushing back.

It’s important to note that the ops lead is not expected to actually do all the requested work any more than the quarterback is expected to throw, catch, and block all at the same time. Their job is to -surface- the operational debt in a format that the team can track along with all the rest of their commitments. If things go well and you have the resources, give the operations lead a modest team and ask them to constantly be reducing the most common requests to practice – writing playbooks that allow junior team members or contractors to do the work, or even scripting up self-service solutions that nip the debt in the bud.

Tell everybody from the CEO on down that they have to use the ticket system or talk to the ops quarterback when making requests of the team. If it doesn’t exist in the ticket system, it doesn’t exist.

Practice radical transparency and gently but firmly allow senior leadership to do the job they should have been doing all along – establishing clear priorities for those one-off requests that are leeching the hours in the days away from individual contributors.

Radical Transparency

We are all familiar with trouble ticket systems that function as inscrutable tombs for requests and complaints. That’s why I recommend making the operations queue visible to the entire organization.

Radical transparency is a close cousin of radical candor, the upper right quadrant where we both care deeply and also engage directly. That quadrant exists in contrast with ruinous empathy (care but don’t engage – thoughts and prayers), obnoxious aggression (don’t care, just here to tell you what’s wrong), and manipulative insincerity (bless their hearts). Radical candor can be scary at first, but it has become my go-to over the last decade.

Also, I believe that everybody makes better decisions when they have access to better information, so why not show people what’s really going on?

Questions like “what the heck else is the team working on?” and “where am I in the queue?” ought to be self service. At the very least there should be one (1) point of contact who is able to provide authoritative answers. That’s your ops lead / quarterback. Over time, radical transparency pushes questions of inter-departmental prioritization back up the chain of command – where they belong. It’s wildly unfair, yet utterly ordinary, for senior managers to abdicate their core duty and push these questions of priority down on team leads and individual contributors – who try diligently even as the strategic objectives slip away.

TL;DR: Don’t just go sit on Andy’s desk.

People Change Slow

Here’s the fun part: You’ve got to wait 3 to 6 months for people to stop hating the change.

Changing human patterns of behavior is unbelievably slow. Any time you restructure an organization or change a procedure, you can expect 6 months of confusion and complaint. People liked the old way better. They don’t understand this new thing. Why can’t they just sit on Andy’s desk until he updates the spreadsheet like he did last month? Who even uses this ticket system anyway?

Because Andy has a different job now, that’s why, and we’re making space for him to do it.

At about the six month point, you will see the first glimmers of daylight on improvements due to the new way of working. In about 18 months, people will have forgotten about the old way entirely. It really seriously does take 6 months to see the benefits of a reorganization and 18 months before it will feel right to the team. If you make further changes, it resets that clock and corrodes trust that management knows what the heck we’re doing.

That long timeline and the inevitable complaints about complexity are one reason to keep it utterly simple. There is one (1) intake for operations support for the team. There is one (1) person whose job is to triage and communicate. There is one (1) master backlog of all the outside requests, and it is force ranked. You can go look at it yourself, and if you don’t like what you see – don’t take it up with Andy, take it up with me.

Trust me, the team will be happier and more productive for it.

