Why Exciting Operations are Bad

A little excitement in your job is usually a good thing. It could be learning a new programming language, preparing to release a new feature, or taking on new responsibilities as part of a promotion. That’s great for most jobs, but not for operations. Let me tell you why.

What Exciting Means

After nearly a decade of development, I still get nervous when I hover over the hotkeys to execute a database command in production. I hope that feeling never goes away.

When you’re manually running SQL commands in production, it usually means some other process broke down. Ideally, a script runs the command, and that script has already been run in lower environments before it makes its way to prod. Ideally.
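One way to make that ideal concrete is to refuse to run a change in prod until it has already run in every environment below prod. Below is a minimal Python sketch of such a guard. The environment names, the in-memory `applied` history, and both function names are hypothetical. In practice the history would live in each environment's own migration table, which established migration tools such as Flyway or Liquibase maintain for you.

```python
import sqlite3

# Hypothetical promotion order, lowest to highest.
ENV_ORDER = ["dev", "staging", "prod"]


def missing_promotions(migration, target_env, applied):
    """Return the lower environments where `migration` has not been applied yet."""
    lower_envs = ENV_ORDER[:ENV_ORDER.index(target_env)]
    return [env for env in lower_envs if migration not in applied.get(env, set())]


def apply_migration(conn, migration, sql, target_env, applied):
    """Run one SQL statement, but only after it has passed through every lower environment.

    `applied` is a hypothetical in-memory record: {env_name: {migration_name, ...}}.
    """
    missing = missing_promotions(migration, target_env, applied)
    if missing:
        raise RuntimeError(
            f"Refusing to run {migration!r} in {target_env}: "
            f"not yet applied in {', '.join(missing)}"
        )
    with conn:  # commits if the statement succeeds, rolls back if it raises
        conn.execute(sql)
    applied.setdefault(target_env, set()).add(migration)


if __name__ == "__main__":
    applied = {}
    create_orders = "CREATE TABLE orders (id INTEGER PRIMARY KEY)"
    apply_migration(sqlite3.connect(":memory:"), "001_create_orders", create_orders, "dev", applied)
    try:
        apply_migration(sqlite3.connect(":memory:"), "001_create_orders", create_orders, "prod", applied)
    except RuntimeError as err:
        print(err)  # Refusing to run '001_create_orders' in prod: not yet applied in staging
```

The point is not the dozen lines of Python; it’s that the scary step now happens in a script that has already succeeded twice before it ever touches prod.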

So why, after so long, does my heart rate spike and stomach lurch with every SQL command or service restart? Do I just not have enough experience? No, it’s because prod is prod is prod.

Prod is where the customers are. Prod is what pays the bills. Prod is where all the critical data is. Prod is where the highest load is. Prod is the only environment that actually matters. I don’t need anything exciting related to prod in my life.

“Exciting” in prod means not knowing whether your application is up right now. It’s wondering whether you’re not getting alarms because the system is healthy or because your instrumentation is broken. It’s wondering whether the bleeding-edge service you just started using (because it’s cool) is really production quality.

I’ll get my thrills elsewhere.


Boring Doesn’t Mean Bad

I’m not recommending you use only old tech in your stack. Hiring and retaining developers is hard enough without asking them to work on a stale stack. The goal with boring operations is twofold:

  1. Use established technologies for core development and operations

  2. Don’t get clever: keep runbooks, documentation, and automation simple

Understanding Your Tech

As a developer, I always found it fun to try out the latest framework on GitHub or the newest AWS service. Toy solutions are easy to build and require oh so little code. It’s almost magic how everything works.

As an operator, I’ve learned I don’t like magic. I remember one of my earliest real debugging sessions involved a tiny DB error. I ended up going down the following rabbit hole:

  • I was using Grails

  • Which is built on Spring Boot

  • And uses GORM for object-relational mapping

  • Which uses Hibernate under the hood

  • Which ultimately connected to H2 locally and MySQL in the deployed environment

It turns out I chose a name for a column that was reserved in MySQL but not in H2, causing my deployed environment to fail. Along the way, I learned that I was relying on a lot of tech that I didn’t know a thing about.
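Because H2 accepted the column name, nothing failed until the deployed MySQL database saw it. A cheap guard is a unit test that checks your column names against the production database’s reserved words instead of trusting the local database to object. Here is a minimal Python sketch. The word list is a deliberately small, hand-picked subset (the authoritative list depends on the MySQL version), and `reserved_column_names` is a hypothetical helper, not part of Grails, GORM, or any other framework.

```python
# Illustrative subset only: a handful of words MySQL treats as reserved.
# The full list depends on the server version; check the MySQL manual.
MYSQL_RESERVED_SAMPLE = frozenset({
    "CONDITION", "DESC", "GROUP", "INTERVAL", "KEY", "ORDER", "RANGE", "SELECT",
})


def reserved_column_names(column_names, reserved_words=MYSQL_RESERVED_SAMPLE):
    """Return the column names that collide with a reserved word (case-insensitive)."""
    return [name for name in column_names if name.upper() in reserved_words]


if __name__ == "__main__":
    columns = ["id", "order", "customer_name", "range"]
    print(reserved_column_names(columns))  # ['order', 'range']
```

Quoting identifiers (MySQL uses backticks) also sidesteps the collision, but renaming the column is the more boring fix: nobody has to remember the quotes later.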

Grails made development so much easier. The tradeoff I didn’t realize was that I didn’t know what was really going on under the surface, so I couldn’t operate the app. That’s a dangerous place to be.

Don’t take this as a recommendation not to use cutting-edge technologies. My warning is that there’s a hidden operational cost to these abstractions that you only learn about after investing in development.

Choose your tech carefully and recognize that you’ll operate your application far longer than you’ll develop it. Optimize for operations, not development.

Don’t Get Clever

Another development trap is getting clever. How many operations can I fit into this one line? What if I shorten my temporary variable names? Sure, this may make writing the code feel faster, but writing code is not the majority of an engineer’s job; reading it is.

Indeed, the ratio of time spent reading versus writing is well over 10 to 1. We are constantly reading old code as part of the effort to write new code. ...[Therefore,] making it easy to read makes it easier to write.
— Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship

I’ve fallen into this trap many times. It’s a humbling experience to crack open the Git blame on a mess of a function and see your own name staring back at you.
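To make the trap concrete, here is a hypothetical example: two Python functions that compute the same number. The `OrderLine` shape and both function names are made up for illustration.

```python
from typing import NamedTuple


class OrderLine(NamedTuple):
    status: str
    quantity: int
    unit_price_cents: int


# Clever: one line, one-letter names, and a bool quietly multiplied as 0 or 1.
def f(o): return sum(q * p * (s == "paid") for s, q, p in o)


# Boring: every name says what it holds, and the filter is spelled out.
def total_paid_revenue_cents(order_lines):
    """Sum quantity * unit price across order lines that have been paid."""
    total_cents = 0
    for line in order_lines:
        if line.status != "paid":
            continue  # unpaid or refunded lines contribute no revenue
        total_cents += line.quantity * line.unit_price_cents
    return total_cents


lines = [OrderLine("paid", 2, 500), OrderLine("refunded", 1, 999), OrderLine("paid", 3, 250)]
print(f(lines), total_paid_revenue_cents(lines))  # 1750 1750
```

Both return the same answer. Only one of them can be read, trusted, and changed by someone who didn’t write it.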

I’ll add an operations corollary to the quote above:

Any code, instrumentation name, logging message, or runbook not understandable within 5 minutes of waking up at 3 AM is too clever.
— Me, This Post

Do future you a favor: don’t get clever. Make your variable names clear (characters are free, and the compiler doesn’t care how long a name is), make your metric names clear, make your logs clear, make everything just a little more verbose for clarity. Future you (and your colleagues) will thank you when an incident inevitably happens. The last thing you want to do is decipher code during a production outage at 3 AM.
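The same rule applies to logs and metrics. Below is a minimal sketch using Python’s standard `logging` module. The logger name, the metric name, and the billing runbook it points to are all hypothetical.

```python
import logging

logger = logging.getLogger("billing.invoice_reconciliation")  # hypothetical logger name

# Clever: a name like "brc_f" saves a few keystrokes and costs minutes during an incident.
# Boring: the name says what is counted and which system it comes from.
INVOICE_RECONCILIATION_FAILURES_METRIC = "billing.invoice_reconciliation.failures"


def log_reconciliation_result(reconciled_count, failed_count, duration_seconds):
    """Log a reconciliation run in words someone half-awake can act on."""
    # Clever version: logger.info("rc %d/%d %s", reconciled_count, failed_count, duration_seconds)
    logger.info(
        "Invoice reconciliation finished: reconciled=%d failed=%d duration_seconds=%.1f",
        reconciled_count, failed_count, duration_seconds,
    )
    if failed_count:
        logger.warning(
            "Invoice reconciliation left %d invoices unreconciled; "
            "check the billing runbook before re-running",
            failed_count,
        )
```

The `key=value` pairs cost a few extra characters and make the message easy to grep and to read aloud on an incident call.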

Getting to Boring

So maybe this article has convinced you to make operations a little more boring. More likely, you’ve gotten burned by a particularly nasty production issue where you wished past you had spent a bit more time during the development phase. The good news is that no matter what you’ve done in the past, you can always invest more starting now.

Invest Time to Save Time

Development speed and productivity can feel like the pinnacle of engineering prowess. If you’re working on a production-quality product, you need to throw that out the window. The faster you write code, the faster you write bugs, too. Even if you can write code 50% faster than the next person, you’re less efficient if you spend half your time fixing bugs. As the Navy SEALs say, “Slow is smooth and smooth is fast.”


When you build features or fix bugs, take your time to do it right. Typically this means layering in tests and documentation, which I still encourage. For SaaS products, you have the additional responsibility of making your features operable. Invest time now to save yourself (and others) time in the future. I, for one, am more than happy to invest a few hours during the business day to save myself 15 minutes in the middle of the night. Wouldn’t you?

 

If you’re interested in learning more about getting to boring or my other offerings, send me a note at brian@connsulting.io or schedule a time to chat at https://calendly.com/connsulting.

