Debugging is one of those tedious but necessary aspects of any software engineering team’s work.
Before sending a product to market, you want to make sure it’s error-free.
Granted, there’s a balance to strike between writing clean code and delivery. Perfect can quickly become the enemy of shipped (for more on that, check out this article about avoiding unnecessary performance optimization).
But to create a quality product, debugging will have to be part of the process.
We sat down with Mitch Pirtle, VP of Engineering at Morning Consult, to delve into the significance of debugging and how to work efficiently when tackling an unknown problem.
He explained the fundamental steps of the debugging process and the common challenges involved. He also provided an insider’s perspective on the overall impact debugging has on operations and how it contributes to fostering the culture at Morning Consult.
The conversation below has been edited for length and content.
How do you approach debugging errors in a program?
The first thing I look at is the instrumentation.
Where are my logs?
Depending on what we’re building, there has to be something I can look at to see what happened and what might have caused the problem.
Based on this, the second question I ask myself is: do we need to fix this?
For example, we fully automated our infrastructure on the cloud to the point where we’re using Spot Instances now because it’s more cost-effective for us.
I’d rather launch a bunch of cheap instances than a couple of expensive instances that never go down. That means something will occasionally get a little frisky and fail on you, and you won’t know whether it’s your code or a problem with the hardware. The question then is: was this a transient failure, or a repeating issue that actually needs to be resolved?
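One way to make that transient-versus-repeating distinction concrete is to retry a failed task a few times with backoff before treating it as a real bug. This is a minimal sketch of that idea (the function names and the flaky task are illustrative, not Morning Consult’s actual tooling):

```python
import time

def run_with_retries(task, attempts=3, base_delay=0.0):
    """Run `task`, retrying on failure.

    Returns ("ok", result) if any attempt succeeds, or
    ("repeating", last_error) if every attempt fails -- a signal
    that this is likely a real bug rather than a flaky instance.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return "ok", task()
        except Exception as exc:  # in practice, catch narrower error types
            last_error = exc
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    return "repeating", last_error

# A hypothetical task that fails once, then succeeds: a transient failure.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("instance went away")
    return "done"

status, value = run_with_retries(flaky)
# status == "ok": the failure did not repeat, so there is nothing to fix.
```

If the status comes back `"repeating"`, that is the case worth walking down the stack for.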
After looking at the evidence that it happened in the first place and confirming that there’s an issue worth investigating, I’d walk down the stack to find where the problem originally came from.
At Morning Consult, we’re in the process of kicking off an effort to automate distributed testing of our systems. For example, when one of our backend teams wants to deploy a new update to an API service that the front-end teams rely on (one that builds and runs on staging), we want an automated system to pull in all of the other staging environments and run their tests against that API service, to make sure that from an external testing perspective everything is still kosher.
We really rely on that testing and coverage. When something does go wrong, we can at least be informed of it and say this is something we need to look at. Typically, you also know where to look based on the error you’re getting.
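One simple building block for that kind of cross-environment check is a contract test: verify that an API response still has the fields and types the front-end teams depend on. This is a hedged sketch; the field names and contract are hypothetical, not Morning Consult’s actual API:

```python
def check_contract(response, required_fields):
    """Return a list of contract violations for one API response.

    `required_fields` maps field name -> expected type. An empty
    list means the response still satisfies what consumers rely on.
    """
    problems = []
    for name, expected_type in required_fields.items():
        if name not in response:
            problems.append(f"missing field: {name}")
        elif not isinstance(response[name], expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}, "
                f"got {type(response[name]).__name__}"
            )
    return problems

# A hypothetical contract the front-end relies on:
CONTRACT = {"id": int, "name": str, "results": list}

ok = check_contract({"id": 7, "name": "poll", "results": []}, CONTRACT)
bad = check_contract({"id": "7", "name": "poll"}, CONTRACT)
# ok == []          -> the deploy can proceed
# bad is non-empty  -> block the deploy and flag it for a look
```

Running checks like this against every dependent staging environment is what turns “something broke downstream” into “this field changed type in this service.”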
What are some of the challenges you face while debugging?
Generally, the engineers are writing code. The code gets reviewed by two engineers. When the engineers say it’s good to go, it gets merged. Our CI pipeline picks it up and runs tests. Passing all the tests isn’t good enough: the pipeline also has to check how much of the code is covered, because you could be testing only 10% of the conditions. When everything is thumbs up, it goes live.
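A coverage gate like the one described can be expressed as a single CI step. This is an illustrative sketch using pytest with the pytest-cov plugin; the `app` package name and the 80% threshold are assumptions, not Morning Consult’s actual configuration:

```shell
# Run the test suite, measure line coverage, and fail the
# pipeline if coverage drops below the threshold -- so that
# "all tests pass" alone can never green-light a deploy.
pytest --cov=app --cov-fail-under=80
```

Failing the build on the threshold, rather than just reporting it, is what keeps the “only testing 10% of the conditions” case from slipping through.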
We’re going to be leveraging Kubernetes so that developers are no longer running the applications locally on their machines when they’re developing, but the experience remains the same.
If I’m editing a text file, save, go to my browser, and reload, I see the changes, but none of that is actually running on my computer. It’s running in my own private namespace. As a manager, nothing makes me want to bang my head on the pavement faster than when developers say “but it runs on my laptop.”
So that’s another challenge: avoiding dependencies on local configuration.
Once a system is deployed in any state, whether it be staging or production, we would rather have way too many logs than not enough. Then we build alerting based on the intelligence of those logs. That takes a lot of work and gumption to sit through. When you first set this up, it’s so noisy that after a while you just want to turn it off. That’s where most companies fall short.
If something breaks, we’re all just curious to know what happened, even if it’s totally useless knowledge in the end. Being rooted that way, we just stick with it until we get to the point where we only see alerts that are really relevant.
How has this debugging mindset fostered the culture at Morning Consult?
I think it’s due to practicality more than anything else.
Alex, our CTO, was pretty much a one-man army for the first several years of our existence as a business. He had to learn a lot of things the hardest way possible. He automated himself out of a job as fast as he could.
When I got here two years ago, I was thinking: I’ve got to dig deep to contribute here, because they’ve already automated all the things. That’s very rare, even in modern times, in a startup this young and small. Usually there are really good engineers, but they can’t make anything go. This was the opposite: we didn’t really have many engineers, but we had everything fully automated on Amazon.
I would say that in a nutshell, engineering at Morning Consult means the KISS principle. It’s brutal minimalism as software engineering.
Only code what you have to code. Nothing more.
You don’t introduce bugs. You’re not predicting the future. You’re just doing what you need to do and then moving on.
It keeps things simple and that means extending them is simple, debugging is simple, and scaling things out is simple.
How does Morning Consult encourage efficiency differently than other startups?
In my eyes, part of what makes recruiting for Morning Consult so fun as a VP of Engineering is I know that the environment here gives you that opportunity for learning. It’s a very learning-rich, growth-rich environment without the craziness and toxicity that you see in a lot of startups.
Engineering-wise, and across the organization, we always gravitate toward a team-first, low-ego mentality. I joke that we’re all plant eaters. If I were to introduce a meat eater, it would be devastating.
If there’s a conflict, you’re not going to see engineers just taking potshots at each other or reverting each other’s commits and doing the whole passive aggressive arguing stuff.
They talk about it.
“Well, why do you think this is the best idea? I always thought it was that.”
It’s an honest conversation between two people and the goal of those conversations is always to get better. It’s never to shame or be better than someone else.
We strive to get better as a team.