Different techniques for debugging code
I run a lot of training workshops where people are learning new things, inevitably bugs will occur and being able to track down the source of these bugs is a very important skill. There are many effective debugging techniques and getting to know a few will greatly help your productivity. Many of the options I'll be talking about here are really language agnostic but I'll use Python for the examples.
- Print statements, perhaps the most simple method to see what's happening
- Using breakpoints to inspect state
- Dropping into an interactive shell, see my article about this here
- Setting up good logging (including logging unhandled exceptions and other situations that cause your programs to crash)
- Performance profiling
Print statements
There probably isn't a simpler way to start debugging code. Around the parts of the code you are wishing to analyze you just throw in a bunch of prints to see what's happening with the state. There's very little conceptual overhead to get started with this which is why a lot of people probably start by learning this technique. It's been so long that I can't really remember not knowing about this, but I'm fairly sure print statements are a huge improvement over nothing. Even now when I know a lot of more sophisticated approaches this is still useful for some quick-and-dirty debugging from time to time.
Since you have to choose what to print and when to print, there's definitely an art to doing this type of debugging well. You want to understand the way in which the flow of execution works in the language you are using, this will allow you to place the print statements at the most appropriate places and to print out the most appropriate state.
Generally speaking what you are trying to do is to pick two points at which you can see the change (delta) between a snapshot of some relevant state before and after the program has run some section.
One strategy I use is to try to determine what part of the state I want to test, for example foo==1
should be true for an entire block of code but it's somehow ending up as 2
. So I'll want to inspect the state of foo
and I'll try to place print statements such that I can narrow down the position in the code in which foo
is erroneously changed.
What goes wrong
Unfortunately print statements will only work if you have access to a terminal. This means that if you do something like deploy to a headless server then you will lose your output. This also means that if you deploy it to a client site you will lose the output from the print statements. Occasionally people will do things like screen-shares to debug things but this will only cover the situations where the bug is able to be replicated in the screen sharing session, this constraint on print based debugging is very limiting in the situations where you wanted to do some sort of post-mortem analysis of what went wrong.
The other big issue with print statements is that they don't really deal well with debugging any sort of concurrency issues. Print statement debugging works on the basis of seeing a pair of states that are before/after and being able to inspect the changes that have occurred in between. Because this is often a code level change you will want to be able to know that any change to the state in between the first and last print statement are from the code called between those print statements. In a concurrent situation the state may change in a different flow of execution that will not be reflected by the code that was run in between the start and end print statements.
Changing print functions into a logging call in Python
More than once in my career I've been consulting on a Python project that does logging via print
s.
A great first step is to start to introduce proper logging into a project in an incremental way.
For example you could overwrite the built in print
function to add a logging call to it if you have to deal with a large number of print
s that are de-facto logging calls.
import builtins
import logging
logger = logging.getLogger(__name__)
def logging_print(*args, **kwargs):
logger.debug(args)
builtins.print(*args, **kwargs)
print = logging_print
Incidentally this is one of the reasons why it's good that print is now a function rather than a statement since it allows exactly this behavior to be possible.
Logging
So print statements have some downsides, notably it's hard to control the levels of information that comes out of them since you really only have 2 options:
- Print the message
- Don't print the message
Sometimes you start seeing people prefix messages with a severity level and a timestamp and other file writing utilities, but eventually when you start adding all those features you really are just getting a defacto logging framework.
One of the best things about logging is that you can have different levels of severity for messages. There's more than one situation that calls for displaying some output from the runtime of code, for example when you ar developing a new feature you may wish to put a large number of logging calls in so you can get a good understanding of what's happening in your code as you are writing it. This is the sort of logging that is done for the benefit of the developer when developing new code, in these development cases you are often wanting a large amount of highly localized output. Take for contrast the case when you are trying to debug some issue that occurs in production you may wish to have a very different set of information presented to you, you want to know about anything that could have been more serious than a warning or anything thats an error, regardless of where it came up. Being able to filter based on severity will allow you to narrow in on problems in a far more directed manner and in the critical moments where you are trying to debug an important serious issue it's great to be able to remove all the debug and development related logging from your output as this greatly improves the signal vs noise.
The other thing that this allows you to do is to much more accurately get a sense of timing, good logging will allow you to see not just what happened but when it happened.
Interactive shells
Dropping into a debugging shell is a great way of being able to determine what has gone wrong. I have a whole article about interactive shells in Python here, the main distinction is between local vs remote shells.
Most major languages have a debugging utility that will allow you to interactively step though the execution of code. This is a very powerful way to see what the state is in the code that's being executed.
Performance profiling
Some code may do things like hang due to poor performance or non termination of a call.
These sorts of bugs are quite distinct from the ones that will crash the execution and may not even be detectable via static analysis tools.
For example you might have situations where going from a algorithm that's O(n * log n)
to O(n^2)
can make the code stall for an extremely long time.
There's a few ways in which you can debug these type of problems.
Statistical profiling is one way that you can get a sense of where the time is spent executing and example of this is gprof
.
There's other profiling that looks at the call stack and can give you a sense of which functions are calling which other functions.