Why Code Breaks

Edition e1.0.1

Updated August 7, 2023

We can't predict every dependency between our systems, or even all the dependencies between pieces of our logic in the same system. This alone makes it difficult to avoid introducing breaking changes, but it's not the only thing that contributes to broken code.

Let's look at other ways the code you write may break.

  • Untested code. You may think a change is small enough that it doesn't need testing, or you may be in a hurry to fix a bug, so you put your code up for review as soon as you finish writing it. This feels fast, but it's an easy way to introduce broken code into production because you never took the time to actually test it. Code that looks correct at first glance may still behave in unintended ways that you would never know about unless you actually ran it.

  • Unknown edge cases. The data you use during development and testing may be clean, structured, and made up of expected values, but production data is often messy and varies greatly. Your system will need to handle inputs and events from your users (or other systems) that you didn't know about or account for when writing your code.

  • Missing context. Oftentimes, the person who wrote the original code won't be the person who is updating or fixing it. Perhaps the original author is out of the office on vacation, moved to a different team, or moved on to a new company. When this happens, you may not have the full context about how part of the system works when you need to make modifications to it. There may be a specific reason the logic was written a certain way, or the logic may account for an edge case that isn't apparent when first reading the code.

  • Hidden dependencies. As codebases grow, so does the dependency graph. You may deploy some code to production that you thought you had tested thoroughly, only to find that there is an obscure part of the codebase that also relies on the logic that was changed. Suddenly, you have another team asking you why their code, which they haven't touched in months, is throwing errors in production. Even worse, there could be bugs in a third-party or open-source library that your codebase relies on. These are often difficult to track down and can be difficult to fix if the maintainers aren't responsive.

  • Code rot. Also referred to as software rot, code rot describes how the behavior or usability of a codebase degrades over time, sometimes even if the code itself has not been modified. The environment the code runs in will change over time, or your customers' usage patterns may shift. They may request new features that build on top of existing logic, which can introduce new bugs. Even routine software maintenance contributes to code rot. You may need to update third-party libraries to patch security vulnerabilities, only to find the newer version breaks existing logic.

  • Environment differences. As much as we'd like to keep our development environments in sync with our staging and production environments, it's extremely difficult to match them exactly. Your local environment variables may differ from what's running on production, which could lead to code that works locally but breaks on production. Differences in scale between environments can cause entirely new errors to occur in production systems that are hard to reproduce in smaller environments. Operating systems and their packages in higher environments may differ from your development environment. And finally, the hardware itself: while you develop your code on a personal computer, your code runs in production on powerful servers with different CPU and memory characteristics.

  • Third-party dependencies. You may have external dependencies that your product relies on, such as third-party APIs that your code calls out to. When those services have outages of their own, they may affect your system and cause errors on your end. Even cloud hosting providers such as Amazon Web Services or Microsoft Azure go down from time to time and could bring your system down with them.
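
To make the "unknown edge cases" item above more concrete, here is a minimal Python sketch. The function names `parse_quantity_naive` and `parse_quantity`, and the sample inputs, are hypothetical and chosen only for illustration. The idea is that a parser written against clean development data can fail on the messier values that tend to show up in production.

```python
from typing import Optional


def parse_quantity_naive(raw: str) -> int:
    # Fine for the clean values seen during development, such as "3".
    # Raises ValueError on "1,200" or "" and TypeError on None.
    return int(raw)


def parse_quantity(raw: Optional[str]) -> Optional[int]:
    """Return a non-negative quantity, or None if the input is unusable."""
    if raw is None:
        return None
    # Assumption for this sketch: commas are thousands separators.
    cleaned = raw.strip().replace(",", "")
    if not cleaned:
        return None
    try:
        value = int(cleaned)
    except ValueError:
        return None
    # Assumption for this sketch: negative quantities are invalid.
    return value if value >= 0 else None
```

Returning `None` here is just one possible design choice; depending on the system, raising a descriptive error or logging the bad value may be more appropriate. Either way, the unexpected inputs are handled deliberately rather than discovered in production.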

Murphy's Law

The above examples are just a few ways in which your code can break; the list keeps going. Murphy's Law states that "anything that can go wrong, will go wrong." As a programmer, it's your job to identify all the scenarios in which your program can fail, and then to take steps to reduce the likelihood of those scenarios happening. In some cases though, your program will fail in unexpected ways that you never could have imagined, which makes it hard to plan for.
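
As one illustration of taking steps to reduce the impact of a known failure scenario, the sketch below wraps an unreliable call (for example, a request to a third-party service) in a small retry loop with a fallback value. The helper name `call_with_retries` and its parameters are hypothetical and not taken from any particular library; treat it as a sketch under those assumptions rather than a recommended implementation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    fallback: T,
    attempts: int = 3,
    base_delay: float = 0.1,
) -> T:
    """Try `operation` up to `attempts` times, then return `fallback`."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Catching Exception keeps the sketch short; real code would
            # usually catch only the errors it expects, such as timeouts.
            # Back off a little longer after each failure, but do not
            # sleep after the final attempt.
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)
    return fallback
```

A caller might pass a function that fetches data from an external API and use a cached or default value as the `fallback`. This does not prevent the outage, but it turns one anticipated failure into a degraded response instead of an error.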

Here are more examples of ways that your system can fail. Remember, sometimes it's not just the code itself but other pieces of the system that can fail too.

  • Bad logic
