AI for Unit Testing

I work in an environment where people actively look for opportunities to use AI tools such as GitHub Copilot in their daily development tasks.

While there is general agreement that such tools can accelerate the development process, there is still some uncertainty about how they achieve this. Possible explanations include quick refactoring options, automatic unit test generation, CI/CD YAML file generation, and assistance with integration testing. The last point covers UI or end-to-end testing, while the second narrows the focus to unit testing specifically. Although AI can help with most of these, using it to generate unit tests looks like an anti-pattern once you examine the details.

My Unit Testing Process

Writing unit tests is a well-understood, daily practice. In fact, I have seen multiple projects accumulate tens of thousands of tests over their lifetime, with developers adding new ones every day. There are numerous posts detailing general testing best practices or focusing on narrower aspects, such as why performance testing within unit tests is a bad idea.

In Line-of-Business (LOB) applications, the most common practice I have observed among developers is to modify the application code first and then cover the changes with unit tests. Although I now follow this approach only in rare cases, even then unit testing remains an active exercise. While writing tests, I actively discover new test cases and often play devil's advocate to ensure that all relevant cases are covered. Time and again, I find issues with the initial implementation or edge cases that it failed to address properly.
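
To make this concrete, here is a minimal sketch of that exercise in Python with pytest. The allocate_payment function is hypothetical; the point is how playing devil's advocate surfaces cases, such as a lost remainder or invalid input, that a single happy-path test would miss.

```python
# Hypothetical unit: split a payment into near-equal parts (amounts in cents).
import pytest

def allocate_payment(total_cents: int, parts: int) -> list[int]:
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, remainder = divmod(total_cents, parts)
    # Distribute the remainder one cent at a time to the first parts.
    return [base + (1 if i < remainder else 0) for i in range(parts)]

def test_even_split():
    assert allocate_payment(100, 4) == [25, 25, 25, 25]

# Devil's-advocate cases discovered while writing the tests:
def test_uneven_split_loses_no_cents():
    assert sum(allocate_payment(100, 3)) == 100

def test_zero_parts_is_rejected():
    with pytest.raises(ValueError):
        allocate_payment(100, 0)
```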

Replacing this active process with AI-generated test cases takes away the self-validation aspect. Although AI can generate unit tests, it only sees the unit being tested and has no context for the developer's intent. As a result, test coverage metrics will look impressive, but the tests will exercise a unit that may not be implemented according to the intent or the requirements.
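
As a hypothetical illustration, consider a pricing function whose requirement says the discount applies after tax, while the implementation applies it before. A test generated from the code alone simply asserts the current behavior, so coverage is complete while the bug is cemented:

```python
def total_price_cents(net: int, tax_percent: int, discount: int) -> int:
    # Bug: the requirement says the discount applies *after* tax,
    # but this implementation subtracts it before applying tax.
    return (net - discount) * (100 + tax_percent) // 100

def test_total_price_cents():
    # Generated from the code alone, this locks in the buggy behavior.
    assert total_price_cents(10_000, 20, 1_000) == 10_800
    # The requirement-driven expectation would have been 11_000:
    # 10_000 * 120 // 100 - 1_000.
```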

Another argument I often hear is that AI helps generate the boilerplate code for tests. However, this argument usually points to a deeper issue with the unit being tested: if a developer is already complaining about the amount of boilerplate required for testing, it may be an indication that the code under test is too complex or not modular enough.
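
A hypothetical before/after sketch of this signal: when a trivial calculation is buried inside a service, even the simplest test drags in mocks; extracting the pure function makes the boilerplate disappear. All names below are invented for illustration.

```python
from datetime import date
from unittest.mock import MagicMock

# Before: the fee calculation lives inside a service, so even a trivial
# test needs a mocked repository and a mocked clock.
class InvoiceService:
    def __init__(self, repository, clock):
        self.repository, self.clock = repository, clock

    def late_fee_cents(self, invoice_id: int) -> int:
        due = self.repository.due_date(invoice_id)
        overdue_days = (self.clock.today() - due).days
        return max(overdue_days, 0) * 50

def test_late_fee_with_mocks():
    repository, clock = MagicMock(), MagicMock()
    repository.due_date.return_value = date(2024, 1, 1)
    clock.today.return_value = date(2024, 1, 4)
    assert InvoiceService(repository, clock).late_fee_cents(42) == 150

# After: extracting the pure calculation removes the boilerplate entirely.
def late_fee_cents(due: date, today: date) -> int:
    return max((today - due).days, 0) * 50

def test_late_fee_pure():
    assert late_fee_cents(date(2024, 1, 1), date(2024, 1, 4)) == 150
```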

AI with TDD

Test-Driven Development (TDD) has a long history, but in practice I have rarely observed anyone applying the technique to LOB applications. However, I have started using a variation of it more frequently over the past few years. The idea is to let the tests drive the implementation. Writing the tests first puts the developer on a red/green/refactor cycle. When a new test is created, the functionality as currently implemented fails it. The developer then implements the functionality so that all tests pass. Finally, the developer is free to refactor the implementation, as the existing tests will catch any errors introduced during the refactoring.

Using AI-generated code seems to fit much better with this process (a minimal sketch follows the list):

  1. A developer writes new tests (red).

  2. AI generates an initial implementation of the functionality so that the tests pass (green).

  3. A developer is free to refactor the generated code as long as all tests still pass (refactor).
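
Here is a minimal sketch of the cycle, using a hypothetical slug helper. The developer writes the failing tests first; the implementation below stands in for what an AI assistant might propose to make them pass.

```python
import re

# Step 1 (red): the developer writes the tests first; slug() does not exist yet.
def test_lowercases_and_joins_words():
    assert slug("Hello World") == "hello-world"

def test_strips_punctuation():
    assert slug("Ready? Go!") == "ready-go"

# Step 2 (green): an AI assistant proposes an implementation that passes.
def slug(text: str) -> str:
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))

# Step 3 (refactor): the developer reshapes the generated code freely;
# the tests above catch any change in behavior.
```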

One additional argument one often hears is that AI-generated code usually still needs to be reviewed or corrected. This fits well with the above model, as one of its steps is precisely about refactoring the generated code.

Unit

A careful reader might have noticed the emphasis on the word unit in this article. A unit refers to a coherent set of code. Unit tests keep their key properties, such as executing within the millisecond range, regardless of how the unit is structured. Nothing in this definition requires a unit to be a single method, function, or class. This means that creating new methods, functions, or classes during refactoring is a natural part of the process. These new methods and classes do not require additional tests, since they are already covered by the unit tests that existed before the refactoring.
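
A brief sketch of this point, with invented names: the test below targets the behavior of the unit, so when a LineItem class is extracted during refactoring, it is already covered and needs no tests of its own.

```python
from dataclasses import dataclass, field

# The unit test written before the refactoring; it exercises the behavior.
def test_receipt_total():
    receipt = build_receipt([("apple", 2, 150), ("milk", 1, 120)])
    assert receipt.total_cents == 420

@dataclass
class LineItem:  # extracted during refactoring; covered by the test above
    name: str
    quantity: int
    unit_price_cents: int

    @property
    def subtotal_cents(self) -> int:
        return self.quantity * self.unit_price_cents

@dataclass
class Receipt:
    items: list[LineItem] = field(default_factory=list)

    @property
    def total_cents(self) -> int:
        return sum(item.subtotal_cents for item in self.items)

def build_receipt(rows) -> Receipt:
    return Receipt([LineItem(name, qty, price) for name, qty, price in rows])
```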

Conclusion

Generative AI can help with writing (integration) tests and generating CI/CD pipelines. However, applying it to generate unit tests diminishes the crucial active-thinking aspect of the unit testing exercise. Alternatively, one can adopt TDD, where integrating and refactoring the generated code fits naturally into the process.