Unit tests as documentation
thecoder.cafe | 174 points by thunderbong 9 months ago
I share this ideal, but also have to gripe that "descriptive test name" is where this falls apart, every single time.
Getting all your teammates to quit giving all their tests names like "testTheThing" is darn near impossible. It's socially painful to be the one constantly nagging people about names, but it really does take constant nagging to keep the quality high. As soon as the nagging stops, someone invariably starts cutting corners on the test names, and after that everyone who isn't a pedantic weenie about these things will start to follow suit.
Which is honestly the sensible, well-adjusted decision. I'm the pedantic weenie on my team, and even I have to agree that I'd rather my team have a frustrating test suite than frustrating social dynamics.
Personally - and this absolutely echoes the article's last point - I've been increasingly moving toward Donald Knuth's literate style of programming. It helps me organize my thoughts even better than TDD does, and it's earned me far more compliments about the readability of my code than a squeaky-clean test suite ever does. So much so that I'm beginning to hold hope that if you can build enough team mass around working that way it might even develop into a stable equilibrium point as people start to see how it really does make the job more enjoyable.
Hah, I swing the other way! If module foo had a function bar then my test is in module test_foo and the test is called test_bar.
Nine times out of ten this is the only test, which is mostly there to ensure the code gets exercised in a sensible way and returns a thing, and ideally to document and enforce the contract of the function.
What I absolutely agree with you on is that being able to describe this contract alongside the function itself is far preferable. It’s not quite literate programming, but tools like Python’s doctest offer a close approximation to interleaving discourse with machine-readable implementation:
def double(n: int) -> int:
    """Increase by 100%
    >>> double(7)
    14
    """
    return 2 * n
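Doctests like this can be checked with the standard library's doctest module, either via `python -m doctest yourfile.py` or programmatically; a minimal self-contained sketch:

```python
import doctest

def double(n: int) -> int:
    """Increase by 100%

    >>> double(7)
    14
    """
    return 2 * n

# Collect and run the doctest examples in this module;
# results.failed is 0 when every example passes.
results = doctest.testmod()
```

This keeps the documented contract and its enforcement in one place: if someone edits the docstring example so it no longer matches the code, the test run fails.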
> If module foo had a function bar then my test is in module test_foo and the test is called test_bar.
Same. This is a good 80/20 in my experience.
Testing the happy paths is already very rewarding.
You know what they say:
Naming things is one of the 2 hardest problems in computer science. The other one being cache invalidation and off by one errors.
Or the async version
There are three hard problems in computer science:
1) Naming things
2) Cachoncurr3)e invalidation
ency
4) Off-by-one errors
Test names should be sentences: https://bitfieldconsulting.com/posts/test-names
100%. Test names should include the word "should" and "when". Then you get a description of the expected behavior.
Not just complete sentences, test names should describe in plain English, with no reference to code or variable names, exactly what's being tested and exactly the expected outcome: "when [something happens], [the result is x]"
Shallow/meh article. Demonstrates a complete lack of familiarity with many popular testing frameworks/approaches, and proposes subpar solutions for problems that have already been solved in superior ways.
Your test description/documentation should be sentences, but there is absolutely zero reason to try to encode that into the name of your test function. Not to mention this article then suggests using another tool to decode this function name into a proper sentence for reporting... ok now you completely lost the ability to ctrl+f and jump to the function... terrible advice all around.
Why not just use a testing framework that actually supports free-form sentence descriptions/documentation for your tests?
If my unit testing framework supports free-form sentence descriptions, I'll use it. But I won't use that feature as a wedge issue. It doesn't bother me all that much to have test functions with names like `test_transaction_fails_if_insufficient_funds_are_available()`. Other features of the test framework might have a much bigger impact on my developer experience.
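As a rough sketch of what such a descriptively named test might look like in a pytest-style suite (the `Account` class and its behavior are hypothetical, invented purely for illustration):

```python
# Hypothetical sketch: Account is made up for illustration; the point is
# that the test name itself states the behavior being verified.
class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance

    def withdraw(self, amount: int) -> None:
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

def test_transaction_fails_if_insufficient_funds_are_available():
    account = Account(balance=50)
    try:
        account.withdraw(100)
    except ValueError:
        return  # expected: the transaction fails
    raise AssertionError("expected withdrawal to fail")

test_transaction_fails_if_insufficient_funds_are_available()
```

A reader scanning a failed-test report sees the expected behavior without opening the file, which is the whole argument for descriptive names.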
Pretty much every major language/framework now supports free-form test names/descriptions, including JUnit, which is referenced in the article ^ (again, highlighting the author's ignorance). Just because something doesn't bother you personally doesn't mean it's a good thing to follow, especially when it's clearly inferior to other options.
> It doesn't bother me all that much to have test functions with names like `test_transaction_fails_if_insufficient_funds_are_available()`
I mean, that's one example where you have one outcome based on one parameter/state. Expand this to a 2-field outcome based on 3 state conditions/parameters and now you have a 100-character long function name.
You seem to be focusing on the most obtuse possible way of structuring descriptive test names for rhetorical purposes. I don't know if that's intentional or not, but either way it's not terribly convincing? If you name tests according to business domain concepts rather than tying it to individual parameters through some formulaic rubric, it's often possible to come up with test names that are both more concise and easier to understand than the specific way of naming tests that you've been consistently steering the conversation back toward throughout this thread.
> You seem to be focusing on the most obtuse possible way of structuring descriptive test names for rhetorical purposes.
No, I'm focusing on the most realistic and common ways this kind of pattern actually exists (based on my experience).
> If you name tests according to business domain concepts rather than tying it to individual parameters through some formulaic rubric,
While you say I'm focusing on 'the most obtuse possible way...', this kind of comment makes it seem like you haven't focused on any actual way at all. You're speaking in very ambiguous and vague terms, which can't actually be applied and enforced in practice. If you're actually trying to write a suite of unit tests around, say, a function with 3 parameters and multiple possible outcome states, you can't give your test function the same name for different combinations of inputs/outputs, and you can't just handwave a 'business domain concepts' name into existence to cover each case. That just turns into an exercise of finding synonyms, abbreviations, and vague generalizations; it doesn't solve the fact that you still need all of the same test cases, and they all still need unique function names.
You haven't actually thought through what you're proposing here.
When a unit test fails in code I'm working on, I don't read the name of the test; I jump to the line in the file for the test and read the code. So I've never really understood what people find advantageous about this naming convention.
I've worked at companies that required this style of naming for tests and it was an unholy mess. It only works if the unit test is small enough that the name stays a reasonable length, at which point the code should be clear enough to understand what is being tested anyway.
Names and descriptions for tests are useful for many purposes; I'll leave it at that.
The point I'm making (and I think you are agreeing with me) is that trying to stuff a test description into a test function name is cumbersome and pointless. There are far better ways of adding descriptions/documentation for unit tests and pretty much every major language/testing framework supports these, nowadays.
Thanks for the hint about Knuth's literate programming! I hadn't heard about it before but it immediately looks great. (For those of us who hadn't heard about it before either, here is a link: https://en.wikipedia.org/wiki/Literate_programming)
About your other point: I have experienced exactly the same. It just seems impossible to instill the belief into most developers that readable tests lead to faster solving of bugs. And by the way, it makes tests more maintainable as well, just like readable code makes the code more maintainable anywhere else.
I’d rather leave a good comment instead of good test names. I mean do both, but a good comment is better imo. All I really care about is comments anymore. Just leave context, clues, and a general idea of what it’s trying to accomplish.
Four test failures in different systems, each named well, will more quickly and accurately point me to my introduced bug than comments in those systems.
Identifiers matter.
> ...increasingly moving toward Donald Knuth's literate style of programming.
I've been wishing for a long time that the industry would move towards this, but it is tough to get developers to write more than performative documentation that checks an agile sprint box, much less get product owners to allocate time to test the documentation (throw someone unfamiliar with the code at doing something small with it, armed with only its documentation, like coding another few necessary tests and documenting them, and correct the bumps in the consumption of the documentation). Even tougher to move towards the kind of Knuth'ian TeX'ish-quality and -sophistication documentation, which I consider necessary (though perhaps not sufficient) for taming increasing software complexity.
I hoped the kind of deep technical writing at large scales supported by Adobe FrameMaker would make its way into open source alternatives like Scribus, but instead we're stuck with Markdown and Mermaid, which have their place but are painful when maintaining content over a long time, sprawling audience roles, and broad scopes. Unfortunate, since LLMs could support a quite rich technical writing and editing delivery sitting on top of a FrameMaker-feature'ish document processing system oriented towards supporting literate programming.
> It's socially painful to be the one constantly nagging people about names, but it really does take constant nagging to keep the quality high.
What do test names have to do with quality? If you want to use it as some sort of name/key, just have a comment/annotation/parameter that succinctly defines that, along with any other metadata you want to add in readable English. Many testing frameworks support this. There's exactly zero benefit toTryToFitTheTestDescriptionIntoItsName.
Some languages / test tools don’t enforce testNamesLikesThisThatLookStupidForTestDescriptions, and you can use proper strings, so you can just say meaningful requirements with a readable text, like “extracts task ID from legacy staging URLs”.
It looks, feels, and reads much better.
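A rough Python analog, using unittest's `subTest` to carry a free-form description instead of encoding it in the function name (the URLs, IDs, and extraction logic here are made up for illustration):

```python
import unittest

class UrlParsingTests(unittest.TestCase):
    def test_extracts_task_id(self):
        # Free-form descriptions live in the subTest message, not the
        # function name. Failure output quotes the sentence verbatim.
        cases = [
            ("extracts task ID from legacy staging URLs",
             "https://staging.example.com/legacy/task/42"),
            ("extracts task ID from modern URLs",
             "https://example.com/tasks/42"),
        ]
        for description, url in cases:
            with self.subTest(description):
                # Toy extraction: the ID is the last path segment.
                self.assertEqual(url.rsplit("/", 1)[-1], "42")

# Run the test case programmatically and collect the result.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(UrlParsingTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

When a case fails, the runner prints the readable sentence alongside the assertion, so the requirement and the failure arrive together.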
With jest (among others), you can nest the statements. I find it really useful to describe what the tests are doing:
describe('The foo service', () => {
  describe('When called with an array of strings', () => {
    describe('And the bar API is down', () => {
      it('pushes the values to a DLQ', () => {
        // test here
      })
      it('logs the error somewhere', () => {
        // test here
      })
      it('returns a proper error message', () => {
        // test here
      })
    })
  })
})
You could throw all those assertions into one test, but they’re probably cheap enough that performance won’t really take a hit. Even if there is a slight impact, I find the reduced cognitive load of not having to decipher the purpose of 'callbackSpyMock' to be a worthwhile trade-off.

The `describe`/`it` nesting pattern is quite common (I currently use it in Jest and HSpec), but it doesn't solve the social problem. It's common to see tests like:
describe("foo", () => {
describe("called with true", () => {
it("returns 1", () => {
assert(foo(someComplicatedThing, true) === 1)
})
})
describe("called with false", () => {
it("returns 12", () => {
assert(foo(someOtherIndecipherableThing, false) === 12)
})
})
})
It's the same problem as comments that repeat what the code says, rather than what it means, why it's being done that way, etc. It's more annoying in tests, since useless comments can just be deleted, whilst changing those tests would require discovering better names (i.e. investigating what it means, why it's being done that way, etc.). The latter is especially annoying when a new change causes such tests to fail.

Tests with such names are essentially specifying the function's behaviour as "exactly what it did when first written", which ignores (a) that the code may have bugs and (b) that most codebases are in flux, as new features get added, things get refactored, etc. They elevate implementation details to the level of specification, which hinders progress and improvement.
At the end of the day, someone has to shoulder the burden of holding their colleagues to higher standards. I don’t think there’s a technical solution to this social problem.
It could also be a symptom of something else, like I’ve seen this happen when someone goes overboard on unit tests and they become so burdensome that other engineers just want to get it out of the way. They may not consciously realize it, but subconsciously they know that it’s BS and so they don’t mind BS names to just move on with actual productive work.
Not saying it’s always the case, but it could be. Higher standards are not always better, they have diminishing returns.
It's all a spectrum of trade-offs with different people having different opinions.
There could be some sort of formula to determine how much effort to spend on tests vs. features, weighed against product quality and how much quality matters for that product.
This is part of the job of being a team lead or manager. You have a standard, you need to get people to follow it (or consequences..)
Yeah, it doesn't solve the problem of low quality code/laziness, but it's a better tool/approach for documenting your tests than encoding the description/documentation into its name.
Encoding such information into the name makes about as much sense as encoding constraints into SQL column names.
Yup, I've not actually seen any tool that enforces these kinds of test names. But yeah, trying to encode the test description/documentation into its name is one of the worst common ways of documenting your tests.
That's not the point of the article. The code should be readable, no exception. The only reason we should be using x, y, z is for coordinates; a bare i should give way to index_what; same goes for parameters; they should also say what unit they are in (not scale, but scale_float). The only exception I see is typed languages, and even then I'm occasionally asked a detail about some obscure parameter that we set up a year ago. I understand it can sound goofy, but the extra effort is made for other people working on the project, or future self. There is no way I can remember keys or where I left the meaning of those, and there is no reason not to just write it down.

Readability of the code makes up a lot of its quality. Working code that is not maintainable will be refactored. Non-working code that is maintainable will be fixed.
I'm obviously replying to GP's specific comment on test names. I fail to see how your reply relates to my comment at all.
It's funny, you are asking what test names have to do with quality, and you proceed with mentioning a really bad test name, 'toTryToFitTheTestDescriptionIntoItsName', and (correctly) stating that this has zero benefit.
Just like normal code, test methods should indicate what they are doing. This will help your colleague when they're trying to fix a failing test while you're not around. There are other ways of doing that, of course, which can be fine as well, such as describing the test case with some kind of metadata that the test framework supports.
But the problem that OP is talking about, is that many developers simply don't see the point of putting much effort into making tests readable. They won't give tests a readable name, they won't give it a readable description in metadata either.
> It's funny, you are asking what test names have to do with quality, and you proceed with mentioning a really bad test name, 'toTryToFitTheTestDescriptionIntoItsName', and (correctly) stating that this has zero benefit.
Not at all. Those kinds of names are like a de-facto standard for the people that try to push this kind of practice. Obviously the example I used is not related to any real test.
> This will help you colleague when he's trying to fix the failing test when you're not around.
Really? Encoding what a test function does in its name is your recommendation for helping someone understand what the code is doing? There are far better ways of accomplishing this, especially when it comes to tests.
> There are other ways of doing that of course which can be fine as well
'Can be fine as well'? More like 'far superior in every possible way'.
> But the problem that OP is talking about, is that many developers simply don't see the point of putting much effort into making tests readable.
Not at all, making a test readable and trying to encode what it does into its name are completely separate things.
Kotlin has an interesting approach to solving this. You can name functions using backticks, and in those backticks you can put basically anything.
So it's common to see unit tests like
@Test
fun `this tests something very complicated`() {
...
}
You can do that in Java as well. Can't remember if it's exactly the same syntax
I don't think you can do it in Java specifically. But once upon a time it was rather popular to write test fixtures for Java code in Groovy, which does let you do it.
You can't put spaces in the function name, but you can set a display name for JUnit - https://junit.org/junit5/docs/5.0.3/api/org/junit/jupiter/ap...
It's important to this article because it's claiming that the name is coupled functionally to what the code tests -- that the test will fail if the name is wrong.

I don't know of any test tools that work like that, though.
That's not what the article claims at all.
It claims that, in order for tests to serve as documentation, they must follow a set of best practices, one of which is descriptive test names. It says nothing about failing tests when the name of the test doesn't match the actual test case.
Note I'm not saying whether I consider this to be good advice; I'm merely clarifying what the article states.
What kinds of things would you say are best as annotation vs in the test method name? Would you mind giving a few examples?
Also, are you a fan of nesting test classes? Any opinions? Eg:
class FibrulatorTest {
    class HighVoltages {
        void tooMuchWillNoOp() {}
        void maxVoltage() {}
    }
}

Table tests can enable useful test naming without a bunch of clunky named test functions. I use them most often in Go but I’m sure other languages have support
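The table-test idea translates directly to Python; a minimal sketch (the `classify` function and its cases are invented for illustration) where each row carries its own descriptive name, so one loop replaces many individually named test functions:

```python
# Table-driven tests: every case is a (name, input, expected) row, and the
# name is a plain sentence rather than an encoded function identifier.
def classify(n: int) -> str:
    """Toy function under test (hypothetical)."""
    return "negative" if n < 0 else "zero" if n == 0 else "positive"

CASES = [
    ("negative numbers are classified as negative", -3, "negative"),
    ("zero is its own class", 0, "zero"),
    ("positive numbers are classified as positive", 7, "positive"),
]

for name, value, expected in CASES:
    got = classify(value)
    # The row's name surfaces in the failure message, which is where a
    # descriptive label actually earns its keep.
    assert got == expected, f"{name}: expected {expected!r}, got {got!r}"
```

In Go's testing package the same pattern would use `t.Run(name, ...)` over a slice of case structs; the structure is what matters, not the language.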
Like others have already stated/provided examples of[0] - the test function names are generally irrelevant. Many testing frameworks use a single/same test function name, or a completely unnamed function/lambda, while providing any needed context/documentation as params or annotations.
Realistically, many unit tests are far more complicated (in terms of business logic) than functions where names actually matter, like 'remove()', 'sort()', 'createCustomer()', etc. I've worked in several places where people aggressively pushed the 'encode test description in test name' BS, which invariably leads to names like 'testThatCreatingACustomerFromSanctionedCountryFailsWithErrorX'. It's completely absurd.
> Also, are you a fan of nesting test classes? Any opinions?
It really depends on the framework you're using, but in general nesting of tests is a good thing, and helps with organizing your tests.
> Like others have already stated/provided examples of[0] - the test function names are generally irrelevant. Many testing frameworks use a single/same test function name, or a completely unnamed function/lambda, while providing any needed context/documentation as params or annotations.
I think what you're focusing on is just syntax sugar. Those examples with the 'describe'/'it' pattern are just another way to provide names to test cases, and their goal is exactly the same. If you didn't have this syntactic support, you'd write the function names representing this.
It's exactly the same thing: documenting the test case in the code (so not a separate document), with its name.
The distinction between "comment" and "function name" becomes less relevant once one realizes a function's name is just another comment.
> I think what you're focusing on is just syntax sugar. Those examples with the 'describe'/'it' pattern are just another way to provide names to test cases, and their goal is exactly the same.
The goal may be the same/similar, but one of the approaches is clearly superior to the other for multiple reasons (as stated by me and others many times in this comment tree). Also, I don't think you quite understand what 'syntactic sugar' means.
> If you didn't have this syntactic support, you'd write the function names representing this.
It's not any kind of 'syntactic support'. Pretty much every modern language/testing framework supports adding free-form test descriptions and names through various means.
> It's exactly the same thing: documenting the test case in the code (so not a separate document), with its name.
It's very clearly not the same at all lmao. And a test name, a test description, and other useful test documentation/metadata are not the same thing either.
> The distinction between "comment" and "function name" becomes less relevant once one realizes a function's name is just another comment.
Huge differences between a function name, a comment, and an annotation. HUGE. Read the other comments in this thread to understand why. If you actually worked in an environment where stuffing a test description into a test name is the preferred approach for a non-trivial amount of time, you'd know that once you get past a certain level of complexity your test names explode to 100+ character monsters, if only to differentiate them from the other tests, testing a different combination of states/inputs and outputs, etc.
Sorry, I thought you were debating in good faith. I now see the tone of your responses to everyone here.
Good luck with that!
Good faith debating is when you actually try to honestly consider and understand the other side's point. Not when you make dismissive blanket (and factually incorrect) statements while refusing to rationally engage with the counter-arguments.
Tone is generally completely independent of good faith.
Go heal that ego and try again.
Hate jumping in, but both of you...
The uninviting tone discourages further discussion. I really appreciated where this convo was going until..
> Read the other comments in this thread to understand why.
This could be restated in a non aggressive way. Eg: 'Other comments go into more details why'
> If you actually worked in an environment
Presumptive statements like this are unhelpful. We need to remember we do not know the context and experiences of others. Usually better to root discussion from ones own experience and not to presume the others perspective.