Sunday, 27 May 2012

TDD - the power of learning and negation in analysis

This post is about how TDD informs analysis and stimulates learning. We'll go through a simple example and investigate the rationale and mechanics of the decisions taken.

The board starts empty

Let's imagine we're writing a virtual Kanban-like board (this is also called a 'scrum board' AFAIK). If you don't know what a scrum board is: the board is used for task tracking and is split into vertical columns that represent task progress (like NOT STARTED, IN PROGRESS, DONE). It's also split into horizontal categories that represent bigger units of work (often user stories) that the tasks belong to. We usually add a task to a story and watch it pass through the vertical columns until it reaches the DONE state. You can read more on scrum boards if you're unfamiliar with the topic; however, what I've written here is enough knowledge to push forward with our example.

There is only one issue: how to start? We could do proper requirements analysis or commonality-variability analysis or something along those lines, but this time, let's just start by writing a spec (AKA unit test). What's the simplest behavior we can come up with? Let's see... it would be:

The board should start off as empty.

This is written in plain English (we'll investigate another notation in a second), but that's enough to get us started. Using the above statement, we can put together the first spec:

public void ShouldStartOffClean()
{
  var board = new ScrumBoard();
  Assert.IsTrue(board.IsEmpty);
}

The first time I wrote this spec, it seemed (wrongly, but that's coming in a minute :-)) silly to me. Why? Because:

  1. It didn't let me discover any new collaboration or any new abstraction other than ScrumBoard, which I already knew I was going to need.
  2. It verifies that a public flag is set on the very object being specified. It looks like cheating - we consider that X did something just because X tells us so.
  3. It's only a flag, nothing more. No domain logic is needed to implement it.
  4. I can't imagine this flag being used anywhere in the application. Even when I write a GUI that displays the board, it's unlikely that it's gonna need to check whether the board is empty. In the scrum board domain, an empty board is not a special condition of any kind and does not trigger any special behavior.

So, did we just waste our time here?

Negative learning

Let's hold our horses for a minute before deleting this spec and try to rewrite it in Given-When-Then form:

GIVEN nothing

WHEN a new scrum board is created

THEN it should be empty

The Given-When-Then form has a nice property: you can discover many things simply by negating what you already have (by the way, this is not the only way to proceed, but it's the one I want to show you). We could be smart about it, but for now, I suggest we take the brute-force approach - we'll go through each part of the behavior description and try to negate everything we can. We expect to run into some VERY obvious conclusions as well as to find some jewels.

Assumptions: Given nothing

Here, we can make only one negation in the following form:

GIVEN NOT nothing

What is "not nothing"? It's "something" :-). What can we learn from it? Not much. In fact, most of the app logic we're going to write will rely on the assumption that "there is something". Nothing valuable here, let's move on.

Trigger: When a new scrum board is created

Let's start with negating the second part (the order doesn't matter, it's just that this part is more obvious):

WHEN a new scrum board is NOT created

What does it mean that the scrum board is not created? It's the central abstraction of our domain, so what's the point in not creating it? This may lead to a question: is a scrum board the right abstraction? Maybe it would be more valuable to look at it as "a process", not "a place"? In such a case, a better abstraction would be "sprint" (represented by the board at the GUI level), not "board". Let's assume we decided to keep the "board" abstraction (although this thought is so valuable that I'd probably add it to my TODO list, so that I can revisit my decision as development goes on).

A nice thought from such a stupid negation, don't you think? Ok, let's dig further by negating the first part:

WHEN a NOT new scrum board is created

What does "NOT a new board" mean? It means "an old board". Wait a minute, that's INTERESTING! Let's ask a further question: what does it mean to "create an old board"? It rings a bell - it's about "loading an old board". Wait a minute! This is a perfectly valid use case, because no user would like the board to disappear when they turn off the app! As this is not the core functionality, we don't let our enthusiasm about this discovery carry us away: we add it to the TODO list for later and proceed with the next negation.

Outcome: Then it should be empty

Again, there is only one negation we can perform:

THEN it should NOT be empty

We thought that knowing whether the board is empty doesn't help the app at all. But what about us? Does it help us learn anything? Let's check it out by asking a further question: when is the board not empty? The answer: when there is at least one task added. Yes, we've discovered the next use case - adding a task. This looks like the next core functionality element we need!

Let's document this discovery with a Given-When-Then form:

GIVEN an empty board

WHEN a new task is added to it

THEN it should not be empty

And an executable specification:

public void ShouldNotBeCleanWhenTaskIsAdded()
{
  var board = new ScrumBoard();
  var task = Any.InstanceOf<Task>();
  board.AddNew(task);
  Assert.IsFalse(board.IsEmpty);
}

Now THAT'S a discovery! Let's quickly recap what we've learned here: we've learned that there is an abstraction called "Task" that can be added to the board by invoking the "AddNew()" method (which needs to be implemented: another item for the TODO list!). We've also got another Given-When-Then behavior description that we can analyze the same way as the first one. Shhh... I'll tell you a secret: until now, we've been proceeding by asking the question "What if not?". Another useful question to ask is "What else?", and we can apply it to the WHEN part by asking "What else happens when we add a task?". But that's another story.
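For illustration, here is one minimal production-code sketch that satisfies the two behaviors discovered so far: a new board is empty, and adding a task via AddNew() makes it non-empty. It assumes a list-backed board; Task is a hypothetical, empty domain class here (unrelated to System.Threading.Tasks.Task), and it is only one of many implementations the specs would allow.

```csharp
using System.Collections.Generic;

// Hypothetical domain class: the specs only need it to exist.
public class Task
{
}

public class ScrumBoard
{
    // Tasks currently placed on the board (list-backed storage is an assumption).
    private readonly List<Task> _tasks = new List<Task>();

    // A newly created board holds no tasks, so it starts off empty.
    public bool IsEmpty => _tasks.Count == 0;

    // Adding any task makes the board non-empty.
    public void AddNew(Task task)
    {
        _tasks.Add(task);
    }
}
```

In real TDD, the first spec alone could be satisfied by IsEmpty simply returning true; it's the second spec that forces the list into existence.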


What I tried to show you is how TDD can often be used to inform analysis. I've also tried to show how seemingly pointless specs can help us learn more about our domain or ask the right questions about it. I'm not trying to say that TDD is a substitute for other analysis methods, such as commonality-variability analysis, but it sure adds a lot to your current toolset.

Ok, that's it, time for a walk :-). Bye!

Friday, 25 May 2012

MonoDevelop 3.0 - looks like it's going to rock!

Those of us who use Linux-based operating systems have certainly heard about the default (and probably the best to run on our systems) IDE for Mono - MonoDevelop.

I just installed the recently released MonoDevelop 3.0 and, among its features, there are two I'd like to highlight.

Semantic syntax highlighting

I don't know about you, but to my taste, MonoDevelop's syntax highlighting was kind of poor. Up to now. MonoDevelop 3.0 can do some semantic analysis, e.g. highlighting unknown types in red.

This is great news for us TDDers who are doing Need Driven Design. We use types that don't yet exist VERY often in our executable specifications AKA unit tests - this way we discover new abstractions and collaboration models. We tend to create new classes and methods AFTER we use them, so it's great to distinguish the ones we already have from the ones we have just discovered.

And to create those types and methods, we can certainly use...

Better refactoring support

Once we have discovered all the types and methods we need, we create them using IDE refactorings. I'm delighted to discover that MonoDevelop's Create Class and Create Method - basic tools for every TDDer - work more reliably in 3.0 than in previous versions (I have encountered versions of MonoDevelop that didn't have these refactorings at all!).

These are also available as quick fixes.

That's it, I just wanted to share this wonderful news with you, at the same time explaining the basic mechanics behind an ultraproductive use of Need Driven Design. Have fun!

How do I Test-Drive a looped execution?

There are times when we want to specify something happening to multiple items. There is a pattern I'd like to share for specifying such an execution. The only precondition is that the loop behavior is just a variant of a single "thing" happening to each of the multiple items. Let me clarify this with a little example.

Thumbs up: the problem

Let's suppose we're uploading multiple images at once into a social network portal.

Each of the images is processed in the following way:

  • Size of the file is verified and if the verification passes:
    1. A thumbnail is made of the picture
    2. The picture is added to the album along with a thumbnail

If I were to specify the application of this processing to a batch of images in one spec, it would look like this (let's use NSubstitute as the mocking framework, since it's very readable even for people working in programming languages other than C#):

public void 
{
  var anyImage1 = Substitute.For<Image>();
  var anyImage2 = Substitute.For<Image>();
  var anyImage3 = Substitute.For<Image>();
  var images = new List<Image> { anyImage1, anyImage2, anyImage3 };
  var anyThumbnail1 = Substitute.For<Thumbnail>();
  var anyThumbnail2 = Substitute.For<Thumbnail>();
  var anyThumbnail3 = Substitute.For<Thumbnail>();

  var sizeVerification = Substitute.For<SizeVerification>();

  var album = Substitute.For<Album>();
  var imageUpload = new ImageUpload(sizeVerification);

  imageUpload.PerformFor(images, album);
  album.Received().Add(anyImage1, anyThumbnail1);
  album.Received().Add(anyImage2, anyThumbnail2);
  album.Received().Add(anyImage3, anyThumbnail3);
}

There is something highly disturbing about this spec. It clearly shows that the processing logic is mixed up with looping through a collection (which, by the way, indicates that the method is not cohesive). This becomes even more evident when you try to specify the case where size verification does not pass. How would you write such a spec? Make the verification fail for all the items? For one out of three? What about the special cases of the first and the last one? How many of these cases are enough to drive the implementation?

The proposed solution

How do I handle such cases? I break apart the looping and processing logic by test-driving two public methods - one handling the collection and another handling only one item. The multi-element version is specified only for how it uses the single-element version, and the single-element version is specified for the concrete processing logic.

To make this happen, we need to use partial mocks. A partial mock is a variant of a mock object that allows you to fake only chosen methods of a concrete type, leaving the behavior of all other methods as in the original object. NSubstitute does not support partial mocks as of yet, but let's pretend it does by means of a PartialSubstitute class. Here's what our first spec would look like:

public void ShouldBePerformedForEachElementOfPassedBatch()
{
  var anyImage1 = Any.InstanceOf<Image>();
  var anyImage2 = Any.InstanceOf<Image>();
  var anyImage3 = Any.InstanceOf<Image>();
  var images = new List<Image> { anyImage1, anyImage2, anyImage3 };
  var album = Substitute.For<Album>();
  var imageUpload = PartialSubstitute.For<ImageUpload>(

  //mock only the single-element version:
  imageUpload.PerformFor(Arg.Any<Image>(), Arg.Any<Album>()).Overwrite();

  //invoke the multiple-elements version:
  imageUpload.PerformFor(images, album);
  imageUpload.Received().PerformFor(anyImage1, album);
  imageUpload.Received().PerformFor(anyImage2, album);
  imageUpload.Received().PerformFor(anyImage3, album);
}

This way we specify the looping only, i.e. how the collection of images is handled in relation to single-image handling. Note how the partial mock is verified only for the calls made to the single-element version by the multi-element version. To make this possible, the single-element version must be virtual:

public void PerformFor(List<Image> images, Album album) {}
public virtual void PerformFor(Image image, Album album) {}
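Since the partial mock above is imaginary, here is a framework-free sketch of the same split, under some assumptions: Image and Album are stripped-down hypothetical classes, the single-element body is left as a placeholder, and a hand-written subclass (the hypothetical RecordingImageUpload) plays the role of the partial mock by overriding only the virtual single-element version.

```csharp
using System.Collections.Generic;

// Hypothetical domain types, reduced to what this sketch needs.
public class Image { }
public class Album { }

public class ImageUpload
{
    // Multi-element version: only loops and delegates to the single-element one.
    public void PerformFor(List<Image> images, Album album)
    {
        foreach (var image in images)
        {
            PerformFor(image, album);
        }
    }

    // Single-element version: virtual, so a test can replace it.
    public virtual void PerformFor(Image image, Album album)
    {
        // Size verification, thumbnail creation and adding to the album would go here.
    }
}

// Hand-written stand-in for a partial mock: overrides only the
// single-element version and records which images it received.
public class RecordingImageUpload : ImageUpload
{
    public List<Image> ProcessedImages { get; } = new List<Image>();

    public override void PerformFor(Image image, Album album)
    {
        ProcessedImages.Add(image);
    }
}
```

Because the loop calls the single-element version through a virtual call, the subclass observes exactly what the looping logic does, while the real per-image processing can be specified separately.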

Now that we've dealt with the sequence processing, we can proceed with the image processing:

public void ShouldAddImageToAlbumWithThumbnail()
{
  var anyImage = Substitute.For<Image>();
  var anyThumbnail = Substitute.For<Thumbnail>();

  var sizeVerification = Substitute.For<SizeVerification>();

  var album = Substitute.For<Album>();
  var imageUpload = new ImageUpload(sizeVerification);

  imageUpload.PerformFor(anyImage, album);
  album.Received().Add(anyImage, anyThumbnail);
}

Here, we're dealing with one image only, so the spec is easy to follow and straightforward. Note that if we want to add a spec for an image that fails size verification, we add it only for the single-element version. The looping logic is the same in both cases and we've got it specified already.

Another doubt that may come to your mind is this: we're exposing another method in the interface (the single-element version) that's not really used by the clients of the class, so aren't we violating encapsulation or something? Well, in my opinion, not really. I mean, nothing ever forbids you from calling the already existing multi-element version with a list consisting of one element, right? So, you may treat this additional method as an alias for this particular case. This way you're not exposing any implementation detail that wasn't exposed before.

Ok, that's it! Have a good night!

Wednesday, 23 May 2012

How to properly implement "Any value except X" in C#?

This time it's going to be dead simple. I promise.

Prior to reading this post, take a look at this post by Mark Seemann to understand the concept of anonymous values.

Sometimes, when writing specifications/unit tests, we want to say that we can use any enumeration value except some value "x". Let's take the example of a reporting feature that denies access to a non-admin user:

var reportingFeature = new ReportingFeature();
var nonAdmin = Any.Except(Users.Admin);


As you can see, here we say "any user except Admin". So, how do we implement such a facility? The first, naive implementation (do not use it!) would be something along these lines:

public static T Except<T>(T excludedValue)
{
  Random random = new Random();
  var values = Enum.GetValues(typeof(T));
  T val = default(T);
  // keep drawing until we get a value other than the excluded one
  do
  {
    var index = random.Next(0, values.Length);
    val = (T) values.GetValue(index);
  } while(val.Equals(excludedValue));
  return val;
}

In plain English, we keep taking random values until we get one that is different from the passed one. However, this solution may need more than one iteration to reach the target: with n possible values, each draw hits the excluded value with probability 1/n, so on average it takes n/(n-1) draws, which means two draws when the enumeration consists of only two values. In theory, it may take many more. This is a better solution:

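public static T Except<T>(T excludedValue)
{
  var random = new Random();
  // collect every enum value, then drop the excluded one
  var genericValues = Enum.GetValues(typeof(T));
  var values = new List<T>();
  foreach(var v in genericValues) values.Add((T)v);
  values.Remove(excludedValue);
  var index = random.Next(0, values.Count);
  return values[index];
}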

Here, we put all the possible values into a list and remove the one we don't want. Then we just take a random value from the list - any is fine. The foreach loop can easily be changed to a LINQ expression, and the method can easily be extended to support many parameters (via the params keyword), so you could write something like this:

var privilegedUser = Any.Except(Users.Intern, Users.ProbationUser);
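Here's a sketch of that extended version, assuming a static Any class, the generic Enum constraint (available since C# 7.3) and a hypothetical Users enum for the example. It swaps the foreach loop for a LINQ query and throws when every value has been excluded, so it never picks from an empty list.

```csharp
using System;
using System.Linq;

// Hypothetical enum, used only to demonstrate the helper.
public enum Users { Admin, Intern, ProbationUser, Developer }

public static class Any
{
    // One shared generator instead of a new Random per call (not thread-safe).
    private static readonly Random Rng = new Random();

    // Returns a random value of enum T that is not among excludedValues.
    public static T Except<T>(params T[] excludedValues) where T : struct, Enum
    {
        var candidates = Enum.GetValues(typeof(T))
            .Cast<T>()
            .Where(value => !excludedValues.Contains(value))
            .ToList();

        if (candidates.Count == 0)
        {
            throw new ArgumentException("Every value of the enum was excluded.");
        }

        return candidates[Rng.Next(candidates.Count)];
    }
}
```

With this in place, both Any.Except(Users.Admin) and Any.Except(Users.Intern, Users.ProbationUser) compile against the same method.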

See? I told you it was going to be dead simple this time :-)

Sunday, 20 May 2012

Perfect from the start?

Thanks go to Kuba Miara for inspiration

This is a follow-up post to "TDD is a good teacher - both demanding and supportive" and discusses the second argument from my recent debate.

Everything or nothing

This argument states that you have to be "perfect from the start" or you die. Either you take everything or nothing. It's impossible for someone to take just a part of the benefits TDD has to offer.

Here's my opinion on the topic: what executable specifications (AKA unit tests) bring you never comes for free. Unit tests are additional code and, like all code, they must be added, removed and changed to serve their purpose. Moreover, judging from a static code analysis I once did on a codebase, unit test code often has the highest coupling, so it's likely to change often. It's just that doing "full" TDD gets you the most. Below is a summary of different levels of TDD adoption (how I describe "progress towards full TDD" is of course arbitrary, so bear with me :-)), together with some ratings based solely on my experience. By the way, when I say "isolation", I mean breaking dependencies between the tested object and its collaborators, not between tests.

1. No unit tests at all

Many great projects start as toys or experiments. We do them to learn something, to play around with an idea or to quickly check the potential for revenue (Kent Beck once talked about doing just this).

It's quite fun while the design is pretty obvious and verifying the app requires just a few manual steps (like: run the app, click add and see a correct dialog open) - we can just "code and ship it".

Once we get into a situation where the workflow gets less obvious or the logic is hard to verify (e.g. the app sends something via a socket - to check it manually, we'd have to write a test client or plug in a packet sniffer), this gets kind of ugly. Then we usually launch a debugger, where we spend a lot of time tracking down issues and correcting them (and tracking down issues introduced by those corrections) - first using manual scenarios, later black-box tests that get created at some point.

Also, there are times when we get stuck wondering "is it better to use inheritance here, or a tricky delegation? Should this method end up in class A or class B?". Sure, there are heuristics to evaluate design decisions, but it's still kind of arbitrary and ends in lengthy discussions and explaining the rationale all over again to each and every person questioning our design.

Build speed          
No need for debugging          
Executable Specification          
Measurable design quality          
Protection from regression issues          
Confidence of change          
Ease of writing new test          
Ease of maintaining existing tests          
Speed of transition from change in code to successful build          
Motivation to specify all code          

2. Some poorly-isolated "unit" tests (or rather integration tests, as they're sometimes called)

This level is usually reached when some of the rebellious developers get annoyed with both the time it takes to set up the environment needed for box-level tests and the time it takes to run them. The idea is to write some of the scenarios directly against entities in the code, bypassing mechanisms such as the web or the database while still exercising the domain logic. This makes sense when the domain logic is complex on its own.

While shortening the time of setting up and running the tests, this approach makes the build slightly longer (additional code) and requires some isolation, at least from external dependencies (system clock, database, file system, network etc.), which, given the small number of tests written this way, can be relatively cumbersome.

On the bright side, having those tests lets us reason in a limited way about the design of the product, by asking questions like "is it easy to unplug the real database and replace it with a fake?".

Build speed          
No need for debugging          
Executable Specification          
Measurable design quality          
Protection from regression issues          
Confidence of change          
Ease of writing new test          
Ease of maintaining existing tests          
Speed of transition from change in code to successful build          
Motivation to specify all code          

3. Many poorly-isolated (coarse-grained) "unit" tests

This is where many projects end up. We have many tests exercising most of the scenarios in the code. It is safe to refactor, and it is easier to see some intent documented by the tests. Because there are so many tests, the cost of writing helper fixtures and mini-frameworks to support them is relatively small.

This, however, is the level where the point of having the tests is most fiercely debated. Because of the low isolation, many tests go through the same code paths. This leads to a situation where one change in the code makes tens of tests break for reasons that are not immediately obvious. Since the tests are usually long and use many helper classes and functions, we often have to debug each test to discover why it failed. I've been on a project where updating such a test suite used to take twice as long as updating the code.

One more thing - the build slows down. Why? Because coarse-grained tests gather many dependencies. Let's say that we have 40 classes in our production code and each of our 20 tests is coupled to the same 30 of them - any change to any of those 30 classes makes all 20 tests recompile. This can cause a major headache, especially in slow-building languages such as C++.

Build speed          
No need for debugging          
Executable Specification          
Measurable design quality          
Protection from regression issues          
Confidence of change          
Ease of writing new test          
Ease of maintaining existing tests          
Speed of transition from change in code to successful build          
Motivation to specify all code          

4. Many well isolated unit tests (first real unit tests)

Someone joins our team who really knows what unit tests are about. Usually this is a person who practices TDD but has been unable to convince the whole team to adopt it. Anyway, the person teaches us how to use mock objects and usually introduces some kind of mocking framework. We also learn about dependency injection and something about the FIRST or FICC properties. As isolation increases, the test suite becomes more and more maintainable - a situation where two tests fail for the same reason is quite rare, so we have fewer tests to correct and less debugging to do (because a fine-grained test failure points us to the cause without a need to debug).
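To make "isolation" a bit more concrete, here is a minimal sketch of dependency injection with a hand-rolled fake instead of a mocking framework. The IClock, SystemClock, FakeClock and Greeting names are hypothetical; the point is only that the class under test receives its collaborator through the constructor, so a test can swap the real system clock for a controllable fake.

```csharp
using System;

// Hypothetical abstraction over the system clock, so tests can control time.
public interface IClock
{
    DateTime Now { get; }
}

// Production implementation delegating to the real clock.
public class SystemClock : IClock
{
    public DateTime Now => DateTime.Now;
}

// Hypothetical class under test: its clock is injected via the constructor.
public class Greeting
{
    private readonly IClock _clock;

    public Greeting(IClock clock)
    {
        _clock = clock;
    }

    public string Text => _clock.Now.Hour < 12 ? "Good morning" : "Good afternoon";
}

// Hand-rolled fake: a test sets the time explicitly.
public class FakeClock : IClock
{
    public DateTime Now { get; set; }
}
```

A failing spec for Greeting can then only point at Greeting itself, never at the real clock, which is what keeps such failures cheap to diagnose.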

Everything starts to go more smoothly - the builds speed up, the documentation of intent no longer duplicates the one already provided by box tests, new tests are easy to write (since they usually test a single class - no supah long setup and teardown etc.), and we can easily reason about design quality with some heuristics and a simple rule: of two design choices, the better one is the one that requires fewer good unit tests.

On the darker side: the team still hasn't grasped the idea of the test suite being an executable specification and is stuck at the "testing" level. Because of this, they may skip writing unit tests for some of the implementation as "not worth testing" or "too simple". Also, since the tests are written after the code, the code is usually not written with testability in mind, so each time a new part of the code is written, it requires extra rework before good, isolated unit tests can be written for it. The last downside: the team does not benefit from all the analysis and design techniques that TDD provides.

Build speed          
No need for debugging          
Executable Specification          
Measurable design quality          
Protection from regression issues          
Confidence of change          
Ease of writing new test          
Ease of maintaining existing tests          
Speed of transition from change in code to successful build          
Motivation to specify all code          

5. Test Driven Development (with all the "bundled" tools)

This is the sweetest spot. The "testing" part is treated as a side effect of performing analysis and design with all the powerful techniques TDD brings. The code quality is superb (my experience is that code written using good TDD often has the best quality analysis results of all the code in the product). Code is written with testability in mind, so we don't waste time reworking code we have already written just to enable isolation (we DO refactor sometimes, especially when we're triangulating, but this is not waste - this is learning).

In TDD, a failing test/spec is the reason to write any code, so even constants and enumerations have their own specs. There is always motivation to write a test/spec, because there's no "too easy implementation" - there is always "an implementation that does not exist yet".

People who reach this level (especially in languages with extensive IDE support, like Java or C#) quickly become super-productive killers who notice gaps in requirements and write beautifully designed code, while at the same time producing a regression test suite, an executable documentation of responsibilities and design decisions, and a set of examples of how to use each class.

Build speed          
No need for debugging          
Executable Specification          
Measurable design quality          
Protection from regression issues          
Confidence of change          
Ease of writing new test          
Ease of maintaining existing tests          
Speed of transition from change in code to successful build          
Motivation to specify all code          

Wrapping it up

TDD is the most effective of all the processes described above. It does not require perfection from the start; however, if you choose to forgo some of its practices and techniques, you obviously have to pay the price. What I try to teach people when I talk about TDD is that paying this price is unnecessary and that by paying it, we introduce waste. Which is what we'd like to avoid, right?

Good night everyone!

Friday, 18 May 2012

We are in desperate need of rapid feedback, whether we're aware of it or not

Today, I was adding some code to a codebase that had no unit tests (nor did I want to write any), something I hadn't done for a long time, when I caught myself taking shortcuts and shamelessly hacking away at the code, making it less maintainable. How did it happen?

I, who had always strived (more or less successfully) for clean code and test-driven development, ended up hacking all over the place.

After having a tea, I figured out what had happened. I desperately needed rapid feedback. Taking shortcuts and writing bad code shortened my path to the production environment, where I was verifying whether the change had been made correctly. "Just put this hack over there and back to debugging". What's more, it reminded me of the times before I got into TDD - I did the same back then. So it's not TDD that created my need for this feedback - it has always been there. It's just that I have become aware of it.

And I think this is often the reason why knowledgeable engineers write crappy code. For them, this is a sacrifice on the altar of rapid feedback - if only they knew that by doing TDD, they would not have to sacrifice anything...

It's funny how TDD is accused of being unnatural ("what? Test First? That's insane!") while at the same time satisfying our most natural need.

Good night, everyone!

Thursday, 17 May 2012

TDD is a good teacher - both demanding and supportive

Thanks go to Kuba Miara for inspiration.

The last time I was debating the pros and cons of TDD with a colleague, I made a mistake. I stressed too much that TDD is good because it quickly shows you when the design is wrong and punishes you for not writing clean, properly encapsulated code. I did not talk as much about the support TDD provides, just about the punishment. By doing this, I got two counterarguments. Both of them are interesting and worth discussing. Today, I'd like to tackle the first one.

Teaching by punishment is too harsh.

The argument was like this: "If TDD is so demanding, it will be hard for people to satisfy these demands. There are better and worse developers. Some people have a hard time writing good code; now they would need to know how to do it plus how to write good unit tests. Since the latter is a challenge of its own, they're just gonna pay double.". This is a vision of TDD as a "strict teacher" that only makes demands and punishes you severely when you're unable to meet them. This is also a vision of a "bad teacher" that ONLY makes demands, as if that were the only kind of help it can provide.

This, however, is not true. TDD is a good teacher. How do you recognize a good teacher? Such a teacher would:

  1. tell you when you're getting things wrong.
  2. support you in getting things right.

In TDD, the first is achieved by applying various heuristics when looking at unit tests. If you're doing things wrong, your unit tests will tell you so (Amir and Scott have a nice two-part summary of these heuristics). Now I'd like to concentrate more on the "supportive" stuff that I kind of missed during the original discussion - there are various ways TDD can support you in doing the right thing(TM). Among them are:

Need Driven Design
helps in discovering new classes, methods and collaboration models as well as gaining high encapsulation and outside-in insight into your code (crafting method signatures from the view of their concrete usage, not from the top of your head), at the same time providing you a way to be more "lean" and write only the code you really need. I will do a separate post on Need Driven Design soon.
Test First
helps in many things, but this time I'd like to stress that it lets you create classes with high testability by default - since you're writing the test before the code, you design the production code with testability in mind, and this brings stronger cohesion and encapsulation plus looser coupling.
Given When Then Analysis
helps you to think about the code in terms of behaviors, discover holes in requirements, prioritize the implementation and enables deliberate discovery of new behaviors the system must support.
TODO list
helps you keep track of the things left to specify, refactor and implement, so that you know when you're done. Additionally, the TODO list helps you keep focus on a single behavior at a time ("hmm, here, I'm passing two strings, but what if the first one is null? I'll just add it to the TODO list and get to it after I specify the current behavior"). Kent Beck's Test Driven Development By Example provides some good examples of how to work with TODO lists.
Triangulation
useful when driving the implementation of a piece of code we have no idea how to implement. By specifying example after example of how we expect a single behavior to work from the outside and refactoring each time into something more general, we gradually gain understanding of the problem and drive towards the right implementation.

Each of these tools that come "bundled" with TDD has its own set of best practices that can make even a mediocre developer write good code quickly. There is only one requirement: motivation.

Ok, that's it for now, in the next post, I will try to discuss the second part of the argument. See ya!

Tuesday, 15 May 2012

Compiling... Linking... Unit Testing...

Today, I'd like to tell you two stories, draw a conclusion and then go on with the third story which I want to leave you with to think about.

Story 1: How unit test failures can look like compile time failures

Let me offer this as an introduction: ever heard of Continuous Testing for Visual Studio 2010? This is a nice little Visual Studio extension that automatically runs all unit tests after each build. No big deal, you may say, since adding a post-build action to a Visual Studio project will give you the same. This, however, is not the feature that made me think (and neither were some of its nice heuristics). It does another thing that caught my attention.

It adds failed unit tests to the Visual Studio compilation errors window.

So, each time a unit test fails, its result is displayed just as you would expect a compile-time error to appear. Only this time, instead of "Wrong syntax xyz", the message is "Assert.IsTrue(xyz) failed".

Story 2: How compile time checking becomes less important with unit tests around

Remember the post Bruce Eckel wrote sometime around 2003, called Strong typing vs. strong testing? Bruce made a bold statement then, claiming that he didn't need compile-time type checking when he had a comprehensive suite of unit tests to back him up. This way, he could move from Java to Python without worrying too much about type checking.

The conclusion

What's the conclusion? Let's paraphrase these two stories a little, then come up with the third story. The first story is about how unit testing output was "added" to the compiler toolchain. The second one is about how running unit tests replaced some of the benefits of compilation.

This raises the question: is there really such a sharp boundary between what we know as "compilation" and what we know as "testing"?

I wrote a compiler once. If we look at the compilation phases, there are different kinds of checks involved: syntactic analysis, semantic analysis etc. What if we add another compilation step called "behavioral analysis", which consists of running your suite of executable specifications (AKA unit tests)? Sure, this kind of analysis is up to you more than any other, since you define the criteria for failure and success, but when it comes to running it, it's just like any other compilation phase. As I mentioned, there are even tools that put the results of this "behavioral analysis" in the same place as syntax errors (like a misplaced comma) or semantic errors (like trying to use a type that does not exist), so that the two are indistinguishable. And, in my opinion, this is how running unit tests should be treated: as a part of the build process. When unit tests are disabled, you disable one kind of analysis that is run on your code. Do you care? Well, when developing a program, if you could disable type checking in, let's say, C++, would you go for it? Even for a compilation speed-up?

Ok, then, now that I've drawn my conclusions, it's time for the third story.

Story 3

Someone comes over to your desk and tells you "We couldn't get a successful build because of failing unit tests, so we disabled them".

Good night everyone and sweet dreams!

Saturday, 12 May 2012

Test First - why is it so important in TDD?

Some time ago, I read a great post by Amir Kolsky and Scott Bain about why the Test First technique is so important in TDD. I'd like to use it to elaborate on the few reasons I consider Test First to be an essential practice:

1) You don't know whether the test can ever fail.

This is one of the main points made by Amir and Scott. When you write your test after the fact, you don't even know whether the test can ever fail, even when the behavior it describes is broken or changed.

The first time I encountered this argument (and it was long before Amir and Scott's post), it quickly triggered my self-defense mechanism: "But how can it be? I'm a wise person, I know what code I'm writing. If I make my unit tests small enough, it's self-evident that I'm describing the correct behavior. This is paranoid". However, this turned out not to be true. Let me describe, from my experience, just three ways (I know there are more, I just forgot the rest :-D) in which one can really end up with a unit test that never fails:

a) Accidentally skipping addition to the test suite.

However funny this may sound, it happens. The example I'm going to give is in C#, but almost every unit testing framework in almost every language has some kind of mechanism for marking methods as "tests", whether by attributes (C#), annotations (Java), macros (C and C++), inheriting from a common class, or just a naming convention.

So, in my example, I'm using NUnit. In NUnit, to make a method "a test", you mark its class with the [TestFixture] attribute, and every test method must be marked with the [Test] attribute, in the following way:
[TestFixture]
public class CalculatorSpecification
{
  [Test]
  public void ShouldDisplayAdditionResultAsSumOfArguments()
  {
    // ... test body ...
  }
}
Now, imagine that you're doing post-factum unit testing in an environment that has, let's say, more than thirty unit tests: you've written the code, and now you're just writing test after test "to ensure" (as you can see, this is not my favourite reason for writing unit tests) that the code works. Test - pass, test - pass, test - pass. You almost always launch the whole suite, since it's usually painful to pick out, each time, which test you want to run, plus you don't want to introduce a regression. So, this is really: test - all pass, test - all pass, test - all pass... Hopefully, you use some kind of snippet mechanism for creating new unit tests, but if not (and many don't, myself included :-( ), once in a while, you do something like this:
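[TestFixture]
public class CalculatorSpecification
{
  //... some unit tests here

  // no [Test] attribute here, so NUnit never runs this method
  public void ShouldDisplayZeroWhenResetIsPerformed()
  {
    // ... test body ...
  }
}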
And you don't even notice that this test was not added to your suite, because there are so many unit tests already that it's almost irrational to search for your new test in the test list. Also, the fact that you omitted the addition does not disturb your work flow: test - all pass, test - all pass, test - all pass... So, what you end up with is a test that not only will never fail - it will never be executed.

So, why does Test First help here? Because in Test First, a test passing right away is exactly what DOES disturb your work flow. In TDD, the work flow is: test - fail - pass (ok, and refactor, but for the sake of THIS discussion, it doesn't matter so much), test - fail - pass, test - fail - pass...

Once in a while, I stumble upon a situation where Test First saves me from this trap.

b) Misplacing mock setup

Ok, this may sound even funnier (well, honestly, every mistake sounds funny), but it happened to me a couple of times, so it's worth mentioning. The example I'm going to show uses manual mocks but, so as not to jump into the "dynamic vs handmade mocks" discussion here, let me say that this can happen with dynamic mocks as well, especially if you're in a hurry.

So, without further ado, the code (yes, this is a stupid example I made up in two minutes, so don't criticize the design here, please ;-):
[Test]
public void ShouldRecognizeTimeSlotAboveMaximumAllowedAsInvalid()
{
  var frame = new FrameMock();

  var validation = new Validation();
  var timeSlotAboveMaximumAllowed = TimeSlot.MaxAllowed + 1;

  var result = validation.PerformForTimeSlotIn(frame);
  frame.GetTimeSlot_Returns = timeSlotAboveMaximumAllowed; // mock set up AFTER the call

  // assertion reconstructed; assumes result is a bool "is valid" flag
  Assert.IsFalse(result);
}

Note how the tested method (PerformForTimeSlotIn()) is called BEFORE the mock is actually set up, so the configured return value is never taken into account. So, how did it happen that, despite this, the call yielded the correct result? This sometimes happens, and it happens most often with various boundary values (nulls etc.): the value a mock returns before it is set up can happen to be one that the production code already handles the expected way.

c) Using static data inside production code.

Once in a while, you have to jump in and add some tests and logic to code written by someone else. Imagine this code is a wrapper around your product's XML configuration file. You decide to write your unit tests after applying the changes ("well", you might say, "I'm fully protected by the suite that's already in place, so I can make my change without risking a regression, then just test my changes and it's all good...").

So, you start writing a test. The test suite class contains a member like this:
XmlConfiguration config = new XmlConfiguration(xmlFixtureString);
What it does is set up a global object used by all the tests. BUT, as you can also see, it uses the same fixture every time. What you need to write tests for is a little corner case that does not need all the crap that has already got into the global fixture. So, you decide to start fresh and write your own fixture. Your test begins like this:
string customFixture = CreateMyOwnFixtureForThisTestOnly();
var configuration = new XmlConfiguration(customFixture);
And it passes. Ok, what's wrong with this? Nothing big, unless you read the source code of the XmlConfiguration class carefully. Inside, you can see where the XML string is stored:
private static string xmlText; //note the static keyword!
What the...? Well, well, here's what happened: the author of this class coded in a small optimization. He thought: "The configuration is only changed by the field group, and to do it, they have to shut down the system, so there is no need to read the XML file every time an XmlConfiguration object is created. I can save some cycles and I/O operations by reading it only once, when the first object is created. Every object created later will just use the same XML!". Good for him, not so good for you. Why? Because (unless your test runs first), your custom fixture will never be used!

That was the last way I wanted to mention. Now, on to the second point.

2) "Test After" ends up as "Test Never"

Once in a while, I get into an argument with some of my coworkers. What they usually say is: "Ok, I know that unit testing is good, I just don't buy the Test First part". When I raise the argument mentioned in point 1, they say: "I can do without Test First - after I write each unit test, I modify the production code on purpose to make sure the test fails - then I get the same value as you do, without the Test First absurdity". What I usually do then is take them to my desk and show them a code coverage report for a single file. All of the methods are marked green as covered, except one single method, marked red as not covered. Then I say: "Guess which parts of this code were written Test First and which were written Test After". I like it when they discover (or I tell them) that this uncovered code is Test After that ended up as Test Never.

Let's be honest - we're all in a hurry, we're all under pressure, and when this pressure is too high, it triggers heroic behaviors in us, especially when there's a risk of not meeting the iteration commitment. Such heroic behavior usually follows these rules: drop all the "baggage", stop learning and experimenting, revert to all of the old "safe" behaviors and "save what we can!". If tests are written last, they're considered "baggage", since the code is already written "and it will be tested anyway" by real tests (box testing, smoke testing, sanity testing etc. come into play). It is quite the contrary when using Test First, where a failing test is the reason to write any code at all. To write code, you need a reason, and thus unit tests become an irremovable part of your development. By the way, I bet that in big corporations no sane person ever thinks they can stop checking code in to source control, yet unit tests are treated there as "an optional addition", but that's a topic for another post.

3) Not doing Test First is a waste of time.

One day, I was listening to Robert C. Martin's keynote at Ruby Midwest 2011, called Architecture The Lost Years. At the end, Robert made some digressions, one of them being about TDD. He said that writing unit tests after the code is not TDD. It is a waste of time. 

At first, I thought it was only about missing all the benefits that Test First brings you: the ability of a test to fail, the ability to do a clean-sheet analysis, the ability to do Need Driven Design etc. Now, however, I think there is more to it. If you're reading the Sustainable Test Driven Development blog regularly, you know that Amir and Scott value testability as a design quality, along with cohesion, encapsulation and others. They also state that, in order to make TDD (and even plain unit testing, for that matter) sustainable, the code must have this testability quality at a very high level. How can we use this valuable insight to identify the waste? Let's see what testability looks like in the Test First workflow (let's assume that we're creating new code, not adding stuff to dirty, ugly legacy code):

- Write a unit test that fails (The code has high testability)
- Write code that satisfies the test

Now, what does it usually look like in the Test After approach (from what I've seen in various situations):

- Write some code (probably spanning a few classes, until we're satisfied).
- Start writing unit tests
- Notice that unit testing the whole set of classes is cumbersome, unsustainable and highly redundant.
- Refactor the code to be able to isolate objects and inject some mocks (The code has high testability)
- Write proper unit tests.

As you may have noticed, the Test After approach contains a few additional steps: noticing that the code is hard to test and refactoring it to make it testable. What's their equivalent in Test First? Nothing! Doing these things is a waste of time! And it's a waste of time I see over and over again!

Anyway, because of this, let me say it clearly: if you've got a guy on your team who's doing TDD, and he's the only one, while you're just "coding and shipping" carelessly, without even unit tests, be aware that this guy will ALWAYS have to correct your code to make it testable and, for that matter, better designed. So, if you really wanna stay on the dark side and not do what's right and ethical (which is to do TDD), at least buy him a soft drink from time to time.

Ok, that's it. Enjoy your weekend!

Saturday, 5 May 2012

Mocking C functions without using C++


Anyone who reads my previous post may wonder whether the function-overriding mechanism from HippoMocks could be ported so that it can be used in plain C.

The good news is that it is possible, and that the function-overriding mechanism is very easy to extract from HippoMocks. The bad news is that porting the overriding mechanism alone gives no particular advantage over, e.g., gcc's weak functions.

Anyway, for the brave: if you look at the sources of the latest HippoMocks release (the address is in my previous post), you will surely notice the Replace class. This is the class used to replace functions. Let me now demonstrate how it works with a short example:

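#include <stdio.h>
#include "hippomocks.h"

void fooA()
{
  printf("fooA\n");
}

void fooB()
{
  printf("fooB\n");
}

int main()
{
  // fooA is redirected to fooB for as long as replaceInThisScope lives
  HippoMocks::Replace replaceInThisScope(fooA, fooB);
  fooA(); // prints "fooB"
  return 0;
}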

Replace objects are quite clever, since they work according to RAII (Resource Acquisition Is Initialization): the constructor performs the replacement and the destructor restores the original state. This means the replacement lasts as long as the given Replace object is in scope.

Either way, nothing stops you from ripping out this class (along with a few of its dependencies: the Unprotect class, the e9ptrsize_t type and the horrible_cast function (sic!)) and rewriting the code in C, using functions and macros. This should work at least on Linux systems; I haven't tried converting the Windows version, although hippomocks.h contains one as well.

Friday, 4 May 2012

Mocking C functions with HippoMocks


Today, I'll show you how to use the latest working version of the HippoMocks framework to replace a C function with your own (in other words, to create a so-called function mock). Since life isn't a movie, we'll need a C++ compiler for this (in other words, even if your product is written entirely in C, you have to compile it with a C++ compiler for unit testing purposes).

My environment (just in case something doesn't work for you):
1. g++ 4.6.3 compiler
2. Ubuntu Linux 12.04 operating system with the build-essential package installed
3. The hippomocks.h header, downloaded from the project's website (click the Download icon).
4. The x86 platform (I believe it should work on x86-64 too); Sparc, sorry...

Now one small note: the HippoMocks library is contained entirely in a single header file, hippomocks.h, which we have to add to our build. However, the compilation of this file itself may fail!! For me, it broke on a printf() call, for which it had no header:

../src/hippomocks.h:3990:94: error: ‘printf’ was not declared in this scope
make: *** [src/Main.o] Error 1

Fortunately, this is quick to fix: find the first group of #include directives in hippomocks.h and add #include <stdio.h> there.

And finally, a small example of how it works:

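#include <iostream>
#include "hippomocks.h"

int ReturnOne()
{
  return 1;
}

int main()
{
  HippoMocks::MockRepository mocks;
  // make every call to ReturnOne() return 123 for as long as 'mocks' lives
  mocks.OnCallFunc(ReturnOne).Return(123);

  std::cout << ReturnOne() << std::endl; // prints 123
  return 0;
}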