Sunday 10 February 2013

The Iterator Pattern in .NET

The Iterator Pattern provides a way to access the elements of an aggregate object (collection) sequentially without exposing its underling representation.

The ultimate goal of the pattern is to decouple the client from the implementation of the collection that always remains well encapsulated.

This is definitely my favourite pattern in particular for how it is supported by C#.
It's easy, amazing and extremely powerful.

The Client

In the .NET framework the pattern is fully supported by all the collections in the Base Class Library and the C# language provides the foreach keyword to iterate a collection in an extremely simple way. Every .NET developer knows it.


What is the beauty of the pattern?

The beauty is that the client does not need to know nothing about the internals of the collection in order to iterate through its elements. Replacing the List with an HashSet does not require to change the foreach loop in any way.


How does it works?

The foreach loop works only on collections that implement the interface IEnumerable or the generic version IEnumerable<T>.

All the .NET framework collections implement that interfaces included List and HashSet.


The IEnumerable interface contains a factory method that should return an implementation of the interface IEnumerator or the generic IEnumerator<T>.

An enumerator class is responsible to enumerate the collection. It is a read-only, forward-only cursor over a collection.

The property Current returns the current element in the collection; the method MoveNext() advances the enumerator to the next element of the collection; the method Reset() sets the enumerator to its initial position, which is before the first element in the collection.


Everything becomes clear when you see how the C# compiler translates the foreach loop using directly the enumerator. This is the magic!


Using a .NET disassembler like ILSpy or dotPeek however we can't see the translated version in C#. In order to have a confirmation that what I said is actually true we have to look directly at the generated IL.


It is easy to see that the compiler actually does something more. It wraps the enumeration in a try/finally construct in order to properly Dispose the enumerator. This is important because IEnumerator implements IDisposable.

Finally, this is the correct C# version of foreach.

The foreach statement simply provides a very elegant way to generically enumerate a collection.

How to Implement Enumerable Collections

OK, brilliant,we now know how to enumerate any collection using the foreach keyword and in particular how this is actually done under the cover.

It's time to create a custom collection. For example, a collection that is able to store any integer between 0 and N using internally a frequency array. This is actually pretty simple to do.

However, the following code does not compile. The compiler is not able to initialize and enumerate the collection.


We obviously need to implement the IEnumerable interface


A typical choice is to use a nested type to implement the enumerator. This allows to access the internal of the collection to accomplish the job. Note that the enumerator is intrinsically coupled to the collection. It turns out that implementing an enumerator it is not so easy as it may seems initially.


The complexity of implementing the enumerator arises because it is required to maintain the current state of the iteration between subsequent calls to MoveNext. In the example the variables pos and count are required in order to maintain the state of the iteration.

Can you understand the code inside MoveNext?

Yes, I know. You may understand it but it is definitely not intuitive.

Fortunately, the C# compiler comes in help here with the keyword yield that implements the concept of coroutines in .NET.

You can implement all in the following beautiful way without the need to manually implement the enumerator yourself.


This looks like really magical!

But we are curious, let's deep dive and see the code generated by the compiler.


OK, the compiler create an instance of a compiler generated enumerator class passing through a property a reference to the collection. The compiler generated enumerator is also a nested collection.



The compiler generates a state machine under the cover. It is really impressive what it does but all this complexity is hidden to the developer thank to the yield keyword.

The Iterator Pattern and LINQ to Objects

Everything we discussed so far has always been available since C# 2.0.

C# 3.0 added LINQ to the language expanding in a significant way the potential of the language. This is without any doubts the main reason why I love C#.

The power of LINQ to Objects in particular is related to the fact that iterators are highly composable.

Consider the code that print the double of the even numbers from the collection using LINQ to Objects.


How the enumeration process works in this case?

The first call to Where generate a enumerable object that wraps the source collection using the decorator pattern. The object itself however is also the enumerator of itself. The main logic is inside the MoveNext method that iterates the source collection and then apply the filter.


The same is for the Select method where the source collection in this case will be the result returned by the Where method.


The execution at run-time will be a chain of enumerations that is simply described by the following sequence diagram.


Conclusion


The iterator pattern these days is a fundamental pattern and it is fully integrated in the modern programming languages and libraries. In particular in C# you can leverage the pattern using the foreach statement in the client code, and implementing IEnumerable and IEnumerator when you create new collections. Don't forget the incredible support that the yield keyword can provide you when you implement your custom collections.


It's important to notice that the iterator pattern is a good example of the Single Responsibility Principle: "A class should have only on reason to change". The 
Enumerator class is a class with the single responsibility to traverse a collection.

The main benefit of the pattern is decoupling the clients from the internals of the collections. This allows to change the collection without affecting clients.

3 comments:

  1. Line 2:decouple (s) the client should be without 's' it is a intransitive verb

    ReplyDelete
    Replies
    1. Thanks! I appreciate this kind of input. Corrected.

      Delete
  2. I also find useful CTRL+SHIFT+G which opens up a "Navigate To" menu which is definitely easier to remember than the individual items

    ReplyDelete

What you think about this post? I really appreciate your constructive feedback (positive and negative) and I am looking forward to start a discussion with you on this topic.