Skip to content

Increasing loop performance by iterating two intersecting lists simultaneously

January 21, 2014

Disclaimer

This brief post covers a micro-optimisation that we employed recently in an Asp.Net Web Api app. If you’re looking to solve major performance problems or get a quick win on small tasks, this isn’t going to be very useful to you. However, if you’ve nailed all the big stuff and are processing a large batch (think millions) of many records together then these small inefficiencies really begin to add up. If this applies to you, then you may find the following solution useful.

It’s possible that many people have thought of this problem and provided a solution before… in fact I’m very sure they have as I’ve googled it and so many Stack Overflow posts came up that I’m not going to bother linking to any of them. However, no-one seemed to made anything that was simple, re-usable and easy to integrate… so this is my take on it.

Finally, I’ve attempted to do some napkin maths. It’s probably wrong in some way so please correct me.

Compound Iteration

How often have you written code like this? 


public List<string> Validate(IDictionary<string, object> data, List<Field> mandatoryFields)
{
var errors = new List<string>();
foreach (var field in mandatoryFields)
{
if (!data.Keys.Contains(field.Identifier))
{
errors.add(String.Format("The {0} field is required", field.Identifier);
}
}
return errors;
}

This simplified sample shows how you might validate an HTTP PATCH request against some metadata. It seems innocuous right?

But say you have 1000 fields to validate, and maybe half of them are present in the body of your request. In the worst case we’ll have 500 iterations of the outer fields loop where we’ll then have to iterate through 500 dictionary keys just to find out that the field doesn’t exist in the data set.

Even in an optimal case for the remaining fields that do exist, you’ll have to iterate through 250 keys on average before we find a match, so for an ‘average’ case we could be looking at:

(500 * 500) + (500 * 250) = 375,000

As an ‘average’ case, it could potentially be a lot less than this, potentially a lot more. Either way,  imagine trying to bulk validate 100,000 records and… yikes!

Sort your data, and enter the Efficient Iterator

Provided your numbers are big enough it’s much more efficient to sort your data first and then step through each collection simultaneously. If your field info is coming say from a SQL table with a clustered index and an orderby is essentially free then it’s even more possible that this will result in significant speedup.

Basically what such an algorithm does is to take the first item from each of the two lists, and compare them. If Item A comes before Item B in the sort order, you advance forward one item in List A – or vice versa – until the two are found to match (or you run out of items). You are able to take action on each step, in the case a value is a match, or an orphan on either side.

Now the worst case iteration is merely the sum of the elements in the two lists. So in our average case, just 1500. That’s a 250x reduction… over two orders of magnitude!

Show me the code

Without further ado, here’s a Gist that you can use to do this right now…


public class EfficientIterator<TOne, TTwo>
{
private Action<TOne, TTwo> always = (one, two) => { };
private Action<TOne, TTwo> match = (one, two) => { };
private Action<TOne> oneOnly = (one) => { };
private Action<TTwo> twoOnly = (two) => { };
private readonly Func<TOne, TTwo, int> compare;
public Action<TOne, TTwo> Always { set { this.always = value; } }
public Action<TOne, TTwo> Match { set { this.match = value; } }
public Action<TOne> OneOnly { set { this.oneOnly = value; } }
public Action<TTwo> TwoOnly { set { this.twoOnly = value; } }
public EfficientIterator(Func<TOne, TTwo, int> compare)
{
this.compare = compare;
}
public void RunToEnd(IEnumerable<TOne> listOne, IEnumerable<TTwo> listTwo)
{
using (var listOneEnumerator = listOne.GetEnumerator())
using (var listTwoEnumerator = listTwo.GetEnumerator())
{
var listOneIterating = listOneEnumerator.MoveNext();
var listTwoIterating = listTwoEnumerator.MoveNext();
while (listOneIterating || listTwoIterating)
{
int result = 0;
if (listOneIterating && listTwoIterating)
{
result = this.compare(listOneEnumerator.Current, listTwoEnumerator.Current);
// a and b are equal
if (result == 0)
{
this.always(listOneEnumerator.Current, listTwoEnumerator.Current);
this.match(listOneEnumerator.Current, listTwoEnumerator.Current);
listOneIterating = listOneEnumerator.MoveNext();
listTwoIterating = listTwoEnumerator.MoveNext();
continue;
}
}
// list one has run out, or it's current item is greater than list two
if (!listOneIterating || result > 0)
{
this.always(default(TOne), listTwoEnumerator.Current);
this.twoOnly(listTwoEnumerator.Current);
listTwoIterating = listTwoEnumerator.MoveNext();
continue;
}
// list two has run out, or it's current item is greater than list one
if (!listTwoIterating || result < 0)
{
this.always(listOneEnumerator.Current, default(TTwo));
this.oneOnly(listOneEnumerator.Current);
listOneIterating = listOneEnumerator.MoveNext();
continue;
}
}
}
}
}

Take a look at these MSpec tests for information on how to use it. You’ll also need to use nullable types if you want to work with non-reference types but that should be straightforward. Thanks to Tommy Carlier for his amendments to the sample to allow any type of IEnumerable and to support value types!

Questions are welcome in the comments… but please refrain from unhelpful critiquing the ‘design’ of the simplified problem sample ; ) Enjoy iterating efficiently!

Don’t forget that you’ll have to sort both lists before passing them to the efficient iterator!

Pete

Advertisement

From → Web API

One Comment
  1. cschleiden permalink

    In your example you could just use data.ContainsKey(…).. It’s a hashtable, after all.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: