Combinatorics

Published March 03, 2009
Advertisement
Such a cool word for a math concept. Anyways, I've been writing a Sudoku game lately, and I've been delving a little into set theory, with a little dash of combinatorics thrown in for good measure. I needed a method that would return all possible combinations of a given sequence, and the Enumerable class has finally failed me.

So here's my implementation of a combine function. Note that while this is similar to permutations, it's slightly different in that I only take a certain subset in each pass (controlled by the count parameter). I'm hoping someone, perhaps one of the functional guys, can show me a more elegant way of doing this, but for now it works. I think the use of the HashSet in the Filter function provides a performance boost , but I don't know for certain. HashSet.Contains is an O(1) operation, but the cost of creating a new HashSet every iteration may cancel out that performance gain.

/// /// Filters the specified sequence based upon the provided indices./// 

/// The type of the elements in the sequence.
/// The sequence.
/// The indices.
/// The filtered sequence.
public static IEnumerable Filter(this IEnumerable sequence, IEnumerable<int> indices)
{
HashSet<int> hs = indices as HashSet<int>;
if (hs == null)
hs = new HashSet<int>(indices);

return sequence.Where((t, i) => hs.Contains(i));
}

///
/// Generates all combinations (not permutations) of a given sequence.
///
/// The type of the elements in the sequence.
/// The sequence.
/// The number of items to pick for each combination.
/// The sequence of combinations.
public static IEnumerable> Combine(this IEnumerable sequence, int count)
{
List> results = new List>();
int[] indices = Enumerable.Range(0, count).ToArray();
int max = sequence.Count() - 1;

while (true)
{
results.Add(sequence.Filter(indices));

int n = count;
for (int i = indices.Length - 1; i >= 0; i--)
{
if (indices < max - (count - i - 1))
{
n = i;
break;
}
}

if (n == count)
break;

indices[n]++;
for (int i = n + 1; i < indices.Length; i++)
indices = indices + <span class="cpp-literal"><span class="cpp-number">1</span></span>;<br> }<br><br> <span class="cpp-keyword">return</span> results;<br>}<br></pre></div><!–ENDSCRIPT–><br><br>Well, maybe it's not all the interesting, but it does have that new LINQ-y, extension method smell, and it actually worked the first time I wrote it, so I'm happy.<div> </div>
0 likes 11 comments

Comments

Telastyn
I tried to think of something more elegant, but couldn't decipher what it actually did... (and don't have a compiler to try it out).

Tentatively I'd think down two paths. One where the permutation of indices is contained within a lambda so when the linq predicate checks it, the lambda side-effects the permutation. The other is doing essentially a join in linq for all of the permutations/combinations and culling them via some criteria that I'm missing since I know of those words as synonyms...
March 03, 2009 10:25 AM
Mike.Popoloski
It is my understanding that a true set of permutations would not have a cut off point (ie. it would have n! results), whereas combinations are subsets taken with a given size (count parameter in this case). The number of outputs for a combination of "n take k" would be the binomial coefficient (ie. n! / (k! * (n-k)!).

I'm sure my method isn't the best because I just wrote it off the top of my head, but what it does is build up a list of indices, and then iteratively run through them each "tick" and update each index until it reaches the maximum possible index into the sequence. Ordering doesn't matter; in other words, I don't want duplicates returned (ie. {A B C} and {C B A} would be duplicates).
March 03, 2009 10:59 AM
Telastyn
This could likely be cleaned up with some linq goodness or helper classes. I don't have access to msvs2008 at the moment. It should do what you want if I understand you correctly.


using System;
using System.Collections.Generic;
using System.Text;

namespace Combine {
    class Program {

        public static bool EnumerableHasAtLeast<T>(IEnumerable<T> Enumerable, int xElements) {
            int x = 0;
            foreach (T entry in Enumerable) {
                x++;
                if (x >= xElements) {
                    return (true);
                }
            }
            return (false);
        }

        public static IEnumerable<IEnumerable<T>> Combine<T>(IEnumerable<T> sequence, int elementsPerEntry) {
            if (elementsPerEntry <= 0) {
                yield break;
            }

            if (elementsPerEntry == 1) {
                foreach (T element in sequence) {
                    List<T> tmp = new List<T>();
                    tmp.Add(element);
                    yield return tmp;
                }
            } else {
                foreach (T element in sequence) {
                    
                    // manually select items after element from the original sequence
                    List<T> subset = new List<T>(sequence);
                    while (!subset[0].Equals(element)) {
                        subset.RemoveAt(0);
                    }
                    subset.RemoveAt(0);

                    if (!EnumerableHasAtLeast(subset, elementsPerEntry - 1)) {
                        yield break;
                    } else {
                        foreach (IEnumerable<T> leftovers in Combine(subset, elementsPerEntry - 1)) {
                            List<T> tmp = new List<T>(leftovers);
                            tmp.Insert(0, element);
                            yield return (tmp);
                        }
                    }
                }
            }
        }

        static void Main(string[] args) {
            List<string> seq = new List<string>();
            seq.Add("A");
            seq.Add("B");
            seq.Add("C");
            seq.Add("D");
            seq.Add("E");
            seq.Add("F");

            foreach (IEnumerable<string> combo in Combine(seq, 3)) {
                foreach (string element in combo) {
                    Console.Write(element);
                }
                Console.WriteLine();
            }
        }
    }
}


March 03, 2009 12:34 PM
Mike.Popoloski
Ah, my first instinct when I approached this problem was to use recursion, but I couldn't wrap my head around it so I opted for the looping approach. I'll run both methods through some performance tests and let you know what happens.
March 03, 2009 10:34 PM
Telastyn
I suspect that yours runs faster, but mine uses far less memory.
March 03, 2009 11:40 PM
Mike.Popoloski
Yeah, I bet you're right. Plus, I'm allocating HashSets all over the place, which can't be good for the GC.
March 04, 2009 08:37 AM
Telastyn
Enh, I end up allocating like 2n+1 lists per combination (where n is the # of elements in a result) so neither are likely good for the GC. Though mine could be reworked to reuse some of the lists at the expense of readability/maintainability.
March 04, 2009 09:03 AM
Mike.Popoloski
I think mine is more optimal than it looks because the Where() extension method acts using deferred execution, so the final result is simply a list of "wheres" that act on the original sequence once someone actually needs the data it contains.
March 04, 2009 11:48 AM
Telastyn
Which means the original sequences need to be kept around until used :/

So get with the profiling already!
March 04, 2009 12:50 PM
Telastyn
Getting the memory from the GC isn't a great metric, but I don't have a proper profiler handy. The populate properties just stuff 0->n into the string List. I have a few others with longer strings. In general though, it seems to not make much difference.

In the end mine looks faster too:


static void Main(string[] args) {
            int elements = 3;
            List<string> seq = PopulateSeq25;

            int x = 0;
            foreach (IEnumerable<string> combo in Combine(seq, elements)) {
                foreach (string element in combo) {
                    //Console.Write("{0} ",element);
                }
                x++;
                //Console.WriteLine();
            }
            Console.WriteLine("Tel:");
            Console.WriteLine("Sequence Size: 25");
            Console.WriteLine("Elements per result: {0}", elements);
            Console.WriteLine("{0} combos.", x);
            Console.WriteLine("----------");
            Console.WriteLine("Processor Time: {0}", System.Diagnostics.Process.GetCurrentProcess().TotalProcessorTime);
            Console.WriteLine("GC memory usage: {0}", System.GC.GetTotalMemory(false));
        }



(with the suitable changes for different test size/element pairs and a different project with the same main for Mikes)


Tel:
Sequence Size: 25
Elements per result: 3
2300 combos.
----------
Processor Time: 00:00:00.0468750
GC memory usage: 367220

Mike:
Sequence Size: 25
Elements per result: 3
2300 combos.
----------
Processor Time: 00:00:00.0625000
GC memory usage: 878176


Tel:
Sequence Size: 25
Elements per result: 10
3268760 combos.
----------
Processor Time: 00:00:23.1093750
GC memory usage: 572120

Mike:
Sequence Size: 25
Elements per result: 10
3268760 combos.
----------
Processor Time: 00:00:40.2968750
GC memory usage: 1180774852


Tel:
Sequence Size: 100
Elements per result: 3
161700 combos.
----------
Processor Time: 00:00:00.3281250
GC memory usage: 536524

Mike:
Sequence Size: 100
Elements per result: 3
161700 combos.
----------
Processor Time: 00:00:02.4375000
GC memory usage: 38762176


Tel:
Sequence Size: 100
Elements per result: 5
75287520 combos.
----------
Processor Time: 00:03:39.9843750
GC memory usage: 356340

Mike:
Sequence Size: 100
Elements per result: 5
DNF - OutOfMemory

March 04, 2009 05:08 PM
Mike.Popoloski
Ouch. Looks like your method is solidly better in every way imaginable [grin]

I admit defeat.
March 04, 2009 06:40 PM
You must log in to join the conversation.
Don't have a GameDev.net account? Sign up!
Advertisement

Latest Entries

New Blog

1978 views

Progress Update

1537 views

Start of Project

1501 views

New Job

2174 views

The Downward Spiral

2826 views

Job Interviews

1434 views
Advertisement