Thursday, November 10, 2016


I want to take this thought a step further and, as implied by the post title, do a group by.
To start, here is an orderby on n % 2 that gives us a list of the even numbers followed by the odd numbers:
    int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

    // "n % 2 == 0" is true for evens; descending puts true before false.
    var orderedNumbers = from n in numbers
                         orderby n % 2 == 0 descending
                         select n;

    foreach (var n in orderedNumbers)
    {
        Console.Write("{0},", n);
    }

This is all pretty straightforward: we order by whether the number modulo 2 is 0, descending, so the evens come first and we get 4,8,6,2,0,5,1,3,9,7.
But what if I want to simply have two lists, one with evens and one with odds? That’s where group by comes in.
    int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

    // Group on the remainder: the key is 0 for evens and 1 for odds.
    var numberGroups = from n in numbers
                       group n by n % 2 into g
                       select new { Remainder = g.Key, Numbers = g };

    foreach (var g in numberGroups)
    {
        if (g.Remainder == 0)
            Console.WriteLine("Even Numbers:");
        else
            Console.WriteLine("Odd Numbers:");
        foreach (var n in g.Numbers)
        {
            Console.WriteLine(n);
        }
    }
with the output:
    Odd Numbers:
    5
    1
    3
    9
    7
    Even Numbers:
    4
    8
    6
    2
    0
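What’s happening here is that the group clause produces one group per distinct key, and the select projects each group into an anonymous type. The result is not really a dictionary: it is a lazily evaluated sequence (at run time on the .NET Framework, a System.Linq.Enumerable.WhereSelectEnumerableIterator<TSource, TResult>) whose groups each carry a Key and the items that share it.
The important thing to note is that the groups are keyed on the expression that follows the “by”.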
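For reference, the group clause above corresponds to a call to the GroupBy extension method, and each group comes back as an IGrouping with a Key. A minimal sketch of that, assuming nothing beyond System.Linq; the names GroupByDemo and GroupByRemainder are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByDemo
{
    // Hypothetical helper: groups numbers by their remainder when divided by 2.
    public static IEnumerable<IGrouping<int, int>> GroupByRemainder(int[] numbers)
    {
        // Method-syntax form of "group n by n % 2".
        return numbers.GroupBy(n => n % 2);
    }

    public static void Main()
    {
        int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

        foreach (IGrouping<int, int> g in GroupByRemainder(numbers))
        {
            // Each group exposes its Key and can be enumerated for its members.
            Console.WriteLine("{0}: {1}", g.Key == 0 ? "Even" : "Odd", string.Join(",", g));
        }
    }
}
```

Groups come back in the order their keys are first seen, which is why the odd numbers (starting with 5) are listed before the evens.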
Taking this one step further, let’s group a bunch of words. The following doesn’t work quite right:
    string[] words = { "blueberry", "Chimpanzee", "abacus", "Banana", "apple", "cheese" };

    var wordGroups = from w in words
                     group w by w[0] into g
                     select new { FirstLetter = g.Key.ToString().ToLower(), Words = g };

    foreach (var g in wordGroups)
    {
        Console.WriteLine("Words that start with the letter '{0}':", g.FirstLetter);
        foreach (var w in g.Words)
        {
            Console.WriteLine(w);
        }
    }
giving us the output:
    Words that start with the letter 'b':
    blueberry
    Words that start with the letter 'c':
    Chimpanzee
    Words that start with the letter 'a':
    abacus
    apple
    Words that start with the letter 'b':
    Banana
    Words that start with the letter 'c':
    cheese
That’s because there is a bit of a red herring here: the ToLower() in the select only changes the label that gets printed, not the grouping. Remember that the expression after the by is what is used to group. In our case w[0] for Chimpanzee is 'C', not 'c'. If we change it to:
    string[] words = { "blueberry", "Chimpanzee", "abacus", "Banana", "apple", "cheese" };

    // Lower-case the first letter in the key itself, so 'B' and 'b' share a group.
    var wordGroups = from w in words
                     group w by w[0].ToString().ToLower() into g
                     select new { FirstLetter = g.Key, Words = g };

    foreach (var g in wordGroups)
    {
        Console.WriteLine("Words that start with the letter '{0}':", g.FirstLetter);
        foreach (var w in g.Words)
        {
            Console.WriteLine(w);
        }
    }
then we get the results we expect with:
    Words that start with the letter 'b':
    blueberry
    Banana
    Words that start with the letter 'c':
    Chimpanzee
    cheese
    Words that start with the letter 'a':
    abacus
    apple
Taking this one step further still, we can put an orderby above the group so the groups come out alphabetically:
    var wordGroups = from w in words
                     orderby w[0].ToString().ToLower()
                     group w by w[0].ToString().ToLower() into g
                     select new { FirstLetter = g.Key, Words = g };
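To see the ordered grouping end to end, here is a small self-contained sketch. The class OrderedWordGroups and its Describe helper are invented for this illustration; the query itself is the one above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderedWordGroups
{
    // Hypothetical helper: returns one line per group, e.g. "a: abacus, apple".
    public static List<string> Describe(string[] words)
    {
        var wordGroups = from w in words
                         orderby w[0].ToString().ToLower()
                         group w by w[0].ToString().ToLower() into g
                         select new { FirstLetter = g.Key, Words = g };

        return wordGroups
            .Select(g => g.FirstLetter + ": " + string.Join(", ", g.Words))
            .ToList();
    }

    public static void Main()
    {
        string[] words = { "blueberry", "Chimpanzee", "abacus", "Banana", "apple", "cheese" };
        foreach (var line in Describe(words))
            Console.WriteLine(line);
        // Prints:
        // a: abacus, apple
        // b: blueberry, Banana
        // c: Chimpanzee, cheese
    }
}
```

Because the orderby only looks at the first letter and LINQ to Objects sorts stably, words within a group keep their original relative order.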
So let’s now make this a bit over-the-top complex. Given the classes:
    public class Customer
    {
        // Used by the query below to label each customer's groups.
        public string CompanyName { get; set; }
        public List<Order> Orders { get; set; }
    }

    public class Order
    {
        public DateTime OrderDate { get; set; }
        public int Total { get; set; }
    }
let’s group a customer list by customer, then by year, then by month:
    List<Customer> customers = GetCustomerList();
    var customerOrderGroups =
        from c in customers
        select new
        {
            c.CompanyName,
            YearGroups =
                from o in c.Orders
                group o by o.OrderDate.Year into yg
                select new
                {
                    Year = yg.Key,
                    MonthGroups =
                        from o in yg
                        group o by o.OrderDate.Month into mg
                        select new { Month = mg.Key, Orders = mg }
                }
        };
Whew! That took a lot to copy and paste from MSDN’s sample library! 😉
As mentioned previously, the important part here is that the key for each group is the expression after the “by”. The query just creates a bunch of dictionary-like groupings embedded inside one another, each level keyed on the value after its “by”.
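Walking the nested result takes one loop per level. The sketch below runs the same query against a tiny invented data set; SampleCustomers is a hypothetical stand-in for GetCustomerList(), and the company name, dates and totals are made up purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public DateTime OrderDate { get; set; }
    public int Total { get; set; }
}

public class Customer
{
    public string CompanyName { get; set; }
    public List<Order> Orders { get; set; }
}

public static class NestedGroupDemo
{
    // Hypothetical stand-in for GetCustomerList(); the data is invented.
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer
            {
                CompanyName = "Contoso",
                Orders = new List<Order>
                {
                    new Order { OrderDate = new DateTime(2015, 3, 1), Total = 10 },
                    new Order { OrderDate = new DateTime(2015, 3, 20), Total = 20 },
                    new Order { OrderDate = new DateTime(2016, 7, 4), Total = 30 }
                }
            }
        };
    }

    // Flattens customer -> year -> month into lines like "Contoso 2015-3: 2 order(s)".
    public static List<string> Describe(List<Customer> customers)
    {
        var customerOrderGroups =
            from c in customers
            select new
            {
                c.CompanyName,
                YearGroups =
                    from o in c.Orders
                    group o by o.OrderDate.Year into yg
                    select new
                    {
                        Year = yg.Key,
                        MonthGroups =
                            from o in yg
                            group o by o.OrderDate.Month into mg
                            select new { Month = mg.Key, Orders = mg }
                    }
            };

        var lines = new List<string>();
        foreach (var cg in customerOrderGroups)
            foreach (var yg in cg.YearGroups)
                foreach (var mg in yg.MonthGroups)
                    lines.Add(string.Format("{0} {1}-{2}: {3} order(s)",
                        cg.CompanyName, yg.Year, mg.Month, mg.Orders.Count()));
        return lines;
    }

    public static void Main()
    {
        foreach (var line in Describe(SampleCustomers()))
            Console.WriteLine(line);
    }
}
```

With the sample data this yields one line for March 2015 (two orders) and one for July 2016 (one order).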
The GroupBy method that is part of LINQ can also take an IEqualityComparer<T>. Given the comparer:
    public class AnagramEqualityComparer : IEqualityComparer<string>
    {
        // Two words are equal when their sorted letters match.
        public bool Equals(string x, string y)
        {
            return getCanonicalString(x) == getCanonicalString(y);
        }

        // Hash the sorted letters so that anagrams land in the same bucket.
        public int GetHashCode(string obj)
        {
            return getCanonicalString(obj).GetHashCode();
        }

        private string getCanonicalString(string word)
        {
            char[] wordChars = word.ToCharArray();
            Array.Sort(wordChars);
            return new string(wordChars);
        }
    }
we can find all the matching anagrams. This is possible because the IEqualityComparer compares words based on a sorted array of characters. If you take “meat” and “team” they both become “aemt” when sorted by their characters.
    string[] anagrams = { "from", "salt", "earn", "last", "near", "form" };

    var orderGroups = anagrams.GroupBy(
        w => w.Trim(),                 // key selector
        a => a.ToUpper(),              // element selector: what gets stored in each group
        new AnagramEqualityComparer()  // decides which keys count as equal
    );

    foreach (var group in orderGroups)
    {
        Console.WriteLine("For the word \"{0}\" we found matches to:", group.Key);
        foreach (var word in group)
        {
            Console.WriteLine(word);
        }
    }
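As a point of comparison, the same anagram grouping can be sketched without a custom comparer by using the sorted letters themselves as the key. This is an alternative, not what the example above does; AnagramKeyDemo and Canonical are names invented for the illustration.

```csharp
using System;
using System.Linq;

public static class AnagramKeyDemo
{
    // Hypothetical helper: sorts the letters of a word, so "from" and "form" both give "fmor".
    public static string Canonical(string word)
    {
        char[] chars = word.ToCharArray();
        Array.Sort(chars);
        return new string(chars);
    }

    public static void Main()
    {
        string[] anagrams = { "from", "salt", "earn", "last", "near", "form" };

        // The key is the sorted-letter string, so the default string equality is enough.
        foreach (var g in anagrams.GroupBy(w => Canonical(w)))
        {
            Console.WriteLine("{0}: {1}", g.Key, string.Join(", ", g));
        }
        // Prints:
        // fmor: from, form
        // alst: salt, last
        // aenr: earn, near
    }
}
```

The trade-off is that the group key becomes the sorted letters rather than the first word encountered, which is what the comparer version keeps as its key.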