Collections

We will talk about different types of collections, several collection interfaces and how they work under the hood.

The course will be as follows:

  • Introduction
  • Arrays
    • Inside Arrays
    • Array capabilities
  • Interfaces
  • Collection Types
    • Index-based lists
    • Other lists(Linked Lists, Stacks, Queues)
    • Dictionaries
    • Sets
  • Enumerators
  • Multidimensional and Jagged Arrays

Collection Introduction
  • What is a collection
    • Collection Operations
  • Types of collection
    • Lists
    • Dictionaries
    • Sets
  • Collections in .NET
    • A brief history
    • The current collection landscape
What is a collection

A collection is a Type responsible to manage groups of other objects in memory

Types of collection

All the collections provided by .net come under Lists, Dictionaries or Sets

In some collections sometimes the order of elements is important, this is List. These have indexes. Good for memory use, efficient for accessing elements. List Types in .NET

  • T[] - Array
  • List<T>
  • Collection<T>
  • ReadOnlyCollection<T>
  • ObservableCollection<T>
  • IList<T> - This is a definitive contract for all lists


When the order is not important, for example staff of an organization. They need not have an order, this is Dictionary. Accessing items by index is not required. Access is based on a Key.

  • Dictionary<TKey, TValue>
  • IDictionary<TKey, TValue> - This is a definitive contract for all dictionaries

Most of the dictionaries are implemented using a hash table.

For example: Declare a dictionary that allows looking up by a name:


var employees = new Dictionary<string, Employee>();
Or, for looking up by social security no. declare:

var employees = new Dictionary<SocialSecNo, Employee>();

Lists are faster to access than dictionaries


The focus is not on accessig a single element, but treating the whole group as one. This is Sets. This is same as the sets in set theory. To find out intersection, union etc. Sets are collections that support being combined to form new sets

  • HashSet<T>
  • ISet<T> - Contract for sets
Similarity between dictionary and Set is that both are often based on hashtable. Difference is that a dictionary lookup with keys. In sets there is no lookup.

Collection Operations
  • Reading
    • Single item Through index or keys
    • All items through enumerator using a foreach loop
      • elements in a list are enumerated in the index order
      • elements in the dictionary are not enumerated in the order, so dont rely for a order in a dictionary.
    • Sets don't support lookin up of an item, there are a few list types too that dont support a lookup(Linked lists, Stacks, Queues)
  • Writing
    • Add an item
      • In lists you might want to add an item at a particular order
      • In dictionaries it does not matter
    • Remove an item
    • Replace an item
  • Derived operation
    • Filter a list - this is reading all the items and fetching which satisfies the condition
    Usually this is done using LINQ
Collections in .NET
  • Generics in .NET 2.0

    List<T> was introduced. instead of list of a type earlier there was only ArrayList(of object) available that would be a list of your type, but you have to box and unbox that object to the type and vice versa to use it.

    But there was an Array that could hold a collection of types. But due to differences between Arrays and Generics, Generics prevailed.

    What came along was :

    • HashSet<T>, Stack<T>, Queue<T>, LinkedList<T>, List<T>, Dictionary<TKey, TValue>

  • LINQ in .NET 3.0

    Adds operations to anything enumerable - including collections

  • Concurrent Collections .NET 4.0

    Multi-threaded support (Not covered in this course)

  • Readonly Interfaces, Immutable collections
      ReadOnly Interfaces
    • IReadOnlyList<T>
    • IReadOnlyCollection<T>
Inside Arrays

This section will deal with "What arrays are", "how do they work", "C# syntax to use them".

Next module will cover the implementation of arrays in .NET, how arrays fit into the .NET type system and the rich functionality that the managed array type implements

  • What is an array
    • Basic Syntax
  • Arrays under the hood
    • Element access is very efficient
  • Declaring and initializing arrays
  • Enumerating(iterating) array contents
    • foreach loop
    • for loop
What are Arrays
  • Index based list
  • rich api
  • very lightweight
  • special c# syntax
  • fixed size - this is a drawback
An example for which arrays will suit best : Consider the days of a week where Order and Size are fixed.

type[] - declares array, not simple type

Iterating with foreach


foreach (var day in daysOfWeek)
            {
                Console.WriteLine(day);
            }

Notice we can replace an array item as such : daysOfWeek[4]="PartyDay"

.

Since arrays have a fixed size, we can not remove or add an item.

Arrays under the Hood
  • Special syntax in c# i.e. int[] iHaveSquareBrackets. No other type will use square brackets
  • T[] is a reference type
    
    void Main()
    {
        int[] x = new int[5]{1234,23,356,7,6};
        change(x);
        Console.WriteLine (x[0]);
    }
    
    // Define other methods and classes here
    
    public void change(int[] intArray)
    {
        intArray[0] = 5;
    }
    
    The output is 5;
  • Implemented in the CLR itself
  • Other Collections are implemented using generics. And internally they are mostly implemented using Arrays!

    For example, the generic collection List<T> is a wrapper around T[] and the wrapper adds additional features.

  • How arrays are stored
    • Consider an integer array
      
      int[] squares = {1, 2, 9, 16, 25}
      

      int occupies 4 bytes. This array is stored in a continous memory stream. Each array element is of 4 bytes.

      Since all elements are of equal size, consider the memory address of the 0th element at X. 1st element will be at X+4Bytes. 2nd element will be at X+2*4Bytes. Nth element will be at X+(N-1)*4Bytes

      Now, when we lookup an array element this direct formula can be applied. This demonstrates the speed of accessing an array element.

      int have a fixed size of 4 Bytes, but what about a complex type whose size can vary. In this case the formula becomes X+(N-1)*YBytes where Y in case of int was 4 but for comples types it can not be determined always.

      Are we forgetting something :). Complex types are reference types. Each reference points to a memory location which is Y Bytes. In case of T[] the continous memory stream is of references, not actual data and fortunately in 64 bit machine, this reference size is of 8 bytes. This makes the formula as X+(N-1)*8Bytes, and once again the jumping or array lookup becomes quick.

      Note that these references are actually memory addresses. Addresses are always integers

    • Let's try to understand why arrays are of fixed size:

      Here we can see that if we want to add some more elements to an array, the memory stream after the last element of the array might not be free. If it is not free, the way of accessing an array i.e. X+(N-1)*YBytes will not hold.
How Arrays are initialized
  • Since arrays are refernce types, the below code is not initialized yet.
    
    int[] x;
    Console.WriteLine(x[0].ToString());
    
    The above code will throw an error "use of unassigned variable". But if you just new it up and don't assign any value, just like a reference type, the variable will be filled with the default values.
    
    int[] x = new int[5];
    Console.WriteLine(x[1].ToString());
    
    This will output 0 as default value for int in reference types is 0.
  • Array Initializer

    • 
      string[] daysOfWeek = {
                              "Monday",
                              "Tuesday",
                              "Wednesday",
                              "Thursday",
                              "Friday",
                              "Saturday",
                              "Sunday"
                          };
      
      
      
    • 
      
      
      string[] DaysOfWeek = new string[7]{
              "Monday",
              "Tuesday",
              "Wednesday",
              "Thursday",
              "Friday",
              "Saturday",
              "Sunday"
      };
      
    • Without the size
      
      
      
      string[] DaysOfWeek = new string[]{
              "Monday",
              "Tuesday",
              "Wednesday",
              "Thursday",
              "Friday",
              "Saturday",
              "Sunday"
      };
      
    Any expression that will evaluate at run time can come in the initializer

  • Consider [] as the Array constructor
  • There is no way to initialize the array in the Array constructor
  • Only thing that can be given in the Array constructor is the size of an array
Enumerating an Array
Enumerating through foreach and for. Difference as a developer is that index is available in for. But internally c# compiler converts foreach to for.

using System;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {

            int[] x = { 1, 2, 3, 4, 5 };
            foreach (int item in x)
            {
                Console.WriteLine(item);
            }
            Console.WriteLine();

            for (int i = 0; i < x.Length; i++)
            {
                Console.WriteLine(x[i] );
            }
            Console.ReadLine();
        }
    }
}

For Arrays the compiler internally converts foreach to for but for other collection types it treats it differently.

Usually for other collections its much easier to use foreach, so why have for at all.

  • Access index
  • Conditions based on index
  • foreach can not run for specific element because the index is not available.
  • foreach is a readonly. i.e. we can not change the value of an element inside a foreach . But in a forloop we can
    
    using System;
    
    namespace BasicArrayOps
    {
        class Program
        {
            static void Main(string[] args)
            {
    
                int[] x = { 1, 2, 3, 4, 5 };
                for (int i = 0; i < x.Length; i++)
                {
                    if (x[i]==3)
                    {
                        x[i] = 5000;
                    }
                    Console.WriteLine(x[i]);
                }
                Console.ReadLine();
            }
        }
    }
    
    
    
    using System;
    
    namespace BasicArrayOps
    {
        class Program
        {
            static void Main(string[] args)
            {
    
                int[] x = { 1, 2, 3, 4, 5 };
    
                foreach (var item in x)
                {
                    if (item==3)
                    {
                        item = 5000;
                    }
                    Console.WriteLine(item);
                }
                Console.ReadLine();
            }
        }
    }
    
    
    This is because the iteration placeholder variable cannot be assigned to. It is a readonly.

The reason why the iteration placeholder variable cannot be assigned to can be explained by the below code. Remember we said that internally foreach is replaced by a for for arrays.


using System;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {

            int[] x = { 1, 2, 3, 4, 5 };

            foreach (var item in x)
            {
                if (item==3)
                {
                    item = 5000;
                }
                Console.WriteLine(item);
            }
            Console.ReadLine();
        }
    }
}

The similar for equivalent can be

using System;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {

            int[] x = { 1, 2, 3, 4, 5 };
            for (int i = 0; i < x.Length; i++)
            {
                int item = x[i];
                if (item == 3)
                {
                    item = 5000;
                }
                Console.WriteLine(x[i]);
            }

            Console.ReadLine();
        }
    }
}


This shows that element can not be replaced by another value. In case of immutable type's arrays the value is copied and can not be changed. In case of mutable reference type's arrays the reference is copied and can not be changed. BUT the reference can be modified.

using System;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {

            Employee[] employees = new Employee[]{
                new Employee(){EmpId=1, Name="susingh"},
                new Employee(){EmpId=2, Name="scott"}
            };

            foreach (var employee in employees)
            {
                employee.Name = "Ben";
                Console.WriteLine(employee.Name);
            }

            Console.ReadLine();
        }
    }

    public class Employee
    {
        public int EmpId { get; set; }
        public string Name { get; set; }
    }
}

But the below will throw an error

using System;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {

            Employee[] employees = new Employee[]{
                new Employee(){EmpId=1, Name="susingh"},
                new Employee(){EmpId=2, Name="scott"}
            };

            foreach (var employee in employees)
            {
                employee = new Employee() { EmpId = 5, Name = "Ben" };
                //employee.Name = "Ben";
                Console.WriteLine(employee.Name);
            }

            Console.ReadLine();
        }
    }

    public class Employee
    {
        public int EmpId { get; set; }
        public string Name { get; set; }
    }
}

Module Overview
  • Arrays and the type system
    • Reference Type
  • Storing derived Type Instances
    • Covariance
  • Array capabilities
    • Copying Arrays
    • Sorting elements
    • Finding elements
Arrays as Reference Types
Proof that Arrays are reference types

using System;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            int[] x = { 1, 4, 9, 16 };
            var y = x;
            Console.WriteLine("Reference Equals(x,y): " + ReferenceEquals(x,y));

            Console.ReadLine();
        }
    }

}

Since Arrays are Reference types, == is same as ReferenceEquals.

All Collections are Refernces.

Storing Derived Instances in Arrays
In case we create an array of a Type Base, any derived type can be stored in this array.

using System;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            Animal[] animals = new Animal[] {
                new Animal(){TailLength="10"},
                new Dog(){Name = "Bruno", TailLength="8"},
                new Doberman(){Name="Jimmy", TailLength="2"}
            };

            foreach (Animal animal in animals)
            {
                Console.WriteLine("AnimalType: "+animal.GetType().FullName+", TailLength: "+animal.TailLength);
            }

            Console.ReadLine();
        }
    }

    public class Animal
    {
        public string TailLength { get; set; }
    }

    public class Dog : Animal
    {
        public string Name { get; set; }
    }

    public class Doberman : Dog
    {

    }
}

This applies to all Collections and not just Arrays

What is the Type of an Array
Any new Array type T[] is derived from System.Array which is derived from System.Object

Any new Array type is dynamically created at runtime

Array Covariance
You might assume that an array of Derived might be directly related to an array of Base. But this is not the case. Both Derived & Base derive from System.Array which in turn derive from System.Object

We know Type of a Derived array does not have a inheritance relationship with Type of a Base Array. But look at the below code

This works! even though we are assigning a completely different type to another type. This is called Implicit casting. Derived[] can be implicitly casted to Base[]. This feature of Arrays is called Array Covariance

Notice that this doesn't works for an array of value types.


using System;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            int[] daysOfWeek = { 1, 2, 3, 4, 5, 6, 7 };
            object[] objArr = daysOfWeek;

            foreach (var item in objArr)
            {
                Console.WriteLine(item);
            }


            Console.ReadLine();
        }
    }

}

Array Covariance is a problem. In the code above to above notice objArr. After assigning daysOfWeek to objArr we can still change objArr[0]=Foo and not a string. This won't throw an error at compile time but it will throw an error at runtime. The reason is that when daysOfWeek was assigned to objArr, the type of objArr was created. Now when some other type was tried to insert into objArr it threw a System.ArrayTypeMismatchException This only says to us "don't use Array Covariance"

Covariance is supported by Array and IEnumerable<T> and IEnumerator<T>. The later throws a compile time error unlike arrays, so it is useful and good.

Covariance except the ones described above are not supported by any other Collection. i.e. List<Derived> can not be assigned to List<Base>



using System;
using System.Collections.Generic;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            List<string> lstString = new List<String>(){
                                     "Monday",
                                     "Tuesday"
                                     };
            List<object> lstObject = lstString;

            Console.ReadLine();
        }
    }

}

What Arrays can do
  • Go to the msdn page for arrays to find the full set of functionalities
  • Array Features
    • Copying the array- x.CopyTo(y,0) where x & y are arrays of same types or Array.Copy
    • Rearranging elements -Array.Reverse, Array.Sort for reversing inplace. use x.Reverse() of System.Linq to return a reversed array and leaving the old array as it is.
    • Finding elements - Array.IndexOfor Array.BinarySearch
    • Sorting

      Lets talk about IComparer. An IComparer<T> says that i know how to compare 2 objects of type T.

      Take the example of comparing int

      
      using System;
      
      namespace BasicArrayOps
      {
          class Program
          {
              static void Main(string[] args)
              {
                  int x = 5;
                  int y = 10;
                  Console.WriteLine(y.CompareTo(x)); //outputs 1
                  Console.WriteLine(x.CompareTo(y)); //outputs -1
      
      
                  Console.ReadLine();
              }
          }
      
      }
      
      
      

      Another example of string

      
      using System;
      
      namespace BasicArrayOps
      {
          class Program
          {
              static void Main(string[] args)
              {
                  string x = "Monday";
                  string y = "Wednesday";
                  Console.WriteLine(x.CompareTo(y)); //outputs -1
      
                  Console.ReadLine();
              }
          }
      
      }
      
      
      As per the definition in msdn it says "Compares this instance with a specified Object and indicates whether this instance precedes, follows, or appears in the same position in the sort order as the specified string. It compares alphabetically. a comes before b so a.CompareTo(b) will give -1. It actually compares based on a lot of factors as mentioned here. See the "Remarks" section.

      But what about Complex types. How do we compare them? Usually we don't make a direct comparison of two complex types. But consider the case of Sorting a collection of the type. In this case we would need a way to compare them. Array.Sort accepts the array to be sorted and an IComparer. This IComparer will be of your type. Basically an IComparer says that i know how to compare 2 objects of a type.

      
      using System;
      using System.Collections.Generic;
      
      namespace BasicArrayOps
      {
          class Program
          {
              static void Main(string[] args)
              {
                  Employee e1 = new Employee { Name = "Suyash", EmpId = 2 };
                  Employee e2 = new Employee { Name = "Scott", EmpId = 1 };
      
                  Employee[] employees = new Employee[] { e1, e2 };
                  Array.Sort(employees, new EmployeeComparer());
      
                  foreach (var employee in employees)
                  {
                      Console.WriteLine(employee.Name);
                  }
      
                  Console.WriteLine();
      
      
                  foreach (var employee in employees)
                  {
                      Console.WriteLine(employee.Name);
                  }
      
      
                  Console.ReadLine();
              }
      
          }
      
      
          public class Employee
          {
              public string Name { get; set; }
              public int EmpId { get; set; }
          }
      
          public class EmployeeComparer : IComparer<Employee>
          {
              public int Compare(Employee x, Employee y)
              {
                  if (x.EmpId < y.EmpId)
                  {
                      return -1;
                  }
                  else if (x.EmpId>y.EmpId)
                  {
                      return 1;
                  }
                  return 0;
              }
          }
      
      }
      
      
      

      Notice that "Suyash" and "Scott" were not sorted in the first output, but sorted based on the employee id in the second output

  • LINQ extension methods - using System.Linq

Collection Interfaces
Module Overview
  • Interface hierarchy

    • Core generic interfaces
    • Read-only interfaces
    • Older non-generic interfaces
  • Specific interfaces
    • IEnumerable<T>
    • ICollection<T>
    • IList<T>
    • IDictionary<TKey, TValue>
    • ISet<T>
  • Explicit Implementation
Core Generic Interfaces
  • IEnumerable<T>. It says "You can iterate my elements". Linq hooks in here because most of the Linq extension methods are on IEnumerable<T> and its derived types.
  • ICollection<T>. It says "I know how many elements i have". A Type which implements this interface is really a collection. This is what represents an in memory data structure that has one or more elements.

    This differs from an IEnumerable<T>. A collection may have an enumerator through which we can iterate over the elements of a collection. It is possible to have a collection which may deny iteration over all the elements. It may only allow other Collection operations like "Looking up for an element" or other Writing operations.

    ICollection<T> has a property Count because it is supposed to know and do all the operations on a collection. However, Enumeration is just one of the operation that a collection can have. IEnumerable<T> does not have a Count property.

  • ICollection<T> doesn't say anything about the type of collection. Is it an IList<T>, an IDictionary<TKey, TValue> or an ISet<T>
ReadOnly Interfaces
These are new in .NET4.5
  • IReadOnlyCollection<T> It too derives from IEnumerable<T> but remember it is only for enumeration and nothing else like modification of collection.
  • IReadOnlyList<T>
  • IReadOnlyDictionary<T>
  • No Read Only collection for Set
Old Non-Generic Interfaces
  • IEnumerable Only thing worth noticing is that IEnumerable<T> is derived from IEnumerable. The reason for this was that you should always be able to cast a generic interface to the old non generic interface as such:

    Why would we want to do that? Well the answer is backward compatibility.

  • ICollection
  • IList
  • IDictionary
IEnumerable<T>

This just has one member.


IEnumerator<T> GetEnumerator()

Only purpose of a type which implements an IEnumerable<T> is to provide an enumerator. This enumerator is reponsible to enumerate over the elements

foreach can enumerate over an IEnumerable<T>

Note: foreach is not restricted to enumerate just the types which are IEnumerable<T>

ICollection<T>

ICollection<T> has a property Count and IEnumerable<T> has an extension method Count() provided by System.Linq. This method internally refers to Count property if available.

ICollection<T>.IsReadOnly lets you know if it is "OK" to add/remove/modify an element


using System;
using System.Collections.Generic;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] daysOfWeek = {
                                      "Monday",
                                      "Tuesday",
                                      "Wednesday",
                                      "Thursday",
                                      "Friday",
                                      "Saturday",
                                      "Sunday"
                                  };
            ICollection<string> daysOfWeekCollection = daysOfWeek;

            if (!daysOfWeekCollection.IsReadOnly)
            {
                daysOfWeekCollection.Add("SlaveDay");
            }
            else
            {
                Console.WriteLine("daysOfWeekCollection is a readonly");
            }

            foreach (string dayOfWeek in daysOfWeekCollection)
            {
                Console.WriteLine(dayOfWeek);
            }

            Console.ReadLine();
        }

    }

}


Add on an ICollection<T> is available but may fail if the collection actually is a readOnly. However ICollection<T>.IsReadOnly does not restricts the modification of the collection element.


using System;
using System.Collections.Generic;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] daysOfWeek = {
                                      "Monday",
                                      "Tuesday",
                                      "Wednesday",
                                      "Thursday",
                                      "Friday",
                                      "Saturday",
                                      "Sunday"
                                  };
            ICollection<string> daysOfWeekCollection = daysOfWeek;

            (daysOfWeekCollection as string[])[5] = "Slaveday";

            foreach (string dayOfWeek in daysOfWeekCollection)
            {
                Console.WriteLine(dayOfWeek);
            }

            Console.ReadLine();
        }

    }

}

Be careful about IsReadOnly property

Explicit Interface Implementation
Collection such as an array needs to be explicitly casted to an Interface to have some methods available.
IReadOnlyCollection<T>
IList<T>

It says "You can look up my elements with an index". So IList<T> is a collection that allows element lookup via index. Also the enumerator is available.

Arrays implement IList<T>

IReadOnlyList<T>

IDictionary<TKey, TValue>

It says "You can look up my elements with a key"

An IDictionary<TKey, TValue> is essentially a collection of KeyValuePair<TKey, TValue>

IReadOnlyDictionary<TKey, TValue>

indexer is obviously only readonly

ISet<T>
Lists
Overview
  • List<T>
    • Extensive API
    • Like array but with adding/removing elements
  • ReadOnlyCollection<T>
    • Read-only wrapper for lists
  • Collection<T>
    • Allows lists to be customized
  • ObservableCollection<T>
    • List with change notifications
List<T>

It is intended to give you all that an Array offers i.e. High performance, Index based. But with the bonus of Adding and Removing elements

I mentioned earlier that internally List<T> uses an T[]. What happens is that by default List<T> references to a T[] of size 8. This can be displayed by the List<T>.Capacity Property. When we start entering values in this List<T>, the List<T>.Count property increases it's value. When List<T>.Count wants to be greater than List<T>.Capacity then the old reference to T[] is discarded, starts pointing to another array T2[] and the values from T[] is copied over to T2[]. Now, List<T>.Count increases by 1 and List<T>.Capacity becomes 16. Let's see this in code.


using System;
using System.Collections.Generic;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            var squares = new List<int> { 1, 4, 9, 16, 25, 36, 49 };
            Console.WriteLine("squares.Count: "+squares.Count.ToString());
            Console.WriteLine("squares.Capacity: "+squares.Capacity.ToString());
            Console.ReadLine();
        }

    }
}


Lets Add 2 more values to the List<int> and see the List<int>.Count & List<int>.Capacity values change.


using System;
using System.Collections.Generic;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            var squares = new List<int> { 1, 4, 9, 16, 25, 36, 49 };
            Console.WriteLine("squares.Count: "+squares.Count.ToString());
            Console.WriteLine("squares.Capacity: "+squares.Capacity.ToString());
            Console.WriteLine();
            squares.Add(64);
            squares.Add(81);
            Console.WriteLine("squares.Count: " + squares.Count.ToString());
            Console.WriteLine("squares.Capacity: " + squares.Capacity.ToString());
            Console.ReadLine();
        }

    }




}


This may lead an idea of retrieving a value at an index that is not filled, but for which the capacity exists. This will throw an ArgumentOutOfRange exception.

When we remove an element from the List<T> the values after that value iterate till the end and gets copied over to previous index. If the value to be removed is at around the end, the operation will be faster. But if the value to be removed is at beginning, it will take a lot of time.

To avoid increasing List<T>.Capacity when we keep adding elements to the list, in case we already know an approximate number of elements that the list is about to have, we can initialize our list with an initial value of Capacity.


using System;
using System.Collections.Generic;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            var squares = new List<int>(12) { 1, 4, 9, 16, 25, 36, 49 };
            Console.WriteLine("squares.Count: "+squares.Count.ToString());
            Console.WriteLine("squares.Capacity: "+squares.Capacity.ToString());
            Console.ReadLine();
        }

    }




}


AsReadOnly() and ReadOnlyCollection<T>

AsReadOnly() can return a List<T> which will be readOnly. Let's see this in code.


using System;
using System.Collections.Generic;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            var squares = new List<int>(12) { 1, 4, 9, 16, 25, 36, 49 };

            var copy = squares.ToArray();//This provides a copy of my list but consider a list of million elements that will take a lot of time to get copied over

            //Instead use AsReadOnly() which without copying will just make the list Read Only for copy2

            var copy2 = squares.AsReadOnly();//Notice this returns a ReadOnlyCollection<int>
            var copy3 = new ReadOnlyCollection<int>(squares);//Another way to initialize a ReadOnlyCollection<int>

            //Trying to change a value
            copy2[0] = 0; // Throws a Compile time error.

            Console.ReadLine();
        }
    }
}


ReadOnlyCollection<T> is actually a Read Only List<T> so the name ReadOnlyCollection is a bit misleading.

Collection<T>

List<T> methods can not be overriden. But Collection<T> makes it possible to override some operations like what else should happen when an element is added/removed etc. Let's take an example of a Collection that only allows Non-Blank strings. Let's see this in code

InsertItem & SetItem are virtual methods.

The Interface methods Add(T item), Insert() etc. that come as a contract from the interfaces like IList<T> internally call InsertItem(int index, T item) which is provided as an extensibility point by Collection<T>. So i can create another class that inherits from Collection<T> and override these extensibility points. Lets see this in code.


using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            NonBlankStringList lst = new NonBlankStringList();
            lst.Add("Item added at index 0");
            lst.Add(String.Empty);//Throws exception
            lst[0] = " ";//Throws exception

            Console.ReadLine();
        }
    }

    public class NonBlankStringList : Collection<string>
    {
        protected override void InsertItem(int index, string item)
        {
            if (String.IsNullOrWhiteSpace(item))
            {
                throw new ArgumentException("Elements of NonBlankStringList must not be null or whitespace");
            }
            base.InsertItem(index, item);
        }

        protected override void SetItem(int index, string item)
        {
            if (String.IsNullOrWhiteSpace(item))
            {
                throw new ArgumentException("Elements of NonBlankStringList must not be null or whitespace");
            }
            base.SetItem(index, item);
        }
    }
}


ObservableCollection<T>

It is designed to be a List that provides change notifications if anything in the list changes.


using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            ObservableCollection<string> presidents = new ObservableCollection<string>{
                "Jimmy Carter",
                "Ronald Reagan",
                "G W Bush"
            };

            foreach (var president in presidents)
            {
                Console.WriteLine(president);
            }

            presidents.CollectionChanged += presidents_CollectionChanged;

            presidents[0] = "Bill Clinton";

            Console.ReadLine();
        }

        static void presidents_CollectionChanged(object sender, System.Collections.Specialized.NotifyCollectionChangedEventArgs e)
        {
            Console.WriteLine();
            Console.WriteLine("President Changed");
            Console.WriteLine();
        }
    }


}


ObservableCollection<T> actually uses the extensibility points offered by Collection<T>

Linked Lists, Stacks and Queues

They are lists but they don't provide index based element lookup because they are designed for specific data structures which are Linked Lists, Stacks and Queues

LinkedList<T> are used for the scenarios where a lot of Adding or Removing of elements happen

LinkedList<T> allows to insert an element in between. But so does List<T>.Insert method. The reason we would favor LinkedList<T> is that it takes very few operations (4) to insert an element in between

LinkedList<T> & LinkedListNode<T>

LinkedListNode<T> is required for adding items to a linked list

Collection Initializer is not available for LinkedList<T>

Stack<T> is a wrapper around T[]

Queue<T> is a little more complex wrapper around T[]

Dictionaries
  • Dictionary<TKey, TValue>
    • General purpose dictionary
  • ReadOnlyDictionary<TKey, TValue>
  • SortedList<TKey, TValue> & SortedDictionary<TKey, TValue>
    • Dictionaries that sort their elements
  • KeyedCollection<TKey, TValue>
    • Customizable
  • Hash tables
    • Hash codes
Dictionary<TKey, TValue>

Just a basic implementation of IDictionary<TKey, TValue>

General Working, Let's see the code


using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            var harryPotterMovies = new Dictionary<String, HarryPotterMovie>()
            {
                {"PS",  new HarryPotterMovie{Name="Harry Potter and the Philosopher's Stone", ReleaseYear=2001}},
                {"CS",  new HarryPotterMovie{Name = "Harry Potter and the Chamber of Secrets", ReleaseYear=2002}},
                {"PA",  new HarryPotterMovie{Name = "Harry Potter and the Prisoner of Azkaban", ReleaseYear=2004}},
                {"GF",  new HarryPotterMovie{Name="Harry Potter and the Goblet of Fire", ReleaseYear=2005}},
                {"DH1", new HarryPotterMovie{Name="Harry Potter and the Deathly Hallows: Part 1", ReleaseYear=2010}}
            };

            foreach (var item in harryPotterMovies)
            {
                Console.WriteLine(item);
            }

            Console.ReadLine();
        }
    }

    public class HarryPotterMovie
    {
        public String Name { get; set; }
        public Int32 ReleaseYear { get; set; }

        public HarryPotterMovie()
        {

        }

        public HarryPotterMovie(String _name, Int32 _releaseYear)
        {
            Name = _name;
            ReleaseYear = _releaseYear;
        }

        public override string ToString()
        {
            return String.Format("{0}, Release: {1}", Name, ReleaseYear);
        }
    }
}


Notice the output is specially formatted with square brackets. This is because the element in the foreach loop is a KeyValuePair<TKey, TValue> whose ToString() method is overridden to display key and value within square brackets.

In case we want to access somthing that is not available in the Dictionary, a KeyNotFoundException exception is thrown. To avoid this exception we use the TryGetValue method. Let's see this in code.


using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            var harryPotterMovies = new Dictionary<String, HarryPotterMovie>()
            {
                {"PS",  new HarryPotterMovie{Name="Harry Potter and the Philosopher's Stone", ReleaseYear=2001}},
                {"CS",  new HarryPotterMovie{Name = "Harry Potter and the Chamber of Secrets", ReleaseYear=2002}},
                {"PA",  new HarryPotterMovie{Name = "Harry Potter and the Prisoner of Azkaban", ReleaseYear=2004}},
                {"GF",  new HarryPotterMovie{Name="Harry Potter and the Goblet of Fire", ReleaseYear=2005}},
                {"DH1", new HarryPotterMovie{Name="Harry Potter and the Deathly Hallows: Part 1", ReleaseYear=2010}}
            };

            foreach (var item in harryPotterMovies)
            {
                Console.WriteLine(item);
            }

            Console.WriteLine();

            Console.WriteLine(harryPotterMovies["DH2"]);// KeyNotFound Exception

            HarryPotterMovie movie;
            var found = harryPotterMovies.TryGetValue("DH2", out movie);
            if (found)
            {
                Console.WriteLine("{0}, Release: {1}", movie.Name, movie.ReleaseYear);
            }
            else
            {
                Console.WriteLine("Movie not found");
            }


            Console.ReadLine();
        }
    }

    public class HarryPotterMovie
    {
        public String Name { get; set; }
        public Int32 ReleaseYear { get; set; }

        public HarryPotterMovie()
        {

        }

        public HarryPotterMovie(String _name, Int32 _releaseYear)
        {
            Name = _name;
            ReleaseYear = _releaseYear;
        }

        public override string ToString()
        {
            return String.Format("{0}, Release: {1}", Name, ReleaseYear);
        }
    }
}


Adding/Modifying/Removing the Dictionary<TKey, TValue> Let's see this in code.


using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            var harryPotterMovies = new Dictionary<String, HarryPotterMovie>()
            {
                {"PS",  new HarryPotterMovie{Name="Harry Potter and the Philosopher's Stone", ReleaseYear=2001}},
                {"CS",  new HarryPotterMovie{Name = "Harry Potter and the Chamber of Secrets", ReleaseYear=2002}},
                {"PA",  new HarryPotterMovie{Name = "Harry Potter and the Prisoner of Azkaban", ReleaseYear=2004}},
                {"GF",  new HarryPotterMovie{Name="Harry Potter and the Goblet of Fire", ReleaseYear=2005}},
                {"DH1", new HarryPotterMovie{Name="Harry Potter and the Deathly Hallows: Part1", ReleaseYear=2010}}
            };



            harryPotterMovies.Add("DH2", new HarryPotterMovie { Name = "Harry Potter and the Deathly Hallows: Part2", ReleaseYear = 2011 });// This will add the value.

            harryPotterMovies["DH2"] = new HarryPotterMovie { Name = "Harry Potter and the Deathly Hallows: Part2", ReleaseYear = 2011 };// Indexer can add a value

            harryPotterMovies["DH1"] = new HarryPotterMovie { Name = "Harry Potter and the Deathly Hallows: The First Part", ReleaseYear = 2011 };// Indexer can also change the value

            harryPotterMovies.Add("DH1", new HarryPotterMovie { Name = "Foo", ReleaseYear = 2015 }); //This throws an ArgumentException because a dictionary must have unique Keys

            harryPotterMovies.Remove("DH1");//This will remove the Dictionary element with the Key "DH1"

            foreach (var item in harryPotterMovies)
            {
                Console.WriteLine(item);
            }

            Console.ReadLine();
        }
    }

    public class HarryPotterMovie
    {
        public String Name { get; set; }
        public Int32 ReleaseYear { get; set; }

        public HarryPotterMovie()
        {

        }

        public HarryPotterMovie(String _name, Int32 _releaseYear)
        {
            Name = _name;
            ReleaseYear = _releaseYear;
        }

        public override string ToString()
        {
            return String.Format("{0}, Release: {1}", Name, ReleaseYear);
        }
    }
}


Note that a dictionary must have unique keys.

Dictionary<TKey, TValue>.Keys & Dictionary<TKey, TValue>.Values are enumerable collections. They can be used when you are working with a dictionary provided by a third party

Keys of Dictionary<Tkey, TValue> are case sensitive

Comparing Keys with IEqualityComparer<T>

Keys of Dictionary<Tkey, TValue> are case sensitive. So trying to access a dictionary element by providing a key with wrong casing will throw an exception.

To fix this we need to use an overload of Dictionary<TKey, TValue> which accepts an IEqualityComparer<T>. Microsoft provides one such comparer by the StringComparer class. StringComparer has several static methods one of which is StringComparer.InvariantCultureIgnoreCase which creates a Dictionary<TKey, TValue> with case insensitive keys. Let's see this in code


using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            var harryPotterMovies = new Dictionary<String, HarryPotterMovie>(StringComparer.InvariantCultureIgnoreCase)
            {
                {"PS",  new HarryPotterMovie{Name="Harry Potter and the Philosopher's Stone", ReleaseYear=2001}},
                {"CS",  new HarryPotterMovie{Name = "Harry Potter and the Chamber of Secrets", ReleaseYear=2002}},
                {"PA",  new HarryPotterMovie{Name = "Harry Potter and the Prisoner of Azkaban", ReleaseYear=2004}},
                {"GF",  new HarryPotterMovie{Name="Harry Potter and the Goblet of Fire", ReleaseYear=2005}},
                {"DH1", new HarryPotterMovie{Name="Harry Potter and the Deathly Hallows: Part1", ReleaseYear=2010}}
            };


            Console.WriteLine(harryPotterMovies["ps"]);//No exception thrown when dictionary is constructed using StringComparer.InvariantCultureIgnoreCase


            Console.ReadLine();
        }
    }

    public class HarryPotterMovie
    {
        public String Name { get; set; }
        public Int32 ReleaseYear { get; set; }

        public HarryPotterMovie()
        {

        }

        public HarryPotterMovie(String _name, Int32 _releaseYear)
        {
            Name = _name;
            ReleaseYear = _releaseYear;
        }

        public override string ToString()
        {
            return String.Format("{0}, Release: {1}", Name, ReleaseYear);
        }
    }
}


But in any case lets create our own IEqualityComparer<T>. Before we we implement our own IEqualityComparer<T> let's see the interface once:

We see a method GetHashCode. To understand this we first need to understand how a Dictionary<TKey, TValue> works internally.

  • A Dictionary<TKey, TValue> has a lot of buckets in which the KeyValuePair<TKey, TValue> are kept.
  • When a new KeyValuePair<TKey, TValue> is added to the dictionary, it is passed through Some Algorithm which decides the bucket for this KeyValuePair<TKey, TValue.
  • One bucket can contain many KeyValuePair<TKey, TValue> and the Dictionary<TKey, TValue> can contain many buckets
  • These buckets internally implement LinkedList<T> so overall adding/removing of a KeyValuePair<TKey, TValue> is very fast
  • On Key Lookup the key is passed to the Some Algorithm to identify the bucket. Key lookup happens in the bucket one by one and when found it is returned.
  • This Some Algorithm actually uses the Object.GetHashCode() to identify the bucket. In fact every object in c# has GetHashCode() so that it can be used to insert and identify an object in hash-based collection such as the Dictionary<TKey, TValue>, the Hashtable class or a type derived from the DictionaryBase class. HashCode is a numeric value.

Coming back to our IEqualityComparer<T>. Lets see the code.


using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            var harryPotterMovies = new Dictionary<String, HarryPotterMovie>(new UncasedStringEqualityComparer())
            {
                {"PS",  new HarryPotterMovie{Name="Harry Potter and the Philosopher's Stone", ReleaseYear=2001}},
                {"CS",  new HarryPotterMovie{Name = "Harry Potter and the Chamber of Secrets", ReleaseYear=2002}},
                {"PA",  new HarryPotterMovie{Name = "Harry Potter and the Prisoner of Azkaban", ReleaseYear=2004}},
                {"GF",  new HarryPotterMovie{Name="Harry Potter and the Goblet of Fire", ReleaseYear=2005}},
                {"DH1", new HarryPotterMovie{Name="Harry Potter and the Deathly Hallows: Part1", ReleaseYear=2010}}
            };


            Console.WriteLine(harryPotterMovies["ps"]);//No exception thrown when dictionary is constructed using StringComparer.InvariantCultureIgnoreCase


            Console.ReadLine();
        }
    }

    public class HarryPotterMovie
    {
        public String Name { get; set; }
        public Int32 ReleaseYear { get; set; }

        public HarryPotterMovie()
        {

        }

        public HarryPotterMovie(String _name, Int32 _releaseYear)
        {
            Name = _name;
            ReleaseYear = _releaseYear;
        }

        public override string ToString()
        {
            return String.Format("{0}, Release: {1}", Name, ReleaseYear);
        }
    }

    public class UncasedStringEqualityComparer : IEqualityComparer<String>
    {

        public bool Equals(string x, string y)
        {
            return x.ToUpper() == y.ToUpper();
        }

        public int GetHashCode(string obj)
        {
            return obj.GetHashCode();
        }
    }
}


But this code throws an exception "KeyNotFound". What happened?

Well we know that "PS" & "ps" are essentially 2 different strings. So, ("PS").GetHashCode() & ("ps").GetHashCode() will be different. When we added PS to our Dictionary, the Some Algorithm used ("PS").GetHashCode() and moved the value to Bucket1. On Lookup Some Algorithm will produce a different value for ("ps").GetHashCode() and will lookup in another Bucket2. So a KeyNotFound exception.

So, in our custom IEqualityComparer<T> we will implement GetHashCode() as such:


return obj.ToUpper().GetHashCode();

And now if we run our code, it works just fine!

Something about HashCodes

  • If two objects are equal, they must return the same hash code

ReadOnlyDictionary<TKey, TValue>

Wrapper around Dictionary<TKey, TValue> with only read only properties

Add method is not exposed

To initialize it pass the Dictionary<TKey, TValue> to the constructor.

SortedList<TKey, TValue>

Similar as dictionary, but on enumeration it will output elements in a sorted order of the Key.

The name contains "List", but it is actually a "Dictionary". The name contains "List" because remember List is something that has something to do with the order.

Modifications are slow because on adding/removing the list needs to get sorted again.

Pass other collections to initialize

Constructor accepts an IComparer<T>. We have already seen how to create an IComparer<T>

SortedDictionary<TKey, TValue>

In terms of functionality SortedDictionary<TKey, TValue> is exactly same as SortedList<TKey, TValue>

The difference is in the internal implementation. SortedDictionary<TKey, TValue> uses a data structure called "Balanced Tree"

Faster than SortedList<TKey, TValue> in terms of adding/removing/modifying/lookup

KeyedCollection<TKey, TValue>

Notice the Key is being repeated in both the Key and the Value. Change the code as per below:

This is not the only reason why we use KeyedCollection<TKey, TValue. We want to use it because our Dictionary is now customizable. See the overridable methods.

It is not only a Dictionary. It is a List too!!. This means as well as looking up for element by Key, we can also look up by key. This results in a problem when the key is int. If we cast it to a List and then use the indexer for lookup using int indexes, it works.

Sets
Module Overview
  • HashSet<T>
  • SortedSet<T>
  • The ISet<T>
    • Set Operations
    • Set comparisons
  • Uniqueness of Elements
HashSet<T>

HashSet<T> is a set as it implements ISet<T>. But is is important for another reason. It represents a collection that has no exposed property of order or location of an element. It just represents what a collection contains and what it doesn't. Well, this can be done by a List<T> too. But remember a List<T> can have duplicate values; wheras a HashSet<T> will not. It provides uniqueness.

Its internal implementation is somewhat like a Dictionary<TKey, TValue. i.e. buckets. Only difference is that the buckets have values not keys.

Let's see the code


using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            var cities = new List<string>()
            {
                "Delhi",
                "Mumbai",
                "Chennai",
                "Kolkata"
            };

            cities.Add("Kolkata");

            foreach (var item in cities)
            {
                Console.WriteLine(item);
            }

            Console.WriteLine();

            var hashCities = new HashSet<string>()
            {
                "Delhi",
                "Mumbai",
                "Chennai",
                "Kolkata"
            };

            hashCities.Add("Kolkata");

            foreach (var item in hashCities)
            {
                Console.WriteLine(item);
            }

            Console.ReadLine();
        }
    }


}


If you notice adding another "Kolkata" didn't throw an error. a HashSet<T> checks the bucket if the value already exists or not. Adds if not. So there is Good performance for checking if an element is already in the set.

So there is a guarantee that there can not be a value more than once in a HashSet<T>

The HashSet<T>.Add method returns a Boolean that represents if the value was added or not(based on if it already existed in the bucket or not)

Another business issue can be that "Mumbai" & "mumbai" are 2 different strings for HashSet<T>. To avoid this there is an extensibility point. Provide an IEqualityComparer<T> to the constructor.

Intersection and IntersectsWith()

IntersetsWith() modifies the set inline and accepts any collection as the 2nd parameter

Intersection is a System.Linq extension method. It returns a new Enumerable. This works with any collection and not just HashSet<T>

Union, Difference and Symmetric Difference

Union & UnionWith

SymmetricExceptWith: Find values that are in ONLY ONE collection

ExceptWith, Except: Find values that are n ONLY THE FIRST collection

Comparing Elements and SetEquals()

Set Comparisons and Subsets
SoretedSet<T>

Instead of using a Hashset, it uses a balanced tree

Enumerating elements gives them in order

To distinguish between upper case and lower case, pass a IComparer<T>

Enumerators

Notice for every collection, foreach works in the same way.

  • Iterating under the hood
    • IEnumerable<T>
    • IEnumerator<T>
  • the foreach loop : How it works using enumerators
  • Enumerating collections that change
  • Writing your own enumerators
  • Enumerator covariance

Enumerators and IEnumerator<T>

The only contract of IEnumerable<T> is GetEnumerator(). It will be the job of this enumerator to enumerte over.

MoveNext() will iterate and Current will get the current element. MoveNext() returns a Boolean. true if it has moved successfully to the next item

There is no specific order in which the items will be returned.

foreach loop

But if the collection is an array, it internally uses a for loop.

Why Don't Collections Enumerate themselves

Why is the enumerator a completely separate object from a collection. Each collection internally has its own collection implementation. There could have been a method called enumerate in the collection interface itself. In the implementation of that method, use a for loop to iterate over the internal implementation.

The reason is : "What happens if you want multiple clients to enumerate the same collection". Example: 2 threads calling the same collection. Or a nested foreach to the same collection. You don't want MoveNext() of the internal foreach to make a change for the outer foreach. This is why a collection should spit a new enumerator instance.

Modifying while Enumerating

Enumerators can only be used to read the contents of a collection. They can not be used to modify which elements are in a collection. IEnumerator<T> does not exposes any method that allows you to modify a collection. But it is possible to modify the collection by some other code even when the collection is being enumerated. Actually if the collection is modified while being enumerated, it throws an exception saying that the collection can no longer be enumerated over because the collection is modified.

Most collections internally store a version number that describes that a collection was changed.

This works for arrays. because they are not enumerated over using IEnumerator<T>.

Writing your own Enumerator

using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.ObjectModel;

namespace BasicArrayOps
{
    class Program
    {
        static void Main(string[] args)
        {
            AllDaysOfWeek daysOfWeek = new AllDaysOfWeek();
            foreach (var day in daysOfWeek)
            {
                Console.WriteLine(day);
            }


            Console.ReadLine();
        }
    }

    public class AllDaysOfWeek : IEnumerable<string>
    {

        public IEnumerator<string> GetEnumerator()
        {
            yield return "Monday";
            yield return "Tuesday";
            yield return "Wednesday";
            yield return "Thursday";
            yield return "Friday";
            yield return "Saturday";
            yield return "Sunday";
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
    
}


Enumerable Covariance
  • Array covariance is allowed implicitly.
  • Other collections covariance is not allowed directly
  • but it is allowed for IEnumerable<T> just for reading. It is not allowed for writing.
  • It is also allowed for readonly interfaces.
Multidimensional Arrays and Bounds
  • Multidimensional Arrays
    • What they are
  • Rank and Bounds
    • Different array index ranges
  • Jagged Arrays
    • What they are
Multidimensional Arrays

  • More than one index
  • Only available on array collection
  • Syntax : float[,] floatArray2d = new float[4,3] or float[,,] floatArray3d = new float[2,3,4] and so on.
  • The number of indices required is the dimension or the Rank of an array.

Length and Rank
  • Array.GetLength(index)
  • Array.Rank
Bounds, GetLowerBound() and GetUpperBound()
  • Array.GetLowerBound(), Array.GetUpperBound()
Jagged Arrays

Array of Arrays

float[][] floatArrayJagged

float[][] tempsGrid = new float[4][3] Throws a compiler error. This is because i am declaring the outer array here. The outer array only knows that it is to store arrays of floats. It doesn't know anything about the length of the inner arrays because array length is not part of the type, so remove the 3. float[][] tempsGrid = new float[4][]

Jagged vs Multidimensional

concept applies to arrays

concept applies to all other collections