Filtering and Sorting Collections with LINQ

For as long as databases have existed, there have been ways to query the information in those databases. Queries are used to filter for specific records or sort the output in a particular order. If you have a table in your database with customer information, you may want to view all the customers in a particular state. You may want to sort the data so that the customers that spend the most money show up first.

LINQ provides developers with a way to perform the same sorting and filtering functions with collections. Perhaps you have a list of employee names that you want to sort alphabetically. Maybe, you have a dictionary with inventory items, and you only want to display items with an inventory greater than zero. LINQ will allow you to do these actions without requiring you to loop through the collection.


In order to use LINQ in your code, make sure you have the following using statement at the top of your class.

using System.Linq;

There are two types of syntax you can use when using LINQ. You can interchange them in your application if you choose, but ideally, you should stick with one syntax to improve the readability of your code. You learn about the different syntax types later in this post. First, let’s discuss the Query Operators you can use with either syntax.

Query Operators

Query Operators are technically extension methods of System.Linq classes. For the sake of simplicity, you can think of them as keywords that provide the functionality to your LINQ queries. The type of functionality they provide depends on the operator. Following is a limited list of the more common operators. Code examples of the queries can be found later in this post.


For filtering data, the WHERE operator is the most common. For grouping data, the GROUPBY operator is used. You can join collections together into a single query with the JOIN operator.


There are several operators for sorting. You can use ORDERBY or ORDERBYDESCENDING to determine how to order the collection. THENBY and THENBYDESCENDING will allow you to add additional sorting options. You can sort the collection in the opposite order that the data was entered with the REVERSE operator.


You can aggregate the data in your collection with the AVERAGE, SUM, and COUNT operators. MIN and MAX will return the smallest or largest value in the collection. If you only want to return a single record, you can use FIRST, FIRSTORDEFAULT, LAST, or LASTORDEFAULT. The “…ORDEFAULT” operators will return the default value for the collection element if it can’t find a record to return.


Now that you understand some of the things you can do with LINQ queries, it’s time to learn how to write them.

The VAR data type

Before we get into the details of LINQ statements, we need to take a moment to discuss the VAR data type.


C# is a strongly-typed programming language, which means that you must declare the data type for the variable whenever you declare it. That being said, there are times when you may not be entirely sure what type of data a variable will hold at the time you declare it. For those instances, the VAR data type can be used. There is a constant debate among developers regarding when and how often to use VAR. For the sake of this blog, I’m only going to recommend using it when it aids you in understanding a concept without going too deep into concepts that are beyond your skill level at this time.


Learning about LINQ is one of those times. The results of LINQ queries are usually of an IEnumerable or IQueryable type. Instead of going into details about those types at this time, I will be using the VAR keyword and would advise you to do the same.


When you use the VAR datatype, Visual Studio will assign the data type at the time the variable receives its value.


For example:

 var age = 12;
 var name = "Scott";
 var bDay = DateTime.Now;

At the moment these variables get assigned a value “age” becomes an integer, “name” becomes a string, and “bDay” becomes a DateTime.

Query Syntax

FROM Clause

When querying databases, the familiar syntax to use is SQL, which stands for Structured Query Language. The Query syntax for LINQ is very similar to SQL. When using Query syntax, the statement always starts with a FROM clause. This clause tells the application from which collection you are retrieving data. There are four parts to a FROM clause.


FROM Keyword + element variable + IN keyword + collection variable


  • FROM and IN are keywords used in the statement.

  • The collection variable is the name of the collection you are querying

  • The element variable is a variable that will store the element that LINQ is evaluating. It is very similar to the variable you create in a FOREACH loop.

List<string> inventoryItems = new List<string>()
{ "rope","wood stick", "metal hook", "paper" };
 
 var rITems = from s in inventoryItems

Query Operators

After the FROM clause, you place the query operators to be used in filtering and/or sorting the data.


The most common one is the WHERE operator. This operator is used to filter the collection based on a condition you determine.

 var rITems = from s in inventoryItems 
 where s.StartsWith("r") 

SELECT Clause

After adding the operators you want to use for the query, you need to add the SELECT clause. The SELECT clause tells LINQ what value to store in the result variable.

 var rITems = from s in inventoryItems 
 where s.StartsWith("r")
 orderby s descending
 select s;

Translation

If you were to translate this query to English, the translation would be


“Using the inventoryItems collection individually assign each element to the “s” variable. Once it is assigned to “s”, check to see if it starts with the letter “r”. If it does, add the value of the “s” variable into the rItems variable.”


Enhancing the Query

You can add multiple conditions to your WHERE clause.

 var rITems = from s in inventoryItems 
 where s.StartsWith("r") || s.EndsWith("r") 
  select s;

You can also add more query operators to the statement.

 var rITems = from s in inventoryItems 
 where s.StartsWith("r") || s.EndsWith("r")
 orderby s descending
 select s;

You can chose to only add part of the value into the new variable

 var rITems = from s in inventoryItems 
 where s.StartsWith("r") || s.EndsWith("r")
 orderby s descending
 select s.Substring(0,2);

Method Syntax

Although using the Query Syntax is generally easier read. The Method Syntax tends to be easier to implement. One reason for this is because some of the extension methods provided by LINQ need very little additional code.


For example, if you have a list of items and you wanted to reverse their order, you would simply use the Reverse method.

 List<string> inventoryItems = new List<string>()
 { "rope","wood stick", "metal hook"};
 
 inventoryItems.Reverse(); 
 Console.WriteLine(inventoryItems[0]);

When the code is executed, “metal hook” is displayed. Now that I’m using LINQ, I don’t even have to use the zero index to display the first elements of the list. The First method will do this for me. If I were unsure whether or not inventoryItems had any elements, I would use FirstOrDefault.

 // Will display the first element 
 // Will crash if no elements exist
 Console.WriteLine(inventoryItems.First());
 
 // Will display the first element 
 // Will display an empty string if no elements exist
 Console.WriteLine(inventoryItems.FirstOrDefault()); 

You can use the Min and Max methods to find the highest and lowest values in a collection.

 List<double> prices = new List<double>()
  { 5.2,77,235.33,42.8,12,.5};
 
 Console.WriteLine($"The highest price is {prices.Max()}.");
 Console.WriteLine($"The lowest price is {prices.Min()}.");

You can use the Sum and Average methods to aggregate the information.

 Console.WriteLine($"The sum of all prices is {prices.Sum()}.");
 Console.WriteLine($"The average price is {prices.Average()}.");

Using these methods like this are great if you want the values for the entire collection. However, sometimes you only want to use a subset of data. For example, you may want the first item in your inventory that starts with the letter “b”. Maybe you want to count the number of prices that are greater than 10. To use the Method Syntax to resolve these scenarios, you will need to use a Lamda expression.


Lambda Expressions

In simplified terms, when dealing with LINQ, a Lambda Expression is a shorthand version of the query syntax you read about earlier in this post. Instead of using multiple statements to perform your query, a single expression is used.


The general syntax for lambda expressions is:


Collection.Method(Element Variable => Condition Expressions)


If we take our query syntax from earlier:

 var rITems = from s in inventoryItems 
 where s.StartsWith("r")
 select s;

Which translated to “Using the inventoryItems collection individually assign each element to the “s” variable. Once it is assigned to “s”, check to see if it starts with the letter “r”. If it does, add the value of the “s” variable into the rItems variable.”


We can perform the same query using the method syntax.

 var newItems = inventoryItems.Where(s => s.StartsWith("r"));

Although it may seem a little more confusing to read, it performs the same function with fewer lines of code. You are still storing the element into a variable “s”, and you are providing the same condition check. You are then storing the value of “s” in the new variable.


Enhancing the Expression

Just as with queries, you can enhance a lambda expression by adding conditions checks, providing sorting instructions, or returning only part of the value.

 List<string> inventoryItems = new List<string>()
 { "rope","wood stick", "metal hook","paper"};
 
 var rItems = inventoryItems
 .Where(s => s.StartsWith("r") || s.EndsWith("r"))
 .OrderByDescending(s => s)
 .Select(s => s.Substring(0,2));

When this code is executed, LINQ will filter the collection for any item that starts or ends with “r”. It will then order them in reverse alphabetical order. After that it will insert the first two letters of the string into rItems.


This is the same functionality of the following query syntax

 var rITems = from s in inventoryItems 
 where s.StartsWith("r") || s.EndsWith("r")
 orderby s descending
 select s.Substring(0,2);

When the FOREACH runs, it will only display two items “ro” and “pa”.

 foreach (var item in rITems)
 {
 Console.WriteLine(item);
 }

Combining Methods

The true power of LINQ comes when you combine the methods to find the information you are seeking. Let’s add some logic to the methods we talked about at the beginning of this section using the following lists.

 List<string> inventoryItems = new List<string>()
 { "rope","wood stick", "metal hook"};
 
 List<double> prices = new List<double>()
  { 5.2,77,235.33,42.8,12,.5};

How would we display the first item in our inventory that starts with the letter “b”?

string firstBItem = inventoryItems.Where(s => s.StartsWith("b")).FirstOrDefault();

You must use the OrDefault methods when you are filtering your data, for the potential of not finding a matching record is there.


How would you find the number of prices than 10?

int numberOfHighPrices = prices.Where(n => n > 10).Count();

Notice that I used the letter “n” in this lambda expression. It doesn’t matter what you call the variable that will hold the element variable. Generally, a single character is used to reduce the amount of typing needed when writing an expression. There are different schools of thought about what those characters should be.


Personally, I use “s” if I’m dealing with strings, “n” for numbers, “d” for dates, and “o” if I’m using a collection of objects.


Why use LINQ

Hopefully, after reading this post, you understand the benefit of using LINQ. If you still don’t see the value take a look at the code that would have to be written to replace this LINQ statement:

int numberOfHighPrices = prices.Where(n => n > 10).Count();

Without LINQ, your code would look something like this:

 int numberOfHighPrices = 0;
 foreach (double price in prices)
 {
 if (price>10)
 {
 numberOfHighPrices++;
 }
 }

As you can see, even a simple LINQ statement would require several lines of code. Some of the more complex queries in this post would take considerably more code without LINQ.

144 views0 comments

Recent Posts

See All