In this post, we will read a Comma Separated Value (CSV) file using LINQ and perform some calculations on the data.
Create
a Console Application. Right click the project > Add > New File
> Select a text template and rename it to Sample.csv. Add the
following data in the CSV file
The
data shown above shows Quarter wise sales by each SalesPerson. There
are 5 columns. The first column is the SalesPersonID and the rest
represents total items sold by the salesperson in each quarter.
We will now use LINQ to query this data. Write the following code:
using System; using System.Linq; using System.IO; using System.Collections.Generic; namespace CSVLINQ { class Program { static void Main(string[] args) { IEnumerable<string> strCSV = File.ReadLines(@"../../Sample.csv"); var results = from str in strCSV let tmp = str.Split(',') .Skip(1) .Select(x => Convert.ToInt32(x)) select new { Max = tmp.Max(), Min = tmp.Min(), Total = tmp.Sum(), Avg = tmp.Average() }; // caching for performance var query = results.ToList(); foreach (var x in query) { Console.WriteLine( string.Format("Maximum: {0}, " + "Minimum: {1}, " + "Total: {2}, " + "Average: {3}", x.Max, x.Min, x.Total, x.Avg)); } Console.ReadLine(); } } }
Shown
above is an example that calculates the Max, Min, Sum and Average on
the rows of a .CSV file. We start by skipping the SalesPersonID column
and take the rest. Then each string is converted to an int and the
entire sequence is selected as one row in ‘results’. Since ‘results’ is
an IEnumerable, the query is not executed till we read from it. For
large csv files, you can cache the results for performance gain. Finally
loop and print the values.
Note:
Make sure to handle empty spaces, errors and other characters in the
CSV file. Check the different overloads of the Split() method.
OUTPUT
No comments:
Post a Comment