Tuesday, September 3, 2013

DataTable performance comparison & recomendation

Last Friday night I was doing a performance test on DataTable class and literally felt an earthquake.
I tried four different methods to fetch data from an instance of an arbitrary DataTable instance, matching arbitrary criteria.
I tried with 3K to 1 million records in the DataTable and performed 500-5000 iteration of data selection using each method.
  • DataTable.Select()
  •  Linq
  •  PLinq
  • Traditional (by literally writing code to loop around the DataTable and use if/else clause to find match)
Download the performance test project.

Traditional method was the fastest of them all.
Not very far behind the Traditional method were Linq & PLinq.
PLinq however seemed to have adverse effect over Linq if the volume of data in the DataTable was rather low.

The DataTable.Select() was the slowest and in fact poorly slow.
It was 19-24 times slower than the others.
i.e If something takes 20mins using the other methods, it would take 6-8 hours using DataTable.Select() method.

Resolution:
I created an extension class for DataTable with four extension methods with the following signature to handle most of the use cases where we are using DataTable.Select() method.
DataRow FindFirst(this DataTable table, Func<DataRow, bool> criteria);
DataRow[] FindAll(this DataTable table, Func<DataRow, bool> criteria);
bool Exists(this DataTable table, Func<DataRow, bool> criteria);
DataTable Extract(this DataTable table, Func<DataRow, bool> criteria);

The implementation of the extension class/methods is available for download here.

Scenario 1: Where we need all the rows matching a certain criteria.
       Using DataTable.Select()
DataRow[] rows = dt.Select(“A=’123’ AND B=’XYZ’”);

Using the new extension method
DataRow[] rows = dt.FindAll((r)=>r[“A”].Equals(“123”) && r[“B”].Equals(“XYZ”));

Scenario 2: Where we need to check the existence of retain record
       Using DataTable.Select()
bool isPresent = dt.Select(“A=’123’ AND B=’XYZ’”).Length>0;

Using the new extension method
bool isPresent = dt.Exists((r)=>r[“A”].Equals(“123”) && r[“B”].Equals(“XYZ”));

Scenario 3: Where we need to get first matching record (usually done on sorted DataTable)
       Using DataTable.Select()
DataRow matchingRecord = dt.Select(“A=’123’ AND B=’XYZ’”).Rows[0];

Using the new extension method
bool isPresent = dt.FindFirst((r)=>r[“A”].Equals(“123”) && r[“B”].Equals(“XYZ”));

Scenario 4: Where we need get all the matching rows in form of another DataTable instance
       Using DataTable.Select()
DataTable dtMatch = SomeUtil.ConvertToDataTable(dt.Select(“A=’123’ AND B=’XYZ’”));

Using the new extension method
DataTable dtMatch = dt.Extract((r)=>r[“A”].Equals(“123”) && r[“B”].Equals(“XYZ”));


Please do not use DataTable.Select() going forward, use the Linq or Traditional method instead.