CSV File handling in VB.Net or C#

Response to a VBProfessionals LinkedIn forum question

Author: Host/Thursday, October 10, 2013/Categories: Code Blog, C# Solutions, VB Solutions


On a LinkedIn forum, someone asked the following:

I want to import a CSV file from a network drive into sql server table using vb.net in a WinForm. I thought this would be easy but have not found a simple solution yet. The CSV file does have field names.

While it is not a terribly difficult task to import a comma-delimited file into a .Net project, once imported, separating the lines into fields and then writing the results back to a data store like SQL Server does have some complexity.

Several people have their own way of approaching it.  What I have here in this article is a solution to a particular requirement I encountered recently that fits the above question closely enough that I thought a response would make a nice blog article.

The original solution code was written in C#, so I am going to provide both a C# version and a VB.Net version.

The first task is to get the CSV file loaded into the application.  Not rocket science, but I will provide the code here anyway.

 

    ''' <summary>
    ''' Gets CSV lines from the specified file path
    ''' </summary>
    ''' <param name="Path"></param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Public Shared Function GetCSVLines(Path As String) As IList(Of String)

        Dim CSVLines As New Collection(Of String)

        Try

            If File.Exists(Path) Then

                Using Reader As New StreamReader(Path)

                    While Not Reader.EndOfStream
                        CSVLines.Add(Reader.ReadLine)
                    End While
                End Using

            End If

        Catch ex As Exception
            Throw
        End Try

        Return CSVLines

    End Function

public static Collection<string> GetCSVLines(string pathIn)
{

	Collection<string> CSVLines = new Collection<string>();

	try
	{

		if (File.Exists(pathIn))
		{

			// the using block ensures the reader (and file handle)
			// is disposed even if a read fails
			using (var reader = new StreamReader(File.OpenRead(pathIn)))
			{
				while (!reader.EndOfStream)
				{
					CSVLines.Add(reader.ReadLine());
				}
			}

		}

	}
	catch (Exception)
	{
		throw;
	}

	return CSVLines;
}
		

 

 

We take in a path argument, check to see if the file actually exists, and then read it line by line into a collection of strings.  We return that collection to the calling procedure.

So now we have a collection of strings that represents the rows of data in that CSV file. We need to separate the column values, which are delimited by commas.  A common method is to use a code construct like this:

CSVFields = CSVLines(0).Split(","c)
-or-
var CSVFields = CSVLines[0].Split(',');

This leaves you with n number of columns that you can now read into your data repository.  Simple enough if your data is completely free of embedded commas.  But anyone who has worked with more than trivial CSV files knows the agony of embedded commas that must be retained as part of the data. So what is the answer?  Why, regular expressions, of course!  As long as those fields that contain embedded commas are wrapped in quotes, this solution works. To be honest, the Regex patterns I am going to include here are not of my devising, and if I could remember where I got them I would gladly give due credit. (If it is you, contact me, and I will put your credits here.)

What I added is the wrapping function that adjusts the pattern to fit any number of fields. This wrapper method takes an integer argument that represents the number of fields and builds a string containing the pattern for handling a CSV row of data with the specified number of columns. Note that I toss away the first and last match items. This is because they do not contain useful data, but instead usually contain empty strings.

 

    ''' <summary>
    ''' Builds a Regex pattern that will parse a comma-delimited
    ''' row of data.
    ''' </summary>
    ''' <param name="Count"></param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Public Shared Function BuildRegex(Count As Integer) As String

        Dim Buffer As New StringBuilder

        Try

            Buffer.Append("^(\x22[^\x22\r\n]*\x22|'[^'\r\n]*'|[^,\r\n]*)")

            For i As Integer = 1 To Count - 1
                Buffer.Append("\s*,\s*(\x22[^\x22\r\n]*\x22|'[^'\r\n]*'|[^,\r\n]*)")
            Next
            Buffer.Append("$")

        Catch ex As Exception
            Throw
        End Try

        Return Buffer.ToString()

    End Function

public static string BuildRegex(int count)
{

    System.Text.StringBuilder Buffer = new System.Text.StringBuilder();

    try
    {

	Buffer.Append("^(\x22[^\x22\r\n]*\x22|'[^'\r\n]*'|[^,\r\n]*)");

	for (int i = 1; i < count; i++)
	{
		Buffer.Append(@"\s*,\s*(\x22[^\x22\r\n]*\x22|'[^'\r\n]*'|[^,\r\n]*)");
	}
	Buffer.Append("$");

    }
    catch (Exception)
    {

	throw;
    }

    return Buffer.ToString();

}
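To make the toss-the-first-and-last behavior concrete, here is a quick, self-contained check of what the generated pattern does to a sample row.  The sample data is my own; the pattern is what BuildRegex(3) above produces:

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern BuildRegex(3) generates: one leading field group plus two
// "\s*,\s*(...)" repeats, anchored at both ends.
string fieldPattern = "(\x22[^\x22\r\n]*\x22|'[^'\r\n]*'|[^,\r\n]*)";
string pattern = "^" + fieldPattern +
                 @"\s*,\s*" + fieldPattern +
                 @"\s*,\s*" + fieldPattern + "$";

string line = "one,\"two, with comma\",three";

// Regex.Split returns the captured groups, bracketed by the (empty)
// text before and after the full match -- hence items 0 and 4 get tossed.
string[] parts = Regex.Split(line, pattern);
Console.WriteLine(string.Join("|", parts));
// -> |one|"two, with comma"|three|
```

Notice that the quoted field keeps its embedded comma intact, which a plain Split on commas would have broken in two.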

 

Now all we have to do is process each line using this regex string.  What I have here is another wrapper method that processes each CSV line into a list of columns. This example is specific to the CSV file I had to process only in the sense that it uses a fixed value for the number of fields.  You could instead add an argument containing the number of fields to process.

 

    ''' <summary>
    ''' Breaks the CSVLine into its value fields
    ''' </summary>
    ''' <param name="CSVLine"></param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Public Shared Function CSVLineFields(CSVLine As String) As IList(Of String)

        Dim Values As New Collection(Of String)



        Try

            Dim RegPattern As String = BuildRegex(63)


            If Regex.IsMatch(CSVLine, RegPattern) Then

                Dim ValuesX = Regex.Split(CSVLine, RegPattern)

                'skip the first and last items because they do not
                'contain useful data
                For i As Integer = 1 To ValuesX.Length - 2
                    Values.Add(ValuesX(i))
                Next

            End If


        Catch ex As Exception
            Throw
        End Try


        Return Values

    End Function


Code:
private static IList<string> CSVLineFields(string csvLine)
{

   IList<string> values = new List<string>();

   string regPattern = Utilities.BuildRegex(63);

   try
   {

	if (Regex.IsMatch(csvLine, regPattern))
	{

		var valuesX = Regex.Split(csvLine, regPattern);

		// we go from 1 to Length - 1 because we do not want
		// the first and last items in this list.
		for (int i = 1; i < valuesX.Length - 1; i++)
		{
			// regex escapes the extra quotes in a field that
			// is wrapped in quotes, so we strip those escape
			// sequences as we add the field data to the values
			// list.
			values.Add(valuesX[i].Replace(@"\x22\", ""));
		}
	}

   }
   catch (Exception)
   {
	throw;
   }

   return values;

}

 

The list of string values returned here contains the separated values of the CSV line.  The above does not take into account that the first line might actually contain field names instead of data; it is simple enough to add some code that processes the first line differently.  The methods above come together in the following method, which iterates through all of the data rows contained in the string collection passed as the argument.  Each line is processed into a list of values that can be written to a dataset, a collection of POCO objects, or whatever destination you intend for them.
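As a minimal sketch of that first-line handling (self-contained, so the naive comma Split below merely stands in for the regex-based CSVLineFields method above):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// ParseLine stands in for CSVLineFields here just to keep the sketch
// self-contained; in real code you would call the regex version.
static IList<string> ParseLine(string line) => line.Split(',');

var csvLines = new List<string>
{
    "Id,Name,City",          // header row: field names, not data
    "1,Acme,Dayton",
    "2,Zenith,Toledo"
};

// Pull the field names off the first line...
IList<string> headers = ParseLine(csvLines[0]);

// ...and process only the remaining lines as data rows.
var rows = csvLines.Skip(1).Select(ParseLine).ToList();

Console.WriteLine($"{headers.Count} columns, {rows.Count} data rows");
// -> 3 columns, 2 data rows
```

The header names could then be used to map values to columns by name rather than by ordinal position.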

In my solution I was adding these lines to Serializable objects that I stored into a hierarchical model built up from several CSV files and then stored as an encrypted binary file.  But that is for another blog.

 

    ''' <summary>
    ''' I am using a generic object as the return item.  In real life it would
    ''' be a collection of a specific type instead.
    ''' </summary>
    ''' <param name="CSVLines"></param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Public Shared Function MyObjects(CSVLines As Collection(Of String)) As ObservableCollection(Of Object)

        Dim Results As New ObservableCollection(Of Object)
        Dim Values As IList(Of String) = Nothing

        Try


            For Each Item As String In CSVLines

                Values = CSVLineFields(Item)

                If Values IsNot Nothing Then

                    'here you process the csv line's values into what ever format you require
                    'and add it to the results collection.

                End If

            Next

        Catch ex As Exception
            Throw
        End Try

        Return Results

    End Function


Code:
public static ObservableCollection<object> TranslateHospitalListingCSV(ObservableCollection<string> cSVLines)
{
   ObservableCollection<object> Listings = new ObservableCollection<object>();

   //62 fields

   try
   {

	foreach (string Item in cSVLines)
	{
		HospitalListing hl = new HospitalListing();

		// get the field values for the given csv line
		// and properly handle embedded commas in data.
		var values = CSVLineFields(Item);

		// Now process the field values into hl as you desire,
		// then add the listing to the results collection.
		Listings.Add(hl);
	}

   }
   catch (Exception)
   {
	throw;
   }

   return Listings;
}

The questioner I was responding to mentioned that the final step was to store the data in a SQL Server data store. As one person pointed out in the forum, using SSIS to import the CSV files is perhaps the easiest and most optimal solution. But if you have ever had to deal with database administrators who look at you as a barbarian horde scaling the walls of their domain with dreaded and uncouth SSIS projects, you know that this solution is not always an option. :P

What to do then?  I would use a SqlCommand object to call a stored procedure that handles inserting the data.  Here's the MyObjects method again with some SQL Server updating logic added to it.  This is just to demonstrate the concept; I would strongly suggest that you have a separate data layer that is called to do the updating.

    ''' <summary>
    ''' I am using a generic object as the return item.  In real life it would
    ''' be a collection of a specific type instead.
    ''' </summary>
    ''' <param name="CSVLines"></param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Public Shared Function MyObjects(CSVLines As Collection(Of String)) As ObservableCollection(Of Object)

        Dim Results As New ObservableCollection(Of Object)
        Dim Values As IList(Of String) = Nothing

        Dim MyConnection As New SqlConnection("My connection string")
        Dim MyCommand As SqlCommand
        Dim MyParam As SqlParameter
        Try


            MyConnection.Open()

            For Each Item As String In CSVLines

                Values = CSVLineFields(Item)

                If Values IsNot Nothing Then

                    'here you process the csv line's values into what ever format you require
                    'and add it to the results collection.

                    MyCommand = New SqlCommand("SP Name", MyConnection)
                    MyCommand.CommandType = CommandType.StoredProcedure

                    MyCommand.Parameters.Add(New SqlParameter("@fieldName1", Values(0)))
                    MyCommand.Parameters.Add(New SqlParameter("@fieldName2", Values(1)))

                    ' etc
                    MyCommand.ExecuteNonQuery()

                End If

            Next

        Catch ex As Exception
            Throw
        Finally

            If MyConnection IsNot Nothing Then
                MyConnection.Close()
            End If



        End Try

        Return Results

    End Function

I know others would say to build a table object, add it to a dataset, add rows to the data table, and then do a dataset update. But in truth that is no faster than the code above, which has a lot less overhead. The last code block I did only in VB, since it is specific to the original forum question.
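For anyone unfamiliar with that alternative, here is a rough, hypothetical sketch of the DataTable route (table and column names are invented for illustration; a SqlDataAdapter.Update or SqlBulkCopy call would then push the staged table to SQL Server):

```csharp
using System;
using System.Data;

// Stage the CSV values in an in-memory DataTable before writing them
// out in one shot. Column names here are made up for illustration.
var table = new DataTable("MyTable");
table.Columns.Add("FieldName1", typeof(string));
table.Columns.Add("FieldName2", typeof(string));

// Each Values list from CSVLineFields would supply one row.
table.Rows.Add("value one", "value two");
table.Rows.Add("value three", "value four");

Console.WriteLine($"{table.Rows.Count} row(s) staged for update");
// -> 2 row(s) staged for update
```

The staging itself is cheap; it is the dataset/adapter machinery around it that adds the overhead I mentioned above.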

Well, that’s it; I certainly hope some will find this article useful.  The VB code has not been tested; your mileage may vary.  The C# code is extracted from a working project.

 

 
