Malluf Consulting

Ibrahim Malluf

When DataTable Parameters really come in handy

When not to use a C# client to iterate through rowsets, making multiple requests to SQL-Server

This is another Stack Overflow question I answered for someone, concerning the best way to handle processing data changes in a C# client connecting to a SQL-Server data store.
The question went like this:

I am trying to optimize the Quartz.NET scheduler by eliminating the foreach iteration through the jobs.
This is a question about SQL, not .NET.
The code iterates through each job to perform operations and, for each job, does the following:

 

bool existingJob = await JobExists(conn, newJob.Key, cancellationToken).ConfigureAwait(false);
try
{
    if (existingJob)
    {
        if (!replaceExisting)
        {
            throw new ObjectAlreadyExistsException(newJob);
        }
        await Delegate.UpdateJobDetail(conn, newJob, cancellationToken).ConfigureAwait(false);
    }
    else
    {
        await Delegate.InsertJobDetail(conn, newJob, cancellationToken).ConfigureAwait(false);
    }
}
catch (IOException e)
{
    throw new JobPersistenceException("Couldn't store job: " + e.Message, e);
}
catch (Exception e)
{
    throw new JobPersistenceException("Couldn't store job: " + e.Message, e);
}

 

Every await here issues an SQL request.
I would like to do the opposite: do a massive JobExists for all jobs, which would tell me which jobs exist and which don't, and then update the jobs that exist and add those that don't.
So, with for example 200,000 jobs, instead of doing 200,000 rounds of exists-then-add-or-update, we would have 3 SQL transactions: one to validate which jobs exist, then one to add in bulk, and a last one to update in bulk.
But I do not know how to do the massive Exists in SQL; I only know how to do an IF EXISTS (SELECT A FROM B) query for one row. Is such a thing possible? Or should I do a massive SELECT, or a JOIN of some sort? How should I proceed?

The problem with this code, as the questioner explains, is that it makes too many calls to SQL-Server, causing very slow processing of the data, not to mention the burden on network traffic. The questioner is looking for some way to do bulk processing of the 200K job items on each iteration of the overall process.

My response to the Questioner was to explain how to use a DataTable parameter to send the entire list of Jobs to SQL-Server and let the database engine process the list. This is what SQL-Server, or any database engine, does best: process rowsets. The Questioner asked for a code example of how to do this. The first thing I did was provide a T-SQL Stored Procedure prototype that would demonstrate the server side of things:

 

/* Create a table type. */  
CREATE TYPE JobsTableType AS TABLE   
( JobKey INT  
, Value1 INT
, Value2 INT);  
GO  
 
CREATE PROCEDURE procProcessJobs 
	-- Add the parameters for the stored procedure here
	@Jobs JobsTableType READONLY
AS
 
DECLARE MYCURSOR CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY
FOR Select Distinct JobKey FROM @Jobs
 
Declare @MyKey Int
	
	OPEN MYCURSOR
 
	FETCH NEXT FROM MYCURSOR INTO @MyKey
	
	WHILE @@FETCH_STATUS = 0
	BEGIN
		If EXISTS(Select * From [your_table_name] where [your_table_name].JobKey=@MyKey)
		BEGIN
			/*do your update code here*/
		END
		ELSE BEGIN
			/*do your insert code here*/
		END
 
		FETCH NEXT FROM MYCURSOR INTO @MyKey
	END
 
	CLOSE MYCURSOR
	DEALLOCATE MYCURSOR
	
GO
 

I told the Questioner that this was untested code I was presenting and that some tweaking might be in order. But it does demonstrate how the user could process a large rowset of data. What I didn't show here, though the Questioner could easily add it, was a temp table that records the result of each action and can be returned to the client as a dataset.
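A minimal sketch of that temp-table idea, again untested and with a hypothetical #Results table and action labels of my own invention, could be spliced into the procedure above like this:

```sql
/* Hypothetical sketch: log what happened to each key, then return the log. */
CREATE TABLE #Results (JobKey INT, Action VARCHAR(10));

/* Inside the cursor loop, in the update branch: */
INSERT INTO #Results (JobKey, Action) VALUES (@MyKey, 'updated');

/* ...and in the insert branch: */
INSERT INTO #Results (JobKey, Action) VALUES (@MyKey, 'inserted');

/* After the cursor is closed, return the log to the client as a rowset: */
SELECT JobKey, Action FROM #Results;
```

The client would then read this back with an ordinary DataReader or DataAdapter and know exactly which jobs were inserted versus updated.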

Finally, I wrote a quick example of how the C# client could use this Stored Proc:

 

using System;
using System.Collections.Generic;
using System.Data;
 
namespace ConsoleApp1
{
   
    public class Job
    {
        public int Key { get; set; }
        public int Value1 { get; set; }
        public int Value2 { get; set; }
    }
 
 
    class Program
    {
 
        static DataTable GetJobTable(List<Job> jobs)
        {
            // Column names and types must match the JobsTableType definition on the server
            DataTable results = new DataTable();
            results.Columns.Add("JobKey", typeof(int));
            results.Columns.Add("Value1", typeof(int));
            results.Columns.Add("Value2", typeof(int));
 
            foreach (Job item in jobs)
            {
                results.Rows.Add(item.Key, item.Value1, item.Value2);
            }
 
            return results;
        }
 
 
        static void Main(string[] args)
        {
            List<Job> myJobs = new List<Job>(); //populate the myJobs list
 
            DataTable JobsToProcess = GetJobTable(myJobs);
 
            //create your connection and command objects then add JobsToProcess as a parameter
 
        }
 
    }
}

Again, this is untested code, and I left the data-connection boilerplate out of the example; the Questioner can put in their own work on that issue. My purpose here was to show how the data could be processed more efficiently. Basically, when you have a large rowset of data to be processed, move the processing to SQL-Server, since the database engine processes rowsets far quicker than C#.
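For readers who want that boilerplate anyway, the wiring-up is standard ADO.NET: the DataTable is passed as a Structured parameter whose TypeName matches the table type created on the server. A rough, untested sketch, with a placeholder connection string:

```csharp
// Requires: using System.Data; using System.Data.SqlClient;
// "your_connection_string" is a placeholder - substitute your own.
using (SqlConnection conn = new SqlConnection("your_connection_string"))
using (SqlCommand cmd = new SqlCommand("procProcessJobs", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;

    // SqlDbType.Structured marks this parameter as a table-valued parameter;
    // TypeName must match the CREATE TYPE name on the server.
    SqlParameter tvp = cmd.Parameters.AddWithValue("@Jobs", JobsToProcess);
    tvp.SqlDbType = SqlDbType.Structured;
    tvp.TypeName = "dbo.JobsTableType";

    conn.Open();
    cmd.ExecuteNonQuery();
}
```

The whole 200K-row list crosses the wire once, inside a single call, which is the entire point of the exercise.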

As a side note, to all of those readers who are having heart attacks and barfing over my use of a Cursor in the Stored Procedure: remember, in this case I am opening a cursor on a table variable, not a regular table in the database.  There are no locks to process; the cursor is read-only and forward-only, basically a firehose cursor that will use a minimum of resources and process quickly :P
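And for anyone who still can't stomach the cursor, the same exists-check can be done entirely set-based, with a JOIN driving the updates and a NOT EXISTS driving the inserts. A rough, untested sketch against the same placeholder table name:

```sql
/* Untested set-based alternative: one UPDATE and one INSERT replace the loop. */
UPDATE t
   SET t.Value1 = j.Value1, t.Value2 = j.Value2
  FROM [your_table_name] t
 INNER JOIN @Jobs j ON j.JobKey = t.JobKey;

INSERT INTO [your_table_name] (JobKey, Value1, Value2)
SELECT j.JobKey, j.Value1, j.Value2
  FROM @Jobs j
 WHERE NOT EXISTS (SELECT 1 FROM [your_table_name] t WHERE t.JobKey = j.JobKey);
```

Either way, the work happens in one round trip on the server, which is what the Questioner was after.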

 

Copyright 2018 by Malluf Consulting Services Inc