Monday, June 18, 2012

MAXDOP and Cost Threshold for Parallelism – an example for a parallel query

The most popular post on this blog, ranked by hits, is the one about the CXPACKET wait stat. There are no comments, so I am not even sure if it was helpful to anyone but me.

This post will be an example of a query from the AdventureWorks database that runs in parallel under the default instance settings for SQL Server, and it will step you through changing those settings to see the differences in the query plans. Parallel queries are what 'can' cause CXPACKET waits.

First, the query:

SELECT sod.SalesOrderID, sod.OrderQty, 
        p.ProductID, p.Name
    FROM Production.Product p
        INNER MERGE JOIN Sales.SalesOrderDetail sod
            ON sod.ProductID = p.ProductID    


This code uses a join hint, MERGE, to force a merge join in the query plan:


[Image: actual execution plan for the merge join query, with parallelism indicators on the iterators]


You will see the black arrows on a yellow background indicating iterators running in parallel. For a query to be considered for parallelism, the estimated cost of the serial plan must be above the Cost Threshold for Parallelism instance setting. Here is a graphical view from Object Explorer in Management Studio, looking at the properties of the instance. In this case, the default installation has a value of 5.
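The same setting can also be changed with sp_configure instead of the Object Explorer dialog. A sketch (the value 25 here is just an arbitrary example, not a recommendation):

```sql
-- Requires 'show advanced options' to be enabled first
EXEC sp_configure 'cost threshold for parallelism', 25;
RECONFIGURE;
```

Queries with a serial plan cost below the new threshold would then no longer be considered for parallelism.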


[Image: instance properties in Object Explorer showing Cost Threshold for Parallelism = 5]


The two settings we will be looking at are Cost Threshold for Parallelism and Max Degree of Parallelism. These values can be obtained from the sp_configure system stored procedure if the "show advanced options" setting is on. Here is the code for this:



EXEC sp_configure 'show advanced options', '1';
RECONFIGURE;
EXEC sp_configure;
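If you would rather not toggle the advanced-options setting at all, the same values can be read from the sys.configurations catalog view; a sketch filtering to just our two settings:

```sql
-- sys.configurations shows all settings without enabling 'show advanced options'
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name IN ('cost threshold for parallelism', 'max degree of parallelism');
```

The value_in_use column is the running value, which can differ from value until RECONFIGURE is executed.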


Now, we are going to use another hint, OPTION (MAXDOP 1), to prevent the query from running in parallel so we can find the cost of the serial plan.



SELECT sod.SalesOrderID, sod.OrderQty, 
        p.ProductID, p.Name
    FROM Production.Product p
        INNER MERGE JOIN Sales.SalesOrderDetail sod
            ON sod.ProductID = p.ProductID    
    OPTION (MAXDOP 1)

[Image: serial query plan produced by the MAXDOP 1 hint]


By doing this, we can see the cost associated with a non-parallel query plan for this query:


[Image: plan properties showing an Estimated Subtree Cost of 10.7496 for the serial plan]


The Estimated Subtree Cost of this query is 10.7496, which is greater than the Cost Threshold for Parallelism (5) on this instance, so the optimizer considers whether a parallel plan will help. Here is the cost of the parallel plan.


[Image: plan properties showing the Estimated Subtree Cost of the parallel plan]





So, by running in parallel, the cost is about one-third that of the serial plan. The other factor in parallel queries is the number of schedulers/threads (CPUs) used to run the query. If we look at the properties of one of the iterators in the parallel plan, we can find the number it used.


[Image: properties of the Parallelism (Repartition Streams) iterator showing the thread count]


Here, we can see the Parallelism (Repartition Streams) iterator used 8 threads, which is the number of processors on this machine, to run the query in parallel.
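To confirm the processor count SQL Server sees, one option is the sys.dm_os_sys_info dynamic management view:

```sql
-- Number of logical processors visible to this instance
SELECT cpu_count
FROM sys.dm_os_sys_info;
```

With MAXDOP at its default of 0, this is the upper bound on the threads a parallel iterator can use.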


If we change the MAXDOP in the query or on the instance, we can change the number of threads and the cost. Below is a script to change the instance-level setting from 0 (all processors) to 2.



EXEC sys.sp_configure N'max degree of parallelism', N'2'
GO
RECONFIGURE WITH OVERRIDE
GO

After running the query again, here are the results.


[Image: query plan with MAXDOP 2, showing an Estimated Subtree Cost of 6.38581]




The cost is now 6.38581, using only 2 threads. This shows us that parallelism and the number of threads make a difference in the cost of a query and how fast it might complete. Max Degree of Parallelism can be changed at the instance level, or controlled per query with a hint.
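To limit just one statement instead of the whole instance, the same cap can be applied with a query hint; for example, a MAXDOP 2 hint on our test query would look like:

```sql
SELECT sod.SalesOrderID, sod.OrderQty, 
        p.ProductID, p.Name
    FROM Production.Product p
        INNER MERGE JOIN Sales.SalesOrderDetail sod
            ON sod.ProductID = p.ProductID
    OPTION (MAXDOP 2);  -- caps this query at 2 threads regardless of the instance setting
```

This is handy when one known query benefits from a different degree of parallelism than the rest of the workload.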


There are good articles on MAXDOP settings on SQLSkills.com that can help explain the changes you might make on the instance level settings.


Thomas

Thursday, June 14, 2012

PASS Data Architecture VC presents Neil Hambly – Improve Data Quality & Integrity using Constraints

Tuesday’s (June 19th, 2012) PASS Data Architecture Virtual Chapter meeting features Neil Hambly talking about database design that uses constraints for integrity and performance.

We will also be giving away an Amazon gift card, plus one attendee gets a book – Itzik’s T-SQL Fundamentals.

If you would like to volunteer for this VC, please email PASS_DA_VC@HotMail.com

Subject: Improve Data Quality & Integrity using Constraints

Start Time: Tuesday, June 19th, 2012 12:00 PM US Central Time (6:00 PM GMT)

End Time: Tuesday, June 19th, 2012 1:00 PM US Central Time (7:00 PM GMT)

Presenter: Neil Hambly

Live Meeting Link: Attendees LiveMeeting

Leverage the power of constraints to improve both the data quality and performance of your databases. Powerful data entity and referential integrity can be achieved using the different constraints available in SQL Server. Discover how to utilize the full range of constraints available: Default, Check, Unique, Foreign Key, and Primary Key. Learn how to use DBCC commands to validate that data conforms to constraints, how to handle NULLs, and gain real-world performance tips.

 

Neil Hambly is a SQL veteran with over 13 years’ expertise in SQL Server, from version 6.5 through to the very latest 2012 edition. Neil has held database roles at major organizations (BBC, ABN AMRO, Accenture) as well as market-leading smaller companies, with experience in DBA, developer, and architect roles. Neil is a speaker at conferences and user groups, the UG leader of the London Area SQL Server User Group (UK), and is often seen presenting at UK and international events. Just loves SQL!!!
