Thursday, December 23, 2010

CXPACKET – that mysterious wait type

At my current employer, we have experience various levels of CXPACKET waits causing or thinking that is causes problems.

The first assumption we made that the CXPACKET wait was a problem came from SQL Server error message that suggested using OPTION (MAXDOP 1). Googling the details of the error, which included the words resource semaphore, was bringing up nothing. Calls to Microsoft Premier Support was not helping either. So, we changed queries that had long CXPACKET waits (some queries would never finish) or the Resource Semaphore deadlocking to use OPTION (MAXDOP 1) and we just waited for the query to run long (at least it finished). This happened to about 5 long running queries in our OLTP system when we converted to SQL Server 2005 and larger production machines.

A good read about this situation can be located here from Bart Duncan. Thanks Bart!!!

So, we thought every time we see CXPACKET waits, we had this same problem. The SQL Server community would make comments like CXPACKET is not the problem or look at your Query Plan. Tune the query, etc. etc. But I could not see the forest through the trees.

Now, after a couple of years of reading and using various tools for performance tuning, I have now realized why some in the SQL community keep saying that CXPACKET is not the problem. The problem I was having with this statement was they were either not communicating what the problem was or my limited understanding of performance tuning hindered me from comprehending the solution. And I really believed I knew what performance tuning was all about. Man, I was wrong. And finally still today I know there is more to learn.

Our OLTP system has 5 to 1 Reads to Writes. An analysis by EMC for a new VMAX gave us these statistics. This means that long processing queries were victim to this perceived notion that CXPACKET wait was our problem. This also included our OLAP system which is Cognos and I am now a member of this department. We just assumed it was a wait problem.

Adam Machanic has a great utility called sp_WhoIsActive. By default, it does not show CXPACKET waits. He also has a recorded session on Parallel Processing which I was blessed to host for Adam and the PASS Performance Virtual Chapter. He also has a series with SQL University, which I suggest reading. Also, Paul White ( Blog | Twitter ) has been trying to clue me in on CXPACKET via Twitter for sometime now. If sp_WhoIsActive is hard to understand, first start with sp_who3, another great monitoring tool. Thanks Mr. Denny

I have 2 executions of sp_whoIsAtive in my shortcuts in SSMS:

Ctrl+7 - exec dbo.[sp_WhoIsActive] '', 'session', 'Replication%', 'program'

image

Ctrl+8 - exec dbo.[sp_WhoIsActive] @filter = '', @filter_type =  'session', @not_filter = 'Replication%', @not_filter_type = 'program', @get_task_info = 2

image

The filter is for not seeing transaction replication in the view. The Ctrl+8 has the @get_task_info = 2 which shows a combination of CXPACKET and other waits associated with the SPID execution. This is really cool stuff for a DBA geek.

After reading the above links and riffling through some execution plans, I came to an ah hah moment when I saw a Display Estimated Query Plan and the Actual Query Plan not be the same. The difference included a Nested Loop in the estimate that did a Lookup and then a Hash Loop with 2 Index Scans in the actual plan.

What I came to realize was that if I tune the query and tables(indexes), parallelism sometimes goes away. Did you get that? …it goes away. Why?  Because now the actual plan does not need to run in parallel for the best plan.

Another one came when I realized the table (Fact table in a Data Warehouse with 30+ million rows) did not have a clustered index, but many non-clustered indexes. The suggested index from the Missing Index feature of SSMS showed a new Non-clustered index, when I applied it, it did not help 90+% like the missing index feature said. So, I went to the table, found the no clustered index on table, combined the surrogate keys into a clustered index and the Query Plan cost went from 1000+ to less than 50.

WOW!!! Now I get it. Thanks Adam, Paul and others for being patient with me

One more thing, I do not remember where I saw this, but in the sys.dm_os_waiting_tasks, when  exec_content_id = 0 and the wait is CXPACKET, that means the process is waiting for parallel processing to finish, and is a valid wait.

I think I will dig up some of these old query tuning experiences and try to show others through this blog what I have found. Maybe even a step by step process I go through nowadays to find problems with performance.

Now, back to the ETL on Data Marts.

God Bless,

Thomas