Thursday, March 15, 2012

Execution Plans – Merge Join

The Merge Join is a Physical Operation when joining 2 sets of data that are in the same order.

There is an interesting Clustered Index in the Adventure Works database. The SaleOrderDetail table’s primary key is a combination of the SalesOrderID and SalesOrderDetailID, not just the identity field of the table (SalesOrderDetailID). Most developers I work with usually create the primary key (by default is the clustered index) on the Identity fields with a unique constraint on the business key of the table.

The compound primary key might take more space, but it looks to help query plans have more options when optimizing.

The following query will use a Merge Join after doing a clustered index scan on SalesOrderHeader and SalesOrderDetail.

SELECT *
    FROM Sales.SalesOrderHeader soh
        INNER JOIN Sales.SalesOrderDetail sod
            ON sod.SalesOrderID = soh.SalesOrderID

image

Now, of course, no one does a SELECT *, right? Maybe not, but you can see that since the first column in the SalesOrderHeader and SalesOrderDetail clustered indexes is the SalesOrderID, the query optimizer can Merge the 2 data sets together because they are in the same order. The optimizer knows this is the best join.

A benefit to the Merge Join operation is once the first piece of data (SalesOrderID) is merged with Detail data, the data can be passed to the next operator, which can improve performance if further processing is needed.

Say I want a list of SalesOrderIDs by ProductID and retrieve the Product Name from the product tables

SELECT sod.SalesOrderID,
        p.ProductID, p.Name
    FROM Production.Product p
        INNER JOIN Sales.SalesOrderDetail sod
            ON sod.ProductID = p.ProductID
   

image

The SalesOrderDetail Scan is on a non-clustered index on ProductID. This is how the Merge is able to be used with the Clustered Index scan of the Product table

Below is the info on the Scan Operators which shows the Object (index) used in the operator.

imageimage

What happens when we add a column to this query – Order Qty.

SELECT sod.SalesOrderID, sod.OrderQty,
        p.ProductID, p.Name
    FROM Production.Product p
        INNER JOIN Sales.SalesOrderDetail sod
            ON sod.ProductID = p.ProductID
   

image

The optimizer determines the Hash Match physical join is more efficient. The Cost is 1.7226.

An option here is to create a covering index by adding the OrderQty column to a non-cluster index using the INCLUDE statement of the Create Index statement.

CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ProductIDIncludeOrderQty] ON [Sales].[SalesOrderDetail]
    ([ProductID] ASC )
    INCLUDE (OrderQty)

image

You can force a Merge Join with a query hint to see the difference without the Covering Index.

-- Force Merge join
SELECT sod.SalesOrderID, sod.OrderQty,
        p.ProductID, p.Name
    FROM Production.Product p
        INNER MERGE JOIN Sales.SalesOrderDetail sod
            ON sod.ProductID = p.ProductID
  

The Cost is now  6.3887 and the query runs with parallelism. And, a Sort operator (74% of query cost) is needed if you do not have the covering index above.

image

Query Hints should be left to the experts.

If the data sets are not in the same order based on the Index (clustered or non-clustered), the optimizer can still use a Sort operator to order the data sets in the same logical order to use a Merge Operation.

The AdventureWorks database is available from CodePlex at http://msftdbprodsamples.codeplex.com/

I first learned Execution Plans from a free PDF provided by Red Gate with author Grant Fritchey. You can get this PDF at http://www.simple-talk.com/sql/performance/execution-plan-basics/

Happy Query Tuning from Louisiana!!!

Friday, March 9, 2012

Execution Plans – Nested Loop

The nested loop is a physical operator used to join 2 or more sets of data when the query optimizer believes this is the best plan for the query.

The following query can be run against the AdventureWorks database:

SELECT cust.CustomerID, soh.SalesOrderID
    FROM Sales.Customer cust
        INNER JOIN Sales.SalesOrderHeader soh
            ON soh.CustomerID = cust.CustomerID
    WHERE cust.CustomerID = 11091

The query joins the Customer table with the Sales Order Header by the CustomerID column (ON soh.CustomerID = cust.CustomerID). This is a Transaction table (Sales Order Header) joined to a lookup table (Customer). You get a Query Plan like the following:

image

The Clustered Index Seek on the Customer table retrieves the data for a customer based on the WHERE clause looking for CustomerID 11091. Since the rows retrieved from the customer table is smaller than the Sales Order Header, the customer becomes the outer loop of the Nested Join. The inner part of the loop looks for Sales Order Header rows based on the CustomerID 11091.

The Index Seek happens because there is an Index (Non-Clustered) on CustomerID for the SalesOrderHeader table. The SELECT part of the query only needs the SalesOrderID which is part of the Non-Clustered index. The clustered index on the Sales Order Header table includes the SalesOrderID column. All clustered index columns are included in the leaf level of the non-clustered index (idxSalesOrderHeader_CustomerID).

image

By hovering your mouse over the nested loop, a tool tip gives you more info, including a description of the Nested Loop, Physical and Logical Operation, and much more.

The Actual Number of Rows shows that the Sales Order Header had 28 rows for this customer.

By adding more columns to the SELECT for data from Sales Order Header, like AccountNumber and OrderDate, we get a second Nested Loop.

SELECT cust.CustomerID, soh.SalesOrderID, soh.AccountNumber, soh.OrderDate
    FROM Sales.Customer cust
        INNER JOIN Sales.SalesOrderHeader soh
            ON soh.CustomerID = cust.CustomerID
    WHERE cust.CustomerID = 11091

image

What happens now, is a second Nested Loop is needed to get the AccountNumber and OrderDate from the Clustered Index of the SalesOrderHeader tables. This is because the Non-clustered index on the CustomerID does not “cover” the columns needed for the SELECT but can be used to Seek the SalesOrderID values needed to satisfy the WHERE on CustomerID 11091. The Key Lookup (Clustered) is OK because it is not a scan.

The query can be improved by adding a Covering Index. The following index will use the INCLUDE clause of the CREATE INDEX statement to include the AccountNumber and OrderDate in the leaf level of the index and not the tree level.

    CREATE NONCLUSTERED INDEX idxSalesOrderHeader_CustomerID_IncludeAccountNumberOrderDate
        ON [Sales].[SalesOrderHeader] ([CustomerID])
        INCLUDE (AccountNumber, OrderDate)
GO

The second Nested Loop (Cost 93%) of the query will be removed.

image 

The Index Seek is now covering the SELECT for SalesOrderHeader data.

We have shown here a basic explanation and example of a Nested Loop along with some information on Cluster and Non-Cluster index Seeks as well as a Key Lookup.

Thomas

Monday, February 27, 2012

Another 24 Hours of PASS

Nothing like a 24 hour barrage of SQL Server training for FREE!!! Did I mention it was free?

The PASS Data Architecture Virtual Chapter will be hosting Ami Levin for his presentation Where Are My (Primary) Keys. It will be 5AM Central for me, but I am excited to be hosting Ami.

Ami did an excellent job on explaining Physical Join Operators at last year’s PASS Summit. I was very impressed with the presentation skills, visual props (he used a large deck of cards to show Nested/Merge/Hash joins) and his sense of humor. Ami uploaded a script from his presentation that allowed me to follow up on the presentation while viewing the recording as well as additional examples with comments above each script. He session alone for enough for one day, so I am really excited to see him present again.

The PASS board has decided the let the Virtual Chapters help moderate and select sessions this year for #24HOP. What most of the attendees of 24 Hours of PASS, the Summit and SQLRally do not know is that the Virtual Chapters host these same speakers all during the year with the same level of presentations. Did I mention it was free? This years #24HOP (Twitter has tag) will give the Virtual Chapters an opportunity to advertise sessions, recruit speakers and get volunteers to help with the VC. 

The Data Architecture VC could really use some help!!! E-mail us at PASS_DA_VC@HotMail.com if you want to help. The following are opportunities to help this chapter:

  1. LiveMeeting Host: These people contact the speaker to practice and make sure the LiveMeeting works. He or she will come online 30 minutes before the meeting to get it started, then a 5 minute intro of the VC and the speaker. During the session, he/she will help moderate questions and close the session for the speaker at the end. There is also a recording to be started and stopped.
  2. Marketing: This job helps get word out on Twitter/Megaphone/UGS, get some Bloggers to talk about the upcoming session and get some SQL Server websites to help broadcast the meeting announcement to others. The marketing job can include using DotNetNuke to manage the website.
  3. Speaker Coordinator: Speakers are need to keep the VC going each month. This is really cool job because you get to meet some of the great SQL Server community servants that share of their time and talents. Other duties include getting the abstract and bio from the speaker a month ahead to get on the website

Any of these positions can be shared with one another and is highly suggested. There are SQL PASS employees and board members dedicated to helping us be successful. Of course, the best lessons are learned through experience.

Of course, none of this happens without Sponsors. That is why, did I mention it is FREE!!! Thank you Microsoft and SQLSentry. SQLSentry is now famous for its free version of Plan Explorer. If you tune queries and look at Execution Plans, your SQL Tool Belt needs this free tool. They also have a cool scheduler to compliment the Analysis tools.

image

 

Thomas

Wednesday, February 15, 2012

PASS Data Arch VC presents Steve Simon - On Beyond Zebra AdventureWorks OR Where Did I Go Wrong?

I meet Steve when  he came down from New England area for SQL Saturday #28 IN Baton Rouge, LA. He is a great SQL and IT community person who has many years experience with database systems. Please join us Thursday at noon for some free training!!!

Next Meeting - http://dataarch.sqlpass.org/

Subject:
On Beyond Zebra AdventureWorks OR Where Did I Go Wrong?

Start Time:
Thursday, February 16th, 2012 12:00 PM US Central Time (February 16th, 2012 6:00 PM GMT)

End Time:
Thursday, February 16th, 2012 1:00 PM US Central Time (February 16th, 2012 7:00 PM GMT)

Presenter:
Steve Simon (LinkedIn)

Live Meeting Link:

https://www.livemeeting.com/cc/usergroups/join?id=TRTR2B&role=attend&pw=jT%2F%5E3%3DS%5Bc

On beyond Zebra AdventureWorks OR where did I go wrong?
No matter how well planned and executed, the data structures of any data warehouse oft times eventually land up resembling something out of a Dr. Seuss book. In this presentation we shall be looking at and discussing a few of the “perfect” data structures (that I have inherited) and discussing the flaws that have only recently surfaced. We shall be looking at the ‘before’, discussing a few alternatives to make these structure more efficient and effective, then we shall then look at final “production structure” and the resulting improvement metrics; all of which help us to help our clients’ make better decisions in a timely manner..

Steve Simon
Steve Simon is a Senior Business Intelligence Engineer with MyWebGrocer in Burlington Vermont. He has been involved with database design and analysis for over 26 years. Steve has presented papers at eight North American PASS Summits (in Orlando, Seattle WA (4), Denver CO (2) and San Francisco CA), two at PASS Europe 2009 and one at PASS Europe 2010. He has just recently presented at the Johannesburg and Cape Town SQL Server Saturdays. Steve is actively involved with the SQL Server community within the Boston area and is a regular presenter at the New England Code Camps and SQL Saturday events and participates actively on many of the Microsoft User Forums. He is also the Virtual Chapter leader of the Professional Association for SQL Server (PASS) Oracle – SQL Server Virtual Chapter and is a PASS regional mentor.Steve has also presented papers at the Information Builders’ International Technology Summits ( in New Orleans (1998), in Orlando (1999), Las Vegas (2007), two in Nashville (2008), one in Nashville (2009), one in Orlando 2010 and will be presenting a paper at the 2011 Summit in Dallas TX).

Add to calendarAdd to calendar

Monday, February 6, 2012

Baton Rouge SQL Server User Group

I will be speaking at the February Baton Rouge SQL Server User Group meeting on Wednesday the 8th. The session will be an continuation of Execution Plan Basics that I shared last year and at 2 SQL Saturday events plus Houston Tech Fest in 2010. This one is called Execution Plan: Beyond The Basics. The topics include Digging deeper into Loops, Missing Indexes in Query Plans, WHERE clause and LEFT JOIN, TempDB usage, SQL Server 2012 improvements to Query Plan and Tools like SQLSentry’s Plan Explorer, Database Tuning Advisor and SP_WhoIsActive from Adam Mahanic.

I am hoping this session gets pick by some SQL Saturday events in the future, first at Houston with Nancy Hide Wilson and Jonathon Gardner. Still looking at Pensacola and waiting for John Sterret and gang in West Virginia to add one to the schedule. Really would like to do a half day execution plan session, that would be nice.

More Info:

February '12 Baton Rouge SQL Server User Group Meeting

Topic: Execution Plans: Beyond the Basics

Date: Wednesday, February 8, 2012
Location: LSU Campus, Turead Hall Room 103 - http://g.co/maps/xb5pf

Agenda:
5:45 - 6:15 pm: Networking and Refreshments
6:15 - 6:55 pm: Lightning Round Topics
6:55 - 8:00 pm: Main Topic
8:00 - until: Open Forum for questions and Raffle
Afterwards: An open invitation to join us at Walk-Ons
Lightning Round Speaker: Patrick LeBlanc, Microsoft

Topic: New Features of SQL 2012

Lightning Round Speaker: Matt Maddox, Lamar Advertising

Main Topic:Execution Plans: Beyond the Basics
Main Topic Speaker: Thomas Leblanc, Turner Industries
Summary: This will session will be a deeper look into execution plans for query tuning. The Nested, Merge and Hash Join loops will be discussed. A look at TempDB usage when memory grants are not enough for the plan. How do you use MAXDOP and Cost Threshold to understand parallel plans? We will examine Sort, Compute scalar and more iterators for performance tuning. Missing indexes in query plans will be used to help performance.

Wednesday, January 25, 2012

SQL Saturday #104 Colorado Springs, CO

Janet, my girlfriend, and I went to Colorado Springs a day early, Thursday, to ski on Friday before SQL Saturday #104. It was her first time skiing and for me it has been about 3 years since the last trip on a mountain. The altitude at Monarch was about 10,700 feet above sea level at the base, and this had me breathing hard just going up the stairs.

image 

We meet up with Jason Horner and Christian Leo on Monarch Mountain. It was great to spend some social time with other SQL geeks before the event. Janet is trying to get used to techy talk but at least she is trying.

I meet Jeremy Lowell from Data Realized as sponsor of the Friday night event. They gave us ski vests with SQL Patrol on the back like the Ski Patrols on the mountains. The ski theme was even part of the schedule, Chris Shaw and gang did a great job. There were others that promoted the event and helped the day go by that were awesome.

I attended an Introduction to BI which was given by 2 individuals, Marc Beacom and someone else, because it was right before mine. The rooms were small, but packed 20-30 people. I thought that would be a problem, but actually was good for everybody could see without using Zoomit. They introduced a new term for Data Warehouse calling it a Data Vault. Very interesting!!!

My session was on Dimensional Modeling 101. I tried to demonstrate methods from the Kimball Group while showing flow in SSIS and display in SSAS. The reviews were what I expected for the first time doing this session – from excellent to you were all over the place. 60 minutes is just not enough time, need 75-90 minutes.

The best session of the day was with Thomas LaRock and Jason Strate, that I attended. It was an attendee group effort to get to the root cause of a performance problem. Jason drove while Thomas guided. I am are sure this helped every one understand that the native tools of SQL Server can diagnosis a problem while helping all learn how to cooperate with each other to solve a problem. They also guided us in how to communicate with other departments and the importance of being able to say the right things and showing your worth in the company.

Lunch was spent socializing with attendees and catching up with some SQL friends William (Bill) Pearson and Grant Fritchey. Janet and I talked more with Grant waiting for the plane Sunday morning.

Mike Fal did an excellent job with Partitioning, so good he is doing the session for the PASS Performance and Data Architecture Virtual Chapters. Also, David Eicher showed some tips for using maps in Reporting Services. I caught some of Always ON 2012 from Austin nut Jim Murphy.

I am amazed that I gained all this knowledge and excitement and fun from a SQLSaturday. This site is loaded with 20-30 events coming up in the next 4-5 months, so please do not missed these opportunities to learn and/or teach.

God Bless, Thomas

Tuesday, January 17, 2012

PASS Data Architecture VC presents: Audrey Hammonds–Database Makeover: Renovate Your Data Model

Subject: Database Makeover: Renovate Your Data Model

Start Time:   Thursday, January 19, 2011 12:00 PM US Central Time (January 19, 2011 6:00 PM GMT)

End Time:  Thursday, January 19, 2011 1:00 PM US Central Time (January 19, 2011 7:00 PM GMT)

Presenter:  Audrey Hammonds (Blog|@DataAudrey)

Live Meeting Link:  https://www.livemeeting.com/cc/UserGroups/join?id=QZCH9D&role=attend&pw=g3d%3F%7BspDw

Database Makeover: Renovate Your Data Model
You know the concepts of normalization, logical modeling, and physical implementation. But what happens when you’ve inherited a less than perfect data model? You need to renovate while keeping the production system humming. In this session, we’ll talk about how to evaluate an existing model, how to approach design when your system is up and running and how to incrementally apply database design changes in a fast-moving environment. Every database, even ones consisting of a collection of flat, unrelated tables, has the potential to become a beautiful database. All you need is an eye for design and a strategy to get you there.

Audrey Hammonds
Audrey Hammonds is a Database Developer for ista North America in Atlanta, GA and a SQL Server MVP. Fifteen years ago, she volunteered for DBA training to escape COBOL, and never looked back. A firm believer that good fundamentals and solid design can save a database professional’s sanity, she has devoted much of her career to designing (hopefully) elegant databases and straightening up others’ not-so-elegant databases. She blogs at http://datachix.com with her partner in crime, Julie Smith.

Add to calendarAdd to calendar