Showing posts with label Database design. Show all posts
Showing posts with label Database design. Show all posts

Thursday, June 19, 2014

5 Things to Know About Indexes for SQL Server

One of the first questions I got while interviewing for a potential position as a DBA was “If there is performance problem, where is the first place you would look?”

Now at this time, I was trying to move from Software Developer to DBA, and I answered that I would look at IO, Network and errors in code. The interviewer said, “Indexes.” I always assumed that all database designers knew you needed good indexes. This changed my thinking about what I was learning to start reading more about being a DBA. SQLServerCentral.Com became my friend and I started reading one article a day and answering the Question of the Day every morning when I got to work.

1. Clustered Index

What is a clustered index? Or better yet, what do I need to know about a clustered index? The first thing to know is that you can have only one. The logical structure of the table becomes the order of the column(s) you use in a clustered index. You might hear someone say that the primary key is always the clustered index. This is not true. Any index can be the clustered index. By default, the primary key becomes the clustered index unless you change this in a script or the User Interface creating/modifying a table.

ALTER TABLE [Person].[Address]
    ADD  CONSTRAINT [PK_Address_AddressID]
    PRIMARY KEY CLUSTERED
  ( [AddressID] ASC )
  ON [PRIMARY]
GO

 

2. Non-Clustered Index

You can have more than one non-clustered index. I would suggest creating a clustered index before creating any non-clustered indexes. The clustered index column(s) become part of the non-clustered index. A query plan might need to go to the clustered index to retrieve more columns not available in the non-clustered index after a scan or search of non-clustered index. The non-clustered index can be one or more columns, and the order is important. You want the most specific column (as far as values go) as the first column. It also important to have the first column(s) to be used in the join or where clause that you are trying to be used for a query.

CREATE NONCLUSTERED INDEX
      [idxAddress_AddressLine12CistyStateZip]
    ON [Person].[Address]
( [AddressLine1] ASC, [AddressLine2] ASC, [City] ASC,
     [StateProvinceID] ASC, [PostalCode] ASC)
ON [PRIMARY]

 

3. Include Columns (i.e. Covering Index)

This is where Microsoft started shining in my opinion. The include columns in a non-clustered index can be these columns that your query is trying to retrieve from the clustered index mentioned above. Instead of retrieving more columns from the clustered index, the include columns are the ones the query needs. The are stored in the Data Page of the non-clustered index and not in the index tree.  Be careful not to go overboard with include columns, because you can eventually create a whole new table (even though it is an index) by including all the columns not specified in the index itself. I have also seen a form of Deadlock with Include Column indexes.

CREATE NONCLUSTERED INDEX
      [idxAddress_State_IncludeColumns]
    ON [Person].[Address]
( [StateProvinceID] ASC)
INCLUDE ([AddressLine1], [AddressLine2], [City], [PostalCode])
ON [PRIMARY]

 

4. Filtered Indexes

Filtered Indexes should be used carefully because they could become more of a performance problem with Insert/Update/Delete statements. They are really nice for large data warehouses where some tables are not structure. Also, I have seen them used for a column that has one value other than a empty value. Filtered indexes are new to SQL Server, so you should see performance improvements in future versions.

CREATE NONCLUSTERED INDEX
      [idxAddress_State_Filtered]
    ON [Person].[Address]
    (AddressLine1, [AddressLine2], [City], [PostalCode])
  WHERE [StateCode] = 'LA'
ON [PRIMARY]

 

5. Missing Index Feature

Now, this is really cool. With SQL Server 2008, the Query Plan now includes a Missing Index feature which basically gives you a hint on what index might be beneficial. Again, you should be cautious when it suggests include columns that look like all the columns in the table. See number 3 above.

image

In conclusion, indexes are a great help with query performance. There are some basics shown above to get you started with understanding indexes. Do not close your mind to more information about indexes after you start to understand what was mentioned in this article. I know today after doing DBA work for over 10 years, there is always more to learn.

Wednesday, May 7, 2014

SQLSaturday #308 Houston, TX Saturday May 10th

This will be the second SQLSaturday in Houston I have spoken at and I am excited about talking Execution Plans and Multidimensional Analysis Services. These are 2 topics I deal with on a daily basis, so it is great I get to help others understand these technologies.

image

I also get to talk with John Sterret, Alan Kinsel and Nancy Hide Wilson who are #SQLFamily. Their hard work on spreading the knowledge of SQL Server is a pleasure to watch.

There are many great speakers coming to Houston because the MS TechED event is the following week in Houston. So, MVPs, MCMs and veteran speakers will be at San Jacinto College – South Campus this Saturday to help all us learn to love SQL Server.

Hope to see you there!!!

Monday, May 5, 2014

SQLSaturday #305 Business Analytics Edition Dallas–RECAP

So, I am flying home thinking about the event Saturday and many things are running through my head. No, not work on Monday. First, I would like to say that Hadoop is not going to replace SQL Server nor Business Intelligence. Second, a new thing is coming but it is only going to compliment what we already do. And last, Dallas did an excellent job of spreading the learning over 4 tracks with lots to choose from – Visualization, Deployment, Big Data and Analytics. You always had something good to choice from.

The Dimensional Modeling 101 session I presented was great. Lots of sharing of individual experience with creating dimensions and facts. Most of the time was spent on talking about surrogate keys and performance. When I first started doing dimensional modeling, I had the idea that all data came from one transactional system with foreign keys on all tables. The real world proves different. That is what happens after you use some technology for awhile. There was much talk about making mistakes with dimensional modeling and learning from success and failure. There were mostly good comments and 4-5 conversations after the session, during lunch and between other sessions. I look forward to reviewing the critical comments to improve my session abstracts, session content and presentation.

The first session I attended was Intro to Hadoop. The MSBIC group from Dallas was able to get HortonWorks’ speakers which help with explaining a lot about how Big Data works. It is obvious this is a good thing for high output manufacturing systems, logging systems or internet data. Future versions will expand the current features. You could hear many questions about how it all works because you want to wrap your head around the inner workings. The visual display of the Hadoop stack helped tremendously. There is definitely a programming background needed before diving into this new technology. You can download a free sandbox from HortonWorks.

After my session (9:45), I went to a session on pig. It was very hard to follow, so afterwards I went to the speaker room to discuss with others. There, I meet back up with Meagan from Kansas City and Kristen, the leader of the Oklahoma City SQL Server user group. This conversation continued from where we left off from the Friday night speaker social. There is a lot of excitement within the SQL Server community no matter where you are.

The lunch session involved a Microsoft Technical Architect (David Brown) from Dallas that shared his experience with clients in the BI stack. He had great examples that had people laughing and relating to his project management. One comment he made that has stuck in my head is that the problems presented with BI – Self-service, Master Data, Security, Performance, etc. – was “Maybe these problems are good to have for IT.” I think I understand.

The afternoon included a HDInsight session. The cloud is here and you can use it. Next was statistics with R and Mahout which was more about how excited the speaker was rather than demos (that worked). Lastly, was a Slalom consultant talking about Time Intelligence in PowerPivot. He did a good job of listing the functions, but more time should have been spent on simple examples rather than showing everything. The crowd started to ask to many questions that he had to back up and explain.

Finally, I wanted to mention something that happened during the last PASS Summit 2013 in Charlotte that triggered me  to come to this SQLSaturday. Karla was talking with me about the Business Analytics conference coming up and I said something about how this was not SQL Server. Her comments made me think back 6-7 years ago when BI started to gain exposure at the Summit and I and others where saying “This is not SQL Server.” Now, I am in the BI stack and love it. I still love DBA work, but I expanded my skills by learning SSIS and SSAS. In Baton Rouge, you need a variety of skills to be employed in the SQL Server realm. You can not just be an expert in SSIS and expect to be employed.

Now might be the time to learn some more skills – Big Data.

Winking smile

Wednesday, April 23, 2014

SQLSaturday #305 Dallas Business Analytics Edition

I am excited about being selected to speak and attend the May 3rd SQLSaturday BA Edition in Dallas, TX. There are a number of sessions I want to attend including Eric Mezell’s An Introduction to Hadoop, Andy Egger’s Applied Predictive Analytics, David Browne on Enterprise Business Intelligence (for lunch) and Garrett Edmonson – Measuring Data Warehouse Performance. The full schedule is here - http://www.sqlsaturday.com/305/schedule.aspx.

I hope to create some time to visit with old friends in the Dallas area like Brian Smith (MS) and Tim Costello. I hope this is a precursor for Dallas to host a PASS Business Analytics week long conference in the future. California was a little to far to travel this year even though I should have went.

I am presenting one of my first BI presentations I have ever given – Dimensional Modeling 101. When I move over to Business Intelligence, I had a strong background in Data Normalization. Transitioning to a Dimension Model was quite difficult and I wish I had someone to explain like Warren Thornthwaite did at a pass PASS Summit before I started. He did a great job using PASS registration data to great a simple Data Mart and Analysis Services cube while previewing in Excel. That hooked me.

Thanks to all the sponsors for helping make this a free event for all attendees.

image

Friday, April 11, 2014

Call for Speakers – SQLSaturday #324 Baton Rouge

Baton Rouge is now on its 6th SQLSaturday and the event grows year after year. It all started with Patrick LeBlanc back in 2009 with #17 http://sqlsaturday.com/17/eventhome.aspx and with help from local User Groups (William AssafSparkhound), it has expanded its topics and attendees. This event is also called Baton Rouge Tech Fest because there are more than just SQL Server related topics. Last year, we had a CIO/Manager track, Windows Phone, SharePoint and .Net.

So, this is a Call for Speakers to the next SQLSaturday August 2dn 2014 in Baton Rouge at the Business Center on LSU’s campus. Most speakers get at least one session, so do not be shy about submitting. If you need help, practice by giving a Lighting Round at the local User Group (SQL and .Net) meeting the 2nd Wednesday of every month. There is even a Java user group meeting at the same time.

Pictures from previous SQLSaturdays in Baton Rouge

   

Tuesday, March 25, 2014

5 Tools Every DBA Should Know About

After monitoring enterprise SQL Server instances for over 10 years now, there are 5 tools that are used every day for spot check views of a system. The order here is not a countdown from 5 to 1. They are presented here in no particular order, because each has its own area where it is number one. Over the years, those around me have gain more understanding of a SQL Server instance and fine tuning that is needed to optimize the usage of these tools. Though most are free, there is a cost in performance on using these assistants and you better have a good understanding of their output.

Performance Dashboard

This feature was added in SQL Server 2005, but has versions available for 2008 and 2012. The 2008 version is actually the 2005 version with a modification to the script. It is a central location from Management Studio to view what is going on in the last 15 minutes plus drill-thru reports for current OS and SQL statistics along with historical statistics. The dashboard is divided into 4 areas – CPU Utilization, Current Waits, Current Activity and Historical/Miscellaneous Information. The links on the dashboard provided in each area drill into reports (rdl files) that get statistics from DMOs (Dynamic Management Objects) which are DMFs and DMVs you might be familiar with.

image

The top-left quadrant is for CPU Utilization. The blue part of the percentage bar is the amount of CPU used from SQL Server process, while the rest is outside of SQL Server CPU usage – Windows OS and other applications. Clicking the blue part of the CPU bar will show current running queries ordered by CPU usage. This is convenient information about was is taking up SQL CPU time at the current time. The Current Waits section lists the waits categorized by type with drill-thru capabilities to see the queries with waits. Blocking is one common wait I see. The lower left Current Activity table lets you see the requests happening along with all current connections from the User Sessions link. The 4th section has links to more historical statistics like IO by database files, Waits summed up by category, along with reports about queries sorted by Reads, Writes, CPU, Duration and more. The database IO usage can help with SAN Administrators.

 

SP_WhoIsActive

Adam Machanic has done a great job with this tool, and it is suggested to go visit the 29 days of SP_WhoIsActive to really understand what he has done. I would take my time reading his blogs and definitely let each day sink in before going to the next post. This tool gives the user the ability to see what is running with all kinds of statistics like CPU, reads, writes and the list goes on and on. I have 2 short cut keys in Management Studio (SSMS) with 2 different versions: one has the Query Plan in a column and the other does not. I also filter out some logins (sa) and databases (master & msdb).

Ctrl-6 – no Query Plan and some filtering

image

Ctrl-7 – include Query Plan with different filtering

The filter assists with clearing the noise of executing queries from certain applications. Being able to drill into the query plan from the selected query is great because you can see the execution plan in SSMS.

NOTE: SP_Who3

I have to say I still use sp_Who3 ‘Active’ first before going to SP_WhoIsActive. I remember first using this in SQL Server 2000, but Denny Cherry might have created it before then. The query uses DBCC InputBuffer to get the actual SQL statement that is associated with the SPID of concern which is not available in some cases with DMOs.

 

Built-In Reports

Back to Microsoft and the ability to use reports right from Management Studio. My favorite is Disk Usage and Disk Usage by Top Tables. Right-clicking on a database, then selecting Reports/Standard Reports/Disk Usage from the context menu gives a summary of the database data and log files. You can see how full these files are and space free. If the files have expanded recently, you can expand the amounts and time it took. We recently used this to see a new Distribution database for replication was expanding at 1MB but 507 times each minute which showed up as IO Wait.

image  image

I really like the Schema Change and backup and Restore Events, but as you can see below there are many to help with exploring a instances and its databases.

image

 

PlanExplorer

PlanExplorer is a tool to view Query/Execution Plans. This tool integrates into Management Studio (SSMS) so it can be launch from viewing a query plan, maybe after using SP_WhoIsActive. The ability to sort by different columns of statistics from the plan is great. It also compresses the graphical display for easier reading. Highlights with various colors indicate the high object usage percentages so you can drill to the iterators that are the heavy hitters. See below for a view (Fit to Screen) in SSMS followed by the PlanExplorer view.

image

SSMS

image

PlanExplorer

Should not be hard to see the difference this tools makes with viewing execution plans.

 

Profiler

Even though Extended Events will be replacing Profiler in a few versions, Profiler is still being used every day around the world. Once you master the events to trap and columns to view, this tool becomes a great way to see what is going on. Profiler enables you to save templates once you get the columns and events setup that you like. Recalling these templates helps quickly start a profiler trace.

image

Caution should be notes here that profiler can cause performance problems on systems that are used heavily. I once frozen a production system that had 16 processors and 128GB of RAM by creating a client profile and trying to save the Query Plan event on a production system. The boss was not happy. This is where I learned you can script the Profile trace out and use as a Server-Side trace, saving the trace to a file to be read later. We also removed capturing the Query Plan which I do not suggest using in a trace.

image

Sunday, March 9, 2014

5 Things a developer should know about databases

 

Database Normalization

Theoretical versus real-world

There is not a strong need for developers to know the theoretical definition of 1st thru 5th normal form, but there needs to be an understanding of some design practices. The keys are:

  1. No Duplicate data in columns in one table
  2. Relationships built as foreign keys
  3. Primary Key to identify a row
  4. Learn Many to Many relationships (this could be a whole blog)

Always look for natural key when using identity for Primary Key

The first of which is a primary key. Over the years, I have finally grasped the understanding of using an Identity column as the primary key but always design a table with a natural key. The following example that shows the difference in a Customer table.

CREATE TABLE [dbo].[Customer](
    [CustomerID] [int] IDENTITY(1,1) NOT NULL,
    [CustomerName] [varchar](40) NOT NULL,
    [AddressLine1] [varchar](60) NULL,
    [City] [varchar](30) NOT NULL,
    [StateAbbreviation] [varchar](2) NOT NULL,
    [PostalCode] [varchar](15) NOT NULL,
 CONSTRAINT [PK_Address_AddressID] PRIMARY KEY CLUSTERED 
( [CustomerID] ASC) 
) ON [PRIMARY] 
 
CREATE UNIQUE NONCLUSTERED INDEX [unc_Customer_CustomerName] 
    ON [dbo].[Customer]
([CustomerName] ASC)
ON [PRIMARY]
GO

The key is that I understand that the Identity column is not the column used to locate a customer, but the the Customer Name is really how we identify a customer. From my experience, I saw the identity column come into software development for 3 reasons:



  1. The int column takes up less space in non-clustered indexes
  2. It helps with lookups in Objects for Object Oriented coding
  3. The joins are faster and helps with related tables when used as part of the related table’s primary and/or foreign key

Lookup versus Parent-Child tables


The Customer table can be can be considered a Lookup table for customers that have Orders. The relationship between an Order Header and Order Detail would be a Parent-Child relationship. The Order Header and Detail still need to have a Natural Key just like the Customer table even though Identity is used strongly in these tables.


image


The example above show OrderID as the primary key of the OrderHeaders table, but there is a natural key to identify rows which is CustomerID + OrderDate. The OrderID column is carried down into the OrderDetail table. The column is combined with ProductID to form the primary key of the table.


 


Data Types



char vs. varchar


Remember that using a char data type will use the space required in the size in the data pages of ever row, where varchar will only take up the space used in the actual data in the column for that row.


Money not decimal


The money data type is a much better option for dollar amounts in a database than trying to specify the size and number of decimal places in a decimal data type.


nvarchar or nchar


The use of nvarchar and nchar are for storage of all Unicode characters while char and varchar are for non-Unicode characters. The nchar/nvarchar data types take up 2 times the space for a column in each row. In 25 years of IT work, I have never seen a need for Unicode characters, though some would say differently.


integers – tiny, small, int and bigint


The integer data type now have various size that are different in space in the rows of a table. If you know that a lookup table only has 10 possible rows, you do not need to use int or bigint for the ID column. You should go ahead and use smallint. See Books Online for more information.


Indexes



Learn Primary Key defaults to Clustered



As we saw in the diagram above, the CustomerID was the primary, clustered index of the Customer table without even specifying clustered. This is by default for SQL Server. You can change this to be a non-clustered primary key index, but you should always have a clustered index on a table in a transaction type database (OLTP). You could have easily specified the Unique Index on CustomerName as Clustered but be warned. The clustered index columns are included in all non-clustered indexes on a table.


Non-clustered indexes


The non-clustered index is usually created for improving the performance of queries that request data from a different column than the primary key. In the example above, a query on CustomerName in a WHERE clause of a query can return results faster because of the Unique Non-Clustered index. Another place to look for non-clustered indexes is on foreign key columns of related tables.



Overuse of Include clause


Creating covering indexes used to require including the additional columns in the actual index for help with eliminating a lookup in a query plan. The addition of the INLCUDE clause in SQL Server now lets you include additional columns at the leaf level of the index so the additional columns do not have to take of space in the tree of the index. Watch out not to overuse this option, because you can end up with more data space used in indexes than the table itself. Also, I have seen where include indexes cause Deadlocking, which is not a good thing.



T-SQL



Do not use SELECT * (specify columns)


Using SELECT * causes a couple of issues. First, it might take a full scan of the table (Clustered Index Scan). This has higher IO and Memory usage than just specifying the columns you need. Second, if someone adds additional columns to the table, it can break views and code that are not equipped to handle the additional columns


Learn difference between LEFT and INNER


Joins in T-SQL go back to the idea of Database Normalization. The INNER JOIN will return all the rows that match in both tables. The LEFT will return all rows from the table specified in the FROM part of the T-SQL and returns data to the columns specified in the JOIN table that have a match. The rows that do not have a match will have Nulls in those specified columns.

New Features of SQL Server



Windows Functions


Every new version of SQL Server comes out with improvements and additions. Yes, Microsoft does actually improve our lives. I am extremely excited to use the ROW_NUMBER() OVER PARTITION to find the latest or earliest rows in a data set. This eliminates the need to use MAX on a Date in a sub-query to join to the main query to find some matching data that is not related to the keys of the tables. There is even more functions like RANK, DENSE_RANK and NTILE.


Common Table Expressions (replaces cursors in some cases)


If you want to learn something really cool, try common table expressions. This can eliminate the use of CURSORS to loop through rows with T-SQL code. It will get you on path to understanding Set Based querying of your database.


There are many of features of SQL Server that a developer should learn, and these are just some of the common items I see new developers lack in their experience that are usually the first for me to help as a co-worker. Happy programming!!!

Sunday, March 17, 2013

PASS Data Arch VC presents Steve Hughes on Accounting for Windows Azure

I meet Steve this past November in Seattle at the PASS Summit. Reading his series on XMLA and Analysis Services help me understand more about SSAS. It was good to finally talk with this dude and get him to present to the Data Arch VC

His blog is Data on Wheels.

Please join us on March 21st at noon central to get some data architecture information for Windows Azure.

Steve Hughes Accounting for Windows Azure

Online Meeting URL: https://www.livemeeting.com/cc/UserGroups/join?id=C2NPG7&role=attend&pw=x9%25%299%23Dr2

When iCal
UTC : Thu, Mar 21 2013 17:00 - 18:00
Event Time : Thu, Mar 21 2013 12:00 - 13:00 (GMT-06:00) Central Time (US & Canada)
Accounting for Windows Azure in Your Data Architecture

Speaker: Steve Hughes, Practice Lead - Data and Business Intelligence Magenic

Summary: As Windows Azure becomes a more significant part of the IT landscape, data architects need to understand the data technologies at their disposal. Not all data can nor should be stored in the relational database in Azure. The discussion will include the various storage types available for data in Azure and how they may fit into an architecture. Learn which data storage techniques may work best for your next project with Azure and some of the nuances of using each type within a project.

About Steve: Steve Hughes is a Practice Lead at Magenic, working with Microsoft’s SQL Server stack to deliver data and BI solutions. He has worked with technology for over 15 years and is passionate about helping customers understand that data is valuable and profitable. Over the past 6 years, he has delivered more than 30 presentations on SQL Server and data architecture. In 2010, he received Consultant of the Year honors at Magenic. Steve is also a PASS Regional Mentor for the Northeast.

Sunday, February 24, 2013

PASS Data Arch VC presents: Neil Hambly’s Think You Have The Right Data-Type?

Please join this Thursday Feb 24th 2013 at noon central for Confio SQL Server expert Neil Hambly talking SQL Server Data-Types. This seemingly simply feature is the heart of tables and data in SQL Server, so you do not want to miss Neil sharing implications of not knowing the details to Data Types.

Online Meeting URL: https://www.livemeeting.com/cc/UserGroups/join?id=7BTR84&role=attend&pw=s%26Q2H%2B%3A%263
RSVP URL: DataArch.SQLPass.org

This month Confio DBA Neil Hambly presents Think you have the right Data-Type? He will talk about selecting the right Data-Type (DT) for data is key to efficient data storage, retrieval and processing, with over 30 different DT to choose from this isn’t as straight forward as you might think. Join us Feb 28th 2013 at noon central for a talk about data type design and decisions.

When

iCal
UTC : Thu, Feb 28 2013 18:00 - 19:00
Event Time : Thu, Feb 28 2013 12:00 - 13:00 (GMT-06:00) Central Time (US & Canada)
Your Local Time: 2/28/2013 12:00 - 13:00

Think you have the right Data-Type?

Speaker: Neil Hambly, Database Administrator Confio Software

Summary: Selecting the right Data-Type (DT) for data is key to efficient data storage, retrieval and processing, with over 30 different DT to choose from this isn’t as straight forward as you might think, have your also considered the usage pattern of the data in addition to its storage requirements This session explores the various data-types now available, we look at conversion’s between different DT’s as well as index sizes & effectiveness, compression and other factors can that influence these decisions, we finish off by discussing different approaches on ‘migrating’ a DT to a different DT, throughout the session we consider any performance considerations, such as when implicit conversions occur and more

About Neil: Neil Hambly is a SQL Server veteran of more than 14 years, with expertise in SQL Server from Version 6.5 to the latest 2012 edition. He's held database roles at major organizations, including the BBC, ABN AMRO, and Accenture, as well as at a number of market-leading companies. Currently a Senior DBA at Confio Software, he’s held prior roles as a DBA, developer, and architect. Leader of the PASS London Chapter, Neil can frequently be found speaking at UK and international events.

Sunday, October 14, 2012

When a Covering Index no longer covers

I received an email from my boss on the morning supervisors approve timesheets. He stated that they are complaining about an error: Deadlock. Searching the SQL Log, I was able to get the details of the deadlocking. We have Trace Flags 1204 and 1222 turned on which provides the details.

Once I got the details and having followed Bart Duncan’s Deadlock Troubleshooting, I found the victim and successful DML statements.

Next, I started a trace to get some parameter values for the victim T-SQL. By the way, the UPDATE statement was the higher priority statement that succeeded and a SELECT was the deadlock victim. Not the classic UPDATE/UPDATE example so many people use to illustrate deadlocking. When I got the Execution Plan for the victim, I noticed the Seek was on a Non-Clustered index with the INCLUDE option (Covering Index). After the Seek, there was a lookup on the Cluster Index.

Now, months ago I added the covering index to help this same SQL statement, but now it needed 2 extra columns from original suggested index.

What changed? Well, before I got the answer on what changed, I dropped the index so the users could proceed and not get the error, with some performance issues (it was taking 5-6 seconds instead of less than one)

What Changed? The previous week, the development team released a new version of the application and add 2 columns to the SELECT statement.

So, here is an example of what happened with the Adventure Works database.

USE [AdventureWorks2012]
GO
 
SELECT [SalesOrderID], [RevisionNumber], [OrderDate], [DueDate]
  FROM [Sales].[SalesOrderHeader]
  WHERE OrderDate Between '20070101' AND '20070101'

If you run the above query with Include Execution Plan, you will get a Missing Index like the following:



USE [AdventureWorks2012]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [Sales].[SalesOrderHeader] ([OrderDate])
INCLUDE ([SalesOrderID],[RevisionNumber],[DueDate])
GO

The Execution Plan is using a Clustered Index scan to find the data with a cost of 0.54


Adding the index, the Execution Plan now does an Index Seek with a cost of 0.0033. Big improvement!!!


Now, when I add a column to the SELECT query that is not in the Covering Index:


image


The INCLUDE column index (covering index) is used in a SEEK, but then a Key Lookup is added to get the additional 2 columns from the Clustered Index.


So, how does this cause a deadlock. I have seen this problem many times over the last 5-6 years.


The SELECT statement starts to SEEK the Covering Index at the same time an UPDATE statement locks the clustered index to update, then tries to update the covering Index. But, the SELECT statement has a shared lock on the Covering Index (Non-clustered Index) that is now trying to place a shared lock on the Clustered Index, which is locked by the UPDATE statement. BOOM!!! Deadlock!!!


How do you fix this?


First, you can add the 2 columns to the covering index and the SELECT no longer needs a Key Lookup on the clustered index. This is fine, but I wish it could be found in development and not production.


Or, you could just remove the Covering Index (like I initially did) and find other ways with the development team to improve performance. I do not believe they even knew what I did when I added the Covering Index, nor did I know that had added some additional columns to the SELECT statement.

Tuesday, October 9, 2012

Part II - Louis Davidson (Dr SQL)–Designing for Common Problems in SQL Server…continued

The PASS Data Architecture Virtual Chapter will host part II of Designing for Common Problems in SQL Server by Dr. SQL Louis Davidson as he helps application and database developers design and implement SQL Server databases.

I saw Louis at the first PASS Summit I attended in Denver in 2006. He and Paul Nielsen help me see where data models and normalization can increase the efficiency with storing and retrieving data.

They also encouraged me by their presentations to start talking about database design at work and in the SQL community.

Please join us on Thursday October 11th at Noon Central for another informative presentation.

Subject: Designing for Common Problems in SQL Server Part II 

Start Time: Thursday October 11th, 2012 12:00 PM US Central Time

End Time: Thursday October 11th, 2012 1:00 PM US Central Time

Presenter: Louis Davidson

Live Meeting Link: Live Meeting

Designing for Common Problems in SQL Server II 

In this session, I will do a design and code review of several common patterns of solving problems that a typical programmer will come up against. Problems like coding for hierarchical data, data driven design, dealing with image data, structure generalization, user specified schemas, dimensional reporting, and dealing with uniqueness beyond what you might deal with using a simple uniqueness constraint might allow you to deal with.

Louis Davidson

Louis has been in the IT industry for 17 years as a corporate database developer and architect. He has been a Microsoft MVP for 8 years and has written 5 books on database design and has spoken on the topic of database design and implementation at SQL PASS, SQL Rally, many SQL Saturday events, as well as Devlink. Currently serves as the Data Architect for the Christian Broadcasting Network supporting offices in Virginia Beach, Virginia and Nashville, Tennessee. Louis has a bachelor's degree from the University of Tennessee at Chattanooga in computer science. For more information please visit his website at drsql.org.

Sunday, August 19, 2012

PASS Virtual Chapter August presentations

If you do not know by now, PASS has many virtual chapters that give free webcasts all month long. I have helped chair the Data Architecture VC for the past year or so, started by volunteering  with the Performance VC for about 6 months and have done presentations on the BI/DW VC. Please visit the Virtual Chapter site for more information - http://www.sqlpass.org/PASSChapters/VirtualChapters.aspx

Tuesday at noon Eastern, I will be presenting on SSIS 2012 – CDC, DQS and Catalog for the BI/DW VC. More info below.

Also, the recorded presentation from this past Thursday’s Data Architecture VC is now uploaded in the Meeting Archive. Louis Davidson will do part 2 later this fall. His example code can be found at DrSQL.org in the presentation section.

Tue August 21st 12pm-1pm EST

Using SSIS 2012 for ETL in a Data Warehouse

Speaker: Thomas LeBlanc

URL: https://www.livemeeting.com/cc/usergroups/join?id=GQ5DKK&role=attend&pw=J%3A%2BW4%3F%28hK

This session will go through some old and new features of Integration Service 2012 to help with loading data into a Data Warehouse/Mart. The GUI has been improved visually during design and execution. The storage of packages has been update with Integration Service Catalogs. The Change Data Capture task helps with incremental updates. Logging has been improved for debugging. A demonstration of creating your own Slowly Changing Dimension with current Tasks will be shown with some design hints.

Friday, August 10, 2012

Louis Davidson (Dr SQL)–Designing for Common Problems in SQL Server

The PASS Data Architecture Virtual Chapter hosts none other than Dr. SQL Louis Davidson as he helps application and database developers design and implement SQL Server databases.

I saw Louis at the first PASS Summit I attended in Denver in 2006. He and Paul Nielsen help me see where data models and normalization can increase the efficiency with storing and retrieving data.

They also encouraged me by their presentations to start talking about database design at work and in the SQL community.

Please join us on Thursday August 16th at Noon Central for another informative presentation.

 

Subject:  Designing for Common Problems in SQL Server

Start Time:  Tuesday, August 16th, 2012 12:00 PM US Central Time

End Time:  Tuesday, August 16th, 2012 1:00 PM US Central Time

Presenter:  Louis Davidson

Live Meeting Link:  Live Meeting

Designing for Common Problems in SQL Server

In this session, I will do a design and code review of several common patterns of solving problems that a typical programmer will come up against. Problems like coding for hierarchical data, data driven design, dealing with image data, structure generalization, user specified schemas, dimensional reporting, and dealing with uniqueness beyond what you might deal with using a simple uniqueness constraint might allow you to deal with.

Louis Davidson

Louis has been in the IT industry for 17 years as a corporate database developer and architect. He has been a Microsoft MVP for 8 years and has written 5 books on database design and has spoken on the topic of database design and implementation at SQL PASS, SQL Rally, many SQL Saturday events, as well as Devlink. Currently serves as the Data Architect for the Christian Broadcasting Network supporting offices in Virginia Beach, Virginia and Nashville, Tennessee. Louis has a bachelor's degree from the University of Tennessee at Chattanooga in computer science. For more information please visit his website at drsql.org.

Add to calendarAdd to calendar

Thursday, September 8, 2011

Database Standards: Links to past blogs

After a great session on 24 Hours of PASS, lots of people wanted the SOP document. While I cannot give the one from the company to you, here is a post of the links to the blogs about them

Parts 1 thru 6 and Lookup tables. Please review and place comments on the blog!!!

Database Standards Part I: Defs, Abbreviation & Data Types

Database Standards Part II: Schemas, Tables & View and Columns

Database Standards Part III: Indexes, Constraints & Primary/Foreign Keys

Database Standards Part IV: Stored Procedures

Database Standards Part V: Triggers and User-defined Functions

 Database Standards Part VI: code and design

Lookup tables

I will answer the other questions in a blog next week as soon as I can get come reliable answers. Thanks again for attending and all the great questions and comments. Thanks to PASS for this opportunity

God Bless,

Thomas

Friday, July 16, 2010

Database Lookup Tables

Over the years, I have noticed a trend in building databases. A developer will get on a certain path when it comes to what I define as Lookup tables. There are many names or labels the community has placed on this Type/Code/Category of tables, but I want to give 3 examples of the types we ran into in the latest creation of a database.


The below list and table structure was standardized once so everybody (developer, DBA, BI, etc.) would be in agreement. After the Create Table below, please read for an explanation of the columns and types.

1. Drop-down list - this table is really just a pick list for a transaction or category type.
    Example
        BillingClass: B - Billable, NB - NonBillable, etc.
        RequstType: P - Phone, L - Letter, etc.

2. Developer Control - developers usually like to have some tables that cannot be change by end user because it would change the flow of the application code. This group of developers wanted to enumerate the IDs in .Net code. Too bad we (DBAs) can use enumeration in a Query window 
    Example
        WorkFlow - I - Initial, U - Updated, P - Processed & C - Close

3. End User Control - end users want to be able to add to the list for use in the application, but unlike the drop-down list, there are other properties to the Item.
    Example
        RevenueCode - 435 - Inventory stream, 321 - visit cost, etc.

The first 2 types have a consistent structure:

CREATE TABLE [Billing].[lkClass](
    [Code] [varchar](25) NOT NULL,
    [ID] [int] NOT NULL,
    [Name] [varchar](100) NOT NULL,
    [Description] [varchar](1000) NULL,
    [DisplayOrder] [int] NOT NULL,
    [RowState] [int] NOT NULL,
    [TimeStamp] [timestamp] NULL,
  CONSTRAINT [PK_lkClass] PRIMARY KEY CLUSTERED
    ([Code] ASC)
      WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
        IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
        ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
  CONSTRAINT [uc_lkClass_ID] UNIQUE NONCLUSTERED
    ( [ID] ASC )
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
       IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
       ALLOW_PAGE_LOCKS = ON) ON [DB_INDEX],
  CONSTRAINT [uc_lkClass_Name] UNIQUE NONCLUSTERED
      ( [Name] ASC )
     WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
        IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
        ALLOW_PAGE_LOCKS = ON) ON [DB_INDEX]
  ) ON [PRIMARY]

The code is 25 characters because we voted to have a varchar () that would satisfy all possibilities. Why still use Code when there is a unique ID field? End users still like to be able to type known code/type values into an interface for fast data entry. Even with the intelliSence possibilities, I personally do not see codes going away with end users. Also, reports need abbreviated versions of lookup to help compact display. This is the foreign key to a related table, not the ID.

The ID is used for .Net objects that want to reference a list with an integer, rather than using a Search/Find method to locate the entry in a list/collection. The Name is really the long description, while the Description is what you might call a memo field (notice MS Access reference).

DisplayOrder is self-explanatory and RowState has values like 0-active, 1-inactive, and other numbers reserved for future use. This allows the row to be ‘deleted’ – inactive, but not removed from the table. Timestamp is really the RowVersion data type from SQL Server data types. We have not changed this to RowVersion, yet.

Even though State Abbreviation is known as only 2 characters and you need only 2 columns in the table, we still use this structure because the Object Oriented interface/object code using the same structure is global throughout the project. The end user control type table will include the additional columns on the same table. We do not create a one-to-one relationship to another table just for more attibutes, in most cases.

Primary key is in the Primary file group because it is the clustered index, but the other Unique Constraints are in the INDEX File Group in order to place the Clustered Index (table) on a different file group than the Indexes. The Unique constraints are used to make sure duplicate IDs and/or Names are never possible even though the developers promise that will prevent that in the Code.

Thursday, July 8, 2010

Database Standards Part VI: code and design

Database Standards Part VI:  code and design

This is Part 6 of 6 blogs about database standards. This last post is about coding standards that should be followed.

Design
A data dictionary and diagram should be maintained. The diagram can be a bird's eye view. The dictionary contains tables with description and columns with data type, size, default, constraints and descriptions. The foriegn keys would be listed along with primary keys.

Coding

- Avoid using SELECT *.
- Use CTEs instead of Cursors
- Dynamic SQL is difficult to read, confuses security and not very maintainable.
- NOLOCK and READUNCOMITTED can only be used when absolutely necessary.
- INNER JOIN not JOIN
- prefix objects with schema, even dbo.
- Use TRY/CATCH for error trapping
- only use parentheses for AND\OR expressions
- spaces between expressions and variables
   Type = 'S' NOT type='S'

- BEGIN/END should only be used with there is more than one execution line after condition


Formatting
    Bad:
Select Sample.*, Analysis.*,

Sample.DueDate, Sample.EnteredBy
FROM Sample JOIN
Analysis on Sample.SampleID = Analysis.SampleID

   Better:
SELECT s.SampleID, s.Description AS SampDesc,
    a.AnlCode AS AnalysisCode
  FROM Sample s
      INNER JOIN Analysis a
        ON a.SampleID = s.SampleID
  WHERE s.SampleID = @SampleID


Next set of Blogs will be a diary of the transition from production senior DBA to BI Data Integration Lead Developer.

Thomas LeBlanc
Outoging Production Senior DBA

Thursday, June 17, 2010

Database Standards Part V: Triggers and User-defined Functions

Database Standards Part V: Triggers and User-defined Functions

 
This is Part 5 of 6 blogs about database standards. Again, though we do not enforce these with an iron fist, they are advised for clarity and consistency. The important point, in my opinion, is to establish a set of standards and stick to them.
 
Triggers
Naming convention for triggers are 'it' for insert, 'ut' for update and dt for delete (itPatient). Trigger can slow CRUD statements so we warn developers to be careful when creating triggers. It also can 'hide' logic from development. Over and over, we get confused people trying to find how data gets added to a table because of a trigger on another table. Verify during a review that the trigger can handle multi-row updates when joining to the inserted or deleted recordsets. Most triggers I see are related to auditing Ins/Upd/Del statements.
 
User-Defined Functions
User-defined functions are prefixed with udf. They can be helpful with string manipulation commonly used in stored procedures or SQL statements. UDFs do add processing time to the execution and can cause scans when used on fields in a WHERE clause. Looking at an execution plan might exclude the code from the UDF. I see table-valued UDFs used like DMFs from dynamic management objects from Microsoft, mostly on small data sets.
 
Also, when  profiling, you get a statement for each call to the udf for each row in the SELECT statement if used in a query, so you might want to exclude the data in the profiled output.
 
Repeating again, and again…

Though my opinion is not the same as some of these standards, I do believe standards should be discussed and established, then followed. If a deviation is needed, a group discussion should be voted on to change.

Next week I will talk about coding and deisgn standards.

Thomas LeBlanc

Friday, June 4, 2010

Database Standards Part IV: Stored Procedures


This is Part 4 of 6 blogs about database standards. Again, though we do not enforce these with an iron fist, they are advised for clarity and consistency. The important point, in my opinion, is to establish a set of standards and stick to them.

Stored Procedure
Stored procedures should be required for all create, read, update and delete (CRUD) DML (Data Manipulation Language) statements. Using stored procedures benefits security, consistence, performance and manageability on the DBA side of applications.

Security can be managed by giving GRANT EXECUTE to logins requiring access to DML statements, or in 2005 and greater EXECUTE can be granted to a login on the entire database. Performance gains are realized in compiled query plans and SQL Server’s ability to re-compile only the statement in the SP that needs a new/better plan.

Naming conventions:
  Read - uspPatientGet or GetPatient
  Delete - uspPatientDel or DelPatient
  Report – rptSecurityAudit
  Process – prcProcessPatient

The use of Schemas can better organize the SPs. Do not use prefixes sp, sp_ or xp_. These are used by the SQL Server system or custom extended procedures.

Precompiled SQL can use less memory in the Procedure Cache and require less look up in Proc Cache for existing plans.

Remember not to repeat the schema name in the SP – Patient.GetDetails instead of Patient.GetPatientDetails.

SET ANSI_NULLS ON can be a requirement because it will be depreciated in future SQL Server versions.

SET NOCOUNT ON can be prevent round trips to the client - http://msdn.microsoft.com/en-us/library/ms189837.aspx

Using a Source Control application can reduce comments in a SP. The versioning in TFS or Source Safe gives the developer the ability to associate comments with the check in. Our SPs begin with the following structure

IF NOT EXISTS (SELECT * FROM Sys.Objects
                              WHERE Object_Id = OBJECT_ID(N’[Ptient].[rptPatientReleased] ’)
                                 AND type in (N’P’, N’PC’))
  BEGIN
    CREATE PROCEDURE [Ptient].[rptPatientReleased]
      AS RETURN
    GO
  END

ALTER PROCEDURE [Aptient].[rptPatientReleased] AS
    SELECT TOP 10 * FROM NewProcedureView
  GO

Always end the SP in GO when creating or altering. The Alter is preferred because it retains permissions on SP. If DROP/CREATE is used, remember to apply the permissions. Return should not be used to return data to the calling application. Return should be reserved for returning the status of the SP. Use INPUT and OUTPUT parameters in the SP.

Repeating again, and again…
Though my opinion is not the same as some of these standards, I do believe standards should be discussed and established, then followed. IF a deviation is needed, a group discussion should be voted on to change.

Next week I will talk about Triggers and User-Defined Functions.

Thomas LeBlanc

Tuesday, May 25, 2010

Database Standards Part III: Indexes, Constraints & Primary/Foreign Keys

This is Part 3 of 6 blogs about database standards we use at my current employer. Repeating myself, though we do not enforce these with an iron fist, they are advised for clarity and consistency. The important point, in my opinion, is to establish a set of standards and stick to them.


Indexes
The naming of indexes is not unanimous. The standard is for a prefix of idx plus the table name then the field names with an underscore between the table and field name – idxPatient_LocationCode. This will create some very long names. I prefer abbreviation of the table and fields names, or like another DBA a reference name of where the index will be used.

A reminder to developers and contractors is that a primary key clustered index is an index. We have seen duplicate indexes created because of this. I kid not. Also, clustered indexes are included in non-clustered indexes, so a multi-column clustered index has to be seriously reviewed before implemented.

Understanding covering indexes and how they are used is important to reduce the columns used in some indexes. Fields that are regular updated probably should not be in a clustered index.

A reindex maintenance plan needs to be implemented. I will try to cover the custom one we use in a later blog or webcast.

Constraints
A unique constraint has its own index, so do not create an index on a unique constraint.

Use prefix chk (chkDigits) for check constraint and uc (ucPatientNumber) for unique constraint naming.

Default definitions should be in line with the create table statement and default objects have been deprecated, so do not use. Do not use data type conversion in constraint definitions.

Primary Key and Foreign keys
All tables have to have a primary key. When an identity field is used, a unique constraint must be created from a separate field(s) to uniquely identify a row. Every table needs to be able to be queried based on the contents, not an identity field. The normalization talks I have been privileged to give, explains more about this with the 1st normal form portion.

All foreign keys have to be indexes and should be the primary or unique constraint key from the parent table.

Naming convention is prefixed with pk for primary key (pkPatientNumber). Foreign Key naming is prefix fk then parent table name underscore column(s) underscore child table name and last underscore column(s) – fkPatient_ID_Admit_PatientID

Repeating Again, and again…
Though my opinion is not the same as some of these standards (I say you can abbrev. anything), I do believe standards should be discussed and established, then followed. IF a deviation is needed, a group discussion should be voted on to change.

Next week I will talk about Stored Procedures.

Thomas LeBlanc

Thursday, May 13, 2010

Database Standards Part II: Schemas, Tables & View and Columns

This is Part 2 of 6 blogs about database standards we use at my current employer. Repeating myself, though we do not enforce these with an iron fist, they are advised for clarity and consistency. The important point, in my opinion, is establish a set of standards and stick to them.

Schema

The seperation of schmea name and logins was a great addition in SQL Server 2005. Schema names should be used in place of dbo. This makes viewing in SSMS great with filters. It also categorizes the objects so a new DBA or developer can get a good view of the organization in a database. We suggest a Common schema for objects that do not fit in a group, then names that are specific to functionality - Payroll, HR, Billing, etc.

Tables
Tables names should reflect its functionality and they cannot start with a number, contain spaces, underscores or be reserved words. ANSI_NULL (OFF is deprecated in a future version) should be SET ON. Pascal Casing should be used like ClientBill and singular like Admit not Admits.

There are 3 different kinds of Lookup tables we see - developer populated (workflow), small lists (state) and known list that are updattable (RevenueCode). Small lookup tables can use a character primary key. If the developer insists on ID (identity), then a character code/description column created with a unique contraint must be included. The code should be used in the transaction table. (I will clarify this is a seperate blog). Lookup tables use camel case - lkState.

The transaction table(s) should use a ID (indentity or sequential int) as the primary, but should have a unique contraint for uniquely identifying a row. Third Key Normal Form and higher is required. The ID field will be the foreign key in the child table, and named ID (PatientID in an Episode table).

Views
Camel casing should be used in naming views - vwPatientEpisode. Future thinking should be used if the view will eventually become an indexed view, then schema binging and ansi settings need to be looked at.

Columns
Column names should describe the data placed in them, using Pascal Casing - LastName and not start with a number/special character, no spaces, dashes or underscores. Beware of reserve words or key words as object names. Do not repeat table name - use LastName not PatientLastName unless a foreign key to another table - PatientID. No data type in the name - intPatientCount, but Date is acceptable in a date name - AdmitDate.

Be aware of new features in 2008 like a seperate DATE and TIME data types. Before, you had to use midnight as the time in a date column type in order to not specify a time.

Repeating
Though my opinion is not the same as some of these standards (I say you can abbrev. anything), I do believe standards should be discussed and established, then followed. IF a deviation is needed, a group discussion should be voted on to change.


Next week I will talk about Indexes, constraints and Primary/Foreigh Keys.


Thomas LeBlanc