Tuesday, August 28, 2012

Lookup transformation in SSIS: Performance Improvements

There are two suggestions I hear about when performance tuning the Lookup transformation against a large dimension table while getting the surrogate key for a fact table. I hope to introduce a third here.

The first is the caching option.


 

The Full Cache option will cache the whole table (or the columns returned by a query). If you have a large dimension table, this might use a lot of memory.

The Partial Cache will load matching rows as lookups occur, but will clear the least used lookup values once the memory size of the cache is exceeded.

No Cache will query the database each and every time a lookup value is needed. This is no good when running lots of lookups on a large table.

 

I believe using Full Cache is good when you have no memory limitations and a small dimension table. The Partial Cache is good for large tables. I am not sure where No Cache would be used, unless the lookup does not run often in the Data Flow Task.

The next suggestion is, instead of selecting the whole table, to use a T-SQL query that selects only the columns you need.


By using only the needed columns in the SELECT T-SQL statement, you limit the amount of information retrieved into SQL Server’s Buffer Pool and the SSIS cache.

One more option can help improve this query even more – a proper index.

SELECT ProductSKey, ProductID
  FROM DimProduct
WHERE Status = 1

The query used above will still do a Clustered Index Scan, or a Non-clustered Index Seek with a Key Lookup, in the execution plan, depending on the indexes available (for example, an index on just the Status column).


But, if you can create a covering Non-Clustered Index on Status that includes ProductSKey (the surrogate key) and ProductID (the natural key), you can get better performance on the SQL instance side.
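A covering index along these lines would look like the following sketch (the index name is my own; the table and column names come from the query above):

```sql
-- Non-clustered index on Status that covers the lookup query,
-- so SQL Server can answer it from the index alone.
CREATE NONCLUSTERED INDEX IX_DimProduct_Status
    ON dbo.DimProduct (Status)
    INCLUDE (ProductSKey, ProductID);
```

With the surrogate and natural keys in the INCLUDE list, the query no longer needs a Key Lookup back to the clustered index.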


The cost of the query went from 0.0178649 to 0.0065309. The statistics IO went from 21 logical reads to 7, and the execution plan changed from a Clustered Index Scan to a Non-clustered Index Seek.
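If you want to see these numbers yourself, you can turn on the statistics output in the session before running the query (the logical reads show up on the Messages tab, and the cost comes from the actual execution plan):

```sql
-- Show logical reads and elapsed time for the queries that follow.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT ProductSKey, ProductID
  FROM DimProduct
 WHERE Status = 1;
```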




Using the Partial Cache and a proper index on the dimension table helped reduce a lookup on one of our 200,000+ row dimension tables from 3 seconds to less than one second at my current employer. The execution performance was viewed in the new SSISDB statistics reports in the 2012 version of SSIS. More on that in another blog.

Sunday, August 19, 2012

PASS Virtual Chapter August presentations

If you do not know by now, PASS has many virtual chapters that give free webcasts all month long. I have helped chair the Data Architecture VC for the past year or so, started by volunteering with the Performance VC for about 6 months, and have done presentations for the BI/DW VC. Please visit the Virtual Chapter site for more information - http://www.sqlpass.org/PASSChapters/VirtualChapters.aspx

Tuesday at noon Eastern, I will be presenting on SSIS 2012 – CDC, DQS and Catalog for the BI/DW VC. More info below.

Also, the recorded presentation from this past Thursday’s Data Architecture VC is now uploaded to the Meeting Archive. Louis Davidson will present part 2 later this fall. His example code can be found at DrSQL.org in the presentation section.

Tue August 21st 12pm-1pm EST

Using SSIS 2012 for ETL in a Data Warehouse

Speaker: Thomas LeBlanc

URL: https://www.livemeeting.com/cc/usergroups/join?id=GQ5DKK&role=attend&pw=J%3A%2BW4%3F%28hK

This session will go through some old and new features of Integration Services 2012 to help with loading data into a Data Warehouse/Mart. The GUI has been improved visually during design and execution. The storage of packages has been updated with the Integration Services Catalog. The Change Data Capture task helps with incremental updates. Logging has been improved for debugging. A demonstration of creating your own Slowly Changing Dimension with current tasks will be shown, with some design hints.

Friday, August 10, 2012

Louis Davidson (Dr SQL)–Designing for Common Problems in SQL Server

The PASS Data Architecture Virtual Chapter hosts none other than Dr. SQL Louis Davidson as he helps application and database developers design and implement SQL Server databases.

I saw Louis at the first PASS Summit I attended in Denver in 2006. He and Paul Nielsen helped me see how data models and normalization can increase the efficiency of storing and retrieving data.

They also encouraged me by their presentations to start talking about database design at work and in the SQL community.

Please join us on Thursday August 16th at Noon Central for another informative presentation.

 

Subject:  Designing for Common Problems in SQL Server

Start Time:  Thursday, August 16th, 2012 12:00 PM US Central Time

End Time:  Thursday, August 16th, 2012 1:00 PM US Central Time

Presenter:  Louis Davidson

Live Meeting Link:  Live Meeting

Designing for Common Problems in SQL Server

In this session, I will do a design and code review of several common patterns for solving problems that a typical programmer will come up against: coding for hierarchical data, data-driven design, dealing with image data, structure generalization, user-specified schemas, dimensional reporting, and dealing with uniqueness beyond what a simple uniqueness constraint might allow.

Louis Davidson

Louis has been in the IT industry for 17 years as a corporate database developer and architect. He has been a Microsoft MVP for 8 years, has written 5 books on database design, and has spoken on database design and implementation at SQL PASS, SQL Rally, and many SQL Saturday events, as well as Devlink. He currently serves as the Data Architect for the Christian Broadcasting Network, supporting offices in Virginia Beach, Virginia and Nashville, Tennessee. Louis has a bachelor's degree in computer science from the University of Tennessee at Chattanooga. For more information please visit his website at drsql.org.
