Friday, July 30, 2010

SQLSaturday, BI and What I am reading

SQLSaturday # 28 (http://www.sqlsaturday.com/28/eventhome.aspx) is coming to Baton Rouge on August 14th at LSU in the CEBA building. We have 58 sessions and 490+ registered attendees.

This is the second SQLSaturday for BR thanks to SQL Server MVP Patrick LeBlanc leadership and employees of Sparkhound and Antares as well as LSU. Patrick is my brother from a different mother.

I spoke last year on RML Utilities and SQLNexus, but this year the session I will be presenting is “Third Normal Form: That’s Crazy Talk” (http://www.sqlsaturday.com/viewsession.aspx?sat=28&sessionid=1323). This seems to be my calling as a presentation that I have locked down. Volunteering has been interesting as we watch things change month to month by our fearless leader and great politician.


The other news is I have started a new position at Amedisys, moving from Senior DBA to BI Data Integration Lead Developer. The last 3 weeks have been filled with meetings and documentation review. They have already assigned me tech specs to complete, assist with latest Data Mart ETL and improve performance on the End of Month process. My whole career has revolved around normalized databases, which I am very passionate about. Changing my thinking is difficult (I am 42), and seems to be the biggest barrier. Please pray for me.

Louis Davidson's book Pro SQL Server 2008 Relational Database Design and Implementation (http://drsql.org/ProSQLServerDatabaseDesign.aspx)  is a great and long read that I believe is very comprehensive when it comes to educational reading for database design. I am using the reading of this book to help me prepare for SQLSaturday #28 as well as Houston TechFest (http://www.houstontechfest.com/)  on Oct. 9th. It looks like Houston has accepted 2 sessions, the one above and Whiteboarding Normalization (http://www.houstontechfest.com/dotnetnuke/HoustonTechFest/Sessions/tabid/56/CodecampId/3/SessionId/218/Default.aspx) for their sessions packed event.

Wish me luck, and come back for some more talk about Normalization, and Denormalization at this blog.

God Bless,
Thomas LeBlanc

Friday, July 16, 2010

Database Lookup Tables

Over the years, I have noticed a trend in building databases. A developer will get on a certain path when it comes to what I define as Lookup tables. There are many names or labels the community has placed on this Type/Code/Category of tables, but I want to give 3 examples of the types we ran into in the latest creation of a database.


The below list and table structure was standardized once so everybody (developer, DBA, BI, etc.) would be in agreement. After the Create Table below, please read for an explanation of the columns and types.

1. Drop-down list - this table is really just a pick list for a transaction or category type.
    Example
        BillingClass: B - Billable, NB - NonBillable, etc.
        RequstType: P - Phone, L - Letter, etc.

2. Developer Control - developers usually like to have some tables that cannot be change by end user because it would change the flow of the application code. This group of developers wanted to enumerate the IDs in .Net code. Too bad we (DBAs) can use enumeration in a Query window 
    Example
        WorkFlow - I - Initial, U - Updated, P - Processed & C - Close

3. End User Control - end users want to be able to add to the list for use in the application, but unlike the drop-down list, there are other properties to the Item.
    Example
        RevenueCode - 435 - Inventory stream, 321 - visit cost, etc.

The first 2 types have a consistent structure:

CREATE TABLE [Billing].[lkClass](
    [Code] [varchar](25) NOT NULL,
    [ID] [int] NOT NULL,
    [Name] [varchar](100) NOT NULL,
    [Description] [varchar](1000) NULL,
    [DisplayOrder] [int] NOT NULL,
    [RowState] [int] NOT NULL,
    [TimeStamp] [timestamp] NULL,
  CONSTRAINT [PK_lkClass] PRIMARY KEY CLUSTERED
    ([Code] ASC)
      WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
        IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
        ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
  CONSTRAINT [uc_lkClass_ID] UNIQUE NONCLUSTERED
    ( [ID] ASC )
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
       IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
       ALLOW_PAGE_LOCKS = ON) ON [DB_INDEX],
  CONSTRAINT [uc_lkClass_Name] UNIQUE NONCLUSTERED
      ( [Name] ASC )
     WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
        IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
        ALLOW_PAGE_LOCKS = ON) ON [DB_INDEX]
  ) ON [PRIMARY]

The code is 25 characters because we voted to have a varchar () that would satisfy all possibilities. Why still use Code when there is a unique ID field? End users still like to be able to type known code/type values into an interface for fast data entry. Even with the intelliSence possibilities, I personally do not see codes going away with end users. Also, reports need abbreviated versions of lookup to help compact display. This is the foreign key to a related table, not the ID.

The ID is used for .Net objects that want to reference a list with an integer, rather than using a Search/Find method to locate the entry in a list/collection. The Name is really the long description, while the Description is what you might call a memo field (notice MS Access reference).

DisplayOrder is self-explanatory and RowState has values like 0-active, 1-inactive, and other numbers reserved for future use. This allows the row to be ‘deleted’ – inactive, but not removed from the table. Timestamp is really the RowVersion data type from SQL Server data types. We have not changed this to RowVersion, yet.

Even though State Abbreviation is known as only 2 characters and you need only 2 columns in the table, we still use this structure because the Object Oriented interface/object code using the same structure is global throughout the project. The end user control type table will include the additional columns on the same table. We do not create a one-to-one relationship to another table just for more attibutes, in most cases.

Primary key is in the Primary file group because it is the clustered index, but the other Unique Constraints are in the INDEX File Group in order to place the Clustered Index (table) on a different file group than the Indexes. The Unique constraints are used to make sure duplicate IDs and/or Names are never possible even though the developers promise that will prevent that in the Code.

Thursday, July 8, 2010

Database Standards Part VI: code and design

Database Standards Part VI:  code and design

This is Part 6 of 6 blogs about database standards. This last post is about coding standards that should be followed.

Design
A data dictionary and diagram should be maintained. The diagram can be a bird's eye view. The dictionary contains tables with description and columns with data type, size, default, constraints and descriptions. The foriegn keys would be listed along with primary keys.

Coding

- Avoid using SELECT *.
- Use CTEs instead of Cursors
- Dynamic SQL is difficult to read, confuses security and not very maintainable.
- NOLOCK and READUNCOMITTED can only be used when absolutely necessary.
- INNER JOIN not JOIN
- prefix objects with schema, even dbo.
- Use TRY/CATCH for error trapping
- only use parentheses for AND\OR expressions
- spaces between expressions and variables
   Type = 'S' NOT type='S'

- BEGIN/END should only be used with there is more than one execution line after condition


Formatting
    Bad:
Select Sample.*, Analysis.*,

Sample.DueDate, Sample.EnteredBy
FROM Sample JOIN
Analysis on Sample.SampleID = Analysis.SampleID

   Better:
SELECT s.SampleID, s.Description AS SampDesc,
    a.AnlCode AS AnalysisCode
  FROM Sample s
      INNER JOIN Analysis a
        ON a.SampleID = s.SampleID
  WHERE s.SampleID = @SampleID


Next set of Blogs will be a diary of the transition from production senior DBA to BI Data Integration Lead Developer.

Thomas LeBlanc
Outoging Production Senior DBA