Auto-Generating Pivot Views

Coming back from the sci-fi world of AI to some real-world scenarios…  Earlier I posted a generic stored procedure that automatically unpivots data so that columns become rows.  Today, I provide the inverse capability.  It isn't truly generic at this point, but the technique is.

Recently, I’ve been working with Microsoft System Center Configuration Manager (SCCM) reporting and needed to create a pivot view of a task sequence.  If you’re not immersed in SCCM, the example probably won’t make much sense, but the technique for generating the view may be useful in other scenarios.  SCCM provides a lot (and I mean A LOT) of views and reports out of the box, and it lets you create your own reports.  SCCM 2007 stores all of its data, along with the views, in a SQL Server 2005 database.  For more about SCCM, see http://www.microsoft.com/systemcenter/configurationmanager/en/us/default.aspx.  For more about creating custom reports, see http://www.microsoft.com/downloads/details.aspx?FamilyId=87BBE64E-5439-4FC8-BECC-DEB372A40F4A&displaylang=en

As much as I’d like to get into SCCM more, let’s focus on the pivot view generation, which is applicable to any scenario.  In ours, we have a task sequence with multiple steps for which we need to report status.  A task sequence is a series of tasks to perform on a selected set of computers (known as a collection).  You can think of it as a workflow: SCCM lets you define whether a sub-task must complete successfully, or may be allowed to fail while the task sequence continues.  Task sequences may be re-run on a given machine.

Our business case is that we need to track how well our task sequences are doing and we want to drill down on specific sub-tasks (steps) that have meaning to our specific workflow and provide a single view of how all the sub-tasks performed within a date range.

SCCM captures this information in a view it provides called v_TaskExecutionStatus.  I created a view on top of it to ensure that only the distinct steps are captured per advertisement:

CREATE VIEW [dbo].[V_Step_Summary] AS
SELECT AdvertisementId, Step, 
SUM(CASE WHEN ExitCode = 0 THEN 1 ELSE 0 END) AS SuccessCount,
SUM(CASE WHEN ExitCode <> 0 THEN 1 ELSE 0 END) AS FailureCount,
COUNT(*) AS AttemptCount
FROM [dbo].[v_TaskExecutionStatus]
GROUP BY AdvertisementId, Step
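The SUM(CASE WHEN …) construct is just conditional counting.  To make the pattern concrete, here is a minimal Python sketch of the same aggregation (the rows are hypothetical stand-ins for v_TaskExecutionStatus, not real SCCM data):

```python
from collections import defaultdict

# Hypothetical (AdvertisementId, Step, ExitCode) rows standing in for v_TaskExecutionStatus.
rows = [
    ("ADV1", 0, 0), ("ADV1", 0, 0), ("ADV1", 0, 1),
    ("ADV1", 1, 0), ("ADV2", 0, 0),
]

summary = defaultdict(lambda: {"SuccessCount": 0, "FailureCount": 0, "AttemptCount": 0})
for adv, step, exit_code in rows:
    key = (adv, step)                                            # GROUP BY AdvertisementId, Step
    summary[key]["SuccessCount"] += 1 if exit_code == 0 else 0   # SUM(CASE WHEN ExitCode = 0 ...)
    summary[key]["FailureCount"] += 1 if exit_code != 0 else 0   # SUM(CASE WHEN ExitCode <> 0 ...)
    summary[key]["AttemptCount"] += 1                            # COUNT(*)

print(summary[("ADV1", 0)])
```

Each group ends up with one row carrying all three counts, exactly what the view exposes per (AdvertisementId, Step).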

I also created a view for extracting the status of the steps by date.  The underlying data is stored with a date/time, and this is SQL 2005, so I can’t just down-convert to a DATE type.  I need to be able to group by date so that users can see the count of successes/failures in a single day.

CREATE VIEW [dbo].[V_Step_Summary_ByDate] AS
SELECT TOP 100 PERCENT AdvertisementId, Step, 
DATEADD(DD,DATEDIFF(DD,0,ExecutionTime),0) AS ExecutionDate,
SUM(CASE WHEN ExitCode = 0 THEN 1 ELSE 0 END) AS SuccessCount,
SUM(CASE WHEN ExitCode <> 0 THEN 1 ELSE 0 END) AS FailureCount,
COUNT(*) AS AttemptCount
FROM [dbo].[v_TaskExecutionStatus]
GROUP BY AdvertisementId, Step, 
DATEADD(DD,DATEDIFF(DD,0,ExecutionTime),0) 
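The DATEADD(DD, DATEDIFF(DD, 0, x), 0) expression is the classic pre-2008 trick for stripping the time portion: count whole days from SQL Server's "day zero", then add them back.  A Python analog of the arithmetic (just to illustrate the idea; SQL Server's day zero is 1900-01-01):

```python
from datetime import datetime, timedelta

EPOCH = datetime(1900, 1, 1)  # SQL Server's "date 0" for DATETIME

def truncate_to_date(dt: datetime) -> datetime:
    """Python analog of DATEADD(DD, DATEDIFF(DD, 0, dt), 0)."""
    whole_days = (dt - EPOCH).days            # DATEDIFF(DD, 0, dt): whole days since day zero
    return EPOCH + timedelta(days=whole_days) # DATEADD(DD, ..., 0): add them back, time stripped

print(truncate_to_date(datetime(2009, 7, 5, 13, 42, 11)))
```

Because the day count is an integer, the re-added value always lands on midnight, which is what makes the GROUP BY on it work.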


OK, now that we have the building blocks, let’s look at the views that pivot the data.  There are three: one to pivot just the success count, one to pivot just the failure count, and one to outer-join the two together so we get both successes and failures.

Brace yourself, here are the views:

CREATE VIEW [dbo].[V_StepPivot_SuccessCount] AS   SELECT 
  AdvertisementID, ExecutionDate, 'SuccessCount' as CountType,
 [00] AS [S00],
 [01] AS [S01],
 [02] AS [S02],
 [03] AS [S03],
 [04] AS [S04],
 [05] AS [S05],
 [06] AS [S06],
 [07] AS [S07],
 [08] AS [S08],
 [09] AS [S09],
 [10] AS [S10],
 [11] AS [S11],
 [12] AS [S12],
 [13] AS [S13],
 [14] AS [S14],
 [15] AS [S15],
 [16] AS [S16],
 [17] AS [S17],
 [18] AS [S18],
 [19] AS [S19],
 [20] AS [S20],
 [21] AS [S21],
 [22] AS [S22],
 [23] AS [S23],
 [24] AS [S24],
 [25] AS [S25],
 [26] AS [S26],
 [27] AS [S27],
 [28] AS [S28],
 [29] AS [S29],
 [30] AS [S30],
 [31] AS [S31],
 [32] AS [S32],
 [33] AS [S33],
 [34] AS [S34],
 [35] AS [S35],
 [36] AS [S36],
 [37] AS [S37],
 [38] AS [S38],
 [39] AS [S39],
 [40] AS [S40],
 [41] AS [S41],
 [42] AS [S42],
 [43] AS [S43],
 [44] AS [S44],
 [45] AS [S45],
 [46] AS [S46],
 [47] AS [S47],
 [48] AS [S48],
 [49] AS [S49],
 [50] AS [S50],
 [51] AS [S51],
 [52] AS [S52],
 [53] AS [S53],
 [54] AS [S54],
 [55] AS [S55],
 [56] AS [S56],
 [57] AS [S57],
 [58] AS [S58],
 [59] AS [S59],
 [60] AS [S60],
 [61] AS [S61],
 [62] AS [S62],
 [63] AS [S63],
 [64] AS [S64],
 [65] AS [S65],
 [66] AS [S66],
 [67] AS [S67],
 [68] AS [S68]
  FROM  (SELECT AdvertisementID,   ExecutionDate,Step,SuccessCount FROM dbo.V_Step_Summary_ByDate) p
 PIVOT (SUM (SuccessCount) For Step in (
 [00],
 [01],
 [02],
 [03],
 [04],
 [05],
 [06],
 [07],
 [08],
 [09],
 [10],
 [11],
 [12],
 [13],
 [14],
 [15],
 [16],
 [17],
 [18],
 [19],
 [20],
 [21],
 [22],
 [23],
 [24],
 [25],
 [26],
 [27],
 [28],
 [29],
 [30],
 [31],
 [32],
 [33],
 [34],
 [35],
 [36],
 [37],
 [38],
 [39],
 [40],
 [41],
 [42],
 [43],
 [44],
 [45],
 [46],
 [47],
 [48],
 [49],
 [50],
 [51],
 [52],
 [53],
 [54],
 [55],
 [56],
 [57],
 [58],
 [59],
 [60],
 [61],
 [62],
 [63],
 [64],
 [65],
 [66],
 [67],
 [68]
     )) AS PVT
GO
CREATE VIEW [dbo].[V_StepPivot_FailureCount] AS   SELECT 
  AdvertisementID, ExecutionDate, 'FailureCount' as CountType,
 [00] AS [F00],
 [01] AS [F01],
 [02] AS [F02],
 [03] AS [F03],
 [04] AS [F04],
 [05] AS [F05],
 [06] AS [F06],
 [07] AS [F07],
 [08] AS [F08],
 [09] AS [F09],
 [10] AS [F10],
 [11] AS [F11],
 [12] AS [F12],
 [13] AS [F13],
 [14] AS [F14],
 [15] AS [F15],
 [16] AS [F16],
 [17] AS [F17],
 [18] AS [F18],
 [19] AS [F19],
 [20] AS [F20],
 [21] AS [F21],
 [22] AS [F22],
 [23] AS [F23],
 [24] AS [F24],
 [25] AS [F25],
 [26] AS [F26],
 [27] AS [F27],
 [28] AS [F28],
 [29] AS [F29],
 [30] AS [F30],
 [31] AS [F31],
 [32] AS [F32],
 [33] AS [F33],
 [34] AS [F34],
 [35] AS [F35],
 [36] AS [F36],
 [37] AS [F37],
 [38] AS [F38],
 [39] AS [F39],
 [40] AS [F40],
 [41] AS [F41],
 [42] AS [F42],
 [43] AS [F43],
 [44] AS [F44],
 [45] AS [F45],
 [46] AS [F46],
 [47] AS [F47],
 [48] AS [F48],
 [49] AS [F49],
 [50] AS [F50],
 [51] AS [F51],
 [52] AS [F52],
 [53] AS [F53],
 [54] AS [F54],
 [55] AS [F55],
 [56] AS [F56],
 [57] AS [F57],
 [58] AS [F58],
 [59] AS [F59],
 [60] AS [F60],
 [61] AS [F61],
 [62] AS [F62],
 [63] AS [F63],
 [64] AS [F64],
 [65] AS [F65],
 [66] AS [F66],
 [67] AS [F67],
 [68] AS [F68]
  FROM  (SELECT AdvertisementID,   ExecutionDate,Step,FailureCount FROM dbo.V_Step_Summary_ByDate) p
 PIVOT (SUM (FailureCount) For Step in (
 [00],
 [01],
 [02],
 [03],
 [04],
 [05],
 [06],
 [07],
 [08],
 [09],
 [10],
 [11],
 [12],
 [13],
 [14],
 [15],
 [16],
 [17],
 [18],
 [19],
 [20],
 [21],
 [22],
 [23],
 [24],
 [25],
 [26],
 [27],
 [28],
 [29],
 [30],
 [31],
 [32],
 [33],
 [34],
 [35],
 [36],
 [37],
 [38],
 [39],
 [40],
 [41],
 [42],
 [43],
 [44],
 [45],
 [46],
 [47],
 [48],
 [49],
 [50],
 [51],
 [52],
 [53],
 [54],
 [55],
 [56],
 [57],
 [58],
 [59],
 [60],
 [61],
 [62],
 [63],
 [64],
 [65],
 [66],
 [67],
 [68]
     )) AS PVT

GO
CREATE VIEW [dbo].[V_StepPivot_Count] AS SELECT sc.AdvertisementID,   sc.ExecutionDate,  
 [S00], [F00],
 [S01], [F01],
 [S02], [F02],
 [S03], [F03],
 [S04], [F04],
 [S05], [F05],
 [S06], [F06],
 [S07], [F07],
 [S08], [F08],
 [S09], [F09],
 [S10], [F10],
 [S11], [F11],
 [S12], [F12],
 [S13], [F13],
 [S14], [F14],
 [S15], [F15],
 [S16], [F16],
 [S17], [F17],
 [S18], [F18],
 [S19], [F19],
 [S20], [F20],
 [S21], [F21],
 [S22], [F22],
 [S23], [F23],
 [S24], [F24],
 [S25], [F25],
 [S26], [F26],
 [S27], [F27],
 [S28], [F28],
 [S29], [F29],
 [S30], [F30],
 [S31], [F31],
 [S32], [F32],
 [S33], [F33],
 [S34], [F34],
 [S35], [F35],
 [S36], [F36],
 [S37], [F37],
 [S38], [F38],
 [S39], [F39],
 [S40], [F40],
 [S41], [F41],
 [S42], [F42],
 [S43], [F43],
 [S44], [F44],
 [S45], [F45],
 [S46], [F46],
 [S47], [F47],
 [S48], [F48],
 [S49], [F49],
 [S50], [F50],
 [S51], [F51],
 [S52], [F52],
 [S53], [F53],
 [S54], [F54],
 [S55], [F55],
 [S56], [F56],
 [S57], [F57],
 [S58], [F58],
 [S59], [F59],
 [S60], [F60],
 [S61], [F61],
 [S62], [F62],
 [S63], [F63],
 [S64], [F64],
 [S65], [F65],
 [S66], [F66],
 [S67], [F67],
 [S68], [F68]
  FROM V_StepPivot_FailureCount fc
   FULL OUTER JOIN V_StepPivot_SuccessCount sc 
    ON sc.AdvertisementId = fc.AdvertisementId
     AND sc.ExecutionDate = fc.ExecutionDate
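What the PIVOT clauses above are doing is rotating one row per (AdvertisementId, ExecutionDate, Step) into one wide row per (AdvertisementId, ExecutionDate), with a column per step.  A tiny Python sketch of that rotation, using hypothetical rows in place of V_Step_Summary_ByDate:

```python
# Hypothetical (AdvertisementId, ExecutionDate, Step, SuccessCount) rows
# standing in for V_Step_Summary_ByDate.
rows = [
    ("ADV1", "2009-07-01", 0, 5),
    ("ADV1", "2009-07-01", 1, 3),
    ("ADV1", "2009-07-02", 0, 2),
]

steps = sorted({step for _, _, step, _ in rows})
pivoted = {}
for adv, day, step, count in rows:
    # One wide row per (advertisement, date), one column per step, NULL where no data.
    row = pivoted.setdefault((adv, day), {f"S{s:02d}": None for s in steps})
    row[f"S{step:02d}"] = count  # safe: GROUP BY guarantees one source row per step

print(pivoted[("ADV1", "2009-07-01")])
```

The None entries correspond to the NULLs PIVOT produces for steps with no data on that date.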


Wow, 68 columns are required – one for each step!  That’s a lot of pivoting.  If you think I typed in all 68 column names, you’re wrong: I generated the code using the stored procedure below.  I didn’t bother using sp_executesql – the procedure just outputs the text, and I manually cut and paste the output into a new query window and run it from there.  It wouldn’t be hard to put it all into a variable and execute that to build the views directly from the stored proc.  Note the retrieval from V_Step_Summary in order to generate the list of columns.

CREATE PROCEDURE [dbo].[USP_Generate_StepPivot_Views_V3] AS 
BEGIN
    SET NOCOUNT ON
    SELECT 'CREATE VIEW V_StepPivot_SuccessCount AS   SELECT ' AS Line
    UNION ALL SELECT '  AdvertisementID, ExecutionDate, ''SuccessCount'' as CountType,'
    UNION ALL select distinct 
      (' [' + (CASE WHEN STEP < 10 THEN '0' + CONVERT(VARCHAR(1), STEP) ELSE CONVERT(VARCHAR(2),Step) END) + ']') 
    +
      (' AS [S' + (CASE WHEN STEP < 10 THEN '0' + CONVERT(VARCHAR(1), STEP) ELSE CONVERT(VARCHAR(2),Step) END) + 
    (CASE WHEN STEP = (SELECT MAX(STEP) FROM V_Step_Summary) THEN ']' ELSE '],' END))
        FROM dbo.V_Step_Summary

    UNION ALL SELECT '  FROM  (SELECT AdvertisementID,   ExecutionDate,Step,SuccessCount FROM dbo.V_Step_Summary_ByDate) p'
    UNION ALL SELECT ' PIVOT (SUM (SuccessCount) For Step in ('
    
-- This part generates the pivoted column list by selecting the distinct rows that
-- correspond to the pivot columns to create. The case statement is used to identify
-- the last step.



    UNION ALL select distinct 
      (' [' + (CASE WHEN STEP < 10 THEN '0' + CONVERT(VARCHAR(1), STEP) ELSE CONVERT(VARCHAR(2),Step) END) + 
    (CASE WHEN STEP = (SELECT MAX(STEP) FROM V_Step_Summary) THEN ']' ELSE '],' END))
        FROM dbo.V_Step_Summary

    UNION ALL SELECT '     )) AS PVT'
    UNION ALL SELECT 'GO'

    UNION ALL SELECT 'CREATE VIEW V_StepPivot_FailureCount AS   SELECT ' AS Line
    UNION ALL SELECT '  AdvertisementID, ExecutionDate, ''FailureCount'' as CountType,'
    UNION ALL select distinct 
      (' [' + (CASE WHEN STEP < 10 THEN '0' + CONVERT(VARCHAR(1), STEP) ELSE CONVERT(VARCHAR(2),Step) END) +']')
    +
      (' AS [F' + (CASE WHEN STEP < 10 THEN '0' + CONVERT(VARCHAR(1), STEP) ELSE CONVERT(VARCHAR(2),Step) END) + 
    (CASE WHEN STEP = (SELECT MAX(STEP) FROM V_Step_Summary) THEN ']' ELSE '],' END))
        FROM dbo.V_Step_Summary
    UNION ALL SELECT '  FROM  (SELECT AdvertisementID,   ExecutionDate,Step,FailureCount FROM dbo.V_Step_Summary_ByDate) p'
    UNION ALL SELECT ' PIVOT (SUM (FailureCount) For Step in ('
-- This part generates the pivoted column list by selecting the distinct rows that
-- correspond to the pivot columns to create. The case statement is used to identify
-- the last step.
    UNION ALL select distinct 
      (' [' + (CASE WHEN STEP < 10 THEN '0' + CONVERT(VARCHAR(1), STEP) ELSE CONVERT(VARCHAR(2),Step) END) + 
    (CASE WHEN STEP = (SELECT MAX(STEP) FROM V_Step_Summary) THEN ']' ELSE '],' END))
        FROM dbo.V_Step_Summary
    UNION ALL SELECT '     )) AS PVT'
    UNION ALL SELECT 'GO'

    UNION ALL SELECT 'CREATE VIEW V_StepPivot_Count AS SELECT sc.AdvertisementID,   sc.ExecutionDate,  '
    UNION ALL select distinct 
      (' [S' + (CASE WHEN STEP < 10 THEN '0' + CONVERT(VARCHAR(1), STEP) ELSE CONVERT(VARCHAR(2),Step) END) + '],'
    + (' [F' + (CASE WHEN STEP < 10 THEN '0' + CONVERT(VARCHAR(1), STEP) ELSE CONVERT(VARCHAR(2),Step) END) + 
    (CASE WHEN STEP = (SELECT MAX(STEP) FROM V_Step_Summary) THEN ']' ELSE '],' END)))
        FROM dbo.V_Step_Summary
    UNION ALL SELECT '  FROM V_StepPivot_FailureCount fc'
    UNION ALL SELECT '   FULL OUTER JOIN V_StepPivot_SuccessCount sc '
    UNION ALL SELECT '    ON sc.AdvertisementId = fc.AdvertisementId'
    UNION ALL SELECT '     AND sc.ExecutionDate = fc.ExecutionDate'
    UNION ALL SELECT 'GO'
END
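The heart of the generator is the CASE expressions that zero-pad the step number and decide whether to emit a trailing comma.  Here is the same logic as a minimal Python sketch (the function name is mine, for illustration only):

```python
def pivot_column_lines(steps, prefix="S"):
    """Build the ' [NN] AS [SNN],' lines the way the stored proc's CASE
    expressions do: zero-pad step numbers below 10, and drop the trailing
    comma on the highest step (the MAX(STEP) check)."""
    last = max(steps)
    lines = []
    for step in sorted(set(steps)):          # the "select distinct" over V_Step_Summary
        tail = "]" if step == last else "],"
        lines.append(f" [{step:02d}] AS [{prefix}{step:02d}{tail}")
    return lines

print(pivot_column_lines([0, 1, 68]))
```

Swap the prefix to "F" to get the failure-count aliases; the T-SQL version does exactly the same string surgery with CONVERT and concatenation.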


OK, so you ask: “How long did it take you to write that stored procedure?  Didn’t it take longer than just writing the views by hand?”  Yes, it probably did, but not by much.  And I didn’t have to spend much time testing the output views, since they were generated directly from the data that defines them.  Plus, what happens when the number of steps changes – do I want to change the pivot view by hand each time?  Instead, we could modify the stored proc to put the code into a variable which it then executes rather than outputs.  Once that is in place, we can create a scheduled task that runs the stored procedure periodically to ensure the pivot views are always correct.


Generating a List of Calendar Dates

For today, here’s a simple trick.  Ever need to get a list of all the calendar dates in a period?  This is very simple using a table-valued user-defined function.  Below is a basic version; I have a more complex version in my application that also filters against another table containing holidays, etc.

-- =============================================
-- Author:        Bob Leithiser
-- Create date: 7/5/2009
-- Description:    Return a list of dates in table format for a specified date range
-- =============================================
CREATE FUNCTION [Util].[udf_GetCalendarDates]
(
    @StartDate DATE, 
    @EndDate DATE
)
RETURNS 
@CalendarDates TABLE 
(
    -- Add the column definitions for the TABLE variable here
    CalendarDate DATE
)
AS
BEGIN
    -- Fill the table variable with the rows for your result set
    DECLARE @CalendarDate DATE = @StartDate
    WHILE @CalendarDate <= @EndDate
    BEGIN
        INSERT INTO @CalendarDates (CalendarDate) VALUES (@CalendarDate)
        SET @CalendarDate = DATEADD(DD,1,@CalendarDate)
    END
    RETURN 
END

GO


select * from util.udf_GetCalendarDates('20090101','20090130')
and now we get:

CalendarDate
2009-01-01
2009-01-02
2009-01-03
2009-01-04
2009-01-05
2009-01-06
2009-01-07
2009-01-08
2009-01-09
2009-01-10
2009-01-11
2009-01-12
2009-01-13
2009-01-14
2009-01-15
2009-01-16
2009-01-17
2009-01-18
2009-01-19
2009-01-20
2009-01-21
2009-01-22
2009-01-23
2009-01-24
2009-01-25
2009-01-26
2009-01-27
2009-01-28
2009-01-29
2009-01-30
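The loop inside udf_GetCalendarDates translates almost line-for-line into any language; a Python sketch of the same inclusive day-by-day walk:

```python
from datetime import date, timedelta

def get_calendar_dates(start: date, end: date):
    """Same loop as udf_GetCalendarDates: one value per day, inclusive of both ends."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)

dates = list(get_calendar_dates(date(2009, 1, 1), date(2009, 1, 30)))
print(len(dates))
```

As with the T-SQL version, the range is inclusive, so January 1 through January 30 yields 30 rows.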


Here’s an example where I found this useful.  As part of an analytic/simulation application, I needed to generate a list of holidays so I could avoid processing days that contain no business activity data.

insert into Load.Holiday (ExchangeName,HolidayDate)
select '*', CalendarDate
from util.udf_GetCalendarDates('2007-05-01','2009-07-04')
where not exists
(select distinct marketdate from dbo.EquityHistory where MarketDate = CalendarDate )
and DATEPART(WEEKDAY,CalendarDate) NOT IN (1,7)
select * from Load.holiday
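The idea behind that insert: any weekday in the range with no trading activity is presumed to be a holiday.  A hedged Python sketch of the same filter, with hypothetical market dates (the real query checks dbo.EquityHistory):

```python
from datetime import date, timedelta

def infer_holidays(start, end, market_dates):
    """Weekdays in [start, end] with no activity rows are presumed holidays.
    Mirrors the NOT EXISTS plus DATEPART(WEEKDAY, ...) NOT IN (1, 7) filter."""
    current, holidays = start, []
    while current <= end:
        if current.weekday() < 5 and current not in market_dates:  # Mon-Fri only
            holidays.append(current)
        current += timedelta(days=1)
    return holidays

# Hypothetical week: trading on Mon/Tue/Thu/Fri, nothing on Wed 2009-07-01.
traded = {date(2009, 6, 29), date(2009, 6, 30), date(2009, 7, 2), date(2009, 7, 3)}
print(infer_holidays(date(2009, 6, 29), date(2009, 7, 3), traded))
```

Note this infers holidays from absence of data, so a data outage would be misread as a holiday – acceptable here because the goal is just to skip days with nothing to process.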


(20 row(s) affected)

ExchangeName    HolidayDate
*    2007-05-28
*    2007-07-04
*    2007-09-03
*    2007-11-22
*    2007-12-25
*    2008-01-01
*    2008-01-21
*    2008-02-18
*    2008-03-21
*    2008-05-26
*    2008-07-04
*    2008-09-01
*    2008-11-27
*    2008-12-25
*    2009-01-01
*    2009-01-19
*    2009-02-16
*    2009-04-10
*    2009-05-25
*    2009-07-03


Dumping SQL data in pivoted format


If you’re like me and spend a lot of time querying data directly in SQL Query Analyzer, you may find the columnar display format tedious for tables with lots of columns, or when you’re only working with a couple of rows anyway.

Consider the following data that I was just trying to dump out as part of another blog post related to my doctoral research on automated software.

[screenshot: query results]

That’s not too bad, but I only have 5 columns.  What if I have a lot more, as in the following against the AdventureWorks database:

select * from HumanResources.Employee WHERE EmployeeId = 1

[screenshot: the first columns of the Employee row]

Those are just the first few columns.  Unless you have a 30-inch wide screen with 3000 pixels across, you won’t be able to see everything across the width of the screen without scrolling.

Wouldn’t it be nice if we could just do something like this:

EXEC  [dbo].[util_PivotAllColumns]
            @FromSpecifier = N'Person.Contact',
            @AfterFromClause = 'WHERE ContactId = 1',
            @ColumnList = '*',
            @PrintSelectStatement = 1


and have SQL like below generated automatically,

SELECT 0 AS ColSeq, 'EmployeeID' AS ColName, CONVERT(NVARCHAR(MAX),[EmployeeID]) AS ColValue 
FROM HumanResources.Employee WHERE EmployeeId = 1
 UNION ALL SELECT 1 AS ColSeq, 'NationalIDNumber' AS ColName, CONVERT(NVARCHAR(MAX),[NationalIDNumber]) AS ColValue 
FROM HumanResources.Employee WHERE EmployeeId = 1
 UNION ALL SELECT 2 AS ColSeq, 'ContactID' AS ColName, CONVERT(NVARCHAR(MAX),[ContactID]) AS ColValue 
FROM HumanResources.Employee WHERE EmployeeId = 1
 UNION ALL SELECT 3 AS ColSeq, 'LoginID' AS ColName, CONVERT(NVARCHAR(MAX),[LoginID]) AS ColValue 
FROM HumanResources.Employee WHERE EmployeeId = 1
 UNION ALL SELECT 4 AS ColSeq, 'ManagerID' AS ColName, CONVERT(NVARCHAR(MAX),[ManagerID]) AS ColValue 
FROM HumanResources.Employee WHERE EmployeeId = 1
 UNION ALL SELECT 5 AS ColSeq, 'Title' AS ColName, CONVERT(NVARCHAR(MAX),[Title]) AS ColValue 
FROM HumanResources.Employee WHERE EmployeeId = 1
 UNION ALL SELECT 6 AS ColSeq, 'BirthDate' AS ColName, CONVERT(NVARCHAR(MAX),[BirthDate]) AS ColValue 
FROM HumanResources.Employee WHERE EmployeeId = 1
 UNION ALL SELECT 7 AS ColSeq, 'MaritalStatus' AS ColName, CONVERT(NVARCHAR(MAX),[MaritalStatus]) AS ColValue 
FROM HumanResources.Employee WHERE EmployeeId = 1
 UNION ALL SELECT 8 AS ColSeq, 'Gender' AS ColName, CONVERT(NVARCHAR(MAX),[Gender]) AS ColValue 
FROM HumanResources.Employee WHERE EmployeeId = 1
 UNION ALL SELECT 9 AS ColSeq, 'HireDate' AS ColName, CONVERT(NVARCHAR(MAX),[HireDate]) AS ColValue 
FROM HumanResources.Employee WHERE EmployeeId = 1
 UNION ALL SELECT 10 AS ColSeq, 'SalariedFlag' AS ColName, CONVERT(NVARCHAR(MAX),[SalariedFlag]) AS ColValue 
FROM HumanResources.Employee WHERE EmployeeId = 1
 UNION ALL SELECT 11 AS ColSeq, 'VacationHours' AS ColName, CONVERT(NVARCHAR(MAX),[VacationHours]) AS ColValue 
FROM HumanResources.Employee WHERE EmployeeId = 1
 UNION ALL SELECT 12 AS ColSeq, 'SickLeaveHours' AS ColName, CONVERT(NVARCHAR(MAX),[SickLeaveHours]) AS ColValue 
FROM HumanResources.Employee WHERE EmployeeId = 1
 UNION ALL SELECT 13 AS ColSeq, 'CurrentFlag' AS ColName, CONVERT(NVARCHAR(MAX),[CurrentFlag]) AS ColValue 
FROM HumanResources.Employee WHERE EmployeeId = 1
 UNION ALL SELECT 14 AS ColSeq, 'rowguid' AS ColName, CONVERT(NVARCHAR(MAX),[rowguid]) AS ColValue 
FROM HumanResources.Employee WHERE EmployeeId = 1
 UNION ALL SELECT 15 AS ColSeq, 'ModifiedDate' AS ColName, CONVERT(NVARCHAR(MAX),[ModifiedDate]) AS ColValue 
FROM HumanResources.Employee WHERE EmployeeId = 1



So we could see our data like this, directly from Query Analyzer:

[screenshot: the pivoted column/value output]

Enter the util_PivotAllColumns stored proc:

/****** Object:  StoredProcedure [dbo].[util_PivotAllColumns]    Script Date: 06/16/2009 16:43:29 ******/
SET ANSI_NULLS ON
GO
 
SET QUOTED_IDENTIFIER ON
GO
 
-- =============================================
-- Author:        Bob Leithiser
-- Create date: 6/16/2009
-- Description:    Pivots all columns from a table and selects primary key value if specified as a parameter
-- WARNING: This isn't safe from SQL Injection, not to be used for production, just a testing/dumping tool.
-- Uses sys.columns view from current database where the stored proc is located, so you must create in every
-- database that you want to use this in.
-- =============================================
CREATE PROCEDURE [dbo].[util_PivotAllColumns]
  @FromSpecifier NVARCHAR(MAX),  
  @AfterFromClause NVARCHAR(MAX) = NULL, -- typically the where clause, but make it flexible for group by, etc.
  -- Downside is that this means user must supply the WHERE keyword rather than just the criteria.
  @ColumnList NVARCHAR(MAX) = '*',
  @PrintSelectStatement BIT = 0
AS 
BEGIN
/* Usage Example:
EXEC [dbo].[util_PivotAllColumns]
     @FromSpecifier = N'Person.Contact',
     @AfterFromClause = 'WHERE ContactId = 1',
     @ColumnList = '*',
     @PrintSelectStatement = 1
-- You can omit the optional parameters and for a quick table dump just do:
EXEC [dbo].[util_PivotAllColumns] 'Tablename'
 
*/ 
 
  -- TODO: Validate input parameters and add try/catch exception handling
  -- TODO: Support multiple tables in the FROM clause
  -- TODO: Add parsing to support column list
  
    SET NOCOUNT ON
    -- Assuming just a single table at this point, not parsing for multiple
    -- DECLARE @TableName SYSNAME =  OBJECT_NAME(OBJECT_ID(@FromSpecifier))
    -- DECLARE @SchemaName SYSNAME = OBJECT_SCHEMA_NAME(OBJECT_ID(@FromSpecifier))
    -- Get the column list if not provided
    IF COALESCE(@ColumnList,'*') = '*'
    BEGIN
        DECLARE @SQLCmd NVARCHAR(MAX)
        DECLARE @ColumnName SYSNAME
        
        DECLARE ColumnCursor CURSOR FOR 
        SELECT [name]
            FROM sys.columns 
            WHERE object_id = OBJECT_ID(@FromSpecifier) -- Need to add parsing for multiple tables, joins, etc.
        OPEN ColumnCursor
        FETCH NEXT FROM ColumnCursor INTO @ColumnName
        DECLARE @ColOrder INT = 0
        WHILE @@FETCH_STATUS = 0
        BEGIN
            DECLARE @ColumnSpecifier NVARCHAR(2000) = N''
            
            -- Once past first column, tack on UNION ALL
            IF @ColOrder > 0 
                SET @ColumnSpecifier = N' UNION ALL SELECT '
                ELSE SET @ColumnSpecifier = N'SELECT '
                
            -- Add the column SEQuencer
            SET @ColumnSpecifier = 
                @ColumnSpecifier + CONVERT(NVARCHAR(2000),@ColOrder) + N' AS ColSeq, '
                -- 2000 columns ought to be enough
            
            -- Add the column NAME
            SET @ColumnSpecifier = @ColumnSpecifier + N'''' + @ColumnName    + N''' AS ColName, '            
            
            -- Add the column VALUE - Have to convert to same type - use nvarchar - so all the unions get along
            SET @ColumnSpecifier = @ColumnSpecifier + N'CONVERT(NVARCHAR(MAX),[' + @ColumnName    + ']) 
AS ColValue'
 
            -- Add the FROM clause and AFTER From Clause (typically the WHERE clause - must include WHERE)
            SET @ColumnSpecifier = @ColumnSpecifier + N' FROM ' + @FromSpecifier + ' ' + 
COALESCE(@AfterFromClause,'')
 
            -- Add the column specifier to the SQL Command String and toss in c/r l/f to make more source query
            -- readable    
            SET @SQLCmd = COALESCE(@SQLCmd,N'') + CHAR(13) + CHAR(10) + @ColumnSpecifier
            
            -- Increment the column sequencer
            SET @ColOrder = @ColOrder + 1
            FETCH NEXT FROM ColumnCursor INTO @ColumnName
        END
        CLOSE ColumnCursor
        DEALLOCATE ColumnCursor
    END
    ELSE BEGIN
    -- Parse the column list and do inline replacements
        PRINT 'Sorry, I don''t parse column lists yet'
    END
        
    SELECT @SQLCmd    -- For debugging
    IF @PrintSelectStatement = 1
    BEGIN
        PRINT @SQLCmd
    END
    EXEC sp_ExecuteSQL @stmt = @SQLCmd
END
GO
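The cursor loop is just string assembly: one SELECT branch per column, glued together with UNION ALL.  A minimal Python sketch of that assembly (function name mine, and like the proc itself, not injection-safe – it's a dump tool):

```python
def build_pivot_sql(columns, from_specifier, after_from=""):
    """Assemble the UNION ALL SELECT text the cursor loop builds,
    one branch per column (illustrative only, not injection-safe)."""
    parts = []
    for seq, col in enumerate(columns):
        # Past the first column, tack on UNION ALL (same branch as the proc).
        prefix = " UNION ALL SELECT " if seq > 0 else "SELECT "
        line = (f"{prefix}{seq} AS ColSeq, '{col}' AS ColName, "
                f"CONVERT(NVARCHAR(MAX),[{col}]) AS ColValue "
                f"FROM {from_specifier} {after_from}").rstrip()
        parts.append(line)
    return "\n".join(parts)

sql = build_pivot_sql(["EmployeeID", "Title"], "HumanResources.Employee",
                      "WHERE EmployeeId = 1")
print(sql)
```

Every branch converts its column to NVARCHAR(MAX) so the unions get along, exactly as the comment in the proc explains.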
 
 


I asked around a little and found some neat dynamic SQL generators for pivoting; see:

http://www.sommarskog.se/pivot_sp.sp

http://www.sqlteam.com/article/dynamic-cross-tabs-pivot-tables

but these were overkill for what I needed, plus I wanted something quick and easy to use without having to think about how to summarize the data.
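To make the output concrete: the cursor loop in the procedure emits one SELECT per column, each converting its value to NVARCHAR(MAX) so the branches union cleanly. Assuming a hypothetical dbo.TaskStepSummary source view and a WHERE clause passed in @AfterFromClause (both names are illustrative, not from the actual SCCM schema), the generated source query would look roughly like this; the UNION ALL separators come from a part of the procedure not shown here:

```sql
SELECT 'Success' AS ColName, 0 AS ColOrder,
       CONVERT(NVARCHAR(MAX), [Success]) AS ColValue
FROM dbo.TaskStepSummary WHERE LastStatusTime >= @StartDate
UNION ALL
SELECT 'Failed' AS ColName, 1 AS ColOrder,
       CONVERT(NVARCHAR(MAX), [Failed]) AS ColValue
FROM dbo.TaskStepSummary WHERE LastStatusTime >= @StartDate
```

Each step's counts end up as rows of (name, order, value), which is exactly the shape needed to pivot back into a single status view.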

Posted in Uncategorized | Leave a comment

SQL Data Warehousing – Tip 2 – Indexed Views

Last time we discussed the use of a period table to consolidate analysis into smaller segments of data.  This can be done for an analytical application where the details are not needed or as part of a rollup.  Today, we’re going to look at how to use the period table.  For the application discussed in part 1, it would be useful to have information related to analyzing the period quickly accessible without having to aggregate all of the history rows.  To do this, we create an indexed view with schema binding.

There are several rules for indexed views, discussed in the “Creating Indexed Views” article at http://msdn.microsoft.com/en-us/library/ms191432.aspx.  These include avoiding AVG, MIN, and MAX (instead using COUNT_BIG together with SUM so that averages can be derived), as well as restrictions requiring ANSI_NULLS and QUOTED_IDENTIFIER to be set, as summarized below:
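As a sketch of those rules in practice (the table and column names here are illustrative, not from the application): the view must be created with ANSI_NULLS and QUOTED_IDENTIFIER ON, must be schema-bound with two-part names, and persists SUM and COUNT_BIG so the average can be derived at query time.

```sql
-- Requires SET ANSI_NULLS ON and SET QUOTED_IDENTIFIER ON in the session.
CREATE VIEW dbo.vw_PeriodSummary
WITH SCHEMABINDING
AS
SELECT  PeriodId,
        SUM(ClosePrice) AS TotalClose,   -- AVG is not allowed in an indexed view
        COUNT_BIG(*)    AS RowCnt        -- COUNT_BIG is required with GROUP BY
FROM    dbo.PriceHistory
GROUP BY PeriodId
GO
-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vw_PeriodSummary
    ON dbo.vw_PeriodSummary (PeriodId)
GO
-- Derive the average from the stored aggregates at query time:
-- SELECT PeriodId, TotalClose / RowCnt AS AvgClose FROM dbo.vw_PeriodSummary
```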


SQL Data Warehousing Tips 1 – Building a Period table

Recently, I started a personal project for analyzing stock market trends and calculating optimal trading strategies.  I’ve learned a lot of new stuff about LINQ and SQL 2008 from this.  Based on that, I’m opening a new series on data warehouse tips. 

In my project, I quickly realized that I would need to generate about 1 billion rows to have sufficient detail for analyzing 5,000 publicly traded stocks with 1 year of history, so I needed to consolidate the data.  It was obvious I needed a way to summarize data into periods and use those as dimensions for analyzing performance, rather than calculating all of this from the detailed data. 

What would be nice would be periods that can overlap and align.  For example, I want to be able to go back and look at last month, last quarter, last 6 months, and last year.  However, each of these requires a different set of transactions, because even though the closing date is the same, the opening date is different.  The variation in open date affects the price at which a stock would be purchased (or short-sold), thus leading to a different set of transactions based on an entry/exit strategy.  In order to focus on the technical aspect of this, I won’t get into all of the application design considerations.  Suffice it to say that discrete, yet aligned, periods were needed.

So I immediately wrote a stored procedure (final version below) to try to insert my periods and decided to leverage the merge capability.

Below is the table definition of the Period Table

CREATE TABLE [dbo].[Period](
    [PeriodId] [smallint] NOT NULL,
    [StartDate] [date] NOT NULL,
    [EndDate] [date] NOT NULL,
  
[MonthsInPeriod]  AS (datediff(month,[StartDate],dateadd(day,(1),[EndDate]))),
CONSTRAINT [PK_Period] PRIMARY KEY CLUSTERED
(
    [PeriodId] ASC
)
) ON [PRIMARY]

Note that we can easily calculate the number of months in the period (assuming all periods are in 1-month increments) using a computed column with DATEDIFF
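You can sanity-check the computed column with a quick one-liner; for the quarter of April through June 2008:

```sql
-- DATEADD(day, 1, '2008-06-30') = '2008-07-01', and the month-boundary
-- count from April to July is 3, so MonthsInPeriod = 3.
SELECT DATEDIFF(month, '2008-04-01', DATEADD(day, 1, '2008-06-30')) AS MonthsInPeriod
```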

And the Merging (upsert) stored procedure

CREATE PROCEDURE [dbo].[UpsertMonthlyPeriod]
    @PeriodDate date,
    @Months int
AS BEGIN
    DECLARE @EndDate date
    SET @EndDate = DATEADD(DD,-1,DATEADD(MM,@Months, @PeriodDate))
    MERGE INTO Period AS Target
    USING (VALUES (@PeriodDate, @EndDate)) AS Source (NewStartDate, NewEndDate)
    ON Target.StartDate = Source.NewStartDate
    AND Target.EndDate =Source.NewEndDate
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (StartDate, EndDate)
        VALUES (NewStartDate, NewEndDate);
END

This is a very simple example of the new MERGE functionality.  The USING clause identifies the values used to map the source data to the target, which in this case is simply start date and end date (remember, I am deliberately allowing overlapping periods for analysis purposes).
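For illustration, a single call for the quarter starting April 2008:

```sql
-- Inserts the row only if no period with this exact start/end already exists.
-- @EndDate is computed inside as DATEADD(DD, -1, DATEADD(MM, 3, '2008-04-01')) = '2008-06-30'.
EXEC dbo.UpsertMonthlyPeriod @PeriodDate = '2008-04-01', @Months = 3
```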

The below stored procedure then actually generates the periods by calling the Upsert stored proc:

CREATE PROCEDURE [dbo].[SetupPeriods]
    @StartDate date,
    @Periods int,
    @CreateCummulative bit
AS BEGIN
    DECLARE @x int = 0
    DECLARE @PeriodDate date
    SET @PeriodDate = @StartDate

    WHILE @x < @Periods
    BEGIN
    -- Create Monthly periods
        EXEC dbo.UpsertMonthlyPeriod @PeriodDate = @PeriodDate, @Months=1
    -- Create Quarterly periods
        IF @x % 3 = 0 OR @CreateCummulative =1
        BEGIN
            EXEC dbo.UpsertMonthlyPeriod @PeriodDate = @PeriodDate, @Months=3
        END
    -- Create Bi-Annual periods
        IF @x % 6 = 0 OR @CreateCummulative =1
        BEGIN
            EXEC dbo.UpsertMonthlyPeriod @PeriodDate = @PeriodDate, @Months=6
        END
    -- Create Annual periods
        IF @x % 12 = 0 OR @CreateCummulative =1
        BEGIN
            EXEC dbo.UpsertMonthlyPeriod @PeriodDate = @PeriodDate, @Months=12
        END
        SET @x = @x+1
        SET @PeriodDate = DATEADD(MM,@x,@StartDate)
    END
END
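To produce the sample data shown below, a call along these lines does it (13 monthly iterations starting April 2008):

```sql
-- Creates a monthly period for each iteration, plus quarterly, bi-annual,
-- and annual periods where the month offset aligns (@x % 3, % 6, % 12).
EXEC dbo.SetupPeriods @StartDate = '2008-04-01', @Periods = 13, @CreateCummulative = 0
```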

The end result is the data shown below:

PeriodId    StartDate    EndDate    MonthsInPeriod
8041    2008-04-01    2008-04-30    1
8043    2008-04-01    2008-06-30    3
8046    2008-04-01    2008-09-30    6
8040    2008-04-01    2009-03-31    12
8051    2008-05-01    2008-05-31    1
8053    2008-05-01    2008-07-31    3
8056    2008-05-01    2008-10-31    6
8050    2008-05-01    2009-04-30    12
8061    2008-06-01    2008-06-30    1
8063    2008-06-01    2008-08-31    3
8066    2008-06-01    2008-11-30    6
8071    2008-07-01    2008-07-31    1
8073    2008-07-01    2008-09-30    3
8076    2008-07-01    2008-12-31    6
8081    2008-08-01    2008-08-31    1
8083    2008-08-01    2008-10-31    3
8086    2008-08-01    2009-01-31    6
8091    2008-09-01    2008-09-30    1
8093    2008-09-01    2008-11-30    3
8096    2008-09-01    2009-02-28    6
8101    2008-10-01    2008-10-31    1
8103    2008-10-01    2008-12-31    3
8106    2008-10-01    2009-03-31    6
8111    2008-11-01    2008-11-30    1
8113    2008-11-01    2009-01-31    3
8116    2008-11-01    2009-04-30    6
8121    2008-12-01    2008-12-31    1
8123    2008-12-01    2009-02-28    3
9011    2009-01-01    2009-01-31    1
9013    2009-01-01    2009-03-31    3
9021    2009-02-01    2009-02-28    1
9023    2009-02-01    2009-04-30    3
9031    2009-03-01    2009-03-31    1
9041    2009-04-01    2009-04-30    1

10011 2010-01-01…

..

But, wait!  Where did the PeriodId come from?  It’s not an identity field and the numbers appear to follow a pattern.

Can you guess the pattern?

Yes, that’s right: the year minus 2000, then the 2-digit month, then the period length in months (except that a 12-month period is encoded as “0”).  Since 2008 contributes a single digit (“8”), the value still fits in a smallint field.  I’ve got millions of rows linking back to the period table and want to keep the index small, so I didn’t want the extra digit.
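So, for the quarterly period starting April 2008, the key assembles as ‘8’ + ‘04’ + ‘3’:

```sql
-- (2008 - 2000) -> '8', month April -> '04', 3-month length -> '3'
SELECT CAST(
         CAST(2008 - 2000 AS VARCHAR(2)) +
         CAST(RIGHT(100 + 4, 2) AS VARCHAR(2)) +
         CAST(3 AS CHAR(1))
       AS SMALLINT) AS PeriodId    -- = 8043
```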

The trick for this was an insert/update trigger shown below:

CREATE TRIGGER [dbo].[Period_IU_Trig]
   ON  [dbo].[Period]
   AFTER INSERT, UPDATE
AS
BEGIN
-- Set Period Id to be meaningful (YY + MM + Period Length)

    SET NOCOUNT ON;
    UPDATE Period
    SET PeriodId = CAST(CAST(DATEPART(YY,INSERTED.StartDate) - 2000 AS VARCHAR(2)) +
       CAST(RIGHT(100 + (DATEPART(MM,INSERTED.StartDate)),2) AS VARCHAR(2)) +
       CAST((CASE WHEN INSERTED.MonthsInPeriod = 12 THEN 0 ELSE INSERTED.MonthsInPeriod END) AS CHAR(1)) AS SMALLINT)
    FROM INSERTED
    WHERE INSERTED.PeriodId = Period.PeriodId

END

 

Note: I normally don’t advocate “meaningful” surrogate keys (an oxymoron), but in this case it’s very handy during testing to quickly identify the period without going back to the source table.

To present this to the users, I provide a view that formats the dates to show the date range, in descending order of the period, along with the length in months.  For this scenario, this is easier to work with than a calendar control and works great for a drop-down, as you can see below.

 

[screenshot: period selection drop-down]

The stored procedure for formatting this is shown below:

CREATE PROCEDURE [Reports].[SelectCurrentPeriod]
AS
SELECT PeriodId,
  CAST(StartDate AS VARCHAR(10)) + ' To '
  + CAST(EndDate AS VARCHAR(10))
  + ' ('
  + CAST(MonthsInPeriod AS VARCHAR(2))
  + ' Months)' AS PeriodDesc
FROM dbo.Period
WHERE EndDate < (SELECT MAX(LoadDate) FROM dbo.LoadHistory)
ORDER BY EndDate DESC, MonthsInPeriod DESC

 

There you have it, a period table and some supporting procedures.  For our next tip, we’ll look at how the period table actually comes into play when linking up with the data.


Windows Home Server and Backups for the Paranoid

I recently discovered to my chagrin that Windows Home Server is not exactly a secure backup platform.  Let me explain before I get in too much trouble…

On the plus side, the latest versions of the WHS connector can actually back up BitLocker and EFS volumes, and it works with Vista 64-bit and even Windows Server 2008 (64-bit, to boot), though that’s not documented.  I’m finding that just about anything that works with Vista 64-bit, not just application software but drivers as well, works with W2K8.  Also, WHS runs on a W2K3 server, which means you can have strong passwords, etc. 

But the plus side turns out to be the negative side if somebody happens to steal your WHS.  The problem is that all that nice, secure information that was encrypted can now be restored without encryption; you just need access to the drive.  But, alas, “I used a strong password that nobody would guess.”  Once again, there’s a problem: WHS provides an administrative recovery console procedure for when you forget the password, and it DOES NOT REQUIRE A PASSWORD.  Using that feature, a thief can bring the WHS back up and access all that useful backup information.  Voila, your household thief just graduated to identity thief and has access to all that “secure” BitLocker or EFS data.

You may be thinking: why not just run BitLocker on the WHS?  That won’t work; it’s W2K3.  What about running EFS on the protected volume?  I don’t think that will work either; these are raw volumes, and I don’t see anything to indicate that this would be supported.  You’re not even supposed to be logging into the WHS, even though remote desktop is available for it.

So, for those of us who don’t just worry about our stuff being stolen, but about what might happen to it afterwards (I live in California, after all), yet would also like to back up our laptops and benefit from the nice automation features of WHS, what is one to do?

Alas, virtualization to the rescue…  The idea is that you secure your WHS inside an environment that can’t be compromised.  In my case, that is my Windows Server 2008 machine running on a Dell Precision workstation with BitLocker turned on.  So, if somebody swipes my Precision (which is unlikely, since they probably can’t lift it, let alone get it out of the house with our home alarm system blaring the whole time…), they have to crack that O/S before they can even see the WHS virtual disks.  Installing WHS in a virtual machine is a breeze, BTW; it worked right out of the box.  So now I back up my Microsoft laptop as well as my Windows 2008 Server (yes, you can back up the server that actually contains the virtual WHS, including the virtual machines, to the WHS).  I then back up the WHS virtual machine to a removable disk, which can be put in the safety deposit box.

So, what to do with my other WHS, the “real” one actually running in a physical box?  I still use it to back up family computers that don’t contain a lot of confidential information, and for its file server features.  There’s actually no problem having more than one home server on the same network; just re-run the connector software install on the machines that need to be switched over to a different WHS.

To make this a bit clearer, here is a diagram of my backup configuration:

[diagram: secure-whs-backup]

So, how did I get to be so paranoid?  Don’t ask…


Back at it again (No, I’m not dead, not yet anyway)…

Can’t believe it has been 6 months since I posted anything.  No, I haven’t been dead, and no, it isn’t that I haven’t been doing anything worthwhile to blog about.  Once again, the tyranny of the urgent has robbed from the important.  I really feel that blogging is important, but just like exercise and other positive habits, it takes discipline and persistence to maintain. 

So, I’m adding blogging daily to my New Year’s resolutions, along with resuming working out daily (maybe I can figure out a way to do both at the same time :-).  As far as writing goes, I have always been a perfectionist, not wanting to post anything until all the bugs are out and I’ve proofread it several times.  But that becomes paralyzing for maintaining a blog and doesn’t provide value to the community.  I’m going to switch to short posts about what I’m doing, without worrying about having all the related code cleaned up and working.

So, what have I been up to in the past 6 months?  Here’s a list of what I’ll try to catch up on:

1) Web parts – We did a Collaboration/SOA proof-of-concept project using SharePoint Portal and I got to do a few web parts, a couple of interesting aspects from those that I’ll share.

2) BizTalk/SharePoint integration – This was also a byproduct of the SharePoint Portal project as we included some BizTalk integration including a custom pipeline.

3) Updates to the Generator for Data and Business Objects.  This is a tool I developed over a year ago and recently updated to improve the UI.  Although some of what the tool does is now addressed by LINQ, several components remain relevant even in a LINQ environment, and the principles and approach involved in automating application development still apply.

4) SCCM (System Center Configuration Manager) Tools: Have developed some custom interfaces to assist with rollout and provide better efficiency to help desk personnel in order to perform specific tasks tied to large-scale operations using SCCM.

5) RADIUS: Wrote (actually re-wrote) a custom Win32 DLL as an extension to Internet Authentication Services (IAS).  This was written in C++.  I’d like to share my experience, in that I tried to do some of this in managed code using interop and ran into some insurmountable challenges calling from the unmanaged code.

6) TFS: Workshops around best practices for SSIS and using Team Build with SSIS.  This is still in process.

7) PhD Work: Almost done with my Preliminary proposal and would like to share my approach.  The emphasis is on the value of automated software development and techniques based on relational metadata.

So, stay tuned; little by little I can hopefully start sharing the things I’ve learned that may have some value for the community.


SQL Server Data Model Monitor Series

Place holder for Data Model Monitor


SQL Server 2005 High-Availability Options (Distilled)

Have you ever been asked by a high-level architect or CIO what Microsoft’s approach is to a given scenario?  If you have, then you know the challenge, because as you read through MSDN and TechNet, and even at the Microsoft.com site level, you find a great deal of material that is large and complex.  Distilling things, particularly multiple things, into a short executive-type summary is difficult.  It’s not that the information isn’t there; it’s just that there is so much of it, and it is scattered around.

I was asked earlier this week to provide Microsoft’s options for SQL Server high availability.  Below is what I cobbled together from MSDN; there’s not much original material in here, but hopefully you’ll find it helpful if you ever need to provide the same type of information in a distilled format.  I wrote this for SQL 2005.  I confess I haven’t gotten up to speed enough on SQL 2008 to know whether any significant modifications are needed, but I suspect most of this should be applicable to 2008 as well.

Microsoft SQL Server High Availability Architecture

Microsoft provides an architectural framework to support a high availability (HA) environment known as the Windows Server System Reference Architecture. It is a framework of best practices that leverage the high availability technologies built into the Microsoft line of products including SQL Server as well as Microsoft .NET Application Servers. It encompasses the following products and technologies:

  • Database Engine
    • SQL Server Database – Failover Clustering, Mirroring, Log Shipping, Replication
    • SQL Server Analysis Services – Failover Clustering
    • SQL Server Notification Services – Failover Clustering
  • .NET Application Servers – Network Load Balancing Farm, Application Pooling approach with High-Availability Back-end SQL Database.
    • SQL Server Reporting Services
    • Microsoft .NET Framework Web Services (IIS/Component Enterprise Services)
    • Microsoft Office SharePoint Server (MOSS) and Windows SharePoint Services
    • BizTalk

There are several options available with SQL Server in regards to high availability. The relevant technologies are:

SQL Server Failover Clustering
  • Failover clustering provides high-availability support for an entire instance of SQL Server. A failover cluster is a combination of one or more nodes, or servers, with two or more shared disks. Applications such as SQL Server and Notification Services are each installed into a Microsoft Cluster Service (MSCS) cluster group, known as a resource group. At any time, each resource group is owned by only one node in the cluster. The application service has a virtual name that is independent of the node names, and is referred to as the failover cluster instance name. An application can connect to the failover cluster instance by referencing the failover cluster instance name. The application does not have to know which node hosts the failover cluster instance.
  • A SQL Server failover cluster instance appears on the network as a single computer, but has functionality that provides failover from one node to another if the current node becomes unavailable. For example, during a non-disk hardware failure, operating system failure, or planned operating system upgrade, you can configure an instance of SQL Server on one node of a failover cluster to fail over to any other node in the disk group.
  • A failover cluster protects against server failure, but does not protect against disk failure. Use failover clustering to reduce system downtime and provide higher application availability.
  • Failover clustering does not protect against a geographic disaster and is intended for high availability within a data center. SQL Mirroring, SQL Replication or Log Shipping can be used to augment Failover Clustering for geographic fault tolerance.


Figure 1 – SQL Clustering

SQL Server Database Mirroring
  • Database mirroring is primarily a software solution to increase database availability by supporting almost instantaneous failover. Database mirroring can be used to maintain a single standby database, or mirror database, for a corresponding production database that is referred to as the principal database. Database mirroring provides geographic failover, since the mirrored database may be in an entirely different geographic location.
  • The mirror database is created by restoring a database backup of the principal database with no recovery.  This makes the mirror database inaccessible to clients.  However, it is possible to use it indirectly for reporting purposes by creating a database snapshot on the mirror database.  The database snapshot provides clients with read-only access to the data in the database as it existed when the snapshot was created.
  • Each database mirroring configuration involves a principal server that contains the principal database, and a mirror server that contains the mirror database. The mirror server continuously brings the mirror database up to date with the principal database.
  • Database mirroring runs with either synchronous operation in high-safety mode, or asynchronous operation in high-performance mode. In high-performance mode, the transactions commit without waiting for the mirror server to write the log to disk, which maximizes performance. In high-safety mode, a committed transaction is committed on both partners, but at the risk of increased transaction latency.
  • In its simplest configuration, database mirroring involves only the principal and mirror servers. In this configuration, if the principal server is lost, the mirror server can be used as a warm standby server, with possible data loss. High-safety mode supports an alternative configuration, high-safety mode with automatic failover. This configuration involves a third server instance, known as a witness, which enables the mirror server to act as a hot standby server. Failover from the principal database to the mirror database typically takes a few seconds.
  • Database mirroring can be integrated with failover clustering to provide a geographic failover layer on top of a local failover approach. Typically, when mirroring is used with clustering, the principal server and mirror server both reside on clusters, with the principal server running on the failover clustered instance of one cluster and the mirror server running on the failover clustered instance of a different cluster. If a cluster failover makes a principal server temporarily unavailable, client connections are disconnected from the database. After the cluster failover completes, clients can reconnect to the principal server on the same cluster, or on a different cluster or an unclustered computer, depending on the operating mode.
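A minimal sketch of wiring up a mirroring session; the server addresses and the SalesDB database name are illustrative, and the mirroring endpoints must already exist on each instance:

```sql
-- On the mirror instance (database restored there WITH NORECOVERY first):
ALTER DATABASE SalesDB SET PARTNER = 'TCP://principal.corp.local:5022'
-- On the principal instance:
ALTER DATABASE SalesDB SET PARTNER = 'TCP://mirror.corp.local:5022'
-- High-safety (synchronous) mode; add a witness for automatic failover:
ALTER DATABASE SalesDB SET PARTNER SAFETY FULL
ALTER DATABASE SalesDB SET WITNESS = 'TCP://witness.corp.local:5022'
```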


Figure 2 – SQL Mirroring


Figure 3 – SQL Mirroring integrated with Clustering

SQL Server Log Shipping
  • Like database mirroring, log shipping operates at the database level. Log shipping can be used to maintain one or more warm standby databases, referred to as secondary databases, for a corresponding production database that is referred to as the primary database. Each secondary database is created by restoring a database backup of the primary database with no recovery, or with standby. Restoring with standby permits the resulting secondary database to be used for limited reporting purposes.
  • A log shipping configuration includes a single primary server that contains the primary database, one or more secondary servers that each have a secondary database, and a monitor server. Each secondary server updates its secondary database at regular intervals from log backups of the primary database. Log shipping involves a user-modifiable delay between when the primary server creates a log backup of the primary database and when the secondary server restores the log backup. Before a failover can occur, a secondary database must be brought fully up-to-date by manually applying any unrestored log backups.
  • Log shipping provides the flexibility of supporting multiple standby databases. If you require multiple standby databases, you can use log shipping alone or as a supplement to database mirroring. When these solutions are used together, the current principal database of the database mirroring configuration is also the current primary database of the log shipping configuration.
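The mechanics boil down to log backup and restore; a hand-run sketch (file paths and the SalesDB database name are illustrative):

```sql
-- On the primary server:
BACKUP LOG SalesDB TO DISK = N'\\backupshare\logs\SalesDB_1200.trn'
-- On the secondary server; STANDBY keeps the database readable between restores:
RESTORE LOG SalesDB FROM DISK = N'\\backupshare\logs\SalesDB_1200.trn'
    WITH STANDBY = N'D:\standby\SalesDB_undo.dat'
-- At failover time, after manually applying any remaining log backups:
RESTORE DATABASE SalesDB WITH RECOVERY
```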


Figure 4 – SQL Log Shipping

SQL Server Replication
  • Replication uses a publish-subscribe model, allowing a primary server, referred to as the Publisher, to distribute data to one or more secondary servers, or Subscribers. Replication allows real-time availability and scalability across these servers. It supports filtering to provide a subset of data at Subscribers, and also allows partitioned updates. Subscribers are online and available for reporting or other functions, without requiring recovery. SQL Server offers three types of replication: snapshot, transactional, and merge. Transactional replication provides the lowest latency and is most commonly used for high availability.


Figure 5 – SQL Server Transactional Replication

References

· http://www.microsoft.com/windowsserver2003/wssra/default.mspx – Microsoft Windows Server System Reference Architecture

· http://msdn.microsoft.com/en-us/library/ms190202.aspx – SQL Server 2005 Books Online: Configuring High Availability


Creating the Infrastructure for Capturing a Data Model

At this point, all I’ve given you is a set of tables that can store metadata about database tables and their relationships – they pretty much reflect a subset of what you can get from the system views.  Let’s take a look at the process for capturing this information as the database is modified.  The following are the major steps:

  1. Create Entity Tracker database
  2. Create change log table, supporting tables to capture the data model snapshot, and a view to help later with being able to dynamically construct a graphical data model diagram.
  3. Create supporting stored procedure and user-defined function to be invoked by DDL Trigger to create the database snapshot
  4. Create the DDL Trigger to execute the stored procedure whenever a change occurs to any entities affecting the database.
  5. Implement the DDL trigger and stored procedure into the database to monitor.  Note that the actual tables containing the changes are centralized into tables in the Entity Tracker database.

After these steps are done any change to any table or relationship will result in a new version of the information being stored in the Entity Tracker database.  Over time, this will allow us to monitor how the data model has changed for a particular database.
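As a sketch of steps 3 and 4 together (the event names shown are a minimal set; the actual trigger in the attached zip is UpdateSchemaVersion and may cover more DDL events):

```sql
-- Database-level DDL trigger that re-snapshots the data model
-- whenever a table is created, altered, or dropped.
CREATE TRIGGER UpdateSchemaVersion
ON DATABASE
FOR CREATE_TABLE, ALTER_TABLE, DROP_TABLE
AS
BEGIN
    -- Capture the new model version into the Entity Tracker tables
    EXEC dbo.UpdateSchemaRelations
END
```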

Attached is a zip that contains the SQL needed to create the Entity Tracker database tables, UpdateSchemaRelations stored procedure, and UpdateSchemaVersion DDL trigger.  Let’s take a look at each step and some of the code.

1) Create the EntityTracker database.  That’s actually not in the zip (CREATE DATABASE ENTITYTRACKER is all you need, or use SSMS).  You might want to call it something else; that’s not a problem, but you’ll need to modify the UpdateSchemaRelations stored procedure to match.  The reason is that this stored procedure is actually implemented in the target database in addition to the DDL trigger: it references system tables from the database being monitored, and I haven’t figured out a way for the stored proc to do that aside from putting it into the target DB (if somebody knows a way, please comment). 

2) Next, the change log and supporting tables (the ones from the diagram in the introductory post) need to be created along with the view to facilitate querying relationships.  The script EntityTracker.Tables.sql and EntityTracker.Views.sql in the EntityTrackerCore.zip handles this.  Note, we only care about the view_SchemaTableRelations view at this point, you can disregard the other views for now.

3) Create the supporting stored procedure (UpdateSchemaRelations) for the DDL trigger to invoke, along with the supporting user-defined function.  These files are named UpdateSchemaRelations.Procedure.sql, and the function is found in EntityTracker.Functions.sql (we are only interested in the dbo.udf_GetTableMaxDepth function in the EntityTracker.Functions.zip).  This is the core logic that actually takes the snapshot of the data.  Where possible, the procedure uses standard Information Schema views, with the exception of the relationships snapshot (I had to use sys.indexes for retrieving key info, due to a problem where a unique index is not included in the table constraints when the primary columns are already defined as a separate primary key).

Before getting buried in the code, let’s look at the high-level logic:

  1. Insert list of tables into SchemaTable table
  2. Insert list of referential constraints into TableRelation table
  3. Insert Relation column pairs associated with referential constraints into RelationColumnPair table
  4. Calculate the depth of each relationship using a custom T-SQL recursive function that traverses the relationships to find the deepest level for each table. One of the challenges for this process is that an RDBMS allows self-joined tables and circular relationships, so the function must check for these conditions to avoid infinite recursion.  Note that this function relies on the dbo.udf_GetTableMaxDepth function, which relies on the view_SchemaTableRelations view. 
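One way to picture step 4 is a recursive CTE that carries the visited path along to break cycles.  This is a sketch only: the ChildTable/ParentTable column names on view_SchemaTableRelations are assumptions, and the shipped dbo.udf_GetTableMaxDepth takes a scalar-function approach instead.

```sql
WITH Depths AS
(
    -- Anchor: every relationship starts at depth 1; seed the visited path
    SELECT  ChildTable, ParentTable, 1 AS Depth,
            CAST('/' + ChildTable + '/' + ParentTable + '/' AS VARCHAR(MAX)) AS VisitedPath
    FROM    dbo.view_SchemaTableRelations
    WHERE   ChildTable <> ParentTable                 -- skip self-joined tables
    UNION ALL
    -- Recurse up the parent chain, refusing to revisit a table (circular refs)
    SELECT  d.ChildTable, r.ParentTable, d.Depth + 1,
            d.VisitedPath + r.ParentTable + '/'
    FROM    Depths d
    JOIN    dbo.view_SchemaTableRelations r
            ON r.ChildTable = d.ParentTable
           AND r.ChildTable <> r.ParentTable
    WHERE   d.VisitedPath NOT LIKE '%/' + r.ParentTable + '/%'
)
SELECT ChildTable, MAX(Depth) AS MaxDepth
FROM   Depths
GROUP BY ChildTable
```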

 

For the next posts, we’ll drill down into the specific .SQL code and finally we will look at the user interface in more detail.  I’ll also make a live demo available that you can use to get an appreciation of the user experience.  After that, we’ll go back down into the code used to support the UI and discuss extending this technique for other scenarios.
