Using INSTEAD OF Triggers to enable updatable views

A common misconception with SQL Server views is that only indexed views can provide update capability. In fact, any view that is schema-bound can be made updatable simply by adding an "INSTEAD OF" trigger. An indexed view may also be set up with a trigger, but it is not required, and because the trigger writes to the base tables, the view does not need to obey the other constraints of an indexed view beyond being schema-bound.

There are a few scenarios where this is useful. One is the case of tables that use a master-secondary design pattern. In many systems there are central objects that manifest themselves in different forms requiring different attributes. The central objects often need to be accessed singularly for their core attributes, so splitting the objects into multiple tables makes queries that need objects of both types together awkward and forces the use of UNION queries. The other complexity with separating objects by type into separate tables is that they often share relationships to common child objects. To support the different interfaces for the different object types, multiple views can be configured.

One example of this is a name and address system I worked on a long time ago for a non-profit organization. Donors could be organizations, individuals, or married couples, and different attributes are associated with each. To further complicate matters, a donor could also be a member, the member might be married, and the organization needed to track the household unit associated with married members.

The solution was to create a generic entity object with satellite objects for individual, household, and organization. This allowed the name to be linked to outside systems without maintaining multiple relationships, while the relationships between the entities were maintained in another table. With this design an organization could donate just as an individual could and show up on a donation report, while the organization attributes were not muddled up with individual attributes.

Below is the database diagram representing the common entity approach for name/address. Note that in addition to the master-secondary relationships, there are relationships between the objects, such as the organizational contact for an organization entity or the spouse for a household entity. The diagram does not make the direction of the 1-to-1 relationships clear: they originate from Entity – that is, a row in Org, Household, or Person must first have a row in Entity with the same EntityId.

[Image: database diagram of the common entity model for name/address]

To support interfaces that need to insert or update different types of entities, a view can be created for each specific entity type. The view is coupled with an INSTEAD OF trigger so that adding data through the view executes inserts against the base tables. Only the main Entity table uses the identity property; the primary keys of the other tables are all based on the Entity table key. The triggers below provide a couple of interfaces for adding new entities. The first inserts a single person and generates rows in both the Entity and Person tables.

The second trigger adds a new organization, with the assumption that the contact person was already added as a person entity.

The third trigger shows how cascading can be used for even more elaborate scenarios, including inserting into other views. This version takes the columns for the contact person as well as the organization entity (rather than the person entity key) and inserts into the Person table as well as the Org and main Entity tables. It leverages the org view from the prior trigger by inserting into it directly.

CREATE VIEW v_PersonEntity WITH SCHEMABINDING
AS SELECT FirstName, LastName, Suffix, WorkPhone, CellPhone, HomePhone, PrimaryPhone
FROM dbo.Person p
GO

CREATE VIEW v_OrgEntity WITH SCHEMABINDING
AS SELECT ContactPersonId, FederalId, EntityName
FROM dbo.Org o
INNER JOIN dbo.Entity e
    ON o.EntityId = e.EntityId
GO

CREATE VIEW v_OrgEntity_WithContact WITH SCHEMABINDING
AS SELECT FederalId, EntityName, FirstName, LastName, WorkPhone
FROM dbo.Org o
INNER JOIN dbo.Entity e
    ON o.EntityId = e.EntityId
INNER JOIN dbo.Person p
    ON p.EntityId = o.ContactPersonId
GO

CREATE TRIGGER v_Person_ITrig ON dbo.v_PersonEntity
   INSTEAD OF INSERT
AS BEGIN
    IF (SELECT COUNT(*) FROM inserted) > 1 BEGIN
        RAISERROR ('View only supports adding 1 person at a time',16,1)
        RETURN
    END
    DECLARE @EntityID INT
    -- Create the main entity row
    INSERT INTO dbo.Entity (EntityName, EntityType, MainPhoneNumber)
    SELECT
        i.FirstName + ' ' + i.LastName, 'P'
        ,CASE
            WHEN PrimaryPhone = 'H' THEN HomePhone
            WHEN PrimaryPhone = 'C' THEN CellPhone
            WHEN PrimaryPhone = 'W' THEN WorkPhone
        END
    FROM inserted i

    -- Create the secondary person row
    SELECT @EntityId = SCOPE_IDENTITY()
    INSERT INTO dbo.Person (EntityId, FirstName, LastName, Suffix,
    WorkPhone, CellPhone, HomePhone, PrimaryPhone)
    SELECT @EntityId, FirstName, LastName, Suffix,
    WorkPhone, CellPhone, HomePhone, PrimaryPhone
    FROM inserted
END
GO

CREATE TRIGGER v_OrgEntity_ITrig ON dbo.v_OrgEntity
   INSTEAD OF INSERT
AS BEGIN
    IF (SELECT COUNT(*) FROM inserted) > 1 BEGIN
        RAISERROR ('View only supports adding 1 organization at a time',16,1)
        RETURN
    END
    DECLARE @EntityID INT

    -- Create the main entity row
    INSERT INTO dbo.Entity (EntityName, EntityType)
    SELECT EntityName, 'O'
    FROM inserted i

    -- Create the secondary org row
    SELECT @EntityId = SCOPE_IDENTITY()
    INSERT INTO dbo.Org
    (EntityId, ContactPersonId, FederalId)
    SELECT @EntityId, ContactPersonId, FederalId
    FROM inserted

END
GO

CREATE TRIGGER v_OrgEntity_WithContact_ITrig ON dbo.v_OrgEntity_WithContact
   INSTEAD OF INSERT
AS BEGIN
    IF (SELECT COUNT(*) FROM inserted) > 1 BEGIN
        RAISERROR ('View only supports adding 1 organization at a time',16,1)
        RETURN
    END
    DECLARE @EntityID INT

    -- Could use the person view to create the person and entity rows, but
    -- then there is a challenge with getting the identity inserted
    -- since SCOPE_IDENTITY is nulled out after the last add to the
    -- person table
    INSERT INTO dbo.Entity (EntityName, EntityType, MainPhoneNumber)
    SELECT
        i.FirstName + ' ' + i.LastName, 'P', WorkPhone
    FROM inserted i

    -- Create the secondary person row
    SELECT @EntityId = SCOPE_IDENTITY()
    INSERT INTO dbo.Person (EntityId, FirstName, LastName,
    WorkPhone, PrimaryPhone)
    SELECT @EntityId, FirstName, LastName,
    WorkPhone, 'W'
    FROM inserted

    -- Create the organization and its entity row using the org view
    INSERT INTO dbo.v_OrgEntity
    (ContactPersonId, FederalId, EntityName)
    SELECT @EntityId, FederalId, EntityName
    FROM inserted
END
GO
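For reference, here is what client inserts through these views look like; the data values are made up for illustration, but the column lists match the view definitions above:

-- Add a person (the trigger creates rows in both Entity and Person)
INSERT INTO dbo.v_PersonEntity
    (FirstName, LastName, Suffix, WorkPhone, CellPhone, HomePhone, PrimaryPhone)
VALUES ('Jane', 'Smith', NULL, '555-0100', '555-0101', '555-0102', 'C')

-- Add an organization whose contact person already exists (EntityId 1 assumed to be a person)
INSERT INTO dbo.v_OrgEntity
    (ContactPersonId, FederalId, EntityName)
VALUES (1, '12-3456789', 'Acme Charities')

-- Add an organization and its contact person in a single statement
INSERT INTO dbo.v_OrgEntity_WithContact
    (FederalId, EntityName, FirstName, LastName, WorkPhone)
VALUES ('98-7654321', 'Widget Foundation', 'John', 'Doe', '555-0200')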

Below shows how all the base tables are affected by a single insert into the Org-with-contact view:

[Image: query results showing the rows created in Entity, Person, and Org]
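For readers following along without the screenshot, a query along these lines (using the columns referenced by the triggers above) shows the effect across the base tables:

SELECT e.EntityId, e.EntityType, e.EntityName, e.MainPhoneNumber,
       p.FirstName, p.LastName, o.FederalId, o.ContactPersonId
FROM dbo.Entity e
LEFT JOIN dbo.Person p ON p.EntityId = e.EntityId
LEFT JOIN dbo.Org o ON o.EntityId = e.EntityId
ORDER BY e.EntityId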

These triggers illustrate inserts, but UPDATE and DELETE actions may also be handled by INSTEAD OF triggers. Below is an example that does both an insert and an update for one of the tables in the universal problem resolution framework from my dissertation. In the prior example, the triggers only work for one row at a time because of the need to capture the identity of the new entity; triggers operate against the entire row set rather than on each row individually. The example below uses a temporary table and a pre-defined sequencing approach to support inserting multiple rows at once. The COALESCE calls in the update cause omitted columns to retain their previous values.

ALTER TRIGGER [Data].[view_Problem_Item_ITrig]
   ON  [Data].[view_Problem_Item]
   INSTEAD OF INSERT
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;
    BEGIN TRY
        BEGIN TRANSACTION
        --SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
        DECLARE @EntityTypeId MONEY = Data.udf_GetItemTypeId('Data.Entity')
        DECLARE @NextEntityId MONEY -- = Data.udf_GetNextItemId(@EntityTypeId)

        DECLARE @ProblemTypeId MONEY = Data.udf_GetItemTypeId('Data.Problem')
        DECLARE @NextProblemId MONEY -- = Data.udf_GetNextItemId(@ProblemTypeId)
        SELECT @NextProblemId = MAX(ItemId) FROM Data.Item  WITH (HOLDLOCK, UPDLOCK)

        SET XACT_ABORT ON

        SELECT  
            (CONVERT(MONEY,ROW_NUMBER() OVER (ORDER BY ItemName)) / 10000.00) -- - .0001
             + @NextProblemId AS ItemId,
            ActivationTaskDetailId,
            DefineActionTreeId,
            SolveActionTreeId,
            ProblemDefinitionState,
            ProblemSolutionState,
            CurrentStepNumber,
            ItemName,
            TypeId,
            VersionNumber,
            IsSystemItem,
            ItemPath,
            AutoGenerated,
            RootProblemId,
            BranchProblemId,
            IsRoot,
            ActivationValue,
            ProblemCheckSum,
            BranchStepNumber,
            OutcomeReason
        INTO #Problem FROM inserted
        WHERE (ItemId IS NULL OR ItemId = 0)

        INSERT Data.Item  WITH (ROWLOCK)
            (
            ItemId,
            ItemName,
            TypeId,
            VersionNumber,
            IsSystemItem,
            ItemPath,
            AutoGenerated,
            RootProblemId
            )
        SELECT
            ItemId,
            ItemName,
            COALESCE(TypeId,@ProblemTypeId),
            COALESCE(VersionNumber,1.0),
            COALESCE(IsSystemItem, 0),
            COALESCE(ItemPath, ItemName),
            COALESCE(AutoGenerated,0),
            RootProblemId
        FROM #Problem

        INSERT Data.Problem  WITH (ROWLOCK)
        (
            ProblemId,
            ActivationTaskDetailId,
            DefineActionTreeId,
            SolveActionTreeId,
            ProblemDefinitionState,
            ProblemSolutionState,
            CurrentStepNumber,
            BranchProblemId,
            IsRoot,
            ActivationValue,
            ProblemCheckSum,
            BranchStepNumber,
            OutcomeReason
        )
        SELECT
            ItemId,
            ActivationTaskDetailId,
            DefineActionTreeId,
            SolveActionTreeId,
            ProblemDefinitionState,
            ProblemSolutionState,
            COALESCE(CurrentStepNumber,0.0),
            BranchProblemId,
            COALESCE(IsRoot,0),
            ActivationValue,
            ProblemCheckSum,
            BranchStepNumber,
            OutcomeReason
        FROM #Problem

        UPDATE item  WITH (ROWLOCK)
        SET ItemName = COALESCE(i.ItemName, item.ItemName),
            TypeId = COALESCE(i.TypeId, item.TypeId),
            VersionNumber = COALESCE(i.VersionNumber,item.VersionNumber),
            IsSystemItem = COALESCE(i.IsSystemItem,item.IsSystemItem),
            ItemPath = COALESCE(i.ItemPath,item.ItemName),
            AutoGenerated = COALESCE(i.AutoGenerated, item.AutoGenerated),
            RootProblemId = COALESCE(i.RootProblemId, item.RootProblemId)
        FROM inserted i
        INNER JOIN Data.Item item 
            ON i.ItemId = item.ItemId 
        WHERE (i.ItemId IS NOT NULL AND i.ItemId <> 0)

        UPDATE p WITH (ROWLOCK)
        SET ActivationTaskDetailId = COALESCE(i.ActivationTaskDetailId, P.ActivationTaskDetailId),
            DefineActionTreeId = COALESCE(i.DefineActionTreeId, p.DefineActionTreeId),
            SolveActionTreeId = COALESCE(i.SolveActionTreeId, p.SolveActionTreeId),
            ProblemDefinitionState = COALESCE(i.ProblemDefinitionState, p.ProblemDefinitionState),
            ProblemSolutionState = COALESCE(i.ProblemSolutionState, p.ProblemSolutionState),
            CurrentStepNumber = COALESCE(i.CurrentStepNumber, p.CurrentStepNumber),
            BranchProblemId = COALESCE(i.BranchProblemId, p.BranchProblemId),
            ActivationValue = COALESCE(i.ActivationValue, p.ActivationValue),
            ProblemCheckSum = COALESCE(i.ProblemCheckSum, p.ProblemCheckSum),
            OutcomeReason = COALESCE(i.OutcomeReason, p.OutcomeReason),
            Complexity = COALESCE(i.Complexity, p.Complexity)
        FROM inserted i
        INNER JOIN Data.Problem p
            ON i.ItemId = p.ProblemId 
        WHERE (i.ItemId IS NOT NULL AND i.ItemId <> 0)

        DROP TABLE #Problem
       
        COMMIT
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK
        EXEC dbo.usp_RethrowError
    END CATCH

END
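As a hedged usage sketch (the column values are invented, and it assumes the view exposes the same column names the trigger reads from inserted and that the omitted columns are nullable), the two paths through the trigger look like this: a row with a NULL or zero ItemId creates new Item and Problem rows with a sequenced id, while a row carrying an existing ItemId updates only the supplied columns because of the COALESCE logic.

-- New problem: ItemId omitted, so the trigger assigns the next sequenced ItemId
INSERT INTO Data.view_Problem_Item (ItemName, ProblemDefinitionState, ProblemSolutionState)
VALUES ('Sample problem', 0, 0)

-- Existing problem: only CurrentStepNumber is supplied; other columns keep their prior values
INSERT INTO Data.view_Problem_Item (ItemId, CurrentStepNumber)
VALUES (42, 3.0)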


Solving Sudoku with T-SQL

Note: This is an update to the earlier post to include the query that actually displays the solved puzzle (for the skeptics…).

Recently I had to make a short trip and took a look at a Sudoku puzzle in the airline magazine. In the past I've posted about automating problem solving, and I just finished a PhD in this area that outlined an automated technique for algorithm discovery through analyzing solution sequences from simulations.

What intrigued me about the Sudoku scenario is that it lends itself to a Cartesian product solution approach – i.e. rows, columns, and groupings must all contain specific values. Based on that, it should be solvable with relational algebra. I modeled the data structure on the plane to see how hard it would be to arrive at a solution for a puzzle using T-SQL. Before spending too much time on this, I searched the Internet and found that somebody else had already modeled and solved this: Samuel Aina undertook this scenario back in 2010 – the link to his approach is http://www.developerfusion.com/article/84374/solving-sudoku-with-sql/.

In keeping with my generic problem model approach, I still thought it would be worthwhile to do this in a less procedural way than Aina's. My approach to generic problem solving relies on a schema that defines the start, transition, and final states for any problem-solving scenario. Following that approach, I decided to solve the problem with a view that presents the legal moves to a solver procedure, which enumerates through the steps.

For testing purposes, I went to http://www.websudoku.com/ and pulled in the first puzzle that came up, shown below:

[Image: starting Sudoku puzzle]

The schema and execution script is at the end of this post. Below are the general steps for accomplishing the solution.

1) Define a table for the Sudoku cell with the row, column, and section row/column. I also added computed absolute row/column/section numbers to simplify the queries, so that checks can span the sections (a section being one of the 3 x 3 areas that must contain all numbers from 1 to 9) and provide the values associated with a row or column across the whole puzzle. I also created a table to record the solution steps, and a trigger on the cell table that writes to it so the solution generated by a solver procedure can be replayed and verified.

2) Create a generic numbers table with values 1 to 9. This is used to generate possible values as well as to populate the initial rows in the Sudoku cell table.

3) Populate the Cell table – I wrote a stored procedure that receives the absolute row, column, and value and updates the appropriate row in the table.

4) Create the views that return the values not yet used in a row, column, or section.

5) Create a view that returns all possible moves by joining the candidate-value views for row, column, and section on absolute row and column. This provides all the legal moves.

6) Create a view that returns the required next move – this uses a HAVING clause to look for the distinct row/column combinations that have only one valid move. If no cell has exactly one possible value, multiple instances are required to pursue alternative solution paths. That adds some complexity, but not a huge amount; the issue is explained further down in the post (see Note).

7) Create a stored procedure that looks for the rows/columns with only one solution path and updates their values. This causes further solutions to materialize for the next step – as each number is filled in, it leads to other rows/columns that can be filled in based on the result.

8) Execute the stored procedure to solve the puzzle – this results in the moves being recorded into a solver table for verification purposes using the trigger outlined in step 1.

Note: In my test case I was fortunate that there was always at least one cell meeting the criteria of having only one possible value, since I was using an easy puzzle. This simplified the solution so that I did not need to try multiple paths. For a more complex puzzle, I would have had to generate multiple instances for whichever row/column had the smallest number of candidate paths. This is the design pattern of my generic simulator that I used to solve Tower of Hanoi, where there were multiple solution paths. Aina's solution does handle this scenario; mine does not yet.

Here are the detailed steps in SQL:

1) Create tables and trigger (for more complex scenarios, an instance id would be needed to generate multiple solution paths for the same game configuration)

CREATE TABLE [Sudoku].[Cell](
    [CellId] [int] IDENTITY(1,1) NOT NULL,
    [GameId] [int] NOT NULL,
    [Row] [int] NOT NULL,
    [Col] [int] NOT NULL,
    [Val] [int] NULL,
    [SectionRow] [int] NOT NULL,
    [SectionCol] [int] NOT NULL,
    [AbsoluteRow]  AS ([Row]+([SectionRow]-(1))*(3)),
    [AbsoluteCol]  AS ([Col]+([SectionCol]-(1))*(3)),
    [SectionNumber]  AS (([SectionRow]-(1))*(3)+[SectionCol]),
    [MoveNumber] [int] NULL,
CONSTRAINT [PK__Cell__EA424A08D48B6731] PRIMARY KEY CLUSTERED
(
    [CellId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [Sudoku].[Cell] ADD  CONSTRAINT [DF__Cell__GameId__24927208]  DEFAULT ((0)) FOR [GameId]
GO

ALTER TABLE [Sudoku].[Cell] ADD  CONSTRAINT [DF_Cell_MoveNumber]  DEFAULT ((0)) FOR [MoveNumber]
GO

CREATE TABLE [Sudoku].[SolveLog](
    [StepId] [int] IDENTITY(1,1) NOT NULL,
    [CellId] [int] NOT NULL,
    [MoveDateTime] [datetime] NULL,
    [Val] [int] NOT NULL,
    [MoveNumber] [int] NOT NULL,
CONSTRAINT [PK__SolveLog__2434335717D3913E] PRIMARY KEY CLUSTERED
(
    [StepId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE TRIGGER [Sudoku].[Cell_Utrig]
   ON  [Sudoku].[Cell]
   AFTER UPDATE
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;
    INSERT [Sudoku].[SolveLog]
    (CellId, Val, MoveNumber)
    SELECT CellId, Val, MoveNumber
    FROM inserted
    -- Insert statements for trigger here
END

GO

2) Create a generic numbers table and populate it

CREATE TABLE [dbo].[Numbers](
    [Number] [int] NULL
) ON [PRIMARY]

GO

insert dbo.Numbers select 1
insert dbo.Numbers select 2
insert dbo.Numbers select 3
insert dbo.Numbers select 4
insert dbo.Numbers select 5
insert dbo.Numbers select 6
insert dbo.Numbers select 7
insert dbo.Numbers select 8
insert dbo.Numbers select 9

3) Insert rows for a game for all rows/columns, create a stored procedure to populate the puzzle data, and execute it

insert into Sudoku.cell
(GameId, Row, Col, SectionRow, SectionCol)

select 1, r.Number, c.Number, gc.Number, gr.Number
from dbo.Numbers r
cross join dbo.Numbers c
cross join dbo.Numbers gr
cross join dbo.Numbers gc
where r.Number <= 3
and c.Number <= 3
and gr.Number <= 3
and gc.Number <= 3

CREATE PROCEDURE [Sudoku].[UpdateCell_ByAbsolutePosition]
    @GameId int,
    @AbsRow int,
    @AbsCol int,
    @Val int,
    @MoveNumber int = 0
AS BEGIN
    UPDATE  c
        SET Val = @Val,
            MoveNumber = @MoveNumber
    FROM [Sudoku].[Cell] c
    WHERE c.GameId = @GameId
    AND c.AbsoluteRow = @AbsRow
    AND c.AbsoluteCol = @AbsCol
END

GO

EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 1, @AbsCol = 6, @Val = 8
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 1, @AbsCol = 7, @Val = 4
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 1, @AbsCol = 8, @Val = 7
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 2, @AbsCol = 1, @Val = 9
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 2, @AbsCol = 3, @Val = 8
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 2, @AbsCol = 4, @Val = 4
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 2, @AbsCol = 5, @Val = 5
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 3, @AbsCol = 1, @Val = 6
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 3, @AbsCol = 5, @Val = 9
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 3, @AbsCol = 6, @Val = 1
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 3, @AbsCol = 7, @Val = 5
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 3, @AbsCol = 8, @Val = 3
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 4, @AbsCol = 1, @Val = 2
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 4, @AbsCol = 2, @Val = 8
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 4, @AbsCol = 7, @Val = 3
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 4, @AbsCol = 9, @Val = 4
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 5, @AbsCol = 3, @Val = 9
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 5, @AbsCol = 7, @Val = 7
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 6, @AbsCol = 1, @Val = 7
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 6, @AbsCol = 3, @Val = 5
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 6, @AbsCol = 8, @Val = 2
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 6, @AbsCol = 9, @Val = 6
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 7, @AbsCol = 2, @Val = 1
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 7, @AbsCol = 3, @Val = 4
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 7, @AbsCol = 4, @Val = 5
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 7, @AbsCol = 5, @Val = 6
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 7, @AbsCol = 9, @Val = 7
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 8, @AbsCol = 5, @Val = 4
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 8, @AbsCol = 6, @Val = 7
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 8, @AbsCol = 7, @Val = 2
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 8, @AbsCol = 9, @Val = 5
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 9, @AbsCol = 2, @Val = 2
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 9, @AbsCol = 3, @Val = 7
EXEC [Sudoku].[UpdateCell_ByAbsolutePosition] @GameId = 1, @AbsRow = 9, @AbsCol = 4, @Val = 1

4) Create the views that return the candidate values for row, column, and section

CREATE VIEW Sudoku.v_Candidate_RowVal
AS SELECT r.Number AS Row, v.Number as Val -- select *
from dbo.Numbers r
INNER JOIN dbo.Numbers v
ON v.Number NOT IN (SELECT c2.Val from Sudoku.Cell c2 WHERE c2.AbsoluteRow = r.Number AND c2.Val IS NOT NULL)

CREATE VIEW Sudoku.v_Candidate_ColVal
AS SELECT r.Number AS Col, v.Number as Val -- select *
from dbo.Numbers r
INNER JOIN dbo.Numbers v
ON v.Number NOT IN (SELECT c2.Val from Sudoku.Cell c2 WHERE c2.AbsoluteCol = r.Number AND c2.Val IS NOT NULL)

CREATE VIEW Sudoku.v_Candidate_SectionVal
AS SELECT r.Number AS Section, v.Number as Val -- select *
from dbo.Numbers r
INNER JOIN dbo.Numbers v
ON v.Number NOT IN (SELECT c2.Val from Sudoku.Cell c2 WHERE c2.SectionNumber = r.Number AND c2.Val IS NOT NULL)

5) Possible move view

CREATE VIEW Sudoku.v_PossibleMove AS
SELECT c.AbsoluteRow, c.AbsoluteCol, r.Val
FROM Sudoku.Cell c
INNER JOIN Sudoku.v_Candidate_RowVal r on r.Row = c.AbsoluteRow
INNER JOIN Sudoku.v_Candidate_ColVal cv on cv.Col = c.AbsoluteCol and cv.Val = r.Val
INNER JOIN Sudoku.v_Candidate_SectionVal s on s.Section = c.SectionNumber and s.Val = cv.Val
WHERE c.Val IS NULL

6) Next definite move view (for more complex games, this would need to be modified to select the cell with the smallest candidate count rather than count = 1)

CREATE VIEW Sudoku.v_NextMove AS
SELECT c.AbsoluteRow, c.AbsoluteCol, max(c.Val) AS Val FROM Sudoku.v_PossibleMove c
GROUP BY c.AbsoluteRow, c.AbsoluteCol having count(*) = 1
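
As noted in the comment above, a harder puzzle can reach a state where no cell has exactly one legal value. A hedged sketch of how the next-move view could instead surface the cells tied for the fewest candidates (leaving the branching itself to the solver); it reuses the v_PossibleMove view defined above:

CREATE VIEW Sudoku.v_NextMoveCandidates AS
WITH counted AS (
    SELECT m.AbsoluteRow, m.AbsoluteCol, m.Val,
           COUNT(*) OVER (PARTITION BY m.AbsoluteRow, m.AbsoluteCol) AS CandidateCount
    FROM Sudoku.v_PossibleMove m
)
-- Cells tied for the fewest legal values; when the minimum is 1 this behaves like
-- v_NextMove above, otherwise the solver would branch on each returned value
SELECT AbsoluteRow, AbsoluteCol, Val, CandidateCount
FROM counted
WHERE CandidateCount = (SELECT MIN(CandidateCount) FROM counted)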

7) Stored procedure that solves the puzzle

CREATE PROCEDURE [Sudoku].[Solve] AS
BEGIN
    DECLARE @MoveNumber INT = 1
    WHILE EXISTS (SELECT 0 FROM Sudoku.Cell c WHERE c.val is null)
    BEGIN
        UPDATE c
            SET c.Val = m.Val,
                MoveNumber = @MoveNumber
        FROM Sudoku.Cell c
        INNER JOIN Sudoku.v_NextMove m
            ON m.AbsoluteRow = c.AbsoluteRow
            AND m.AbsoluteCol = c.AbsoluteCol
        WHERE c.Val IS NULL
        SET @MoveNumber = @MoveNumber + 1
    END
END
GO

8) Execute the stored procedure and verify the results – note that multiple cells are solved simultaneously on some of the intermediate steps. Even though 47 cells need to be solved, the total number of steps required is only 13.

EXEC Sudoku.Solve

(1 row(s) affected)

(2 row(s) affected)

(2 row(s) affected)

(2 row(s) affected)

(5 row(s) affected)

(3 row(s) affected)

(4 row(s) affected)

(8 row(s) affected)

(6 row(s) affected)

(6 row(s) affected)

(4 row(s) affected)

(3 row(s) affected)

(1 row(s) affected)

SELECT c.[Val]
      ,c.AbsoluteRow
      ,c.AbsoluteCol
      ,c.[MoveNumber]
  FROM [Games].[Sudoku].[SolveLog] s
  inner join Sudoku.cell c
    on c.CellId = s.CellId
  where s.MoveNumber > 0
  order by movenumber

Val  AbsoluteRow  AbsoluteCol  MoveNumber
2    3            3            1
8    3            9            2
7    3            4            2
1    5            9            3
4    3            2            3
2    2            9            4
3    6            2            4
9    1            9            5
6    5            2            5
4    5            1            5
7    2            2            5
5    1            2            5
3    9            9            6
9    8            2            6
1    4            3            6
9    9            6            7
8    9            5            7
7    4            5            7
3    1            3            7
6    9            7            8
3    8            4            8
4    6            6            8
1    6            5            8
2    1            5            8
5    9            1            8
6    8            3            8
1    1            1            8
4    9            8            9
1    2            7            9
2    7            6            9
3    5            5            9
6    1            4            9
8    8            1            9
1    8            8            10
6    2            8            10
5    5            6            10
9    4            4            10
3    2            6            10
3    7            1            10
8    5            8            11
5    4            8            11
8    6            4            11
6    4            6            11
9    7            8            12
9    6            7            12
2    5            4            12
8    7            7            13

This query outputs the solved puzzle results:

select [1],[2],[3],[4],[5],[6],[7],[8],[9]
from (select AbsoluteRow, AbsoluteCol, val from Sudoku.Cell) as s
pivot(Avg(Val) for AbsoluteCol in ([1],[2],[3],[4],[5],[6],[7],[8],[9]))
as pvt
order by AbsoluteRow

And here are the results:

1  5  3  6  2  8  4  7  9
9  7  8  4  5  3  1  6  2
6  4  2  7  9  1  5  3  8
2  8  1  9  7  6  3  5  4
4  6  9  2  3  5  7  8  1
7  3  5  8  1  4  9  2  6
3  1  4  5  6  2  8  9  7
8  9  6  3  4  7  2  1  5
5  2  7  1  8  9  6  4  3

Voila!


Creating a 3-way 10 GbE Cluster without a Switch – Part 1

After finishing the PhD, I'm back in the lab to test out some new high-speed computing experiments. Recently I retrieved one of my servers from co-location and put it back into my home-office lab, giving me 3 servers with Fusion-io cards all in the same spot. I'm trying to move around some virtual machines and update the trading optimizer (CapGen) investing databases, so I thought it would be useful to get all of the servers talking at 10 Gb or better. The Fusion-io cards easily overwhelm a 1 Gb connection, since even one of the Duos provides 1.5 GB/s, which is well over 9 Gb/s on its own.

A few months back I managed to get 2 servers working using Mellanox ConnectX cards over QSFP connections at 20 Gb/s, but that stopped working after a driver update (the original ConnectX is not supported on Windows Server 2012 R2), so I went to a better supported, although slower, 10 Gb/s connection. To do this, I got Mellanox ConnectX-3 EN cards for 2 of the servers and bought the add-on dual 10 GbE adapter for one of the HP DL370 G6 servers. One advantage to using the HP add-on adapter is that it doesn't require an additional slot, although you do trade off 2 of the 1 Gb connectors.

This approach allows the maximum number of Fusion-io (HP IO Accelerator) cards in the server (currently 9, with 8 of them being Duos), as shown below.

[Image: Fusion-io / HP IO Accelerator cards installed in the server]

In this arrangement, each server has a dedicated high-speed connection to the other two servers via the dual interface, as shown in the table below, without the need for a switch. It is just 3 cables connecting each server to the other two via the 6 total ports (2 on each server).

[Image: connection diagram for the 3-server mesh]

Server     Source Port / IP     Destination Server    Destination Port / IP
Server 1   1: 10.0.0.11         Server 2              1: 10.0.0.21
           2: 10.0.0.12         Server 3              1: 10.0.0.31
Server 2   1: 10.0.0.21         Server 1              1: 10.0.0.11
           2: 10.0.0.22         Server 3              2: 10.0.0.32
Server 3   1: 10.0.0.31         Server 1              2: 10.0.0.12
           2: 10.0.0.32         Server 2              2: 10.0.0.22

Below are pictures of the rear of a couple of the servers.

[Image: rear of server 2 showing the 10 GbE connections]

[Image: rear of server 1 showing the 10 GbE connections]

One of the pain points with setting up a separate private network is that the adapters by default end up in the Public firewall profile. Some articles have been written about how to fix this with a script, but for my testing I am taking the lazy way out and just turning off the Windows Firewall on the servers. Having done that, I can map drives directly over the high-speed link, and I verified the ability to achieve 10 Gb/s throughput by copying files using Fusion-io drives as the source and target. I can now copy a 60GB file from one server to the other in between 40 and 80 seconds, so this should provide the ability to achieve a live migration of a VM in around a minute.

Now that the infrastructure is in place, I will start experimenting with clustering in the next post and will look into some alternatives beyond the Ethernet/iSCSI approach, including RDMA. I will also do some experimentation with log shipping and AlwaysOn capability in SQL Server, and try out the live migration features in Hyper-V to test the practicality of this on the 10 Gb backbone. Lastly, I will test my idea for a distributed SQL Server database that sits on top of iSCSI on the high-speed network, wherein the database instance is effectively scaled out beyond one instance to multiple servers, with the other servers hosting the storage through file server roles.

I hope to finish some testing in the next couple of weeks in my spare time.


Generating a Type-2 Dimension from Multiple Tables with Effective Dates

In slowly changing dimensions (SCD), type-2 attributes involve ending a dimension row with the current date when an attribute value changes and creating a new row that starts from the current date. While this works great for capturing history once a warehouse/cube is implemented, it does not address historical data when such tracking was already occurring. For most scenarios the history can simply be loaded, without the cube processing needing the type-2 rules, to pre-populate the SCD start/end dates. However, in some businesses such as health care, history may sometimes need to be "re-written". For example, a claim may not get into the system until several days after it occurs, and the claim would need to be effective for a point in time in history that is already loaded.

This challenge is highlighted in the examples below. A real-world example might be tracking which certifications were held by an individual at a given point in time. For example, a software professional may have been an Oracle OCP from 6/1/2010 to 8/31/2010, and a PMP from 7/1 to 12/31. To track the history of all certifications held by date ranges, 3 rows are needed:

– 6/1/2010 – 6/30/2010 : OCP

– 7/1/2010 – 8/31/2010  : OCP, PMP

– 9/1/2010 – 12/31/2011: PMP only

The spreadsheet illustrates the scenario where the sets of data for a dimension row come from 2 different tables whose values are effective for different historical periods – the dates associated with the 2 sets of values could be overlapping, equivalent, or covering (i.e. the 'A' start date is before the 'B' start date and the 'A' end date is also greater than the 'B' end date).

This is of interest when a customer already has SCD type-2 tracking in place for historical data across more than one table, each controlled by its own effective date ranges, and wants to consolidate to a single dimension. The goal is a single dimension built from multiple source tables, with type-2 tracking using a single start/end date pair that captures the history of all values from the source tables, including support for regenerating it if the individual historical values or date ranges change.

Below is the SQL that demonstrates the solution approach I found for taking multiple histories, each with its own start/end dates, and producing a consolidated history with type-2 SCD behavior that emulates what would have happened if SCD tracking had been in place incrementally over the date range of the historical data. Here are the steps:

1) Unpivot the start/end dates by unioning the start/end dates from all historical tables into 2 columns

2) Create distinct ranges from the unpivoted dates by joining each distinct start date to the next available end date

3) Join back to the source tables for all values effective for the segmented date ranges.

The SQL accomplishes these 3 items through views (Unpivoted, Ranges, History) to generate the required rows for the type-2 SCD dimension.

Although only 2 tables are involved, this works the same way for additional tables; it just means adding additional unions in the unpivoted view and additional left joins in the history view (a sketch of that extension follows the results below).

CREATE TABLE [dbo].[t1](
    [k] [int] NOT NULL,
    [v] [int] NULL,
    [s] [date] NULL,
    [e] [date] NULL)
GO

CREATE TABLE [dbo].[t2](
    [k] [int] NOT NULL,
    [v] [int] NULL,
    [s] [date] NULL,
    [e] [date] NULL)
GO

CREATE VIEW [dbo].[Unpivoted] AS
SELECT s, e FROM T1 UNION SELECT s, e FROM T2
GO

CREATE VIEW [dbo].[Ranges] AS
SELECT s, (SELECT MIN(e) FROM Unpivoted WHERE e > VS.s) AS e
FROM (SELECT DISTINCT s FROM Unpivoted) AS VS
GO

CREATE VIEW [dbo].[History] AS
SELECT R.s, R.e, t1.k AS T1_Key, t2.k AS T2_Key, T1.v AS T1_Value, T2.v AS T2_Value
FROM Ranges R
LEFT JOIN T1
    ON T1.s BETWEEN R.s AND R.e
    OR T1.e BETWEEN R.s AND R.e
    OR (T1.s < R.s AND T1.e > R.e)
LEFT JOIN T2
    ON T2.s BETWEEN R.s AND R.e
    OR T2.e BETWEEN R.s AND R.e
    OR (T2.s < R.s AND T2.e > R.e)
GO

INSERT [dbo].[t1] ([k], [v], [s], [e]) VALUES (1, 0, CAST(0x01380B00 AS Date), CAST(0x78380B00 AS Date))
INSERT [dbo].[t1] ([k], [v], [s], [e]) VALUES (2, 1, CAST(0x79380B00 AS Date), CAST(0xF3380B00 AS Date))
INSERT [dbo].[t1] ([k], [v], [s], [e]) VALUES (3, 0, CAST(0xF4380B00 AS Date), CAST(0x11390B00 AS Date))
INSERT [dbo].[t1] ([k], [v], [s], [e]) VALUES (4, 1, CAST(0x12390B00 AS Date), CAST(0x20390B00 AS Date))
INSERT [dbo].[t1] ([k], [v], [s], [e]) VALUES (5, 0, CAST(0x21390B00 AS Date), CAST(0x30390B00 AS Date))
INSERT [dbo].[t1] ([k], [v], [s], [e]) VALUES (6, 1, CAST(0x31390B00 AS Date), CAST(0x4E390B00 AS Date))
INSERT [dbo].[t2] ([k], [v], [s], [e]) VALUES (1, 0, CAST(0x01380B00 AS Date), CAST(0x4A380B00 AS Date))
INSERT [dbo].[t2] ([k], [v], [s], [e]) VALUES (2, 1, CAST(0x4B380B00 AS Date), CAST(0x87380B00 AS Date))
INSERT [dbo].[t2] ([k], [v], [s], [e]) VALUES (3, 0, CAST(0x88380B00 AS Date), CAST(0xB5380B00 AS Date))
INSERT [dbo].[t2] ([k], [v], [s], [e]) VALUES (4, 1, CAST(0xB6380B00 AS Date), CAST(0x11390B00 AS Date))
INSERT [dbo].[t2] ([k], [v], [s], [e]) VALUES (5, 0, CAST(0x12390B00 AS Date), CAST(0x20390B00 AS Date))
INSERT [dbo].[t2] ([k], [v], [s], [e]) VALUES (6, 1, CAST(0x21390B00 AS Date), CAST(0x4E390B00 AS Date))

Below are the results of running the SQL

/*------------------------
SELECT * FROM T1        -- Test table 1
SELECT * FROM T2        -- Test table 2
SELECT * FROM UnPivoted -- All distinct start/end dates from all rows extracted into 2 columns
SELECT * FROM Ranges    -- All distinct start dates with the next end date
SELECT * FROM History   -- Join back to the source tables to get the historical keys and values
------------------------*/

k           v           s          e
----------- ----------- ---------- ----------
1           0           2014-01-01 2014-04-30
2           1           2014-05-01 2014-08-31
3           0           2014-09-01 2014-09-30
4           1           2014-10-01 2014-10-15
5           0           2014-10-16 2014-10-31
6           1           2014-11-01 2014-11-30

(6 row(s) affected)

k           v           s          e
----------- ----------- ---------- ----------
1           0           2014-01-01 2014-03-15
2           1           2014-03-16 2014-05-15
3           0           2014-05-16 2014-06-30
4           1           2014-07-01 2014-09-30
5           0           2014-10-01 2014-10-15
6           1           2014-10-16 2014-11-30

(6 row(s) affected)

s          e
---------- ----------
2014-01-01 2014-03-15
2014-01-01 2014-04-30
2014-03-16 2014-05-15
2014-05-01 2014-08-31
2014-05-16 2014-06-30
2014-07-01 2014-09-30
2014-09-01 2014-09-30
2014-10-01 2014-10-15
2014-10-16 2014-10-31
2014-10-16 2014-11-30
2014-11-01 2014-11-30

(11 row(s) affected)

s          e
---------- ----------
2014-01-01 2014-03-15
2014-03-16 2014-04-30
2014-05-01 2014-05-15
2014-05-16 2014-06-30
2014-07-01 2014-08-31
2014-09-01 2014-09-30
2014-10-01 2014-10-15
2014-10-16 2014-10-31
2014-11-01 2014-11-30

(9 row(s) affected)

s          e          T1_Key      T2_Key      T1_Value    T2_Value
---------- ---------- ----------- ----------- ----------- -----------
2014-01-01 2014-03-15 1           1           0           0
2014-03-16 2014-04-30 1           2           0           1
2014-05-01 2014-05-15 2           2           1           1
2014-05-16 2014-06-30 2           3           1           0
2014-07-01 2014-08-31 2           4           1           1
2014-09-01 2014-09-30 3           4           0           1
2014-10-01 2014-10-15 4           5           1           0
2014-10-16 2014-10-31 5           6           0           1
2014-11-01 2014-11-30 6           6           1           1

(9 row(s) affected)
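
As mentioned above, extending this to a third source table only means another union in the unpivoted view and another left join (plus key/value columns) in the history view. A hedged sketch, assuming a hypothetical table t3 with the same k/v/s/e shape as t1 and t2:

ALTER VIEW [dbo].[Unpivoted] AS
SELECT s, e FROM T1 UNION SELECT s, e FROM T2 UNION SELECT s, e FROM T3
GO

ALTER VIEW [dbo].[History] AS
SELECT R.s, R.e,
       t1.k AS T1_Key, t2.k AS T2_Key, t3.k AS T3_Key,
       t1.v AS T1_Value, t2.v AS T2_Value, t3.v AS T3_Value
FROM Ranges R
LEFT JOIN T1 ON T1.s BETWEEN R.s AND R.e OR T1.e BETWEEN R.s AND R.e OR (T1.s < R.s AND T1.e > R.e)
LEFT JOIN T2 ON T2.s BETWEEN R.s AND R.e OR T2.e BETWEEN R.s AND R.e OR (T2.s < R.s AND T2.e > R.e)
LEFT JOIN T3 ON T3.s BETWEEN R.s AND R.e OR T3.e BETWEEN R.s AND R.e OR (T3.s < R.s AND T3.e > R.e)
GO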


Dealing with a corrupt transaction log in SQL Server

I just went through the experience of a corrupted transaction log for a large SQL Server database. It was actually not as bad as I thought it would be. I ended up restoring from a backup, then putting the corrupt database into emergency mode to generate scripts for the updated objects and to dump out the data added since the last backup. Fortunately, it wasn't too hard to figure that out.

As it turns out, I really don't think anything was lost, since the database had no live activity for several hours prior to the failure. Unless I'm misunderstanding the SQL checkpoint feature, a database that is at rest and has no activity has very little reliance on the transaction log in order to be current. Based on a comparison of the data sources used to load the database with what was in the tables, and inspection of the code changes made from scripts, there appears to be virtually no loss of data or metadata.

The most distressing part was that the transaction log was actually on a RAID-1 device. The RAID-1 was based on Windows software RAID because it used 2 Fusion-io (HP IO Accelerator version) drives. I had even coupled together drives from two different HP IO Accelerator Duos to limit the impact of a failure at the card level rather than at the module level. Only one of the drives failed; however, instead of simply going to a failed-redundancy state, both halves of the RAID device ended up corrupted. This is Windows Server 2008 R2. The problem happened while running fio-status -a with the HP IO Accelerator GUI open. There was an issue at one point where caution was recommended for running fio-status while a drive is being accessed, but I thought that was resolved in later versions of the driver and I have done it before without issues. The only explanation I can think of is that either there was a broader failure of the driver itself or there is a problem with Windows software RAID correctly supporting these drives.

In any case, the script below cleared up my database, but it took 17 hours to finish (the database is over 500GB with about 3 billion rows and uses page/row compression to keep it under 1 TB). By then, I had used the emergency mode route and had the database restored from a 3-day-old backup, up and running with current data and objects.

EXEC sp_resetstatus 'tp_v5';
ALTER DATABASE tp_v5 SET EMERGENCY
DBCC checkdb('tp_v5')
ALTER DATABASE tp_v5 SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DBCC CheckDB ('tp_v5', REPAIR_ALLOW_DATA_LOSS)
ALTER DATABASE tp_v5 SET MULTI_USER
ALTER DATABASE tp_v5 SET ONLINE

This gave me the below type of messages:

There are 3521253 rows in 19385 pages for object "Load.EquityHistory".
DBCC results for 'Web.WebDownload'.
There are 5856171 rows in 25344 pages for object "Web.WebDownload".
DBCC results for 'Load.EodError'.
There are 0 rows in 0 pages for object "Load.EodError".
DBCC results for 'Trader.TestInterval'.
There are 0 rows in 0 pages for object "Trader.TestInterval".
DBCC results for 'Simulator.IntervalSetup'.
There are 0 rows in 0 pages for object "Simulator.IntervalSetup".
DBCC results for 'Trader.Portfolio_WhatIf'.
There are 0 rows in 0 pages for object "Trader.Portfolio_WhatIf".
DBCC results for 'StrategyInterval'.
There are 56 rows in 1 pages for object "StrategyInterval".
DBCC results for 'Load.EtfList'.
There are 776 rows in 5 pages for object "Load.EtfList".
DBCC results for 'YahooMapping'.
There are 56 rows in 1 pages for object "YahooMapping".
DBCC results for 'EquityAdjust'.
There are 189116 rows in 2583 pages for object "EquityAdjust".
DBCC results for 'Trader.SimulationIntervalSetup'.
There are 0 rows in 0 pages for object "Trader.SimulationIntervalSetup".
DBCC results for 'Load.EquityIntraday'.
There are 1557783239 rows in 8006190 pages for object "Load.EquityIntraday".

CHECKDB found 0 allocation errors and 0 consistency errors in database 'tp_v5'.

Note that I had to actually run with REPAIR_ALLOW_DATA_LOSS to get back online, even though the initial CHECKDB without that option passed OK. When you have a critical error such as log corruption, emergency mode remains set after the first CHECKDB until you run one with the allow-data-loss option.

One option I tried without success was to detach the database after putting it into emergency mode and then attach it without specifying the log file, using the rebuild-log option on the attach. That is a fatal mistake, because SQL Server will not allow this for a database that was not healthy at the time of the detach. Fortunately, I had the wherewithal to shut down SQL Server and make a copy of all of the files before trying that experiment.

The first thing to do in any crisis like this is to make sure you have a copy of the database files – even the bad ones. If you do make the mistake of detaching an unhealthy database and you have a complete backup that includes all of the filegroups, there is the option of moving the files from the bad database to another location, restoring the backup, stopping SQL Server, replacing the restored files with the corrupt files, and starting SQL Server back up to get back to the baseline situation.

A better approach is to simply make sure your database is backed up as often as needed for your business requirements.
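
For completeness, a minimal sketch of what that looks like in T-SQL (the database name matches the example above; the paths are placeholders). With the database in full recovery, frequent log backups also limit how much a lost or corrupted log can cost:

-- Periodic full backup
BACKUP DATABASE tp_v5
    TO DISK = N'D:\Backups\tp_v5_full.bak'
    WITH COMPRESSION, CHECKSUM, INIT;

-- Frequent log backups limit data loss to the interval between them
BACKUP LOG tp_v5
    TO DISK = N'D:\Backups\tp_v5_log.trn'
    WITH COMPRESSION, CHECKSUM;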


Normalizing time-series data using a table function in SQL

Recently I had to normalize quarterly, monthly, and ad-hoc reported metrics into daily rows in order to do correlative analysis. Below is the function that I came up with. The idea is to generate values that are in line with the trend. It can be used with a UNION against a table that does not contain the gap data to display a smooth transition that moves the values between each actual increment. This is useful for trading strategies that integrate macro data with technical data. For example, I might want a strategy that factors in NYSE margin debt or market sentiment to determine that a hedge position might be useful, while also integrating technical indicators that suggest an appropriate entry or exit point. Most investment strategy generation programs utilize bar data or statistical files that must match up on a daily basis, so intermediate rows must be created with values that follow the trend to fill in the gaps.

create function [dbo].[tdf_GenerateDailyGapValues] (
    @StartDate date,
    @EndDate date,
    @StartValue money,
    @EndValue Money)
-- Generate intervening values for the missing days

returns @GapValues table
(
    GapDate date,
    GapValue money
)
as begin
    declare @DayCount money = datediff(DAY,@StartDate, @EndDate)
    declare @DayChange money = (@EndValue - @StartValue) / @DayCount
    declare @Counter int = 0
    declare @LastGapValue money = @StartValue
    while @Counter < @DayCount - 1
    begin
        set @LastGapValue = @LastGapValue + @DayChange
        set @Counter = @Counter + 1
        insert into @GapValues
        (GapDate, GapValue)
        values (dateadd(day, @Counter, @StartDate), @LastGapValue)
    end
    return
end

Below is sample output from using the function:

SELECT * FROM dbo.tdf_GenerateDailyGapValues
    ('2014-01-01','2014-01-15',100.00, 200.00)

GapDate    GapValue
2014-01-02    107.1428
2014-01-03    114.2856
2014-01-04    121.4284
2014-01-05    128.5712
2014-01-06    135.714
2014-01-07    142.8568
2014-01-08    149.9996
2014-01-09    157.1424
2014-01-10    164.2852
2014-01-11    171.428
2014-01-12    178.5708
2014-01-13    185.7136
2014-01-14    192.8564

Here is another example of using this.

One problem with this version of the function is that market data is only relevant on business days. Below is a more practical version that utilizes a calendar table to only output values for business days.

create function [dbo].[tdf_GenerateDailyGapValues] (
    @StartDate date,
    @EndDate date,
    @StartValue money,
    @EndValue Money)
-- Generate intervening values for missing days based on a calendar
returns @GapValues table
(
    GapDate date,
    GapValue money
)
as begin
    declare @DayCount money
    select @DayCount = COUNT(*) FROM Olap.TradingDayCalendar dc
        WHERE dc.TradingDate > @StartDate and dc.TradingDate < @EndDate
        AND dc.DayNumber = FLOOR(dc.DayNumber)
    IF @DayCount > 0
    begin
        declare @DayChange money = (@EndValue - @StartValue) / @DayCount
        declare @Counter int = 0
        declare @LastGapValue money = @StartValue
        while @Counter < @DayCount - 1
        begin
            set @Counter = @Counter + 1
            if exists(select 0 from Olap.TradingDayCalendar dc
                where dc.TradingDate = Dateadd(day, @Counter, @StartDate) AND DayNumber = FLOOR(DayNumber))
            begin
                set @LastGapValue = @LastGapValue + @DayChange
                insert into @GapValues (GapDate, GapValue)
                values (dateadd(day, @Counter, @StartDate), @LastGapValue)
            end
        end
    end
    return
end

select * from [dbo].[tdf_GenerateDailyGapValues] ('2014-01-01','2014-01-15',100.00,200.00)

GapDate    GapValue
2014-01-02    107.1428
2014-01-03    114.2856
2014-01-06    121.4284
2014-01-07    128.5712
2014-01-08    135.714
2014-01-09    142.8568
2014-01-10    149.9996
2014-01-13    157.1424
2014-01-14    164.2852

Here is an example application that prorates out P/E ratios for a table containing P/E ratio data that is only updated every few days:

CREATE VIEW Export.view_Fundamentals_WithGaps
AS
    SELECT TradingSymbol, MarketDate, PERatio, 0 AS GapRow FROM Load.Fundamentals f
    UNION ALL SELECT f.TradingSymbol, t.GapDate, t.GapVAlue, 1 as GapRow
    FROM load.Fundamentals f
    INNER JOIN Load.Fundamentals f2
        ON    f2.TradingSymbol = f.TradingSymbol
        AND f2.MarketDate = f.PriorMarketDate
    CROSS APPLY dbo.tdf_GenerateDailyGapValues(f.PriorMarketDate, f.MarketDate, f2.PERatio, f.PeRatio) t
GO

SELECT * FROM Export.view_Fundamentals_WithGaps WHERE TradingSymbol = 'FB' AND MarketDate > '2014-02-01' ORDER BY MarketDate

TradingSymbol    MarketDate    PERatio    GapRow
FB    2014-02-03    101.80    0
FB    2014-02-04    101.7825    1
FB    2014-02-05    101.765    1
FB    2014-02-06    101.7475    1
FB    2014-02-10    101.73    0

For the example to work, the prior date must be set for each of the rows in the table that contains the source values. A utility stored procedure can handle this as shown below:

CREATE  procedure [dbo].[util_setFundamental_PriorDate]
    @StartDate DATE = NULL,
    @EndDate DATE = NULL
AS BEGIN
-- Sets PriorMarketDate, fixing one day at a time
    IF @StartDate IS NULL
        SELECT @StartDate = MIN(MarketDate) FROM Load.Fundamentals
    IF @EndDate IS NULL
        SELECT @EndDAte = MAX(MarketDate) FROM Load.Fundamentals
    SET NOCOUNT ON
    DECLARE DayCursor CURSOR
        FOR SELECT TradingDate from olap.TradingDayCalendar   
            WHERE TradingDate > @StartDate
            ORDER BY TradingDate
    DECLARE @MarketDate DATE
    OPEN DayCursor   
    FETCH DayCursor into @MarketDate
    WHILE @@FETCH_STATUS = 0 AND @MarketDate < @EndDate
    BEGIN
        PRINT 'Processing Date ' + CONVERT(VARCHAR(11), @MarketDate)
        UPDATE f
        SET PriorMarketDate = (SELECT MAX(MarketDate) FROM Load.Fundamentals f2
                                WHERE    f2.TradingSymbol = f.TradingSymbol
                                    AND    f2.MarketDate < f.MarketDate)
        FROM Load.Fundamentals f
        WHERE f.MarketDate = @MarketDate
        FETCH DayCursor into @MarketDate
    END
    CLOSE DayCursor
    DEALLOCATE DayCursor
END

Although the cursor seems inefficient, it actually performs decently since there are not that many dates to traverse. This method also minimizes locking by keeping the number of rows updated in each statement manageable. I was able to populate the prior market dates using this procedure for 1.4 million rows spanning July 2009 through February 2014 in just under 4 minutes as a one-time fix. Since then, I've added it to the stored procedure that does the daily data load to run just for the date loaded, which takes less than a second each day.
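
As a side note, on SQL Server 2012 or later the same prior-date population could be done set-based with the LAG window function rather than the cursor. This is a hedged sketch against the Load.Fundamentals columns used above, not the approach from the original procedure:

-- Set each row's PriorMarketDate to the previous MarketDate for the same symbol
;WITH ordered AS (
    SELECT TradingSymbol, MarketDate, PriorMarketDate,
           LAG(MarketDate) OVER (PARTITION BY TradingSymbol ORDER BY MarketDate) AS PrevMarketDate
    FROM Load.Fundamentals
)
UPDATE ordered
SET PriorMarketDate = PrevMarketDate
WHERE PrevMarketDate IS NOT NULL
  AND (PriorMarketDate IS NULL OR PriorMarketDate <> PrevMarketDate);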


Converting Hyper-V machines from Windows 2008 R2 to Windows 2012 R2

I am posting this because it wasn't obvious just from searching the knowledge base. I was recently converting some VMs running under Windows Server 2008 R2 Hyper-V to Windows Server 2012 R2 and was dismayed to find that I could not import the VMs into 2012 R2 that I had exported from Windows 2008 R2. It turns out this is just a case of doing more than you need to do. All you need to do is shut down the 2008 R2 VMs, copy the files directly to the location you want on the Windows 2012 R2 server, and then import them.

That is right – you just import them, even without exporting them. If you export them, you can't import them. This is explained in more detail at http://blogs.technet.com/b/rmilne/archive/2013/10/22/windows-hyper-v-2012-amp-8-1-hyper-v-did-not-find-virtual-machine-to-import.aspx

There are a few caveats: you need to reconnect the network adapter, and you might get prompted for the exact location of the virtual hard drives if you've moved things around from the original setup. Also, be sure to manually edit the .vhd hard drive files to update the location of the parent of any differencing disks.

There is also the option to upgrade the drives to the new .VHDX format. That can be done without updating the virtual machine configuration to generation 2 (which requires Windows Server 2012, Windows 8, or later on the guest VM). There are significant scalability and performance advantages with the .VHDX format, especially for SSDs and newer drives that natively use 4K sectors. This is explained in http://blogs.technet.com/b/askpfeplat/archive/2013/09/09/why-you-want-to-be-using-vhdx-in-hyper-v-whenever-possible-and-why-it-s-important-to-know-your-baselines.aspx

I’ve started using the new .vhdx format for my legacy Windows 2008 R2 guest machines and they work well.


Moving Databases from One Instance to Another Painlessly (Attach Statement Generator)

If you've ever had a bunch of databases and needed to move them to another instance without manually doing the attach, these two scripts may be helpful. I was recently faced with this situation and have learned from experience that the time it takes to script something involving more than a few configuration actions is worth it, not just to ensure reliability for the task at hand, but because of the likelihood that I will need it again. The script is especially useful if you have lots of files. It just generates the SQL so you can edit it; for most scenarios you will probably want to edit it to specify the new file locations, but that is an editor task with global search and replace.

There are two scripts involved. One does the meat of the work and generates the concatenation of T-SQL file clauses, using a cursor to enumerate the filenames associated with the database. The other calls that scalar function and wraps it with the statements needed to actually create the database and specify the attach option.

Below are the two scripts. They work for my purposes, but I do not warrant them.

create function dbo.util_GenerateAttach (@dbid int)
    returns nvarchar(max)
as begin
    /*    Robert Leithiser, Authentic Intelligence 2013-11-23 - Generates the attach statement
        to move over a database. Run this as follows to generate attach statements for all databases
        on an instance:

        select 'create database ' + d.name, dbo.util_GenerateAttach(d.dbid) + ' for attach;'
        from sys.sysdatabases d where d.sid <> 01
    */
    declare @filestring nvarchar(max) = ''
    declare @filename nvarchar(max)
    declare x cursor for select '(filename = N''' + a.filename + ''')' as filename
    from sys.sysaltfiles a
    where a.dbid = @dbid
    open x
    fetch from x into @filestring
    while @@FETCH_STATUS = 0
    begin
        fetch from x into @filename
        if @@FETCH_STATUS = 0   -- only append when the fetch actually returned a row
            set @filestring = @filestring + ',' + @filename
    end
    close x
    deallocate x
    return @filestring
end
go
go

create function dbo.util_tdf_GenerateAllAttachStatements ()
    returns table
    /*    Robert Leithiser, Authentic Intelligence 2013-11-23 - Generates the attach statements
        to move over the databases. Run this as follows to generate attach statements for all databases
        on an instance:

        select * from dbo.util_tdf_GenerateAllAttachStatements()
    */
    as return select 'create database ' + d.name + ' on '
     + dbo.util_GenerateAttach(d.dbid) + ' for attach;' as AttachStatement
    from sys.sysdatabases d
    where sid <> 01
go
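
Hedged example of the round trip (the database name and file paths here are made up): running the table function returns one CREATE DATABASE ... FOR ATTACH statement per user database, which can then be edited for the new file locations and run on the target instance:

select * from dbo.util_tdf_GenerateAllAttachStatements()

-- A generated row might look like this:
-- create database SalesDW on (filename = N'E:\Data\SalesDW.mdf'),(filename = N'F:\Logs\SalesDW_log.ldf') for attach;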


Script to auto-add surrogate primary keys

Here's an example of how to use the INFORMATION_SCHEMA views to automatically add clustered surrogate primary keys.

select 'alter table ' + table_name + ' add ' + table_name + '_Id int identity;'
from INFORMATION_SCHEMA.tables t
where not exists (select 0
    from sys.objects o inner join sys.columns c on o.object_id = c.object_id
    where c.is_identity = 1 and o.object_id = object_id(t.TABLE_SCHEMA + '.' + t.TABLE_NAME))
and t.TABLE_TYPE = 'BASE TABLE'

union all

select 'alter table ' + table_name + ' add constraint ' + table_name + '_pk primary key clustered ( ' + table_name + '_id );'
from INFORMATION_SCHEMA.tables t
where not exists (select 0
    from sys.objects o inner join sys.columns c on o.object_id = c.object_id
    where c.is_identity = 1 and o.object_id = object_id(t.TABLE_SCHEMA + '.' + t.TABLE_NAME))
and t.TABLE_TYPE = 'BASE TABLE'

This will generate the alter commands and the primary keys shown below, assuming tables SalesOffice, Address, SalesRep, and CostCenter that did not already have identity columns.

The NOT EXISTS subquery checks whether the table already has an identity column, so the statements are only generated for tables that need them.

alter table SalesOffice add SalesOffice_Id int identity;

alter table Address add Address_Id int identity;

alter table SalesRep add SalesRep_Id int identity;

alter table CostCenter add CostCenter_Id int identity;

 

alter table Address add constraint Address_pk primary key clustered (  Address_id );

alter table SalesRep add constraint SalesRep_pk primary key clustered (  SalesRep_id );

alter table CostCenter add constraint CostCenter_pk primary key clustered (  CostCenter_id );

alter table SalesOffice add constraint SalesOffice_pk primary key clustered (  SalesOffice_id );
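
One caveat worth noting: the generated statements use the bare table name, so for tables outside the dbo schema the script could be adjusted to schema-qualify the names. A hedged variant of the first half of the query:

select 'alter table ' + quotename(t.TABLE_SCHEMA) + '.' + quotename(t.TABLE_NAME)
     + ' add ' + t.TABLE_NAME + '_Id int identity;'
from INFORMATION_SCHEMA.tables t
where not exists (select 0
    from sys.objects o inner join sys.columns c on o.object_id = c.object_id
    where c.is_identity = 1 and o.object_id = object_id(t.TABLE_SCHEMA + '.' + t.TABLE_NAME))
and t.TABLE_TYPE = 'BASE TABLE'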


Auto-discovery of data relationships minus referential integrity

My experience is that, unfortunately, far too many databases lack referential integrity in the schema. There are a few reasons for this, almost all of them bad. One of the worst is the "our application controls integrity" rationale, with a dialogue that typically goes something like:

Customer: “We control this in our application logic, so don’t need to do it in the database. “

My response: So if your application does it so well, why can’t you also enable it in the database?

Usual customer response: “Well, that would break our application because we temporarily allow inconsistencies for our application to work”

My response: Is your application building the data into the tables in a single transaction so that nothing is committed to the database until all the operations needed to ensure integrity are completed?

Usual customer response: “No, that would be too slow” or “No, that is too complicated” or “No, that would cause locking issues” (concurrency problems) “No, we need to be able to temporarily store the inconsistent data in the database”, etc.

My response: So that means if there is an unexpected error or a system problem in the application then you are left with data integrity problems.

Customer response: “Uh, well, um, yes, I guess so”

My purpose in this post, though, is not to delve into this design problem and why it is critical to enforce referential integrity in the database, but to discover the integrity that may already exist in the data. There is another motivation as well: there are often relationships in the data which are not strictly needed for integrity and which exist coincidentally or due to some unobvious correlation. Based on that, I think there is value in the tool even if you are working with a database that has strict referential integrity implemented.

Without further ado, below is the technique. It comes in two flavors: a single-column flavor, where we only look for potential foreign keys and primary keys based on a single column, and a multi-column flavor for composite foreign keys/primary keys. Both require dynamic SQL – the design is to generate the SQL and execute it, returning the results into a variable in the mainline code.

The single-column version is pretty easy to write the SQL for, and I plan to do this in the next couple of weeks. The multi-column version is trickier, especially to do in an efficient manner that only visits each combination once. The best implementation is probably a recursive CTE to build that list. I'm going to see if I can convince my adult son Blake, who is an excellent SQL developer and could write a book on recursive CTE queries, to take that part on. He has already developed several for various applications that will make your hair stand on end.

The approach below is based on using the standard INFORMATION_SCHEMA views in SQL Server to enumerate tables and columns (Information_Schema.Tables and Information_Schema.Columns). It could instead be driven by a table of suspected relationships to make it more efficient, and the process could write the discovered relationships back to use as a baseline for maintaining the relationship information.

One other interesting item to implement would be a "fudge factor" when checking the relationships. Since by definition this is a process for databases that don't have referential integrity, it is likely that few exact relationships will be found, precisely because no referential integrity exists and there is simply bad and inconsistent data. By allowing a small percentage of mismatches when comparing the distinct values of the candidate foreign key against the candidate primary key, we could identify potential data integrity problems where the number of matches is high enough to indicate that something probably should be a foreign key, with the mismatches actually being data integrity errors.

[Image: outline of the single-column and multi-column relationship discovery approach]

Note that there is a possibility of false positives, especially if the database is lightly populated – i.e. finding something that looks like a foreign key because it happens to contain only values from the candidate primary key, when in reality it is just a coincidence of the data.
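
Since the post stops short of the actual script, here is a hedged sketch of the single-column flavor described above. It pairs single-column primary keys with same-named columns in other tables, builds the dynamic SQL that counts unmatched values, and reports candidates with no misses; the fudge factor discussed above would simply relax that zero-miss test to a small percentage. The names and threshold are illustrative, not from the original post.

DECLARE @sql nvarchar(max), @pk nvarchar(520), @pkcol sysname, @fk nvarchar(520), @fkcol sysname, @misses bigint

DECLARE rel CURSOR LOCAL FAST_FORWARD FOR
    -- Candidate keys: single-column PRIMARY KEY constraints
    -- Candidate foreign keys: same-named columns in other tables
    SELECT QUOTENAME(kcu.TABLE_SCHEMA) + '.' + QUOTENAME(kcu.TABLE_NAME), kcu.COLUMN_NAME,
           QUOTENAME(c.TABLE_SCHEMA) + '.' + QUOTENAME(c.TABLE_NAME), c.COLUMN_NAME
    FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
    INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
        ON kcu.CONSTRAINT_NAME = tc.CONSTRAINT_NAME AND kcu.CONSTRAINT_SCHEMA = tc.CONSTRAINT_SCHEMA
    INNER JOIN INFORMATION_SCHEMA.COLUMNS c
        ON c.COLUMN_NAME = kcu.COLUMN_NAME
       AND NOT (c.TABLE_SCHEMA = kcu.TABLE_SCHEMA AND c.TABLE_NAME = kcu.TABLE_NAME)
    WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
      AND NOT EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE k2
                      WHERE k2.CONSTRAINT_NAME = tc.CONSTRAINT_NAME
                        AND k2.CONSTRAINT_SCHEMA = tc.CONSTRAINT_SCHEMA
                        AND k2.ORDINAL_POSITION > 1)   -- single-column keys only

OPEN rel
FETCH NEXT FROM rel INTO @pk, @pkcol, @fk, @fkcol
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Count values in the candidate foreign key column with no match in the candidate key
    SET @sql = N'SELECT @misses = COUNT(*) FROM ' + @fk + N' f WHERE f.' + QUOTENAME(@fkcol)
             + N' IS NOT NULL AND NOT EXISTS (SELECT 1 FROM ' + @pk + N' p WHERE p.' + QUOTENAME(@pkcol)
             + N' = f.' + QUOTENAME(@fkcol) + N')'
    EXEC sp_executesql @sql, N'@misses bigint OUTPUT', @misses = @misses OUTPUT
    IF @misses = 0   -- a fudge factor would relax this to allow a small percentage of misses
        PRINT @fk + '.' + @fkcol + ' looks like a foreign key referencing ' + @pk + '.' + @pkcol
    FETCH NEXT FROM rel INTO @pk, @pkcol, @fk, @fkcol
END
CLOSE rel
DEALLOCATE rel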
