SQL Data Warehousing Tips 1 – Building a Period table

Recently, I started a personal project for analyzing stock market trends and calculating the optimal trading strategies.  I’ve learned a lot of new stuff about LINQ and SQL 2008 from this.  Based on that I’m opening a new series on data warehouse tips. 

In my project, one of the things I quickly realized after calculating that I would need to generate about 1 billion rows in order to have sufficient detail for analysis of 5,000 publicly traded stocks with 1 year of history was that I needed to consolidate the data.  It was obvious I needed a way to summarize data into periods and use those as dimensions for analyzing the performance, rather than calculating all of this from the detailed data. 

What would be nice, would be to have periods that could overlap and align.  For example, I want to be able to go back and look at last month, last quarter, last 6 months, and last year.  However each one of these requires a different set of transactions because even though the closing date is the same, the opening date is different.  The variation in open date affects the price at which a stock would be purchased at (or short-sold at), thus leading to a different set of transactions based on an entry/exit strategy.  In order to focus on the technical aspect of this, I won’t get into all of the application design considerations.  Suffice to say, that discrete, yet aligned periods were needed.

So I immediately wrote a stored procedure (final version below) to try to insert my periods and decided to leverage the merge capability.

Below is the table definition of the Period Table

CREATE TABLE [dbo].[Period](
    [PeriodId] [smallint] NOT NULL,
    [StartDate] [date] NOT NULL,
    [EndDate] [date] NOT NULL,
  
[MonthsInPeriod]  AS (datediff(month,[StartDate],dateadd(day,(1),[EndDate]))),
CONSTRAINT [PK_Period] PRIMARY KEY CLUSTERED
(
    [PeriodId] ASC
)
) ON [PRIMARY]

Note, that we can calculate the number of months in the period (assumes all periods are in 1 month increments) easily using a computed field with datediff

And the Merging (upsert) stored procedure

CREATE PROCEDURE [dbo].[UpsertMonthlyPeriod]
    @PeriodDate date,
    @Months int
AS BEGIN
    DECLARE @EndDate date
    SET @EndDate = DATEADD(DD,-1,DATEADD(MM,@Months, @PeriodDate))
    MERGE INTO Period AS Target
    USING (VALUES (@PeriodDate, @EndDate)) AS Source (NewStartDate, NewEndDate)
    ON Target.StartDate = Source.NewStartDate
    AND Target.EndDate =Source.NewEndDate
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (StartDate, EndDate)
        VALUES (NewStartDate, NewEndDate);
END

This is very simple example of the new Merge functionality.  The USING portion identifies the values to use for mapping the source data to the target, which in this case was simply start-date and end-date (remember I am basically allowing overlapping periods for analysis purposes).

The below stored procedure then actually generates the periods by calling the Upsert stored proc:

CREATE PROCEDURE [dbo].[SetupPeriods]
    @StartDate date,
    @Periods int,
    @CreateCummulative bit
AS BEGIN
    DECLARE @x int = 0
    DECLARE @PeriodDate date
    SET @PeriodDate = @StartDate

    WHILE @x < @Periods
    BEGIN
    — Create Monthly periods
        DECLARE @PeriodMonths int
        EXEC dbo.UpsertMonthlyPeriod @PeriodDate = @PeriodDate, @Months=1
    — Create Quarerly periods
        IF @x % 3 = 0 OR @CreateCummulative =1
        BEGIN
            EXEC dbo.UpsertMonthlyPeriod @PeriodDate = @PeriodDate, @Months=3
        END
    — Create Bi-Annual periods
        IF @x % 6 = 0 OR @CreateCummulative =1
        BEGIN
            EXEC dbo.UpsertMonthlyPeriod @PeriodDate = @PeriodDate, @Months=6
        END
    — Create Annual periods
        IF @x % 12 = 0 OR @CreateCummulative =1
        BEGIN
            EXEC dbo.UpsertMonthlyPeriod @PeriodDate = @PeriodDate, @Months=12
        END
        SET @x = @x+1
        SET @PeriodDate = DATEADD(MM,@x,@StartDate)
    END
END

The end result is the data shown below:

PeriodId    StartDate    EndDate    MonthsInPeriod
8041    2008-04-01    2008-04-30    1
8043    2008-04-01    2008-06-30    3
8046    2008-04-01    2008-09-30    6
8040    2008-04-01    2009-03-31    12
8051    2008-05-01    2008-05-31    1
8053    2008-05-01    2008-07-31    3
8056    2008-05-01    2008-10-31    6
8050    2008-05-01    2009-04-30    12
8061    2008-06-01    2008-06-30    1
8063    2008-06-01    2008-08-31    3
8066    2008-06-01    2008-11-30    6
8071    2008-07-01    2008-07-31    1
8073    2008-07-01    2008-09-30    3
8076    2008-07-01    2008-12-31    6
8081    2008-08-01    2008-08-31    1
8083    2008-08-01    2008-10-31    3
8086    2008-08-01    2009-01-31    6
8091    2008-09-01    2008-09-30    1
8093    2008-09-01    2008-11-30    3
8096    2008-09-01    2009-02-28    6
8101    2008-10-01    2008-10-31    1
8103    2008-10-01    2008-12-31    3
8106    2008-10-01    2009-03-31    6
8111    2008-11-01    2008-11-30    1
8113    2008-11-01    2009-01-31    3
8116    2008-11-01    2009-04-30    6
8121    2008-12-01    2008-12-31    1
8123    2008-12-01    2009-02-28    3
9011    2009-01-01    2009-01-31    1
9013    2009-01-01    2009-03-31    3
9021    2009-02-01    2009-02-28    1
9023    2009-02-01    2009-04-30    3
9031    2009-03-01    2009-03-31    1
9041    2009-04-01    2009-04-30    1

10011 2010-01-01…

..

But, wait!  Where did the PeriodId come from?  It’s not an identity field and the numbers appear to follow a pattern.

Can you guess the pattern?

Yes, that’s right – last 2 digits of year + 2 digit month + length of period (except that 12 month period is considered “0”).  This was so that it could still fit in smallint field.  I’ve got millions of rows linking back to the period and want to keep the index small, so didn’t want the extra digit.

The trick for this was an insert/update trigger shown below:

CREATE TRIGGER [dbo].[Period_IU_Trig]
   ON  [dbo].[Period]
   AFTER INSERT, UPDATE
AS
BEGIN
— Set Period Id to be meaningful (YY + MM + Period Length)

    SET NOCOUNT ON;
    UPDATE Period
    SET PeriodId = CAST(CAST(DATEPART(YY,INSERTED.StartDate) – 2000 AS VARCHAR(2)) +
       CAST(RIGHT(100 + (DATEPART(MM,INSERTED.StartDate)),2) AS VARCHAR(2)) +
       CAST((CASE WHEN INSERTED.MonthsInPeriod = 12 THEN 0 ELSE INSERTED.MonthsInPeriod END)AS CHAR(1)) AS SMALLINT)
    FROM INSERTED
    WHERE INSERTED.PeriodId = Period.PeriodId

END

 

Note, I normally don’t advocate having “meaningful” surrogate keys (an oxymoron), but in this case, its very handy in testing to quickly identify the period without going back to the source table.

To present this to the users, I provide a view that formats the dates to show the date range in descending order of the period along with the length in months.  For this scenario, this is easier to work with than a calendar control and works great for a drop down as you can see below

 

image

The stored procedure for formatting this is shown below:

CREATE procedure [Reports].[SelectCurrentPeriod]
AS
SELECT     TOP (100) PERCENT PeriodId,
  CAST(StartDate AS VARCHAR(10)) + ‘ To ‘
  + CAST(EndDate AS VARCHAR(10))
  + ‘ (‘
  + CAST(MonthsInPeriod AS VARCHAR(2))
  + ‘ Months)’ as PeriodDesc
FROM         dbo.Period
WHERE     (EndDate <
                          (SELECT     MAX(LoadDate) AS Expr1
                            FROM          dbo.LoadHistory))

 

There you have it, a period table and some supporting procedures.  For our next tip, we’ll look at how the period table actually comes into play when linking up with the data.

This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s