High-performance Statistical Queries using Self-joins

In my pursuit of understanding asset markets, I’ve maintained a SQL Server database with a lot of information about the stock market and other indexes. Using some data services along with SSIS, this database has been kept current to the point that it now has over 3 billion records in total including 2.3 billion records in an intraday table. Most of my queries and interest concerns cumulative equity and index end-of-day history which is only 75 million row and the options data since 2003 which is now up to 175 million rows.

To be able to query this level of data, I utilize Fusion-io PCIE SSD storage for the SQL Server database. Using self-joins can produce some very interesting analysis. For example, the below query outlines the performance of a few global indexes where there have been large bounces close to market tops and the ensuing performance afterwards. This query complete in just a few seconds. There are couple of tricks that make this run faster – one is the storing of a relative day number to avoid performance issues with working around weekends for querying prior dated history. The day number is sequential across holidays and weekends so a direct link can be done without a range test. The other trick is that the table is partitioned based on the date which allows a good deal of parallelism.

Here is the query

select h.TradingSymbol, h.MarketDate, h.ChgPct + hprev.ChgPct as TwoDayChg, hprev.PriceAtClose as Price,
(h20.PriceAtClose – h.PriceAtClose) / h.PriceAtClose as Chg1Month,
(h60.PriceAtClose – h.PriceAtClose) / h.PriceAtClose as Chg3Month,
(h180.PriceAtClose – h.PriceAtClose) / h.PriceAtClose as Chg9Month,
(h360.PriceAtClose – h.PriceAtClose) / h.PriceAtClose as Chg18Month,
(h540.PriceAtClose – h.PriceAtClose) / h.PriceAtClose as Chng27Month,

((select MAX(hpast.PriceAtClose) from dbo.EquityHistory hpast
where hpast.TradingSymbol = h.TradingSymbol
and hpast.MarketDate < hprev.MarketDate) – hprev.PriceAtClose) / hprev.PriceAtClose
as PctFromTop

from dbo.EquityHistory h
inner join dbo.EquityHistory hprev
    on hprev.DayNumber = h.DayNumber – 1
    and hprev.TradingSymbol = h.TradingSymbol
inner join dbo.EquityHistory h20
    on h20.TradingSymbol = hprev.TradingSymbol
    and h20.DayNumber = hprev.DayNumber + 20
inner join dbo.EquityHistory h60
    on h60.TradingSymbol = hprev.TradingSymbol
    and h60.DayNumber = hprev.DayNumber + 60
inner join dbo.EquityHistory h180
    on h180.TradingSymbol = hprev.TradingSymbol
    and h180.DayNumber = hprev.DayNumber + 180
inner join dbo.EquityHistory h360
    on h360.TradingSymbol = hprev.TradingSymbol
    and h360.DayNumber = hprev.DayNumber + 360
inner join dbo.EquityHistory h540
    on h540.TradingSymbol = hprev.TradingSymbol
    and h540.DayNumber = hprev.DayNumber + 540
where h.TradingSymbol in (‘^dji’,’^dax’,’^ixic’,’^n225′,’^ftse’,’^djt’,’^gspc’,’^rut’,’^ssec’)
and h.ChgPct + hprev.ChgPct > 6.0
and  (select MAX(hpast.PriceAtClose) from dbo.EquityHistory hpast
where hpast.TradingSymbol = h.TradingSymbol
and hpast.MarketDate < hprev.MarketDate) between 1.0 * h.PriceAtClose and 1.15 * h.PriceAtClose
order by h.MarketDate

And here are the results formatted in Excel:

This is not a good omen for the stock market for the next couple of years based on history.

There is probably not enough data here to draw conclusions, but since the query is against indexes rather than individual stocks, it does seem pretty convincing. It is pretty certain that a query against the entire equity history for this would yield similar bottom line averages, but that would take several minutes to complete.

Here is the query again as it shows with the SQL markup:

query-picture

This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s