Relational State Sequences and SQL Server CLR Aggregate functions

My research on continuous improvement problem solving frameworks has convinced me that a problem system must possess complete data about the processes it is trying to learn from including how all the items within a solution attempt have changed. I call this process “relational state tracking”. Note, that tracking all of the information does not mean the system will have to use all of this in order to gain self-improvement and optimization. In fact, the goal of doing complete information tracking isn’t so much to analyze all of the data, but to analyze in multiple small test cases what data is needed for larger cases applying the full set of data for smaller test cases.  The idea is that in order to achieve noise reduction needed for rapid prediction, one must have lots of noise to learn from.

For example, in the Tower of Hanoi puzzle, tracking the state changes between discs and pegs as they relate to having certain values in a certain sequence provides patterns that can be utilized as the number of discs grow.  My doctoral project includes a scenario whereby the goal of a general purpose solver is to find solution for puzzles without any prior knowledge of the puzzle except that required to perform a simulation to search out all possible branches to meet a goal. The motivation for this is to determine feasibility for software to discover solutions to problems where the original solution is not known ahead of time purely by pattern recognition and deduction based on the simulations. The purpose of this blog entry is not to get into my research concerning my doctoral scenario but just to share how this sort of state tracking can be implemented via .NET CLR in SQL Server.

In my case, I want to be able to pull together multiple sequences that represent the truth-falsity of if a particular disc’s peg property (what peg was the disc was on at a certain move) changed to a particular value. The goal is to roll-up all the places where the disc’s peg changed to provide a view of which discs moved on each move regardless of peg. This can help predict what disc to move for larger instances even if we are not certain of what the correct peg should be.

In order to do this, I created 3 aggregation functions using the SQL 2012 Data Tools CLR project. The 3 functions are BinaryStringOr, BinaryStringAnd, BinaryStringXor.  I actually only need the BinaryStringOr for my scenario, but went ahead and created the other ones.  I initially tried to leverage .NET bit strings and use math functions with SQL but the complexity of this became too time-consuming given the relatively small amount of data. So, even though the performance is much worse, I found it easier to just utilize strings containing 1 and 0. This could be done more efficiently using bit strings, however the concept is similar since even bit strings will grow too large to be treated mathematically and will need to be treated bit-by-bit. Even simply 64 bits in a row is too long for a SQL bigint.

The code is shown below for the BinaryStringOr – the code simply checks if the current aggregated string as each new string is received contains a 1 or if the new string contains a 1. In either case, then the result is a 1 to the output string. The practical application of this is to determine if a disc was moved to any peg on any move and set a bit to indicate a state change of the disc. This can also be done for the peg as it correlates to any disc moving onto it. For this code to work, I had to implement the IBinarySeialize rather than the default Format.Native since we are aggregating into a string rather a numeric type.

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

[Serializable]
//[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.Native)]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.UserDefined, MaxByteSize = 8000)]
public struct BinaryStringOr : IBinarySerialize
{
    private SqlString outString;
    private int outLen;

    public void Init()
    {
        outString = “”;
        outLen = 0;
    }

    public void Read(System.IO.BinaryReader r)
    {
        outString = r.ReadString();
    }

    public void Write(System.IO.BinaryWriter w)
    {

        w.Write(outString.Value);
    }

    public void Accumulate(SqlString Value)
    {
        if (outString == “” & !Value.IsNull)
        {
            outString = Value.Value;
            outLen = Value.Value.Length;
        }
        SqlString newOutString = outString;

        if (!Value.IsNull)
        {
            newOutString = “”;
            for (int i = 0; i < Value.Value.Length; i++)
            {
                if (outString.Value.Substring(i, 1) == “1”
                                || Value.Value.Substring(i, 1) == “1”)
                {
                    newOutString = newOutString + “1”;
                }
                else
                {
                    newOutString = newOutString + “0”;
                }

            }
        }
        outString = newOutString;
    }

    public void Merge(BinaryStringOr Group)
    {
        outString = Group.outString;
    }

    public SqlString Terminate()
    {
        return outString;
    }

}

For example, given the following sequences for the discs on different pegs from a game with simply 3 discs, what would the output state sequence be for all state changes made by the disc? This sequence shows the intersection of the disc and a specific peg as a sequence. This is illustrated below:

Disc    Peg    Sequence Map

Disc    Peg    MoveStates
1    1    0000100
1    2    0010000
1    3    1000001
2    1    0000000
2    2    0100000
2    3    0000010
3    1    0000000
3    2    0000000
3    3    0001000

 

For the above, we need to find the sequence for the discs across all pegs. Using the aggregate function from above, we get the following changes for each disc over the sequence of 7 moves required for the optimal solution:

1    1010101
2    0100010
3    0001000

We can see the practicality of this for getting closer to predicting the solution for larger number of discs as the pattern is simply expanded for 4 discs over 15 moves for the optimal solution.

1    101010101010101
2    010001000100010
3    000100000001000
4    000000010000000

The use of the aggregate function is shown below:

SELECT
        InstanceVariableNumber as Disc,
        dbo.BinaryStringOr(MoveStates)
       
    FROM [Games].[InstanceVariableValueStates]
    GROUP BY
    InstanceVariableNumber
    where Games.InstanceVariableCount = 3

A fairly simple machine learning algorithm should be able to spot that that the pattern for the smallest disc is to change state every other move and the next large disc every fourth move, the third every 8th interval and so on.

When the peg states are also examined for patterns for disc counts from 3 to 6, only a couple of the patterns need to be identified and the rest can be deduced due to the standard rules of play effectively generating a rule that only allows 1 choice so that the sequence can be generated instantly. From there, all that is needed is a reversal process to generate the query criteria that reflects the relational sequences.

Peg    Sequence
1    0000100
2    0110000
3    1001011
1    000010000110000
2    100101100000100
3    011000011001011
1    0000100001100000100101100000100
2    0110000110010110000010000110000
3    1001011000001001011000011001011
1    000010000110000010010110000010000110000110010110000010000110000
2    100101100000100101100001100101100000100001100000100101100000100
3    011000011001011000001000011000011001011000001001011000011001011

This entry was posted in Artificial Intelligence, Machine learning, Problem Solving and tagged . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s