Thread: Non Unique Keys - lack of column indexes

  1. #1
    Junior Member
    Join Date
    Feb 2012
    Posts
    15

    Default Non Unique Keys - lack of column indexes

    Hi,
    I have a simple primitive key value table XTable<UniqueKey,NonUniqueValue>
    UniqueKey is ulong
    NonUniqueValue is string
    
    I need fast lookups by either the Key or the Value (not composite).

    According to the posts, the recommended solution is to create another table; however, how do I store the data for XTable<NonUniqueValue,UniqueKey>?


    table[NonUniqueValue] = UniqueKey
    
    The code above will not work because my strings are variable length and sometimes longer than 32 characters.

    Is there a way to make an index instead of trying to make a table with a string key?
    Last edited by omasri; 27.04.2012 at 13:41.

  2. #2

    Default

    Quote Originally Posted by omasri View Post
    Is there a way to make an index instead of trying to make a table with a string key?
    In theory,

    if you combine your string and the key from the first example table (the ulong) into one byte array, in that order, the combination will always be unique across the whole table.

    string value="test";
    byte[] finalByte = System.Text.Encoding.Ascii.GetBytes(value);

    ulong key = 1;
    finalByte = finalByte.Concat(key.ToByteArray());

    Then you can store the final byte array as the key in the second table; the value of the second table can be the key from the first table (the unique ulong, for example).

    But in practice, in STSdb R3.5 we are missing something like
    table.SelectStartsWith(System.Text.Encoding.ASCII.GetBytes(value));
    where we could supply the whole string we are searching for, or only the start of the string, and get the value as fast as a radix trie permits.
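    As a rough sketch of this composite-key variant (hedged: it assumes a byte[] key type is accepted by CreateOrOpenXTable just like the other key types, and it borrows the Transaction commit pattern from later in this thread; the table names are only examples):

        using (StorageEngine engine = StorageEngine.FromFile("test.stsdb"))
        {
            // primary: UniqueKey (ulong) -> NonUniqueValue (string)
            var primary = engine.Scheme.CreateOrOpenXTable<ulong, string>(new Locator("primary"));
            // secondary: string bytes + key bytes -> UniqueKey, so the composite key is always unique
            var secondary = engine.Scheme.CreateOrOpenXTable<byte[], ulong>(new Locator("secondary"));
            engine.Scheme.Commit();

            ulong key = 1;
            string value = "test";

            primary[key] = value;

            // Concat() needs System.Linq
            byte[] compositeKey = System.Text.Encoding.ASCII.GetBytes(value)
                .Concat(BitConverter.GetBytes(key))
                .ToArray();
            secondary[compositeKey] = key;

            Transaction transaction = new Transaction(engine, primary, secondary);
            transaction.Commit();
        }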
    Last edited by blaze; 27.04.2012 at 22:22.

  3. #3

    Default

    The answer would not be complete without the other possible variants (variant 1 being the combined byte-array key from the previous post):

    2.

    From the non-unique example value we create a table with a string or byte[] key, and store every non-unique value only once (a grouped variant). As the table value we store an ever-growing 8*N byte sequence of ulong keys which refer back to table 1 from the example. This is awkward and hard to maintain when removals are involved.
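    A minimal sketch of handling such an 8*N byte value with plain .NET (the AppendId and ReadIds helper names are mine, not part of the STSdb API; it needs System and System.Collections.Generic):

        // Append one more ulong document id to the existing 8*N byte sequence
        static byte[] AppendId(byte[] existing, ulong id)
        {
            int oldLength = existing == null ? 0 : existing.Length;
            byte[] result = new byte[oldLength + sizeof(ulong)];
            if (existing != null)
                Buffer.BlockCopy(existing, 0, result, 0, oldLength);
            Buffer.BlockCopy(BitConverter.GetBytes(id), 0, result, oldLength, sizeof(ulong));
            return result;
        }

        // Read the ids back from the stored sequence
        static IEnumerable<ulong> ReadIds(byte[] value)
        {
            for (int offset = 0; offset < value.Length; offset += sizeof(ulong))
                yield return BitConverter.ToUInt64(value, offset);
        }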

    3.
    NoSQL horizontal scaling.
    For every non-unique string you create a table with a specific prefix:

    table: "myspecialtable/Hello"
    Key: a ulong reference to table 1 (where the string can be found), always unique; Value: can be empty
    table: "myspecialtable/Vacancy"
    K:25489; V:null
    K:27489; V:null
    table: "myspecialtable/Are you there?"
    etc...


    When you need to find the document ids which hold a given string, first you open the table "myspecialtable/" + searchString; then, once the table is found, just iterate it to get all the document ids where the string resides.
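    A rough sketch of that lookup (CreateOrOpenXTable and the ranged Forward style are borrowed from the full example further down the thread; a Forward overload for plain ulong keys is my assumption, and "Vacancy" is only an example word):

        using (StorageEngine engine = StorageEngine.FromFile("test.stsdb"))
        {
            string searchString = "Vacancy";

            // each per-string table maps: document id (ulong) -> empty value
            var wordTable = engine.Scheme.CreateOrOpenXTable<ulong, string>(
                new Locator("myspecialtable/" + searchString));

            // enumerate the whole key range to collect every document id holding the word
            foreach (var row in wordTable.Forward(ulong.MinValue, ulong.MaxValue))
                Console.WriteLine(row);
        }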


    Or, even better, create a special table where you store all the strings split into words (no spaces), each unique word stored only once:
    Table: "UniqueStrings"

    K: "Hello" V: 1
    K: "Vacancy" V: 2
    K: "Are" V:3
    K: "You" V:4

    A special key where you keep the identity counter:
    K: "@ increment" V: (the maximum identity, here 4)

    and then you create as many tables as there are words:
    myspecialtable/1
    K:25489; V:null (the key is a reference to the document that holds this word)
    K:27489; V:null
    myspecialtable/2
    K:16; V:null
    K:17; V:null
    myspecialtable/3
    myspecialtable/4


    Having such an extra table gives you the ability to quickly find all documents containing a specified word.
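    A hedged sketch of maintaining this word index when a new word from a document is stored (it reuses CreateOrOpenXTable, the indexer, and the Transaction pattern from the full example below; it assumes the "@ increment" counter key already exists, and it skips the already-indexed-word case, which would need TryGet):

        using (StorageEngine engine = StorageEngine.FromFile("test.stsdb"))
        {
            var words = engine.Scheme.CreateOrOpenXTable<string, int>(new Locator("UniqueStrings"));
            engine.Scheme.Commit();

            ulong documentId = 25489;
            string word = "Hello";

            // assumed: the word is not indexed yet, so take the next identity from "@ increment"
            int nextId = words["@ increment"] + 1;   // the indexer throws if the counter key is missing
            words[word] = nextId;
            words["@ increment"] = nextId;

            // register the document under the word's table
            var wordTable = engine.Scheme.CreateOrOpenXTable<ulong, string>(
                new Locator("myspecialtable/" + nextId));
            engine.Scheme.Commit();
            wordTable[documentId] = null;

            Transaction transaction = new Transaction(engine, words, wordTable);
            transaction.Commit();
        }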



    When you remove a document (table 1 from the example) you don't even need to delete the other tables or records; they will continue to hold "dead links", which you will discover when you try to access the removed document, and you can react accordingly.


    Updating a document will cause regrouping of the affected tables and records.

    Harder for the programmer, faster for the software.
    Last edited by blaze; 27.04.2012 at 22:48.

  4. #4
    Junior Member
    Join Date
    Feb 2012
    Posts
    15

    Default

    Quote Originally Posted by blaze View Post
    When you need to find the document ids which hold a given string, first you open the table "myspecialtable/" + searchString; then, once the table is found, just iterate it to get all the document ids where the string resides.


    This is an interesting approach. I will try it.

  5. #5
    Junior Member
    Join Date
    Feb 2012
    Posts
    15

    Default

    Keeping with the key/value store approach, the ideal solution would be to store all the variable-length data in a separate file and have locator pointers back to it. If creating an XTable were not so slow, then I could create a db that just stores strings and use their ulong XTable offsets as their key values in the data. This is better than using a hash code because it is unique for all strings. It also means each string is only stored once.

    I tried to do this, but creating the XTable was orders of magnitude slower. Maybe you could develop a special type of file and table just for this purpose. People will not be inserting data into the table. All you need is the ability to store the locator and get the offset. You don't need any of the other attributes of the XTable.

    When I want to search for a string, I would simply need to do

    ulong key = storeWithVariableLengthData.OpenXTable(new Locator("my very long string is here")).Offset;

    We could even replace all " " with "\\" to make the locator search faster.

    The key to this solution is lightning-fast XTable creation. Maybe you could write a GetStoreString(locator) that returns a ulong offset instead of an XTable.
    Last edited by omasri; 28.04.2012 at 09:37.

  6. #6

    Default

    Thanks for the replies, blaze!

    Just want to add one more possible solution. Duplicate the data into two tables.
    XTable<uniqueKey, Record> primary;
    XTable<XKey<nonUniqueKey, uniqueKey>, Record> secondary;
    
    It sounds crude, but there are a lot of advantages.

    What can we sacrifice? Compact size with dramatically lower speed when searching by a secondary index, or data size duplication?

    Having lookup tables, secondary indexes, etc. leads to one of the most fundamental problems in databases: the seek penalty. Seeks are expensive, no matter whether HDD or SSD. Here even the various cache techniques are, in general, useless.

    STSdb by default works in compressed mode, which compacts the data ~4-5 times. So even if we sacrifice 2x or 3x, we are still better off than the raw size (for example, 1 GB of raw data compresses to roughly 200-250 MB, and storing it twice is still only about 400-500 MB).

    In some cases (of course, not always) it is a better solution to duplicate the data: once keyed by the first key and once by the second:
        using (StorageEngine engine = StorageEngine.FromFile("test.stsdb"))
        {
            //create tables
            var primary = engine.Scheme.CreateOrOpenXTable<int, Record>(new Locator("primary"));
            var secondary = engine.Scheme.CreateOrOpenXTable<XKey<string, int>, Record>(new Locator("secondary"));
            //set a new keymap to extend the max string subkey length
            secondary.KeyMap = new XKeyMap<string, int>(null, 64, null, sizeof(int)); 
            engine.Scheme.Commit();
    
            //insert into both tables
            int i = 0;
            foreach (var record in GetRecords())
            {
                var uniqueKey = i;
                var nonUniqueKey = record.Field2;
    
                primary[uniqueKey] = record;
    
                XKey<string, int> xkey = new XKey<string, int>(nonUniqueKey, uniqueKey);
                secondary[xkey] = record;
    
                i++;
            }
    
            //commit as one
            Transaction transaction = new Transaction(engine, primary, secondary);
            transaction.Commit();
    
            //search by some nonunique key (ranged search by the first subkey)
            int id = 5;
            string key = id.ToString();
    
            XKey<string, int> fromKey = new XKey<string, int>(key, Int32.MinValue);
            XKey<string, int> toKey = new XKey<string, int>(key, Int32.MaxValue);
    
            int foundRecords = 0;
            foreach (var row in secondary.Forward(fromKey, toKey))
            {
                Console.WriteLine(row);
                foundRecords++;
            }
            Console.WriteLine(foundRecords);
    
            //search by a unique key
            var rec = primary[7];
        }
    
    The GetRecords() generator:
        public IEnumerable<Record> GetRecords()
        {
            for (int i = 0; i < 100000; i++)
            {
                Record record = new Record();
    
                int k = i % 100;
                record.Field1 = k;
                record.Field2 = k.ToString();
    
                yield return record;
            }
        }
    
    The Record class:
        public class Record
        {
            public int Field1 { get; set; }
            public string Field2 { get; set; }
        }
    

  7. #7
    Junior Member
    Join Date
    Feb 2012
    Posts
    15

    Default

    search by a unique key
    var rec = primary[7];

    This will throw an exception if the record is not found. What is the optimal way to perform the lookup: TryGet or Forward()?

  8. #8

    Default

    TryGet is for a single search, Forward is for a ranged one.
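    For illustration, a minimal sketch of both lookups against the primary and secondary tables from the example above (the TryGet signature with an out parameter is my assumption; the ranged Forward call follows the earlier example):

        //single search by the unique key: no exception if the key is missing (assumed TryGet signature)
        Record single;
        if (primary.TryGet(7, out single))
            Console.WriteLine(single.Field2);

        //ranged search by the non-unique part of the composite key
        XKey<string, int> fromKey = new XKey<string, int>("5", Int32.MinValue);
        XKey<string, int> toKey = new XKey<string, int>("5", Int32.MaxValue);
        foreach (var row in secondary.Forward(fromKey, toKey))
            Console.WriteLine(row);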
