Results 1 to 5 of 5

Thread: Slow read performance and other evaluating questions

  1. #1
    Junior Member
    Join Date
    Dec 2014
    Posts
    1

    Default Slow read performance and other evaluating questions

    I am evaluating STSDB. Cool project, but a few issues/questions:

    1.) I did put 1 million simple test objects in a db. Write speed is fast with about 7 seconds but getting one record out of those by key is very slow with about 800 msec ! I tried both table.Find and table[key]. Even SQL is much faster in such scenario !

    2.) How do I handle search on multiple keys ? Looks like secondary keys are no supported ? Assuming I have a customer and I want to find one either by name OR maybe by balance, I only have one key. This is NOT a comopunded key scenario, I sometimes search for name sometimes for balance NOT both. How do I handle this ?

    3.) From what I see there is no (easy) way to handle schema changes (i.e. a type changes, has one more proprty etc) I think I can do it by custom providers, just wondering if there is a built in solution.

    4.) ACID is on the roadmap, when will this be available ? What do we really have today ? Docs mention that all is done in transactions, so what part of ACID is missing ?

    5.) Hot backup is VERY Importat. When is this feature coming ?

    Thanks

    Joe

  2. #2

    Default

    Greetings.

    1) If you have read how our indexing tehcnology WaterfallTree works, you'd know that it works with operations which are gradually poured down the tree until they reach the leaves and are applied. This is one of the reasons why STSdb 4.0 achieves such indexing speeds. But there is a downside to this - sometimes the database may be in a state where not all of the operations are applied and there are still ones in the internal nodes. This may lead to slow speed when trying to do queries with the database, but this is only temporary until all of the operations are applied.

    One workaround for this is the following solution:

    using (IStorageEngine engine = STSdb.FromFile("Data.stsdb4"))
    {
        var table = engine.OpenXTable<long, string>("table1");
    
        for (int i = 0; i < 10000; i++)
            table[i] = i.ToString();
    
        foreach (var item in table)
        {
        }
    
        engine.Commit();
    }
    
    Performing read will force the database to flush all operations into the leaf nodes and to apply them, and the Commit applies all the changes to the storage.

    2) STSdb 4.0 does not support secondary indexes. There are different ways to solve your task. One of them is to use an indexing table. For example:
    var table = engine.OpenXTable<string, int>("indexTable");
    
    Your key is one of the fields you want index to (string Name for example) , and your record will be the index.

    The second table will hold your index and your actual record. In that way you will query the first table, it will give you the index and then you will find your actual record in the second table.

    For example:
    var table2 = engine.OpenXTable<int, Record>("recordTable");
    
    3) STSdb 4.0 does not support scheme changes - you can change the types of the keys and/or records, but the new types must be compatible (to have the same definition) with the old ones. If the new types have a different definition, you have to transfer the data between the old and the new table by yourself.

    4) STSdb 4.0 does not support ACID transactions. The engine method Commit() stores all the changes in all of the tables. A good explanation about commit is given here: Recommend Commit Time.

    5) Hot backup may be supported in the next major release of STSdb, but we cannot guarantee this from now.

  3. #3
    Junior Member
    Join Date
    Nov 2014
    Posts
    21

    Default

    Here's a follow up question. My application has offline and real-time insertion modes.

    The offline mode benefits from STSdb's optimization of bulk inserts of nearly sorted data, and STSdb inserts faster than other databases I've looked at.

    In testing our real-time insertion mode, however, we've seen long read times, as have been discussed in this thread and others. That is problematic for us. My question is about whether there are other ways to avoid long read times.

    Our records vary from ~50 to 3500 bytes.

    We tried insertions of 100,000,000 records, varying the frequency of commits from every 1000 record insertions to every 1,000,000 insertions. For the offline insertion application, infrequent commits work well. For the real-time mode, we tried commits every 1000 records.

    99% of commits were between a few msec to 1 second. 1% were longer, with commit times up to 20 seconds.

    In our application, commit times under a few seconds are fine, although we are concerned about the infrequent, very long delays.

    Read times are more important, however. During this test, as records were inserted, we read sequences of records randomly. (The data is a set of separate record sequences that are added to in parallel into a single table. When reading, we picked one of these sequences randomly and read all of the records that had been inserted so far.) When reading, we kept track of the delay before the first record was returned and the delay before all records had been returned. Once the first record was available, the rest were invariably returned in a small number of msec. However, the delay before the first record had a distribution similar to commit times, with a few percent of the reads having a very long delay.

    In this response:

    http://stssoft.com/forum/threads/481-Slow-read-performance-and-other-evaluating-questions?highlight=commit

    you indicate that these delays are due to the read finding large numbers of operations still held in internal tree nodes, not the leaves, and that reading the entire database will flush all operations to leaf nodes, improving subsequent read times. I can understand doing that in a benchmark but not in most real-world applications. In our test case, the database grows to 24 Gb. Reading all records to flush operations to the leaves of the tree is impractical. We don't and can't know how to read fewer records to get the same flushing to occur.

    You say that's one workaround. Are there others?

    Is it possible to configure STSdb to incrementally push operations to leaf nodes rather than leave that for a read to trigger? Or is it possible to configure STSdb to flush operations to leaf nodes on inserts so that it's more optimized for mixed inserts and reads when that's appropriate and like a waterfall tree when bulk inserts are needed?

    What circumstances would cause the amount of work done by a commit to rise to 20 seconds?
    Thanks,
    Mark Shirley
    Last edited by mhshirley; 09.11.2015 at 10:47.

  4. #4
    Junior Member
    Join Date
    Nov 2014
    Posts
    21

    Default

    Here is a scatter plot with number of records inserted on the X axis and time in msec for the first result when reading sequences of records back. What I'm interested in is the high variance. I would be happy to trade lower insertion performance for lower variance on read times.

    In this case, the records are not being inserted in random order but are closer to that than to key order.

    In the source of 4.0.9, I see a Flush() operation on XTablePortable. Is that what I'm looking for? I'll try that.
    Attached Images Attached Images

  5. #5

    Default

    When we execute read operations in WaterfallTree the records gradually moves from the internal nodes to the leaves. All changes in the W-tree nodes are copy on write. So if we want to keep the current internal tree state while the tree is changes (and have faster reads) we have to execute commits from time to time while we read...

    We suggest you to try to execute commits every couple of thousand reads. This way the current state of the tree will be saved periodically and this will gradually reduce the access time during database usage.

    Another way to improve random access is to try to reduce the size of the W-tree leaf nodes. By default W-tree compresses the records in the leaves, so too big leaves slows down the random access (because of decompression). By default engine.LEAF_NODE_MAX_RECORDS = 64 * 1024, you can try to reduce it, for example to 32 * 1024 or another value.

    If random access speed is still not enough you can try to use your own custom persist to avoid default compression of the leaves. This will avoid the need of decompression, but will increase the overall IO. Here it all depends of the philosophy of data.

    Another way to improve random access is to sacrifice some of the insert speed. You can try to increase the branches of the W-tree. By default engine.INTERNAL_NODE_MAX_BRANCHES = 5. You can try increase it. This will slow down the insert speed but will reduce the height (depth) of the W-Tree. In general the W-Tree parameters are well calibrated for small and slightly varying in size records. So it is possible here to search for improvements.

    My general advice is to try to inject some commits during reads before try the other actions. In our tests this improves random access at least 2-3 times.
    Last edited by a.todorov; 04.12.2015 at 14:51.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
2002 - 2014 STS Soft SC. All Rights reserved.
STSdb, Waterfall Tree and WTree are registered trademarks of STS Soft SC.