
Thread: When objects are inserted into a table, when are they serialized?

  1. #1
    Junior Member
    Join Date
    Nov 2014
    Posts
    21

    Default When objects are inserted into a table, when are they serialized?

    I'm opening the following table:
    var table = engine.OpenXTable<Key, byte[]>("table");

    where Key is a class with a few fields (not important for the question) and the byte array contains the data I want to store.

    Later, the code inserts a byte array into the table:
    table[key] = b;

    And still later, it commits the insertions:
    engine.Commit();

    When is the inserted object (the byte array) serialized? Does that occur at the time of the insertion operation or at the commit?

    I ask because my code inserts a long stream of byte arrays and it would be preferable to reuse them. (They're buffers used when reading data from a socket or file.)
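    To make the pattern concrete, here is roughly what my insert loop looks like (ReceiveInto and MakeKey are placeholders for my app's code; only OpenXTable, the table indexer and Commit are STSdb calls):

    // Sketch of the insert loop. 'buffer' is the receive buffer I would like to reuse.
    var table = engine.OpenXTable<Key, byte[]>("table");
    var buffer = new byte[65536];                  // reused for every socket/file read

    while (running)                                // 'running' is my app's shutdown flag
    {
        int length = ReceiveInto(buffer);          // fills the reused buffer (my app's code)
        Key key = MakeKey(buffer, length);         // derives the key from the packet (my app's code)

        // The question: can I hand 'buffer' to the table and overwrite it on the next
        // iteration, or does STSdb need it untouched until some later point (Commit)?
        table[key] = buffer;
    }
    engine.Commit();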

  2. #2

    Default

    Greetings.

    The database keeps records in an internal cache, and when the cache is full, it begins serialization. You have no control over when this happens - it may occur after inserting N records, or after inserting just one.

    You must not modify an existing record after it is inserted, because you don't know whether the record is still in memory or has already been serialized. If you modify it, you may provoke unexpected behaviour from the database.

    After Commit() completes, it is guaranteed that all data is stored.

    You can see this thread for more clarification on Commit: Recommend Commit Time.
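    If you need to reuse a receive buffer, a minimal sketch (assuming byte[] records and a reused buffer, as in your example) is to give the table its own copy and reuse only the original:

    // Copy the incoming data into a fresh array; the inserted record is never touched again.
    var record = new byte[length];
    Buffer.BlockCopy(buffer, 0, record, 0, length);
    table[key] = record;        // do not modify 'record' after this point

    // 'buffer' can now be reused for the next read.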

  3. #3
    Junior Member
    Join Date
    Nov 2014
    Posts
    21

    Default

    [Added a bit later: I found the 'High Memory Usage During Insert and Lookup' post in the advanced topics and am looking at whether part of my problem is due to my not reducing the cache size... Still later: yes, it was. Changing the cache settings helped a lot, and STSdb is usable in my application. However, I'd still like to ask the question below, which is: would it be reasonably straightforward to add a callback for objects that have been committed and released from the cache, or do user objects get released in so many different places in the code that a callback would be impractical to add? I've left the question below unchanged.]

    Thank you for your clear reply. Your approach of holding onto user objects in the cache and serializing later makes complete sense for STSdb. I was being dumb. I had just come from benchmarking LevelDB for my application and was too focused on byte arrays for everything. Doing it that way would force values read from the cache to be deserialized.

    STSdb's strengths do fit my application well in many areas, but I think this issue means I'm unable to use it out of the box. I'm describing why I think this in case there are other users that have a similar usage pattern, and because I'm hoping to get your reaction to my option of modifying a copy of the STSdb source (or you'll tell me I'm confused and have a better suggestion).

    More detail: My application is telemetry storage and playback. Keys are small (~16 bytes) and values are limited to byte arrays that vary in size (12-65536 bytes). The application receives a very fast stream of byte arrays. A key is generated from each and then each byte array is inserted into the database. The stream of keys isn't random, but it also isn't already sorted. Concurrently with these writes, sequences of key/value pairs are read, hopefully from the cache. STSdb handles all of this very nicely.

    My problem is the performance of the .NET allocator and garbage collector when using STSdb in this way, not STSdb itself. Because STSdb holds onto the values my app is storing and doesn't notify my app when it releases them, my app's code has lost control of their lifespans.

    In all other parts of my code that receive this fast stream of byte arrays, the arrays are tracked and, when they're free, they're reused. This takes a lot of pressure off of the allocator and GC and helps keep the working set smaller. My current STSdb test doesn't do this and makes my test machine unusable by other processes. (I haven't explored methods of limiting the amount of memory available to the CLR, from outside, to try to force it to play nicely with everything else. I don't know whether that would increase GC overhead or exacerbate any problems it already has with heap fragmentation.)

    An option is to have a callback that notifies the app when STSdb has released an object, either because it has been serialized and evicted from the cache, because it has been deleted from the cache, or because a subsequent write has overwritten it. Is this a plausible change for a programmer outside your project to make? Is there a better way to approach this problem?
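    To illustrate, here is a rough sketch of the pooling pattern my app uses elsewhere, plus the kind of notification I am asking about. The RecordReleased hook at the bottom is hypothetical; it does not exist in STSdb today:

    using System.Collections.Concurrent;

    // Fixed-size (64 KB) buffer pool; values in my app never exceed 65536 bytes.
    class BufferPool
    {
        private readonly ConcurrentBag<byte[]> free = new ConcurrentBag<byte[]>();

        public byte[] Rent()
        {
            byte[] buffer;
            return free.TryTake(out buffer) ? buffer : new byte[65536];
        }

        public void Return(byte[] buffer)
        {
            free.Add(buffer);
        }
    }

    // Hypothetical usage, if STSdb ever raised an event when it is done with a record:
    // engine.RecordReleased += record => pool.Return((byte[])record);
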
    Last edited by mhshirley; 17.11.2014 at 02:19.

  4. #4

    Default

    Thank you for the post. We will review your case.

  5. #5
    Junior Member
    Join Date
    Nov 2014
    Posts
    21

    Default

    Quote Originally Posted by k.dimitrov
    Thank you for the post. We will review your case.
    Could I solve my problem with a custom persist implementation? Is the persist code guaranteed to be called exactly once, after which STSdb releases the object for GC? If so, I could return the object to its pool at that time. Deserialization is the easier case.

    If so, I haven't found where writing a custom persist implementation is documented. Could you please point me toward that?
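    For reference, I'm assuming a custom persist is shaped roughly like the sketch below - a Write/Read pair over BinaryWriter/BinaryReader. The interface name and how it gets registered with a table are my guesses, so please correct me if the actual contract is different:

    using System.IO;

    // My guess at the shape of a custom persist for byte[] records.
    class ByteArrayPersist /* : IPersist<byte[]>, or whatever the actual interface is */
    {
        public void Write(BinaryWriter writer, byte[] item)
        {
            writer.Write(item.Length);
            writer.Write(item);
        }

        public byte[] Read(BinaryReader reader)
        {
            int length = reader.ReadInt32();
            return reader.ReadBytes(length);
        }
    }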

  6. #6

    Default

    Greetings.

    Providing a custom persist won't solve the problem. The persist logic is used only to specify how an object of a given type is serialized/deserialized, but not when. Due to the working principles of STSdb 4.0, it is impossible to know when an object is actually serialized and released from the database cache.

    If you are searching for a low memory footprint, I suggest you take a look at the Memory Usage section of the documentation. You can download it from here: Downloads, or you can use it online directly: Online Docs.

  7. #7
    Junior Member
    Join Date
    Nov 2014
    Posts
    21

    Default

    Hi,
    Thanks for the response. I'm trying to reduce pressure on the garbage collector, not the memory footprint itself. First, I'll try to echo back what I think you've said, then I'll describe the application a bit more. Here's my understanding of the lifecycle of an object that an application creates and stores in STSdb:


    1. App creates an object
    2. App stores it in an STSdb table and commits
    3. At commit time, an insert operation is entered into the waterfall tree and the object is entered into the cache
    4. Some time after #3, it's serialized to the persistent store
    5. Some time after #3, it's released from the cache
    6. The object can be reclaimed by the garbage collector


    My app is in control of #1 and #2. I think the two actions in step #3 occur at commit time. I think that #4 and #5 both occur some indeterminate time after #3 but are unordered with respect to each other. Finally, I think that #6 is true after both #4 and #5 have completed. Do I have this right?


    I was hoping that #4 and #5 occurred together and atomically, but I've understood you to say that they don't. So, I now understand why a custom serializer won't solve the problem.


    All of this aside, I admit that I'm not certain yet that there really is a problem with using STSdb for my application. I just think that there's likely to be, and I plan to do more benchmarking. Here's more info about the app.


    The app inserts ~1000 objects per second into the database, with sizes ranging from 100 to 65536 bytes. Currently, the average object size is ~1000 bytes.


    The objects are generally write-once, read-many, delete-never.


    The database's cache is very useful, and other things the application does with the objects result in database reads in a non-random but hard-to-predict way. I expect many objects to stick around in the database cache long enough to get past garbage collection generation 0.


    The app has no hard real-time requirements, but GC pauses over about 0.5 seconds would be unacceptable. Shorter pauses are OK. Pauses in the stream of insertions due to commits need to be similarly short.
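    For concreteness, batching commits every N inserts is one way I could keep each commit short - a minimal sketch, with the batch size as a placeholder and MakeKey plus the receive loop standing in for my app's code:

    // Sketch: commit every N inserts so no single Commit() has too much unflushed data.
    const int CommitBatchSize = 10000;
    int sinceCommit = 0;

    foreach (var packet in incomingPackets)        // my app's receive loop
    {
        table[MakeKey(packet)] = packet.Data;
        if (++sinceCommit >= CommitBatchSize)
        {
            engine.Commit();
            sinceCommit = 0;
        }
    }
    engine.Commit();                               // flush the tail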


    If there were a callback that informed the app when an object had been released, I'm pretty sure I'd use it. The app would allocate a pool of packets and reuse them. Have you all had any more thoughts about such a callback?
    Thanks,
    Mark Shirley

  8. #8

    Default

    Greetings.

    From your post it seems that you have not properly understood the lifecycle. The lifecycle of an object is as follows:


    1. Application creates objects.
    2. Application stores them in STSdb. These objects will probably be in the cache.
    3. At Commit(), all objects not yet stored to disk are first serialized in memory and then written to the persistent store. Keep in mind that some of the objects might still be in the cache, regardless of whether they have already been stored to disk.
    4. At some time after #3, the objects might be released from the cache - if no new objects are inserted into STSdb, the old ones will remain in the cache.
    5. If the objects are removed from the cache, they can be reclaimed by the garbage collector.


    STSdb 4.0.6 is now compiled under .NET Framework 4.5. Starting with version 4.5, there is a GCLatencyMode enumeration for adjusting how intrusive the garbage collector is in your application. Have you tried that?
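    For example, a minimal sketch (GCSettings and GCLatencyMode are in the System.Runtime namespace; SustainedLowLatency is available from .NET 4.5):

    using System.Runtime;

    // Ask the GC to avoid long blocking collections while the telemetry stream is ingested.
    GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;

    // ... run the insert/playback workload ...

    GCSettings.LatencyMode = GCLatencyMode.Interactive;    // restore the default afterwards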
