I am trying to improve the speed of my application, which has to handle random access to blocks of image data. What I want to do is implement a read-ahead system, where I tell the SDK to read a bigger block than I actually need, so that the next time an access occurs there is a good chance I already have the block in memory. I plan on having this read occur in a separate thread so as not to block the main application with the long read.
My belief was that there would be an improvement in performance if I request a larger area for decoding from the SDK rather than a smaller area. I believe this because whatever overhead is involved in setting up the decompression for a set of pixels will be reduced if there are fewer read calls. Is this true?
In the user's guide it says this:
"Threadsafety
The SDK may be safely used in multithreaded applications. Locking of SDK objects is NOT provided, however; the application must guarantee each object is accessed serially within a particular thread context."
What does this mean exactly? I may want to allocate SDK objects (MrSIDReader, LTIScene, LTISceneBuffer) in one thread and use them in another thread. If I interpret the statement above loosely, I will just need serial access to each object, so that only one thread at a time calls a member function of the object or otherwise uses it. Is it safe to deallocate these objects from another thread?
If I follow these rules, can I create a MrSID reader in one thread and call its read() method in another thread?
I understand the disclaimer you state, but is it possible that certain objects, like LTIScene and LTISceneBuffer, are a bit more thread-safe than you let on? Would I be able to allow multiple threads read access to these objects? For instance, if I had one large-block LTISceneBuffer that was used for the MrSIDReader read() method in one thread, could another thread create a derived scene buffer from that object without worrying about locking? Imagine that a big-block read() was in progress: if I keep track of the strips that have already been read, could another thread copy from the already-read memory, either by deriving a new scene buffer from the big one or by indexing directly to the correct memory location, without worrying about locking?
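The strip-tracking idea above could be sketched as follows: the decoding thread publishes a watermark of completed strips through an atomic counter, and a second thread copies only rows below that watermark. This is purely illustrative; the `StripTracker` type, its layout, and the assumption that the buffer fills strictly strip by strip are all mine, not the SDK's.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Shared big-block buffer plus a watermark of completed strips.
// The decode thread bumps `stripsDone` after finishing each strip;
// another thread may safely copy rows strictly below the watermark.
struct StripTracker {
    StripTracker(int strips, int rowBytes)
        : rowBytes_(rowBytes), pixels(strips * rowBytes) {}

    // Called by the decoding thread as each strip finishes.
    void finishStrip(const std::vector<uint8_t>& row) {
        int s = stripsDone.load(std::memory_order_relaxed);
        std::copy(row.begin(), row.end(), pixels.begin() + s * rowBytes_);
        stripsDone.store(s + 1, std::memory_order_release); // publish
    }

    // Copies the strip only if it has already been decoded.
    bool tryCopyStrip(int s, std::vector<uint8_t>& out) const {
        if (s >= stripsDone.load(std::memory_order_acquire))
            return false;                                   // not decoded yet
        out.assign(pixels.begin() + s * rowBytes_,
                   pixels.begin() + (s + 1) * rowBytes_);
        return true;
    }

    int rowBytes_;
    std::vector<uint8_t> pixels;
    std::atomic<int> stripsDone{0};
};
```

The release/acquire pair ensures the reading thread never observes the counter advance before the pixel writes it covers.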
Multithreading in SDK 7.0
Hi Derrick,
I'll try to answer all your questions below, but first may I ask are you experiencing decode performance issues? Random-access, random-resolution decoding is one of MrSID's strong suits, so if it seems slower than it should be, perhaps there is another issue.
1. Will larger decodes result in better performance?
This will depend completely on your use pattern, but usually not. Unless you have high confidence that those pixels will a) get used and b) be up next, I think your strategy will hurt performance rather than help it. I would recommend some benchmarking to prove to yourself that it would be worthwhile.
With that said, yes it will take slightly longer to decode a given scene if it is broken up into two parts than it will take as one big decode.
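For the benchmarking recommended above, a minimal timing harness might look like this; `decodeScene` is a placeholder for whichever read() strategy is being measured, not an SDK call:

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs a decode strategy `iterations` times and returns the average
// wall-clock milliseconds per run.
double benchmarkDecode(const std::function<void()>& decodeScene, int iterations)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i)
        decodeScene();
    const auto elapsed = clock::now() - start;
    return std::chrono::duration<double, std::milli>(elapsed).count() / iterations;
}
```

Comparing `benchmarkDecode(oneBigRead, N)` against `benchmarkDecode(manySmallReads, N)` over a representative access pattern would settle whether the read-ahead strategy pays off.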
2. What is meant by the thread safety disclaimer in the SDK documentation?
Most of the SDK objects, and especially the MrSIDImageReader, are neither reentrant nor thread-safe. This doesn't mean you can't use them in a threaded environment, it just means that you need to mutex their method calls intelligently.
2a. Is it safe to deallocate on a different thread?
Probably safe, yes. Provided that you are using synchronization to protect it from simultaneous or subsequent use by other threads.
2b. Is it safe to create on one thread, call read() on another?
Yes, provided the calls are synchronized.
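One way to keep the calls synchronized is to funnel every use of the reader through a single mutex. The `Reader` struct below is a stand-in so the sketch is self-contained; the real object would be the SDK's MrSIDImageReader:

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Stand-in for an SDK reader; the real object would be a
// MrSIDImageReader with its read() method.
struct Reader {
    int read(int stripIndex) { return stripIndex * 2; } // fake decode
};

// Serializes all access to one Reader, so it can be created on one
// thread and used from others, one call at a time.
class GuardedReader {
public:
    explicit GuardedReader(std::unique_ptr<Reader> r) : reader_(std::move(r)) {}

    int read(int stripIndex) {
        std::lock_guard<std::mutex> lock(mutex_);   // one thread in at a time
        return reader_->read(stripIndex);
    }

private:
    std::mutex mutex_;
    std::unique_ptr<Reader> reader_;
};
```

Because the wrapper owns the reader, destroying it on whichever thread holds the last reference is also safe, matching the deallocation answer above.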
3. Is it safe to read from an LTISceneBuffer while another thread is using it for a read()?
Ignoring the challenge it would be to keep track of which strip the MrSIDImageReader is on, I will admit it probably would not crash your application to read from this memory area as it is being filled. The danger is more that you would read uninitialized data. Again, though, I would want to verify with some benchmarking that you are going to benefit from this strategy before taking it on.
As an alternative implementation, I might suggest writing a filter that would cache the pixels from the previous read() call and offer them up if the subsequent read() call intersects it. Not sure what your requirements are, though.
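The cache-the-previous-read idea might take roughly this shape: remember the rectangle and pixels of the last large decode, and answer any follow-up request that falls entirely inside it from memory. `Rect`, the single-band byte buffer, and the class name are illustrative stand-ins, not SDK types:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

struct Rect { int x, y, w, h; };

// True if `inner` lies entirely inside `outer`.
static bool contains(const Rect& outer, const Rect& inner) {
    return inner.x >= outer.x && inner.y >= outer.y &&
           inner.x + inner.w <= outer.x + outer.w &&
           inner.y + inner.h <= outer.y + outer.h;
}

// Caches the pixels of the last large decode; a follow-up request
// that falls inside the cached rectangle is answered without a new
// decode (single band, 1 byte/pixel, for brevity).
class ReadCache {
public:
    void store(const Rect& scene, std::vector<uint8_t> pixels) {
        cached_ = scene;
        pixels_ = std::move(pixels);
    }

    std::optional<std::vector<uint8_t>> lookup(const Rect& req) const {
        if (!contains(cached_, req))
            return std::nullopt;            // cache miss: caller must decode
        std::vector<uint8_t> out(req.w * req.h);
        for (int row = 0; row < req.h; ++row) {
            const uint8_t* src = pixels_.data()
                + (req.y - cached_.y + row) * cached_.w
                + (req.x - cached_.x);
            std::copy(src, src + req.w, out.data() + row * req.w);
        }
        return out;
    }

private:
    Rect cached_{0, 0, 0, 0};
    std::vector<uint8_t> pixels_;
};
```

In a real filter the store() would happen after each read() call and the lookup() before the next one, falling through to a fresh decode on a miss.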
Let me know if I missed any of your questions.
Kirk.
Hi,
Thank you for answering my questions.
No, I am not displeased with the speed, I am just trying to make our application as fast as possible.
I am currently doing benchmarking to see if the approach is worthwhile.
I asked the question because I am currently having trouble allocating a reader in the main thread and then reading in a worker thread, but according to your response this should be possible. If I cannot find what I am doing wrong, I will see about posting the example code.
I am not sure how to make your proposed alternative implementation work for what I have in mind. I am already caching pixels returned by previous reads; I have to, because my requests are coming in layer by layer and I don't want to re-read. I am looking for another boost on top of this, if it is possible.
In my requirements, I have an interactive application making relatively small pixel-block requests, and I want to reduce the overhead by combining them (when able) into large pixel-block requests. I don't want to buffer the small requests as they come in and make one big one in the main thread, because that would reduce the interactivity of the application. This is why my idea is to have the main thread still make the small block read request on a cache miss, but then use that request as a signal to start reading a larger area in another thread. The next time a small request comes in, it may very well have already been read in by the second thread. I may try this with one or two reader objects.
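The miss-triggers-prefetch flow described here could be sketched with a worker thread waiting on a queue of big-block requests; `decodeBigBlock` is an empty stand-in for the SDK read() call that would go there, and `Rect` is likewise illustrative:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

struct Rect { int x, y, w, h; };

// On a cache miss, the main thread decodes just the small block it
// needs, then enqueues the enclosing big block for the worker to
// prefetch in the background.
class Prefetcher {
public:
    Prefetcher() : worker_([this] { run(); }) {}
    ~Prefetcher() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }

    void requestBigBlock(const Rect& r) {
        { std::lock_guard<std::mutex> lk(m_); queue_.push(r); }
        cv_.notify_one();
    }

    int blocksPrefetched() const { return count_.load(); }

private:
    void run() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [this] { return done_ || !queue_.empty(); });
            if (done_ && queue_.empty()) return;
            Rect r = queue_.front(); queue_.pop();
            lk.unlock();
            decodeBigBlock(r);      // would be the SDK read() call
            count_.fetch_add(1);
            lk.lock();
        }
    }

    void decodeBigBlock(const Rect&) { /* stand-in for the real decode */ }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Rect> queue_;
    bool done_ = false;
    std::atomic<int> count_{0};
    std::thread worker_;
};
```

The mutex here only guards the queue; whether the decode itself needs its own lock depends on whether one or two reader objects are used, as discussed below.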
So your app is requesting many contiguous blocks individually and you want to consolidate them in the actual decode when possible. And decode on a thread so as not to occupy the main thread. In that case your approach may indeed improve performance. Sounds like you know what you are doing but you did trigger a few more stray thoughts:
- Note that you will have decoded the requested block twice if you include it in the larger block AND have the main thread decode it on a cache miss. But perhaps this will be offset by your perf gains.
- Note that on a cache miss you will have two read() calls in flight at the same time. This will certainly not work unless you are synchronizing the two calls with a mutex.
- Is your reader initialize()d? If so, how is it failing to read(): with an error code or a crash?
- Will be curious about your benchmark; let us know what conclusions you reach.
Kirk.
With my implementation, some of our non-interactive applications are seeing a 15% increase in performance. In the user-interactive applications, perceived performance suffers a bit, but it is hardly noticeable; I haven't timed that yet.
My implementation used two MrSIDReader objects: one handled the queue of big blocks in the second thread, and the other handled the strips upon an L1 and L2 cache miss. In this way I did not have to mutex the read operations. I should probably try that, though; it might actually boost performance if I allow only one reader to access the disk at a time.
I realize that I am reading some information twice, but I accepted that so as not to make the code that determines block geometry overly slow or complex. I could revisit this, but after watching the cache hits and misses, I don't think it would offer much speedup. Typically only one strip was read twice.
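The two-reader arrangement, where each thread owns its own reader so no mutex is needed around read(), has roughly this shape; the `Reader` type is again a stand-in for the SDK object, which in the real version would be opened once per thread against the same file:

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Stand-in for the SDK reader; the real version would open the same
// MrSID file twice, once per thread.
struct Reader {
    uint64_t decode(int strip) { return (uint64_t)strip * strip; }
};

// Each thread owns its reader, so neither blocks the other and no
// mutex around read() is required; the sums are joined before use.
uint64_t decodeOnTwoReaders(const std::vector<int>& bigBlocks,
                            const std::vector<int>& missStrips)
{
    uint64_t bigSum = 0, missSum = 0;
    std::thread bigThread([&] {
        Reader big;                      // reader #1: background big blocks
        for (int s : bigBlocks) bigSum += big.decode(s);
    });
    std::thread missThread([&] {
        Reader miss;                     // reader #2: cache-miss strips
        for (int s : missStrips) missSum += miss.decode(s);
    });
    bigThread.join();
    missThread.join();
    return bigSum + missSum;
}
```

The trade-off noted above still applies: two independent readers avoid locking but may contend for the disk, which is why serializing the two with a single shared lock is worth benchmarking too.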