Asynchronous file loading, a.k.a. streaming

21 comments, last by Katie 14 years, 3 months ago
Hello folks, I am currently about to implement asynchronous file loading for my resource manager on a Mac. I simply want to implement it using boost::threads. I've heard that if you stream from HDDs it doesn't make that much of a difference compared to ordinary file loading, but I still like the concept and want to give it a try.

My first idea was to make one extra thread for my fileManager that loads all the requested items ordered by, for example, priority level. On the other hand, I had the idea of loading multiple files simultaneously and giving each loading queue its own thread (maybe you could limit the maximum number of threads to a reasonable count). I tend to go with the first version for the sake of simplicity.

Which way would you go? Is there maybe something a lot simpler? Thanks!
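For reference, here is a rough sketch of what I mean by the first option (one loader thread draining a priority-ordered request queue). The names (LoadRequest, FileLoader) are just placeholders, and shutdown and error handling are simplified:

#include <boost/thread.hpp>
#include <boost/bind.hpp>
#include <fstream>
#include <iterator>
#include <queue>
#include <string>
#include <vector>

struct LoadRequest {
    std::string path;
    int priority;                       // higher value = serviced first
    bool operator<(const LoadRequest& o) const { return priority < o.priority; }
};

class FileLoader {
public:
    FileLoader() : running_(true), worker_(boost::bind(&FileLoader::run, this)) {}

    ~FileLoader() {
        {
            boost::mutex::scoped_lock lock(mutex_);
            running_ = false;
        }
        cond_.notify_one();
        worker_.join();
    }

    // Called from the main thread to queue a file for loading.
    void request(const LoadRequest& r) {
        boost::mutex::scoped_lock lock(mutex_);
        queue_.push(r);
        cond_.notify_one();
    }

private:
    void run() {
        for (;;) {
            LoadRequest r;
            {
                boost::mutex::scoped_lock lock(mutex_);
                while (queue_.empty() && running_)
                    cond_.wait(lock);
                if (!running_ && queue_.empty())
                    return;
                r = queue_.top();
                queue_.pop();
            }
            // The blocking read happens on this worker thread, so the main thread stays responsive.
            std::ifstream file(r.path.c_str(), std::ios::binary);
            std::vector<char> data((std::istreambuf_iterator<char>(file)),
                                   std::istreambuf_iterator<char>());
            // ... hand `data` back to the resource manager / notify the requester (not shown).
        }
    }

    bool running_;
    std::priority_queue<LoadRequest> queue_;
    boost::mutex mutex_;
    boost::condition_variable cond_;
    boost::thread worker_;
};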
There is a much easier way to do this. Introducing threading to your system adds a ton of complexity.

Even if you do decide to use threading in your app, the method you described is still not ideal.

The OS provides asynchronous IO functions. Use them. The OS is (potentially) able to do smart things such as reordering the disk reads to reduce load times, choosing smarter buffer settings, and otherwise making it faster. You can have multiple requests in flight at the same time, and the OS can (potentially) reorder them to run faster than running the same requests sequentially.

Google shows a few tutorials on AIO, including this one that looks pretty good.
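As a rough illustration of what those tutorials cover, POSIX AIO (available via <aio.h> on Mac OS X and Linux) lets you issue a read and then check on it later. A minimal sketch, with a made-up file path and almost no error handling:

#include <aio.h>
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

int main() {
    const char* path = "levels/zone0.dat";      // hypothetical file
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buffer[64 * 1024];
    aiocb cb;
    std::memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buffer;
    cb.aio_nbytes = sizeof(buffer);
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0) { perror("aio_read"); return 1; }

    // The request is now queued by the OS; do other work, then check on it.
    while (aio_error(&cb) == EINPROGRESS) {
        // ... render a frame, service other requests, etc.
    }

    ssize_t bytes = aio_return(&cb);             // collect the result exactly once
    std::printf("read %zd bytes\n", bytes);
    close(fd);
    return 0;
}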
While I wouldn't have said it was simpler, and I'm assuming the Mac has something very much like it, the 'best' solution is probably to go with low level file access.

On Win32 it's possible to open a file with the Win32 file I/O functions, find out its size, and then tell the IO subsystem to load that data into a chunk of memory asynchronously. The file IO is queued by the file system, and you can sleep the loading thread until the data is loaded, at which point you wake up and deal with the loaded file as required.

I would assume the Mac's native file IO system has something very much like it. I also recall someone mentioning boost's ASIO library as a possible solution for this, although I've not looked into it myself.

The advantage of this method is that you hand off worrying about file IO timing etc. to the OS, and you don't end up busy-waiting a CPU core while the data loads. The IO subsystem can also queue up multiple files to load, so it's just a matter of firing off the request and waiting until your data turns up.
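A rough sketch of that approach on Win32: open the file with FILE_FLAG_OVERLAPPED, issue the read, then block on an event until the data has arrived. The filename is hypothetical and error handling is minimal:

#include <windows.h>
#include <vector>

int main() {
    HANDLE file = CreateFileA("assets/terrain.bin", GENERIC_READ, FILE_SHARE_READ,
                              NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (file == INVALID_HANDLE_VALUE) return 1;

    LARGE_INTEGER size;
    GetFileSizeEx(file, &size);
    std::vector<char> data((size_t)size.QuadPart);

    OVERLAPPED ov = {};
    ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);   // manual-reset event

    // Queue the read; it will typically return FALSE with ERROR_IO_PENDING.
    if (!ReadFile(file, &data[0], (DWORD)data.size(), NULL, &ov) &&
        GetLastError() != ERROR_IO_PENDING) return 1;

    // The loading thread can sleep here while the IO subsystem fills the buffer.
    WaitForSingleObject(ov.hEvent, INFINITE);

    DWORD bytesRead = 0;
    GetOverlappedResult(file, &ov, &bytesRead, FALSE);
    // ... `data` now holds the file contents; hand it off as required.

    CloseHandle(ov.hEvent);
    CloseHandle(file);
    return 0;
}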
Okay, thanks guys, I will look into that. I was just hoping that if I pulled off something simple myself I could use it cross-platform.
Well, as I said, Boost::ASIO might be able to help you out and would be cross-platform, but other than that you have to hit the OS at the low level, as most languages have no concept of async IO; heck, last I checked it was a bit of a pain to do in .Net and required unsafe code.
To some extent, I'd second what Frob said. To another extent I can't.

Sure, the OS is able to do smart things about the file reads, but that is under the assumption that you don't have limits on what you are reading, and that you are willing to abide by the limits of the async IO calls on the OS.

Consider trying to stream data behind a movie. It is important that the video gets its X Mb/s of data to keep the video running; whatever is happening in the background, who cares.

I prefer a setup like the following:
*Every read is issued as a "job" with a size, a priority, and a bandwidth. I read files into buffers (or at least partially) and parse the buffers, instead of calling
read( &header );
read( &size );
read( &color );

*Jobs are sorted by "credits". Being serviced removes all credits. Every loop through the servicing thread, all jobs gain "priority" credits.

*Each loop through the service thread, X jobs are serviced (one for each AIO slot I reserve that is free). Servicing a job means issuing a read of at least MIN size (e.g. 64K), up to bandwidth*time_from_last_service+fudgefactor bytes.
The service thread then waits for one of the AIO calls to return from its callback, then loops again.

This ensures that each file that needs X bandwidth gets close to that much throughput, the X AIO calls give the OS a way to schedule disk reads efficiently, and the priority credit system ensures that all files get serviced eventually. Breaking up each large read into several smaller, bandwidth-sized ones keeps one file read from stalling out service to the other file reads. Overall the system seems to provide stable throughput and behaves better than just issuing random read calls from random subsystems, since each read carries some metadata about how important it is. That keeps item A from stalling out item B, which was the main concern when writing the system in the first place: the motivation came from issues seen in another project, where background streaming would make foreground streaming choppy (a level load behind a movie making the movie choppy).

The hard part about all of this is syncing everything without introducing too much overhead or complexity, because now your main thread needs to either ignore-until-callback or poll each frame to see if the data is loaded.
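To make the credit scheme concrete, here is a condensed sketch of one servicing pass. The job fields, slot count, and sizes are invented for illustration, and the actual read issuing (e.g. aio_read into the job's buffer) is left as a stub:

#include <algorithm>
#include <cstddef>
#include <vector>

struct StreamJob {
    int         priority;        // credits gained per scheduler pass
    int         credits;         // accumulated credits; reset when serviced
    std::size_t bytesPerSecond;  // requested bandwidth
    std::size_t offset;          // next read position
    std::size_t fileSize;
};

static const std::size_t kMinRead  = 64 * 1024;   // never issue reads smaller than this
static const std::size_t kFudge    = 16 * 1024;   // slack on top of the bandwidth budget
static const int         kAioSlots = 4;           // free async read slots this pass

// One pass of the servicing thread: grant credits, then service the
// highest-credit jobs, one per free AIO slot.
void servicePass(std::vector<StreamJob>& jobs, double secondsSinceLastPass) {
    for (std::size_t i = 0; i < jobs.size(); ++i)
        jobs[i].credits += jobs[i].priority;

    for (int slot = 0; slot < kAioSlots; ++slot) {
        // Pick the job with the most credits that still has data left to read.
        StreamJob* best = 0;
        for (std::size_t i = 0; i < jobs.size(); ++i)
            if (jobs[i].offset < jobs[i].fileSize &&
                (!best || jobs[i].credits > best->credits))
                best = &jobs[i];
        if (!best) return;

        // Read at least kMinRead, up to bandwidth * elapsed time plus some slack.
        std::size_t budget = (std::size_t)(best->bytesPerSecond * secondsSinceLastPass) + kFudge;
        std::size_t len    = std::min(std::max(budget, kMinRead),
                                      best->fileSize - best->offset);

        // issueAsyncRead(best, best->offset, len);   // hypothetical: queue the AIO call
        best->offset  += len;
        best->credits  = 0;        // being serviced removes all credits
    }
}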
boost::asio won't work without significant work on your part. It works beautifully for Windows disk I/O and for cross-platform network I/O, but non-Windows disk I/O just isn't supported. If you want it to work, you have to implement a class conforming to Boost.Asio's RandomAccessHandle concept whose methods delegate to the underlying OS AIO API (which frob linked some tutorials for above). It's not very easy, though, and requires a lot of work.
Thanks, I think I will look into that AIO stuff. I also found another good link I'd like to share: LINK
I've used POSIX AIO on Linux in the past, and it contains a subtle but annoying bit of suckage that you need to pay attention to at initial design time.

Works like this -- you send a request and tag it with a callback function which comes back to tell you that your request is complete. Peachy.

The obvious way of OOing the interface is to put an interface pointer into the aiocb structure, and have the callback thunk through to the interface and dispatch the call to the object to say its task is now done.

So you might have terrain zone objects created when a player is nearby (and might potentially enter them), which schedule loading their textures on creation and get prodded by the callback when the data has appeared. If the work goes away (for example, if the player takes a turn that means you no longer need a zone), then the obvious thing to do is delete the terrain block, which in turn either cancels the AIO task if it hasn't completed, or deletes the texture.

And this works. *Almost* all of the time.

The problem is that AIO tasks can become uncancellable -- call cancel and it returns NOTCANCELLED. When it's in that state, the callback WILL get called at some point in the future and cannot be changed... and therefore you can no longer safely delete anything that that callback will refer to.

In the end, we had to fix this by making queues of control objects, and tagging them to say whether their data was actually needed or not, to handle the spurious complete notifications from cancels which didn't work.

The reason for this is that underneath, the AIO system is effectively just a worker thread pool doing regular IO[1]. The task becomes uncancellable when one of the threads is actually blocked doing the IO.

So you need to make sure you handle that special case and be aware that a task whose attempted cancellation fails will still call the callback function.


[1] On Linux 2.6 there are just kernel threads doing the work for you -- you can see them on process listings.
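A sketch of the kind of workaround described above: a small control object owns the aiocb and a "still wanted" flag, and outlives the game object that requested the read. The names are invented, the completion path (signal or callback thread) is reduced to a single onComplete() call, and synchronization between the two threads is omitted for brevity:

#include <aio.h>

struct AioControl {
    aiocb cb;            // must stay alive until the request truly finishes
    bool  wanted;        // does anyone still care about the result?
    bool  finished;      // has the completion notification arrived yet?
};

// Called by the owner (e.g. a terrain zone being destroyed) to abandon a read.
void abandon(AioControl* ctl) {
    ctl->wanted = false;
    if (ctl->finished) { delete ctl; return; }        // notification already arrived
    int rc = aio_cancel(ctl->cb.aio_fildes, &ctl->cb);
    if (rc == AIO_CANCELED) {
        delete ctl;                                   // cancelled cleanly, nothing will fire
        return;
    }
    // AIO_NOTCANCELED / AIO_ALLDONE: a worker is (or was) blocked in the read,
    // so the completion notification will still arrive. The control object must
    // survive until then; onComplete() takes over ownership and deletes it.
}

// Called from the completion notification (signal handler thunk or callback thread).
void onComplete(AioControl* ctl) {
    ctl->finished = true;
    if (!ctl->wanted) {
        delete ctl;                                   // spurious completion from a failed cancel
        return;
    }
    // ... hand the loaded data to whoever is still waiting for it.
}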
Hey, thanks for the info! AIO sounds pretty useful overall and I will definitely give it a try in the future.

Anyway, what is the biggest drawback of threading things on your own, apart from the low-level optimizations AIO will give me?

For instance, what will happen if I load quite a big file in its own thread: will it noticeably slow down the framerate if I don't load it in chunks?

These might be stupid questions, but I have never seriously used threading and I would like to understand things a little better!

Thanks so far, that cleared up a lot of things already!

