How to properly not block the application when loading

6 comments, last by JoeJ 9 months, 1 week ago

Hi everybody,
When you load naively, you block the entire application, since neither update nor render is called.
Even Quake 3 did not block the app, so there are ways to avoid that.
It's not a multithreading question, since Quake 3 did it single-threaded.
The question is how to deal with this properly. What are the possible solutions?
I would be interested to know the Quake 3 method if someone knows it.
Thank you very much!


Don't know what Quake 3 did, but you can read data from the file system, the network, etc. in small portions. A different strategy is not to load everything, but just enough to continue.

A bit more tricky might be to process read data in small portions, but that depends much on the data.
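A minimal sketch of the "small portions" idea, assuming Python and a plain file on disk. The names `read_in_chunks` and `demo_chunked_read` and the 64 KiB chunk size are made up for illustration; a real loader would pull one chunk (or a few) per frame and hand it to whatever parses the data.

```python
import os
import tempfile


def read_in_chunks(path, chunk_size=64 * 1024):
    """Yield the file's bytes one chunk at a time instead of all at once."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            yield chunk


def demo_chunked_read():
    # Write a throwaway 200 KiB file so the sketch is self-contained.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"x" * (200 * 1024))
        # In a game, each next() on the generator would happen between frames.
        return [len(chunk) for chunk in read_in_chunks(path)]
    finally:
        os.remove(path)
```

Because the reader is a generator, the caller decides how many chunks to pull before going back to update and render.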

Aside from loading asynchronously using threads, the only thing I can think of is that you can break down the loading into smaller pieces, each of which takes less time than a frame. Using this idea, you would load things until X amount of time has passed, e.g. 33ms, then render a frame. If it takes longer than X to complete loading a single item, then you can have the loader for that item periodically call a callback function pointer (e.g. to report progress), within which you can update() and render() (assuming it's all happening on 1 thread).
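Here is a rough single-threaded sketch of that time-budget idea, in Python. Everything in it is an assumption for illustration: `load_item` is a hypothetical loader written as a generator that yields after each small step, and `update` / `render` are placeholders for the game's own functions. This is not what Quake 3 did, just one way to express the scheme described above.

```python
import time


def load_item(name, steps):
    """Hypothetical loader: does one small slice of work, then yields progress."""
    for step in range(steps):
        # ... a small slice of real work would go here (read a chunk, decode a bit) ...
        yield (name, step + 1, steps)


def run_loading(loaders, budget_s=0.033, update=lambda: None, render=lambda: None):
    """Advance loaders until the per-frame budget is spent, then draw a frame."""
    pending = list(loaders)
    frames = 0
    while pending:
        deadline = time.perf_counter() + budget_s
        while pending:
            try:
                next(pending[0])      # run one slice of the current loader
            except StopIteration:
                pending.pop(0)        # this item is fully loaded
            if time.perf_counter() >= deadline:
                break                 # budget used up, go draw a frame
        update()
        render()
        frames += 1
    return frames
```

At least one slice runs per frame even with a zero budget, so loading always makes progress. The yielded progress tuples could drive a loading bar.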

That said, the modern approach would be to just use threads, unless you have some weird restriction to 1 thread.
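For comparison, a small sketch of the threaded version, assuming Python's standard `threading` and `queue` modules. `start_background_load`, `main_loop` and the `load_fn` parameter are hypothetical names; the point is only that the main thread never waits on the loader, it just picks up whatever has finished each frame.

```python
import queue
import threading


def start_background_load(paths, load_fn):
    """Load each path on a worker thread; finished items arrive on a queue."""
    results = queue.Queue()

    def worker():
        for path in paths:
            results.put((path, load_fn(path)))
        results.put(None)  # sentinel: nothing more to load

    threading.Thread(target=worker, daemon=True).start()
    return results


def main_loop(results, update=lambda: None, render=lambda: None):
    """Keep updating and rendering while draining finished loads without blocking."""
    loaded = {}
    done = False
    while not done:
        while True:
            try:
                item = results.get_nowait()  # never blocks the frame
            except queue.Empty:
                break
            if item is None:
                done = True
                break
            loaded[item[0]] = item[1]
        update()
        render()
    return loaded
```

In a real game the render call paces the loop; here the placeholders return immediately, so the loop simply spins until the worker is done.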

The OS provides non-blocking functions.

The details depend on the system and the functions. For example, when opening a file you are probably familiar with the flags for reading and writing, but there are more. You can open for writing with truncation as a flag, wiping out the contents. You can flag it to only open an existing file and not create a new one if it doesn't exist. You can flag files to be deleted on close, useful for temporary working files. And critically here: you can set a flag to specify that operations on the file should not block, which on Windows is called overlapped I/O.

You can set similar flags on network sockets, if you are programming at that level.

Then many calls will return with additional return codes. You might get a return code of EWOULDBLOCK or EAGAIN or ERROR_IO_PENDING or whatever else for the system. You will need to add code to see if any data is available, and how much, but it is not that much additional work. They return immediately so your process isn't blocked, and yet still are doing the I/O in the background without you needing to create a new process or thread.
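To show what handling such a return code looks like, here is a small sketch assuming Python on a POSIX-like system. It uses a pipe as a stand-in for a socket, since that is easy to demonstrate; regular disk files on Linux generally ignore the non-blocking flag, and true overlapped file I/O on Windows goes through a different API that is not shown here. `try_read` and `demo_nonblocking_pipe` are made-up names.

```python
import os


def try_read(fd, max_bytes=4096):
    """Return whatever bytes are available, or None if the read would block."""
    try:
        return os.read(fd, max_bytes)
    except BlockingIOError:  # Python's exception for EAGAIN / EWOULDBLOCK
        return None


def demo_nonblocking_pipe():
    read_fd, write_fd = os.pipe()
    try:
        os.set_blocking(read_fd, False)  # switch the read end to non-blocking mode
        first = try_read(read_fd)        # nothing has been written yet
        os.write(write_fd, b"hello")
        second = try_read(read_fd)       # now data is available
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return first, second
```

The caller gets `None` back immediately instead of stalling, so it can render a frame and try again later.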

frob said:
and yet still are doing the I/O in the background without you needing to create a new process or thread.

Interesting, because I did not know that, and file io is the main bottleneck for my offline pre-processing tool.

Currently I use a single thread to do async io, and it was worth setting a higher priority for that thread. Besides that, I have a job system using 16 threads to do all the processing at default priority.
When the job system is done with a batch of work, rendering and UI are updated from the main thread.

It all works, but I don't know if there's a better way. My CPU utilization definitely suffers from the io thread, but disk utilization is high at least.
The io thread does no other work, just reading / writing files to / from byte buffers. Compression work is done by the job system, for example.

So, do you think it might be worth trying such non-blocking OS functions instead? I assume it's no big difference under the hood, and it might just have been less work to implement. But I lack any experience.

And I have some more general questions about file io:

What is there to say about small vs. large files? I have many small files. Less than 10MB per file, but millions of files. The alternative would require more RAM and complexity. That's hard to afford, so I did not try fewer, larger files.

I guess it's pointless to use multiple threads to read / write multiple files in parallel if they are on the same disk. But if I distribute files over two disks, I assume using one io thread for each would almost double io performance?
Likely this depends on system details, like the number of ‘CPU io lanes’ (if this exists), or RAID (which I do not use).

There are a lot of “it depends” questions there.

Many very small files tend to be slower because each one is processed individually with its own overhead, whereas one big file pays that overhead only once. When they're coming in at 10MB each, that's big enough that it gets questionable. Millions of files means looking up millions of file table entries.

Fastest methods depend on the system and the hardware. SSDs have different performance than HDDs, which are different again from optical media like DVDs or CDs, which are different again from tape, and all of them can have different performance than remote storage, which typically is bottlenecked by transfer rates rather than access speed. Access patterns matter, since accessing sequentially will be different from jumping around. Similarly, the position of the data on a disk can make a difference: on drives with moving platters and heads it takes time to move the heads around the platters, while on solid state drives some data locations can be accessed in parallel while others are serial. Your actual bottlenecks depend tremendously on the hardware being used.

Details about the operating system matter because each system has different caching strategies. Even on a Windows call to CreateFile() you can get different performance depending on details like opening flags and file sharing modes. If you're on a system where either the OS or the hardware compresses the files it can result in different performance; often it means transferring less data, which is faster, but at some CPU and memory cost, which is slower. On many older HDDs compression's reduced size often gave faster performance in spite of decompression time and space.

Details about the system calls you use matter. Some are capable of scheduling transfers in more intelligent ways, and using gather/scatter techniques that best fit the underlying hardware. If you have RAID storage (for example, mirrored disks that each hold a copy of the data) some system calls can read from all the disks in parallel for better performance. I already mentioned drives having the capacity for reading from multiple positions on the media in parallel, but it bears mentioning again. Also, on old HDD systems the techniques would often start reading immediately with whatever was closest to the read head, then fill in gaps and irregularities on subsequent passes. On some systems, using memory mapping to map a region of a file into memory can be much faster because of how hardware access patterns work, but with other media it can be much slower since the entire range must be loaded before it's usable.

Details about your usage pattern matter, along with the APIs you're using to access it. Memory mapping a large file very often gives great performance due to low overhead, especially if the data doesn't need to be parsed and can be used directly in place; just point to the spot in memory and use the structures in place. You also avoid the overhead of many read calls. Under certain conditions and hardware, though, it can incur a large performance penalty for cache management. Similarly, if you're processing data in a stream, operating in large blocks to read large buffers is likely faster than thousands or millions of tiny calls reading individually; all that overhead adds up, in addition to the ways the system's disk cache works. If your usage pattern requires parsing the data, that's more work than if your usage pattern has data in the final form not needing any processing.
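As a concrete illustration of "point to the spot in memory and use the structures in place", here is a minimal sketch using Python's standard `mmap` and `struct` modules. The record layout (`uint32` id plus `float32` value) and all the function names are assumptions made up for the example.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size record: little-endian uint32 id, float32 value.
RECORD = struct.Struct("<If")


def write_records(path, records):
    with open(path, "wb") as f:
        for rec in records:
            f.write(RECORD.pack(*rec))


def read_record(path, index):
    """Map the file and decode one record in place, without reading the rest."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return RECORD.unpack_from(mm, index * RECORD.size)


def demo_mmap():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_records(path, [(1, 0.5), (2, 1.5), (3, 2.5)])
        return read_record(path, 2)
    finally:
        os.remove(path)
```

Only the pages that are actually touched get brought in by the OS, which is where the low overhead comes from when the data is already in its final form.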

Details about flags you pass to the system when working with them can make a difference. You can use system flags to disable buffering, flags to prefer backup semantics or shadow copies, flags to indicate overlapped I/O, flags indicating you're planning to read sequentially from beginning to end versus random access, flags to modify write-through caching, and more. The operating system can do things differently depending on each, optionally take advantage of special hardware features like RAID reads, and more.

There are many huge books on the topic, almost 70 years of research trying many options to get better performance, and performance characteristics change every few years as new hardware and software techniques come around. There have been many times where new technology gets the best performance by pulling back in usage patterns from 30+ years before, other times where they need to start from scratch.

It's an enormous topic.

If you're looking for a few search terms to get started with, “overlapped I/O” or “asynchronous I/O” is one major area to search for in your studies, and “memory mapped files” is another, where chunks of files are loaded into memory in bulk. Both options are likely to give different performance profiles than what you've got now, since both can do a lot of work in the background while you do other work while waiting for pending I/O to complete. There are plenty of tools that queue up a ton of I/O work and keep the disk queue filled, working on whatever blocks of data are available whenever they come in.
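The "queue up a lot of work and process whatever arrives first" pattern can be sketched like this. Note the swap: this uses a small thread pool from Python's `concurrent.futures` as a stand-in, not real overlapped I/O, so it only illustrates the shape of the approach. `read_file`, `process_as_completed`, `demo_queue` and the `num_readers` default are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed


def read_file(path):
    with open(path, "rb") as f:
        return f.read()


def process_as_completed(paths, process, num_readers=8):
    """Issue all reads up front and process each result as soon as it arrives."""
    results = {}
    with ThreadPoolExecutor(max_workers=num_readers) as pool:
        futures = {pool.submit(read_file, p): p for p in paths}
        for fut in as_completed(futures):  # completion order, not submission order
            results[futures[fut]] = process(fut.result())
    return results


def demo_queue():
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(5):
            p = os.path.join(d, f"{i}.bin")
            with open(p, "wb") as f:
                f.write(bytes([i]) * (i + 1))
            paths.append(p)
        sizes = process_as_completed(paths, len)
        return sorted(sizes.values())
```

Whether this beats a single dedicated io thread depends on the disk and the OS, as discussed above, so it is worth measuring rather than assuming.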

frob said:
It's an enormous topic.

I see. Thanks for the overview! Will copy-paste for later reference…

On this:

frob said:
Millions of files means looking up millions of file table entries.

I guess it might help if I use subdirectories, putting at most 100 files in each. Maybe that helps the file system.
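One common way to spread files over subdirectories is to derive the directory from a hash of the file name. A minimal sketch, assuming Python; `sharded_path`, the two-level layout and the fanout of 256 are illustrative choices, and the resulting files per directory depend on the total file count (256 × 256 directories gives roughly 15 files each for one million files).

```python
import hashlib
import os


def sharded_path(root, filename, fanout=256):
    """Map a file name to root/xx/yy/filename using bytes of its SHA-1 hash."""
    digest = hashlib.sha1(filename.encode("utf-8")).digest()
    level1 = digest[0] % fanout
    level2 = digest[1] % fanout
    return os.path.join(root, f"{level1:02x}", f"{level2:02x}", filename)


def ensure_parent_dir(path):
    """Create the shard directories on demand before writing the file."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
```

The mapping is deterministic, so the reader can recompute the path from the name without any lookup table. Whether it actually speeds things up depends on the file system.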

