Scalable network/disk IO server - Linux best practices

26 comments, last by SimonForsman 14 years, 5 months ago
The way async IO should work, in my opinion, is something roughly like this:

(one or two kernel worker threads should be fine for this)
0. pull a request from a queue where user threads post them
1. convert offset/size to a list of pages
2. iterate over the list, sort the pages into two buckets
- page is in the cache or
- page is not in the cache
3. start feeding requests to the disk controller, according to whatever scheduling policies, elevator algorithms, and hardware limits are in effect
4. for each page that was in the cache, either change the page mapping and set the copy-on-write bit, or do a memcpy, or whatever... thus get data that is in the buffers to the user
5. if any of these in-cache pages says "fault" when we look at it (it could have been discarded since we looked at it last!), ignore the fault, and add the page to the list of pages that have to be read from disk
6. feed more requests to the controller
7. repeat 6 until nothing is left
8. when the hardware signals completion of the last request, check that both lists are empty (just to be sure), and set an event or send a signal, whichever
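
The steps above can be sketched as a small user-space simulation. This is only an illustration of the control flow, not kernel code: the page cache and the disk are plain dictionaries, `PAGE_SIZE`, `pages_for`, `serve_request` and the `evict_before_copy` parameter are all hypothetical names, and the "disk read" completes immediately instead of being scheduled and signalled asynchronously.

```python
from collections import deque

PAGE_SIZE = 4096  # assumed page size for this sketch


def pages_for(offset, size):
    """Step 1: convert a byte range into the page numbers it touches."""
    if size <= 0:
        return []
    first = offset // PAGE_SIZE
    last = (offset + size - 1) // PAGE_SIZE
    return list(range(first, last + 1))


def serve_request(offset, size, cache, disk, evict_before_copy=()):
    """Simulate steps 1-8 for a single read request.

    cache and disk map page number -> bytes.
    evict_before_copy lists pages dropped from the cache after bucketing,
    to mimic the race described in step 5.
    Returns (page -> data, list of pages that had to come from disk).
    """
    pages = pages_for(offset, size)

    # Step 2: sort the pages into two buckets.
    cached = [p for p in pages if p in cache]
    to_read = deque(p for p in pages if p not in cache)

    # Step 3 would start feeding to_read to the controller here.
    # Meanwhile the cache may shrink behind our back:
    for p in evict_before_copy:
        cache.pop(p, None)

    result = {}
    # Steps 4-5: hand over cached pages; a page that has vanished
    # ("fault") is quietly moved to the disk bucket instead.
    for p in cached:
        data = cache.get(p)
        if data is None:
            to_read.append(p)
        else:
            result[p] = data

    # Steps 6-7: keep feeding requests until nothing is left.
    read_from_disk = []
    while to_read:
        p = to_read.popleft()
        result[p] = disk[p]
        read_from_disk.append(p)

    # Step 8: sanity check before signalling completion.
    assert len(result) == len(pages)
    return result, read_from_disk
```

Calling `serve_request(4096, 3 * 4096, cache, disk, evict_before_copy=(2,))` with pages 1 and 2 initially cached exercises the interesting path: page 2 is bucketed as cached, disappears before the copy, and ends up in the disk list behind page 3.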

Maybe I'm being grossly unsapient there, but this cannot be so darn hard? I mean, I'm sure it is hard to get right, like many things... but not much harder than what you have to do for only reading from disk already? :-)
Quote:this cannot be so darn hard?


Changing the technology is not that hard. As always when something is hard, it's changing people that matters.

Another example: With BeOS, we had a file system that supported arbitrary file attributes. MIME type was an attribute. "mail:subject" was an attribute. "image:comment" was an attribute. In fact, you could add whatever attributes you wanted. At the same time, the file system supported indexing attributes. It also supported a simple query language. The kernel would know when file attributes changed, and could update the results of a query in real time.

The end result of this was a kind of file manager window that was "live." You'd enter a query (like "size > 10M and modtime < now - 10 minutes") and you'd have a live view into the state of the file system. Once applications started building on top of this, it was a real enabler. My e-mail inbox consisted of the live query "type == 'email' and mail:status == 'unread'." I could find all documents anywhere on the disk, without having to go dig. And it was all as instant as a database query. This was a real enabling technology.
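
To make the "live query" idea concrete, here is a toy sketch in Python. It is not the BeOS API: `AttrStore`, `LiveQuery`, `add_query` and `set_attr` are hypothetical names, queries are plain Python predicates rather than a query language, and every attribute change re-evaluates every registered query instead of consulting an index as a real file system would.

```python
class LiveQuery:
    """A registered predicate plus the set of paths currently matching it."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.matches = set()


class AttrStore:
    """Toy stand-in for a file system with arbitrary per-file attributes."""

    def __init__(self):
        self.files = {}    # path -> dict of attribute name -> value
        self.queries = []  # all live queries registered so far

    def add_query(self, predicate):
        # Evaluate once against existing files, then keep it live.
        q = LiveQuery(predicate)
        q.matches = {path for path, attrs in self.files.items()
                     if predicate(attrs)}
        self.queries.append(q)
        return q

    def set_attr(self, path, name, value):
        # The store sees every attribute change, so it can update
        # the result set of each live query right away.
        attrs = self.files.setdefault(path, {})
        attrs[name] = value
        for q in self.queries:
            if q.predicate(attrs):
                q.matches.add(path)
            else:
                q.matches.discard(path)


store = AttrStore()
inbox = store.add_query(
    lambda a: a.get("type") == "email" and a.get("mail:status") == "unread"
)
store.set_attr("/mail/1", "type", "email")
store.set_attr("/mail/1", "mail:status", "unread")  # now in the inbox view
store.set_attr("/mail/1", "mail:status", "read")    # drops out again
```

The point of the sketch is only that the result set changes as a side effect of the attribute write, with no polling or rescanning by the application.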

However, ten or more years later, neither Linux nor Windows has gained these capabilities. Windows threatened to for a while, with WinFS, but I think they tried to over-design and over-reach there, and ended up getting nothing instead. The biggest file system invention for Linux is that the biggest supported single device is now over a petabyte. Whoop-de-doo!

If the file system and kernel people were passionate about user interface and end user experience, perhaps we'd see more of the innovative, transformative changes like these (including a truly efficient, responsive asynchronous I/O interface), but that simply doesn't seem to be the case.

Sorry that this thread turned more into a rant on how hard it is to innovate in an entrenched culture than a list of solutions for efficient file/network I/O :-)
enum Bool { True, False, FileNotFound };
Quote:Original post by hplus0603
Quote:there are really good printer drivers for Linux, if you use hardware from a decent vendor


Where "decent vendor" is defined as "a vendor with Linux drivers"? Most consumers tend to define "decent vendor" as "the vendor that gives me the best output for the cheapest price."


Actually, I just didn't want to name the vendor I had in mind.
The same "decent vendor" also has Vista/Win7 drivers for some of the printers that they stopped selling in the late 1980s/early 1990s (over 20 years ago).
I don't suffer from insanity, I'm enjoying every minute of it.
The voices in my head may not be real, but they have some good ideas!

This topic is closed to new replies.
