spek

Dealing with corrupted files in Linux

Hey,
 
We have a Linux ARM device that can shut off at any moment (power loss). No big deal, except that there is a chance that the device was storing statistics in a file at that moment. Even though the chance is minimal, and the device will still be powered for ~1 second to finish things, I'd like to make sure the (Qt C++) code can deal with corrupted files.
 
Some info:
- The application gets forced to quit (so the Qt "aboutToQuit" events and such aren't called!)
- The file system is on an SD card
- Running Ubuntu Linux
- I'm storing a few small text files once in a while for statistics or settings.
- The files are loaded at start-up, and written every X minutes or on a certain event.
 
 
First of all, what exactly happens if my program gets forced to quit (by the OS) during a file write operation (say, a loop that writes string lines)? I assume the file is just incomplete then, but can it also get corrupted by this?
 
Second, if a file gets corrupted due to a power loss, does it only affect that single file, or could there be more problems? I guess not, but when reading about the topic, people sometimes mention the whole file system getting corrupted. That sounds serious.
 
Third, in case the file gets corrupted somehow, what will happen if I try to open it next time?
file->open(QIODevice::ReadOnly)
I want to test somehow if the file can be opened, and if it's complete (maybe by putting an "EOF" marker as the last line).
 
Last, in case I detect a bad file, I could jump back to an earlier backup. But can I also delete the corrupted file, or overwrite it with the backup?
 
 
Regards,
Rick
 
It depends on what filesystem you use.

 

ext3/4 only do journaling on metadata by default (which protects against filesystem corruption, but not against data corruption). They can be made to do data journaling as well if you mount them with data=journal. This makes them write all data completely to the journal before writing it to the actual file: if the system crashes or loses power while the data is being written, the write is completed when the system restarts; if it crashes while the journal is being written, the corrupt journal entry is simply ignored (leaving you with the data as it was before the write started).
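For example, a hypothetical /etc/fstab entry that enables full data journaling on an ext4 data partition might look like this (the device and mount point are made up):

/dev/mmcblk0p3  /data  ext4  defaults,data=journal  0  2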

 

ZFS and BTRFS should do data journaling by default. (BTRFS is still fairly immature, though, and hasn't really been tested that much. ZFS is well tested in the field, but its license isn't GPL compatible: you need a userspace driver to use ZFS on Linux, which is a bit slow, and you can't run the OS itself from a ZFS partition. If you wish to use ZFS, I'd recommend using a BSD- or Solaris-based OS instead.)

 

Your files cannot be corrupted just because the OS kills the application (the actual file write is done by the OS; your application just tells the OS what it wants written). If you are issuing multiple write requests (in a loop, for example) and the OS kills your application (why would it do that?), then you might end up with an incomplete file.

 

Just remember that data journaling adds quite a bit of overhead on all write operations.

Edited by SimonForsman

Sounds good, though I don't know which filesystem is being used by this device. My instinct says journaling isn't enabled, as the manuals didn't mention it, though they did warn me about being careful with writing files. But I could be wrong. Is there some way I can check what kind of filesystem is used, and whether this option is enabled? (I'm a Linux noob.)

 

In case I can't use journaling, what would be the answers to the same questions then?

 

Thanks,

Rick


Normally, rewriting the contents of a file in place is a no-no. If it's a small text file (as you say), then you should atomically replace it with a newer one.

What I normally do is:

 

* Create a new temporary file in the same directory (this means it will probably be on the same filesystem)

* Write the contents to the new (temporary) file

* Call fdatasync on the new file

* Rename the new file over the previous file - this is an atomic operation

 

In particular, if your program crashes (and the OS continues to run), at no point will an incomplete or broken file exist.

 

Unfortunately, this might not be enough for the case where the system loses power. Calling fdatasync on the new file is probably a good idea, but still might not be sufficient: it is unclear to me whether there is some case where the rename() will succeed and result in the new file renamed over the old file, but with its contents not fully written.

 

In any case, you're dependent on the underlying hardware not lying to the OS about whether data has been written persistently.
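Roughly, in POSIX terms (the atomicWrite helper and the ".tmp" suffix are just for illustration, not a library API):

#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Sketch of the write-temp-then-rename pattern described above.
bool atomicWrite( const char *path, const char *data, size_t len )
{
    // 1. Create a temporary file in the same directory (same filesystem).
    char tmp[4096];
    std::snprintf( tmp, sizeof tmp, "%s.tmp", path );
    int fd = open( tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644 );
    if ( fd < 0 )
        return false;

    // 2. Write the new contents to the temporary file.
    bool ok = ( write( fd, data, len ) == (ssize_t)len );

    // 3. Make sure the data is on disk before the rename.
    ok = ok && ( fdatasync( fd ) == 0 );
    close( fd );

    // 4. Atomically rename the temporary file over the old one.
    return ok && ( std::rename( tmp, path ) == 0 );
}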


You can type mount in the console to see what partitions you have mounted, what filesystems they use, and what options are enabled.

 

It isn't very likely that data journaling is enabled on a desktop distribution such as Ubuntu, though; there is a quite significant performance penalty for it, so it most likely uses ext3 or ext4 with metadata journaling only.

 

 

If you can't use data journaling, your best bet is to rotate your files so that you have something to fall back on in case a file gets corrupted.


Typing mount tells me the type is "ext4". There is nothing mentioned about journaling, though. Maybe I can enable this option, but then again I doubt I should if the manufacturer didn't.

 

What I've made now is an automatic backup at startup: if the stored file is newer than the backed-up version, the program will overwrite the backup with the newer version. That doesn't fix any problems yet, but at least we have some data in case bad things happen. Eventually a restore script could be made in case things get really nasty.

 

What we could do is try to open the most recent file. If it fails for whatever reason, it should open the backup instead and delete the corrupted file. You would lose your latest changes, but so be it. I think it will rarely happen in practice, as the file writes happen at a low frequency or when explicitly storing something on the device (which is not the typical moment to shut down the power at the same time).


First of all, what exactly happens if my program gets forced to quit (by the OS) during a file write operation (say, a loop that writes string lines)? I assume the file is just incomplete then, but can it also get corrupted by this?

 

You would get a truncated file. Unless you're explicitly opening for rewrite and positioning the file pointer yourself, your file is getting overwritten, not rewritten, so it will not get corrupted.

 

Second, if a file gets corrupted due to a power loss, does it only affect that single file, or could there be more problems? I guess not, but when reading about the topic, people sometimes mention the whole file system getting corrupted. That sounds serious.

 

Non-journalling filesystems can get their metadata corrupted by an unexpected power loss. That's exactly why journalled filesystems were introduced. In all the years I've used Linux with ext3 (and more recently ext4) filesystems, I've never seen metadata corrupted by a sudden and unexpected power loss. The journalling uses commit/rollback semantics.

 

Third, in case the file gets corrupted somehow, what will happen if I try to open it next time?
file->open(QIODevice::ReadOnly)
I want to test somehow if the file can be opened, and if it's complete (maybe by putting an "EOF" marker as the last line).

 

If your data is corrupt, you'll be able to open and read the file (because the metadata is OK), but it may contain unexpected or missing data.  If the metadata is corrupt, you will probably not be able to find your file to open it.

 

If you really do have about a second of frantic saving before the power rails are pulled, it's likely that the OS will be able to flush any outstanding buffers.  If you are using a journalled file system (without data journalling) the worst case scenario is that your app did not send some buffers to the OS and you have some truncated files.

 

Remember, Linux is used on the majority of the systems running the internet and is responsible for trillions of dollars in data. Easily millions of dollars have been spent over the years making data integrity extremely robust. That's not to say you can't do an end-run: most people use FAT filesystems on SD cards, where all integrity bets are off. You might want to double-check what you're using on the SD card.


The flash card is in the FAT format. Journalling seems to be turned off, though I'm not sure; when typing "mount" in the terminal, journalling doesn't get mentioned. What I do get is something like this:

/dev/root on / type ext4 (ro,relatime,barrier=1,data=ordered)

 

Well, the files aren't really critical; as long as the device is still able to fall back on a backup somehow, and doesn't lock itself up, I'd be happy.

 

 

About using fsync/fdatasync and performing an atomic rename, some questions.

If I understand it correctly, file.write( bla ... ) may not write directly (to save lots of individual write cycles). It has to be forced with fsync / fdatasync. However, note that I'm using the Qt QFile. Is the syncing strategy any different then?

 

As for the rename, Qt has QFile::rename, but I'm not sure whether that is an atomic operation, and it doesn't seem to replace the previous, older file either.

Have you considered using some kind of database engine for storing your statistics instead? Atomicity is one of the problems that database engines usually solve. Even SQLite should do.

http://www.sqlite.org/transactional.html

If it is important that the statistics don't get corrupted, you should consider this.
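For example, a minimal sketch using Qt's SQL module with the SQLite driver (this assumes the QSQLITE plugin is available on the device; the table and file names are made up):

#include <QSqlDatabase>
#include <QSqlQuery>

// A transaction either commits fully or is rolled back after a crash,
// so the stored statistics can't end up half-written.
void saveHours( double hours )
{
    QSqlDatabase db = QSqlDatabase::addDatabase( "QSQLITE" );
    db.setDatabaseName( "stats.db" );
    if ( !db.open() )
        return;
    QSqlQuery q;
    q.exec( "CREATE TABLE IF NOT EXISTS stats (key TEXT PRIMARY KEY, value REAL)" );
    db.transaction();
    q.prepare( "INSERT OR REPLACE INTO stats VALUES ('hours', ?)" );
    q.addBindValue( hours );
    q.exec();
    db.commit();
}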

Well, it's only a few statistics (it's a vehicle; stuff like hours and distance are stored), so that might be a bit of overkill. And if the files do get corrupted, it's not the end of the world. At least not if the program just keeps working and can store files again. But thanks for the hint nevertheless!


Well, it's only a few statistics (it's a vehicle; stuff like hours and distance are stored), so that might be a bit of overkill. And if the files do get corrupted, it's not the end of the world. At least not if the program just keeps working and can store files again. But thanks for the hint nevertheless!

 

markr has the correct solution above.  

 

You should always follow this pattern. It doesn't matter if you are writing to disk or implementing an assignment operator in code. It doesn't matter if the code is insignificant temporary code that will probably go away. It doesn't matter if the pattern feels like overkill. The pattern applies to copies of everything, not just files on disk. It is the pattern to follow.

 

Get into the habit: whenever you are replacing something, first make a temporary object that has all the properties you need, and after it is successfully created, swap the temporary for the old one.
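In code, that habit is essentially the copy-and-swap idiom; a minimal sketch (the Settings class is made up for illustration):

#include <string>
#include <utility>

class Settings
{
public:
    // Copy-and-swap: the parameter is the fully built replacement.
    // If building the copy fails, the original stays untouched.
    Settings& operator=( Settings other )
    {
        std::swap( data, other.data ); // the swap itself cannot fail
        return *this;
    }
private:
    std::string data;
};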


@spek:

 

There is no API in POSIX (or anywhere else, probably) to atomically replace the contents of a file. This is in contrast to, say, SQL where we can do an UPDATE statement and reasonably expect it to be atomic.

 

If I replace the contents of a text file, this normally involves (at least) three operations: open, at least one write and close. If the open mode is for writing and truncating, the file is truncated immediately after the open. If a program crashes between the open and the write, then we end up with an empty file.

 

This is why we use the rename() trick - it means we can atomically replace the file.

 

The reason for the fsync is different however. Normally writes to the filesystem - data and metadata - are not done immediately. Instead, the kernel waits a short time before completing the write (maybe about 30 seconds on a desktop-tuned otherwise idle system). This is either because the IO hardware is still busy from another operation, or to optimise several writes into one operation (for example, in some cases a write never needs to be done at all because it is subsequently overwritten or the file is deleted while still in memory). Either way it is an optimisation the kernel makes to allow a process to continue to run while its disc writing is completed in the background.

 

Neither write() nor close() normally flushes the contents of the file to disc. It will simply be in (volatile) RAM, and if the power is lost at this point, the filesystem will probably look on reboot as if the write was never done. If you're really unlucky, the power will fail partway through the file creation and you may end up with either a partially written file or an empty file.

 

The fsync() is done so that the data is *definitely* in the file on the disc before it's renamed, so that we can guarantee we don't end up with an as-yet-unwritten file atomically renamed over the previous one, resulting in an empty file. fsync() blocks the calling process until the file is actually on permanent storage (assuming no part of the IO subsystem "lies").

 

I suspect that using QFile makes no difference to this, as it's just a wrapper for the POSIX IO functions. The Qt rename function is also likely to be a wrapper for the standard rename function, hence also atomic.

 

---
 

As others have suggested, you could use SQLite, which jumps through some pretty complicated hoops to ensure that transactions are atomic (i.e. are rolled back completely after a power failure, etc.). But it may be overkill.


Agreed with you guys that making a temporary file first is the smart & clean way. It wouldn't be the first time files got crushed because of some bug in an updated version of the software!

 

I'm writing everything to a "temp" file first now. Not 100% sure if this is the recommended way in Qt, but to force the file write I have:

QFile f( "somelocation/myfile.dat.temp" );
f.open( QIODevice::WriteOnly );
f.write( monkeybusiness );
f.flush();           // flush Qt's own buffer to the OS
fsync( f.handle() ); // fsync from <unistd.h>: flush the OS buffers to disk
f.close();

 

But I'm still a bit concerned about the rename. The only way I know (in Qt) is:

QFile::remove( targetFileName ); // remove previous, otherwise the rename doesn't work
QFile::rename( tempFileName, targetFileName );

But... what if stuff stops between the remove & rename? I could load the temp file in case "targetFileName" doesn't exist (meaning the original file got deleted, but the temp didn't get renamed yet). But maybe there is a cleaner way.

 

 

 

By the way, as for power loss, notice that the hardware still remains on for a short moment. In practice my application gets forced to quit BEFORE the hardware actually shuts down. A more realistic problem is that the app closes between operations. That may also mean a file does not get closed properly. Could that cause problems after a restart?

 

Cheers


There are special filesystems for SD cards available on Linux. Because writing to an SD card is very slow compared to a rotating hard disk, the chance of an incomplete IO operation on such a device is higher. Writing the new content to a temp file and then renaming it is quite a good idea; making a backup of the previous version is as well. This is as secure as a journaling filesystem, but works even with FAT or any other non-journaling filesystem.

Write a checksum at the end of your file. Only if this checksum (SHA, CRC, or whatever you think is suitable) is valid has the file been completely written to disk.
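A minimal sketch of that idea with Qt's built-in CRC-16 helper qChecksum (the function name and the trailing text line are just illustrations):

#include <QFile>
#include <QByteArray>

// Append a CRC-16 of the payload as the last line; on load, recompute
// the checksum over the payload and compare to detect truncation.
bool writeWithChecksum( const QString &path, const QByteArray &payload )
{
    QFile f( path );
    if ( !f.open( QIODevice::WriteOnly ) )
        return false;
    const quint16 crc = qChecksum( payload.constData(), uint( payload.size() ) );
    f.write( payload );
    f.write( "\nCRC:" + QByteArray::number( crc ) + "\n" );
    return f.flush();
}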


@spek:

 

You don't need to (indeed, you should not) delete the old file when renaming.

 

The rename() call guarantees to atomically replace an existing file. That is its POSIX behaviour; I doubt very much that the Qt wrapper affects that.

 

If we do delete() then rename(), then you're right, there is a window of time when the file doesn't exist (which can potentially cause data loss).
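If QFile::rename really does refuse to overwrite an existing target (as you observed), you could fall back to the standard rename directly; for example (file names taken from your earlier post):

#include <cstdio> // std::rename wraps POSIX rename(2) on Linux

// Atomically replaces the old file; no window where it doesn't exist.
if ( std::rename( "somelocation/myfile.dat.temp",
                  "somelocation/myfile.dat" ) != 0 )
{
    std::perror( "rename failed" );
}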

