Dealing with corrupted files in Linux



#1 spek   Prime Members   -  Reputation: 996


Posted 01 July 2013 - 02:49 AM

Hey,
 
We have a Linux ARM device that can shut off at any moment (power loss). No big deal, except that there is a chance that the device was storing statistics in a file at that moment. Even though the chance is minimal, and the device will still be powered for ~1 second to finish things, I'd like to make sure the (Qt C++) code can deal with corrupted files.
 
For the info:
- Application gets forced to quit (so the Qt "aboutToQuit" events and such aren't called!)
- File system is stored on a SD card
- Running Linux Ubuntu
- I'm storing a few small text files once in a while for statistics or settings.
- The files are loaded at start-up, and written every X minutes or on a certain event.
 
 
First of all, what exactly happens if my program gets forced to quit (by the OS) during a file-write operation (say, a loop that writes string lines)? I assume the file is just incomplete then, but can it also get corrupted by this?
 
Second, if a file gets corrupted due to a power loss, does it only affect that single file, or could there be more problems? I guess not, but when reading about the topic they sometimes mention the whole file system getting corrupted. That sounds serious.
 
Third, in case the file gets corrupted somehow, what will happen if I try to open it next time?
file->open(QIODevice::ReadOnly)
I want to test somehow if the file can be opened, and if it's complete (maybe by putting an "EOF" as a last line).
 
Last, in case I detect a bad file, I could jump back to an earlier backup. But can I also delete the corrupted file, or overwrite it with the backup?
 
 
Regards,
Rick
 


#2 SimonForsman   Crossbones+   -  Reputation: 6109


Posted 01 July 2013 - 03:52 AM

It depends on what filesystem you use.

 

ext3/4 only do journaling on metadata by default (which protects against filesystem corruption, but not against data corruption). They can be made to do data journaling as well if you mount them with data=journal; this causes them to write all data completely to the log before writing it to the actual file. If the system crashes or loses power while the data is being written, the system will complete the write when it restarts; if it crashes while the log is being written, the corrupt log entry is simply ignored (leaving you with the data as it was before the write started).
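
For example (a sketch, not something taken from the OP's device): on an ext4 data partition the option can be set in /etc/fstab; the device name and mount point below are placeholders.

/dev/mmcblk0p1  /data  ext4  data=journal,relatime  0  2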

 

ZFS and BTRFS should do data journaling by default. (BTRFS is still fairly immature though and hasn't really been tested that much; ZFS is well tested in the field, but its license isn't GPL compatible, so you need a userspace driver to use ZFS on Linux, which is a bit slow, and you can't run the OS itself from a ZFS partition. If you wish to use ZFS, I'd recommend using some BSD- or Solaris-based OS instead.)

 

Your files cannot be corrupted if the OS kills the application (the actual file write is done by the OS; your application just tells the OS what it wants written). If you are issuing multiple write requests (in a loop, for example) and the OS kills your application (why would it do that?), then you might end up with an incomplete file.

 

Just remember that data journaling adds quite a bit of overhead on all write operations.


Edited by SimonForsman, 01 July 2013 - 03:56 AM.

I don't suffer from insanity, I'm enjoying every minute of it.
The voices in my head may not be real, but they have some good ideas!

#3 spek   Prime Members   -  Reputation: 996


Posted 01 July 2013 - 03:58 AM

Sounds good, though I don't know which filesystem is being used by this device. My instinct says journaling isn't enabled, as the manuals didn't mention it, though they did warn me about being careful with writing files. But I could be wrong. Is there some way I can check what kind of filesystem is used, and whether this option is enabled? (I'm a Linux noob.)

 

In case I can't use journaling, what would the answers to those same questions be?

 

Thanks,

Rick



#4 markr   Crossbones+   -  Reputation: 1653


Posted 01 July 2013 - 04:09 AM

Normally rewriting the contents of a file is a no-no. If it's a small text file (as you say), then you should atomically replace it with a newer one.

What I normally do is:

 

* Create a new temporary file in the same directory (this means it will probably be on the same filesystem)

* Write the contents to the new (temporary) file

* Call fdatasync on the new file

* Rename the new file over the previous file - this is an atomic operation
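
For illustration, a minimal sketch of those four steps with raw POSIX calls (the function name, file names, and error handling are simplified placeholders, not a drop-in implementation):

#include <cstdio>    // rename()
#include <cstring>   // strlen()
#include <fcntl.h>   // open()
#include <unistd.h>  // write(), fdatasync(), close()

// Write data to tmp, force it to disk, then atomically rename it over target.
bool atomicReplace(const char *target, const char *tmp, const char *data)
{
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return false;

    ssize_t len = (ssize_t)strlen(data);
    bool ok = write(fd, data, len) == len   // contents of the replacement file
           && fdatasync(fd) == 0;           // data is on disk before the rename
    close(fd);

    // rename() atomically replaces an existing target on POSIX systems
    return ok && rename(tmp, target) == 0;
}

If extra safety is wanted for the power-loss case discussed below, the containing directory can also be opened and fsync()ed after the rename, so the directory entry itself is known to be on disk; whether that is necessary depends on the filesystem.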

 

In particular, if your program crashes (and the OS continues to run), at no point will an incomplete or broken file exist.

 

Unfortunately, this might not be enough for the case where the system loses power. Calling fdatasync on the new file is probably a good idea, but still might not be sufficient. It is unclear to me whether there is still some case where the rename() will succeed, and result in the new file renamed over the old file, but with its contents not fully written.

 

In any case, you're dependent on the underlying hardware not lying to the OS about whether data has been written persistently.



#5 SimonForsman   Crossbones+   -  Reputation: 6109


Posted 01 July 2013 - 04:11 AM

You can type mount in the console to see what partitions you have mounted, what filesystems they use, and what options are enabled.
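
A line of the output might look like this (illustrative only; the exact format varies), with the data option at the end:

/dev/mmcblk0p1 on / type ext4 (rw,relatime,data=journal)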

 

Although it isn't very likely that data journaling is enabled on a desktop distribution such as Ubuntu. (There is a quite significant performance penalty for it; the system most likely uses ext3 or ext4 with metadata journaling only.)

 

 

If you can't use data journaling your best bet is to rotate your files so that you have something to fall back on in case a file gets corrupt.


I don't suffer from insanity, I'm enjoying every minute of it.
The voices in my head may not be real, but they have some good ideas!

#6 spek   Prime Members   -  Reputation: 996


Posted 01 July 2013 - 05:23 AM

Typing mount tells me the type is "ext4". There is nothing mentioned about journaling though. Maybe I can enable this option; then again, I doubt I should if the manufacturer didn't do it.

 

What I've made now is an auto-backup at startup: if the stored file is newer than the backed-up version, the program will overwrite the backup with the newer version. That doesn't fix any problems yet, but at least we have some data in case bad things happen. Eventually a restore script could be made in case things get really nasty.

 

What we could do is try to open the most recent file. If it fails for whatever reason, it should open the backup instead and delete the corrupted file. You would lose your latest changes, but so be it then. I think it rarely happens in practice, as the file writes happen at a low frequency or when explicitly storing something on the device (which is not the typical moment to shut off the power at the same time).



#7 Bregma   Crossbones+   -  Reputation: 5133


Posted 01 July 2013 - 06:44 AM

First of all, what exactly happens if my program gets forced to quit (by the OS) during a file-write operation (say, a loop that writes string lines)? I assume the file is just incomplete then, but can it also get corrupted by this?

 

You would get a truncated file.  Unless you're explicitly opening for rewrite and positioning the file pointer yourself, your file is getting overwritten, not rewritten, so it will not get corrupted.
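
To make the distinction concrete, a small Qt sketch of the two open modes (file name invented; QIODevice::WriteOnly truncates by default):

QFile f( "stats.txt" );

// Overwrite from scratch: WriteOnly truncates first, so an interrupted
// write leaves a short file, never a mix of old and new bytes.
f.open( QIODevice::WriteOnly );
f.close();

// In-place rewrite: ReadWrite keeps the old contents, so an interrupted
// write can leave a mix of old and new data.
f.open( QIODevice::ReadWrite );
f.seek( 0 );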

 

Second, if a file gets corrupted due to a power loss, does it only affect that single file, or could there be more problems? I guess not, but when reading about the topic they sometimes mention the whole file system getting corrupted. That sounds serious.

 

Non-journalling filesystems can get their metadata corrupted by an unexpected power loss.  That's exactly why journalled filesystems were introduced. In all the years I've used Linux with ext3 (and more recently ext4) filesystems I've never seen metadata corrupted by sudden and unexpected power loss.  The journalling uses a commit/rollback semantic.

 

Third, in case the file gets corrupted somehow, what will happen if I try to open it next time?
file->open(QIODevice::ReadOnly)
I want to test somehow if the file can be opened, and if it's complete (maybe by putting an "EOF" as a last line).

 

If your data is corrupt, you'll be able to open and read the file (because the metadata is OK), but it may contain unexpected or missing data.  If the metadata is corrupt, you will probably not be able to find your file to open it.

 

If you really do have about a second of frantic saving before the power rails are pulled, it's likely that the OS will be able to flush any outstanding buffers.  If you are using a journalled file system (without data journalling) the worst case scenario is that your app did not send some buffers to the OS and you have some truncated files.

 

Remember, Linux is used on a majority of the systems running the internet and is responsible for trillions of dollars in data.  Easily millions of dollars have been spent over the years making data integrity extremely robust.  That's not to say you can't do an end-run:  most people use FAT filesystems on SD cards, where all integrity bets are off.  You might want to double-check what you're using on the SD card.


Stephen M. Webb
Professional Free Software Developer

#8 spek   Prime Members   -  Reputation: 996


Posted 01 July 2013 - 07:07 AM

The flash card is in the FAT format. Journalling seems to be turned off, though I'm not sure. When typing "mount" in the terminal, journalling doesn't get mentioned. What I do get is something like this:

/dev/root on / type ext4 (ro,relatime,barrier=1,data=ordered)

 

Well, the files aren't really critical; just as long as the device is still able to fall back on a backup somehow, and doesn't lock itself up, I'd be happy.

 

 

About using "fsync(data)" and performing an atomic rename, some questions.

If I understand it correctly, "file.write( bla ... )" may not write directly (to save lots of individual write cycles). It has to be forced with fsync / fdatasync. However, note that I'm using the Qt QFile. Is the syncing strategy any different then?

 

As for the rename, Qt has a "QFile::rename", but I'm not sure if that is an atomic operation, and it doesn't seem to replace the previous, older file either.



#9 wack   Members   -  Reputation: 1305


Posted 01 July 2013 - 09:24 AM

Have you considered using some kind of database engine for storing your statistics instead? Atomicity is one of the problems that database engines usually solve. Even SQLite should do.

http://www.sqlite.org/transactional.html

If it is important that the statistics don't get corrupted, you should consider this.
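
For what it's worth, a minimal sketch of what that could look like with the SQLite C API (the stats table and its columns are invented for illustration; real code would use parameter binding instead of snprintf):

#include <sqlite3.h>
#include <cstdio>

// Update both statistics in one transaction: after a power loss, SQLite
// guarantees that either both values or neither were written.
bool storeStats(sqlite3 *db, double hours, double km)
{
    char sql[160];
    std::snprintf(sql, sizeof sql,
        "BEGIN; UPDATE stats SET hours=%f, distance=%f; COMMIT;",
        hours, km);
    return sqlite3_exec(db, sql, nullptr, nullptr, nullptr) == SQLITE_OK;
}

int main()
{
    sqlite3 *db = nullptr;
    if (sqlite3_open("stats.db", &db) != SQLITE_OK)
        return 1;
    // Create the table and seed a single row on first run.
    sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS stats(hours REAL, distance REAL);"
        "INSERT INTO stats SELECT 0,0 WHERE NOT EXISTS (SELECT 1 FROM stats);",
        nullptr, nullptr, nullptr);
    storeStats(db, 123.5, 4567.8);
    sqlite3_close(db);
    return 0;
}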

#10 spek   Prime Members   -  Reputation: 996


Posted 01 July 2013 - 02:38 PM

Well, it's only a few statistics (it's a vehicle; stuff like hours and distance get stored), so that might be a bit of overkill. And if the files do get corrupted, it's not the end of the world. At least not if the program just keeps working and can store files again. But thanks nevertheless for the hint!



#11 frob   Moderators   -  Reputation: 21297


Posted 01 July 2013 - 03:10 PM

Well, it's only a few statistics (it's a vehicle; stuff like hours and distance get stored), so that might be a bit of overkill. And if the files do get corrupted, it's not the end of the world. At least not if the program just keeps working and can store files again. But thanks nevertheless for the hint!

 

markr has the correct solution above.  

 

You should always follow this pattern. It doesn't matter if you are writing to disk or making an assignment operator in code. It does not matter if the code is insignificant temporary code that will probably go away.  It doesn't matter if the pattern feels like overkill. The pattern applies to copies of everything, not just files on disk. It is the pattern to follow.

 

Get into the habit: whenever you are replacing something, first make a temporary object that has all the properties you need, and after it is successfully created, swap the temporary for the old one.
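
A sketch of that same principle applied to an assignment operator, i.e. the classic copy-and-swap idiom (the Stats class is invented for illustration):

#include <string>
#include <utility>

class Stats {
    std::string data;
public:
    // Copy-and-swap: build the complete replacement first, then swap it in.
    // If building the copy fails, *this is untouched; the same principle as
    // writing a temp file and renaming it over the old one.
    Stats &operator=( const Stats &other )
    {
        Stats tmp( other );            // may throw; old state stays intact
        std::swap( data, tmp.data );   // cannot fail
        return *this;
    }
};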


Check out my personal indie blog at bryanwagstaff.com.

#12 markr   Crossbones+   -  Reputation: 1653


Posted 01 July 2013 - 08:40 PM

@spek:

 

There is no API in POSIX (or anywhere else, probably) to atomically replace the contents of a file. This is in contrast to, say, SQL where we can do an UPDATE statement and reasonably expect it to be atomic.

 

If I replace the contents of a text file, this normally involves (at least) three operations: open, at least one write and close. If the open mode is for writing and truncating, the file is truncated immediately after the open. If a program crashes between the open and the write, then we end up with an empty file.

 

This is why we use the rename() trick - it means we can atomically replace the file.

 

The reason for the fsync is different however. Normally writes to the filesystem - data and metadata - are not done immediately. Instead, the kernel waits a short time before completing the write (maybe about 30 seconds on a desktop-tuned otherwise idle system). This is either because the IO hardware is still busy from another operation, or to optimise several writes into one operation (for example, in some cases a write never needs to be done at all because it is subsequently overwritten or the file is deleted while still in memory). Either way it is an optimisation the kernel makes to allow a process to continue to run while its disc writing is completed in the background.

 

Neither write() nor close() normally flushes the contents of the file to disc. It will simply be in (volatile) RAM, and if the power is lost at this point, the filesystem will probably look on reboot as if the write was never done. If you're really unlucky, the power will fail part way through the file creation and you may end up with either a partially written file or an empty file.

 

The fsync() is done so that the data are *definitely* in the file on the disc before it's renamed, so that we can guarantee we don't end up with an as-yet-unwritten file atomically renamed over the previous one, resulting in an empty file. fsync() blocks the calling process until the file is actually on permanent storage (assuming no part of the IO subsystem "lies").

 

I suspect that using QFile makes no difference to this, as it's just a wrapper for the POSIX IO functions. The Qt rename function is also likely to be a wrapper for the standard rename function, hence also atomic.

 

---
 

As others have suggested, you could use sqlite, which jumps through some pretty complicated hoops to ensure that transactions are atomic (i.e. are rolled back completely after a power failure etc). But it may be overkill.



#13 spek   Prime Members   -  Reputation: 996


Posted 02 July 2013 - 01:56 AM

Agreed with you guys that making a temporary file first is the smart & clean way. It wouldn't be the first time files got trashed because of some bug in an updated version of the software!

 

I'm writing everything to a "temp" file first now. Not 100% sure if this is the recommended way in Qt, but to force the file write I have:

QFile f( "somelocation/myfile.dat.temp" );
if ( f.open( QIODevice::WriteOnly ) ) {
    f.write( monkeybusiness );
    f.flush();             // push Qt's internal buffer to the OS first
    fsync( f.handle() );   // then ask the OS to put it on disk
    f.close();
}

 

But I'm still a bit concerned about the rename. The only way I know (in Qt) is:

QFile::remove( targetFileName ); // remove previous, otherwise the rename doesn't work
QFile::rename( tempFileName, targetFileName );

But... what if stuff stops between the delete & rename? I could load the temp file in case "targetFileName" doesn't exist (meaning the original file got deleted, but the temp didn't get renamed yet). But maybe there is a cleaner way.

 

 

 

By the way, as for power loss, note that the hardware still remains on for a short moment. In practice my application gets forced to quit BEFORE the hardware actually shuts down. A more realistic problem is that the app closes between operations. That may also mean a file does not get closed properly. Could that cause problems after a restart?

 

Cheers



#14 Tribad   Members   -  Reputation: 854


Posted 02 July 2013 - 03:29 AM

There is a special filesystem for SD cards available on Linux. Because writing to an SD card is very slow compared to a rotating hard disk, the chance of an incomplete IO operation on such a device is higher. Writing the new content to a tmp file and then renaming it is quite a good idea; making a backup of the previous version as well. This is as secure as a journaling filesystem, but works even with FAT or any other non-journaling filesystem.

Write a checksum at the end of your file. Only if this checksum (SHA, CRC, or whatever you think is suitable) is valid has the file been completely written to disk.
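
A minimal sketch of that idea with Qt's built-in CRC-16 helper qChecksum (the "payload plus one trailing checksum line" layout is just one possible convention):

#include <QFile>
#include <QByteArray>

// Append a CRC-16 of the payload as the last line of the file...
void writeWithChecksum( QFile &f, const QByteArray &payload )
{
    f.write( payload );
    f.write( "\n" + QByteArray::number( qChecksum( payload.constData(),
                                                   payload.size() ) ) );
}

// ...and on load recompute it; a mismatch (or a missing checksum line)
// means the file was not completely written.
bool readWithChecksum( QFile &f, QByteArray &payload )
{
    QByteArray all = f.readAll();
    int split = all.lastIndexOf( '\n' );
    if ( split < 0 )
        return false;
    payload = all.left( split );
    return all.mid( split + 1 ).toUShort()
        == qChecksum( payload.constData(), payload.size() );
}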



#15 markr   Crossbones+   -  Reputation: 1653


Posted 02 July 2013 - 08:07 AM

@spek:

 

You don't need to (indeed, you should not) delete the old file when renaming.

 

The rename() call guarantees to atomically replace an existing file. That is its POSIX behaviour; I doubt very much that the Qt wrapper affects that.

 

If we do delete() and then rename, then you're right, there is a window of time when the file doesn't exist (which can potentially cause data loss).
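
Since spek observed that QFile::rename won't replace an existing target (Qt's own rename does refuse to overwrite), one option is to drop down to the standard call for this single step. A sketch:

#include <cstdio>    // std::rename wraps POSIX rename() on Linux
#include <QString>

// Atomically replace target with tmp; no separate delete step, so there
// is never a moment when neither file exists.
bool atomicRename( const QString &tmp, const QString &target )
{
    return std::rename( tmp.toLocal8Bit().constData(),
                        target.toLocal8Bit().constData() ) == 0;
}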





