# Stunned after reading a data sheet


8 replies to this topic

### #1samoth  Members

Posted 07 November 2013 - 11:03 AM

I just read through the data sheet of a 4TB Seagate hard drive, which says "Nonrecoverable Read Errors per Bit: 1 sector per 10^14". I guess that's no better or worse than any other hard disk's error rate; maybe you've just never looked, or the manufacturer doesn't publish it.

Anyway, this got me thinking: 10^14 bits is only about 11.5 terabytes. It's a 4TB disk. Therefore, assuming you fill this disk using its complete capacity, there is a 1/3 chance that you have an unrecoverable defect sector. After only one complete write cycle? Note that the emphasis is on "nonrecoverable". I am certainly aware that one bit or another may always flip by accident, that's just how things work, but the drive's FEC will routinely correct that.
If you do 3 complete write cycles, you are statistically more or less guaranteed to have at least one nonrecoverable defect sector. Not seriously?

I must have some fallacy in my reasoning? Surely hard disks must, despite all this, be more reliable than to allow only 3 write cycles?

Edited by samoth, 07 November 2013 - 11:03 AM.

### #2Hodgman  Moderators

Posted 07 November 2013 - 04:40 PM

> assuming you fill this disk using its complete capacity, there is a 1/3 chance that you have an unrecoverable defect sector.
> If you do 3 complete write cycles, you are statistically more or less guaranteed to have at least one nonrecoverable defect sector. Not seriously?
> I must have some fallacy in my reasoning?

Your statistics are a bit off. If 1 cycle has a 33.3% chance of error, then 3 cycles have got a 70% chance of error, not ~100%... So this is obviously misleading.

Does a "1 per 10^14" rate mean that each individual read has a 1/10^14 chance of error? Does it mean that the mean number of reads before 1 error occurs is 10^14? That after 10^14 reads, there's a 50/50 chance that there's been an error or not?

Depending on how that figure is defined, the level of badness could be quite different.

e.g. if each read operation has a 1/10^14 chance of an error, that's the same as each op having a 1-(1/10^14) chance of no error. If you then multiply that by itself four trillion times, (1-(1/10^14))^4000000000000, then the chance of no error after 4 trillion ops is actually 96% (or a 4% chance of error after 4 trillion ops).
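The arithmetic above is easy to check numerically. Here is a minimal Python sketch, assuming the per-operation reading of the figure (each op fails independently with probability 1/10^14). The function name `p_no_error` is made up for illustration. It goes through `math.log1p` because 1 - 1/10^14 sits so close to 1 that naive floating-point exponentiation loses some precision.

```python
import math

def p_no_error(per_op_error: float, ops: float) -> float:
    """Chance that `ops` independent operations all succeed."""
    # (1 - p)^n computed as exp(n * log(1 - p)); log1p stays accurate for tiny p.
    return math.exp(ops * math.log1p(-per_op_error))

p_ok = p_no_error(1e-14, 4e12)
print(f"no error: {p_ok:.1%}, at least one error: {1 - p_ok:.1%}")
```

This should land on roughly 96% and 4%, matching the numbers in the post.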

Edited by Hodgman, 07 November 2013 - 04:49 PM.

### #3TheChubu  Members

Posted 08 November 2013 - 05:55 AM

And, statistically speaking, 4 trillion ops is a lot of ops.

### #4samoth  Members

Posted 08 November 2013 - 09:27 AM

> Your statistics are a bit off. If 1 cycle has a 33.3% chance of error, then 3 cycles have got a 70% chance of error, not ~100%... So this is obviously misleading.

How so? 1 cycle reaches 1/3 of the number of bits written they give in their data sheet. Unless drives are built so they work 100% reliably up to some limit and then suddenly explode, it seems reasonable to expect that passing 1/3 the way to the "target number" gives a 1/3 chance of a failure.

3 cycles, on the other hand reach (actually it's a little more) the number of bits they give in their data sheet as target number for "1 unrecoverable sector". Which, according to what's written means that 1 sector will be unrecoverable. Or something else, depending on what the definition means exactly.

But, reaching the "target" (whatever it is), is 100%, not 70%. No?

> Does a "1 per 10^14" rate mean that each individual read has a 1/10^14 chance of error? Does it mean that the mean number of reads before 1 error occurs is 10^14? That after 10^14 reads, there's a 50/50 chance that there's been an error or not?

Good question. Who knows what the definition is, they don't tell.

I would read it as "after writing 10^14 bits, it's allowed to have at most one unrecoverable sector". This interpretation comes from my general expectation that a harddisk has zero failures (unless you hit it with a hammer or submerge it in water while powered on).

Of course that won't happen in practice. Cosmic radiation, radioactive decay, your cat walking close by, name whatever you like... a bit may always flip for no apparent reason. Both in RAM and on magnetic storage. Rarely, but it happens. For that, drives have error correction.

I have a fully functional 2 year old Samsung drive (if you can ever consider a Samsung drive fully functional) which reports 1924915 Hardware_ECC_Recovered events. This is considered 100/100 ("perfectly good") as user-displayed value, and indeed the drive has never actually shown any kind of failure. It has a reallocation count of 0 and a pending count of 5 (these values didn't change over the last 3 months, so apparently the controller isn't yet sure whether or not to remap those 5 sectors, they're probably the ones causing the ECC_Recovered events).

So I guess (but, who knows!) that the numbers they provide are something they guarantee (or rather specify, you don't really have a guarantee) as upper bound. Though, of course, you don't know when random failure happens. The harddisk might not even spin up when you power it for the first time, that's unlikely but possible (and it would be a one-billion sector failure with zero bits written...). So that "upper bound" would, probably, have to be seen as something like the boundary of the e.g. 99% confidence interval?

Or... something else? Maybe it's a completely made up bogus number that just looks very technical for marketing?

How do you even measure this in a somewhat reliable way? You'd have to test hundreds of drives until they produce a number of random failures (say, a dozen), count the exact number, and divide by the total number of bits written to get that many. That would be immensely time-consuming and expensive.

Maybe, after all, they did such a measurement in 1980 which said 10^10, and since disks are bigger nowadays and technology advances, they simply "extrapolated" it to 10^14. I wouldn't know, and there's hardly a way one could verify this.

> e.g. if each read operation has a 1/10^14 chance of an error

In reality, each read operation has a much higher chance of an error. That's not the same thing, however. The drive will apply error correction and retry (again, with error correction) before reporting an error. And you might be lucky trying again next time.

It's not the same as a sector going defect over time, either. Drives will regularly reallocate sectors (copying data elsewhere) when the FEC is triggered more often than they like or when some other metric tells the controller that the signal/noise ratio in one location isn't that great. This is a "normal" thing.

They're saying unrecoverable sector, which means that no matter what the drive does and no matter how often you try, you're not getting your data back. You wrote data to disk, the drive reported "OK, you're good to go", and now the data is gone. Forever. As if you invested in Lehman's.

Now, inevitably, you're going to say "this is what backups are for". What can I say but "they are, and they are not". Not only do you not know whether your backup contains an unrecoverable sector, but the idea of a backup is also not to plan on losing data, but to be prepared if it happens (note the wording: if, not when).

> 4 trillion ops is a lot of ops

Not such a lot for a harddisk, though.

My Windows system disk has an average of 6.75 million writes per hour (dividing S.M.A.R.T. Lifetime_Writes by Power_On_Hours). Since it is an SSD, I am very careful not to use the system disk for volatile data (I'd hate having to set up Windows again because of disk failure). I don't install anything I do not really need, and I do not copy anything to the system disk that does not absolutely need to be there.
Swap and temp are on a ramdisk (but Windows doesn't give a fuck, it still writes half of its temp files to C:\Windows\Temp), and data as well as programs that I update often (e.g. GCC) go onto the second disk (ironically also an SSD, but it would be less painful to replace than the system disk, obviously). Bulk data and things like downloads are hosted on the NAS, and that's where tasks like unpacking zip files and such happen, too (funny thing to download from the internet only to store on the network again, but heh).

Ironically, although the second drive is "used" a lot more than the first, it has fewer write operations than the system drive (only 36,000 per hour). The WD disk in the NAS doesn't provide a lifetime writes figure, so I can't tell how many writes it sees per hour.

I have all services such as indexing, superfetch, .NET optimization and every other shit that constantly accesses disk for no good reason disabled. No software on the system that is not necessary. Still, Windows manages to generate 600-900 writes per minute when you have no program open and walk away, i.e. not even touching the mouse. If you have a program like Firefox open (but not doing anything!), the number of writes is approximately doubled (presumably because it syncs its settings to disk every few seconds, all writes go to the user profile).

Now, 4 trillion is about 592,000 times 6.75 million, so you might say "ridiculous, never going to happen", but it really isn't. Yes, that is 67 years before failure, but this number is based on wrong figures. First, those are "writes", not "sectors written". Some of these writes will certainly have been larger than one sector, but in any case they were much larger than 1 bit (the numbers are relative to bits written, remember, and a device doesn't write less than one full sector!).  In the case of that Seagate drive this thread is about, sectors are 4096 bytes, so it's more like 18 hours, not 67 years.

Also, if I hadn't disabled most Windows stuff, and if swap and temp didn't go onto ramdisk, and so on, I'd have anywhere from 50 to 100 times more writes than I have now (and that is very optimistic: doing a single build alone generates several thousand temporary files, each with a dozen or so writes, so it might as well be 1000 times more). And suddenly those assumed 67 years (let's assume that although they say "bits", they really meant "sectors") are something between 8 months and 1 1/2 years, which seems very close and tangible.
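For anyone who wants to redo the back-of-the-envelope numbers, here is a small Python sketch of the conversion. The figures are the ones from this post. Treating every write as exactly one 4096-byte sector is a simplifying assumption (some writes are larger), and `hours_to_reach` is just an illustrative helper name.

```python
HOURS_PER_YEAR = 24 * 365.25

target = 4e12                 # the "4 trillion" figure discussed above
writes_per_hour = 6.75e6      # S.M.A.R.T. Lifetime_Writes / Power_On_Hours
bits_per_sector = 4096 * 8    # assume one 4096-byte sector per write (a lower bound)

def hours_to_reach(target_units: float, units_per_hour: float) -> float:
    """Hours needed to accumulate target_units at a constant hourly rate."""
    return target_units / units_per_hour

# Reading the target as a count of writes: decades.
hours_as_writes = hours_to_reach(target, writes_per_hour)
# Reading the target as bits, with each write being one full sector: hours.
hours_as_bits = hours_to_reach(target, writes_per_hour * bits_per_sector)

print(f"{hours_as_writes:,.0f} h = {hours_as_writes / HOURS_PER_YEAR:.1f} years")
print(f"{hours_as_bits:.1f} h")
```

The gap between the two results is simply the factor of 32768 bits per sector.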

### #5swiftcoder  Senior Moderators

Posted 08 November 2013 - 10:12 AM

> I must have some fallacy in my reasoning? Surely harddisks must despite all be a little more reliable than to allow 3 write cycles?

There was a big stink a few years ago about this issue and how it interacted with increasing hard drive capacities and the reliability of RAID5.

RAID5 has built-in redundancy, but unfortunately, once one of the RAID'd disks fails, you have to read the entire remaining array to rebuild the missing disk. If your RAID5 array is over 12 TB, then the manufacturer's specifications state that you are pretty much guaranteed to hit an unrecoverable error before you finish rebuilding the missing disk. And since you don't have any data redundancy while rebuilding the array, any error here results in actual data loss...
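A rough Python sketch of where that threshold comes from, assuming the data-sheet figure is an independent per-bit probability of 1/10^14 (10^14 bits is 12.5 TB in decimal units). The array sizes and the function names are hypothetical, picked only for illustration.

```python
import math

URE_PER_BIT = 1e-14  # data-sheet figure, read as an independent per-bit probability

def expected_ures(bytes_read: float, rate: float = URE_PER_BIT) -> float:
    """Expected number of unrecoverable read errors when reading bytes_read bytes."""
    return bytes_read * 8 * rate

def p_rebuild_hits_ure(surviving_disks: int, disk_bytes: float,
                       rate: float = URE_PER_BIT) -> float:
    """Chance of at least one unrecoverable read while reading every surviving disk."""
    bits = surviving_disks * disk_bytes * 8
    return -math.expm1(bits * math.log1p(-rate))

TB = 1e12
print(expected_ures(12.5 * TB))      # about one expected error per 12.5 TB read
for disks in (2, 3, 5):              # hypothetical arrays of 4 TB drives
    print(disks, f"{p_rebuild_hits_ure(disks, 4 * TB):.0%}")
```

Under this particular reading, one expected error per 12.5 TB works out to roughly a 63% chance of hitting at least one, so "pretty much guaranteed" is on the pessimistic side, but the odds are still bad enough to matter.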

Edited by swiftcoder, 08 November 2013 - 10:15 AM.


### #6frob  Moderators

Posted 08 November 2013 - 12:05 PM

Google has also published their HDD failure rates (pdf) and they have a rather large number of disk drives to build a representative sample from.

The drive manufacturers' failure rates are overstated.


### #7Khaiy  Members

Posted 08 November 2013 - 07:14 PM

> How so? 1 cycle reaches 1/3 of the number of bits written they give in their data sheet. Unless drives are built so they work 100% reliably up to some limit and then suddenly explode, it seems reasonable to expect that passing 1/3 the way to the "target number" gives a 1/3 chance of a failure.
>
> 3 cycles, on the other hand reach (actually it's a little more) the number of bits they give in their data sheet as target number for "1 unrecoverable sector". Which, according to what's written means that 1 sector will be unrecoverable. Or something else, depending on what the definition means exactly.
>
> But, reaching the "target" (whatever it is), is 100%, not 70%. No?

That's like saying that if you flip a coin and get heads, you are guaranteed to get tails if you were to flip it again. Each operation has a chance of success or failure which is independent of all the operations that have already taken place, no matter if they were successful or not. In a case like that you can't just add the probabilities, though you can multiply them. If you were to conduct tests of an infinite number of individual cycles, you could expect the proportion of cycles with a failure to approach 1/3. If we take the per-cycle likelihood only (which is a little dodgy given that each cycle is a grouping of a lot of individual operations, but maybe the best we can do here) you have a ~30% chance of going through three cycles with no errors, i.e. a ~70% chance of at least one.

> Good question. Who knows what the definition is, they don't tell.
>
> I would read it as "after writing 10^14 bits, it's allowed to have at most one unrecoverable sector". This interpretation comes from my general expectation that a harddisk has zero failures (unless you hit it with a hammer or submerge it in water while powered on).

If they don't say, we can't know, but I would bet that it's the mean number of bits written until an operation leaves a sector unrecoverable across some number of hard drives randomly taken off the line. That's the textbook approach for manufacturers to test this sort of thing. It's a passive observation of the failure rate of this particular model (or maybe style) of drive, not a threshold each drive has to clear before being shipped.

> So I guess (but, who knows!) that the numbers they provide are something they guarantee (or rather specify, you don't really have a guarantee) as upper bound. Though, of course, you don't know when random failure happens. The harddisk might not even spin up when you power it for the first time, that's unlikely but possible (and it would be a one-billion sector failure with zero bits written...). So that "upper bound" would, probably, have to be seen as something like the boundary of the e.g. 99% confidence interval?
>
> Or... something else? Maybe it's a completely made up bogus number that just looks very technical for marketing?
>
> How do you even measure this in a somewhat reliable way? You'd have to test hundreds of drives until they produce a number of random failures (say, a dozen), count the exact number, and divide by the total number of bits written to get that many. That would be immensely time-consuming and expensive.

It could be any of those, I guess, but it sounds to me like a description of the mechanical failure rate at which these drives produce unrecoverable sectors. It's very very simple to measure, you just pull random drives off the line (maybe with some stratification from different lots), run them for X operations each, and then count the total unrecoverable failures each drive had and slap that number over the total number of operations undertaken and there's your failure ratio.
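As a sketch of the bookkeeping, the pooled estimate is just total failures over total bits. Everything in this Python example is hypothetical: the drive count, the bits written per drive and the failure counts are invented purely to show the calculation, and `pooled_failure_rate` is an illustrative name, not anything a manufacturer is known to use.

```python
def pooled_failure_rate(failures_per_drive, bits_per_drive):
    """Total unrecoverable sectors divided by total bits written across all drives."""
    total_failures = sum(failures_per_drive)
    total_bits = bits_per_drive * len(failures_per_drive)
    return total_failures / total_bits

# Hypothetical test run: 200 drives, 6e12 bits written to each,
# 12 drives ending up with one unrecoverable sector, the rest with none.
failures = [1] * 12 + [0] * 188
rate = pooled_failure_rate(failures, 6e12)
print(f"about 1 unrecoverable sector per {1 / rate:.1e} bits")
```

With only a dozen observed failures the estimate is noisy, which is the point raised earlier in the thread about how expensive a tight measurement would be.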

Assuming that the distribution of errors is Normal (which I would expect from a mechanical failure) you can get a pretty accurate "typical" failure rate without having to test all that many drives. A confidence interval is the confidence with which you can state that the true population mean (the true average failure rate of all of these particular hard drives ever produced) lies between two numbers X and Y; it doesn't say anything about the likelihood of a particular outcome.

The scenario you're describing, with a one-billion-sector failure, would have a probability of (1/(10^14))^1,000,000,000: the chance of an event that happens ~1 in 10^14 times, multiplied by itself over a billion independent trials. R wouldn't even report that to me with e notation, it just spat out 0.
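The same thing happens in any language that uses double-precision floats, since the result is far below the smallest representable positive value. A short Python sketch, using the same assumed per-sector chance, shows the underflow and the usual workaround of tracking the exponent with logarithms:

```python
import math

p = 1e-14          # per-sector chance, as assumed in the post
n = 1_000_000_000  # a billion sectors failing independently

direct = p ** n                 # underflows to zero in double precision
log10_prob = n * math.log10(p)  # base-10 exponent of the probability

print(direct)
print(log10_prob)
```

The second value comes out at about -1.4e10, i.e. the probability is on the order of 10^(-14,000,000,000).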

> In reality, each read operation has a much higher chance of an error, that's not the same thing, however. The drive will apply error correction and retry (again, with error correction) before reporting an error. And you might be lucky trying again next time.
>
> It's not the same as a sector going defect over time, either. Drives will regularly reallocate sectors (copying data elsewhere) when the FEC is triggered more often than they like or when some other metric tells the controller that the signal/noise ratio in one location isn't that great. This is a "normal" thing.
>
> They're saying unrecoverable sector, which means that no matter what the drive does and no matter how often you try, you're not getting your data back. You wrote data to disk, the drive reported "OK, you're good to go", and now the data is gone. Forever. As if you invested in Lehman's.

But even an unambiguous error has to occur in a spot that matters, right? Like, if an error occurs in a temporary file that just gets overwritten shortly afterwards anyhow, it's a pretty silent problem. Obviously there will be critical sectors, but the odds of an error occurring exactly in one are far, far lower than the odds of one occurring anywhere over an arbitrary number of sectors.

For reference, the error rate in DNA transcription is ~1/10,000 base pairs, which is far higher than the rate in these hard drives. There's error correction in DNA transcription too (sort of), but a much better defense is that only about 5% of human DNA codes for anything and so the chance of an error happening in an important sequence in the first place is very small. Most of the mistakes happen in the regions that have no functional purpose other than their presence.

Edited by Khaiy, 08 November 2013 - 07:17 PM.


### #8Hodgman  Moderators

Posted 08 November 2013 - 11:54 PM

> Your statistics are a bit off. If 1 cycle has a 33.3% chance of error, then 3 cycles have got a 70% chance of error, not ~100%... So this is obviously misleading.
>
> How so?

That's how statistics work.

If we assume that 1 event = 1/3 chance of error, that's the same as 1 event = 2/3 chance of no error.
3 events then have (2/3)*(2/3)*(2/3) chance of no errors, or 29.6%, which is the same as a 70.3% chance of 1+ errors.
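To make the difference between adding and multiplying concrete, here is a tiny Python sketch. It only assumes that the events are independent; `p_at_least_one` is an illustrative name.

```python
def p_at_least_one(p_single: float, events: int) -> float:
    """Chance of one or more errors across independent events."""
    return 1 - (1 - p_single) ** events

for cycles in (1, 2, 3, 6):
    naive = min(1.0, cycles / 3)              # adding the probabilities
    actual = p_at_least_one(1 / 3, cycles)    # multiplying the no-error chances
    print(f"{cycles} cycles: naive {naive:.1%}, actual {actual:.1%}")
```

The naive sum hits 100% at three cycles, while the multiplied version only approaches 100% and never reaches it.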

So when you've said that 1 pass = 100% chance of error, so therefore 1/3 pass = 33.3% chance of error, you've already started violating statistics.

Swift's link seems to imply that this statistic means that when reading any bit on the drive, there's a 1/10^14 chance that it will be unrecoverable.

So, the chance of being able to recover any bit is 1-(1/10^14).

Disk makers use 1000 bytes per KB/MB/GB/TB instead of 1024, so a 4TB drive is 4000000000000 bytes, or 32000000000000 bits.

If you're trying to clone a whole drive, the chance of success (zero unrecoverable reads) is then (1-(1/10^14))^32000000000000, or 72.6%...

That's a 27.38% chance that 1 or more unrecoverable reads will occur, which yep, is a bit worrying.

If you instead buy a drive with 10^15 reliability, then the chance of failure goes down to 3% though...
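Here is that calculation as a short Python sketch, so other capacities and ratings can be plugged in. It assumes the per-bit interpretation used in this post, and `p_clone_fails` is an illustrative name.

```python
import math

def p_clone_fails(capacity_bytes: float, bits_per_error: float) -> float:
    """Chance of at least one unrecoverable bit when reading the whole drive once."""
    bits = capacity_bytes * 8
    return -math.expm1(bits * math.log1p(-1 / bits_per_error))

capacity = 4e12  # 4 TB in decimal units, as drive makers count them
for exponent in (14, 15):
    print(f"10^{exponent}: {p_clone_fails(capacity, 10.0 ** exponent):.2%}")
```

The results should come out close to the 27% and 3% figures above.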

### #9swiftcoder  Senior Moderators

Posted 12 November 2013 - 02:19 PM

I just ran across this rather relevant article from a storage company. Worth a read.

