Why did my SSD just die?

23 comments, last by 21st Century Moose 8 years, 5 months ago

Never had an SSD fail so far. Do you mean that they reach the end of their wear level, as shown by various SSD tools (for example Intel's and Samsung's bundled tools), or that they just die from some other malfunction?

I have an Intel 510 120 GB that's going on 5 years, has been full or near full for at least half of that time, and has had to endure a lot of source-code storage and temporary build files from Visual Studio, so it's at least somewhat well used.


Never had an SSD fail so far. Do you mean that they reach the end of their wear level, as shown by various SSD tools (for example Intel's and Samsung's bundled tools), or that they just die from some other malfunction?

They typically die for other reasons.

There have been many different writeups, and some are quite comprehensive. This one from Tom's Hardware back in 2011 covered data released by Google and SoftLayer, along with data from technology analysts. Between them they had many hundreds of thousands of drives they could watch and monitor over the course of five years.

This graph probably describes the rates better than any other:
[Graph: google_afrage_450.png (annualized failure rate by drive age)]

That is the annualized failure rate, looking at drives in big datacenters like Google's. Some are used continuously, 24/7; others more sporadically, only a few hours per week. Thousands of drives in lots of situations, across a wide range of usage patterns.

Note that these were corporate server failure rates. You almost certainly use your drives much less, so extend the graph dates by a few years.

The graph shows that most drives arrive just fine, but then about 3% fail within the first three months. Another roughly 2% fail between three and six months, and another roughly 2% fail between six months and a year. By the end of five years, about 40% of these drives have failed.
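To make the compounding a bit more concrete, here is a minimal sketch of how per-period failure rates add up to a large cumulative figure. The rates below are only rough illustrative assumptions in the spirit of the graph above, not values taken from the survey.

```python
# Rough sketch: compounding per-period failure rates into a cumulative
# failure figure. The rates are illustrative assumptions only, loosely
# shaped like the survey data described above.
periods = [
    ("0-3 months",  0.03),
    ("3-6 months",  0.02),
    ("6-12 months", 0.02),
    ("year 2",      0.08),
    ("year 3",      0.09),
    ("year 4",      0.09),
    ("year 5",      0.10),
]

surviving = 1.0
for label, rate in periods:
    surviving *= 1.0 - rate  # fraction of drives still alive after this period
    print(f"after {label}: {100 * (1 - surviving):.1f}% of drives have failed")
```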


They can die for any number of reasons. When drives warm and cool, parts expand and contract, and the boards and circuitry may not handle it: maybe something cracks, maybe something cools too fast or too slow and expands the wrong way, triggering a short. Maybe something overheats. Maybe they are hit by random radiation that kills a circuit. Maybe it was a subtle manufacturing defect that took a small amount of wear and tear to uncover.

Their survey discusses how specific vendors are more likely to have better drives and others worse, and how certain models are known for reliability while others are known for rapid failure, but ALL models from ALL vendors had unpredictable failures. Some rates were very small, about a quarter of one percent AFR, but that is still not zero; that is one drive in four hundred that dies young.

No matter the reason, no matter the brand, a small number of drives die early. Some drives fail within weeks. Other drives can run continuously for a decade without problems. But sooner or later every drive will die. You don't know when that is. You need to be prepared for it.

Personally I don't trust a work disk drive at all after three years, and I don't trust my home's disk drives after five years. At that point they are due for replacement.


They typically die for other reasons.

...

Great post, thanks!

Just to drive Frob's point home: wearing out the flash (as opposed to a defect) is really a non-issue. Firstly, the write ratings on flash are a lower bound, and flash cells commonly go well beyond their rating without issues; some have been recorded lasting tens of times longer than their rating. Secondly, even at 'only' 3000 cycles, that's around 8 years of entirely replacing the drive's contents once a day, and more than 10 years if you give it weekends off. Wear-leveling multiplies that by a factor proportional to how much of your disk's contents do not change.
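For a feel for that arithmetic, here is a minimal back-of-the-envelope sketch. The capacity, cycle rating, and daily write volume are illustrative assumptions, and it presumes ideal wear-leveling that spreads writes evenly across every cell.

```python
# Back-of-the-envelope flash endurance estimate. All numbers here are
# illustrative assumptions, and ideal wear-leveling is presumed (writes
# spread evenly across every cell).
capacity_gb     = 120    # drive capacity
rated_cycles    = 3000   # program/erase cycles per cell (a lower bound)
daily_writes_gb = 120    # rewriting the entire drive contents once a day

total_write_budget_gb = capacity_gb * rated_cycles
lifetime_days = total_write_budget_gb / daily_writes_gb

print(f"~{lifetime_days / 365:.1f} years of daily full rewrites")     # ~8.2 years
print(f"~{lifetime_days * 7 / 5 / 365:.1f} years with weekends off")  # ~11.5 years
```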

throw table_exception("(? ???)? ? ???");

I had a Mushkin SSD for 6 months and it died; it didn't show up after a reboot either. I got it replaced under warranty, but it has been 6 months and I still haven't plugged it in. I already had to reinstall Windows on a standard hard drive while waiting for it to ship there and back for a replacement, so I didn't want to reinstall Windows again two weeks later.

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal

I already had to reinstall Windows on a standard hard drive while waiting for it to ship there and back for a replacement, so I didn't want to reinstall Windows again two weeks later.

Haha, I will need to install both Windows and Linux, ...the latter will take me days to get back to the way it was. I should have kept more backups. :)

In the meantime I tested it on other PCs. Seems like it's totally gone.

I'm sure I remember the marketing spiel for the first solid-state drives saying things like "say goodbye to hard disk failure" and that they would be more reliable than a disk with moving parts. I've had IDE disks that lasted ten years in constant use; until the average for a solid-state drive matches that, I think they were perhaps fibbing a bit...

I've never had an HDD failure (except an expensive NAS/"enterprise" grade one in my company NAS)...
The first SSD that I owned died in 6 months (no-show in BIOS, unrecoverable). But since then I've had 5 more SSDs and no failures.

I'm not sure that's enough data to even pull anecdotal evidence from.
I'd be interested to know the large-scale SSD vs HDD failure rates. There have been some SSD scares/controversies, but I don't know of a trend.
IMHO they are an essential component in any new PC now, and I trust them :)
(but still: backup!)

I think that SSD lifespan is much like that of RAM. Pick a wide bus and slower frequencies, keep it cooled, and you should be set for the lifespan of a solid-brand DDR3 1333 MHz RAM stick. Still not very long.

I wonder why there aren't instead HDDs with 1 GB of cache and that sort of thing.

I've seen plenty of HDD failures, especially in the enterprise/server area, where things are very different when a disk has tens or hundreds of people hitting it constantly and concurrently for an 8 to 10 hour period each day.

My understanding is that the failure characteristics of HDDs and SSDs are very different. HDD failure builds up slowly, starting with bad blocks, and it can be quite some time before a failing disk is actually unusable. An SSD tends to die in style: it goes out instantly. So it's easy to see how a hypothetically more robust technology that fails instantly can actually seem more failure-prone than one that takes a long time to die. There may actually be a lot of HDDs out there that are in the process of failing, but because it's just damaged blocks and sectors the disk is still (mostly) usable, so it doesn't look that way.
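As a practical aside on spotting that slow HDD decline before the disk becomes unusable, here is a minimal sketch that reads a few SMART attributes via smartctl (from smartmontools). The device path and the exact attribute names are assumptions; different vendors expose different attribute sets, so check your drive's own `smartctl -A` output.

```python
# Minimal sketch: watch a few SMART attributes that tend to creep up as
# an HDD develops bad sectors. Device path and attribute names are
# assumptions; vendors differ, so inspect `smartctl -A` for your drive.
import subprocess

DEVICE = "/dev/sda"  # hypothetical device path
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

output = subprocess.run(
    ["smartctl", "-A", DEVICE],
    capture_output=True, text=True, check=True,
).stdout

for line in output.splitlines():
    fields = line.split()
    if len(fields) >= 10 and fields[1] in WATCHED:
        # Attribute table columns: ID# ATTRIBUTE_NAME ... RAW_VALUE (last field)
        print(f"{fields[1]}: raw value {fields[-1]}")  # non-zero raw values suggest trouble
```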

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

