Random Access Memories

May 11, 2009

Why RAID 5 stops working in 2009

by @ 7:20 am. Filed under Business, Personal, Technology

I noticed this on ZDNet. It’s well worth reading.

Why RAID 5 stops working in 2009
By Robin Harris, July 18th, 2007

The storage version of Y2K? No, it’s a function of capacity growth and RAID 5’s limitations. If you are thinking about SATA RAID for home or business use, or using RAID today, you need to know why.

RAID 5 protects against a single disk failure. You can recover all your data if a single disk breaks. The problem: once a disk breaks, there is another increasingly common failure lurking. And in 2009 it is almost certain to find you.

Disks fail
While disks are incredibly reliable devices, they do fail. Our best data – from CMU and Google – finds that over 3% of drives fail each year in the first three years of drive life, and then failure rates start rising fast.

With 7 brand new disks, you have ~20% chance of seeing a disk failure each year. Factor in the rising failure rate with age and over 4 years you are almost certain to see a disk failure during the life of those disks.
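The ~20% figure can be checked with a few lines of Python. This is only a rough sketch: it assumes each disk fails independently at a constant 3% annual rate, and the function name `p_any_failure` is made up for illustration. It does not model the rising failure rate with age mentioned above.

```python
def p_any_failure(n_disks: int, annual_failure_rate: float, years: float = 1.0) -> float:
    """Chance that at least one of n_disks fails within `years`.

    Assumes independent disks and a constant annual failure rate,
    which is a simplification of real-world behaviour.
    """
    # Probability that a single disk survives the whole period.
    p_one_survives = (1.0 - annual_failure_rate) ** years
    # All disks must survive for the array to see no failure.
    return 1.0 - p_one_survives ** n_disks


if __name__ == "__main__":
    # 7 new disks, 3% annual failure rate, one year.
    print(f"{p_any_failure(7, 0.03):.1%}")
```

For 7 disks at 3% this comes out at roughly 19%, in line with the ~20% quoted.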

But you’re protected by RAID 5, right? Not in 2009.

Reads fail
SATA drives are commonly specified with an unrecoverable read error (URE) rate of one error per 10^14 bits read. Which means that once every 100,000,000,000,000 bits, the disk will very politely tell you that, so sorry, but I really, truly can’t read that sector back to you.

One hundred trillion bits is about 12 terabytes. Sound like a lot? Not in 2009.

Disk capacities double
Disk drive capacities double every 18-24 months. We have 1 TB drives now, and in 2009 we’ll have 2 TB drives.

With a 7-drive RAID 5 disk failure, you’ll have 6 remaining 2 TB drives. As the RAID controller is busily reading through those 6 disks to reconstruct the data from the failed drive, it is almost certain it will see a URE.
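The rebuild risk can be estimated with a back-of-the-envelope model. The sketch below assumes the spec'd rate acts as an independent per-bit error probability and that a terabyte is 10^12 bytes. Both are simplifications, and `p_ure_during_rebuild` is a hypothetical name, not anything from a RAID tool. The numbers it prints are only as good as those assumptions and need not match the figures quoted in this article.

```python
import math


def p_ure_during_rebuild(n_surviving: int, disk_tb: float,
                         bits_per_error: float = 1e14) -> float:
    """Chance of at least one unrecoverable read error while reading
    every bit of the surviving disks during a RAID 5 rebuild.

    Assumption: each bit read fails independently with probability
    1 / bits_per_error (a simplified reading of the drive spec).
    """
    # Total bits the controller must read: disks * TB * bytes/TB * bits/byte.
    bits_read = n_surviving * disk_tb * 1e12 * 8
    # 1 - (1 - 1/bits_per_error) ** bits_read, computed in a numerically
    # stable way because the per-bit probability is tiny.
    return -math.expm1(bits_read * math.log1p(-1.0 / bits_per_error))


if __name__ == "__main__":
    # Bits in the spec converted to terabytes: 10^14 bits is 12.5 TB.
    print(f"{1e14 / 8 / 1e12} TB per expected error")
    # Six surviving 2 TB drives, SATA-class spec of 1 error per 10^14 bits.
    print(f"{p_ure_during_rebuild(6, 2.0):.0%}")
    # Same array with an enterprise-class spec of 1 error per 10^15 bits.
    print(f"{p_ure_during_rebuild(6, 2.0, bits_per_error=1e15):.0%}")
```

Changing `disk_tb` or `bits_per_error` shows how quickly the risk climbs with capacity and how much a tenfold better error rate helps.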

So the read fails. And when that happens, you are one unhappy camper. The message “we can’t read this RAID volume” travels up the chain of command until an error message is presented on the screen. 12 TB of your carefully protected – you thought! – data is gone. Oh, you didn’t back it up to tape? Bummer!

So now what?
The obvious answer, and the one that storage marketers have begun trumpeting, is RAID 6, which protects your data against 2 failures. Which is all well and good, until you consider this: as drives increase in size, any drive failure will always be accompanied by a read error. So RAID 6 will give you no more protection than RAID 5 does now, but you’ll pay more anyway for extra disk capacity and slower write performance.

Gee, paying more for less! I can hardly wait!

The Storage Bits take
Users of enterprise storage arrays have less to worry about: your tiny, costly disks have less capacity and thus a smaller chance of encountering a URE. And your spec’d URE rate of one error per 10^15 bits also helps.

There are some other fixes out there as well, some fairly obvious and some, I’m certain, waiting for someone much brighter than me to invent. But even today a 7 drive RAID 5 with 1 TB disks has a 50% chance of a rebuild failure. RAID 5 is reaching the end of its useful life.

