Database Reliability Engineering: How to Choose a RAID Level for Your Database Infrastructure

What is RAID?

RAID stands for Redundant Array of Independent Disks (the term was originally defined as Redundant Array of Inexpensive Disks). As the name indicates, a RAID array is made up of multiple, independent disk drives. RAID uses these disks to provide fault tolerance, to improve overall performance, and to increase storage capacity, in contrast with older storage setups that stored data on a single disk drive. How the data is distributed across the drives depends on the RAID level used.

The main advantage of RAID is that the array of disks is presented to the operating system as a single disk. RAID is fault tolerant because in most RAID levels the data is stored redundantly across multiple disks, so if one disk fails (or, with some levels, two), the data remains safe and the operating system is not even aware of the failure. Data loss is prevented because the data can be recovered from the disks that have not failed.

RAID should not be confused with backup. Although some RAID levels provide redundancy, we always recommend a separate storage system for backup and disaster recovery.

Benefits of RAID

Higher read/write throughput, which speeds up transactions
Redundant data storage to prevent data loss in case of a disk failure
Combining several hard drives to provide large capacity

When should I use RAID?

RAID is extremely useful if reliability and data redundancy are important to you. Even if you take backups, restoring them takes time, and those backups could be hours or days old, resulting in data loss. A redundant RAID level allows you to survive a drive loss without data loss and, in many cases, without any downtime.
RAID is also useful if you are having disk I/O issues, where applications are waiting on the disk to perform tasks. RAID provides additional throughput by allowing you to read and write data across multiple drives instead of a single drive. Additionally, a hardware RAID card typically includes its own memory for caching, which reduces the load on the physical disks and increases overall performance.

RAID Storage Techniques

RAID arrays store data using one or more of the following techniques:

striping
mirroring
parity
combination of the above

Striping writes data to the member disks by splitting the data stream into blocks of a fixed size (the stripe unit) and writing those blocks to the disks in turn.
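A minimal sketch of how striping maps data onto disks, assuming equal-size disks numbered from 0, a stripe unit of `chunk_blocks` blocks, and simple round-robin placement. The function name `locate_block` is hypothetical; real controllers add their own metadata and layout details.

```python
def locate_block(logical_block: int, num_disks: int, chunk_blocks: int = 1) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk)."""
    if num_disks < 1 or chunk_blocks < 1:
        raise ValueError("num_disks and chunk_blocks must be positive")
    chunk = logical_block // chunk_blocks   # which stripe unit the block falls in
    within = logical_block % chunk_blocks   # position inside that stripe unit
    disk = chunk % num_disks                # stripe units go to the disks in turn
    stripe_row = chunk // num_disks         # full rows that precede this unit
    return disk, stripe_row * chunk_blocks + within


if __name__ == "__main__":
    # Six consecutive blocks on three disks, one block per stripe unit.
    for block in range(6):
        print(block, locate_block(block, num_disks=3))
```

Blocks 0, 1 and 2 land on disks 0, 1 and 2, and block 3 wraps back to disk 0 on the next row, which is why a large sequential read keeps every disk busy at once.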

Mirroring stores identical data copies on the array member disks.

Parity combines striping (splitting data into blocks) with the calculation of a checksum, called parity, which is written to the member disks so that a lost block can be reconstructed.
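In RAID 4 and RAID 5 this checksum is a byte-wise XOR of the data blocks in a stripe. A minimal sketch, with hypothetical helper names, that computes XOR parity and rebuilds one lost block from the survivors:

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks in a stripe must be the same size")
    out = bytearray(size)
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


def compute_parity(data_blocks: list[bytes]) -> bytes:
    # The parity block is the XOR of every data block in the stripe.
    return xor_blocks(data_blocks)


def rebuild_block(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    # XOR-ing the parity with all surviving blocks cancels them out,
    # leaving exactly the missing block.
    return xor_blocks(surviving_blocks + [parity])


if __name__ == "__main__":
    stripe = [b"DATA", b"LOGS", b"IDX0"]
    parity = compute_parity(stripe)
    # Pretend the disk holding stripe[1] failed.
    print(rebuild_block([stripe[0], stripe[2]], parity))  # b'LOGS'
```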

Standard RAID Levels

Standard RAID levels are based on simple, basic hardware configurations and suit a wide range of businesses and individuals. The levels typically discussed are RAID 0, 1, 2, 3, 4, 5, 6, and 10 (strictly speaking, RAID 10 is a nested level that combines RAID 1 and RAID 0). Each provides a different combination of redundancy and performance.

Every level except RAID 0 provides some degree of fault tolerance; RAID 0 provides none but offers the fastest performance. RAID 1 offers the most reliable data protection, while RAID 5 provides the best balance between performance, fault tolerance, and reliability.

RAID 0

RAID 0 uses block-level striping to spread data across multiple physical disks. It offers the fastest I/O performance because it reads and writes different parts of a file from or to multiple disks simultaneously.

It requires a minimum of two physical drives and provides the maximum disk space, which is the total of the individual device capacities. However, it does not offer any data redundancy or fault tolerance, and is best for organizations looking for performance. A failure in any of the disks in a RAID 0 array results in complete loss of data, including data saved in the good drives.
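Because losing any member destroys a RAID 0 array, the risk grows with the number of disks. A back-of-the-envelope sketch, assuming each disk fails independently with the same probability `p_disk` over a given period (a simplification, since real failures are often correlated): the array survives only if every disk survives, so P(loss) = 1 - (1 - p)^n. The function name and the 2% figure are illustrative assumptions.

```python
def raid0_loss_probability(p_disk: float, num_disks: int) -> float:
    """Probability that at least one disk fails, i.e. the RAID 0 array is lost."""
    if not 0.0 <= p_disk <= 1.0:
        raise ValueError("p_disk must be between 0 and 1")
    if num_disks < 1:
        raise ValueError("num_disks must be positive")
    # The array survives only if every disk survives (independence assumed).
    return 1.0 - (1.0 - p_disk) ** num_disks


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, round(raid0_loss_probability(0.02, n), 4))
```

Under these assumptions a four-disk RAID 0 array is nearly four times as likely to be lost as a single disk.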

RAID 0 is best for applications that process non-critical data but require high performance.

RAID 1

RAID 1 mirrors data across two or more disks without parity. It requires at least two drives, and total usable space equals the size of a single disk. All the disks hold identical copies of the data. If a disk fails, the system continues to use the remaining healthy disk or disks.

RAID 1 provides strong data redundancy and is ideal for applications where data availability is critical. It is a simple technology with basic fault tolerance. It does not improve write performance, because every write must be made to every mirror, although reads can be served from any mirror.

RAID 2

RAID 2 uses bit-level striping, in contrast to the block-level striping of RAID 0, and protects the data with a Hamming code for error correction. It was designed for disks that lacked built-in error correction; since modern disks have this feature, the level is rarely used. It also requires several extra disks to store the Hamming code: the number of check disks grows with the number of data disks (for example, 3 check disks for 4 data disks), so usable capacity is less than n-1, where n is the number of disks.
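How much capacity RAID 2 gives up depends on the Hamming code: a single-error-correcting Hamming code over m data bits needs the smallest number r of check bits with 2^r >= m + r + 1, and with one bit per disk, r is the number of check disks. A minimal sketch (the function name is hypothetical):

```python
def hamming_check_disks(data_disks: int) -> int:
    """Check disks needed for a single-error-correcting Hamming code."""
    if data_disks < 1:
        raise ValueError("data_disks must be positive")
    r = 0
    # Find the smallest r with 2**r >= data_disks + r + 1.
    while 2 ** r < data_disks + r + 1:
        r += 1
    return r


if __name__ == "__main__":
    for m in (4, 10, 25):
        r = hamming_check_disks(m)
        print(f"{m} data disks -> {r} check disks, usable {m}/{m + r}")
```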

RAID 2 works like RAID 0 but uses bit-level striping along with an error-correcting code to protect against data loss due to corruption. It is resource-intensive and not widely used.

RAID 3

RAID 3 uses byte-level striping with a dedicated parity disk for rebuilding data. It requires a minimum of three drives, one of which stores the parity information. It delivers high data transfer rates for large files, since data is accessed in parallel, but is slower on small files.

This level performs well for long sequential transfers such as video, but not in applications that issue many small requests, such as a database. RAID 3 survives the loss of any single disk: if a data disk fails, its contents are rebuilt from the remaining data and the parity; if the parity disk fails, the data is intact and the parity can be recomputed. The level is not used much, and its usable capacity is n-1 disks.

RAID 4

RAID 4 is similar to RAID 3 but uses block-level striping: it combines block-level striping across multiple disks with a dedicated parity disk. It requires a minimum of three disks, one of which is reserved for parity information. Because each block sits on a single drive, small reads can be served by different drives independently, but writes are slower because every write must also update the single parity disk.

This level suits sequential data access. However, the single parity disk can slow down write-heavy applications. The level is rarely used.
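The parity disk becomes a bottleneck because every small write must also update parity. A common way to do this without reading the whole stripe is read-modify-write: read the old data and old parity, then compute new parity = old parity XOR old data XOR new data. A minimal sketch with hypothetical helper names; the same update applies to RAID 5, where the parity block simply lives on a different disk in each stripe.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("blocks must be the same size")
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    # XOR-ing out the old data and XOR-ing in the new data gives the
    # same parity as recomputing it over the whole stripe.
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


if __name__ == "__main__":
    d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\x0f"
    parity = xor_bytes(xor_bytes(d0, d1), d2)
    new_d1 = b"\xaa\xbb"
    # One small write costs: read old data, read old parity,
    # write new data, write new parity (4 disk I/Os).
    fast = updated_parity(parity, d1, new_d1)
    full = xor_bytes(xor_bytes(d0, new_d1), d2)
    print(fast == full)  # True
```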

RAID 5

RAID 5 has block level striping along with distributed parity. This is a cost-effective, all-round configuration that balances between redundancy, performance, and storage capacity.

Striping improves read I/O performance, while parity makes it possible to reconstruct data after a disk failure. However, RAID 5 cannot survive more than one disk failure, and rebuilds take longer because every surviving drive must be read to recompute the missing data. It requires a minimum of three disks and has a usable space of n-1 disks.

RAID 5 is suitable for applications and file servers with a limited number of drives.
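Distributed parity means the parity block moves to a different disk in each stripe, so no single disk absorbs every parity write as in RAID 4. A minimal sketch of one simple rotation scheme, assuming parity starts on the last disk and moves one disk to the left per stripe, with data blocks filling the remaining disks in order; real controllers and software RAID offer several layouts, and `raid5_row` is a hypothetical name.

```python
def raid5_row(stripe: int, num_disks: int) -> list[str]:
    """Return what each disk holds for one stripe: 'P' or 'D<n>'."""
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    parity_disk = (num_disks - 1 - stripe) % num_disks  # rotate parity leftwards
    next_data = stripe * (num_disks - 1)                # n - 1 data blocks per stripe
    row = []
    for disk in range(num_disks):
        if disk == parity_disk:
            row.append("P")
        else:
            row.append(f"D{next_data}")
            next_data += 1
    return row


if __name__ == "__main__":
    for s in range(4):
        print(s, raid5_row(s, 3))
```

Over any three consecutive stripes on three disks, each disk holds the parity block exactly once.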

RAID 6

RAID 6 uses block-level striping like RAID 5 but with dual distributed parity. The two parity blocks per stripe provide additional redundancy and fault tolerance, so the level can survive two concurrent disk failures. However, it is more expensive: it requires at least four drives and provides usable space of n-2 disks.

It is more reliable than RAID 5 and is common in SATA environments and in applications such as disk-based backups and data archives that require long data retention. It also suits environments where data availability matters more than performance.

Drawbacks of RAID 6 include the additional disk needed for the second parity and greater implementation complexity compared with RAID 5. Because of the dual parity, writes and rebuilds are also slower.
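A common construction for the dual parity (used, for example, by Linux software RAID 6) keeps P as the XOR of the data blocks and computes Q as a weighted sum over the finite field GF(2^8), with weight 2^i for data disk i. Two independent equations are what let the array solve for two missing disks. The following is a minimal, unoptimized sketch under those assumptions; all function names are hypothetical, and it only handles the case where two data disks fail (not P or Q).

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # Every nonzero element satisfies a**255 == 1, so a**254 is its inverse.
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return gf_pow(a, 254)


def pq_parity(data: list[bytes]) -> tuple[bytes, bytes]:
    """P = XOR of all blocks; Q = sum of 2**i * block_i in GF(2^8)."""
    size = len(data[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(data):
        weight = gf_pow(2, i)
        for j, value in enumerate(block):
            p[j] ^= value
            q[j] ^= gf_mul(weight, value)
    return bytes(p), bytes(q)


def recover_two(data: list, x: int, y: int, p: bytes, q: bytes) -> tuple[bytes, bytes]:
    """Rebuild data blocks x and y (their entries in `data` may be None)."""
    pxy, qxy = bytearray(p), bytearray(q)
    for i, block in enumerate(data):
        if i in (x, y):
            continue
        weight = gf_pow(2, i)
        for j, value in enumerate(block):
            pxy[j] ^= value                  # leaves Dx ^ Dy
            qxy[j] ^= gf_mul(weight, value)  # leaves gx*Dx ^ gy*Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)
    # (qxy ^ gy*pxy) == (gx ^ gy) * Dx, so divide by (gx ^ gy) to get Dx.
    dx = bytes(gf_mul(qxy[j] ^ gf_mul(gy, pxy[j]), inv) for j in range(len(p)))
    dy = bytes(pxy[j] ^ dx[j] for j in range(len(p)))
    return dx, dy


if __name__ == "__main__":
    blocks = [b"alpha", b"bravo", b"charl", b"delta"]
    p, q = pq_parity(blocks)
    damaged = [blocks[0], None, blocks[2], None]  # disks 1 and 3 failed
    print(recover_two(damaged, 1, 3, p, q))
```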

RAID 10 (Mirroring + Striping)

RAID 10 requires at least 4 drives and is a combination of RAID 1 (mirroring) and RAID 0 (striping), getting you both increased speed and redundancy. This is often the recommended RAID level if you’re looking for speed and still require redundancy. In a four-drive configuration, two mirrored drives hold half of the striped data and another two mirror the other half of the data.

This means you can lose any single drive, and possibly a second drive, without losing data, as long as the second failed drive is not the mirror partner of the first. Just like RAID 1, you only get the capacity of half the drives, but you gain improved read and write performance along with the fast rebuild time of RAID 1.
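A minimal sketch that enumerates every two-drive failure in the four-drive layout described above, assuming drives 0 and 1 mirror one half of the data and drives 2 and 3 mirror the other; the function name is hypothetical. The array survives as long as no mirror pair loses both members.

```python
from itertools import combinations


def raid10_survives(failed: set[int], mirror_pairs: list[tuple[int, int]]) -> bool:
    # Data is lost only if both members of some mirror pair have failed.
    return not any(a in failed and b in failed for a, b in mirror_pairs)


if __name__ == "__main__":
    pairs = [(0, 1), (2, 3)]
    two_drive_failures = list(combinations(range(4), 2))
    survivable = [f for f in two_drive_failures if raid10_survives(set(f), pairs)]
    print(f"{len(survivable)} of {len(two_drive_failures)} two-drive failures are survivable")
    # 4 of 6 two-drive failures are survivable
```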

Things You Must Consider When Choosing a RAID Level

Storage

Many users adopt RAID for larger and more flexible storage capacity, so storage is one of the first and most crucial things to consider. Different RAID levels leave different amounts of usable space. For instance, RAID 0 stores no parity and no mirror copies, so it gives you 100% of the raw capacity of the drives. In RAID 1 (with two drives) or RAID 10, by contrast, you get only 50% of the capacity, because every block has a mirrored copy.
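The arithmetic behind these percentages, as a minimal sketch for arrays of equal-size disks. It assumes RAID 1 mirrors every disk (so usable space is one disk regardless of count) and RAID 10 uses two-way mirrors; `usable_tb` is a hypothetical helper, and real arrays lose a little extra space to metadata.

```python
MIN_DISKS = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}


def usable_tb(level: str, num_disks: int, disk_tb: float) -> float:
    """Usable capacity for an array of equal-size disks."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == "10" and num_disks % 2:
        raise ValueError("RAID 10 with two-way mirrors needs an even number of disks")
    data_disks = {
        "0": num_disks,        # striping only, no redundancy
        "1": 1,                # every disk holds the same copy
        "5": num_disks - 1,    # one disk's worth of parity
        "6": num_disks - 2,    # two disks' worth of parity
        "10": num_disks // 2,  # half the disks hold mirror copies
    }[level]
    return data_disks * disk_tb


if __name__ == "__main__":
    for level in ("0", "1", "5", "6", "10"):
        cap = usable_tb(level, 4, 4.0)
        print(f"RAID {level:>2}: {cap:4.1f} TB usable of 16.0 TB raw ({cap / 16.0:.0%})")
```

With four 4 TB disks, RAID 1 yields only 25% because a four-way mirror stores four copies; the 50% figure applies to the usual two-disk mirror.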

Performance

Performance is another common reason to adopt RAID. If speed is your priority, weigh each level's performance carefully when you choose. For example, if you do not care about data loss and want only speed, RAID 0 is the best choice. If you need both performance and data redundancy, consider RAID 10.
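One way to compare levels for a database workload is the widely used write-penalty rule of thumb: each logical write costs about 1 back-end I/O on RAID 0, 2 on RAID 1/10 (one per mirror), 4 on RAID 5 (read data, read parity, write data, write parity), and 6 on RAID 6. A minimal sketch, assuming a random I/O workload with no help from controller cache or full-stripe writes; the helper name and the disk figures in the example are illustrative assumptions.

```python
WRITE_PENALTY = {"0": 1, "1": 2, "10": 2, "5": 4, "6": 6}


def effective_iops(level: str, num_disks: int, iops_per_disk: float, write_fraction: float) -> float:
    """Front-end IOPS an array can sustain under the write-penalty rule of thumb."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw_iops = num_disks * iops_per_disk
    # Each read costs one back-end I/O; each write costs WRITE_PENALTY I/Os.
    cost_per_io = (1.0 - write_fraction) + write_fraction * WRITE_PENALTY[level]
    return raw_iops / cost_per_io


if __name__ == "__main__":
    # Hypothetical array: 8 disks at 200 IOPS each, 30% writes.
    for level in ("0", "10", "5", "6"):
        print(f"RAID {level:>2}: {effective_iops(level, 8, 200, 0.3):7.1f} IOPS")
```

With these numbers RAID 10 sustains roughly 1,230 IOPS versus roughly 840 for RAID 5 and 640 for RAID 6, which is one reason RAID 10 is often recommended when both speed and redundancy matter.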

Data Protection

As mentioned above, data redundancy is a critical factor in RAID, because it determines the level of data protection. If your data is vitally important and you cannot rely on regular, consistent backups, pay special attention to each level's ability to protect data. Put simply, it is better to choose RAID 10 or RAID 60 (RAID 6 arrays striped together). With these levels, if one disk fails, the system keeps running and your data survives.

Rebuild

You also have to take rebuilds into account. When a disk in the array fails, or the server malfunctions, the array must be rebuilt. When selecting a RAID level, check how many rebuilds the RAID controller can run at the same time, and whether, after the failed disk is replaced, the data must be copied back from the hot spare disk to the original disk slot. In short, consider how difficult and time-consuming rebuilds are for each level.
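To gauge the rebuild window, a rough lower bound is the time needed to write the entire replacement disk at the rebuild rate, since a full rebuild must repopulate every block. A minimal sketch; `rebuild_hours` and `rebuild_read_tb` are hypothetical, the rate is an assumed sustained figure, and controllers usually throttle rebuilds to protect production I/O, so real rebuilds often take longer. The second function shows how much data must be read from surviving disks: a mirror copies from one partner, while RAID 5 reads every surviving member.

```python
def rebuild_hours(disk_tb: float, rebuild_mb_per_s: float) -> float:
    """Lower bound on rebuild time: write the whole replacement disk once."""
    if rebuild_mb_per_s <= 0:
        raise ValueError("rebuild rate must be positive")
    total_mb = disk_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    return total_mb / rebuild_mb_per_s / 3600


def rebuild_read_tb(level: str, num_disks: int, disk_tb: float) -> float:
    """Data read from surviving disks to rebuild one failed disk."""
    if level in ("1", "10"):
        return disk_tb                    # copy from the mirror partner
    if level == "5":
        return (num_disks - 1) * disk_tb  # read every surviving member
    raise ValueError(f"level {level} is not modelled in this sketch")


if __name__ == "__main__":
    print(f"{rebuild_hours(8, 100):.1f} h to rebuild an 8 TB disk at 100 MB/s")
    print(f"RAID 10 reads {rebuild_read_tb('10', 8, 8.0)} TB; RAID 5 reads {rebuild_read_tb('5', 8, 8.0)} TB")
```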

Cost-effectiveness

Last but not least, budget matters when choosing a RAID level. In general, RAID 0 is the most cost-effective in terms of capacity and performance, because every drive stores unique data. RAID 1 and RAID 10 are more expensive, because half of the drives store mirror copies. Weigh cost-effectiveness alongside capacity, performance, and protection.

Conclusion: RAID is definitely not the answer to all your database reliability concerns!

RAID does not equate to 100% uptime: there is still a risk of RAID card failure (though it is significantly lower than the risk of a drive failure), and other software and hardware faults can still cause downtime.
RAID does not replace backups: RAID can protect you against a drive failure, but it will not protect you from data corruption, human error, or security issues. There are plenty of reasons other than drive failure to keep backups, so do not treat RAID as a replacement for them.
RAID makes growing an array difficult: depending on the controller or software, you may not be able to simply add another drive to an existing array. Some implementations require rebuilding or reformatting the array from scratch, and those that support in-place expansion must perform a lengthy reshape.

2 Comments on RAID – Redundant Storage for Database Reliability

Federico Razzoli August 9, 2019 at 1:39 pm

Great summary!

I think that another important point is that writing to a RAID array write-back cache is very fast, because the application regains control as soon as the data is written in RAID’s memory. Of course this requires a battery-backed RAID, or at least capacitors, to ensure that data is preserved in the cache in case of a power outage. As long as writes are not bigger than the cache, this is very fast.
A drawback is that a periodic battery learn cycle is needed to check the status of the battery, and this can heavily affect performance, but you can schedule learn cycles for when the workload is low.
MinervaDB Corporation August 20, 2019 at 12:25 pm

Technically, battery-backed RAID is often compelling, but operationally such systems are super complex and expensive.
