<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Summary: yes, Linux RAID is viable.<br>
<br>
I have run Linux RAID for years in a few small environments (a few
TB, a few users) and one larger one (as of 2 years ago, around 6 TB
of virtual disk storage for ~150 virtual machines).<br>
<br>
We moved to Sun 7000 series storage for the larger environment. The
immediate problem we were trying to solve was performance, but the
reason it wasn't solvable on Linux was a set of bugs in Linux LVM
that caused volumes to hang until reboot and interfered with the
reboot itself. I will note that we were running Red Hat EL 5.4
there; the other places I've run significant Linux servers have been
on Debian or Ubuntu, and I've not had any such problems.<br>
<br>
Linux RAID is quite easy to monitor -- out of the box, "mdadm" will
send email alerts on drive status changes. Additionally, I wrote
some simple Xymon scripts to monitor the drives' SMART info (I'm a
bit of a drive-temperature nut for the sake of drive longevity).<br>
<br>
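For anyone setting this up, the moving parts look roughly like the
sketch below. The mail address, device names, and paths are
placeholders rather than our actual configuration.<br>
<pre>
# /etc/mdadm/mdadm.conf (or /etc/mdadm.conf on Red Hat): where the
# mdadm monitor sends its alerts
MAILADDR root@example.com

# Run the monitor daemon (most distros start this from an init
# script for you)
mdadm --monitor --scan --daemonise

# Send a test alert per array to confirm mail actually gets through
mdadm --monitor --scan --test --oneshot

# The SMART side: poll attributes by hand or from a monitoring
# script
smartctl -A /dev/sda | grep -i temperature
</pre>
<br>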
I have never seen any issue rebuilding a RAID 1, 10, or 5 after a
single drive failure, and I've had a single instance of a two-drive
failure on a RAID 6 that rebuilt with no issues. Rebuilds have only
required downtime on older systems without drive hot-swap
capability. <br>
<br>
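For reference, a typical hot-swap replacement looks something like
this (array and partition names are examples; keep an eye on
/proc/mdstat before and after):<br>
<pre>
# Mark the dying disk failed and pull it out of the array
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1

# Physically swap the drive, partition it to match the old one, then
# add it back; the rebuild starts automatically
mdadm /dev/md0 --add /dev/sdb1

# Watch the resync progress
cat /proc/mdstat
</pre>
<br>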
Daniel's point about read failure during rebuild is a good one --
with multi-TB RAID 5 arrays I do indeed have that concern. RAID is
showing its design age in this respect. I combat it with periodic
SMART drive self-tests (exactly what they do varies from drive to
drive) and with background reads of the full volume. I'd be happy
to have a more detailed discussion of why this helps if anyone is
interested.<br>
<br>
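Concretely, the two things I schedule look like the sketch below
(device names are examples; if memory serves, Debian's mdadm package
ships a similar monthly "checkarray" cron job):<br>
<pre>
# Long SMART self-test on each member disk -- essentially a surface
# scan, though the details vary by drive firmware
smartctl -t long /dev/sda
smartctl -l selftest /dev/sda     # read the results later

# Ask md to read-check the whole array in the background; unreadable
# sectors are found now and rewritten from the mirror/parity, rather
# than being discovered in the middle of a rebuild
echo check > /sys/block/md0/md/sync_action
cat /proc/mdstat                  # shows check progress
</pre>
<br>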
Some serious drawbacks of Linux vs an "enterprise" storage system
(my direct experience is Sun 7000) include:<br>
<ul>
<li>thick provisioning only (no thin provisioning)</li>
<li>snapshots affect run-time performance on the primary volume</li>
<li>lack of integrated monitoring tools (e.g. NFS IOPS by client)<br>
</li>
<li>no efficient box-to-box transfer tools (i.e. some equivalent
to "zfs send")</li>
</ul>
Some advantages:<br>
<ul>
<li>Cheap, of course</li>
<li>OSS community support is, frankly, better than the paid
enterprise support I've experienced, both in response time and
in the useful information available.<br>
</li>
<li>LVM's on-line migration features let you reshape arrays and
upgrade drives on the fly without much difficulty.</li>
<li>Direct access to everything. No "you can't get there from
here".</li>
</ul>
Let me emphasize the utility of LVM's ability to live-migrate data
among volumes -- it's a capability I seriously miss when using the
Sun boxes. I understand NetApp can do things like this, but I have
never used NetApp.<br>
<br>
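In practice that migration is mostly just "pvmove". A minimal
sketch, with an illustrative volume group "vg0" and md devices
standing in for the old and new storage:<br>
<pre>
# Bring the new storage into the volume group
pvcreate /dev/md1
vgextend vg0 /dev/md1

# Move all extents off the old device while the filesystems stay
# mounted and in use, then retire it
pvmove /dev/md0
vgreduce vg0 /dev/md0
pvremove /dev/md0
</pre>
<br>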
Overall I believe we've experienced more downtime on the Sun 7000
series. We've had 3 issues that were a bug/problem in the Sun box
(plus a drive failure where resilvering had a significant
performance impact). We've had at least 3 other issues where we
asked more from the box than it could give and it entered a
performance domain several orders of magnitude worse than "normal"
operations. <br>
<br>
On the other hand, we might have been able to squeeze 1500 IOPS out
of our 20-drive Linux system, while we're usually getting around
5000 IOPS out of our 40-drive Sun system. If we had a larger read
load I suspect we could get even higher performance.<br>
<br>
While we've been very happy with the Sun box's performance in
"normal" operations (it is very cost-effective on both per-TB and
per-IOPS metrics, and I've yet to lose any data), I have become very
wary of the edges of its performance domain and no longer consider
it suitable for mission-critical applications unless I have
performance-based (not merely connectivity-based) fail-over
capability. Linux also exhibited performance degradation at the edge
of its envelope, but the slope was neither as steep nor as high. <br>
<br>
--<br>
Dewey<br>
<br>
</body>
</html>