Expanding a Linux LVM PV and underlying RAID5

Background

Normally we run our servers on RAID1, adding a pair of disks at a time. Since we also use DRBD on top of our LVM LVs, with 3 servers in play (3-way DRBD replication), adding storage means buying new disks in groups of 6, which is sort of spendy.

So, with our new servers, we are looking into switching from the RAID1s (grouped into a single VG) to a single RAID5 under the LVM PV (still a single VG). Linux RAID5s can be expanded on the fly, which lets us grow the data store by adding only 1 disk at a time (3 across the 3 servers). The bulk of the application servers are not particularly I/O intensive, so we're not really worried about the RAID5 performance hit.

Here’s the setup on our test of this process:

  • Ubuntu 8.04 LTS Server
  • AMD 64 x2 5600+ with 8GB
  • 3 x 1TB SATA2 drives, 2 x 750GB SATA2 drives
  • Xen’ified 2.6.18 kernel with Xen 3.3.1-rc4
  • Linux RAID, LVM2, etc, etc.

We made the RAID member partitions on the data disks about 2GB each, just so the sync doesn't take *forever*:

Disk /dev/sde: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00046fcd

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1        1217     9775521   fd  Linux raid autodetect
/dev/sde2            1218        1704     3911827+  82  Linux swap / Solaris
/dev/sde3            1705        1946     1959930   fd  Linux raid autodetect
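
The same layout gets stamped onto every data disk. One quick way to copy the partition table from an existing member onto a new disk is sfdisk (a sketch with assumed device names; double-check the target disk before piping anything into it):

sfdisk -d /dev/sde | sfdisk /dev/sdd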

The initial RAID5 array, as seen in /proc/mdstat:

md1 : active raid5 sda3[0] sdc3[2] sdb3[1]
      3919616 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
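
For completeness, an array like this can be created with something along these lines (a sketch, not the exact command from our build; the 64k chunk and algorithm 2 / left-symmetric layout shown above are simply the mdadm defaults):

mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sda3 /dev/sdb3 /dev/sdc3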

LVM configuration:

root@xen-80-31:~# pvdisplay /dev/md1
  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               testvg
  PV Size               3.74 GB / not usable 3.75 MB
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              956
  Free PE               956
  Allocated PE          0
  PV UUID               h6qBlQ-RCy3-YeE9-zQXw-j1oa-bg8K-2JULo1

root@xen-80-31:~# vgdisplay testvg
  --- Volume group ---
  VG Name               testvg
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               3.73 GB
  PE Size               4.00 MB
  Total PE              956
  Alloc PE / Size       0 / 0
  Free  PE / Size       956 / 3.73 GB
  VG UUID               iOVFKf-8iSV-k1VK-G37u-Ivns-9uqx-vZCduc
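
Again for completeness, a PV/VG like the one above comes out of the standard LVM2 commands (a sketch; the 4MB PE size is the default):

pvcreate /dev/md1
vgcreate testvg /dev/md1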

Process to expand the data device

The starting array, per mdadm --detail:

root@xen-80-31:~# mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Thu Jan  8 04:33:16 2009
     Raid Level : raid5
     Array Size : 3919616 (3.74 GiB 4.01 GB)
  Used Dev Size : 1959808 (1914.20 MiB 2006.84 MB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Thu Jan  8 17:52:46 2009
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 8fd9e7d9:0dae82af:b836248b:2f509f91
         Events : 0.4

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3

Add the new 2GB device to the RAID5 — it will show up as a “spare” device initially:

root@xen-80-31:~# mdadm --add /dev/md1 /dev/sdd3
mdadm: added /dev/sdd3
root@xen-80-31:~# mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Thu Jan  8 04:33:16 2009
     Raid Level : raid5
     Array Size : 3919616 (3.74 GiB 4.01 GB)
  Used Dev Size : 1959808 (1914.20 MiB 2006.84 MB)
   Raid Devices : 3
  Total Devices : 4
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Thu Jan  8 17:52:46 2009
          State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 8fd9e7d9:0dae82af:b836248b:2f509f91
         Events : 0.4

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3

       3       8       51        -      spare   /dev/sdd3
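
The spare also shows up in /proc/mdstat with an (S) suffix, which is a quick way to confirm the add took (sketched from memory rather than captured during this run):

md1 : active raid5 sdd3[3](S) sda3[0] sdc3[2] sdb3[1]
      3919616 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]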

Grow the RAID5 to include the new device. (The lvs output first shows the one existing LV, for reference.)

root@xen-80-31:~# lvs
  LV    VG     Attr   LSize Origin Snap%  Move Log Copy%
  test1 testvg -wi-a- 1.00G
root@xen-80-31:~# mdadm --grow /dev/md1 --raid-devices=4
mdadm: Need to backup 384K of critical section..
mdadm: ... critical section passed.

/proc/mdstat:

md1 : active raid5 sdd3[3] sda3[0] sdc3[2] sdb3[1]
      3919616 blocks super 0.91 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      [=>...................]  reshape =  7.9% (156192/1959808) finish=1.3min speed=22313K/sec
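
To keep an eye on the reshape without retyping the command, something like this does the trick (a sketch):

watch -n 10 cat /proc/mdstat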

While the reshape is running, the VG is still active:

root@xen-80-31:~# lvcreate -L 1g -n test2 testvg
  Logical volume "test2" created
root@xen-80-31:~# lvs
  LV    VG     Attr   LSize Origin Snap%  Move Log Copy%
  test1 testvg -wi-a- 1.00G
  test2 testvg -wi-a- 1.00G
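
Since the LVs stay usable, you could even put a filesystem on the new LV and mount it mid-reshape, e.g. (a sketch assuming ext3 and a /mnt/test2 mount point; we didn't benchmark I/O during the reshape):

mkfs.ext3 /dev/testvg/test2
mkdir -p /mnt/test2
mount /dev/testvg/test2 /mnt/test2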

After the reshape completes:

root@xen-80-31:~# mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Thu Jan  8 04:33:16 2009
     Raid Level : raid5
     Array Size : 5879424 (5.61 GiB 6.02 GB)
  Used Dev Size : 1959808 (1914.20 MiB 2006.84 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Thu Jan  8 18:44:32 2009
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 8fd9e7d9:0dae82af:b836248b:2f509f91
         Events : 0.1336

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
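
One gotcha worth checking at this point: on Ubuntu the generated /etc/mdadm/mdadm.conf can pin the array's device count in its ARRAY line, so refresh it after the grow (a sketch; compare the output against your existing conf and update the ARRAY line if it still lists num-devices=3):

mdadm --detail --scan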

The new space is not yet reflected in the PV or VG, so grow the PV:

root@xen-80-31:~# pvresize /dev/md1
  Physical volume "/dev/md1" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized
root@xen-80-31:~# pvdisplay
  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               testvg
  PV Size               5.61 GB / not usable 1.44 MB
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              1435
  Free PE               923
  Allocated PE          512
  PV UUID               h6qBlQ-RCy3-YeE9-zQXw-j1oa-bg8K-2JULo1

root@xen-80-31:~# vgdisplay
  --- Volume group ---
  VG Name               testvg
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               5.61 GB
  PE Size               4.00 MB
  Total PE              1435
  Alloc PE / Size       512 / 2.00 GB
  Free  PE / Size       923 / 3.61 GB
  VG UUID               iOVFKf-8iSV-k1VK-G37u-Ivns-9uqx-vZCduc
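
From here the extra extents get handed out the usual way; for example, to grow an LV and its filesystem (a sketch assuming test1 carries an ext3 filesystem; in our real setup the LVs back DRBD devices, which get resized with drbdadm resize instead):

lvextend -L +1G /dev/testvg/test1
resize2fs /dev/testvg/test1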

In another test, growing 3 x 200GB partitions by one more 200GB partition took around 150 minutes on this hardware, so the reshaping process is not super speedy. Even though our test showed that you can still perform I/O against the back-end data store (the RAID5) while it is reshaping, it is probably best to keep I/O to a minimum.

UPDATE: We repeated the test with 3 x 700GB partitions and added a 4th 700GB partition; the reshape took about 8.5 hours with no external I/O to the LVM/RAID5 device.
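
If the box is otherwise idle and the reshape time matters, the md rebuild speed caps can be raised; the kernel defaults are fairly conservative (a sketch, values in KB/sec and purely illustrative):

echo 100000 > /proc/sys/dev/raid/speed_limit_min
echo 400000 > /proc/sys/dev/raid/speed_limit_max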

