From EUserv Wiki

Revision as of 19:08, 13 Jan. 2015

Using the Linux Rescue system

Contents

1 Activating the Linux Rescue system
2 Connecting to the Linux Rescue system
3 Resetting the root password
- 3.1 Preparation
- 3.2 Implementation
4 Disabling the firewall
- 4.1 Preparation
- 4.2 Implementation
5 Restoring a faulty software RAID
6 Checking / Restoring a faulty filesystem
7 Checking the hard disk drives
8 Hardware RAID
9 Technical Limitations

Activating the Linux Rescue system

The Linux Rescue system is activated via the customer service center. The following wiki guide describes the activation:

Activate the Rescue system

Connecting to the Linux Rescue system

After activation you can connect to the Linux Rescue system. The following wiki guide covers that topic:

Connect with the Rescue system

Resetting the root password

Preparation

You have to connect to the Linux Rescue system to change the root password. Please proceed as follows:

Activate the Rescue system via the customer service center.
Connect to the Rescue system via SSH.

Implementation

Please proceed as follows to change the root password:

Enter your installed system in a chroot environment (replace X with the relevant partition number):

mount /dev/sdaX /mnt/custom          # root partition
mount /dev/sdaX /mnt/custom/boot     # boot partition

cd /mnt/custom

mount --bind /dev dev
mount --bind /sys sys
mount --bind /proc proc

chroot . /bin/bash

Enter the following command as root:

 passwd

Enter the new password.
Enter the new password again.
Exit the chroot environment and unmount the partitions:

exit
umount dev sys proc boot
cd ..
umount custom

Deactivate the Rescue system via the customer service center.
Perform a web reset via the customer service center.

You have successfully changed the root password. You can now connect to your system with the newly assigned password.
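
The chroot preparation from the Implementation steps can be sketched as a small shell helper. The function name print_chroot_steps is hypothetical and not part of the Rescue system. It only prints the commands for review and does not run them, and it uses absolute paths instead of changing into /mnt/custom first.

```shell
# Hypothetical helper (an assumption, not provided by the Rescue system):
# prints the commands that prepare and enter the chroot environment.
# $1 = root partition, $2 = boot partition, $3 = mount point (optional)
print_chroot_steps() {
    root_part="$1"
    boot_part="$2"
    target="${3:-/mnt/custom}"
    echo "mount $root_part $target"
    echo "mount $boot_part $target/boot"
    # Bind-mount the kernel interfaces so tools inside the chroot work.
    for fs in dev sys proc; do
        echo "mount --bind /$fs $target/$fs"
    done
    echo "chroot $target /bin/bash"
}
```

Example call: print_chroot_steps /dev/sda3 /dev/sda1 (the partition numbers are placeholders and must match your own layout).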

Disabling the firewall

Preparation

You have to connect to the Linux Rescue system in order to disable the firewall. Please proceed as follows:

Activate the Rescue system via the customer service center.
Connect to the Rescue system via SSH.

Implementation

Please proceed as follows to disable the firewall:

Enter your installed system in a chroot environment (replace X with the relevant partition number):

mount /dev/sdaX /mnt/custom          # root partition
mount /dev/sdaX /mnt/custom/boot     # boot partition

cd /mnt/custom

mount --bind /dev dev
mount --bind /sys sys
mount --bind /proc proc

chroot . /bin/bash

CentOS/Red Hat/Fedora

Enter the following command as root user:

 chkconfig --level 2345 iptables off

Debian/Ubuntu

Enter the following command as root user:

 update-rc.d -f iptables remove
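
If you are unsure which of the two commands applies, the distribution family can usually be recognized by a marker file. The following is a minimal sketch. The function name firewall_disable_cmd is hypothetical, and it assumes that /etc/redhat-release exists on CentOS/Red Hat/Fedora and /etc/debian_version on Debian/Ubuntu. It only prints the matching command.

```shell
# Hypothetical helper (an assumption, not provided by the Rescue system):
# prints the firewall command that matches the installed distribution.
# $1 = optional path prefix of the installed system (empty inside the chroot)
firewall_disable_cmd() {
    root="${1:-}"
    if [ -e "$root/etc/redhat-release" ]; then
        echo "chkconfig --level 2345 iptables off"
    elif [ -e "$root/etc/debian_version" ]; then
        echo "update-rc.d -f iptables remove"
    else
        echo "unknown distribution" >&2
        return 1
    fi
}
```

Inside the chroot environment the function can be called without an argument.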

Restoring a faulty software RAID

Checking of MBR / GPT partition

To check if the partition tables of the hard disk drives are formatted as MBR or GPT, proceed as follows:

Activate the Rescue system via the customer service center.
Connect to the Rescue system via SSH.
Check the format of the partition table of your hdd with the following command (replace X with the letter of the hard disk drive to be checked):

parted -s /dev/sdX print

The partition table has the format MBR (parted reports this format as msdos):

Partition Table: msdos

The partition table has the format GPT:

Partition Table: gpt
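
The check can also be scripted. The following sketch reads the parted output from standard input and prints MBR or GPT. The function name partition_table_format is hypothetical. It treats both msdos (the label parted normally prints for MBR) and mbr as MBR.

```shell
# Hypothetical helper (an assumption, not provided by the Rescue system):
# reads the output of "parted -s /dev/sdX print" on stdin and prints
# the partition table format.
partition_table_format() {
    awk -F': *' '/^Partition Table:/ {
        if ($2 == "gpt") print "GPT"
        else if ($2 == "msdos" || $2 == "mbr") print "MBR"
        else print "unknown (" $2 ")"
        exit
    }'
}
```

Example call: parted -s /dev/sda print | partition_table_format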

RAID1 with MBR partition

Preparation

You have to connect to the Linux Rescue system to restore a faulty software RAID. Please proceed as follows:

Activate the Rescue system via the customer service center.
Connect to the Rescue system via SSH.
Check the format of the partition table: Checking of MBR / GPT partition
Check the state of the software RAID with the following command:

 cat /proc/mdstat

An intact RAID1 array shows the status 'U' for every member. That means all involved partitions are OK.

Example output:

Personalities : [raid1]
md3 : active raid1 sda4[0] sdb4[1]
      1847608639 blocks super 1.2 [2/2] [UU]
 
md2 : active raid1 sda3[0] sdb3[1]
      1073740664 blocks super 1.2 [2/2] [UU]
 
md1 : active raid1 sda2[0] sdb2[1]
      524276 blocks super 1.2 [2/2] [UU]
 
md0 : active raid1 sda1[0] sdb1[1]
      8387572 blocks super 1.2 [2/2] [UU]
 
unused devices: <none>

A faulty RAID1 array shows the status '_' in place of a member. That means either that one hard disk drive is missing or faulty, or that a RAID partition has failed.

Example output:

Personalities : [raid1]
md3 : active raid1 sda4[0]
      1843414335 blocks super 1.2 [2/1] [U_]

md2 : active raid1 sda3[0]
      1073740664 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sda2[0]
      524276 blocks super 1.2 [2/1] [U_]

md0 : active raid1 sda1[0]
      12581816 blocks super 1.2 [2/1] [U_]

unused devices: <none>

In this example the partitions on the second hard disk drive sdb are not displayed. This indicates either a faulty hard disk drive or a faulty RAID partition.
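
Reading the status flags by eye becomes tedious with many arrays. As a minimal sketch, the following function lists every array whose status flags contain '_'. The function name degraded_arrays is hypothetical. It expects text in the /proc/mdstat format shown above on standard input.

```shell
# Hypothetical helper (an assumption, not provided by the Rescue system):
# prints the name of every md array whose status flags contain "_".
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md3 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # The status flags, e.g. [UU] or [U_], are the last field of the next line
        name != "" && $NF ~ /^\[[U_]+\]$/ {
            if ($NF ~ /_/) print name
            name = ""
        }
    '
}
```

Example call: degraded_arrays < /proc/mdstat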

Implementation

In order to restore the RAID, please proceed as follows:

Enter the following command. Please follow the sequence carefully (sfdisk -d source disk | sfdisk target disk):

 sfdisk -d /dev/sda | sfdisk /dev/sdb

Enter the following command to rescan the partition table:

 sfdisk -R /dev/sdb

Use the following command to check if the hard disk drives sda and sdb have the same partition sizes:

 cat /proc/partitions

If all partitions are present, you can add them to the RAID:

mdadm /dev/md0 -a /dev/sdb1
mdadm /dev/md1 -a /dev/sdb2
mdadm /dev/md2 -a /dev/sdb3
mdadm /dev/md3 -a /dev/sdb4

Now the partitions will be restored. This process may take some time depending on the partition size. The status can be queried with the following command:

 cat /proc/mdstat
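
The comparison of the partition sizes in /proc/partitions can be automated. The following is a sketch under assumptions: the function name compare_partition_sizes is hypothetical, and it only handles device names of the form sdXN. It reads text in the /proc/partitions format from standard input and reports every partition of the source disk that has no partition of equal size on the target disk.

```shell
# Hypothetical helper (an assumption, not provided by the Rescue system).
# $1 = source disk name (e.g. sda), $2 = target disk name (e.g. sdb)
# Exit status is 0 if all partition sizes match, 1 otherwise.
compare_partition_sizes() {
    awk -v src="$1" -v dst="$2" '
        # Column 3 is the size in blocks, column 4 the device name.
        $4 ~ ("^" src "[0-9]+$") { n = $4; sub("^" src, "", n); s[n] = $3 }
        $4 ~ ("^" dst "[0-9]+$") { n = $4; sub("^" dst, "", n); d[n] = $3 }
        END {
            bad = 0
            for (n in s)
                if (s[n] != d[n]) {
                    print "mismatch: " src n " vs " dst n
                    bad = 1
                }
            exit bad
        }
    '
}
```

Example call: compare_partition_sizes sda sdb < /proc/partitions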

RAID1 with GPT partition

Preparation

You have to connect to the Linux Rescue system to restore a faulty software RAID. Please proceed as follows:

Activate the Rescue system via the customer service center.
Connect to the Rescue system via SSH.
Check the format of the partition table: Checking of MBR / GPT partition
Check the state of the software RAID with the following command:

 cat /proc/mdstat

An intact RAID1 array shows the status 'U' for every member. That means all involved partitions are OK.

Example output:

Personalities : [raid1]
md3 : active raid1 sda4[0] sdb4[1]
      1847608639 blocks super 1.2 [2/2] [UU]
 
md2 : active raid1 sda3[0] sdb3[1]
      1073740664 blocks super 1.2 [2/2] [UU]
 
md1 : active raid1 sda2[0] sdb2[1]
      524276 blocks super 1.2 [2/2] [UU]
 
md0 : active raid1 sda1[0] sdb1[1]
      8387572 blocks super 1.2 [2/2] [UU]
 
unused devices: <none>

A faulty RAID1 array shows the status '_' in place of a member. That means either that one hard disk drive is missing or faulty, or that a RAID partition has failed.

Example output:

Personalities : [raid1]
md3 : active raid1 sda4[0]
      1843414335 blocks super 1.2 [2/1] [U_]

md2 : active raid1 sda3[0]
      1073740664 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sda2[0]
      524276 blocks super 1.2 [2/1] [U_]

md0 : active raid1 sda1[0]
      12581816 blocks super 1.2 [2/1] [U_]

unused devices: <none>

In this example the partitions on the second hard disk drive sdb are not displayed. This indicates either a faulty hard disk drive or a faulty RAID partition.

Implementation

To restore the RAID, proceed as follows:

Enter the following command to copy the partition table of sda to the new hdd sdb:

sgdisk -R /dev/sdb /dev/sda

Assign new random GUIDs to the hdd and its partitions:

sgdisk -G /dev/sdb

Now the partitions of the hdd can be added to the RAID:

mdadm /dev/md0 -a /dev/sdb1
mdadm /dev/md1 -a /dev/sdb2
mdadm /dev/md2 -a /dev/sdb3
mdadm /dev/md3 -a /dev/sdb4

Now the partitions will be restored. This process may take some time depending on the partition size. The status can be queried with the following command:

cat /proc/mdstat
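
The mdadm commands above follow a fixed pattern: array mdN receives partition N+1 of the new hdd. As a sketch that assumes exactly this layout, the following function prints the commands for any number of arrays. The function name print_readd_cmds is hypothetical. The commands are only printed, not executed, so that they can be compared with the output of cat /proc/mdstat first.

```shell
# Hypothetical helper (an assumption, not provided by the Rescue system).
# Assumes array mdN is built from partition N+1, as in the example above.
# $1 = new hard disk drive (e.g. /dev/sdb), $2 = number of arrays
print_readd_cmds() {
    disk="$1"
    count="$2"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "mdadm /dev/md$i -a $disk$((i + 1))"
        i=$((i + 1))
    done
}
```

Example call: print_readd_cmds /dev/sdb 4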

RAID5/6 with MBR partition

Preparation

You have to connect to the Linux Rescue system in order to restore a RAID5/6. Please proceed as follows:

Activate the Rescue system via the customer service center.
Connect to the Rescue system via SSH.
Check the format of the partition table: Checking of MBR / GPT partition
Check the state of the software RAID with the following command:

 cat /proc/mdstat

An intact RAID5 array shows the status 'U' for every member. That means all involved partitions are OK.

Example output:

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 sda7[0] sdd7[3] sdc7[2] sdb7[1]
5842954752 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

A faulty RAID5/6 array shows the status '_' in place of a member. That means either that one hard disk drive is missing or faulty, or that a RAID partition has failed.

Example output:

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 sda7[0] sdc7[2] sdb7[1]
5842954752 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]

In this example the fourth hard disk drive in RAID5 is missing.
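
The counter in square brackets, for example [4/3], gives the expected and the active number of members. As a minimal sketch, the following function prints the number of missing members for every array. The function name missing_members is hypothetical. It expects text in the /proc/mdstat format on standard input.

```shell
# Hypothetical helper (an assumption, not provided by the Rescue system):
# prints "<array> <number of missing members>" for every md array.
missing_members() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        # Find the counter "[expected/active]", e.g. [4/3]
        name != "" && match($0, "\\[[0-9]+/[0-9]+\\]") {
            split(substr($0, RSTART + 1, RLENGTH - 2), c, "/")
            print name, c[1] - c[2]
            name = ""
        }
    '
}
```

Example call: missing_members < /proc/mdstat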

Implementation

In order to restore the RAID, please proceed as follows:

Enter the following command. Please follow the sequence carefully (sfdisk -d source disk | sfdisk target disk):

 sfdisk -d /dev/sda | sfdisk /dev/sdd

Enter the following command to rescan the partition table:

 sfdisk -R /dev/sdd

Use the following command to check if the hard disk drives sda and sdd have the same partition sizes:

 cat /proc/partitions

If all partitions are present, you can add them to the RAID (adjust the array and partition names to your layout):

mdadm /dev/md0 -a /dev/sdd1
mdadm /dev/md1 -a /dev/sdd2
mdadm /dev/md2 -a /dev/sdd3
mdadm /dev/md3 -a /dev/sdd4

Now the partitions will be restored. This process may take some time depending on the partition size. The status can be queried with the following command:

 cat /proc/mdstat
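
While the rebuild is running, /proc/mdstat normally contains an additional progress line for the affected array. The following sketch extracts the percentage from it. The function name rebuild_progress is hypothetical, and the progress line format (a field such as 12.6% after "recovery =" or "resync =") is an assumption based on typical kernel output.

```shell
# Hypothetical helper (an assumption, not provided by the Rescue system):
# prints "<array> <percent>" for every array that is currently rebuilding.
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /recovery *=|resync *=/ {
            # The percentage is the field that looks like "12.6%"
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9.]+%$/) print name, $i
        }
    '
}
```

Example call: rebuild_progress < /proc/mdstat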

RAID5/6 with GPT partition

Preparation

You have to connect to the Linux Rescue system in order to restore a RAID5/6. Please proceed as follows:

Activate the Rescue system via the customer service center.
Connect to the Rescue system via SSH.
Check the format of the partition table: Checking of MBR / GPT partition
Check the state of the software RAID with the following command:

 cat /proc/mdstat

An intact RAID5 array shows the status 'U' for every member. That means all involved partitions are OK.

Example output:

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 sda7[0] sdd7[3] sdc7[2] sdb7[1]
5842954752 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

A faulty RAID5/6 array shows the status '_' in place of a member. That means either that one hard disk drive is missing or faulty, or that a RAID partition has failed.

Example output:

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 sda7[0] sdc7[2] sdb7[1]
5842954752 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]

In this example the fourth hard disk drive in RAID5 is missing.

Implementation

To restore the RAID, proceed as follows:

Enter the following command to copy the partition table of sda to the new hdd sdd:

sgdisk -R /dev/sdd /dev/sda

Assign new random GUIDs to the hdd and its partitions:

sgdisk -G /dev/sdd

Now the partitions of the hdd can be added to the RAID (adjust the array and partition names to your layout):

mdadm /dev/md0 -a /dev/sdd1
mdadm /dev/md1 -a /dev/sdd2
mdadm /dev/md2 -a /dev/sdd3
mdadm /dev/md3 -a /dev/sdd4

Now the partitions will be restored. This process may take some time depending on the partition size. The status can be queried with the following command:

cat /proc/mdstat

Checking / Restoring a faulty filesystem

Checking / Restoring a filesystem of a physical hard disk drive

In order to check the filesystem of a physical hard disk drive you have to connect to the Linux Rescue system. Please proceed as follows:

Activate the Rescue system via the customer service center.
Connect to the Rescue system via SSH.

Enter the following command to start the check of the filesystem (Replace X with the relevant partition):

 fsck /dev/sdX

fsck checks and repairs a Linux filesystem.

Important: Don't run fsck on a mounted filesystem!
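
Whether a partition is currently mounted can be checked before fsck is started. The following is a minimal sketch. The function name is_mounted is hypothetical, and it only compares the device name with the first column of /proc/mounts, so a partition mounted under a different name (for example via a UUID path) is not detected.

```shell
# Hypothetical helper (an assumption, not provided by the Rescue system).
# $1 = device (e.g. /dev/sda3), $2 = mount table (optional, default /proc/mounts)
# Exit status is 0 if the device is listed as mounted, 1 otherwise.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
```

Example call: is_mounted /dev/sda3 && echo "unmount /dev/sda3 before running fsck"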

Checking the filesystem type

Enter the following command to check which filesystem type is used (Replace X with the relevant partition):

 parted -s /dev/sdX print

ext2/3/4

To restore a faulty filesystem of type ext2/3/4, enter the command that matches the filesystem type (replace X with the relevant partition):

fsck.ext3 /dev/sdX
fsck.ext2 /dev/sdX
...

xfs

To restore a faulty filesystem of type xfs, enter the following commands to check and then repair it (replace X with the relevant partition):

 xfs_check /dev/sdX

 xfs_repair /dev/sdX
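
The choice of the repair command by filesystem type can be sketched as follows. The function name repair_cmd_for is hypothetical. It only prints the command for review and covers only the filesystem types named in this section.

```shell
# Hypothetical helper (an assumption, not provided by the Rescue system).
# $1 = filesystem type as reported by parted, $2 = partition
repair_cmd_for() {
    fstype="$1"
    dev="$2"
    case "$fstype" in
        ext2|ext3|ext4)
            echo "fsck.$fstype $dev" ;;
        xfs)
            echo "xfs_repair $dev" ;;
        *)
            echo "no command known for filesystem type: $fstype" >&2
            return 1 ;;
    esac
}
```

Example call: repair_cmd_for ext3 /dev/sda3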

Checking the filesystem of a software RAID

In order to check the filesystem of a software RAID you have to connect to the Linux Rescue system. Please proceed as follows:

Activate the Rescue system via the customer service center.
Connect to the Rescue system via SSH.

Enter the following command to start the check of the filesystem (replace X with the number of the relevant RAID array):

 fsck /dev/mdX

fsck checks and repairs a Linux filesystem.

Important: Don't run fsck on a mounted filesystem!

Checking the filesystem of a hardware RAID

Checking the hard disk drives

In order to check the hard disk drives you have to connect to the Linux Rescue system. Please proceed as follows:

Activate the Rescue system via the customer service center.
Connect to the Rescue system via SSH.

Hard drive check with smartctl / smartmontools

Hard disk drive check with smartctl / smartmontools for normal hard disk drives

Please proceed as follows in order to check your hard disk drives with smartmontools:

Start a short hard disk drive check with the following command (Replace X with the relevant hard disk drive):

 smartctl -t short /dev/sdX

Start a long hard disk drive check with the following command (Replace X with the relevant hard disk drive):

 smartctl -t long /dev/sdX

Hard disk drive check with smartctl / smartmontools for hard disk drives on hardware RAID controllers

In order to check hard disk drives on 3ware hardware RAID controllers, please proceed as follows:

Enter the following command to start a short test (replace X with the number of the controller port to which the hard disk drive is connected. Please note: the first hard disk drive is connected to port 0):

 smartctl -d 3ware,X -t short /dev/twa0

Enter the following command to start a long test:

 smartctl -d 3ware,X -t long /dev/twa0

Evaluation of the results

Enter the following command to display the results of the hard disk drive tests:

 smartctl -l selftest /dev/sdX

The following output example shows that the hard disk drive health is ok:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      4970         
# 2  Long offline        Completed without error       00%      4972

The following output example shows that the hard disk drive health is not ok ("read failure"):

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       20%       717         555027747
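
The evaluation can be automated with a small filter. The following sketch prints every entry of the self-test log whose status is not "Completed without error". The function name selftest_not_ok is hypothetical. Note that a test that is still running or was aborted is also listed, not only read failures.

```shell
# Hypothetical helper (an assumption, not provided by the Rescue system):
# reads the output of "smartctl -l selftest /dev/sdX" on stdin and prints
# all log entries that did not complete without error.
selftest_not_ok() {
    awk '/^# *[0-9]+ / && !/Completed without error/'
}
```

Example call: smartctl -l selftest /dev/sda | selftest_not_ok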

Reporting errors to the support

Reporting errors of normal hard disk drives

In order to report errors of the hard disk drive to the support, please provide the output of the following command:

 smartctl -a /dev/sdX

Reporting errors of hard disk drives behind hardware RAID controllers

In order to report errors of a hard disk drive on a 3ware RAID controller to the support, please provide the output of the following command (replace X with the number of the controller port to which the hard disk drive is connected):

 smartctl -d 3ware,X -a /dev/twa0

Hardware RAID

Basics / General information

Checking the status of the controller

3ware RAID controllers

In order to check the status of 3ware RAID controllers, you have to be connected to the Linux Rescue system. Please proceed as follows:

Activate the Rescue system via the customer service center.
Connect to the Rescue system via SSH.
Enter the following command to identify the ID of the controller (usually it is 0):

 dmesg | grep 3ware

The following output is displayed (the controller ID is the number behind scsi):

 [    5.487015] scsi4 : 3ware 9000 Storage Controller

Enter the following command to read the hardware RAID controller information (Replace X with the relevant controller ID):

 tw_cli /cX show

The following example output is possible:

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    OK             -       -       -       149.001   RiW    ON     

VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   149.05 GB SATA  0   -            SAMSUNG HD160JJ     
p1    OK             u0   149.05 GB SATA  1   -            SAMSUNG HD160JJ

In this case the RAID is in perfect condition.

Unit     UnitType  Status      %RCmpl  %V/I/M  Port  Stripe  Size(GB)
u0       RAID-1    REBUILDING  23%     -       -     -       149.001
u0-0     DISK      DEGRADED    -       -       p0    -       149.001
u0-1     DISK      OK          -       -       p1    -       149.001
u0/v0    Volume    -           -       -       -     -       149.001

In this case the RAID performs a rebuild. The faulty hard disk drive is the one that is connected on port 0.
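
Both lookups can be sketched as small filters. The function names controller_id and unit_status are hypothetical. The first one extracts the number behind scsi from the dmesg line, the second one prints the status column of every unit line in the tw_cli output. Both assume the output formats shown in the examples above.

```shell
# Hypothetical helpers (assumptions, not provided by the Rescue system).

# Reads "dmesg | grep 3ware" output on stdin and prints the controller ID.
controller_id() {
    sed -n 's/.*scsi\([0-9][0-9]*\) *: *3ware.*/\1/p'
}

# Reads "tw_cli /cX show" output on stdin and prints "<unit> <status>".
unit_status() {
    awk '$1 ~ /^u[0-9]+$/ { print $1, $3 }'
}
```

Example calls: dmesg | grep 3ware | controller_id and tw_cli /c0 show | unit_status (the controller ID 0 is a placeholder).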

LSI RAID controllers

In order to check the status of LSI RAID controllers, you have to be connected to the Linux Rescue system. Please proceed as follows:

Activate the Rescue system via the customer service center.
Connect to the Rescue system via SSH.
Enter the following command:

 megacli -AdpAllInfo -aAll

Information about the LSI controller is displayed.

Checking the status of the hard disk drives

3ware RAID controllers

In order to check the status of hard disk drives behind 3ware RAID controllers with smartmontools, please proceed as follows:

Enter the following command to display the SMART data of the hard disk drive (replace X with the number of the controller port to which the hard disk drive is connected. Please note that the first hard disk drive is connected to port 0):

 smartctl -d 3ware,X -a /dev/twa0

LSI RAID controllers

In order to check the status of hard disk drives behind LSI RAID controllers with smartmontools, please proceed as follows:

Enter the following command to identify the device ID of the hard disk drive:

 storcli /c0 /eall /sall show

You can access your hard disk drive with the following command (replace X with the relevant hard disk drive and N with the device ID):

 smartctl -a -d megaraid,N  /dev/sdX

Reporting errors to the support

3ware RAID controllers

In order to report errors of your hard disk drive behind a 3ware RAID controller to the support, please provide the output of the following command:

 smartctl -d 3ware,X -a /dev/twa0

LSI RAID controllers

In order to report errors of your hard disk drive behind an LSI RAID controller to the support, please provide the output of the following command:

 smartctl -a -d megaraid,N  /dev/sdX

Technical Limitations

The following technical limitations are known at the moment:

The Rescue system cannot be used while your server is under a DDoS attack.
