Manual Linux Rescue System/en

From EUserv Wiki

Current version as of 07:14, 15 Jan. 2020

Using the Linux Rescue system

Activating the Linux Rescue system

You have to activate the Linux Rescue system via the customer service center. The following Wiki guide shows you how to activate the Linux Rescue system:

Activate the Rescue system


Connecting to the Linux Rescue system

After activation you can connect to the Linux Rescue system. The following Wiki guide covers that topic:

Connect with the Rescue system


Resetting the root password

Preparation

You have to connect to the Linux Rescue system to change the root password. Please proceed as follows:

  • Activate the Rescue system via the customer service center.
  • Connect to the Rescue system via SSH.


Implementation

Please proceed as follows to change the root password:

  • Enter your installed system in a chroot environment (replace each X with the relevant partition number; the root and boot partitions have different numbers):
mount /dev/sdaX /mnt/custom          # root partition
mount /dev/sdaX /mnt/custom/boot     # boot partition

cd /mnt/custom

mount --bind /dev dev
mount --bind /sys sys
mount --bind /proc proc

chroot . /bin/bash
  • Enter the following command as root:
 passwd
  • Enter the new password.
  • Enter the new password again.
  • Exit the chroot environment and unmount the partitions:
exit
umount dev sys proc boot
cd ..
umount custom
  • Deactivate the Rescue system via the customer service center.
  • Perform a web reset via the customer service center.

You have successfully changed the root password. You can now connect to your system with the newly assigned password.


Disabling the firewall

Preparation

You have to connect to the Linux Rescue system in order to disable the firewall. Please proceed as follows:

  • Activate the Rescue system via the customer service center.
  • Connect to the Rescue system via SSH.


Implementation

Please proceed as follows to disable the firewall:

  • Enter your installed system in a chroot environment (replace each X with the relevant partition number; the root and boot partitions have different numbers):
mount /dev/sdaX /mnt/custom          # root partition
mount /dev/sdaX /mnt/custom/boot     # boot partition

cd /mnt/custom

mount --bind /dev dev
mount --bind /sys sys
mount --bind /proc proc

chroot . /bin/bash

CentOS/Red Hat/Fedora

Enter the following command as root user:

 chkconfig --level 2345 iptables off

Debian/Ubuntu

Enter the following command as root user:

 update-rc.d -f iptables remove


Restoring a faulty software RAID

Checking of MBR / GPT partition

To check if the partition tables of the hard disk drives are formatted as MBR or GPT, proceed as follows:

  • Activate the Rescue system via the customer service center.
  • Connect to the Rescue system via SSH.
  • Check the format of the partition table of your hdd with the following command (Replace X with the hard disk which has to be checked):
parted -s /dev/sdX print

The partition table has the format MBR (parted reports it as msdos):

Partition Table: msdos

The partition table has the format GPT:

Partition Table: gpt
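
As a minimal sketch, the output of the parted command above could be classified with a small shell helper. The function name table_type is hypothetical and not part of the Rescue system; it assumes that the parted output is piped in on standard input. parted usually prints msdos for an MBR partition table, so the sketch accepts both msdos and mbr.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of `parted -s /dev/sdX print`
# on stdin and prints MBR, GPT, or unknown.
table_type() {
    awk -F': *' '
        /^Partition Table:/ {
            val = $2
            gsub(/[ \t\r]+$/, "", val)      # strip trailing whitespace
            if (val == "gpt") kind = "GPT"
            else if (val == "msdos" || val == "mbr") kind = "MBR"
            else kind = "unknown"
        }
        END { print (kind == "" ? "unknown" : kind) }
    '
}
```

Possible usage: parted -s /dev/sda print | table_type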

RAID1 with MBR partition

Preparation

You have to connect to the Linux Rescue system to restore a faulty software RAID. Please proceed as follows:

  • Activate the Rescue system via the customer service center.
  • Connect to the Rescue system via SSH.
  • Check the format of the partition table: Checking of MBR / GPT partition
  • Check the state of the software RAID with the following command:
 cat /proc/mdstat
An intact RAID1 partition has the status 'U'. That means all involved partitions are ok.

Example output:

Personalities : [raid1]
md3 : active raid1 sda4[0] sdb4[1]
      1847608639 blocks super 1.2 [2/2] [UU]
 
md2 : active raid1 sda3[0] sdb3[1]
      1073740664 blocks super 1.2 [2/2] [UU]
 
md1 : active raid1 sda2[0] sdb2[1]
      524276 blocks super 1.2 [2/2] [UU]
 
md0 : active raid1 sda1[0] sdb1[1]
      8387572 blocks super 1.2 [2/2] [UU]
 
unused devices: <none>
A faulty RAID1 array has the status '_'. That means either that one hard disk drive is missing or faulty, or that a RAID partition has failed.

Example output:

Personalities : [raid1]
md3 : active raid1 sda4[0]
      1843414335 blocks super 1.2 [2/1] [U_]

md2 : active raid1 sda3[0]
      1073740664 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sda2[0]
      524276 blocks super 1.2 [2/1] [U_]

md0 : active raid1 sda1[0]
      12581816 blocks super 1.2 [2/1] [U_]

unused devices: <none>
In this example the partitions on the second hard disk drive sdb are not displayed. This indicates either a faulty hard disk drive or RAID partition.
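
The check for the '_' status can also be scripted. The following is a minimal sketch, assuming the mdstat text is supplied on standard input; the function name degraded_arrays is hypothetical. It prints the name of every array whose status bracket contains at least one '_'.

```shell
#!/bin/sh
# Hypothetical helper: reads /proc/mdstat-style text on stdin and prints
# the names of all arrays whose status (e.g. [U_]) contains a "_".
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        # The line after the array line carries a status group such as [UU].
        name != "" && match($0, /\[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0) print name
            name = ""
        }
    '
}
```

Possible usage: degraded_arrays < /proc/mdstat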

Implementation

In order to restore the RAID, please proceed as follows:

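  • Enter the following command to copy the partition table from sda to sdb. Please keep the order of the devices exactly as shown (sfdisk -d source disk | sfdisk target disk), otherwise the partition table of the intact disk is overwritten: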
 sfdisk -d /dev/sda | sfdisk /dev/sdb
  • Enter the following command to rescan the partition table (newer sfdisk versions no longer support -R; blockdev --rereadpt /dev/sdb does the same there):
 sfdisk -R /dev/sdb
  • Use the following command to check if the hard disk drives sda and sdb have the same partition sizes:
 cat /proc/partitions
  • If all partitions are present, you can add them to the RAID:
mdadm /dev/md0 -a /dev/sdb1
mdadm /dev/md1 -a /dev/sdb2
mdadm /dev/md2 -a /dev/sdb3
mdadm /dev/md3 -a /dev/sdb4

Now the partitions will be restored. This process may take some time depending on the partition size. The status can be queried with the following command:

 cat /proc/mdstat
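
The mdadm commands above assume that md0 to md3 use partitions 1 to 4. Where the mapping differs, the commands could be derived from /proc/mdstat instead. The following is a dry-run sketch under stated assumptions: the function name readd_commands is hypothetical, the intact and the replacement disk are passed without the /dev/ prefix, and the array lines look like the example output above (member devices start at the fifth field). It only prints the commands for degraded arrays and executes nothing.

```shell
#!/bin/sh
# Hypothetical dry-run helper: reads /proc/mdstat-style text on stdin and
# prints one "mdadm ... -a ..." line per degraded array.
#   $1 = intact disk, e.g. sda      $2 = replacement disk, e.g. sdb
readd_commands() {
    awk -v good="$1" -v new="$2" '
        /^md[0-9]+ :/ {
            cmd = ""
            for (i = 5; i <= NF; i++) {
                member = $i
                sub(/\[.*$/, "", member)            # sda4[0] -> sda4
                if (index(member, good) == 1) {
                    part = substr(member, length(good) + 1)
                    if (part ~ /^[0-9]+$/)
                        cmd = sprintf("mdadm /dev/%s -a /dev/%s%s", $1, new, part)
                }
            }
        }
        # Print the prepared command only if the status shows a missing member.
        cmd != "" && /\[[U_]*_[U_]*\]/ { print cmd; cmd = "" }
    '
}
```

Possible usage: readd_commands sda sdb < /proc/mdstat. Review the printed commands before running them.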


RAID1 with GPT partition

Preparation

You have to connect to the Linux Rescue system to restore a faulty software RAID. Please proceed as follows:

  • Activate the Rescue system via the customer service center.
  • Connect to the Rescue system via SSH.
  • Check the format of the partition table: Checking of MBR / GPT partition
  • Check the state of the software RAID with the following command:
 cat /proc/mdstat
An intact RAID1 partition has the status 'U'. That means all involved partitions are ok.

Example output:

Personalities : [raid1]
md3 : active raid1 sda4[0] sdb4[1]
      1847608639 blocks super 1.2 [2/2] [UU]
 
md2 : active raid1 sda3[0] sdb3[1]
      1073740664 blocks super 1.2 [2/2] [UU]
 
md1 : active raid1 sda2[0] sdb2[1]
      524276 blocks super 1.2 [2/2] [UU]
 
md0 : active raid1 sda1[0] sdb1[1]
      8387572 blocks super 1.2 [2/2] [UU]
 
unused devices: <none>
A faulty RAID1 array has the status '_'. That means either that one hard disk drive is missing or faulty, or that a RAID partition has failed.

Example output:

Personalities : [raid1]
md3 : active raid1 sda4[0]
      1843414335 blocks super 1.2 [2/1] [U_]

md2 : active raid1 sda3[0]
      1073740664 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sda2[0]
      524276 blocks super 1.2 [2/1] [U_]

md0 : active raid1 sda1[0]
      12581816 blocks super 1.2 [2/1] [U_]

unused devices: <none>
In this example the partitions on the second hard disk drive sdb are not displayed. This indicates either a faulty hard disk drive or RAID partition.

Implementation

To restore the RAID, proceed as follows:

  • Enter the following command to copy the partition table of sda to the new hdd sdb:
sgdisk -R /dev/sdb /dev/sda

Assign a new random UUID to the hdd:

sgdisk -G /dev/sdb

Now the partitions of the hdd can be added to the RAID:

mdadm /dev/md0 -a /dev/sdb1
mdadm /dev/md1 -a /dev/sdb2
mdadm /dev/md2 -a /dev/sdb3
mdadm /dev/md3 -a /dev/sdb4

Now the partitions will be restored. This process may take some time depending on the partition size. The status can be queried with the following command:

cat /proc/mdstat


RAID5/6 with MBR partition

Preparation

You have to connect to the Linux Rescue system in order to restore a RAID5/6. Please proceed as follows:

  • Activate the Rescue system via the customer service center.
  • Connect to the Rescue system via SSH.
  • Check the format of the partition table: Checking of MBR / GPT partition
  • Check the state of the software RAID with the following command:
 cat /proc/mdstat
An intact RAID5 partition has the status 'U'. That means all involved partitions are ok.

Example output:

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 sda7[0] sdd7[3] sdc7[2] sdb7[1]
5842954752 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
A faulty RAID5/6 array has the status '_'. That means either that one hard disk drive is missing or faulty, or that a RAID partition has failed.

Example output:

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 sda7[0] sdc7[2] sdb7[1]
5842954752 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]

In this example the fourth hard disk drive in RAID5 is missing.

Implementation

In order to restore the RAID, please proceed as follows:

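  • Enter the following command to copy the partition table from sda to sdd. Please keep the order of the devices exactly as shown (sfdisk -d source disk | sfdisk target disk), otherwise the partition table of the intact disk is overwritten: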
 sfdisk -d /dev/sda | sfdisk /dev/sdd
  • Enter the following command to rescan the partition table (newer sfdisk versions no longer support -R; blockdev --rereadpt /dev/sdd does the same there):
 sfdisk -R /dev/sdd
  • Use the following command to check if the hard disk drives sda and sdd have the same partition sizes:
 cat /proc/partitions
  • If all partitions are present, you can add them to the RAID:
mdadm /dev/md0 -a /dev/sdd1
mdadm /dev/md1 -a /dev/sdd2
mdadm /dev/md2 -a /dev/sdd3
mdadm /dev/md3 -a /dev/sdd4

Now the partitions will be restored. This process may take some time depending on the partition size. The status can be queried with the following command:

 cat /proc/mdstat


RAID5/6 with GPT partition

Preparation

You have to connect to the Linux Rescue system in order to restore a RAID5/6. Please proceed as follows:

  • Activate the Rescue system via the customer service center.
  • Connect to the Rescue system via SSH.
  • Check the format of the partition table: Checking of MBR / GPT partition
  • Check the state of the software RAID with the following command:
 cat /proc/mdstat
An intact RAID5 partition has the status 'U'. That means all involved partitions are ok.

Example output:

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 sda7[0] sdd7[3] sdc7[2] sdb7[1]
5842954752 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
A faulty RAID5/6 array has the status '_'. That means either that one hard disk drive is missing or faulty, or that a RAID partition has failed.

Example output:

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 sda7[0] sdc7[2] sdb7[1]
5842954752 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]

In this example the fourth hard disk drive in RAID5 is missing.

Implementation

To restore the RAID, proceed as follows:

  • Enter the following command to copy the partition table of sda to the new hdd sdd:
sgdisk -R /dev/sdd /dev/sda

Assign a new random UUID to the hdd:

sgdisk -G /dev/sdd

Now the partitions of the hdd can be added to the RAID:

mdadm /dev/md0 -a /dev/sdd1
mdadm /dev/md1 -a /dev/sdd2
mdadm /dev/md2 -a /dev/sdd3
mdadm /dev/md3 -a /dev/sdd4

Now the partitions will be restored. This process may take some time depending on the partition size. The status can be queried with the following command:

cat /proc/mdstat

Checking / Restoring a faulty filesystem

Checking / Restoring a filesystem of a physical hard disk drive

In order to check the filesystem of a physical hard disk drive you have to connect to the Linux Rescue system. Please proceed as follows:

  • Activate the Rescue system via the customer service center.
  • Connect to the Rescue system via SSH.

Enter the following command to start the check of the filesystem (Replace X with the relevant partition):

 fsck /dev/sdX

fsck checks a Linux filesystem and repairs it if necessary.

Important: Don't run fsck on a mounted filesystem!


Checking the filesystem type

Enter the following command to check which filesystem type is used (Replace X with the relevant hard disk drive):

 parted -s /dev/sdX print


ext2/3/4

To restore a faulty filesystem of type ext2/3/4, enter the command that matches the filesystem type (Replace X with the relevant partition):

fsck.ext3 /dev/sdX
fsck.ext2 /dev/sdX
...


xfs

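To restore a faulty filesystem of type xfs, enter the following commands (Replace X with the relevant partition). xfs_check only checks the filesystem; recent xfsprogs releases no longer ship it, and xfs_repair -n performs the read-only check there: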

 xfs_check /dev/sdX
 xfs_repair /dev/sdX


Checking the filesystem of a software RAID

In order to check the filesystem of a software RAID you have to connect to the Linux Rescue system. Please proceed as follows:

  • Activate the Rescue system via the customer service center.
  • Connect to the Rescue system via SSH.

Enter the following command to start the check of the filesystem (Replace X with the relevant partition):

 fsck /dev/mdX

fsck checks a Linux filesystem and repairs it if necessary.

Important: Don't run fsck on a mounted filesystem!


Checking the filesystem of a hardware RAID

Converting a filesystem

Converting from ext3 to ext4

Preparation

To convert an existing ext3 filesystem to an ext4 filesystem, you'll have to add the corresponding filesystem features. After changing the options you'll have to run a filesystem check.

For preparation, proceed as follows:

  • Perform a data backup of all(!) important files (including your configuration files under /etc)
  • Determine the exact name of the partition which has to be converted (using the following commands):
sudo fdisk -l
sudo blkid 


The first command will give you a listing of all hard disks and their partitions. The blkid command prints the unique ID of each of those partitions.

Please note: The partition to be converted must be unmounted!

Implementation

To convert an existing ext3 filesystem to an ext4 filesystem, proceed as follows:

  • Add the ext4-specific options with the following command:
sudo tune2fs -O extents,uninit_bg,dir_index /dev/<device_file>

Replace <device_file> with the just determined partition name.

  • Perform a filesystem check with the following command (Please note: The partition to be converted must be unmounted!):
sudo fsck -fCVD /dev/<device_file>
  • Mount the partition as ext4 with the following command:
sudo mount /dev/<device_file> /mnt 

Replace <device_file> with the name of the partition you just converted.

  • Open the file /mnt/etc/fstab with an editor.
  • In the entry for the corresponding partition, replace "ext3" with "ext4":
/dev/<device_file>    /   ext4 relatime   0   1

Replace <device_file> with the name of the partition you just converted.

  • Save the file, disable the Rescue System and perform a 'normal' system start.

Now the ext4 filesystem will be used.
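
The fstab change described above can also be done without an interactive editor. The following is a minimal sketch, assuming the fstab entry names the partition by its device file in the first column (entries that use UUID= or LABEL= need that exact string as the argument instead). The function name fstab_to_ext4 is hypothetical. It reads the fstab text on standard input and writes the modified text to standard output; matching lines are rewritten with single spaces between the fields.

```shell
#!/bin/sh
# Hypothetical helper: switches the filesystem type of one fstab entry
# from ext3 to ext4. All other lines are passed through unchanged.
#   $1 = first fstab column of the converted partition, e.g. /dev/sda3
fstab_to_ext4() {
    awk -v dev="$1" '
        $1 == dev && $3 == "ext3" { $3 = "ext4" }   # third column = type
        { print }
    '
}
```

Possible usage: fstab_to_ext4 /dev/sda3 < /mnt/etc/fstab > /tmp/fstab.new. Compare the result with the original before copying it back to /mnt/etc/fstab.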

Checking the hard disk drives

In order to check the hard disk drives you have to connect to the Linux Rescue system. Please proceed as follows:

  • Activate the Rescue system via the customer service center.
  • Connect to the Rescue system via SSH.


Hard drive check with smartctl / smartmontools

Hard disk drive check with smartctl / smartmontools for normal hard disk drives

Please proceed as follows in order to check your hard disk drives with smartmontools:

  • Start a short hard disk drive check with the following command (Replace X with the relevant hard disk drive):
 smartctl -t short /dev/sdX
  • Start a long hard disk drive check with the following command (Replace X with the relevant hard disk drive):
 smartctl -t long /dev/sdX


Hard disk drive check with smartctl / smartmontools for hard disk drives on hardware RAID controllers

In order to perform a short check for hard disk drives on 3ware hardware RAID controllers please proceed as follows:

  • Enter the following command to start a short test (Replace X with the number of the controller port to which the hard disk drive is connected. Please note: The first hard disk drive is connected to port 0.):
 smartctl -d 3ware,X -t short /dev/twa0
  • Enter the following command to start a long test:
 smartctl -d 3ware,X -t long /dev/twa0


Evaluation of the results

Enter the following command to display the results of the hard disk drive tests:

 smartctl -l selftest /dev/sdX

The following output example shows that the hard disk drive health is ok:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      4970         
# 2  Long offline        Completed without error       00%      4972

The following output example shows that the hard disk drive health is not ok ("read failure"):

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       20%       717         555027747
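
To spot problematic entries quickly, the self-test log can be filtered. The following is a minimal sketch, assuming the output of smartctl -l selftest is piped in on standard input; the function name failed_selftests is hypothetical. It prints every log entry whose status is not "Completed without error", so tests that are still running or were aborted are listed as well.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -l selftest` output on stdin and
# prints all self-test entries that did not complete without error.
failed_selftests() {
    # Log entries start with "#"; header lines are ignored.
    awk '/^#/ && !/Completed without error/ { print }'
}
```

Possible usage: smartctl -l selftest /dev/sda | failed_selftests. No output means that every logged test completed without error.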


Reporting errors to the support

Reporting errors of normal hard disk drives

In order to report hard disk drive errors to support, please provide the output of the following command:

 smartctl -a /dev/sdX


Reporting errors of hard disk drives behind hardware RAID controllers

In order to report errors of a hard disk drive on a 3ware RAID controller to support, please provide the output of the following command (Replace X with the number of the controller port to which the hard disk drive is connected.):

 smartctl -d 3ware,X -a /dev/twa0


Hardware RAID

Basics / General information

Checking the status of the controller

3ware RAID controllers

In order to check the status of 3ware RAID controllers, you have to be connected to the Linux Rescue system. Please proceed as follows:

  • Activate the Rescue system via the customer service center.
  • Connect to the Rescue system via SSH.
  • Enter the following command to identify the ID of the controller (usually it is 0):
 dmesg | grep 3ware

The following output is displayed (the controller ID is the number after scsi):

 [    5.487015] scsi4 : 3ware 9000 Storage Controller
  • Enter the following command to read the hardware RAID controller information (Replace X with the relevant controller ID):
 tw_cli /cX show

The following example output is possible:

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    OK             -       -       -       149.001   RiW    ON     

VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   149.05 GB SATA  0   -            SAMSUNG HD160JJ     
p1    OK             u0   149.05 GB SATA  1   -            SAMSUNG HD160JJ    

In this case the RAID is in a perfect condition.

Unit     UnitType  Status      %RCmpl  %V/I/M  Port  Stripe  Size(GB)
u0       RAID-1    REBUILDING  23%     -       -     -       149.001
u0-0     DISK      DEGRADED    -       -       p0    -       149.001
u0-1     DISK      OK          -       -       p1    -       149.001
u0/v0    Volume    -           -       -       -     -       149.001

In this case the RAID is being rebuilt. The faulty hard disk drive is the one that is connected to port 0.
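
The two example outputs can be told apart automatically. The following is a minimal sketch, assuming the tw_cli output is piped in on standard input and has the column layout of the examples above; the function name tw_problems is hypothetical. It prints every unit, unit member, or port whose status is not OK.

```shell
#!/bin/sh
# Hypothetical helper: reads `tw_cli /cX show` output on stdin and prints
# units (u0), unit members (u0-0) and ports (p0) that are not "OK".
tw_problems() {
    awk '
        # Unit and member lines: the status is the third column.
        $1 ~ /^u[0-9]+(-[0-9]+)?$/ && $3 != "OK" { print "unit", $1, $3 }
        # Port lines: the status is the second column.
        $1 ~ /^p[0-9]+$/ && $2 != "OK" { print "port", $1, $2 }
    '
}
```

Possible usage: tw_cli /c0 show | tw_problems. No output means that all listed units and ports report OK.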


LSI RAID controllers

In order to check the status of LSI RAID controllers, you have to be connected to the Linux Rescue system. Please proceed as follows:

  • Activate the Rescue system via the customer service center.
  • Connect to the Rescue system via SSH.
  • Enter the following command:
 megacli -AdpAllInfo -aAll

Information about the LSI controller is displayed.


Checking the status of the hard disk drives

3ware RAID controllers

In order to check hard disk drives behind 3ware RAID controllers with smartmontools, please proceed as follows:

  • Enter the following command to display the SMART information of a hard disk drive (Replace X with the controller port to which the hard disk drive is connected. Please note that the first hard disk drive is connected to port 0):
 smartctl -d 3ware,X -a /dev/twa0


LSI RAID controllers

In order to start a check of the hard disk drives behind LSI RAID controllers with smartmontools, please proceed as follows:

  • Enter the following command to identify the device ID of the hard disk drive:
 storcli /c0 /eall /sall show
  • You can access your hard disk drive with the following command (Replace X with the relevant hard disk drive and N with the device ID):
 smartctl -a -d megaraid,N  /dev/sdX


Reporting errors to the support

3ware RAID controllers

In order to report errors of your hard disk drive behind a 3ware RAID controller to support, provide the output of the following command:

 smartctl -d 3ware,X -a /dev/twa0


LSI RAID controllers

In order to report errors of your hard disk drive behind an LSI RAID controller to support, provide the output of the following command:

 smartctl -a -d megaraid,N  /dev/sdX

Checking the RAM

In order to perform a check of the server's memory the memtester utility can be used. It's available on the EUserv mirror and can be obtained from the following link:

http://mirror.euserv.net/misc/memtester.tar.gz


To perform the check, please proceed as follows:


  • Log in to the Rescue system.
  • Download memtester. Use the following command:
wget http://mirror.euserv.net/misc/memtester.tar.gz
  • Extract the archive. Use the following command:
tar xfz memtester.tar.gz
  • Change to the extracted directory. Use the following command:
cd memtester
  • Compile the program. Use the following command:
make

Now you can execute the program with the following pattern:

./memtester <Amount of memory> <Passes>


The amount of memory can be determined with the command free -m. The respective value can be found under the total column.


Example:

             total       used       free     shared    buffers     cached
Mem:          3821       3444        376          3          1       2717
-/+ buffers/cache:        724       3096
Swap:         1953          5       1947

In order to check the memory twice in a row you can use the following command:

./memtester 3821 2
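
The value for the memtester call can also be read from the free -m output automatically. The following is a minimal sketch, assuming the free -m output is piped in on standard input and contains a line starting with "Mem:" as in the example above; the function name mem_total_mb is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `free -m` output on stdin and prints the
# value of the "total" column of the "Mem:" line (in MB).
mem_total_mb() {
    awk '$1 == "Mem:" { print $2; exit }'
}
```

Possible usage: ./memtester "$(free -m | mem_total_mb)" 2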