Backblaze storage pods

When a storage pod hard disk develops a problem, it can be difficult to identify the physical disk from the error messages. Linux error messages identify disks by ATA number (such as ata7.00) or by sdX name (such as sdc). For hardware maintenance, disks need to be identified by backplane number and socket.

A report like this allows error messages to be translated to physical location:

Backplane Socket  sdx ata      Serial
        1      1  sdc  ata7.00 Y6N1KE53FTMB
        2      1  sdd  ata8.00 WD-WMC1T1802712
        3      1  sde  ata9.00 WD-WCC4ELZ72L40
        5      1  sdf ata11.00 Y6N1KE56FTMB
        7      1  sdg ata13.00 VDGKJ03D
        8      1  sdh ata14.00 WD-WX21D25R5PP4
       10      1  sdi ata16.00 19P1K5R6FTMB
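
As an illustration of using such a report, here is a minimal sketch of a lookup helper. It is not part of the original script: the name locate_disk and its output wording are assumptions. It reads a report on standard input and prints the backplane and socket for a given sdX name, ATA number or serial number.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): look up the
# physical location of a disk in a backplane usage report.
# Usage: locate_disk <sdX|ataN.NN|serial> < report_file
function locate_disk {
    local key=$1
    awk -v key="$key" '
        NR == 1 { next }                       # skip the header line
        $3 == key || $4 == key || $5 == key {  # match sdx, ata or serial
            printf "backplane %s, socket %s (%s, %s, serial %s)\n", $1, $2, $3, $4, $5
            found = 1
        }
        END { exit found ? 0 : 1 }             # non-zero status when not found
    '
}

# Example: the kernel logged an error for ata11.00
locate_disk ata11.00 <<'EOF'
Backplane Socket  sdx ata      Serial
        1      1  sdc  ata7.00 Y6N1KE53FTMB
        5      1  sdf ata11.00 Y6N1KE56FTMB
EOF
```

The key has to match a whole column value, so an ATA port logged without a device suffix (for example ata11) would need the suffix added before the lookup.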

The report is generated by the following bash function. msg is a messaging function that terminates the script when called with E (error).

#--------------------------
# Name: generate_report
# Purpose:
#    * Generates SATA backplane usage report
# Usage: generate_report
# Global variable set: none
# Outputs: writes dated report file; removes if same as last one
# Returns:
#   0 on success and warning.  Does not return on error
#--------------------------
function generate_report {
    local buf cmd last_out_fn oIFS out out_dir out_fn
    declare -A ata 

    #  Get the ATA number for each sdX
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # Based on syntaxerror's script in
    # https://serverfault.com/questions/244944/linux-ata-errors-translating-to-a-device-name
    oIFS=$IFS
    while IFS=' ' read Path HostFull sdx
    do
        # Split host:channel:id on colons without changing IFS for later commands
        IFS=: read -r -a h <<< "$HostFull"
        HostMain=${h[0]}; HostMid=${h[1]}; HostSub=${h[2]}
        if echo "$Path" | grep -q '/usb[0-9]*/'; then
            msg I "Device $sdx is not an ATA device, it is a USB device" 
        else
            ata["$sdx"]=ata$(< "$Path/host$HostMain/scsi_host/host$HostMain/unique_id").$HostMid$HostSub
        fi  
    done < <(
        for i in /sys/block/sd*
        do  
            readlink $i \
                | sed \
                    -e 's|\.\./devices|/sys/devices|' \
                    -e 's|/host[0-9]\{1,2\}/target| |' \
                    -e 's|/[0-9]\{1,2\}\(:[0-9]\)\{3\}/block/| |'
        done
    )
    IFS=$oIFS

    # Generate report data
    # ~~~~~~~~~~~~~~~~~~~~
    out=$'Backplane Socket  sdx ata      Serial\n'
    out+=$(
        while read backplane socket sdx serno
        do
            msg D "backplane: $backplane, socket: $socket, sdx: $sdx, serno: $serno" 
            ((backplane=backplane-6+1))
            ((socket++))
            # Quoted so an empty ATA value (e.g. a USB disk) does not shift the columns
            printf '%9s %6s %4s %8s %s\n' "$backplane" "$socket" "$sdx" "${ata[$sdx]}" "$serno"
        done < <(
            lshw -class disk 2>&1 \
                | grep -E '^       (bus info|logical name|serial)' \
                | sed -e 's/^[[:space:]]*//' \
                | xargs -L 3 \
                | grep -Ev 'scsi@(0|1):0.0.0' \
                | sed -e 's/bus info: scsi@//' -e 's|logical name: /dev/||' \
                | sed -e 's/serial: //' -e 's/:/ /' -e 's/\.0\.0//' \
                | sort -n
            )
    )

    # Check reports directory
    # ~~~~~~~~~~~~~~~~~~~~~~~
    out_dir=/var/backup/sata_backplane_usage
    buf=$(ck_file "$out_dir" d:rwx 2>&1)
    [[ $buf != '' ]] && msg E "$buf" 

    # Get name of most recent report
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    last_out_fn=$(ls -1rt "$out_dir" | tail -1)
    [[ $last_out_fn != '' ]] && last_out_fn=$out_dir/$last_out_fn

    # Write report to file
    # ~~~~~~~~~~~~~~~~~~~~
    out_fn=$out_dir/$(date +%Y-%m-%d@%H:%M:%S).report
    msg I "Writing report $out_fn" 
    # Group the command so that a failure to open $out_fn is also captured
    buf=$( { echo "$out" > "$out_fn"; } 2>&1 )
    [[ $buf != '' ]] && msg E "Writing report: $buf" 

    # Remove report file if same as previous one
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    if [[ $last_out_fn != '' ]]; then
        cmd=(diff --brief "$last_out_fn" "$out_fn")
        buf=$("${cmd[@]}" 2>&1)
        # Branch on diff's exit status directly; in an if/elif chain each
        # later test of $? would see the status of the previous test instead
        case $? in
            0 )
                msg I "Removing report $out_fn because identical to the last report"
                buf=$(rm "$out_fn" 2>&1)
                [[ $buf != '' ]] && msg E "Removing $out_fn: $buf"
                ;;
            1 )
                msg W "SATA backplane usage changed.  Reports in $out_dir"
                ;;
            * )
                msg E "${cmd[*]}: $buf"
                ;;
        esac
    fi

    return 0
}  #  end of function generate_report
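
The sysfs parsing in the first loop of the function can be tried in isolation. The sketch below applies the same sed expressions to one sample link target. The sample path is made up, though it has the shape readlink typically returns for /sys/block/sdX, and the name split_sysfs_link is an assumption for illustration.

```shell
#!/usr/bin/env bash
# Sketch: reduce a /sys/block/sdX link target to "path host:channel:id sdX"
# using the same sed expressions as generate_report.
function split_sysfs_link {
    sed \
        -e 's|\.\./devices|/sys/devices|' \
        -e 's|/host[0-9]\{1,2\}/target| |' \
        -e 's|/[0-9]\{1,2\}\(:[0-9]\)\{3\}/block/| |'
}

# Hypothetical link target for sdc
link='../devices/pci0000:00/0000:00:1f.2/ata7/host6/target6:0:0/6:0:0:0/block/sdc'

read -r path host_full sdx < <(echo "$link" | split_sysfs_link)
IFS=: read -r host_main host_mid host_sub <<< "$host_full"
echo "path=$path"
echo "host=$host_main mid=$host_mid sub=$host_sub dev=$sdx"

# On a real system the ATA port number is then read from
#   $path/host$host_main/scsi_host/host$host_main/unique_id
# and combined as ata<unique_id>.<mid><sub>
```

For the sample link this leaves the path ending in /ata7, the host address 6:0:0 and the device name sdc as three space-separated fields, which is what the while loop in generate_report reads.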

The full script is available at "A Backblaze storage pod storage management utility for Linux".
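
The function relies on two helpers from the full script, msg and ck_file, which are not shown here. To experiment with generate_report on its own, stand-ins along the following lines could be defined first. This is a rough sketch under assumptions: the message prefixes, the debug variable and the TYPE:PERMS interpretation of the d:rwx argument are guesses from how the helpers are called above, not the original implementations.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for msg and ck_file; the real versions are in the
# full script and may differ in output format and features.

# msg CLASS TEXT: CLASS is D (debug), I (info), W (warning) or E (error).
# Everything goes to stderr so that calls inside $(...) do not end up in
# captured output. E terminates the script.
function msg {
    local class=$1 text=$2
    case $class in
        D ) if [[ -n ${debug:-} ]]; then echo "DEBUG: $text" >&2; fi ;;
        I ) echo "INFO: $text" >&2 ;;
        W ) echo "WARNING: $text" >&2 ;;
        E ) echo "ERROR: $text" >&2; exit 1 ;;
    esac
}

# ck_file PATH TYPE:PERMS, e.g. ck_file /some/dir d:rwx
# Writes a message to stderr for each failed check; silent when all pass.
function ck_file {
    local path=$1 type=${2%%:*} perms=${2#*:} i
    case $type in
        d ) [[ -d $path ]] || { echo "$path is not a directory" >&2; return 1; } ;;
        f ) [[ -f $path ]] || { echo "$path is not a regular file" >&2; return 1; } ;;
    esac
    for ((i = 0; i < ${#perms}; i++)); do
        case ${perms:i:1} in
            r ) [[ -r $path ]] || echo "$path is not readable" >&2 ;;
            w ) [[ -w $path ]] || echo "$path is not writable" >&2 ;;
            x ) [[ -x $path ]] || echo "$path is not executable or searchable" >&2 ;;
        esac
    done
    return 0
}
```

Sending messages to stderr matters here because generate_report calls msg D inside a command substitution whose output becomes the report body.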