Backblaze storage pods
When a storage pod has hard disk problems, it can be difficult to identify the affected disk from the error messages. Linux error messages identify disks by ATA port number (such as ata7.00) or by sdX device name (such as sdc). For hardware maintenance, disks need to be identified by backplane number and socket.
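Both kinds of identifier can be pulled out of a kernel log mechanically. The sketch below is a minimal, hypothetical helper (disk_ids is not part of the script); it assumes libata messages of the form ataN.MM: and block layer messages that name the disk as [sdX] or dev sdX, which may not cover every message format:

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): list the disk
# identifiers found in kernel messages read from stdin, once each.
#   ataN.MM  as printed by libata, e.g. "ata7.00: failed command: ..."
#   sdX      as printed in "[sdX]" or "dev sdX" by the SCSI and block layers
function disk_ids {
    grep -oE 'ata[0-9]+\.[0-9]{2}|\[sd[a-z]+]|dev sd[a-z]+' \
    | sed -e 's/^\[//' -e 's/]$//' -e 's/^dev //' \
    | sort -u
}
```

For example, dmesg | disk_ids lists the disks mentioned in the kernel ring buffer, ready to be looked up in the report.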
A report like this one allows the identifiers in error messages to be translated to physical locations:
Backplane  Socket  sdx  ata       Serial
1          1       sdc  ata7.00   Y6N1KE53FTMB
2          1       sdd  ata8.00   WD-WMC1T1802712
3          1       sde  ata9.00   WD-WCC4ELZ72L40
5          1       sdf  ata11.00  Y6N1KE56FTMB
7          1       sdg  ata13.00  VDGKJ03D
8          1       sdh  ata14.00  WD-WX21D25R5PP4
10         1       sdi  ata16.00  19P1K5R6FTMB
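Looking a disk up in a saved report only needs a match on the sdx or ata column. A minimal sketch, assuming a report file with a header line followed by whitespace-separated Backplane, Socket, sdx, ata and Serial columns; locate_disk is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: print the physical location of a disk, given a report
# file and an identifier from an error message (an sdX name or ataN.MM).
# Report columns: Backplane Socket sdx ata Serial
function locate_disk {
    local report=$1 id=$2
    # Skip the header line; match the identifier against the sdx or ata column.
    # Exit status is 0 if at least one line matched, 1 otherwise.
    awk -v id="$id" 'NR > 1 && ($3 == id || $4 == id) {
        printf "backplane %s socket %s (%s, %s, serial %s)\n", $1, $2, $3, $4, $5
        found = 1
    }
    END { exit !found }' "$report"
}
```

With the report above, locate_disk "$report" ata11.00 would print backplane 5 socket 1 (sdf, ata11.00, serial Y6N1KE56FTMB); the exit status is non-zero when the identifier is not in the report.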
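The report is generated by the bash function below. It relies on two helpers defined elsewhere in the full script. msg is a messaging function which terminates the script when called with E for error; the function also calls it with I, W and D. ck_file checks the reports directory and prints a description of any problem it finds.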
#--------------------------
# Name: generate_report
# Purpose:
# * Generates SATA backplane usage report
# Usage: generate_report
# Global variable set: none
# Outputs: writes dated report file; removes if same as last one
# Returns:
# 0 on success and warning. Does not return on error
#--------------------------
function generate_report {
    local buf cmd last_out_fn out out_dir out_fn
    local HostFull HostMain HostMid HostSub Path sdx
    declare -A ata    # sdX -> ataN.MM; declare makes it local to the function
    # Get the ATA number for each sdX
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # Based on syntaxerror's script in
    # https://serverfault.com/questions/244944/linux-ata-errors-translating-to-a-device-name
    # Each input line is "<device sysfs path> <host:channel:id> <sdX>"
    while read -r Path HostFull sdx
    do
        # Split e.g. "6:0:0" without changing IFS for the rest of the function
        IFS=: read -r HostMain HostMid HostSub <<< "$HostFull"
        if [[ $Path =~ /usb[0-9]*/ ]]; then
            msg I "Device $sdx is not an ATA device, it is a USB device"
        else
            # The SCSI host's unique_id is libata's port number, the N in ataN
            ata["$sdx"]=ata$(< "$Path/host$HostMain/scsi_host/host$HostMain/unique_id").$HostMid$HostSub
        fi
    done < <(
        for i in /sys/block/sd*
        do
            # e.g. ../devices/pci0000:00/.../ata7/host6/target6:0:0/6:0:0:0/block/sdc
            # becomes /sys/devices/pci0000:00/.../ata7 6:0:0 sdc
            readlink "$i" \
            | sed \
                -e 's|\.\./devices|/sys/devices|' \
                -e 's|/host[0-9]\{1,2\}/target| |' \
                -e 's|/[0-9]\{1,2\}\(:[0-9]\)\{3\}/block/| |'
        done
    )
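    # Generate report data
    # ~~~~~~~~~~~~~~~~~~~~
    # lshw lists each disk with indented "bus info: scsi@H:C.T.L",
    # "logical name: /dev/sdX" and "serial:" lines. They are joined three to
    # a line, the disks at scsi@0:0.0.0 and scsi@1:0.0.0 are dropped and each
    # remaining line is reduced to "host channel sdX serial", sorted by host.
    out=$'Backplane Socket sdx ata Serial\n'
    out+=$(
        while read -r backplane socket sdx serno
        do
            msg D "backplane: $backplane, socket: $socket, sdx: $sdx, serno: $serno"
            # On this pod the first backplane is SCSI host 6; socket = channel + 1
            ((backplane=backplane-6+1))
            ((socket++))
            # Quoted so that an empty field (no ATA number) does not shift columns
            printf '%9s %6s %4s %8s %s\n' "$backplane" "$socket" "$sdx" "${ata[$sdx]}" "$serno"
        done < <(
            lshw -class disk 2>&1 \
            | grep -E '^[[:space:]]+(bus info|logical name|serial):' \
            | sed -e 's/^[[:space:]]*//' \
            | xargs -L 3 \
            | grep -Ev 'scsi@(0|1):0.0.0' \
            | sed -e 's/bus info: scsi@//' -e 's|logical name: /dev/||' \
            | sed -e 's/serial: //' -e 's/:/ /' -e 's/\.0\.0//' \
            | sort -n
        )
    )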
    # Check reports directory
    # ~~~~~~~~~~~~~~~~~~~~~~~
    out_dir=/var/backup/sata_backplane_usage
    buf=$(ck_file "$out_dir" d:rwx 2>&1)
    [[ $buf != '' ]] && msg E "$buf"
    # Get name of most recent report
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    last_out_fn=$(ls -1rt "$out_dir" | tail -1)
    [[ $last_out_fn != '' ]] && last_out_fn=$out_dir/$last_out_fn
    # Write report to file
    # ~~~~~~~~~~~~~~~~~~~~
    out_fn=$out_dir/$(date +%Y-%m-%d@%H:%M:%S).report
    msg I "Writing report $out_fn"
    # The braces send the shell's own error message (e.g. when $out_fn
    # cannot be opened) to the captured stderr
    buf=$( { echo "$out" > "$out_fn"; } 2>&1 )
    [[ $buf != '' ]] && msg E "Writing report: $buf"
    # Remove report file if same as previous one
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    if [[ $last_out_fn != '' ]]; then
        cmd=(diff --brief "$last_out_fn" "$out_fn")
        buf=$("${cmd[@]}" 2>&1)
        # Test diff's exit status once: every (( )) test would overwrite $?
        case $? in
            0)  # identical
                msg I "Removing report $out_fn because identical to the last report"
                buf=$(rm "$out_fn" 2>&1)
                [[ $buf != '' ]] && msg E "Removing $out_fn: $buf"
                ;;
            1)  # different
                msg W "SATA backplane usage changed. Reports in $out_dir"
                ;;
            *)  # trouble, e.g. a file could not be read
                msg E "${cmd[*]}: $buf"
                ;;
        esac
    fi
    return 0
} # end of function generate_report
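The two parsing steps can be checked without a storage pod by factoring them into functions and feeding them canned text. A sketch under that assumption: parse_sysfs_link and parse_lshw_disks are hypothetical names, the sed, xargs and sort edits are the ones generate_report uses, the grep pattern accepts any amount of leading whitespace because lshw indents its output, and the sample input is made up in the usual sysfs and lshw formats rather than captured from a pod:

```shell
#!/bin/bash
# Hypothetical demo functions: the parsing steps of generate_report,
# factored out so that they can be fed canned text.

# Turn a /sys/block/sdX symlink target into "Path HostFull sdx"
function parse_sysfs_link {
    sed \
        -e 's|\.\./devices|/sys/devices|' \
        -e 's|/host[0-9]\{1,2\}/target| |' \
        -e 's|/[0-9]\{1,2\}\(:[0-9]\)\{3\}/block/| |'
}

# Turn lshw "bus info", "logical name" and "serial" lines into
# "host channel sdX serial", skipping the disks at scsi@0 and scsi@1
function parse_lshw_disks {
    grep -E '^[[:space:]]+(bus info|logical name|serial):' \
    | sed -e 's/^[[:space:]]*//' \
    | xargs -L 3 \
    | grep -Ev 'scsi@(0|1):0.0.0' \
    | sed -e 's/bus info: scsi@//' -e 's|logical name: /dev/||' \
    | sed -e 's/serial: //' -e 's/:/ /' -e 's/\.0\.0//' \
    | sort -n
}

# Made-up sample input in the usual formats (not captured from a pod)
link='../devices/pci0000:00/0000:00:1c.0/0000:03:00.0/ata7/host6/target6:0:0/6:0:0:0/block/sdc'
lshw_sample='  *-disk:0
       description: ATA Disk
       bus info: scsi@0:0.0.0
       logical name: /dev/sda
       serial: SERIAL0000
  *-disk:1
       description: ATA Disk
       bus info: scsi@7:0.0.0
       logical name: /dev/sdd
       serial: SERIAL0002
  *-disk:2
       description: ATA Disk
       bus info: scsi@6:0.0.0
       logical name: /dev/sdc
       serial: SERIAL0001'

echo "$link" | parse_sysfs_link
# /sys/devices/pci0000:00/0000:00:1c.0/0000:03:00.0/ata7 6:0:0 sdc
printf '%s\n' "$lshw_sample" | parse_lshw_disks
# 6 0 sdc SERIAL0001
# 7 0 sdd SERIAL0002
```

The lines printed are exactly what the two read loops in generate_report consume: three fields for the ATA mapping, four fields for the report.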
The full script is available at "A Backblaze storage pod storage management utility for Linux".
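To run generate_report outside the full script, msg and ck_file have to be supplied. The stand-ins below are hypothetical and not the originals; their behaviour is inferred only from how generate_report calls them, and the DEBUG variable that enables D messages is an assumption:

```shell
#!/bin/bash
# Hypothetical stand-ins for the full script's helpers; NOT the originals.

# msg LEVEL TEXT...: report TEXT; level E (error) terminates the script.
# D (debug) lines are shown only when the assumed DEBUG variable is set.
function msg {
    local level=$1
    shift
    case $level in
        D) [[ ${DEBUG:-} ]] && echo "DEBUG: $*" >&2 ;;
        I) echo "INFO: $*" ;;
        W) echo "WARNING: $*" >&2 ;;
        E) echo "ERROR: $*" >&2; exit 1 ;;
        *) echo "ERROR: msg: unknown level $level" >&2; exit 1 ;;
    esac
    return 0
}

# ck_file PATH TYPE:PERMS: print nothing if PATH has the given type
# (d = directory, f = regular file) and permissions (any of r, w, x),
# otherwise print what is wrong and return 1.
function ck_file {
    local path=$1 type=${2%%:*} perms=${2#*:} p
    case $type in
        d) [[ -d $path ]] || { echo "$path: not a directory"; return 1; } ;;
        f) [[ -f $path ]] || { echo "$path: not a regular file"; return 1; } ;;
        *) echo "ck_file: unknown type $type"; return 1 ;;
    esac
    for p in r w x; do
        # test evaluates the -r, -w or -x operator built from $p at run time
        if [[ $perms == *"$p"* ]] && ! test "-$p" "$path"; then
            echo "$path: no $p permission"
            return 1
        fi
    done
    return 0
}
```

msg D writes to stderr deliberately: generate_report calls it inside the command substitution that builds the report, so anything it printed to stdout would end up in the report file. With the helpers defined and the reports directory created (mkdir -p /var/backup/sata_backplane_usage), generate_report is best run as root, because lshw otherwise warns that its output may be incomplete.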