Bare Metal Recovery Procedures
Bare metal recovery enables complete server restoration from scratch, including bootloader, filesystem, and all data. This guide covers full system backup strategies, bootable rescue environments, GRUB restoration, LVM recovery, and network boot procedures for comprehensive infrastructure disaster recovery.
Table of Contents
- Bare Metal Recovery Overview
- Full System Backup Strategies
- Creating Bootable Rescue Environments
- GRUB Restoration
- LVM Recovery
- Network Boot Recovery
- Automated Recovery
- Recovery Testing
- Conclusion
Bare Metal Recovery Overview
Bare metal recovery requires:
- Hardware: Target machine or spare hardware
- Bootable Media: USB, CD, or network boot
- System Image: Complete disk backup including partitions
- Recovery Tools: Imaging software (dd, tar, Clonezilla)
- Boot Configuration: GRUB/bootloader restoration
- Testing: Verified procedures before disaster
# Bare metal recovery capabilities matrix
cat > /tmp/recovery-matrix.txt << 'EOF'
Recovery Method | Recovery Time | Data Integrity | Hardware Match
----------------|---------------|----------------|----------------
Full disk image | 30-60 min | Exact | Required
Filesystem tar | 1-4 hours | Very high | Not required
Incremental bkp | Variable | High | Not required
Network clone | 30-60 min | Exact | Preferred
Logical restore | 4-24 hours | Good | Not required
Best choice depends on:
- RTO (Recovery Time Objective)
- RPO (Recovery Point Objective)
- Hardware availability
- Data consistency requirements
EOF
cat /tmp/recovery-matrix.txt
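To relate the matrix to an RTO, a back-of-the-envelope estimate can help. The sketch below assumes you have measured the sustained write throughput of the target yourself; the function names and the integer rounding are illustrative, not part of any tool.

```shell
#!/usr/bin/env bash
# Rough restore-time estimate: image size divided by sustained throughput.
# Both inputs are values you measure yourself; nothing here probes hardware.
estimate_restore_minutes() {
  local image_gib=$1        # image size in GiB
  local throughput_mibs=$2  # sustained write throughput in MiB/s
  if [ "$throughput_mibs" -le 0 ]; then
    echo "Error: throughput must be positive" >&2
    return 1
  fi
  # GiB -> MiB, divide by MiB/s to get seconds, round up to whole minutes
  local seconds=$(( image_gib * 1024 / throughput_mibs ))
  echo $(( (seconds + 59) / 60 ))
}

# Succeeds when the estimate fits within an RTO given in minutes
fits_rto() {
  local minutes
  minutes=$(estimate_restore_minutes "$1" "$2") || return 1
  [ "$minutes" -le "$3" ]
}

# Usage: estimate_restore_minutes 500 150   # 500 GiB image at 150 MiB/s
```

The estimate covers only the raw copy; bootloader repair and verification add to it.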
Full System Backup Strategies
Using dd for Disk Backup
# Create full disk image with dd
create_disk_image_dd() {
local source_device=$1
local image_file=$2
local block_size=4M
echo "Creating disk image: $source_device -> $image_file"
# Check if device exists
if [ ! -b "$source_device" ]; then
echo "Error: Device not found: $source_device"
return 1
fi
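# Create image with progress; noerror,sync continues past read errors and
# zero-fills the unreadable block so later offsets stay aligned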
dd if="$source_device" \
of="$image_file" \
bs="$block_size" \
conv=noerror,sync \
status=progress
if [ $? -eq 0 ]; then
echo "✓ Disk image created successfully"
ls -lh "$image_file"
else
echo "✗ Disk image creation failed"
return 1
fi
}
# Restore disk image with dd
restore_disk_image_dd() {
local image_file=$1
local target_device=$2
echo "WARNING: This will overwrite device: $target_device"
read -p "Are you sure? (type 'yes' to continue): " confirmation
if [ "$confirmation" != "yes" ]; then
echo "Cancelled"
return 1
fi
echo "Restoring disk image to: $target_device"
# Do not use conv=noerror,sync when restoring: sync would zero-pad a short final block
dd if="$image_file" \
of="$target_device" \
bs=4M \
conv=fsync \
status=progress
if [ $? -eq 0 ]; then
echo "✓ Disk restoration complete"
sync
else
echo "✗ Restoration failed"
return 1
fi
}
# Compress disk image for storage
compress_disk_image() {
local image_file=$1
local compression_level=6
echo "Compressing disk image..."
# Use gzip for compression
gzip -$compression_level "$image_file"
if [ $? -eq 0 ]; then
echo "✓ Compression complete"
ls -lh "${image_file}.gz"
fi
}
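When the backup target lacks space for a raw image, imaging and compression can be combined in one pipeline. This is a minimal sketch; the function names are hypothetical, and dd's progress output is silenced here only to keep the example quiet.

```shell
#!/usr/bin/env bash
# Image a device (or file) straight into a gzip archive, with no
# intermediate raw image on disk.
create_compressed_image() {
  local source=$1
  local image_gz=$2
  # pipefail makes a dd failure fail the pipeline; the subshell keeps the option local
  (
    set -o pipefail
    dd if="$source" bs=4M 2>/dev/null | gzip -c > "$image_gz"
  )
}

# Restore a compressed image onto a device (or file)
restore_compressed_image() {
  local image_gz=$1
  local target=$2
  (
    set -o pipefail
    gunzip -c "$image_gz" | dd of="$target" bs=4M conv=fsync 2>/dev/null
  )
}

# Usage (as root): create_compressed_image /dev/sda /backup/sda.img.gz
```

In interactive use, replacing `2>/dev/null` with `status=progress` shows throughput while the copy runs.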
Using tar for Filesystem Backup
# Create full filesystem backup
create_filesystem_backup_tar() {
local source_dir=$1
local backup_file=$2
echo "Creating filesystem backup: $source_dir -> $backup_file"
# Create comprehensive backup excluding unnecessary files.
# Member names are relative to / (./proc/...), so patterns must start with ./ to match.
tar \
--exclude='./proc/*' \
--exclude='./sys/*' \
--exclude='./dev/*' \
--exclude='./tmp/*' \
--exclude='./var/tmp/*' \
--exclude='./var/cache/*' \
--exclude='./var/log/*' \
--exclude='./run/*' \
--exclude='./mnt/*' \
--exclude='./backup/*' \
--exclude='lost+found' \
--exclude='.cache' \
--exclude='*.swp' \
-czf "$backup_file" \
-C / \
. 2>&1 | tee /tmp/tar-backup.log
# $? would report tee's status; PIPESTATUS[0] holds tar's
if [ "${PIPESTATUS[0]}" -eq 0 ]; then
echo "✓ Filesystem backup complete"
du -sh "$backup_file"
fi
}
# Restore filesystem from tar backup
restore_filesystem_tar() {
local backup_file=$1
local restore_mount=$2
echo "Restoring filesystem from: $backup_file"
echo "Restore location: $restore_mount"
# Verify mount point exists
if [ ! -d "$restore_mount" ]; then
echo "Error: Mount point does not exist: $restore_mount"
return 1
fi
# Extract backup
tar -xzf "$backup_file" -C "$restore_mount" \
--warning=no-file-changed \
--numeric-owner
if [ $? -eq 0 ]; then
echo "✓ Filesystem restoration complete"
else
echo "✗ Restoration failed"
return 1
fi
}
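# Incremental backup tracking
track_incremental_backups() {
local snapshot_file="/var/lib/backup/.last_backup"
local backup_dir="/backup/incremental"
mkdir -p "$backup_dir" "$(dirname "$snapshot_file")"
if [ ! -f "$snapshot_file" ]; then
# First run: date the marker at the epoch so every file counts as changed
touch -d @0 "$snapshot_file"
fi
# Record the start time now so files modified during the run are caught next time
touch "${snapshot_file}.new"
# Create incremental backup of changed files. GNU tar takes the reference time
# from the file's mtime when the --newer-mtime argument starts with / or .
# /tmp/tar-excludes.txt must already exist, with one exclude pattern per line.
tar -czf "$backup_dir/incremental-$(date +%Y%m%d_%H%M%S).tar.gz" \
--newer-mtime="$snapshot_file" \
--exclude-from=/tmp/tar-excludes.txt \
/
# Update snapshot
mv "${snapshot_file}.new" "$snapshot_file"
}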
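GNU tar also has built-in incremental support through snapshot files, which tracks deleted and renamed files as well as modification times. The following is a sketch under the assumption that GNU tar is installed; the function name, archive naming, and `backup.snar` file name are illustrative.

```shell
#!/usr/bin/env bash
# Full (level 0) and incremental backups using a GNU tar snapshot file.
# Prints the path of the archive it created.
incremental_backup() {
  local source_dir=$1
  local backup_dir=$2
  local snapshot="$backup_dir/backup.snar"
  local level="full"
  mkdir -p "$backup_dir"
  # An existing snapshot means an earlier run exists, so this run is incremental
  [ -f "$snapshot" ] && level="incr"
  local archive="$backup_dir/${level}-$(date +%Y%m%d_%H%M%S).tar.gz"
  tar --listed-incremental="$snapshot" -czf "$archive" -C "$source_dir" . || return 1
  echo "$archive"
}

# Usage: incremental_backup /etc /backup/etc
# Keep backup_dir outside source_dir so archives do not back up themselves.
```

To restore, extract the full archive first and then each incremental in order, passing `--listed-incremental=/dev/null` on extraction.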
Hash-Based Backup Verification
# Verify backup integrity using checksums
verify_backup_integrity() {
local backup_file=$1
local checksum_file="${backup_file}.sha256"
echo "Verifying backup integrity..."
# Create checksum if it doesn't exist
if [ ! -f "$checksum_file" ]; then
sha256sum "$backup_file" > "$checksum_file"
echo "Checksum created: $checksum_file"
fi
# Verify checksum
if sha256sum -c "$checksum_file" --quiet; then
echo "✓ Backup integrity verified"
return 0
else
echo "✗ Backup corruption detected!"
return 1
fi
}
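A checksum is most useful when it is recorded at backup time and checked later, before a restore. As a sketch of that split, the two hypothetical functions below write and verify a manifest for a whole backup directory; the `SHA256SUMS` file name is an assumption, and GNU coreutils and findutils are assumed.

```shell
#!/usr/bin/env bash
# Write a SHA-256 manifest covering every file in a backup directory.
create_backup_manifest() {
  local backup_dir=$1
  (
    cd "$backup_dir" || exit 1
    # Relative paths keep the manifest valid after the directory is moved or copied
    find . -type f ! -name SHA256SUMS -print0 | sort -z | xargs -0 -r sha256sum
  ) > "$backup_dir/SHA256SUMS"
}

# Verify every file against the manifest; non-zero exit on any mismatch.
verify_backup_manifest() {
  local backup_dir=$1
  ( cd "$backup_dir" && sha256sum --quiet -c SHA256SUMS )
}

# Usage:
#   create_backup_manifest /backup/2024-01-01    # right after the backup
#   verify_backup_manifest /backup/2024-01-01    # before any restore
```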
Creating Bootable Rescue Environments
Creating a Bootable Rescue USB
# Create comprehensive rescue environment on USB
create_rescue_usb() {
local usb_device=$1 # e.g., /dev/sdb
local iso_file=${2:-"systemrescue-10.00.iso"}
echo "Creating rescue environment on: $usb_device"
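# Verify the target is a block device and the ISO exists
# (this does not prove the device is a USB drive; confirm with lsblk first)
if [ ! -b "$usb_device" ]; then
echo "Error: Not a valid device: $usb_device"
return 1
fi
if [ ! -f "$iso_file" ]; then
echo "Error: ISO not found: $iso_file"
return 1
fi
# Unmount if mounted
umount "${usb_device}"* 2>/dev/null
# Write ISO to USB; test dd's exit status directly, since sync would overwrite $?
if dd if="$iso_file" of="$usb_device" bs=4M status=progress; then
sync
echo "✓ Rescue USB created successfully"
else
echo "✗ Writing ISO failed"
return 1
fi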
}
# Download SystemRescue
download_rescue_iso() {
local download_dir="/tmp/rescue-iso"
mkdir -p "$download_dir"
echo "Downloading SystemRescue..."
# Pinned to release 10.00; adjust the version in the URL for newer releases
if wget -O "$download_dir/systemrescue.iso" \
https://download.system-rescue.org/systemrescue-10.00.iso; then
echo "✓ Downloaded to: $download_dir/systemrescue.iso"
else
echo "✗ Download failed"
return 1
fi
}
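Because writing an ISO destroys whatever is on the target, an extra guard against picking an internal disk may be worthwhile. The sketch below reads the kernel's `removable` flag from sysfs; the function name and the `sys_root` override (used only for testing) are hypothetical, and some USB enclosures report themselves as non-removable, so treat a failure as a prompt to double-check rather than a verdict.

```shell
#!/usr/bin/env bash
# Return 0 only if the kernel reports the disk as removable.
# sys_root is a hypothetical override for testing; it defaults to /sys.
is_removable_disk() {
  local device=$1
  local sys_root=${2:-/sys}
  local flag_file="$sys_root/block/$(basename "$device")/removable"
  [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

# Usage:
#   is_removable_disk /dev/sdb || echo "Refusing: /dev/sdb is not flagged removable"
```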
Building a Custom Rescue Environment
# Create custom rescue environment with required tools
build_custom_rescue_env() {
local rescue_dir="/opt/rescue"
mkdir -p "$rescue_dir/bin"
mkdir -p "$rescue_dir/sbin"
mkdir -p "$rescue_dir/lib64"
echo "Building custom rescue environment..."
# Copy essential utilities
local essential_tools=(
"/bin/bash"
"/bin/ls"
"/bin/cp"
"/bin/dd"
"/bin/mount"
"/bin/umount"
"/sbin/fsck"
"/sbin/mkfs"
"/sbin/grub-install"
"/usr/bin/rsync"
"/usr/bin/curl"
"/usr/bin/wget"
"/usr/sbin/parted"
"/usr/sbin/fdisk"
)
for tool in "${essential_tools[@]}"; do
if [ -f "$tool" ]; then
cp --parents "$tool" "$rescue_dir/"
fi
done
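# Copy shared libraries, including the dynamic loader (ldd lists it without "=>")
ldd "${essential_tools[@]}" 2>/dev/null \
| awk '$2 == "=>" && $3 ~ /^\// {print $3} $1 ~ /^\/.*\.so/ {print $1}' \
| sort -u | while read -r lib; do
cp --parents "$lib" "$rescue_dir/" 2>/dev/null
done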
echo "✓ Custom rescue environment created"
}
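Tool paths differ between distributions, so it can help to confirm what is actually available before relying on a rescue environment. A minimal sketch, with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Print each missing command and return non-zero if any is absent.
check_required_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_required_tools dd tar mount fsck parted rsync grub-install
```

Running this inside the rescue environment itself, rather than on the build host, is what shows whether a recovery would have the tools it needs.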
GRUB Restoration
Restoring the GRUB Bootloader
# Restore GRUB bootloader to MBR or UEFI
restore_grub_bootloader() {
local target_device=$1
local boot_partition=${2:-"${target_device}1"}
local grub_root_partition=${3:-"${target_device}2"}
echo "Restoring GRUB bootloader"
echo "Target device: $target_device"
echo "Boot partition: $boot_partition"
echo "Root partition: $grub_root_partition"
# Mount root partition
local temp_root="/mnt/restore-root"
mkdir -p "$temp_root"
mount "$grub_root_partition" "$temp_root"
# Mount boot partition if separate
if [ "$boot_partition" != "$grub_root_partition" ]; then
mkdir -p "$temp_root/boot"
mount "$boot_partition" "$temp_root/boot"
fi
# Mount essential filesystems
mount -B /dev "$temp_root/dev"
mount -t proc proc "$temp_root/proc"
mount -t sysfs sys "$temp_root/sys"
mount -t efivarfs efivarfs "$temp_root/sys/firmware/efi/efivars" 2>/dev/null
# Chroot and reinstall GRUB for the firmware mode the rescue system booted in
local grub_status
if [ -d /sys/firmware/efi ]; then
echo "Installing GRUB for UEFI..."
# The EFI system partition must already be mounted at $temp_root/boot/efi
chroot "$temp_root" /bin/bash -c "grub-install --target=x86_64-efi --efi-directory=/boot/efi"
grub_status=$?
else
echo "Installing GRUB to $target_device..."
# i386-pc is the GRUB target for legacy BIOS boot, including on 64-bit machines
chroot "$temp_root" /bin/bash -c "grub-install --target=i386-pc $target_device"
grub_status=$?
fi
if [ "$grub_status" -eq 0 ]; then
echo "✓ GRUB installed successfully"
else
echo "✗ GRUB installation failed"
fi
# Update GRUB configuration
echo "Updating GRUB configuration..."
chroot "$temp_root" /bin/bash -c "grub-mkconfig -o /boot/grub/grub.cfg"
# Unmount
umount "$temp_root"/sys/firmware/efi/efivars 2>/dev/null
umount "$temp_root"/{sys,proc,dev,boot,}
echo "GRUB restoration complete"
}
# Restore GRUB from rescue environment
restore_grub_from_rescue() {
local root_device=$1
local boot_device=$2
echo "Restoring GRUB from rescue environment"
# Create temporary root
mkdir -p /mnt/recovered
# Mount recovered filesystem
mount "$root_device" /mnt/recovered
# Bind mount important directories
mount --rbind /dev /mnt/recovered/dev
mount --rbind /proc /mnt/recovered/proc
mount --rbind /sys /mnt/recovered/sys
# Enter chroot
chroot /mnt/recovered /bin/bash -c "grub-install $boot_device && grub-mkconfig -o /boot/grub/grub.cfg"
# Cleanup
umount -l /mnt/recovered/{dev,proc,sys}
umount /mnt/recovered
}
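Two details in the functions above are easy to get wrong: the GRUB target depends on the firmware mode, and appending a bare number to the disk name (as in `${target_device}1`) does not work for NVMe or MMC devices, which use a `p` separator. The helpers below are a sketch of both checks, assuming an x86_64 machine; the function names and the `sys_root` test override are hypothetical.

```shell
#!/usr/bin/env bash
# Pick the grub-install target from the firmware mode of the running rescue system.
# sys_root is a hypothetical override for testing; it defaults to /sys.
detect_grub_target() {
  local sys_root=${1:-/sys}
  if [ -d "$sys_root/firmware/efi" ]; then
    echo "x86_64-efi"
  else
    echo "i386-pc"
  fi
}

# Build a partition path. Disks whose name ends in a digit
# (nvme0n1, mmcblk0, loop0) take a "p" before the partition number.
partition_path() {
  local disk=$1 number=$2
  case "$disk" in
    *[0-9]) echo "${disk}p${number}" ;;
    *)      echo "${disk}${number}" ;;
  esac
}

# Usage:
#   grub-install --target="$(detect_grub_target)" ...
#   mount "$(partition_path /dev/nvme0n1 2)" /mnt/restore-root
```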
Repairing the GRUB Configuration
# Fix GRUB configuration after recovery
fix_grub_config() {
local root_mount=$1
echo "Fixing GRUB configuration..."
# Common GRUB issues and fixes
local grub_config="$root_mount/boot/grub/grub.cfg"
if [ ! -f "$grub_config" ]; then
echo "GRUB config not found, regenerating..."
chroot "$root_mount" grub-mkconfig -o /boot/grub/grub.cfg
fi
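# Verify UUID references are correct
echo "Verifying filesystem UUIDs..."
# blkid expects a device, so resolve the mount point to its source device first
local root_dev=$(findmnt -n -o SOURCE --target "$root_mount")
local root_uuid=$(blkid -s UUID -o value "$root_dev")
if [ -n "$root_uuid" ] && ! grep -q "UUID=$root_uuid" "$grub_config"; then
echo "Updating UUID references..."
chroot "$root_mount" grub-mkconfig -o /boot/grub/grub.cfg
fi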
echo "✓ GRUB configuration verified"
}
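After restoring to different disks, a stale UUID in grub.cfg is a common reason for a system that drops to an initramfs prompt. As a sketch, the hypothetical helpers below list the `root=UUID=` values a config references and check each one against `blkid -U`.

```shell
#!/usr/bin/env bash
# List the distinct root=UUID=... values referenced by a grub.cfg.
grub_cfg_root_uuids() {
  local grub_config=$1
  grep -o 'root=UUID=[0-9A-Fa-f-]*' "$grub_config" | sed 's/^root=UUID=//' | sort -u
}

# Return non-zero if any referenced UUID matches no block device.
check_grub_uuids() {
  local grub_config=$1 uuid status=0
  for uuid in $(grub_cfg_root_uuids "$grub_config"); do
    if ! blkid -U "$uuid" >/dev/null 2>&1; then
      echo "No device with UUID $uuid"
      status=1
    fi
  done
  return "$status"
}

# Usage (as root): check_grub_uuids /mnt/recovered/boot/grub/grub.cfg
```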
LVM Recovery
Recovering LVM Volumes
# Recover data from LVM volumes
recover_lvm_volumes() {
echo "Scanning for LVM volumes..."
# Scan for physical volumes
pvscan
# Scan for volume groups
vgscan
# List all logical volumes
lvdisplay
# Activate volume groups
vgchange -ay
# Mount a recovered volume (defaults are examples; pass your own LV path and mount point)
local lv_path=${1:-"/dev/vg0/lv_root"}
local mount_point=${2:-"/mnt/recovered"}
mkdir -p "$mount_point"
echo "Mounting LVM volume: $lv_path"
mount "$lv_path" "$mount_point"
if [ $? -eq 0 ]; then
echo "✓ LVM volume mounted successfully"
df -h "$mount_point"
fi
}
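# Restore LVM configuration from backup
restore_lvm_config() {
local backup_file=$1
local vg_name=$2
echo "Restoring LVM configuration for $vg_name from: $backup_file"
# Save the current metadata first so the restore can be undone
vgcfgbackup -f "/tmp/${vg_name}.pre-restore.backup" "$vg_name"
# Restore from backup (vgcfgrestore requires the volume group name)
vgcfgrestore -f "$backup_file" "$vg_name" || return 1
# Activate restored volume group
vgchange -ay "$vg_name"
# Verify restoration
lvdisplay "$vg_name"
}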
# Create LVM backup for recovery
create_lvm_backup() {
local backup_dir="/backup/lvm"
mkdir -p "$backup_dir"
echo "Creating LVM configuration backup..."
# Backup physical volume info
pvdisplay > "$backup_dir/pv-info.txt"
# Backup volume group info
vgdisplay > "$backup_dir/vg-info.txt"
# Backup logical volume info
lvdisplay > "$backup_dir/lv-info.txt"
# Backup LVM metadata; %s expands to each volume group name, giving one file per VG
vgcfgbackup -f "$backup_dir/%s.vgcfg"
echo "✓ LVM backup created in: $backup_dir"
}
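The recovery function above mounts a single named volume. When the layout of the failed system is unknown, discovering the logical volumes first may be more practical. This sketch assumes `vgchange -ay` has already run; the function names and the `LVS_CMD` override (there so the parsing can be exercised without LVM installed) are hypothetical.

```shell
#!/usr/bin/env bash
# Print one logical-volume device path per line.
# LVS_CMD is a hypothetical override for testing; it defaults to lvs.
list_lv_paths() {
  "${LVS_CMD:-lvs}" --noheadings -o lv_path | awk 'NF { print $1 }'
}

# Mount every discovered LV read-only under a base directory.
# Volumes that cannot be mounted (swap, unformatted) are reported and skipped.
mount_all_lvs_readonly() {
  local base=${1:-/mnt/recovered-lvm}
  local lv target
  for lv in $(list_lv_paths); do
    target="$base/${lv#/dev/}"   # e.g. /mnt/recovered-lvm/vg0/lv_root
    mkdir -p "$target"
    if ! mount -o ro "$lv" "$target" 2>/dev/null; then
      rmdir "$target"
      echo "skipped: $lv"
    fi
  done
}

# Usage (as root): mount_all_lvs_readonly /mnt/recovered-lvm
```

Mounting read-only keeps the inspection from altering volumes before you have decided what to restore.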
Network Boot Recovery
Configuring PXE Boot for Recovery
# Configure PXE boot for bare metal recovery
setup_pxe_recovery() {
local tftp_root="/var/lib/tftpboot"
local pxelinux_cfgdir="$tftp_root/pxelinux.cfg"
echo "Setting up PXE recovery environment..."
# Create directory structure
mkdir -p "$pxelinux_cfgdir"
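# Copy PXE bootloader and the menu module that DEFAULT refers to below
# (syslinux paths vary by distribution; adjust /usr/lib/syslinux as needed)
cp /usr/lib/syslinux/pxelinux.0 "$tftp_root/"
cp /usr/lib/syslinux/ldlinux.c32 "$tftp_root/"
cp /usr/lib/syslinux/menu.c32 "$tftp_root/"
cp /usr/lib/syslinux/libutil.c32 "$tftp_root/"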
# Create default PXE configuration
cat > "$pxelinux_cfgdir/default" << 'EOF'
DEFAULT menu.c32
PROMPT 0
MENU TITLE System Recovery Boot Menu
TIMEOUT 300
LABEL SystemRescue
MENU LABEL System Rescue Environment
LINUX systemrescue.kernel
APPEND initrd=systemrescue.initramfs root=live:/systemrescue.iso
LABEL ClonezillaRestore
MENU LABEL Clonezilla (Restore)
LINUX clonezilla/vmlinuz
APPEND initrd=clonezilla/initrd.img boot=live live-config noswap nolocales edd=on nomodeset vga=788
LABEL DebianInstaller
MENU LABEL Debian Installer
LINUX debian-installer/amd64/linux
APPEND initrd=debian-installer/amd64/initrd.gz
EOF
echo "✓ PXE recovery environment configured"
}
# Boot system from network for recovery
boot_from_network() {
echo "Instructions for network boot recovery:"
echo ""
echo "1. Power on bare metal machine"
echo "2. Enter BIOS/UEFI setup (typically F2, Del, or Esc)"
echo "3. Set boot priority:"
echo " - Primary: Network/PXE"
echo " - Secondary: Local disk"
echo "4. Save and exit"
echo "5. Machine will boot from PXE"
echo ""
echo "Select recovery option from boot menu"
}
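PXELINUX looks for a per-host file in `pxelinux.cfg/` before falling back to `default`, named after the client's MAC address: `01-` followed by the address in lowercase with dashes. That makes it possible to send one failed server into recovery without changing what other machines boot. A sketch, with hypothetical function names:

```shell
#!/usr/bin/env bash
# Derive the per-host PXELINUX config file name from a MAC address.
pxe_config_name() {
  local mac=$1
  # Lowercase the hex digits, turn colons into dashes, and prefix the
  # ARP hardware type for Ethernet (01)
  echo "01-$(echo "$mac" | tr 'A-F:' 'a-f-')"
}

# Install a prepared config file for one host only.
install_host_pxe_config() {
  local tftp_root=$1 mac=$2 source_cfg=$3
  local cfg_dir="$tftp_root/pxelinux.cfg"
  mkdir -p "$cfg_dir"
  cp "$source_cfg" "$cfg_dir/$(pxe_config_name "$mac")"
}

# Usage:
#   install_host_pxe_config /var/lib/tftpboot 88:99:AA:BB:CC:DD ./recovery.cfg
```

Removing the per-host file afterwards returns the machine to the shared default menu.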
Automated Recovery
Automated Recovery Script
# Automated bare metal recovery with minimal intervention
cat > /usr/local/bin/automated-recovery.sh << 'EOF'
#!/bin/bash
RECOVERY_MODE=${1:-"interactive"}
SOURCE_IMAGE=${2:-""}
TARGET_DEVICE=${3:-""}
# Configuration
RECOVERY_LOG="/var/log/bare-metal-recovery.log"
BACKUP_LOCATION="/backup"
log_recovery() {
echo "[$(date)] $1" | tee -a "$RECOVERY_LOG"
}
# Step 1: Identify recovery image
find_recovery_image() {
if [ -n "$SOURCE_IMAGE" ]; then
log_recovery "Using specified image: $SOURCE_IMAGE"
return 0
fi
echo "Available backup images:"
ls -lh "$BACKUP_LOCATION"/*.img* 2>/dev/null || \
ls -lh "$BACKUP_LOCATION"/*.tar.gz 2>/dev/null || \
echo "No images found"
read -p "Enter backup image path: " SOURCE_IMAGE
if [ ! -f "$SOURCE_IMAGE" ]; then
log_recovery "Error: Image not found"
return 1
fi
}
# Step 2: Identify target device
find_target_device() {
if [ -n "$TARGET_DEVICE" ]; then
log_recovery "Using target device: $TARGET_DEVICE"
return 0
fi
echo "Available devices:"
lsblk -d -o NAME,SIZE,TYPE
read -p "Enter target device (e.g., /dev/sda): " TARGET_DEVICE
if [ ! -b "$TARGET_DEVICE" ]; then
log_recovery "Error: Invalid device"
return 1
fi
}
# Step 3: Verify backup
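verify_backup() {
log_recovery "Verifying backup image..."
# Non-interactive mode skips the prompts, so validate both arguments here
if [ ! -f "$SOURCE_IMAGE" ] || [ ! -b "$TARGET_DEVICE" ]; then
log_recovery "Error: Missing image or invalid target device"
return 1
fi
if [[ "$SOURCE_IMAGE" == *.gz ]]; then
gzip -t "$SOURCE_IMAGE" || return 1
fi
log_recovery "✓ Backup verification passed"
}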
# Step 4: Perform recovery
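perform_recovery() {
log_recovery "Starting recovery to: $TARGET_DEVICE"
if [[ "$SOURCE_IMAGE" == *.img.gz ]]; then
log_recovery "Restoring disk image..."
# pipefail makes a gunzip failure fail the pipeline, not only a dd failure
( set -o pipefail; gunzip < "$SOURCE_IMAGE" | dd of="$TARGET_DEVICE" bs=4M status=progress ) || return 1
elif [[ "$SOURCE_IMAGE" == *.img ]]; then
log_recovery "Restoring uncompressed disk image..."
dd if="$SOURCE_IMAGE" of="$TARGET_DEVICE" bs=4M status=progress || return 1
elif [[ "$SOURCE_IMAGE" == *.tar.gz ]]; then
log_recovery "Restoring filesystem from tar..."
# A tar archive needs an existing filesystem: partition and format the disk first.
# The root partition defaults to the first one; set ROOT_PARTITION to override.
local root_part=${ROOT_PARTITION:-"${TARGET_DEVICE}1"}
mkdir -p /mnt/recovery
mount "$root_part" /mnt/recovery || return 1
tar -xzf "$SOURCE_IMAGE" -C /mnt/recovery --numeric-owner || { umount /mnt/recovery; return 1; }
umount /mnt/recovery
else
log_recovery "Error: Unsupported image type: $SOURCE_IMAGE"
return 1
fi
sync
# Make the kernel re-read the restored partition table
partprobe "$TARGET_DEVICE" 2>/dev/null
log_recovery "✓ Recovery complete"
}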
# Step 5: Restore GRUB
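restore_bootloader() {
log_recovery "Restoring GRUB bootloader..."
# Mount the root partition, not the whole disk. It defaults to the first
# partition; set ROOT_PARTITION in the environment to override.
local root_part=${ROOT_PARTITION:-"${TARGET_DEVICE}1"}
local status=0
mkdir -p /mnt/recovery
mount "$root_part" /mnt/recovery || return 1
mount --rbind /dev /mnt/recovery/dev
mount --rbind /proc /mnt/recovery/proc
mount --rbind /sys /mnt/recovery/sys
chroot /mnt/recovery grub-install "$TARGET_DEVICE" && \
chroot /mnt/recovery grub-mkconfig -o /boot/grub/grub.cfg || status=1
umount -l /mnt/recovery/{dev,proc,sys}
umount /mnt/recovery
if [ "$status" -ne 0 ]; then
log_recovery "✗ Bootloader restoration failed"
return 1
fi
log_recovery "✓ Bootloader restored"
}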
main() {
log_recovery "Starting automated bare metal recovery"
if [ "$RECOVERY_MODE" = "interactive" ]; then
find_recovery_image || exit 1
find_target_device || exit 1
fi
verify_backup || exit 1
read -p "WARNING: This will overwrite $TARGET_DEVICE. Continue? (yes/no): " confirm
[ "$confirm" = "yes" ] || exit 1
perform_recovery || exit 1
restore_bootloader || exit 1
log_recovery "Recovery process complete"
echo "System ready for reboot"
}
main
EOF
chmod +x /usr/local/bin/automated-recovery.sh
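Since the script overwrites its target without inspecting it, a check that the target is not currently mounted is a cheap additional safeguard. The sketch below scans the mount table for the disk or any of its partitions; the function name and the `mounts_file` override (for testing) are hypothetical, and it does not detect use through LVM, device-mapper, or swap.

```shell
#!/usr/bin/env bash
# Return 0 if the disk, or any numbered partition of it, appears in the mount table.
# mounts_file is a hypothetical override for testing; it defaults to /proc/mounts.
is_device_in_use() {
  local device=$1
  local mounts_file=${2:-/proc/mounts}
  awk -v dev="$device" '
    $1 == dev { found = 1 }
    # Partitions: the disk name followed by an optional "p" and digits
    index($1, dev) == 1 && substr($1, length(dev) + 1) ~ /^p?[0-9]+$/ { found = 1 }
    END { exit !found }
  ' "$mounts_file"
}

# Usage:
#   if is_device_in_use /dev/sda; then echo "Refusing: /dev/sda is mounted"; exit 1; fi
```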
Recovery Testing
Test Recovery Procedures Regularly
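# Test bare metal recovery procedure
test_recovery_procedure() {
local test_image="/backup/test-recovery.img.gz"
local test_dir="/tmp/recovery-test"
local loop_dev
echo "Testing bare metal recovery procedure..."
# Create test environment; a sparse file avoids writing 10 GB of zeros
mkdir -p "$test_dir"
truncate -s 10G "$test_dir/test-disk.img"
# Attach the first free loop device instead of assuming /dev/loop0 is unused
loop_dev=$(losetup --find --show --partscan "$test_dir/test-disk.img") || return 1
# Perform recovery; gzip -t catches a corrupt archive that dd alone would not report
if gzip -t "$test_image" && gunzip < "$test_image" | dd of="$loop_dev" bs=4M; then
echo "✓ Image restoration successful"
else
echo "✗ Image restoration failed"
losetup -d "$loop_dev"
rm -f "$test_dir/test-disk.img"
return 1
fi
# Verify filesystems: check each partition, or the whole device if the image has no partition table
partprobe "$loop_dev" 2>/dev/null
local targets=("$loop_dev"p*)
[ -b "${targets[0]}" ] || targets=("$loop_dev")
local target
for target in "${targets[@]}"; do
fsck -n "$target"
done
# Cleanup
losetup -d "$loop_dev"
rm -f "$test_dir/test-disk.img"
echo "✓ Recovery test completed"
}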
# Document recovery procedure
document_recovery_steps() {
cat > /tmp/recovery-runbook.txt << 'EOF'
==============================================
BARE METAL RECOVERY RUNBOOK
==============================================
Prerequisites:
- Bootable rescue media (USB or CD)
- Latest backup image
- Recovery procedure documentation
Recovery Steps:
1. Boot Server into Rescue Environment
- Plug in bootable USB/CD
- Power on server
- Select boot device from BIOS menu
- Wait for rescue environment to load
2. Identify Disks and Partitions
lsblk
fdisk -l
3. Locate Recovery Image
- Check network storage: NFS, Samba (SMB)
- Download from backup server if needed
ls -lh /mnt/backup/
4. Restore System Image
gunzip < backup.img.gz | dd of=/dev/sda bs=4M status=progress
OR
tar -xzf backup.tar.gz -C /mnt/recovered
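5. Restore GRUB Bootloader
mount /dev/sda1 /mnt/recovered
mount --bind /dev /mnt/recovered/dev
mount --bind /proc /mnt/recovered/proc
mount --bind /sys /mnt/recovered/sys
chroot /mnt/recovered
grub-install /dev/sda
grub-mkconfig -o /boot/grub/grub.cfg
exit
umount /mnt/recovered/{dev,proc,sys} /mnt/recovered
6. Verify Recovery
fsck -n /dev/sda1
mkdir -p /mnt/verify
mount /dev/sda1 /mnt/verify
ls /mnt/verify/
umount /mnt/verify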
7. Reboot
reboot
Expected Recovery Time: 30-60 minutes
Success Indicators:
- System boots without errors
- All filesystems mount
- Services start correctly
- Network connectivity confirmed
EOF
cat /tmp/recovery-runbook.txt
}
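A filesystem check confirms structure but not that the restored bytes match the backup. One way to close that gap, sketched below, is to hash the decompressed image and compare it with the same number of bytes read back from the target, which also works when the target disk is larger than the image. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Compare a gzip-compressed image with the leading bytes of a restored target.
# Returns 0 when the SHA-256 digests match.
verify_restored_image() {
  local image_gz=$1
  local target=$2
  local size expected actual
  # Size of the decompressed image in bytes
  size=$(gunzip -c "$image_gz" | wc -c | tr -d ' ') || return 2
  expected=$(gunzip -c "$image_gz" | sha256sum | awk '{print $1}')
  # Read back only as many bytes as the image contains
  actual=$(head -c "$size" "$target" | sha256sum | awk '{print $1}')
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage (as root): verify_restored_image /backup/server.img.gz /dev/sda
```

The comparison reads the whole image twice, so on large disks it adds noticeably to the recovery time measured during a test.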
Conclusion
Bare metal recovery requires:
- Complete Backups: Full disk or filesystem images at regular intervals
- Bootable Media: Rescue environment ready for immediate use
- GRUB Knowledge: Ability to restore bootloader if needed
- LVM Understanding: Recovery of logical volumes if applicable
- Network Boot: PXE configured as fallback recovery method
- Regular Testing: Proven procedures before actual disaster
Maintain recovery procedures with your infrastructure:
- Update backup media when hardware changes
- Document custom configurations
- Test recovery at least quarterly
- Keep multiple backup copies in different locations
The ultimate goal is to reduce MTTR (Mean Time To Recovery) and ensure RTO/RPO targets are met.


