Wait for udev to recreate /dev/PTN entries when calibrating (#762941)

File system specific commands sometimes fail reporting that the
partition specific /dev entry doesn't exist.  Example failing check
operation details:

    Check and repair file system (ext4) on /dev/sdb4
      calibrate /dev/sdb4
        path: /dev/sdb4 (partition)
        start: 4196352
        end: 6293503
        size: 2097152 (1.00 GiB)
      check file system on /dev/sdb4 for errors and (if possible) fix them
        e2fsck -f -y -v -C 0 /dev/sdb4
          e2fsck 1.42.9 (28-Dec-2013)
          e2fsck: No such file or directory while trying to open /dev/sdb4
          Possibly non-existent device?

This has been reproduced on CentOS 7.  Debugging shows that the
libparted calls used to re-read the partition details in
GParted_Core::calibrate_partition() lead to udev removing and re-adding
all the partition /dev entries for the disk.

    # udevadm monitor &
    # gpartedbin
    ...
     16.480662 +12.618659 calibrate_partition()          calling get_device("/dev/sdb", lp_device) ...
     16.483644 +0.002982 calibrate_partition()          get_device() returned
     16.483678 +0.000034 calibrate_partition()          calling get_disk(lp_device, lp_disk) ...
     16.618113 +0.134435 calibrate_partition()          get_disk() returned
    KERNEL[19275.707968] remove   /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb/sdb1 (block)
     16.618561 +0.000448 destroy_device_and_disk()      calling ped_disk_destroy(lp_disk) ...
     16.618584 +0.000023 destroy_device_and_disk()      ped_disk_destroy() returned
     16.618591 +0.000007 destroy_device_and_disk()      calling ped_device_destroy(lp_disk) ...
     16.618602 +0.000011 destroy_device_and_disk()      ped_device_destroy() returned
     16.618687 +0.000085 calibrate_partition()          return true
     16.618851 +0.000164 execute_command()              e2fsck -f -y -v -C 0 /dev/sdb4
    KERNEL[19275.708389] remove   /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb/sdb2 (block)
    KERNEL[19275.708500] remove   /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb/sdb3 (block)
    KERNEL[19275.708643] remove   /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb/sdb4 (block)
    KERNEL[19275.768278] change   /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb (block)
    KERNEL[19275.771171] add      /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb/sdb1 (block)
    KERNEL[19275.771360] add      /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb/sdb2 (block)
    KERNEL[19275.771542] add      /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb/sdb3 (block)
    KERNEL[19275.775858] add      /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb/sdb4 (block)
    UDEV  [19275.820153] remove   /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb/sdb3 (block)
    UDEV  [19275.823152] remove   /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb/sdb4 (block)
    UDEV  [19275.828275] remove   /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb/sdb1 (block)
     16.742735 +0.123884 execute_command()              exit status 8
    UDEV  [19275.841425] remove   /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb/sdb2 (block)
    UDEV  [19275.905478] change   /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb (block)
    UDEV  [19276.013580] add      /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb/sdb3 (block)
    UDEV  [19276.034728] add      /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb/sdb4 (block)
    UDEV  [19276.174840] add      /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb/sdb1 (block)
    UDEV  [19276.237105] add      /devices/pci0000:00/0000:00:0d.0/ata4/host3/target3:0:0/3:0:0:0/block/sdb/sdb2 (block)

So exactly when GParted is running the external e2fsck command, udev is
in the middle of removing and re-adding all the /dev partition entries
for the disk.  Hence the above failure reporting that /dev/sdb4 didn't
exist.  This error depends on the timing between GParted running the
external file system specific command and udev removing and re-adding
the entries, so sometimes it works and sometimes it fails.

Further debugging showed that simply opening and closing the whole disk
device read-write triggers the same removing and re-adding of all the
partition /dev entries with udev >= 219.  Opening the whole disk device
read-write is what libparted had always done, until this patch made
after libparted 3.2 changed it to open read-only when probing:

    http://git.savannah.gnu.org/cgit/parted.git/commit/?id=44d5ae0115c4ecfe3158748309e9912c5aede92d
    libparted: Use read only when probing devices on linux (#1245144)
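
The trigger described above can be sketched in a few lines of C++.  This
is only an illustration, not code from GParted or libparted; the helper
name open_close_read_write() is hypothetical.  It opens a path
read-write and closes it again without writing anything:

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical helper, not part of GParted or libparted.
// Opens the given path read-write and immediately closes it again,
// without reading or writing any data.  Returns true if the open
// succeeded, false otherwise.
bool open_close_read_write( const std::string & path )
{
	assert( ! path.empty() );
	int fd = open( path.c_str(), O_RDWR );
	if ( fd < 0 )
		return false;
	close( fd );
	return true;
}
```

Calling it as root on the whole disk device, for example
open_close_read_write( "/dev/sdb" ), while udevadm monitor is running
should show whether the remove and add events appear on a given system.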

To fix this, simply wait for udev devices to settle in
calibrate_partition().  The longest I have seen udev take to do this is
0.80 seconds in a VM.  Wait up to 10 seconds as is done in commit() ->
commit_to_os(), also called when applying operations.
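
As a rough sketch of what waiting for udev to settle amounts to, the
following runs udevadm settle with a timeout.  It is not GParted's
settle_device() implementation; the helper names build_settle_command()
and wait_for_udev_settle() are assumptions made for illustration, and
only the udevadm settle --timeout=SECONDS command line is taken as
given:

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: build the udevadm command line which waits
// until the udev event queue is empty, or the timeout expires.
std::string build_settle_command( int timeout_seconds )
{
	assert( timeout_seconds > 0 );
	return "udevadm settle --timeout=" + std::to_string( timeout_seconds );
}

// Hypothetical helper: run the command via the shell.  On POSIX
// systems std::system() returns 0 only when the command exited
// normally with status 0, so true means udevadm reported success.
bool wait_for_udev_settle( int timeout_seconds )
{
	int status = std::system( build_settle_command( timeout_seconds ).c_str() );
	return status == 0;
}
```

With these helpers the wait described here would be
wait_for_udev_settle( 10 ), run after the libparted device and disk
have been destroyed and before the file system command is executed.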

On configurations which don't have this issue, udevadm settle returns
immediately, so executing it adds at most 0.1 seconds to the time taken
for the calibrate step.  This won't be noticeable in the times shown in
the operation details, so there is no point in trying to avoid
executing udevadm settle when it is not needed.
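
One way such an overhead could be measured is to time the call with a
monotonic clock.  The helper below, seconds_taken(), is a hypothetical
name used only for this sketch and is not how the figure above was
obtained:

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: run the given action once and return how long
// it took in seconds, measured with a monotonic clock.
double seconds_taken( const std::function<void()> & action )
{
	assert( action );
	auto start = std::chrono::steady_clock::now();
	action();
	std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
	return elapsed.count();
}
```

Passing a callable which runs udevadm settle would give the extra time
added to the calibrate step on a particular configuration.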

Bug 762941 - Operations sometimes failing with: No such file or
             directory
Mike Fleetwood 2016-04-14 16:25:37 +01:00 committed by Curtis Gedak
parent a93a678a7b
commit fd9013d5f6
1 changed files with 6 additions and 0 deletions

@@ -3521,6 +3521,12 @@ bool GParted_Core::calibrate_partition( Partition & partition, OperationDetail &
destroy_device_and_disk( lp_device, lp_disk ) ;
}
// (#762941) Above libparted partition querying triggers udev >= 219 to
// remove and re-add all the partition specific /dev/ entries. Wait for
// this to complete to avoid FS specific commands failing because they
// happen to run just when the needed /dev/PTN entry doesn't exist.
settle_device( 10 );
operationdetail.get_last_child().set_status( success ? STATUS_SUCCESS : STATUS_ERROR );
return success;
}