Fragment of a failed CI test job from a GiLab job runner which didn't
allow creation of block special devices looked like:
$ tests/makedev.sh
mknod -m 0660 /dev/sda b 8 0
mknod: '/dev/sda': Operation not permitted
chown: cannot access '/dev/sda': No such file or directory
mknod -m 0660 /dev/sda1 b 8 1
mknod: '/dev/sda1': Operation not permitted
chown: cannot access '/dev/sda1': No such file or directory
mkdir: created directory '/dev/disk'
mkdir: created directory '/dev/disk/by-id/'
'/dev/disk/by-id/gparted-sda' -> '/dev/sda'
test/makedev.sh attempted to create two block devices it wanted for
testing, but that failed with "Operation not permitted". It then
created dangling symbolic link /dev/disk/by-id/gparted-sda -> /dev/sda;
gparted-sda pointed to a name which didn't exist.
Despite the previous commit testing and skipping every test where the
block device doesn't exist this unit test still failed:
[ RUN ] BlockSpecialTest.NamedBlockSpecialObjectBySymlinkMatches
test_BlockSpecial.cc:186: Failure
Failed
follow_link_name(): Failed to resolve symbolic link '/dev/disk/by-id/gparted-sda'
test_BlockSpecial.cc:271: Skip test. Block device '' does not exist
[ FAILED ] BlockSpecialTest.NamedBlockSpecialObjectBySymlinkMatches (0 ms)
The unit test called get_link_name() which read the directory
/dev/disk/by-id and found symbolic link gparted-sda. It then called
follow_link_name() passing /dev/disk/by-id/gparted-sda which used
realpath(3) to get the canonicalised absolute pathname, which includes
following links. But as gparted-sda pointed to a non-existent file it
failed and reported message "Failed to resolve symbolic link ...". Then
after that the unit test skips the non-existent block device, but the
test has already failed at that point.
Fix the unit test by also checking the symbolic link points to an
existing block device before calling follow_link_name() on it. This
works because SKIP_IF_BLOCK_DEVICE_DOESNT_EXIST() uses stat(3), which
follows symbolic links, in it's verification.
Also put SKIP_IF_BLOCK_DEVICE_DOESNT_EXIST() immediately after each
initialisation of a block device name for some sort of consistency with
it's need in this fixed NamedBlockSpecialObjectBySymlinkMatches unit
test.
Closed!113 - Fix occasional GitLab CI test jobs failures on
BlockSpecial unit tests
Since November 2022 test_BlockSpecial has been occasionally failing in
GNOME GitLab Docker CI test jobs like this:
[ RUN ] BlockSpecialTest.NamedBlockSpecialObjectBlockDevice
test_BlockSpecial.cc:216: Failure
Value of: bs.m_major > 0 || bs.m_minor > 0
Actual: false
Expected: true
[ FAILED ] BlockSpecialTest.NamedBlockSpecialObjectBlockDevice (0 ms)
...
[ RUN ] BlockSpecialTest.TwoNamedBlockSpecialObjectBlockDevices
test_BlockSpecial.cc:244: Failure
Value of: bs1.m_major != bs2.m_major || bs1.m_minor != bs2.m_minor
Actual: false
Expected: true
[ FAILED ] BlockSpecialTest.TwoNamedBlockSpecialObjectBlockDevices (0 ms)
[ RUN ] BlockSpecialTest.NamedBlockSpecialObjectBySymlinkMatches
test_BlockSpecial.cc:170: Failure
Failed
follow_link_name(): Failed to resolve symbolic link '/dev/disk/by-id/gparted-sda'
[ FAILED ] BlockSpecialTest.NamedBlockSpecialObjectBySymlinkMatches (0 ms)
...
3 FAILED TESTS
FAIL test_BlockSpecial (exit status: 1)
As identified previously [1] the Docker CI images no longer have any
block devices in /dev. test/makedev.sh script was added to create block
devices test_BlockSpecial needs for it's testing. Now a subset of the
GNOME GitLab job runners additionally prevent creation of block special
device nodes. test/makedev.sh reports this:
$ tests/makedev.sh
mknod -m 0660 /dev/sda b 8 0
mknod: /dev/sda: Operation not permitted
chown: cannot access '/dev/sda': No such file or directory
mknod -m 0660 /dev/sda1 b 8 1
mknod: /dev/sda1: Operation not permitted
chown: cannot access '/dev/sda1': No such file or directory
Alternative rejected solutions:
1. Use fakeroot [2]. Package is available for the 3 distributions used
in CI jobs. Does fake stat() call. Works when run like this in the
CI test jobs:
fakeroot -s test/fakeroot.env tests/makedev.sh
fakeroot -i test/fakeroot.env make check
fakeroot -i test/fakeroot.env make distcheck
But if you run fakeroot ... make check on our development machines
as a non-root user it causes the test_SupportedFileSystems unit
tests which use losetup to fail. This is because
test_SupportedFileSystems thinks it's root inside the fakeroot
environment but fakeroot doesn't fake enough for losetup to work.
This makes running tests in the GitLab CI jobs different from how we
would have to run them on our development machines. Prefer not to
do that.
2. Use GNU ld --wrap [3] to call our own __wrap_stat() allowing
test_BlockSpecial to provide mocked results to the stat() call in
constructor BlockSpecial::BlockSpecial(). This works with
GNU C Library >= 2.33, released 01-Feb-2021, and musl libc,
therefore it works on CI tested distributions Ubuntu LTS >= 22.04
and Alpine Linux respectively. However this fails on earlier glibc
releases, so will fail on CentOS 7 CI image, as the compiler emits a
call to __xstat() rather than stat(). This is something to do with
how glibc's /usr/include/sys/stat.h supported multiple versions of
stat(). Don't use this as it's doesn't work everywhere.
Additional useful implementation hints. [4][5]
Choose to fix by just skipping unit tests which need block special names
to exist in the file system, but don't exist. This is the same
technique that test_SupportedFileSystems uses. So tests/makedev.sh
creates block devices if it can in the GNOME GitLab CI test images [1]
and now if that fails the individual unit tests are skipped.
[1] 57983b9fc2
Create block special devices needed by test_BlockSpecial in GitLab
CI jobs (!59)
[2] FakeRoot
https://wiki.debian.org/FakeRoot
[3] ld(1) - Linux manual page
https://man7.org/linux/man-pages/man1/ld.1.html
"--wrap=symbol
Use a wrapper function for symbol. Any undefined reference to
symbol will be resolved to "__wrap_symbol". Any undefined
reference to "__real_symbol" will be resolved to symbol.
This can be used to provide a wrapper for a system function.
The wrapper function should be called "__wrap_symbol". If it
wishes to call the system function, it should call
"__real_symbol".
...
"
[4] gcc: error: unrecognized option --wrap
https://stackoverflow.com/questions/33278164/gcc-error-unrecognized-option-wrap
[5] C++ ld linker --wrap option does not work for internal function calls
https://stackoverflow.com/questions/44464961/c-ld-linker-wrap-option-does-not-work-for-internal-function-callsClosed!113 - Fix occasional GitLab CI test jobs failures on
BlockSpecial unit tests
We have to copy the dentry->d_name before calling closedir(). If not,
the string points to nothing and the test fails (It does not fail all
the time, but only by chance).
Confirmed using valgrind. Selected output from running the unit test
under valgrind:
$ valgrind --track-origins=yes ./test_blockSpecial
==25110== Memcheck, a memory error detector
...
==25110== Command: ./test_BlockSpecial
==25110==
Running main() from src/gtest_main.cc
[==========] Running 26 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 26 tests from BlockSpecialTest
...
[ RUN ] BlockSpecialTest.NamedBlockSpecialObjectBySymlinkMatches
==25110== Invalid read of size 1
==25110== at 0x4C2C9B2: strlen (vg_replace_strmem.c:458)
==25110== by 0x40E7C4: length (char_traits.h:259)
==25110== by 0x40E7C4: append (basic_string.h:1009)
==25110== by 0x40E7C4: operator+<char, std::char_traits<char>, std::allocator<char> > (basic_string.h:2468)
==25110== by 0x40E7C4: get_link_name (test_BlockSpecial.cc:156)
==25110== by 0x40E7C4: GParted::BlockSpecialTest_NamedBlockSpecialObjectBySymlinkMatches_Test::TestBody() (test_BlockSpecial.cc:247)
...
=25110== Address 0x1231ea93 is 115 bytes inside a block of size 32,816 free'd
==25110== at 0x4C2ACBD: free (vg_replace_malloc.c:530)
==25110== by 0x9F773AC: closedir (in /usr/lib64/libc-2.17.so)
==25110== by 0x40E7AC: get_link_name (test_BlockSpecial.cc:153)
==25110== by 0x40E7AC: GParted::BlockSpecialTest_NamedBlockSpecialObjectBySymlinkMatches_Test::TestBody() (test_BlockSpecial.cc:247)
...
==25110== Block was alloc'd at
==25110== at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
==25110== by 0x9F77280: __alloc_dir (in /usr/lib64/libc-2.17.so)
==25110== by 0x40E746: get_link_name (test_BlockSpecial.cc:134)
==25110== by 0x40E746: GParted::BlockSpecialTest_NamedBlockSpecialObjectBySymlinkMatches_Test::TestBody() (test_BlockSpecial.cc:247)
...
==25110== Invalid read of size 1
==25110== at 0x4C2E220: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1022)
==25110== by 0x953A997: std::string::append(char const*, unsigned long) (in /usr/lib64/libstdc++.so.6.0.19)
==25110== by 0x40E7D2: append (basic_string.h:1009)
==25110== by 0x40E7D2: operator+<char, std::char_traits<char>, std::allocator<char> > (basic_string.h:2468)
==25110== by 0x40E7D2: get_link_name (test_BlockSpecial.cc:156)
==25110== by 0x40E7D2: GParted::BlockSpecialTest_NamedBlockSpecialObjectBySymlinkMatches_Test::TestBody() (test_BlockSpecial.cc:247)
...
==25110== Address 0x1231ea93 is 115 bytes inside a block of size 32,816 free'd
==25110== at 0x4C2ACBD: free (vg_replace_malloc.c:530)
==25110== by 0x9F773AC: closedir (in /usr/lib64/libc-2.17.so)
==25110== by 0x40E7AC: get_link_name (test_BlockSpecial.cc:153)
==25110== by 0x40E7AC: GParted::BlockSpecialTest_NamedBlockSpecialObjectBySymlinkMatches_Test::TestBody() (test_BlockSpecial.cc:247)
...
==25110== Block was alloc'd at
==25110== at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
==25110== by 0x9F77280: __alloc_dir (in /usr/lib64/libc-2.17.so)
==25110== by 0x40E746: get_link_name (test_BlockSpecial.cc:134)
==25110== by 0x40E746: GParted::BlockSpecialTest_NamedBlockSpecialObjectBySymlinkMatches_Test::TestBody() (test_BlockSpecial.cc:247)
...
[ OK ] BlockSpecialTest.NamedBlockSpecialObjectBySymlinkMatches (50 ms)
Selected lines from test_BlockSpecial.cc:
132 static std::string get_link_name()
133 {
134 DIR * dir = opendir( "/dev/disk/by-id" );
...
141 bool found = false;
142 struct dirent * dentry;
143 // Silence GCC [-Wparentheses] warning with double parentheses
144 while ( ( dentry = readdir( dir ) ) )
145 {
146 if ( strcmp( dentry->d_name, "." ) != 0 &&
147 strcmp( dentry->d_name, ". " ) != 0 )
148 {
149 found = true;
150 break;
151 }
152 }
153 closedir( dir );
154
155 if ( found )
156 return std::string( "/dev/disk/by-id/" ) + dentry->d_name;
So the memory referred to by dentry was allocated on line 134, freed on
153 and accessed after freed on 156.
Closes!41 - Fix test (dentry->d_name is invalidated by closedir...)
Found that older but still supported distributions Debian 8 and
Ubuntu 14.04 LTS don't have directory /dev/disk/by-path/. This is used
by the BlockSpecial unit test as a source of a symbolic link to a block
special device.
This causes the unit test to fail like this:
$ cd tests
$ ./test_BlockSpecial
...
[ RUN ] BlockSpecialTest.NamedBlockSpecialObjectBySymlinkMatches
test_BlockSpecial.cc:137: Failure
Failed
get_link_name(): Failed to open directory '/dev/disk/by-path'
test_BlockSpecial.cc:168: Failure
Failed
follow_link_name(): Failed to resolve symbolic link ''
test_BlockSpecial.cc:255: Failure
Expected: (lnk.m_name.c_str()) != (bs.m_name.c_str()), actual: "" vs ""
[ FAILED ] BlockSpecialTest.NamedBlockSpecialObjectBySymlinkMatches (0 ms)
...
[ FAILED ] 1 test, listed below:
[ FAILED ] BlockSpecialTest.NamedBlockSpecialObjectBySymlinkMatches
1 FAILED TEST
Which in turn causes make check and make distcheck to fail.
Use directory /dev/disk/by-id/ instead as it always exists.
Deliberately breaking one of the operator<() tests produces less than
ideal diagnostics because it doesn't print the BlockSpecial objects
being compared.
$ ./test_BlockSpecial
...
[ RUN ] BlockSpecialTest.OperatorLessThanTwoEmptyObjects
test_BlockSpecial.cc:362: Failure
Value of: bs1 < bs2
Actual: true
Expected: false
[ FAILED ] BlockSpecialTest.OperatorLessThanTwoEmptyObjects (1 ms)
...
This could be solved by using the Google Test Binary Comparison
assertions, however in the tests for false from (bs1 == bs2) and
(bs1 < bs2) comparisons then operators != and >= would have to be
implemented and the tests changed from:
EXPECT_FALSE( bs1 < bs2 );
to:
EXPECT_GE( bs1, bs2 );
This makes the meaning of the test less than clear. The primary purpose
of the test is to check operator<(), but it is expecting the first
BlockSpecial object to be GE (greater than or equal to) than the second,
which is calling operator>=() which in turn is testing operator<(). For
tests of the operators themselves using Google Test Binary Comparison
assertions obscures what is being tested too much.
Instead provide a custom failure message which prints the BlockSpecial
objects failing the comparison, leaving the test still directly calling
the operator being tested, like this:
EXPECT_FALSE( bs1 < bs2 ) << " Where: bs1 = " << bs1 << "\n"
<< " And: bs2 = " << bs2;
And with a suitable macro this is simplified to:
EXPECT_FALSE( bs1 < bs2 ) << ON_FAILURE_WHERE( bs1, bs2 );
Now the above deliberately broken test produces this output:
$ ./test_BlockSpecial
...
[ RUN ] BlockSpecialTest.OperatorLessThanTwoEmptyObjects
test_BlockSpecial.cc:369: Failure
Value of: bs1 < bs2
Actual: true
Expected: false
Where: bs1 = BlockSpecial{"",0,0}
And: bs2 = BlockSpecial{"",0,0}
[ FAILED ] BlockSpecialTest.OperatorLessThanTwoEmptyObjects (0 ms)
...
Bug 781978 - Add Google Test C++ test framework
Add a specific test for the failure found in bug 771670 and fixed by
commit:
253c4d6416
Fix BlockSpecial comparison (#771670)
Bug 781978 - Add Google Test C++ test framework
So far only includes tests of the construction of BlockSpecial objects
and the interaction with the internal cache used to reduce the number of
stat(2) calls. Tests have been designed to be runable by non-root users
and assume as little as possible about the environment by discovering
used block device names and symbolic link name.
Bug 781978 - Add Google Test C++ test framework