drbd: always write bitmap on detach
If we detach due to local read-error (which sets a bit in the bitmap),
stay Primary, and then re-attach (which re-reads the bitmap from disk),
we potentially lost the "out-of-sync" (or, "bad block") information in
the bitmap.
Always (try to) write out the changed bitmap pages before going diskless.
That way, we don't lose the bit for the bad block,
the next resync will fetch it from the peer, and rewrite
it locally, which may result in block reallocation in some
lower layer (or the hardware), and thereby "heal" the bad blocks.
If the bitmap writeout errors out as well, we will (again: try to)
mark the "we need a full sync" bit in our super block,
if it was a READ error; writes are covered by the activity log already.
If that superblock does not make it to disk either, we are sorry.
Maybe we just lost an entire disk or controller (or iSCSI connection),
and there actually are no bad blocks at all, so we don't need to
re-fetch from the peer, there is no "auto-healing" necessary.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index d8ba5c4..9b833e0 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1617,17 +1617,20 @@
/* first half of local IO error, failure to attach,
* or administrative detach */
if (os.disk != D_FAILED && ns.disk == D_FAILED) {
- enum drbd_io_error_p eh = EP_PASS_ON;
- int was_io_error = 0;
/* corresponding get_ldev was in __drbd_set_state, to serialize
* our cleanup here with the transition to D_DISKLESS.
- * But is is still not save to dreference ldev here, since
- * we might come from an failed Attach before ldev was set. */
+ * But it is still not safe to dreference ldev here, we may end
+ * up here from a failed attach, before ldev was even set. */
if (mdev->ldev) {
- eh = mdev->ldev->dc.on_io_error;
- was_io_error = drbd_test_and_clear_flag(mdev, WAS_IO_ERROR);
+ enum drbd_io_error_p eh = mdev->ldev->dc.on_io_error;
- if (was_io_error && eh == EP_CALL_HELPER)
+ /* In some setups, this handler triggers a suicide,
+ * basically mapping IO error to node failure, to
+ * reduce the number of different failure scenarios.
+ *
+ * This handler intentionally runs before we abort IO,
+ * notify the peer, or try to update our meta data. */
+ if (eh == EP_CALL_HELPER && drbd_test_flag(mdev, WAS_IO_ERROR))
drbd_khelper(mdev, "local-io-error");
/* Immediately allow completion of all application IO,
@@ -1643,7 +1646,7 @@
* So aborting local requests may cause crashes,
* or even worse, silent data corruption.
*/
- if (drbd_test_and_clear_flag(mdev, FORCE_DETACH))
+ if (drbd_test_flag(mdev, FORCE_DETACH))
tl_abort_disk_io(mdev);
/* current state still has to be D_FAILED,
@@ -4220,6 +4223,26 @@
* inc/dec it frequently. Once we are D_DISKLESS, no one will touch
* the protected members anymore, though, so once put_ldev reaches zero
* again, it will be safe to free them. */
+
+ /* Try to write changed bitmap pages, read errors may have just
+ * set some bits outside the area covered by the activity log.
+ *
+ * If we have an IO error during the bitmap writeout,
+ * we will want a full sync next time, just in case.
+ * (Do we want a specific meta data flag for this?)
+ *
+ * If that does not make it to stable storage either,
+ * we cannot do anything about that anymore. */
+ if (mdev->bitmap) {
+ if (drbd_bitmap_io_from_worker(mdev, drbd_bm_write,
+ "detach", BM_LOCKED_MASK)) {
+ if (drbd_test_flag(mdev, WAS_READ_ERROR)) {
+ drbd_md_set_flag(mdev, MDF_FULL_SYNC);
+ drbd_md_sync(mdev);
+ }
+ }
+ }
+
drbd_force_state(mdev, NS(disk, D_DISKLESS));
return 1;
}