base: use setjmp to speed up fiber

ucontext is an order of magnitude slower compared to most of the fiber
implementation, mainly due to the additional signal mask operation.

This change applies the trick provided in
http://www.1024cores.net/home/lock-free-algorithms/tricks/fibers,
which uses _setjmp/_longjmp to switch between contexts created by
ucontext.

Combine with NodeList improvement, we see 81% speed improvement with the
example provided by Matthias Jung:
https://gist.github.com/myzinsky/557200aa04556de44a317e0a10f51840

Compared with Accellera's SystemC, gem5 SystemC was originally 10x
slower, and with this change it's about 1.8x.

Change-Id: I0ffb6978e83dc8be049b750dc1baebb3d251601c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34356
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
diff --git a/src/base/fiber.cc b/src/base/fiber.cc
index 3d2e2e9..fe1bad0 100644
--- a/src/base/fiber.cc
+++ b/src/base/fiber.cc
@@ -145,10 +145,12 @@
 
     setStarted();
 
-    // Swap back to the parent context which is still considered "current",
-    // now that we're ready to go.
-    int ret M5_VAR_USED = swapcontext(&ctx, &_currentFiber->ctx);
-    panic_if(ret == -1, strerror(errno));
+    if (_setjmp(jmp) == 0) {
+        // Swap back to the parent context which is still considered "current",
+        // now that we're ready to go.
+        int ret = swapcontext(&ctx, &_currentFiber->ctx);
+        panic_if(ret == -1, strerror(errno));
+    }
 
     // Call main() when we're been reactivated for the first time.
     main();
@@ -175,7 +177,8 @@
     Fiber *prev = _currentFiber;
     Fiber *next = this;
     _currentFiber = next;
-    swapcontext(&prev->ctx, &next->ctx);
+    if (_setjmp(prev->jmp) == 0)
+        _longjmp(next->jmp, 1);
 }
 
 Fiber *Fiber::currentFiber() { return _currentFiber; }
diff --git a/src/base/fiber.hh b/src/base/fiber.hh
index dc7ef01..be8937f 100644
--- a/src/base/fiber.hh
+++ b/src/base/fiber.hh
@@ -39,6 +39,12 @@
 #include <ucontext.h>
 #endif
 
+// Avoid fortify source for longjmp to work between ucontext stacks.
+#pragma push_macro("__USE_FORTIFY_LEVEL")
+#undef __USE_FORTIFY_LEVEL
+#include <setjmp.h>
+#pragma pop_macro("__USE_FORTIFY_LEVEL")
+
 #include <cstddef>
 #include <cstdint>
 
@@ -137,6 +143,10 @@
     void start();
 
     ucontext_t ctx;
+    // ucontext is slow in swapcontext. Here we use _setjmp/_longjmp to avoid
+    // the additional signals for speed up.
+    jmp_buf jmp;
+
     Fiber *link;
 
     // The stack for this context, or a nullptr if allocated elsewhere.