[PATCH] NMI: Update NMI users of RCU to use new API

Uses of RCU for dynamically changeable NMI handlers need to use the new
rcu_dereference() and rcu_assign_pointer() facilities.  This change makes
it clear that these uses are safe from a memory-barrier viewpoint, but the
main purpose is to document exactly what operations are being protected
by RCU.  This has been tested on x86 and x86-64, which are the only
architectures affected by this change.

Signed-off-by: <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit: 19306059cd7fedaf96b4b0260a9a8a45e513c857
author: Paul E. McKenney <paulmck@us.ibm.com> Tue Sep 06 15:16:35 2005 -0700
committer: Linus Torvalds <torvalds@g5.osdl.org> Wed Sep 07 16:57:19 2005 -0700
tree: 7c32d59c1a5830689d5f85a7f81e89e48d1097ae
parent: fe21773d655c2c64641ec2cef499289ea175c817
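For readers outside the kernel, the publish/subscribe ordering that this patch documents can be modeled in userspace C11. This is only an analogy under stated assumptions: struct fake_regs, sim_nmi_callback, sim_set_nmi_callback(), sim_do_nmi(), my_nmi_callback(), handler_threshold and default_count are hypothetical names, a release store stands in for rcu_assign_pointer(), and an acquire load stands in for rcu_dereference(). The kernel relies on weaker dependency ordering, but acquire is at least as strong, so the model errs on the safe side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pt_regs. */
struct fake_regs {
	int unused;
};

typedef int (*sim_nmi_callback_t)(struct fake_regs *regs, int cpu);

/* Default handler: returns zero, meaning "not handled". */
static int dummy_nmi_callback(struct fake_regs *regs, int cpu)
{
	(void)regs;
	(void)cpu;
	return 0;
}

/* Global handler pointer, analogous to nmi_callback. */
static _Atomic(sim_nmi_callback_t) sim_nmi_callback = dummy_nmi_callback;

static int handler_threshold;	/* hypothetical data used by the new handler */
static int default_count;	/* times the "default" path ran */

/* Hypothetical handler: claims NMIs on CPUs at or above the threshold. */
static int my_nmi_callback(struct fake_regs *regs, int cpu)
{
	(void)regs;
	return cpu >= handler_threshold;
}

/* Analog of rcu_assign_pointer(): the release store orders all earlier
 * writes (such as handler_threshold) before the pointer update. */
static void sim_set_nmi_callback(sim_nmi_callback_t callback)
{
	assert(callback != NULL);	/* a NULL handler would crash the NMI path */
	atomic_store_explicit(&sim_nmi_callback, callback,
			      memory_order_release);
}

/* Analog of do_nmi(): the acquire load plays the role of rcu_dereference(). */
static void sim_do_nmi(struct fake_regs *regs, int cpu)
{
	sim_nmi_callback_t cb =
		atomic_load_explicit(&sim_nmi_callback, memory_order_acquire);

	if (!cb(regs, cpu))
		default_count++;	/* stands in for default_do_nmi(regs) */
}
```

Typical use mirrors NMI-RCU.txt: write handler_threshold first, then call sim_set_nmi_callback(my_nmi_callback). A concurrent sim_do_nmi() that loads the new pointer is then guaranteed to see the initialized threshold, which is the property the Quick Quiz answer relies on.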
diff --git a/Documentation/RCU/NMI-RCU.txt b/Documentation/RCU/NMI-RCU.txt
new file mode 100644
index 0000000..d0634a5
--- /dev/null
+++ b/Documentation/RCU/NMI-RCU.txt
@@ -0,0 +1,112 @@
+Using RCU to Protect Dynamic NMI Handlers
+
+
+Although RCU is usually used to protect read-mostly data structures,
+it is possible to use RCU to provide dynamic non-maskable interrupt
+handlers, as well as dynamic irq handlers.  This document describes
+how to do this, drawing loosely from Zwane Mwaikambo's NMI-timer
+work in "arch/i386/oprofile/nmi_timer_int.c" and in
+"arch/i386/kernel/traps.c".
+
+The relevant pieces of code are listed below, each followed by a
+brief explanation.
+
+	static int dummy_nmi_callback(struct pt_regs *regs, int cpu)
+	{
+		return 0;
+	}
+
+The dummy_nmi_callback() function is a "dummy" NMI handler that does
+nothing, but returns zero, thus saying that it did nothing, allowing
+the NMI handler to take the default machine-specific action.
+
+	static nmi_callback_t nmi_callback = dummy_nmi_callback;
+
+This nmi_callback variable is a global function pointer to the current
+NMI handler.
+
+	fastcall void do_nmi(struct pt_regs * regs, long error_code)
+	{
+		int cpu;
+
+		nmi_enter();
+
+		cpu = smp_processor_id();
+		++nmi_count(cpu);
+
+		if (!rcu_dereference(nmi_callback)(regs, cpu))
+			default_do_nmi(regs);
+
+		nmi_exit();
+	}
+
+The do_nmi() function processes each NMI.  It first calls nmi_enter(),
+which disables preemption in the same way that a hardware irq would,
+then increments the per-CPU count of NMIs.  It then invokes the NMI
+handler stored in the nmi_callback function pointer.  If this handler
+returns zero, do_nmi() invokes the default_do_nmi() function to handle
+a machine-specific NMI.  Finally, nmi_exit() restores preemption.
+
+Strictly speaking, rcu_dereference() is not needed, since this code runs
+only on i386, which never reorders dependent loads and so does not need
+rcu_dereference() anyway.  However, it is a good documentation aid,
+particularly for anyone attempting to do something similar on Alpha.
+
+Quick Quiz:  Why might the rcu_dereference() be necessary on Alpha,
+	     given that the code referenced by the pointer is read-only?
+
+
+Back to the discussion of NMI and RCU...
+
+	void set_nmi_callback(nmi_callback_t callback)
+	{
+		rcu_assign_pointer(nmi_callback, callback);
+	}
+
+The set_nmi_callback() function registers an NMI handler.  Note that any
+data that is to be used by the callback must be initialized up -before-
+the call to set_nmi_callback().  On architectures that do not order
+writes, the rcu_assign_pointer() ensures that the NMI handler sees the
+initialized values.
+
+	void unset_nmi_callback(void)
+	{
+		rcu_assign_pointer(nmi_callback, dummy_nmi_callback);
+	}
+
+The unset_nmi_callback() function unregisters an NMI handler, restoring
+the original dummy_nmi_callback().  However, there may well be an NMI
+handler currently executing on some other CPU.  We therefore cannot free
+up any data structures used by the old NMI handler until execution
+of it completes on all other CPUs.
+
+One way to accomplish this is via synchronize_sched(), perhaps as
+follows:
+
+	unset_nmi_callback();
+	synchronize_sched();
+	kfree(my_nmi_data);
+
+This works because synchronize_sched() blocks until all CPUs complete
+any preemption-disabled segments of code that they were executing.
+Since NMI handlers disable preemption, synchronize_sched() is guaranteed
+not to return until all ongoing NMI handlers exit.  It is therefore safe
+to free up the handler's data as soon as synchronize_sched() returns.
+
+
+Answer to Quick Quiz
+
+	Why might the rcu_dereference() be necessary on Alpha, given
+	that the code referenced by the pointer is read-only?
+
+	Answer: The caller of set_nmi_callback() might well have
+		initialized some data that is to be used by the
+		new NMI handler.  In this case, the rcu_dereference()
+		would be needed, because otherwise a CPU that received
+		an NMI just after the new handler was set might see
+		the pointer to the new NMI handler, but the old
+		pre-initialized version of the handler's data.
+
+		More important, the rcu_dereference() makes it clear
+		to someone reading the code that the pointer is being
+		protected by RCU.
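The teardown sequence described in NMI-RCU.txt (unset the callback, wait for in-flight handlers, then free their data) can likewise be modeled in userspace. This is a simplified sketch, not RCU: the hypothetical nmi_active counter plays the role of the preemption-disabled region bracketed by nmi_enter() and nmi_exit(), and wait_for_handlers() is a crude stand-in for synchronize_sched() that, unlike a real grace period, could be starved by a continuous stream of new NMIs. All names other than those quoted from the document (struct my_nmi_data, sim_nmi(), sim_register_handler(), sim_unregister_handler(), and so on) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical data used only by the registered handler. */
struct my_nmi_data {
	atomic_int hits;	/* atomic: several "CPUs" may update it */
};

typedef int (*sim_handler_t)(int cpu);

static struct my_nmi_data *my_nmi_data;
static atomic_int nmi_active;	/* handlers currently in flight */

static int dummy_handler(int cpu)
{
	(void)cpu;
	return 0;	/* "did nothing" */
}

static int my_handler(int cpu)
{
	(void)cpu;
	atomic_fetch_add(&my_nmi_data->hits, 1);
	return 1;	/* "handled" */
}

static _Atomic(sim_handler_t) current_handler = dummy_handler;

/* NMI path: the counter brackets the handler call the way
 * nmi_enter()/nmi_exit() bracket a preemption-disabled region. */
static int sim_nmi(int cpu)
{
	atomic_fetch_add(&nmi_active, 1);
	sim_handler_t h = atomic_load(&current_handler);
	int handled = h(cpu);

	atomic_fetch_sub(&nmi_active, 1);
	return handled;
}

/* Crude stand-in for synchronize_sched(): spin until no handler is in
 * flight. All operations are seq_cst, so once this sees zero after the
 * pointer switch, any later NMI loads the new (dummy) handler. */
static void wait_for_handlers(void)
{
	while (atomic_load(&nmi_active) != 0)
		;
}

static void sim_register_handler(void)
{
	my_nmi_data = malloc(sizeof(*my_nmi_data));
	assert(my_nmi_data != NULL);
	atomic_init(&my_nmi_data->hits, 0);		/* initialize before publishing */
	atomic_store(&current_handler, my_handler);	/* like set_nmi_callback() */
}

/* Returns the hit count observed just before the data is freed. */
static int sim_unregister_handler(void)
{
	int hits;

	atomic_store(&current_handler, dummy_handler);	/* like unset_nmi_callback() */
	wait_for_handlers();				/* like synchronize_sched() */
	hits = atomic_load(&my_nmi_data->hits);
	free(my_nmi_data);				/* like kfree(my_nmi_data) */
	my_nmi_data = NULL;
	return hits;
}
```

The essential ordering is the one in sim_unregister_handler(): the pointer is switched back to the dummy handler first, and the data is freed only after no handler that might still hold the old pointer is running.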

diff --git a/arch/i386/kernel/traps.c b/arch/i386/kernel/traps.c
index 54629bb..029bf94 100644
--- a/arch/i386/kernel/traps.c
+++ b/arch/i386/kernel/traps.c
@@ -657,7 +657,7 @@
 
 	++nmi_count(cpu);
 
-	if (!nmi_callback(regs, cpu))
+	if (!rcu_dereference(nmi_callback)(regs, cpu))
 		default_do_nmi(regs);
 
 	nmi_exit();
@@ -665,7 +665,7 @@
 
 void set_nmi_callback(nmi_callback_t callback)
 {
-	nmi_callback = callback;
+	rcu_assign_pointer(nmi_callback, callback);
 }
 EXPORT_SYMBOL_GPL(set_nmi_callback);
 

diff --git a/arch/x86_64/kernel/nmi.c b/arch/x86_64/kernel/nmi.c
index 84cae81..caf1649 100644
--- a/arch/x86_64/kernel/nmi.c
+++ b/arch/x86_64/kernel/nmi.c
@@ -524,14 +524,14 @@
 
 	nmi_enter();
 	add_pda(__nmi_count,1);
-	if (!nmi_callback(regs, cpu))
+	if (!rcu_dereference(nmi_callback)(regs, cpu))
 		default_do_nmi(regs);
 	nmi_exit();
 }
 
 void set_nmi_callback(nmi_callback_t callback)
 {
-	nmi_callback = callback;
+	rcu_assign_pointer(nmi_callback, callback);
 }
 
 void unset_nmi_callback(void)