{
  "commit": "cc7a486cac78f6fc1a24e8cd63036bae8d2ab431",
  "tree": "258abdc1843d6f9ed76608412f1364178fb8c697",
  "parents": [
    "796aadeb1b2db9b5d463946766c5bbfd7717158c"
  ],
  "author": {
    "name": "Nick Piggin",
    "email": "nickpiggin@yahoo.com.au",
    "time": "Mon Aug 11 13:49:30 2008 +1000"
  },
  "committer": {
    "name": "Ingo Molnar",
    "email": "mingo@elte.hu",
    "time": "Mon Aug 11 15:21:28 2008 +0200"
  },
  "message": "generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask()\n\n* Venki Pallipadi \u003cvenkatesh.pallipadi@intel.com\u003e wrote:\n\n\u003e Found an OOPS on a big SMP box during an overnight reboot test with\n\u003e upstream git.\n\u003e\n\u003e Suresh and I looked at the oops and looks like the root cause is in\n\u003e generic_smp_call_function_interrupt() and smp_call_function_mask() with\n\u003e wait parameter.\n\u003e\n\u003e The actual oops looked like\n\u003e\n\u003e [   11.277260] BUG: unable to handle kernel paging request at ffff8802ffffffff\n\u003e [   11.277815] IP: [\u003cffff8802ffffffff\u003e] 0xffff8802ffffffff\n\u003e [   11.278155] PGD 202063 PUD 0\n\u003e [   11.278576] Oops: 0010 [1] SMP\n\u003e [   11.279006] CPU 5\n\u003e [   11.279336] Modules linked in:\n\u003e [   11.279752] Pid: 0, comm: swapper Not tainted 2.6.27-rc2-00020-g685d87f #290\n\u003e [   11.280039] RIP: 0010:[\u003cffff8802ffffffff\u003e]  [\u003cffff8802ffffffff\u003e] 0xffff8802ffffffff\n\u003e [   11.280692] RSP: 0018:ffff88027f1f7f70  EFLAGS: 00010086\n\u003e [   11.280976] RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000000\n\u003e [   11.281264] RDX: 0000000000004f4e RSI: 0000000000000001 RDI: 0000000000000000\n\u003e [   11.281624] RBP: ffff88027f1f7f98 R08: 0000000000000001 R09: ffffffff802509af\n\u003e [   11.281925] R10: ffff8800280c2780 R11: 0000000000000000 R12: ffff88027f097d48\n\u003e [   11.282214] R13: ffff88027f097d70 R14: 0000000000000005 R15: ffff88027e571000\n\u003e [   11.282502] FS:  0000000000000000(0000) GS:ffff88027f1c3340(0000) knlGS:0000000000000000\n\u003e [   11.283096] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b\n\u003e [   11.283382] CR2: ffff8802ffffffff CR3: 0000000000201000 CR4: 00000000000006e0\n\u003e [   11.283760] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000\n\u003e [   11.284048] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400\n\u003e [   11.284337] Process swapper (pid: 0, threadinfo ffff88027f1f2000, task ffff88027f1f0640)\n\u003e [   11.284936] Stack:  ffffffff80250963 0000000000000212 0000000000ee8c78 0000000000ee8a66\n\u003e [   11.285802]  ffff88027e571550 ffff88027f1f7fa8 ffffffff8021adb5 ffff88027f1f3e40\n\u003e [   11.286599]  ffffffff8020bdd6 ffff88027f1f3e40 \u003cEOI\u003e  ffff88027f1f3ef8 0000000000000000\n\u003e [   11.287120] Call Trace:\n\u003e [   11.287768]  \u003cIRQ\u003e  [\u003cffffffff80250963\u003e] ? generic_smp_call_function_interrupt+0x61/0x12c\n\u003e [   11.288354]  [\u003cffffffff8021adb5\u003e] smp_call_function_interrupt+0x17/0x27\n\u003e [   11.288744]  [\u003cffffffff8020bdd6\u003e] call_function_interrupt+0x66/0x70\n\u003e [   11.289030]  \u003cEOI\u003e  [\u003cffffffff8024ab3b\u003e] ? clockevents_notify+0x19/0x73\n\u003e [   11.289380]  [\u003cffffffff803b9b75\u003e] ? acpi_idle_enter_simple+0x18b/0x1fa\n\u003e [   11.289760]  [\u003cffffffff803b9b6b\u003e] ? acpi_idle_enter_simple+0x181/0x1fa\n\u003e [   11.290051]  [\u003cffffffff8053aeca\u003e] ? cpuidle_idle_call+0x70/0xa2\n\u003e [   11.290338]  [\u003cffffffff80209f61\u003e] ? cpu_idle+0x5f/0x7d\n\u003e [   11.290723]  [\u003cffffffff8060224a\u003e] ? start_secondary+0x14d/0x152\n\u003e [   11.291010]\n\u003e [   11.291287]\n\u003e [   11.291654] Code:  Bad RIP value.\n\u003e [   11.292041] RIP  [\u003cffff8802ffffffff\u003e] 0xffff8802ffffffff\n\u003e [   11.292380]  RSP \u003cffff88027f1f7f70\u003e\n\u003e [   11.292741] CR2: ffff8802ffffffff\n\u003e [   11.310951] ---[ end trace 137c54d525305f1c ]---\n\u003e\n\u003e The problem is with the following sequence of events:\n\u003e\n\u003e - CPU A calls smp_call_function_mask() for CPU B with wait parameter\n\u003e - CPU A sets up the call_function_data on the stack and does an rcu add to\n\u003e   call_function_queue\n\u003e - CPU A waits until the WAIT flag is cleared\n\u003e - CPU B gets the call function interrupt and starts going through the\n\u003e   call_function_queue\n\u003e - CPU C also gets some other call function interrupt and starts going through\n\u003e   the call_function_queue\n\u003e - CPU C, which is also going through the call_function_queue, starts referencing\n\u003e   CPU A\u0027s stack, as that element is still in call_function_queue\n\u003e - CPU B finishes the function call that CPU A set up and as there are no other\n\u003e   references to it, rcu deletes the call_function_data (which was from CPU A\n\u003e   stack)\n\u003e - CPU B sees the wait flag and just clears the flag (no call_rcu to free)\n\u003e - CPU A which was waiting on the flag continues executing and the stack\n\u003e   contents change\n\u003e\n\u003e - CPU C is still in rcu_read section accessing the CPU A\u0027s stack sees\n\u003e   inconsistent call_function_data and can try to execute\n\u003e   function with some random pointer, causing stack corruption for A\n\u003e   (by clearing the bits in mask field) and oops.\n\nNice debugging work.\n\nI\u0027d suggest something like the attached (boot tested) patch as the simple\nfix for now.\n\nI expect the benefits from the less synchronized, multiple-in-flight-data\nglobal queue will still outweigh the costs of dynamic allocations. But\nif worst comes to worst then we just go back to a globally synchronous\none-at-a-time implementation, but that would be pretty sad!\n\nSigned-off-by: Ingo Molnar \u003cmingo@elte.hu\u003e\n",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "96fc7c0edc59d1f09ca56a500d90b0a8a212c7d7",
      "old_mode": 33188,
      "old_path": "kernel/smp.c",
      "new_id": "e6084f6efb4d70210889c9dbb3e6a02931acc784",
      "new_mode": 33188,
      "new_path": "kernel/smp.c"
    }
  ]
}
