{
  "commit": "cf11da9c5d374962913ca5ba0ce0886b58286224",
  "tree": "88480a47229aa9a3244beca6cae49e0ae00df37b",
  "parents": [
    "aa182e64f16fc29a4984c2d79191b161888bbd9b"
  ],
  "author": {
    "name": "Dave Chinner",
    "email": "dchinner@redhat.com",
    "time": "Tue Jul 15 07:08:24 2014 +1000"
  },
  "committer": {
    "name": "Dave Chinner",
    "email": "david@fromorbit.com",
    "time": "Tue Jul 15 07:08:24 2014 +1000"
  },
  "message": "xfs: refine the allocation stack switch\n\nThe allocation stack switch at xfs_bmapi_allocate() has served its\npurpose, but is no longer a sufficient solution to the stack usage\nproblem we have in the XFS allocation path.\n\nWhilst the kernel stack size is now 16k, that is not a valid reason\nfor undoing all our \"keep stack usage down\" modifications. What it\ndoes allow us to do is have the freedom to refine and perfect the\nmodifications knowing that if we get it wrong it won\u0027t blow up in\nour faces - we have a safety net now.\n\nThis is important because we still have the issue of older kernels\nhaving smaller stacks and that they are still supported and are\ndemonstrating a wide range of different stack overflows.  Red Hat\nhas several open bugs for allocation based stack overflows from\ndirectory modifications and direct IO block allocation and these\nproblems still need to be solved. If we can solve them upstream,\nthen distros won\u0027t need to bake their own unique solutions.\n\nTo that end, I\u0027ve observed that every allocation based stack\noverflow report has had a specific characteristic - it has happened\nduring or directly after a bmap btree block split. That event\nrequires a new block to be allocated to the tree, and so we\neffectively stack one allocation stack on top of another, and that\u0027s\nwhen we get into trouble.\n\nA further observation is that bmap btree block splits are much rarer\nthan writeback allocation - over a range of different workloads I\u0027ve\nobserved the ratio of bmap btree inserts to splits ranges from 100:1\n(xfstests run) to 10000:1 (local VM image server with sparse files\nthat range in the hundreds of thousands to millions of extents).\nEither way, bmap btree split events are much, much rarer than\nallocation events.\n\nFinally, we have to move the kswapd state to the allocation workqueue\nwork when allocation is done on behalf of kswapd. 
This is proving to\ncause significant perturbation in performance under memory pressure\nand appears to be generating allocation deadlock warnings under some\nworkloads, so avoiding the use of a workqueue for the majority of\nkswapd writeback allocation will minimise the impact of such\nbehaviour.\n\nHence it makes sense to move the stack switch to xfs_btree_split()\nand only do it for bmap btree splits. Stack switches during\nallocation will be much rarer, so there won\u0027t be significant\nperformance overhead caused by switching stacks. The worst case\nstack from all allocation paths will be split, not just writeback.\nAnd the majority of memory allocations will be done in the correct\ncontext (e.g. kswapd) without causing additional latency, and so we\nsimplify the memory reclaim interactions between processes,\nworkqueues and kswapd.\n\nThe worst stack I\u0027ve been able to generate with this patch in place\nis 5600 bytes deep. It\u0027s very revealing because we exit XFS at:\n\n37)     1768      64   kmem_cache_alloc+0x13b/0x170\n\nabout 1800 bytes of stack consumed, and the remaining 3800 bytes\n(and 36 functions) is memory reclaim, swap and the IO stack. And\nthis occurs in the inode allocation from an open(O_CREAT) syscall,\nnot writeback.\n\nThe amount of stack being used is much less than I\u0027ve previously been\nable to generate - fs_mark testing has been able to generate stack\nusage of around 7k without too much trouble; with this patch it\u0027s\nonly just getting to 5.5k. This is primarily because the metadata\nallocation paths (e.g. 
directory blocks) are no longer causing\ndouble splits on the same stack, and hence now stack tracing is\nshowing swapping being the worst stack consumer rather than XFS.\n\nPerformance of fs_mark inode create workloads is unchanged.\nPerformance of fs_mark async fsync workloads is consistently good\nwith context switches reduced by around 150,000/s (30%).\nPerformance of dbench, streaming IO and postmark is unchanged.\nAllocation deadlock warnings have not been seen on the workloads\nthat generated them since adding this patch.\n\nSigned-off-by: Dave Chinner \u003cdchinner@redhat.com\u003e\nReviewed-by: Brian Foster \u003cbfoster@redhat.com\u003e\nSigned-off-by: Dave Chinner \u003cdavid@fromorbit.com\u003e\n\n",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "96175df211b1955f98843d7d251f6e204fa4c2e2",
      "old_mode": 33188,
      "old_path": "fs/xfs/xfs_bmap.c",
      "new_id": "75c3fe5f3d9d82a34c84139c56eb4028dd3f42d0",
      "new_mode": 33188,
      "new_path": "fs/xfs/xfs_bmap.c"
    },
    {
      "type": "modify",
      "old_id": "38ba36e9b2f0c5616f018c0e5474da7ce9b42290",
      "old_mode": 33188,
      "old_path": "fs/xfs/xfs_bmap.h",
      "new_id": "b879ca56a64ccfab5b2a42502a5b50f68b85f1df",
      "new_mode": 33188,
      "new_path": "fs/xfs/xfs_bmap.h"
    },
    {
      "type": "modify",
      "old_id": "057f671811d6128da5c27f5af595ad13c834a42b",
      "old_mode": 33188,
      "old_path": "fs/xfs/xfs_bmap_util.c",
      "new_id": "64731ef3324d4b44a938aeac30fc3b816d890222",
      "new_mode": 33188,
      "new_path": "fs/xfs/xfs_bmap_util.c"
    },
    {
      "type": "modify",
      "old_id": "935ed2b24edfb05b4d5893dccf7cebdb09a374ed",
      "old_mode": 33188,
      "old_path": "fs/xfs/xfs_bmap_util.h",
      "new_id": "2fdb72d2c908fc5f962f5beff4166c1b07c69f08",
      "new_mode": 33188,
      "new_path": "fs/xfs/xfs_bmap_util.h"
    },
    {
      "type": "modify",
      "old_id": "bf810c6baf2b8144cd5e28fbcda8cf1162077219",
      "old_mode": 33188,
      "old_path": "fs/xfs/xfs_btree.c",
      "new_id": "cf893bc1e373a967ba978836310d8d2abf4a0871",
      "new_mode": 33188,
      "new_path": "fs/xfs/xfs_btree.c"
    },
    {
      "type": "modify",
      "old_id": "6c5eb4c551e3f562e1aba435ceb9a0df438b9e08",
      "old_mode": 33188,
      "old_path": "fs/xfs/xfs_iomap.c",
      "new_id": "6d3ec2b6ee294c7ec38e28fd32376162276f1005",
      "new_mode": 33188,
      "new_path": "fs/xfs/xfs_iomap.c"
    }
  ]
}
