| Documentation for /proc/sys/fs/* kernel version 2.2.10 |
| (c) 1998, 1999, Rik van Riel <riel@nl.linux.org> |
| (c) 2009, Shen Feng <shen@cn.fujitsu.com> |
| |
| For general info and legal blurb, please look in README. |
| |
| ============================================================== |
| |
| This file contains documentation for the sysctl files in |
| /proc/sys/fs/ and is valid for Linux kernel version 2.2. |
| |
| The files in this directory can be used to tune and monitor |
| miscellaneous and general things in the operation of the Linux |
| kernel. Since some of the files _can_ be used to screw up your |
| system, it is advisable to read both documentation and source |
| before actually making adjustments. |
| |
| 1. /proc/sys/fs |
| ---------------------------------------------------------- |
| |
| Currently, these files are in /proc/sys/fs: |
| - aio-max-nr |
| - aio-nr |
| - dentry-state |
| - dquot-max |
| - dquot-nr |
| - file-max |
| - file-nr |
| - inode-max |
| - inode-nr |
| - inode-state |
| - mount-max |
| - nr_open |
| - overflowuid |
| - overflowgid |
| - pipe-user-pages-hard |
| - pipe-user-pages-soft |
| - protected_hardlinks |
| - protected_symlinks |
| - suid_dumpable |
| - super-max |
| - super-nr |
| |
| ============================================================== |
| |
| aio-nr & aio-max-nr: |
| |
| aio-nr is the running total of the number of events specified on the |
| io_setup system call for all currently active aio contexts. If aio-nr |
| reaches aio-max-nr then io_setup will fail with EAGAIN. Note that |
| raising aio-max-nr does not result in the pre-allocation or re-sizing |
| of any kernel data structures. |
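| |
| The limit check described above can be sketched as follows. This is a |
| simplified model rather than the kernel's code: the helper name |
| aio_request_fits() is hypothetical, and any internal rounding of the |
| requested event count is deliberately not modelled. |
| |
```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: would a new aio context asking for nr_events
 * still fit under aio-max-nr, given the current aio-nr total?
 * A false result corresponds to io_setup failing with EAGAIN.
 */
bool aio_request_fits(unsigned long aio_nr, unsigned long aio_max_nr,
                      unsigned long nr_events)
{
    assert(nr_events > 0);

    /* aio-max-nr may have been lowered below the current total. */
    if (aio_nr >= aio_max_nr)
        return false;

    /* Written as a subtraction so the comparison cannot overflow. */
    return nr_events <= aio_max_nr - aio_nr;
}
```
| |
| The two arguments aio_nr and aio_max_nr are meant to be the values read |
| from the files of the same names. |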
| |
| ============================================================== |
| |
| dentry-state: |
| |
| From linux/fs/dcache.c: |
| -------------------------------------------------------------- |
| struct { |
|         int nr_dentry; |
|         int nr_unused; |
|         int age_limit;         /* age in seconds */ |
|         int want_pages;        /* pages requested by system */ |
|         int dummy[2]; |
| } dentry_stat = {0, 0, 45, 0,}; |
| -------------------------------------------------------------- |
| |
| Dentries are dynamically allocated and deallocated, and |
| nr_dentry seems to be 0 all the time. Hence it's safe to |
| assume that only nr_unused, age_limit and want_pages are |
| used. Nr_unused seems to be exactly what its name says. |
| Age_limit is the age in seconds after which dcache entries |
| can be reclaimed when memory is short. Want_pages is |
| nonzero when shrink_dcache_pages() has been called and the |
| dcache has not been pruned yet. |
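| |
| As an illustration, the six integers exported in dentry-state could be |
| read into a structure mirroring dentry_stat. This is a minimal sketch: |
| the struct tag and the function name parse_dentry_state() are |
| hypothetical, and the input is assumed to be the whitespace-separated |
| text read from the file. |
| |
```c
#include <assert.h>
#include <stdio.h>

/* Field names follow the dentry_stat structure quoted above. */
struct dentry_state {
    int nr_dentry;
    int nr_unused;
    int age_limit;   /* age in seconds */
    int want_pages;  /* pages requested by system */
    int dummy[2];
};

/*
 * Parse the six whitespace-separated integers of dentry-state.
 * Returns 0 on success, -1 if fewer than six fields were found.
 */
int parse_dentry_state(const char *text, struct dentry_state *out)
{
    assert(text != NULL && out != NULL);

    int n = sscanf(text, "%d %d %d %d %d %d",
                   &out->nr_dentry, &out->nr_unused, &out->age_limit,
                   &out->want_pages, &out->dummy[0], &out->dummy[1]);
    return n == 6 ? 0 : -1;
}
```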
| |
| ============================================================== |
| |
| dquot-max & dquot-nr: |
| |
| The file dquot-max shows the maximum number of cached disk |
| quota entries. |
| |
| The file dquot-nr shows the number of allocated disk quota |
| entries and the number of free disk quota entries. |
| |
| If the number of free cached disk quotas is very low and |
| you have a very large number of simultaneous system users, |
| you might want to raise the limit. |
| |
| ============================================================== |
| |
| file-max & file-nr: |
| |
| The value in file-max denotes the maximum number of file |
| handles that the Linux kernel will allocate. When you get lots |
| of error messages about running out of file handles, you might |
| want to increase this limit. |
| |
| Historically, the kernel was able to allocate file handles |
| dynamically, but not to free them again. The three values in |
| file-nr denote the number of allocated file handles, the number |
| of allocated but unused file handles, and the maximum number of |
| file handles. Linux 2.6 always reports 0 as the number of free |
| file handles -- this is not an error, it just means that the |
| number of allocated file handles exactly matches the number of |
| used file handles. |
| |
| Attempts to allocate more file descriptors than file-max are |
| reported with printk; look for "VFS: file-max limit <number> |
| reached". |
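| |
| The three file-nr fields lend themselves to a small parser. The sketch |
| below uses hypothetical names (struct file_nr, parse_file_nr(), |
| file_handles_left()) and assumes the unused count never exceeds the |
| allocated count. |
| |
```c
#include <assert.h>
#include <stdio.h>

/* The three values of file-nr, in the order described above. */
struct file_nr {
    unsigned long allocated;  /* allocated file handles */
    unsigned long unused;     /* allocated but unused file handles */
    unsigned long max;        /* maximum number of file handles */
};

/* Returns 0 on success, -1 if fewer than three fields were found. */
int parse_file_nr(const char *text, struct file_nr *out)
{
    assert(text != NULL && out != NULL);

    int n = sscanf(text, "%lu %lu %lu",
                   &out->allocated, &out->unused, &out->max);
    return n == 3 ? 0 : -1;
}

/* Handles that can still be taken into use before the maximum is hit. */
unsigned long file_handles_left(const struct file_nr *f)
{
    assert(f != NULL);

    /* Assumes unused <= allocated, as the field descriptions imply. */
    unsigned long in_use = f->allocated - f->unused;
    return in_use < f->max ? f->max - in_use : 0;
}
```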
| |
| ============================================================== |
| |
| nr_open: |
| |
| This denotes the maximum number of file handles a process can |
| allocate. The default value is 1024*1024 (1048576), which should be |
| enough for most machines. The actual limit depends on the RLIMIT_NOFILE |
| resource limit. |
| |
| ============================================================== |
| |
| inode-max, inode-nr & inode-state: |
| |
| As with file handles, the kernel allocates the inode structures |
| dynamically, but can't free them yet. |
| |
| The value in inode-max denotes the maximum number of inode |
| handlers. This value should be 3-4 times larger than the value |
| in file-max, since stdin, stdout and network sockets also |
| need an inode struct to handle them. When you regularly run |
| out of inodes, you need to increase this value. |
| |
| The file inode-nr contains the first two items from |
| inode-state, so we'll skip to that file... |
| |
| Inode-state contains three actual numbers and four dummies. |
| The actual numbers are, in order of appearance, nr_inodes, |
| nr_free_inodes and preshrink. |
| |
| Nr_inodes stands for the number of inodes the system has |
| allocated; this can be slightly more than inode-max because |
| Linux allocates them one pageful at a time. |
| |
| Nr_free_inodes represents the number of free inodes (?) and |
| preshrink is nonzero when nr_inodes > inode-max and the |
| system needs to prune the inode list instead of allocating |
| more. |
| |
| ============================================================== |
| |
| overflowgid & overflowuid: |
| |
| Some filesystems only support 16-bit UIDs and GIDs, although in Linux |
| UIDs and GIDs are 32 bits. When one of these filesystems is mounted |
| with writes enabled, any UID or GID that would exceed 65535 is translated |
| to a fixed value before being written to disk. |
| |
| These sysctls allow you to change the value of the fixed UID and GID. |
| The default is 65534. |
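| |
| The translation described above amounts to a range check. A minimal |
| sketch, with the hypothetical helper id_to_16bit() standing in for |
| whatever a given filesystem actually does: |
| |
```c
#include <assert.h>

/*
 * Value stored for a UID or GID on a filesystem limited to 16-bit IDs.
 * overflow_id is the current overflowuid or overflowgid setting.
 */
unsigned int id_to_16bit(unsigned int id, unsigned int overflow_id)
{
    /* The fixed value must itself fit in 16 bits. */
    assert(overflow_id <= 65535u);

    return id > 65535u ? overflow_id : id;
}
```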
| |
| ============================================================== |
| |
| pipe-user-pages-hard: |
| |
| Maximum total number of pages a non-privileged user may allocate for pipes. |
| Once this limit is reached, no new pipes may be allocated until usage goes |
| below the limit again. When set to 0, no limit is applied, which is the default |
| setting. |
| |
| ============================================================== |
| |
| pipe-user-pages-soft: |
| |
| Maximum total number of pages a non-privileged user may allocate for pipes |
| before the pipe size gets limited to a single page. Once this limit is reached, |
| new pipes will be limited to a single page in size for this user in order to |
| limit total memory usage, and trying to increase them using fcntl() will be |
| denied until usage goes below the limit again. The default value allows |
| up to 1024 pipes to be allocated at their default size. When set to 0, no |
| limit is applied. |
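| |
| The interaction of the soft and hard limits for a non-privileged user |
| can be sketched as below. The type and function names are hypothetical, |
| and the "limit is reached" condition is modelled as usage >= limit, |
| which is an assumption drawn from the wording above rather than from |
| the kernel source. |
| |
```c
#include <assert.h>
#include <stddef.h>

enum pipe_decision { PIPE_DENIED, PIPE_SINGLE_PAGE, PIPE_DEFAULT_SIZE };

/* Current pipe-user-pages-soft and pipe-user-pages-hard; 0 means no limit. */
struct pipe_limits {
    unsigned long soft;
    unsigned long hard;
};

/* Outcome of a new pipe allocation for a user already holding user_pages. */
enum pipe_decision new_pipe_decision(const struct pipe_limits *lim,
                                     unsigned long user_pages)
{
    assert(lim != NULL);

    if (lim->hard != 0 && user_pages >= lim->hard)
        return PIPE_DENIED;        /* hard limit reached: no new pipes */
    if (lim->soft != 0 && user_pages >= lim->soft)
        return PIPE_SINGLE_PAGE;   /* soft limit reached: one page only */
    return PIPE_DEFAULT_SIZE;
}
```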
| |
| ============================================================== |
| |
| protected_hardlinks: |
| |
| A long-standing class of security issues is the hardlink-based |
| time-of-check-time-of-use race, most commonly seen in world-writable |
| directories like /tmp. The common method of exploitation of this flaw |
| is to cross privilege boundaries when following a given hardlink (i.e. a |
| root process follows a hardlink created by another user). Additionally, |
| on systems without separated partitions, this stops unauthorized users |
| from "pinning" vulnerable setuid/setgid files against being upgraded by |
| the administrator, or linking to special files. |
| |
| When set to "0", hardlink creation behavior is unrestricted. |
| |
| When set to "1", hardlinks cannot be created by users unless they |
| already own the source file or have read/write access to it. |
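| |
| Reading the rule as "allowed when the caller owns the source file or |
| has read/write access to it", the decision can be sketched as follows. |
| The names are hypothetical, and any further checks the kernel may apply |
| (for example for privileged callers) are not modelled. |
| |
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct link_request {
    unsigned int caller_uid;  /* user asking for the hardlink */
    unsigned int owner_uid;   /* owner of the source file */
    bool caller_has_rw;       /* caller has read/write access to the file */
};

bool may_create_hardlink(int protected_hardlinks,
                         const struct link_request *req)
{
    assert(req != NULL);

    if (protected_hardlinks == 0)
        return true;  /* creation is unrestricted */

    return req->caller_uid == req->owner_uid || req->caller_has_rw;
}
```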
| |
| This protection is based on the restrictions in Openwall and grsecurity. |
| |
| ============================================================== |
| |
| protected_symlinks: |
| |
| A long-standing class of security issues is the symlink-based |
| time-of-check-time-of-use race, most commonly seen in world-writable |
| directories like /tmp. The common method of exploitation of this flaw |
| is to cross privilege boundaries when following a given symlink (i.e. a |
| root process follows a symlink belonging to another user). For a likely |
| incomplete list of hundreds of examples across the years, please see: |
| http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp |
| |
| When set to "0", symlink following behavior is unrestricted. |
| |
| When set to "1", symlinks are permitted to be followed only when outside |
| a sticky world-writable directory, or when the uid of the symlink and |
| follower match, or when the directory owner matches the symlink's owner. |
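| |
| The three conditions above combine as a simple disjunction. A minimal |
| sketch with hypothetical names, modelling only what is stated here: |
| |
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct symlink_follow {
    bool dir_sticky_world_writable;  /* symlink lives in such a directory */
    unsigned int follower_uid;       /* uid of the process following it */
    unsigned int link_uid;           /* owner of the symlink */
    unsigned int dir_uid;            /* owner of the containing directory */
};

bool may_follow_symlink(int protected_symlinks,
                        const struct symlink_follow *f)
{
    assert(f != NULL);

    if (protected_symlinks == 0)
        return true;                     /* following is unrestricted */
    if (!f->dir_sticky_world_writable)
        return true;                     /* outside a sticky world-writable dir */
    if (f->follower_uid == f->link_uid)
        return true;                     /* follower owns the symlink */
    return f->dir_uid == f->link_uid;    /* directory owner owns the symlink */
}
```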
| |
| This protection is based on the restrictions in Openwall and grsecurity. |
| |
| ============================================================== |
| |
| suid_dumpable: |
| |
| This value can be used to query and set the core dump mode for setuid |
| or otherwise protected/tainted binaries. The modes are: |
| |
| 0 - (default) - traditional behaviour. Any process which has changed |
|     privilege levels or is execute only will not be dumped. |
| 1 - (debug) - all processes dump core when possible. The core dump is |
|     owned by the current user and no security is applied. This is |
|     intended for system debugging situations only. Ptrace is unchecked. |
|     This is insecure as it allows regular users to examine the memory |
|     contents of privileged processes. |
| 2 - (suidsafe) - any binary which normally would not be dumped is dumped |
|     anyway, but only if the "core_pattern" kernel sysctl is set to |
|     either a pipe handler or a fully qualified path. (For more details |
|     on this limitation, see CVE-2006-2451.) This mode is appropriate |
|     when administrators are attempting to debug problems in a normal |
|     environment, and either have a core dump pipe handler that knows |
|     to treat privileged core dumps with care, or a specific directory |
|     defined for catching core dumps. If a core dump happens without |
|     a pipe handler or fully qualified path, a message will be emitted |
|     to syslog warning about the lack of a correct setting. |
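| |
| The core_pattern requirement of mode 2 can be checked from the first |
| character of the pattern. The sketch below assumes that a pipe handler |
| is introduced by '|' and that a fully qualified path starts with '/'; |
| the function name is hypothetical. |
| |
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True if core_pattern is a pipe handler or a fully qualified path,
 * i.e. acceptable for suid_dumpable mode 2 as described above.
 */
bool core_pattern_ok_for_suidsafe(const char *core_pattern)
{
    assert(core_pattern != NULL);

    return core_pattern[0] == '|' || core_pattern[0] == '/';
}
```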
| |
| ============================================================== |
| |
| super-max & super-nr: |
| |
| These numbers control the maximum number of superblocks, and |
| thus the maximum number of mounted filesystems the kernel |
| can have. You only need to increase super-max if you need to |
| mount more filesystems than the current value in super-max |
| allows you to. |
| |
| ============================================================== |
| |
| aio-nr & aio-max-nr: |
| |
| aio-nr shows the current system-wide number of asynchronous I/O |
| requests. aio-max-nr allows you to change the maximum value |
| aio-nr can grow to. |
| |
| ============================================================== |
| |
| mount-max: |
| |
| This denotes the maximum number of mounts that may exist |
| in a mount namespace. |
| |
| ============================================================== |
| |
| |
| 2. /proc/sys/fs/binfmt_misc |
| ---------------------------------------------------------- |
| |
| Documentation for the files in /proc/sys/fs/binfmt_misc is |
| in Documentation/binfmt_misc.txt. |
| |
| |
| 3. /proc/sys/fs/mqueue - POSIX message queues filesystem |
| ---------------------------------------------------------- |
| |
| The "mqueue" filesystem provides the necessary kernel features to enable the |
| creation of a user space library that implements the POSIX message queues |
| API (as noted by the MSG tag in the POSIX 1003.1-2001 version of the System |
| Interfaces specification.) |
| |
| The "mqueue" filesystem contains values for determining/setting the amount of |
| resources used by the file system. |
| |
| /proc/sys/fs/mqueue/queues_max is a read/write file for setting/getting the |
| maximum number of message queues allowed on the system. |
| |
| /proc/sys/fs/mqueue/msg_max is a read/write file for setting/getting the |
| maximum number of messages in a queue. It acts as a ceiling for the per-queue |
| (user) limit that is set in the mq_open invocation: that attribute of a queue |
| must be less than or equal to msg_max. |
| |
| /proc/sys/fs/mqueue/msgsize_max is a read/write file for setting/getting the |
| maximum message size (an attribute of every message queue, set when the queue |
| is created). |
| |
| /proc/sys/fs/mqueue/msg_default is a read/write file for setting/getting the |
| default number of messages in a queue, used if the attr parameter of mq_open(2) |
| is NULL. If it exceeds msg_max, the default value is initialized to msg_max. |
| |
| /proc/sys/fs/mqueue/msgsize_default is a read/write file for setting/getting |
| the default message size, used if the attr parameter of mq_open(2) is NULL. If |
| it exceeds msgsize_max, the default value is initialized to msgsize_max. |
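| |
| The relationship between each default and its maximum is a simple |
| clamp. A minimal sketch, using the hypothetical helper |
| effective_mq_default() for both the msg and msgsize pairs: |
| |
```c
#include <assert.h>

/*
 * Default actually used when mq_open(2) is called with a NULL attr:
 * the configured default, capped at the corresponding maximum.
 */
unsigned long effective_mq_default(unsigned long dflt, unsigned long max)
{
    assert(max > 0);

    return dflt > max ? max : dflt;
}
```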
| |
| 4. /proc/sys/fs/epoll - Configuration options for the epoll interface |
| ---------------------------------------------------------- |
| |
| This directory contains configuration options for the epoll(7) interface. |
| |
| max_user_watches |
| ---------------- |
| |
| Every epoll file descriptor can store a number of files to be monitored |
| for event readiness. Each one of these monitored files constitutes a "watch". |
| This configuration option sets the maximum number of "watches" that are |
| allowed for each user. |
| Each "watch" costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes |
| on a 64-bit one. |
| The current default value for max_user_watches is 1/32 of the available |
| low memory, divided by the "watch" cost in bytes. |
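| |
| As a worked example of that formula, the following sketch computes the |
| default from a given amount of low memory. The function name is |
| hypothetical, and the per-watch costs passed in are the rough figures |
| quoted above, not exact structure sizes. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* Default max_user_watches: 1/32 of low memory divided by the watch cost. */
uint64_t default_max_user_watches(uint64_t lowmem_bytes,
                                  uint64_t watch_cost_bytes)
{
    assert(watch_cost_bytes > 0);

    return (lowmem_bytes / 32) / watch_cost_bytes;
}
```
| |
| With 1 GiB of low memory this gives 209715 watches at 160 bytes each, |
| or 372827 watches at 90 bytes each. |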
| |