Kernel Parameters Under /proc/sys/vm Explained

谁借莪1个温暖的怀抱¢ 2022-06-18 03:50

    [wuyaalan@localhost desktop]$ cd /proc/sys/vm/
    [wuyaalan@localhost vm]$ ls
    block_dump                 hugepages_treat_as_movable  oom_kill_allocating_task
    compact_memory             hugetlb_shm_group           overcommit_memory
    dirty_background_bytes     laptop_mode                 overcommit_ratio
    dirty_background_ratio     legacy_va_layout            page-cluster
    dirty_bytes                lowmem_reserve_ratio        panic_on_oom
    dirty_expire_centisecs     max_map_count               percpu_pagelist_fraction
    dirty_ratio                min_free_kbytes             scan_unevictable_pages
    dirty_writeback_centisecs  mmap_min_addr               stat_interval
    drop_caches                nr_hugepages                swappiness
    extfrag_threshold          nr_overcommit_hugepages     vdso_enabled
    extra_free_kbytes          nr_pdflush_threads          vfs_cache_pressure
    highmem_is_dirtyable       oom_dump_tasks              would_have_oomkilled

As the listing shows, the proc file system exposes a great deal of kernel information, and users can tune these kernel parameters to improve system performance. The rest of this article explains the meaning of some of the parameters listed above.

1 block\_dump

block\_dump enables block I/O debugging when set to a nonzero value. If you want to find out which process caused the disk to spin up (see /proc/sys/vm/laptop\_mode), you can gather information by setting this flag. When the flag is set, Linux reports every disk read and write operation that takes place, and every block that is dirtied in a file. This makes it possible to debug why a disk needs to spin up, and to increase battery life even further. The output of block\_dump is written to the kernel log and can be retrieved with "dmesg". When you use block\_dump and your kernel logging level also includes kernel debugging messages, you probably want to turn off klogd; otherwise the output of block\_dump will itself be logged to disk, causing disk activity that would not normally be there.
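Because every tunable is just a file, reading or changing one needs nothing more than ordinary file I/O. The Python sketch below assumes the sysctl(8) naming convention, in which a dotted name such as vm.block\_dump corresponds to /proc/sys/vm/block\_dump; `sysctl_path`, `read_tunables` and `write_tunable` are hypothetical helper names, not an existing library. Writing a value (for example 1 to block\_dump) normally requires root.

```python
"""Read and write the tunables under /proc/sys/vm (sketch).

Assumption: dotted names follow the sysctl(8) convention, so
"vm.dirty_ratio" corresponds to the file /proc/sys/vm/dirty_ratio.
"""
from pathlib import Path
from typing import Dict

PROC_SYS = Path("/proc/sys")


def sysctl_path(name: str, root: Path = PROC_SYS) -> Path:
    """Map a dotted sysctl name such as 'vm.block_dump' to its /proc file."""
    return root.joinpath(*name.split("."))


def read_tunables(directory: Path = PROC_SYS / "vm") -> Dict[str, str]:
    """Return {file name: current value} for every readable file in directory."""
    values: Dict[str, str] = {}
    for entry in sorted(directory.iterdir()):
        if not entry.is_file():
            continue
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            # Unreadable entries (write-only ones such as compact_memory,
            # or files restricted to root) are skipped.
            continue
    return values


def write_tunable(name: str, value: int, root: Path = PROC_SYS) -> None:
    """Write a new value, e.g. write_tunable("vm.block_dump", 1); needs root."""
    sysctl_path(name, root).write_text(f"{value}\n")


if __name__ == "__main__":
    vm_dir = PROC_SYS / "vm"
    if vm_dir.is_dir():  # only present on Linux
        for key, val in read_tunables(vm_dir).items():
            print(f"vm.{key} = {val}")
```

As root, `write_tunable("vm.block_dump", 1)` would turn the debugging output on; the reports then appear in dmesg, and writing 0 turns it off again.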
2 dirty\_background\_ratio

Contains, as a percentage of total system memory, the number of pages at which the pdflush background writeback daemon will start writing out dirty data. In other words, once the total size of dirty pages exceeds this percentage of memory, pdflush starts writing them back in the background. Increasing the ratio lets dirty pages stay in memory longer.

3 dirty\_expire\_centisecs

This tunable defines when dirty data is old enough to be eligible for writeout by the pdflush daemons. It is expressed in hundredths of a second. Data that has been dirty in memory for longer than this interval will be written out the next time a pdflush daemon wakes up.

4 dirty\_ratio

Contains, as a percentage of total system memory, the number of pages at which a process that is generating disk writes will itself start writing out dirty data.

5 dirty\_writeback\_centisecs

The pdflush writeback daemons periodically wake up and write "old" data out to disk. This tunable sets the interval between those wakeups, in hundredths of a second. Setting it to zero disables periodic writeback altogether.

6 drop\_caches

Writing to this file causes the kernel to drop clean caches, dentries and inodes from memory, making that memory free.

To free the pagecache:

* echo 1 > /proc/sys/vm/drop\_caches

To free dentries and inodes:

* echo 2 > /proc/sys/vm/drop\_caches

To free the pagecache, dentries and inodes:

* echo 3 > /proc/sys/vm/drop\_caches

Because this is a non-destructive operation and dirty objects cannot be freed, run "sync" first so that all cached objects can be freed. This tunable was added in 2.6.16.
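The dirty-page tunables described above (dirty\_background\_ratio, dirty\_ratio, dirty\_expire\_centisecs and dirty\_writeback\_centisecs) combine two memory thresholds with two timers. The Python sketch below models how the two thresholds decide who writes dirty data back. It is a simplification: it treats the thresholds as a plain percentage of total memory, as the descriptions above do, whereas the kernel actually works from the amount of "dirtyable" memory. The default values in the hypothetical `DirtyTunables` class are example values (those of recent kernels), not a statement about your system.

```python
"""A simplified model of the dirty-page writeback tunables (sketch)."""
from dataclasses import dataclass


@dataclass
class DirtyTunables:
    # Example values (recent kernel defaults); adjust to your system.
    dirty_background_ratio: int = 10        # % of memory: background writeback starts
    dirty_ratio: int = 20                   # % of memory: writers flush data themselves
    dirty_expire_centisecs: int = 3000      # age at which dirty data becomes "old"
    dirty_writeback_centisecs: int = 500    # interval between flusher wake-ups


def centisecs_to_seconds(value: int) -> float:
    """The *_centisecs tunables are expressed in hundredths of a second."""
    return value / 100


def writeback_state(dirty_bytes: int, total_bytes: int, t: DirtyTunables) -> str:
    """Classify what happens at a given amount of dirty data.

    Simplification: thresholds are a plain percentage of total memory.
    """
    background_limit = total_bytes * t.dirty_background_ratio // 100
    foreground_limit = total_bytes * t.dirty_ratio // 100
    if dirty_bytes >= foreground_limit:
        return "writer-flushes"     # the writing process writes dirty data itself
    if dirty_bytes >= background_limit:
        return "background-flush"   # pdflush threads start writing in the background
    return "idle"                   # only periodic, age-based writeback applies


def is_expired(age_centisecs: int, t: DirtyTunables) -> bool:
    """Dirty data older than dirty_expire_centisecs is written at the next wake-up."""
    return age_centisecs > t.dirty_expire_centisecs


if __name__ == "__main__":
    t = DirtyTunables()
    gib = 1 << 30
    print(writeback_state(1 * gib, 8 * gib, t))                # background-flush
    print(centisecs_to_seconds(t.dirty_writeback_centisecs))   # 5.0
```

With 8 GiB of memory and these example ratios, background writeback starts at about 0.8 GiB of dirty data, and writers start flushing their own data at about 1.6 GiB.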
7 hugepages\_treat\_as\_movable

When a nonzero value is written to this tunable, future allocations for the huge page pool will use ZONE\_MOVABLE. Although huge pages are not movable, this does not introduce any notable additional external fragmentation, because huge pages are always the largest contiguous blocks we care about. Since huge pages are not movable, they are not allocated from ZONE\_MOVABLE by default. However, because ZONE\_MOVABLE always contains pages that can be migrated or reclaimed, it can satisfy huge page allocations even after the system has been running for a long time. This lets an administrator resize the huge page pool at runtime, depending on the size of ZONE\_MOVABLE.

8 hugetlb\_shm\_group

hugetlb\_shm\_group contains the ID of the group that is allowed to create SysV shared memory segments using hugetlb pages.

9 laptop\_mode

laptop\_mode is a knob that controls "laptop mode". When the knob is set, any physical disk I/O (that might have caused the hard disk to spin up; see /proc/sys/vm/block\_dump) causes Linux to flush all dirty blocks. As a result, after a disk has spun down it will not be spun up again just to write dirty blocks, because those blocks were already written immediately after the most recent read operation. The value of the laptop\_mode knob determines the time between the occurrence of disk I/O and when the flush is triggered. A sensible value for the knob is 5 seconds. Setting the knob to 0 disables laptop mode.

In laptop mode, the kernel uses the I/O system more intelligently and tries to keep the disk in a low-power state. Laptop mode batches many I/O operations together and performs them at once, with inactive periods of up to 10 minutes by default between bursts of disk I/O, which greatly reduces the number of disk spin-ups. To sustain such long inactive periods, the kernel has to complete as much I/O as possible during each active period: it performs extensive read-ahead and then syncs all buffers.

10 legacy\_va\_layout

If nonzero, this sysctl disables the new 32-bit mmap layout, and the kernel uses the legacy (2.4) layout for all processes.

11 lowmem\_reserve\_ratio

For each memory zone, the ratio between the number of pages in the higher zones and the number of pages reserved in this (lower) zone against allocations that could have been satisfied from those higher zones. A smaller ratio reserves more low memory.

12 max\_map\_count

This file contains the maximum number of memory map areas a process may have.
Memory map areas are used as a side effect of calling malloc, directly by mmap and mprotect, and also when loading shared libraries. While most applications need fewer than a thousand maps, certain programs, particularly malloc debuggers, may consume many of them, e.g. up to one or two maps per allocation. The default value is 65530.

13 min\_free\_kbytes

This is used to force the Linux VM to keep a minimum number of kilobytes free. The VM uses this number to compute a pages\_min value for each lowmem zone in the system. Each lowmem zone gets a number of reserved free pages proportional to its size.

14 mmap\_min\_addr

This file indicates the amount of address space that a user process is prevented from mapping with mmap. Because kernel null-dereference bugs could accidentally operate on information in the first couple of pages of memory, userspace processes should not be allowed to write to them. By default this value is set to 0, and the security module enforces no protection. Setting it to something like 64k allows the vast majority of applications to work correctly while providing defense in depth against potential future kernel bugs.

15 nr\_hugepages

nr\_hugepages configures the number of hugetlb pages reserved for the system.

16 nr\_pdflush\_threads

The count of currently running pdflush threads. This is a read-only value.

17 numa\_zonelist\_order

This sysctl is only for NUMA. Where memory is allocated from is controlled by zonelists. In the non-NUMA case, a zonelist for GFP\_KERNEL is ordered as follows: ZONE\_NORMAL -> ZONE\_DMA. This means that a GFP\_KERNEL memory allocation request will get memory from ZONE\_DMA only when ZONE\_NORMAL is not available.

In the NUMA case, you can think of the following two types of order. Assume a 2-node NUMA system, and that the following is the zonelist for Node(0)'s GFP\_KERNEL:

(A) Node(0) ZONE\_NORMAL -> Node(0) ZONE\_DMA -> Node(1) ZONE\_NORMAL

(B) Node(0) ZONE\_NORMAL -> Node(1) ZONE\_NORMAL -> Node(0) ZONE\_DMA
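The two orderings (A) and (B) above can be generated mechanically from a node/zone topology. In this Python sketch the topology dictionary, the function names and the two-entry zone preference list are illustrative assumptions; the real kernel also orders remote nodes by NUMA distance, which is simplified here to node-id order.

```python
"""Sketch: building "node" vs "zone" ordered zonelists (illustrative only).

Assumptions: remote nodes are visited in node-id order (the kernel uses NUMA
distance), and only two zone types exist, with NORMAL preferred over DMA.
"""
from typing import Dict, List, Tuple

ZONE_PREFERENCE = ["NORMAL", "DMA"]   # higher zones are tried first
Zonelist = List[Tuple[int, str]]      # (node id, zone name)


def _node_sequence(nodes: Dict[int, List[str]], local: int) -> List[int]:
    """Local node first, then the remaining nodes."""
    return [local] + sorted(n for n in nodes if n != local)


def node_order(nodes: Dict[int, List[str]], local: int) -> Zonelist:
    """Type (A): by node, then by zone within each node."""
    return [(n, z) for n in _node_sequence(nodes, local)
            for z in ZONE_PREFERENCE if z in nodes[n]]


def zone_order(nodes: Dict[int, List[str]], local: int) -> Zonelist:
    """Type (B): by zone type, then by node within each zone."""
    return [(n, z) for z in ZONE_PREFERENCE
            for n in _node_sequence(nodes, local) if z in nodes[n]]


if __name__ == "__main__":
    # Two nodes; only Node(0) has a DMA zone, as in the example.
    topology = {0: ["NORMAL", "DMA"], 1: ["NORMAL"]}
    print(node_order(topology, local=0))  # [(0, 'NORMAL'), (0, 'DMA'), (1, 'NORMAL')]
    print(zone_order(topology, local=0))  # [(0, 'NORMAL'), (1, 'NORMAL'), (0, 'DMA')]
```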
Type (A) offers the best locality for processes on Node(0), but ZONE\_DMA will be used before ZONE\_NORMAL is exhausted. This increases the possibility of an out-of-memory (OOM) condition in ZONE\_DMA, because ZONE\_DMA tends to be small. Type (B) cannot offer the best locality but is more robust against OOM of the DMA zone.

Type (A) is called "Node" order; Type (B) is called "Zone" order.

"Node order" orders the zonelists by node, then by zone within each node. Specify "\[Nn\]ode" for node order.

"Zone order" orders the zonelists by zone type, then by node within each zone. Specify "\[Zz\]one" for zone order.

Specify "\[Dd\]efault" to request automatic configuration. Autoconfiguration selects "node" order in the following cases:

* (1) the DMA zone does not exist, or
* (2) the DMA zone comprises more than 50% of the available memory, or
* (3) any node's DMA zone comprises more than 60% of its local memory and the amount of local memory is big enough.

Otherwise, "zone" order is selected. The default order is recommended unless it causes problems for your system or application.

18 overcommit\_memory

Controls overcommit of system memory, possibly allowing processes to allocate (but not use) more memory than is actually available.

* 0 - Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures that a seriously wild allocation fails while still allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.
* 1 - Always overcommit. Appropriate for some scientific applications.
* 2 - Don't overcommit. The total address space committed for the system is not permitted to exceed swap plus a configurable percentage (default is 50) of physical RAM. Depending on the percentage you use, in most situations this means a process will not be killed while attempting to use already-allocated memory, but will receive errors on memory allocation as appropriate.
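The three overcommit\_memory modes can be modeled in a few lines of Python. This is only a sketch: mode 0's heuristic is reduced to refusing any single request larger than RAM plus swap (a simplifying assumption, not the kernel's exact code), mode 2 uses the "swap plus a percentage of RAM" limit described above, all sizes are in pages, and the function and constant names are hypothetical.

```python
"""Simplified model of the overcommit_memory modes (sketch, not kernel code).

Assumptions: sizes are in pages; mode 0's heuristic is reduced to "refuse a
single request larger than RAM plus swap"; other kernel details are ignored.
"""

HEURISTIC, ALWAYS, NEVER = 0, 1, 2   # values written to overcommit_memory


def commit_limit(ram_pages: int, swap_pages: int, overcommit_ratio: int = 50) -> int:
    """Mode 2 limit: swap plus overcommit_ratio percent of physical RAM."""
    return swap_pages + ram_pages * overcommit_ratio // 100


def allow_allocation(mode: int, request: int, committed: int,
                     ram_pages: int, swap_pages: int,
                     overcommit_ratio: int = 50) -> bool:
    """Would a new request of `request` pages be accepted?

    `committed` is the address space already committed system-wide.
    """
    if mode == ALWAYS:
        return True                                   # never refuse
    if mode == HEURISTIC:
        return request <= ram_pages + swap_pages      # refuse only obvious overcommits
    if mode == NEVER:
        limit = commit_limit(ram_pages, swap_pages, overcommit_ratio)
        return committed + request <= limit           # strict accounting
    raise ValueError(f"unknown overcommit_memory mode: {mode}")


if __name__ == "__main__":
    ram, swap = 1_000_000, 500_000                    # pages
    print(commit_limit(ram, swap))                    # 1000000
    print(allow_allocation(NEVER, 100, 999_950, ram, swap))      # False
    print(allow_allocation(HEURISTIC, 100, 999_950, ram, swap))  # True
```

The same 100-page request is refused in mode 2 because it would push the committed total past the limit, while mode 0 accepts it because it is not an obvious overcommit.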
19 overcommit\_ratio

The percentage of physical memory size to include in overcommit calculations:

Memory allocation limit = swapspace + physmem \* (overcommit\_ratio / 100)

where swapspace is the total size of all swap areas and physmem is the size of physical memory in the system.

20 page-cluster

page-cluster controls the number of pages that are written to swap in a single attempt, i.e. the swap I/O size. It is a logarithmic value: setting it to 0 means 1 page, 1 means 2 pages, 2 means 4 pages, and so on. The default value is 3 (eight pages at a time). There may be some small benefit in tuning this to a different value if your workload is swap-intensive.

21 panic\_on\_oom

This enables or disables the panic-on-out-of-memory feature. If it is set to 1, the kernel panics when an out-of-memory condition occurs. If it is set to 0, the kernel kills some rogue process by calling oom\_kill(). Usually the OOM killer can kill rogue processes and the system survives. If you want the system to panic rather than kill rogue processes, set this to 1. The default value is 0.

22 percpu\_pagelist\_fraction

This is the fraction of pages at most (high mark pcp->high) in each zone that are allocated for each per-CPU page list. The minimum value is 8, which means that no more than 1/8th of the pages in each zone may be allocated in any single per\_cpu\_pagelist. This entry only changes the value of hot per-CPU page lists. The user can specify a number like 100 to allocate 1/100th of each zone to each per-CPU page list.

The batch value of each per-CPU page list is also updated as a result. It is set to pcp->high / 4, with an upper limit of (PAGE\_SHIFT \* 8).

The initial value is zero. The kernel does not use this value at boot time to set the high water marks for each per-CPU page list.

23 stat\_interval

With this tunable you can configure the VM statistics update interval. The default value is 1 (second). This tunable first appeared in the 2.6.22 kernel.
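The arithmetic behind page-cluster and percpu\_pagelist\_fraction is easy to get wrong, so here is a small Python sketch of both calculations. It assumes 4 KiB pages (PAGE\_SHIFT = 12, as on x86); the helper names are hypothetical.

```python
"""Arithmetic behind page-cluster and percpu_pagelist_fraction (sketch).

Assumption: 4 KiB pages, i.e. PAGE_SHIFT = 12 as on x86.
"""
from typing import Tuple

PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT   # 4096 bytes


def swap_cluster_pages(page_cluster: int) -> int:
    """page-cluster is logarithmic: 0 -> 1 page, 1 -> 2, 2 -> 4, 3 -> 8."""
    if page_cluster < 0:
        raise ValueError("page-cluster must be >= 0")
    return 1 << page_cluster


def pcp_high_and_batch(zone_pages: int, fraction: int,
                       page_shift: int = PAGE_SHIFT) -> Tuple[int, int]:
    """Per-CPU list limits for one zone.

    high  = zone_pages / fraction
    batch = high / 4, at least 1 and at most PAGE_SHIFT * 8
    A fraction of 0 (the initial value) means "not set", so it is rejected
    here, as is anything below the documented minimum of 8.
    """
    if fraction < 8:
        raise ValueError("percpu_pagelist_fraction must be at least 8")
    high = zone_pages // fraction
    batch = max(1, min(high // 4, page_shift * 8))
    return high, batch


if __name__ == "__main__":
    print(swap_cluster_pages(3) * PAGE_SIZE)     # 32768 bytes per swap I/O
    print(pcp_high_and_batch(1_000_000, 100))    # (10000, 96)
```

With the default page-cluster of 3, each swap I/O covers 8 pages, i.e. 32 KiB. A zone of 1,000,000 pages with a fraction of 100 gets a high mark of 10000 pages, and its batch of 2500 is capped at 96 (PAGE\_SHIFT \* 8).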
24 swap\_token\_timeout

This file contains the valid hold time of the swap-out protection token. The Linux VM has a token-based thrashing control mechanism and uses the token to prevent unnecessary page faults in thrashing situations. The value is in seconds and is useful for tuning thrashing behavior. This tunable was removed in 2.6.20 when the algorithm was improved.

25 swappiness

swappiness is a parameter that sets the kernel's balance between reclaiming pages from the page cache and swapping out process memory. The default value is 60. If you want the kernel to swap out more process memory and thus cache more file contents, increase the value; if you want it to swap less, decrease it.

26 vdso\_enabled

When this flag is set, the kernel maps a vDSO page into newly created processes and passes its address down to glibc upon exec(). This feature is enabled by default.

The vDSO is a virtual DSO (dynamic shared object) exposed by the kernel at some address in every process's memory. Its purpose is to speed up system calls. The mapping address used to be fixed (0xffffe000), but starting with 2.6.18 it is randomized (besides the security implications, this also helps debuggers).

27 vfs\_cache\_pressure

Controls the tendency of the kernel to reclaim the memory used for caching directory and inode objects.

At the default value of vfs\_cache\_pressure = 100, the kernel attempts to reclaim dentries and inodes at a "fair" rate with respect to pagecache and swapcache reclaim. Decreasing vfs\_cache\_pressure causes the kernel to prefer retaining dentry and inode caches. Increasing it beyond 100 causes the kernel to prefer reclaiming dentries and inodes.
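To make the effect of vfs\_cache\_pressure concrete, the Python sketch below assumes that the number of freeable dentry and inode objects offered to reclaim is multiplied by vfs\_cache\_pressure / 100. This is modeled on how recent kernels apply the setting, but treat it as a simplified model: the function name is hypothetical and the rest of reclaim is ignored.

```python
"""Sketch of how vfs_cache_pressure scales dentry/inode reclaim (simplified).

Assumption: the count of freeable dentry/inode objects reported to reclaim
is multiplied by vfs_cache_pressure / 100; nothing else is modeled.
"""


def scaled_freeable(objects: int, vfs_cache_pressure: int = 100) -> int:
    """Objects offered for reclaim after applying the pressure factor."""
    if objects < 0 or vfs_cache_pressure < 0:
        raise ValueError("values must be non-negative")
    return objects * vfs_cache_pressure // 100


if __name__ == "__main__":
    cached = 10_000  # hypothetical count of reclaimable dentries and inodes
    for pressure in (50, 100, 200):
        print(pressure, scaled_freeable(cached, pressure))
    # 50 -> 5000 (retain caches), 100 -> 10000 ("fair"), 200 -> 20000 (reclaim more)
```

Halving the setting halves the number of objects the reclaim code sees, which is why values below 100 keep more dentries and inodes cached.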