diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/cpu-hotplug.txt | 27 | ||||
-rw-r--r-- | Documentation/cpusets.txt | 41 | ||||
-rw-r--r-- | Documentation/dvb/bt8xx.txt | 6 | ||||
-rw-r--r-- | Documentation/feature-removal-schedule.txt | 18 | ||||
-rw-r--r-- | Documentation/filesystems/ntfs.txt | 6 | ||||
-rw-r--r-- | Documentation/filesystems/tmpfs.txt | 30 | ||||
-rw-r--r-- | Documentation/filesystems/v9fs.txt | 16 | ||||
-rw-r--r-- | Documentation/fujitsu/frv/kernel-ABI.txt | 234 | ||||
-rw-r--r-- | Documentation/hwmon/w83627hf | 4 | ||||
-rw-r--r-- | Documentation/kernel-parameters.txt | 26 | ||||
-rw-r--r-- | Documentation/kprobes.txt | 81 | ||||
-rw-r--r-- | Documentation/mips/AU1xxx_IDE.README | 6 | ||||
-rw-r--r-- | Documentation/scsi/ChangeLog.megaraid_sas | 23 | ||||
-rw-r--r-- | Documentation/sysctl/kernel.txt | 10 | ||||
-rw-r--r-- | Documentation/video4linux/CARDLIST.saa7134 | 4 | ||||
-rw-r--r-- | Documentation/vm/page_migration | 118 | ||||
-rw-r--r-- | Documentation/x86_64/boot-options.txt | 4 |
17 files changed, 530 insertions, 124 deletions
diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt index 08c5d04f308..57a09f99ecb 100644 --- a/Documentation/cpu-hotplug.txt +++ b/Documentation/cpu-hotplug.txt @@ -11,6 +11,8 @@ Joel Schopp <jschopp@austin.ibm.com> ia64/x86_64: Ashok Raj <ashok.raj@intel.com> + s390: + Heiko Carstens <heiko.carstens@de.ibm.com> Authors: Ashok Raj <ashok.raj@intel.com> Lots of feedback: Nathan Lynch <nathanl@austin.ibm.com>, @@ -44,9 +46,28 @@ maxcpus=n Restrict boot time cpus to n. Say if you have 4 cpus, using maxcpus=2 will only boot 2. You can choose to bring the other cpus later online, read FAQ's for more info. -additional_cpus=n [x86_64 only] use this to limit hotpluggable cpus. - This option sets - cpu_possible_map = cpu_present_map + additional_cpus +additional_cpus*=n Use this to limit hotpluggable cpus. This option sets + cpu_possible_map = cpu_present_map + additional_cpus + +(*) Option valid only for following architectures +- x86_64, ia64, s390 + +ia64 and x86_64 use the number of disabled local apics in ACPI tables MADT +to determine the number of potentially hot-pluggable cpus. The implementation +should only rely on this to count the #of cpus, but *MUST* not rely on the +apicid values in those tables for disabled apics. In the event BIOS doesnt +mark such hot-pluggable cpus as disabled entries, one could use this +parameter "additional_cpus=x" to represent those cpus in the cpu_possible_map. + +s390 uses the number of cpus it detects at IPL time to also the number of bits +in cpu_possible_map. If it is desired to add additional cpus at a later time +the number should be specified using this option or the possible_cpus option. + +possible_cpus=n [s390 only] use this to set hotpluggable cpus. + This option sets possible_cpus bits in + cpu_possible_map. Thus keeping the numbers of bits set + constant even if the machine gets rebooted. + This option overrides additional_cpus. CPU maps and such ----------------- diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt index 990998ee10b..30c41459953 100644 --- a/Documentation/cpusets.txt +++ b/Documentation/cpusets.txt @@ -4,8 +4,9 @@ Copyright (C) 2004 BULL SA. Written by Simon.Derr@bull.net -Portions Copyright (c) 2004 Silicon Graphics, Inc. +Portions Copyright (c) 2004-2006 Silicon Graphics, Inc. Modified by Paul Jackson <pj@sgi.com> +Modified by Christoph Lameter <clameter@sgi.com> CONTENTS: ========= @@ -90,7 +91,8 @@ This can be especially valuable on: These subsets, or "soft partitions" must be able to be dynamically adjusted, as the job mix changes, without impacting other concurrently -executing jobs. +executing jobs. The location of the running jobs pages may also be moved +when the memory locations are changed. The kernel cpuset patch provides the minimum essential kernel mechanisms required to efficiently implement such subsets. It @@ -102,8 +104,8 @@ memory allocator code. 1.3 How are cpusets implemented ? --------------------------------- -Cpusets provide a Linux kernel (2.6.7 and above) mechanism to constrain -which CPUs and Memory Nodes are used by a process or set of processes. +Cpusets provide a Linux kernel mechanism to constrain which CPUs and +Memory Nodes are used by a process or set of processes. The Linux kernel already has a pair of mechanisms to specify on which CPUs a task may be scheduled (sched_setaffinity) and on which Memory @@ -371,22 +373,17 @@ cpusets memory placement policy 'mems' subsequently changes. If the cpuset flag file 'memory_migrate' is set true, then when tasks are attached to that cpuset, any pages that task had allocated to it on nodes in its previous cpuset are migrated -to the tasks new cpuset. Depending on the implementation, -this migration may either be done by swapping the page out, -so that the next time the page is referenced, it will be paged -into the tasks new cpuset, usually on the node where it was -referenced, or this migration may be done by directly copying -the pages from the tasks previous cpuset to the new cpuset, -where possible to the same node, relative to the new cpuset, -as the node that held the page, relative to the old cpuset. +to the tasks new cpuset. The relative placement of the page within +the cpuset is preserved during these migration operations if possible. +For example if the page was on the second valid node of the prior cpuset +then the page will be placed on the second valid node of the new cpuset. + Also if 'memory_migrate' is set true, then if that cpusets 'mems' file is modified, pages allocated to tasks in that cpuset, that were on nodes in the previous setting of 'mems', -will be moved to nodes in the new setting of 'mems.' Again, -depending on the implementation, this might be done by swapping, -or by direct copying. In either case, pages that were not in -the tasks prior cpuset, or in the cpusets prior 'mems' setting, -will not be moved. +will be moved to nodes in the new setting of 'mems.' +Pages that were not in the tasks prior cpuset, or in the cpusets +prior 'mems' setting, will not be moved. There is an exception to the above. If hotplug functionality is used to remove all the CPUs that are currently assigned to a cpuset, @@ -434,16 +431,6 @@ and then start a subshell 'sh' in that cpuset: # The next line should display '/Charlie' cat /proc/self/cpuset -In the case that a change of cpuset includes wanting to move already -allocated memory pages, consider further the work of IWAMOTO -Toshihiro <iwamoto@valinux.co.jp> for page remapping and memory -hotremoval, which can be found at: - - http://people.valinux.co.jp/~iwamoto/mh.html - -The integration of cpusets with such memory migration is not yet -available. - In the future, a C library interface to cpusets will likely be available. For now, the only way to query or modify cpusets is via the cpuset file system, using the various cd, mkdir, echo, cat, diff --git a/Documentation/dvb/bt8xx.txt b/Documentation/dvb/bt8xx.txt index df6c05453cb..52ed462061d 100644 --- a/Documentation/dvb/bt8xx.txt +++ b/Documentation/dvb/bt8xx.txt @@ -111,4 +111,8 @@ source: linux/Documentation/video4linux/CARDLIST.bttv If you have problems with this please do ask on the mailing list. -- -Authors: Richard Walker, Jamie Honan, Michael Hunold, Manu Abraham +Authors: Richard Walker, + Jamie Honan, + Michael Hunold, + Manu Abraham, + Michael Krufky diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index b730d765b52..81bc51369f5 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -171,3 +171,21 @@ Why: The ISA interface is faster and should be always available. The I2C probing is also known to cause trouble in at least one case (see bug #5889.) Who: Jean Delvare <khali@linux-fr.org> + +--------------------------- + +What: mount/umount uevents +When: February 2007 +Why: These events are not correct, and do not properly let userspace know + when a file system has been mounted or unmounted. Userspace should + poll the /proc/mounts file instead to detect this properly. +Who: Greg Kroah-Hartman <gregkh@suse.de> + +--------------------------- + +What: Support for NEC DDB5074 and DDB5476 evaluation boards. +When: June 2006 +Why: Board specific code doesn't build anymore since ~2.6.0 and no + users have complained indicating there is no more need for these + boards. This should really be considered a last call. +Who: Ralf Baechle <ralf@linux-mips.org> diff --git a/Documentation/filesystems/ntfs.txt b/Documentation/filesystems/ntfs.txt index 614de312490..25116858789 100644 --- a/Documentation/filesystems/ntfs.txt +++ b/Documentation/filesystems/ntfs.txt @@ -457,6 +457,12 @@ ChangeLog Note, a technical ChangeLog aimed at kernel hackers is in fs/ntfs/ChangeLog. +2.1.26: + - Implement support for sector sizes above 512 bytes (up to the maximum + supported by NTFS which is 4096 bytes). + - Enhance support for NTFS volumes which were supported by Windows but + not by Linux due to invalid attribute list attribute flags. + - A few minor updates and bug fixes. 2.1.25: - Write support is now extended with write(2) being able to both overwrite existing file data and to extend files. Also, if a write diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt index dbe4d87d261..1773106976a 100644 --- a/Documentation/filesystems/tmpfs.txt +++ b/Documentation/filesystems/tmpfs.txt @@ -79,15 +79,27 @@ that instance in a system with many cpus making intensive use of it. tmpfs has a mount option to set the NUMA memory allocation policy for -all files in that instance: -mpol=interleave prefers to allocate memory from each node in turn -mpol=default prefers to allocate memory from the local node -mpol=bind prefers to allocate from mpol_nodelist -mpol=preferred prefers to allocate from first node in mpol_nodelist +all files in that instance (if CONFIG_NUMA is enabled) - which can be +adjusted on the fly via 'mount -o remount ...' -The following mount option is used in conjunction with mpol=interleave, -mpol=bind or mpol=preferred: -mpol_nodelist: nodelist suitable for parsing with nodelist_parse. +mpol=default prefers to allocate memory from the local node +mpol=prefer:Node prefers to allocate memory from the given Node +mpol=bind:NodeList allocates memory only from nodes in NodeList +mpol=interleave prefers to allocate from each node in turn +mpol=interleave:NodeList allocates from each node of NodeList in turn + +NodeList format is a comma-separated list of decimal numbers and ranges, +a range being two hyphen-separated decimal numbers, the smallest and +largest node numbers in the range. For example, mpol=bind:0-3,5,7,9-15 + +Note that trying to mount a tmpfs with an mpol option will fail if the +running kernel does not support NUMA; and will fail if its nodelist +specifies a node >= MAX_NUMNODES. If your system relies on that tmpfs +being mounted, but from time to time runs a kernel built without NUMA +capability (perhaps a safe recovery kernel), or configured to support +fewer nodes, then it is advisable to omit the mpol option from automatic +mount options. It can be added later, when the tmpfs is already mounted +on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'. To specify the initial root directory you can use the following mount @@ -109,4 +121,4 @@ RAM/SWAP in 10240 inodes and it is only accessible by root. Author: Christoph Rohland <cr@sap.com>, 1.12.01 Updated: - Hugh Dickins <hugh@veritas.com>, 13 March 2005 + Hugh Dickins <hugh@veritas.com>, 19 February 2006 diff --git a/Documentation/filesystems/v9fs.txt b/Documentation/filesystems/v9fs.txt index 4e92feb6b50..24c7a9c41f0 100644 --- a/Documentation/filesystems/v9fs.txt +++ b/Documentation/filesystems/v9fs.txt @@ -57,8 +57,6 @@ OPTIONS port=n port to connect to on the remote server - timeout=n request timeouts (in ms) (default 60000ms) - noextend force legacy mode (no 9P2000.u semantics) uid attempt to mount as a particular uid @@ -74,10 +72,16 @@ OPTIONS RESOURCES ========= -The Linux version of the 9P server, along with some client-side utilities -can be found at http://v9fs.sf.net (along with a CVS repository of the -development branch of this module). There are user and developer mailing -lists here, as well as a bug-tracker. +The Linux version of the 9P server is now maintained under the npfs project +on sourceforge (http://sourceforge.net/projects/npfs). + +There are user and developer mailing lists available through the v9fs project +on sourceforge (http://sourceforge.net/projects/v9fs). + +News and other information is maintained on SWiK (http://swik.net/v9fs). + +Bug reports may be issued through the kernel.org bugzilla +(http://bugzilla.kernel.org) For more information on the Plan 9 Operating System check out http://plan9.bell-labs.com/plan9 diff --git a/Documentation/fujitsu/frv/kernel-ABI.txt b/Documentation/fujitsu/frv/kernel-ABI.txt new file mode 100644 index 00000000000..0ed9b0a779b --- /dev/null +++ b/Documentation/fujitsu/frv/kernel-ABI.txt @@ -0,0 +1,234 @@ + ================================= + INTERNAL KERNEL ABI FOR FR-V ARCH + ================================= + +The internal FRV kernel ABI is not quite the same as the userspace ABI. A number of the registers +are used for special purposed, and the ABI is not consistent between modules vs core, and MMU vs +no-MMU. + +This partly stems from the fact that FRV CPUs do not have a separate supervisor stack pointer, and +most of them do not have any scratch registers, thus requiring at least one general purpose +register to be clobbered in such an event. Also, within the kernel core, it is possible to simply +jump or call directly between functions using a relative offset. This cannot be extended to modules +for the displacement is likely to be too far. Thus in modules the address of a function to call +must be calculated in a register and then used, requiring two extra instructions. + +This document has the following sections: + + (*) System call register ABI + (*) CPU operating modes + (*) Internal kernel-mode register ABI + (*) Internal debug-mode register ABI + (*) Virtual interrupt handling + + +======================== +SYSTEM CALL REGISTER ABI +======================== + +When a system call is made, the following registers are effective: + + REGISTERS CALL RETURN + =============== ======================= ======================= + GR7 System call number Preserved + GR8 Syscall arg #1 Return value + GR9-GR13 Syscall arg #2-6 Preserved + + +=================== +CPU OPERATING MODES +=================== + +The FR-V CPU has three basic operating modes. In order of increasing capability: + + (1) User mode. + + Basic userspace running mode. + + (2) Kernel mode. + + Normal kernel mode. There are many additional control registers available that may be + accessed in this mode, in addition to all the stuff available to user mode. This has two + submodes: + + (a) Exceptions enabled (PSR.T == 1). + + Exceptions will invoke the appropriate normal kernel mode handler. On entry to the + handler, the PSR.T bit will be cleared. + + (b) Exceptions disabled (PSR.T == 0). + + No exceptions or interrupts may happen. Any mandatory exceptions will cause the CPU to + halt unless the CPU is told to jump into debug mode instead. + + (3) Debug mode. + + No exceptions may happen in this mode. Memory protection and management exceptions will be + flagged for later consideration, but the exception handler won't be invoked. Debugging traps + such as hardware breakpoints and watchpoints will be ignored. This mode is entered only by + debugging events obtained from the other two modes. + + All kernel mode registers may be accessed, plus a few extra debugging specific registers. + + +================================= +INTERNAL KERNEL-MODE REGISTER ABI +================================= + +There are a number of permanent register assignments that are set up by entry.S in the exception +prologue. Note that there is a complete set of exception prologues for each of user->kernel +transition and kernel->kernel transition. There are also user->debug and kernel->debug mode +transition prologues. + + + REGISTER FLAVOUR USE + =============== ======= ==================================================== + GR1 Supervisor stack pointer + GR15 Current thread info pointer + GR16 GP-Rel base register for small data + GR28 Current exception frame pointer (__frame) + GR29 Current task pointer (current) + GR30 Destroyed by kernel mode entry + GR31 NOMMU Destroyed by debug mode entry + GR31 MMU Destroyed by TLB miss kernel mode entry + CCR.ICC2 Virtual interrupt disablement tracking + CCCR.CC3 Cleared by exception prologue (atomic op emulation) + SCR0 MMU See mmu-layout.txt. + SCR1 MMU See mmu-layout.txt. + SCR2 MMU Save for EAR0 (destroyed by icache insns in debug mode) + SCR3 MMU Save for GR31 during debug exceptions + DAMR/IAMR NOMMU Fixed memory protection layout. + DAMR/IAMR MMU See mmu-layout.txt. + + +Certain registers are also used or modified across function calls: + + REGISTER CALL RETURN + =============== =============================== =============================== + GR0 Fixed Zero - + GR2 Function call frame pointer + GR3 Special Preserved + GR3-GR7 - Clobbered + GR8 Function call arg #1 Return value (or clobbered) + GR9 Function call arg #2 Return value MSW (or clobbered) + GR10-GR13 Function call arg #3-#6 Clobbered + GR14 - Clobbered + GR15-GR16 Special Preserved + GR17-GR27 - Preserved + GR28-GR31 Special Only accessed explicitly + LR Return address after CALL Clobbered + CCR/CCCR - Mostly Clobbered + + +================================ +INTERNAL DEBUG-MODE REGISTER ABI +================================ + +This is the same as the kernel-mode register ABI for functions calls. The difference is that in +debug-mode there's a different stack and a different exception frame. Almost all the global +registers from kernel-mode (including the stack pointer) may be changed. + + REGISTER FLAVOUR USE + =============== ======= ==================================================== + GR1 Debug stack pointer + GR16 GP-Rel base register for small data + GR31 Current debug exception frame pointer (__debug_frame) + SCR3 MMU Saved value of GR31 + + +Note that debug mode is able to interfere with the kernel's emulated atomic ops, so it must be +exceedingly careful not to do any that would interact with the main kernel in this regard. Hence +the debug mode code (gdbstub) is almost completely self-contained. The only external code used is +the sprintf family of functions. + +Futhermore, break.S is so complicated because single-step mode does not switch off on entry to an +exception. That means unless manually disabled, single-stepping will blithely go on stepping into +things like interrupts. See gdbstub.txt for more information. + + +========================== +VIRTUAL INTERRUPT HANDLING +========================== + +Because accesses to the PSR is so slow, and to disable interrupts we have to access it twice (once +to read and once to write), we don't actually disable interrupts at all if we don't have to. What +we do instead is use the ICC2 condition code flags to note virtual disablement, such that if we +then do take an interrupt, we note the flag, really disable interrupts, set another flag and resume +execution at the point the interrupt happened. Setting condition flags as a side effect of an +arithmetic or logical instruction is really fast. This use of the ICC2 only occurs within the +kernel - it does not affect userspace. + +The flags we use are: + + (*) CCR.ICC2.Z [Zero flag] + + Set to virtually disable interrupts, clear when interrupts are virtually enabled. Can be + modified by logical instructions without affecting the Carry flag. + + (*) CCR.ICC2.C [Carry flag] + + Clear to indicate hardware interrupts are really disabled, set otherwise. + + +What happens is this: + + (1) Normal kernel-mode operation. + + ICC2.Z is 0, ICC2.C is 1. + + (2) An interrupt occurs. The exception prologue examines ICC2.Z and determines that nothing needs + doing. This is done simply with an unlikely BEQ instruction. + + (3) The interrupts are disabled (local_irq_disable) + + ICC2.Z is set to 1. + + (4) If interrupts were then re-enabled (local_irq_enable): + + ICC2.Z would be set to 0. + + A TIHI #2 instruction (trap #2 if condition HI - Z==0 && C==0) would be used to trap if + interrupts were now virtually enabled, but physically disabled - which they're not, so the + trap isn't taken. The kernel would then be back to state (1). + + (5) An interrupt occurs. The exception prologue examines ICC2.Z and determines that the interrupt + shouldn't actually have happened. It jumps aside, and there disabled interrupts by setting + PSR.PIL to 14 and then it clears ICC2.C. + + (6) If interrupts were then saved and disabled again (local_irq_save): + + ICC2.Z would be shifted into the save variable and masked off (giving a 1). + + ICC2.Z would then be set to 1 (thus unchanged), and ICC2.C would be unaffected (ie: 0). + + (7) If interrupts were then restored from state (6) (local_irq_restore): + + ICC2.Z would be set to indicate the result of XOR'ing the saved value (ie: 1) with 1, which + gives a result of 0 - thus leaving ICC2.Z set. + + ICC2.C would remain unaffected (ie: 0). + + A TIHI #2 instruction would be used to again assay the current state, but this would do + nothing as Z==1. + + (8) If interrupts were then enabled (local_irq_enable): + + ICC2.Z would be cleared. ICC2.C would be left unaffected. Both flags would now be 0. + + A TIHI #2 instruction again issued to assay the current state would then trap as both Z==0 + [interrupts virtually enabled] and C==0 [interrupts really disabled] would then be true. + + (9) The trap #2 handler would simply enable hardware interrupts (set PSR.PIL to 0), set ICC2.C to + 1 and return. + +(10) Immediately upon returning, the pending interrupt would be taken. + +(11) The interrupt handler would take the path of actually processing the interrupt (ICC2.Z is + clear, BEQ fails as per step (2)). + +(12) The interrupt handler would then set ICC2.C to 1 since hardware interrupts are definitely + enabled - or else the kernel wouldn't be here. + +(13) On return from the interrupt handler, things would be back to state (1). + +This trap (#2) is only available in kernel mode. In user mode it will result in SIGILL. diff --git a/Documentation/hwmon/w83627hf b/Documentation/hwmon/w83627hf index 5d23776e990..bbeaba68044 100644 --- a/Documentation/hwmon/w83627hf +++ b/Documentation/hwmon/w83627hf @@ -36,6 +36,10 @@ Module Parameters (default is 1) Use 'init=0' to bypass initializing the chip. Try this if your computer crashes when you load the module. +* reset: int + (default is 0) + The driver used to reset the chip on load, but does no more. Use + 'reset=1' to restore the old behavior. Report if you need to do this. Description ----------- diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 84370363da8..fc99075e0af 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -335,6 +335,12 @@ running once the system is up. timesource is not avalible, it defaults to PIT. Format: { pit | tsc | cyclone | pmtmr } + disable_8254_timer + enable_8254_timer + [IA32/X86_64] Disable/Enable interrupt 0 timer routing + over the 8254 in addition to over the IO-APIC. The + kernel tries to set a sensible default. + hpet= [IA-32,HPET] option to disable HPET and use PIT. Format: disable @@ -1034,6 +1040,8 @@ running once the system is up. nomce [IA-32] Machine Check Exception + nomca [IA-64] Disable machine check abort handling + noresidual [PPC] Don't use residual data on PReP machines. noresume [SWSUSP] Disables resume and restores original swap @@ -1133,6 +1141,8 @@ running once the system is up. Mechanism 1. conf2 [IA-32] Force use of PCI Configuration Mechanism 2. + nommconf [IA-32,X86_64] Disable use of MMCONFIG for PCI + Configuration nosort [IA-32] Don't sort PCI devices according to order given by the PCI BIOS. This sorting is done to get a device order compatible with @@ -1280,6 +1290,19 @@ running once the system is up. New name for the ramdisk parameter. See Documentation/ramdisk.txt. + rcu.blimit= [KNL,BOOT] Set maximum number of finished + RCU callbacks to process in one batch. + + rcu.qhimark= [KNL,BOOT] Set threshold of queued + RCU callbacks over which batch limiting is disabled. + + rcu.qlowmark= [KNL,BOOT] Set threshold of queued + RCU callbacks below which batch limiting is re-enabled. + + rcu.rsinterval= [KNL,BOOT,SMP] Set the number of additional + RCU callbacks to queued before forcing reschedule + on all cpus. + rdinit= [KNL] Format: <full_path> Run specified binary instead of /init from the ramdisk, @@ -1636,6 +1659,9 @@ running once the system is up. Format: <irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]] + norandmaps Don't use address space randomization + Equivalent to echo 0 > /proc/sys/kernel/randomize_va_space + ______________________________________________________________________ Changelog: diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt index 0ea5a0c6e82..2c3b1eae428 100644 --- a/Documentation/kprobes.txt +++ b/Documentation/kprobes.txt @@ -136,17 +136,20 @@ Kprobes, jprobes, and return probes are implemented on the following architectures: - i386 -- x86_64 (AMD-64, E64MT) +- x86_64 (AMD-64, EM64T) - ppc64 -- ia64 (Support for probes on certain instruction types is still in progress.) +- ia64 (Does not support probes on instruction slot1.) - sparc64 (Return probes not yet implemented.) 3. Configuring Kprobes When configuring the kernel using make menuconfig/xconfig/oldconfig, -ensure that CONFIG_KPROBES is set to "y". Under "Kernel hacking", -look for "Kprobes". You may have to enable "Kernel debugging" -(CONFIG_DEBUG_KERNEL) before you can enable Kprobes. +ensure that CONFIG_KPROBES is set to "y". Under "Instrumentation +Support", look for "Kprobes". + +So that you can load and unload Kprobes-based instrumentation modules, +make sure "Loadable module support" (CONFIG_MODULES) and "Module +unloading" (CONFIG_MODULE_UNLOAD) are set to "y". You may also want to ensure that CONFIG_KALLSYMS and perhaps even CONFIG_KALLSYMS_ALL are set to "y", since kallsyms_lookup_name() @@ -262,18 +265,18 @@ at any time after the probe has been registered. 5. Kprobes Features and Limitations -As of Linux v2.6.12, Kprobes allows multiple probes at the same -address. Currently, however, there cannot be multiple jprobes on -the same function at the same time. +Kprobes allows multiple probes at the same address. Currently, +however, there cannot be multiple jprobes on the same function at +the same time. In general, you can install a probe anywhere in the kernel. In particular, you can probe interrupt handlers. Known exceptions are discussed in this section. -For obvious reasons, it's a bad idea to install a probe in -the code that implements Kprobes (mostly kernel/kprobes.c and -arch/*/kernel/kprobes.c). A patch in the v2.6.13 timeframe instructs -Kprobes to reject such requests. +The register_*probe functions will return -EINVAL if you attempt +to install a probe in the code that implements Kprobes (mostly +kernel/kprobes.c and arch/*/kernel/kprobes.c, but also functions such +as do_page_fault and notifier_call_chain). If you install a probe in an inline-able function, Kprobes makes no attempt to chase down all inline instances of the function and @@ -290,18 +293,14 @@ from the accidental ones. Don't drink and probe. Kprobes makes no attempt to prevent probe handlers from stepping on each other -- e.g., probing printk() and then calling printk() from a -probe handler. As of Linux v2.6.12, if a probe handler hits a probe, -that second probe's handlers won't be run in that instance. - -In Linux v2.6.12 and previous versions, Kprobes' data structures are -protected by a single lock that is held during probe registration and -unregistration and while handlers are run. Thus, no two handlers -can run simultaneously. To improve scalability on SMP systems, -this restriction will probably be removed soon, in which case -multiple handlers (or multiple instances of the same handler) may -run concurrently on different CPUs. Code your handlers accordingly. - -Kprobes does not use semaphores or allocate memory except during +probe handler. If a probe handler hits a probe, that second probe's +handlers won't be run in that instance, and the kprobe.nmissed member +of the second probe will be incremented. + +As of Linux v2.6.15-rc1, multiple handlers (or multiple instances of +the same handler) may run concurrently on different CPUs. + +Kprobes does not use mutexes or allocate memory except during registration and unregistration. Probe handlers are run with preemption disabled. Depending on the @@ -316,11 +315,18 @@ address instead of the real return address for kretprobed functions. (As far as we can tell, __builtin_return_address() is used only for instrumentation and error reporting.) -If the number of times a function is called does not match the -number of times it returns, registering a return probe on that -function may produce undesirable results. We have the do_exit() -and do_execve() cases covered. do_fork() is not an issue. We're -unaware of other specific cases where this could be a problem. +If the number of times a function is called does not match the number +of times it returns, registering a return probe on that function may +produce undesirable results. We have the do_exit() case covered. +do_execve() and do_fork() are not an issue. We're unaware of other +specific cases where this could be a problem. + +If, upon entry to or exit from a function, the CPU is running on +a stack other than that of the current task, registering a return +probe on that function may produce undesirable results. For this +reason, Kprobes doesn't support return probes (or kprobes or jprobes) +on the x86_64 version of __switch_to(); the registration functions +return -EINVAL. 6. Probe Overhead @@ -347,14 +353,12 @@ k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99 7. TODO -a. SystemTap (http://sourceware.org/systemtap): Work in progress -to provide a simplified programming interface for probe-based -instrumentation. -b. Improved SMP scalability: Currently, work is in progress to handle -multiple kprobes in parallel. -c. Kernel return probes for sparc64. -d. Support for other architectures. -e. User-space probes. +a. SystemTap (http://sourceware.org/systemtap): Provides a simplified +programming interface for probe-based instrumentation. Try it out. +b. Kernel return probes for sparc64. +c. Support for other architectures. +d. User-space probes. +e. Watchpoint probes (which fire on data references). 8. Kprobes Example @@ -411,8 +415,7 @@ int init_module(void) printk("Couldn't find %s to plant kprobe\n", "do_fork"); return -1; } - ret = register_kprobe(&kp); - if (ret < 0) { + if ((ret = register_kprobe(&kp) < 0)) { printk("register_kprobe failed, returned %d\n", ret); return -1; } diff --git a/Documentation/mips/AU1xxx_IDE.README b/Documentation/mips/AU1xxx_IDE.README index a7e4c4ea356..afb31c141d9 100644 --- a/Documentation/mips/AU1xxx_IDE.README +++ b/Documentation/mips/AU1xxx_IDE.README @@ -95,11 +95,13 @@ CONFIG_BLK_DEV_IDEDMA_PCI=y CONFIG_IDEDMA_PCI_AUTO=y CONFIG_BLK_DEV_IDE_AU1XXX=y CONFIG_BLK_DEV_IDE_AU1XXX_MDMA2_DBDMA=y -CONFIG_BLK_DEV_IDE_AU1XXX_BURSTABLE_ON=y CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ=128 CONFIG_BLK_DEV_IDEDMA=y CONFIG_IDEDMA_AUTO=y +Also define 'IDE_AU1XXX_BURSTMODE' in 'drivers/ide/mips/au1xxx-ide.c' to enable +the burst support on DBDMA controller. + If the used system need the USB support enable the following kernel configs for high IDE to USB throughput. @@ -115,6 +117,8 @@ CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ=128 CONFIG_BLK_DEV_IDEDMA=y CONFIG_IDEDMA_AUTO=y +Also undefine 'IDE_AU1XXX_BURSTMODE' in 'drivers/ide/mips/au1xxx-ide.c' to +disable the burst support on DBDMA controller. ADD NEW HARD DISC TO WHITE OR BLACK LIST ---------------------------------------- diff --git a/Documentation/scsi/ChangeLog.megaraid_sas b/Documentation/scsi/ChangeLog.megaraid_sas index f8c16cbf56b..2dafa63bd37 100644 --- a/Documentation/scsi/ChangeLog.megaraid_sas +++ b/Documentation/scsi/ChangeLog.megaraid_sas @@ -1,3 +1,26 @@ +1 Release Date : Wed Feb 03 14:31:44 PST 2006 - Sumant Patro <Sumant.Patro@lsil.com> +2 Current Version : 00.00.02.04 +3 Older Version : 00.00.02.04 + +i. Support for 1078 type (ppc IOP) controller, device id : 0x60 added. + During initialization, depending on the device id, the template members + are initialized with function pointers specific to the ppc or + xscale controllers. + + -Sumant Patro <Sumant.Patro@lsil.com> + +1 Release Date : Fri Feb 03 14:16:25 PST 2006 - Sumant Patro + <Sumant.Patro@lsil.com> +2 Current Version : 00.00.02.04 +3 Older Version : 00.00.02.02 +i. Register 16 byte CDB capability with scsi midlayer + + "Ths patch properly registers the 16 byte command length capability of the + megaraid_sas controlled hardware with the scsi midlayer. All megaraid_sas + hardware supports 16 byte CDB's." + + -Joshua Giles <joshua_giles@dell.com> + 1 Release Date : Mon Jan 23 14:09:01 PST 2006 - Sumant Patro <Sumant.Patro@lsil.com> 2 Current Version : 00.00.02.02 3 Older Version : 00.00.02.01 diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 9f11d36a8c1..b0c7ab93dcb 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -16,6 +16,7 @@ before actually making adjustments. Currently, these files might (depending on your configuration) show up in /proc/sys/kernel: +- acpi_video_flags - acct - core_pattern - core_uses_pid @@ -57,6 +58,15 @@ show up in /proc/sys/kernel: ============================================================== +acpi_video_flags: + +flags + +See Doc*/kernel/power/video.txt, it allows mode of video boot to be +set during run time. + +============================================================== + acct: highwater lowwater frequency diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134 index 8a352597830..da4fb890165 100644 --- a/Documentation/video4linux/CARDLIST.saa7134 +++ b/Documentation/video4linux/CARDLIST.saa7134 @@ -13,7 +13,7 @@ 12 -> Medion 7134 [16be:0003] 13 -> Typhoon TV+Radio 90031 14 -> ELSA EX-VISION 300TV [1048:226b] - 15 -> ELSA EX-VISION 500TV [1048:226b] + 15 -> ELSA EX-VISION 500TV [1048:226a] 16 -> ASUS TV-FM 7134 [1043:4842,1043:4830,1043:4840] 17 -> AOPEN VA1000 POWER [1131:7133] 18 -> BMK MPEX No Tuner @@ -75,7 +75,7 @@ 74 -> LifeView FlyTV Platinum Mini2 [14c0:1212] 75 -> AVerMedia AVerTVHD MCE A180 [1461:1044] 76 -> SKNet MonsterTV Mobile [1131:4ee9] - 77 -> Pinnacle PCTV 110i (saa7133) [11bd:002e] + 77 -> Pinnacle PCTV 40i/50i/110i (saa7133) [11bd:002e] 78 -> ASUSTeK P7131 Dual [1043:4862] 79 -> Sedna/MuchTV PC TV Cardbus TV/Radio (ITO25 Rev:2B) 80 -> ASUS Digimatrix TV [1043:0210] diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration index c52820fcf50..0dd4ef30c36 100644 --- a/Documentation/vm/page_migration +++ b/Documentation/vm/page_migration @@ -12,12 +12,18 @@ is running. Page migration allows a process to manually relocate the node on which its pages are located through the MF_MOVE and MF_MOVE_ALL options while setting -a new memory policy. The pages of process can also be relocated +a new memory policy via mbind(). The pages of process can also be relocated from another process using the sys_migrate_pages() function call. The migrate_pages function call takes two sets of nodes and moves pages of a process that are located on the from nodes to the destination nodes. - -Manual migration is very useful if for example the scheduler has relocated +Page migration functions are provided by the numactl package by Andi Kleen +(a version later than 0.9.3 is required. Get it from +ftp://ftp.suse.com/pub/people/ak). numactl provided libnuma which +provides an interface similar to other numa functionality for page migration. +cat /proc/<pid>/numa_maps allows an easy review of where the pages of +a process are located. See also the numa_maps manpage in the numactl package. + +Manual migration is useful if for example the scheduler has relocated a process to a processor on a distant node. A batch scheduler or an administrator may detect the situation and move the pages of the process nearer to the new processor. At some point in the future we may have @@ -25,10 +31,12 @@ some mechanism in the scheduler that will automatically move the pages. Larger installations usually partition the system using cpusets into sections of nodes. Paul Jackson has equipped cpusets with the ability to -move pages when a task is moved to another cpuset. This allows automatic -control over locality of a process. If a task is moved to a new cpuset -then also all its pages are moved with it so that the performance of the -process does not sink dramatically (as is the case today). +move pages when a task is moved to another cpuset (See ../cpusets.txt). +Cpusets allows the automation of process locality. If a task is moved to +a new cpuset then also all its pages are moved with it so that the +performance of the process does not sink dramatically. Also the pages +of processes in a cpuset are moved if the allowed memory nodes of a +cpuset are changed. Page migration allows the preservation of the relative location of pages within a group of nodes for all migration techniques which will preserve a @@ -37,22 +45,26 @@ process. This is necessary in order to preserve the memory latencies. Processes will run with similar performance after migration. Page migration occurs in several steps. First a high level -description for those trying to use migrate_pages() and then -a low level description of how the low level details work. +description for those trying to use migrate_pages() from the kernel +(for userspace usage see the Andi Kleen's numactl package mentioned above) +and then a low level description of how the low level details work. -A. Use of migrate_pages() -------------------------- +A. In kernel use of migrate_pages() +----------------------------------- 1. Remove pages from the LRU. Lists of pages to be migrated are generated by scanning over pages and moving them into lists. This is done by - calling isolate_lru_page() or __isolate_lru_page(). + calling isolate_lru_page(). Calling isolate_lru_page increases the references to the page - so that it cannot vanish under us. + so that it cannot vanish while the page migration occurs. + It also prevents the swapper or other scans to encounter + the page. -2. Generate a list of newly allocates page to move the contents - of the first list to. +2. Generate a list of newly allocates page. These pages will contain the + contents of the pages from the first list after page migration is + complete. 3. The migrate_pages() function is called which attempts to do the migration. It returns the moved pages in the @@ -63,13 +75,17 @@ A. Use of migrate_pages() 4. The leftover pages of various types are returned to the LRU using putback_to_lru_pages() or otherwise disposed of. The pages will still have the refcount as - increased by isolate_lru_pages()! + increased by isolate_lru_pages() if putback_to_lru_pages() is not + used! The kernel may want to handle the various cases of failures in + different ways. -B. Operation of migrate_pages() --------------------------------- +B. How migrate_pages() works +---------------------------- -migrate_pages does several passes over its list of pages. A page is moved -if all references to a page are removable at the time. +migrate_pages() does several passes over its list of pages. A page is moved +if all references to a page are removable at the time. The page has +already been removed from the LRU via isolate_lru_page() and the refcount +is increased so that the page cannot be freed while page migration occurs. Steps: @@ -79,36 +95,40 @@ Steps: 3. Make sure that the page has assigned swap cache entry if it is an anonyous page. The swap cache reference is necessary - to preserve the information contain in the page table maps. + to preserve the information contain in the page table maps while + page migration occurs. 4. Prep the new page that we want to move to. It is locked and set to not being uptodate so that all accesses to the new - page immediately lock while we are moving references. + page immediately lock while the move is in progress. -5. All the page table references to the page are either dropped (file backed) - or converted to swap references (anonymous pages). This should decrease the - reference count. +5. All the page table references to the page are either dropped (file + backed pages) or converted to swap references (anonymous pages). + This should decrease the reference count. -6. The radix tree lock is taken +6. The radix tree lock is taken. This will cause all processes trying + to reestablish a pte to block on the radix tree spinlock. 7. The refcount of the page is examined and we back out if references remain otherwise we know that we are the only one referencing this page. 8. The radix tree is checked and if it does not contain the pointer to this - page then we back out. + page then we back out because someone else modified the mapping first. 9. The mapping is checked. If the mapping is gone then a truncate action may be in progress and we back out. -10. The new page is prepped with some settings from the old page so that accesses - to the new page will be discovered to have the correct settings. +10. The new page is prepped with some settings from the old page so that + accesses to the new page will be discovered to have the correct settings. 11. The radix tree is changed to point to the new page. -12. The reference count of the old page is dropped because the reference has now - been removed. +12. The reference count of the old page is dropped because the radix tree + reference is gone. -13. The radix tree lock is dropped. +13. The radix tree lock is dropped. With that lookups become possible again + and other processes will move from spinning on the tree lock to sleeping on + the locked new page. 14. The page contents are copied to the new page. @@ -119,11 +139,37 @@ Steps: 17. Queued up writeback on the new page is triggered. -18. If swap pte's were generated for the page then remove them again. +18. If swap pte's were generated for the page then replace them with real + ptes. This will reenable access for processes not blocked by the page lock. + +19. The page locks are dropped from the old and new page. + Processes waiting on the page lock can continue. + +20. The new page is moved to the LRU and can be scanned by the swapper + etc again. + +TODO list +--------- + +- Page migration requires the use of swap handles to preserve the + information of the anonymous page table entries. This means that swap + space is reserved but never used. The maximum number of swap handles used + is determined by CHUNK_SIZE (see mm/mempolicy.c) per ongoing migration. + Reservation of pages could be avoided by having a special type of swap + handle that does not require swap space and that would only track the page + references. Something like that was proposed by Marcelo Tosatti in the + past (search for migration cache on lkml or linux-mm@kvack.org). -19. The locks are dropped from the old and new page. +- Page migration unmaps ptes for file backed pages and requires page + faults to reestablish these ptes. This could be optimized by somehow + recording the references before migration and then reestablish them later. + However, there are several locking challenges that have to be overcome + before this is possible. -20. The new page is moved to the LRU. +- Page migration generates read ptes for anonymous pages. Dirty page + faults are required to make the pages writable again. It may be possible + to generate a pte marked dirty if it is known that the page is dirty and + that this process has the only reference to that page. -Christoph Lameter, December 19, 2005. +Christoph Lameter, March 8, 2006. diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt index 153740f460a..1921353259a 100644 --- a/Documentation/x86_64/boot-options.txt +++ b/Documentation/x86_64/boot-options.txt @@ -52,6 +52,10 @@ APICs apicmaintimer. Useful when your PIT timer is totally broken. + disable_8254_timer / enable_8254_timer + Enable interrupt 0 timer routing over the 8254 in addition to over + the IO-APIC. The kernel tries to set a sensible default. + Early Console syntax: earlyprintk=vga |