aboutsummaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-devices-cache_disable18
-rw-r--r--Documentation/DMA-API.txt12
-rw-r--r--Documentation/futex-requeue-pi.txt131
-rw-r--r--Documentation/kernel-parameters.txt15
-rw-r--r--Documentation/memory-barriers.txt129
-rw-r--r--Documentation/scheduler/sched-rt-group.txt20
-rw-r--r--Documentation/trace/ftrace.txt15
-rw-r--r--Documentation/x86/boot.txt122
-rw-r--r--Documentation/x86/x86_64/boot-options.txt5
-rw-r--r--Documentation/x86/x86_64/mm.txt9
10 files changed, 449 insertions, 27 deletions
diff --git a/Documentation/ABI/testing/sysfs-devices-cache_disable b/Documentation/ABI/testing/sysfs-devices-cache_disable
new file mode 100644
index 00000000000..175bb4f7051
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-devices-cache_disable
@@ -0,0 +1,18 @@
+What: /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X
+Date: August 2008
+KernelVersion: 2.6.27
+Contact: mark.langsdorf@amd.com
+Description: These files exist in every cpu's cache index directories.
+ There are currently 2 cache_disable_# files in each
+ directory. Reading from these files on a supported
+ processor will return that cache disable index value
+ for that processor and node. Writing to one of these
+ files will cause the specificed cache index to be disabled.
+
+ Currently, only AMD Family 10h Processors support cache index
+ disable, and only for their L3 caches. See the BIOS and
+ Kernel Developer's Guide at
+ http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116-Public-GH-BKDG_3.20_2-4-09.pdf
+ for formatting information and other details on the
+ cache index disable.
+Users: joachim.deguara@amd.com
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index d9aa43d78bc..25fb8bcf32a 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -704,12 +704,24 @@ this directory the following files can currently be found:
The current number of free dma_debug_entries
in the allocator.
+ dma-api/driver-filter
+ You can write a name of a driver into this file
+ to limit the debug output to requests from that
+ particular driver. Write an empty string to
+ that file to disable the filter and see
+ all errors again.
+
If you have this code compiled into your kernel it will be enabled by default.
If you want to boot without the bookkeeping anyway you can provide
'dma_debug=off' as a boot parameter. This will disable DMA-API debugging.
Notice that you can not enable it again at runtime. You have to reboot to do
so.
+If you want to see debug messages only for a special device driver you can
+specify the dma_debug_driver=<drivername> parameter. This will enable the
+driver filter at boot time. The debug code will only print errors for that
+driver afterwards. This filter can be disabled or changed later using debugfs.
+
When the code disables itself at runtime this is most likely because it ran
out of dma_debug_entries. These entries are preallocated at boot. The number
of preallocated entries is defined per architecture. If it is too low for you
diff --git a/Documentation/futex-requeue-pi.txt b/Documentation/futex-requeue-pi.txt
new file mode 100644
index 00000000000..9dc1ff4fd53
--- /dev/null
+++ b/Documentation/futex-requeue-pi.txt
@@ -0,0 +1,131 @@
+Futex Requeue PI
+----------------
+
+Requeueing of tasks from a non-PI futex to a PI futex requires
+special handling in order to ensure the underlying rt_mutex is never
+left without an owner if it has waiters; doing so would break the PI
+boosting logic [see rt-mutex-desgin.txt] For the purposes of
+brevity, this action will be referred to as "requeue_pi" throughout
+this document. Priority inheritance is abbreviated throughout as
+"PI".
+
+Motivation
+----------
+
+Without requeue_pi, the glibc implementation of
+pthread_cond_broadcast() must resort to waking all the tasks waiting
+on a pthread_condvar and letting them try to sort out which task
+gets to run first in classic thundering-herd formation. An ideal
+implementation would wake the highest-priority waiter, and leave the
+rest to the natural wakeup inherent in unlocking the mutex
+associated with the condvar.
+
+Consider the simplified glibc calls:
+
+/* caller must lock mutex */
+pthread_cond_wait(cond, mutex)
+{
+ lock(cond->__data.__lock);
+ unlock(mutex);
+ do {
+ unlock(cond->__data.__lock);
+ futex_wait(cond->__data.__futex);
+ lock(cond->__data.__lock);
+ } while(...)
+ unlock(cond->__data.__lock);
+ lock(mutex);
+}
+
+pthread_cond_broadcast(cond)
+{
+ lock(cond->__data.__lock);
+ unlock(cond->__data.__lock);
+ futex_requeue(cond->data.__futex, cond->mutex);
+}
+
+Once pthread_cond_broadcast() requeues the tasks, the cond->mutex
+has waiters. Note that pthread_cond_wait() attempts to lock the
+mutex only after it has returned to user space. This will leave the
+underlying rt_mutex with waiters, and no owner, breaking the
+previously mentioned PI-boosting algorithms.
+
+In order to support PI-aware pthread_condvar's, the kernel needs to
+be able to requeue tasks to PI futexes. This support implies that
+upon a successful futex_wait system call, the caller would return to
+user space already holding the PI futex. The glibc implementation
+would be modified as follows:
+
+
+/* caller must lock mutex */
+pthread_cond_wait_pi(cond, mutex)
+{
+ lock(cond->__data.__lock);
+ unlock(mutex);
+ do {
+ unlock(cond->__data.__lock);
+ futex_wait_requeue_pi(cond->__data.__futex);
+ lock(cond->__data.__lock);
+ } while(...)
+ unlock(cond->__data.__lock);
+ /* the kernel acquired the the mutex for us */
+}
+
+pthread_cond_broadcast_pi(cond)
+{
+ lock(cond->__data.__lock);
+ unlock(cond->__data.__lock);
+ futex_requeue_pi(cond->data.__futex, cond->mutex);
+}
+
+The actual glibc implementation will likely test for PI and make the
+necessary changes inside the existing calls rather than creating new
+calls for the PI cases. Similar changes are needed for
+pthread_cond_timedwait() and pthread_cond_signal().
+
+Implementation
+--------------
+
+In order to ensure the rt_mutex has an owner if it has waiters, it
+is necessary for both the requeue code, as well as the waiting code,
+to be able to acquire the rt_mutex before returning to user space.
+The requeue code cannot simply wake the waiter and leave it to
+acquire the rt_mutex as it would open a race window between the
+requeue call returning to user space and the waiter waking and
+starting to run. This is especially true in the uncontended case.
+
+The solution involves two new rt_mutex helper routines,
+rt_mutex_start_proxy_lock() and rt_mutex_finish_proxy_lock(), which
+allow the requeue code to acquire an uncontended rt_mutex on behalf
+of the waiter and to enqueue the waiter on a contended rt_mutex.
+Two new system calls provide the kernel<->user interface to
+requeue_pi: FUTEX_WAIT_REQUEUE_PI and FUTEX_REQUEUE_CMP_PI.
+
+FUTEX_WAIT_REQUEUE_PI is called by the waiter (pthread_cond_wait()
+and pthread_cond_timedwait()) to block on the initial futex and wait
+to be requeued to a PI-aware futex. The implementation is the
+result of a high-speed collision between futex_wait() and
+futex_lock_pi(), with some extra logic to check for the additional
+wake-up scenarios.
+
+FUTEX_REQUEUE_CMP_PI is called by the waker
+(pthread_cond_broadcast() and pthread_cond_signal()) to requeue and
+possibly wake the waiting tasks. Internally, this system call is
+still handled by futex_requeue (by passing requeue_pi=1). Before
+requeueing, futex_requeue() attempts to acquire the requeue target
+PI futex on behalf of the top waiter. If it can, this waiter is
+woken. futex_requeue() then proceeds to requeue the remaining
+nr_wake+nr_requeue tasks to the PI futex, calling
+rt_mutex_start_proxy_lock() prior to each requeue to prepare the
+task as a waiter on the underlying rt_mutex. It is possible that
+the lock can be acquired at this stage as well, if so, the next
+waiter is woken to finish the acquisition of the lock.
+
+FUTEX_REQUEUE_PI accepts nr_wake and nr_requeue as arguments, but
+their sum is all that really matters. futex_requeue() will wake or
+requeue up to nr_wake + nr_requeue tasks. It will wake only as many
+tasks as it can acquire the lock for, which in the majority of cases
+should be 0 as good programming practice dictates that the caller of
+either pthread_cond_broadcast() or pthread_cond_signal() acquire the
+mutex prior to making the call. FUTEX_REQUEUE_PI requires that
+nr_wake=1. nr_requeue should be INT_MAX for broadcast and 0 for
+signal.
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 8041e51f886..a5253f6d01a 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -329,11 +329,6 @@ and is between 256 and 4096 characters. It is defined in the file
flushed before they will be reused, which
is a lot of faster
- amd_iommu_size= [HW,X86-64]
- Define the size of the aperture for the AMD IOMMU
- driver. Possible values are:
- '32M', '64M' (default), '128M', '256M', '512M', '1G'
-
amijoy.map= [HW,JOY] Amiga joystick support
Map of devices attached to JOY0DAT and JOY1DAT
Format: <a>,<b>
@@ -646,6 +641,13 @@ and is between 256 and 4096 characters. It is defined in the file
DMA-API debugging code disables itself because the
architectural default is too low.
+ dma_debug_driver=<driver_name>
+ With this option the DMA-API debugging driver
+ filter feature can be enabled at boot time. Just
+ pass the driver to filter for as the parameter.
+ The filter can be disabled or changed to another
+ driver later using sysfs.
+
dscc4.setup= [NET]
dtc3181e= [HW,SCSI]
@@ -1581,6 +1583,9 @@ and is between 256 and 4096 characters. It is defined in the file
noinitrd [RAM] Tells the kernel not to load any configured
initial RAM disk.
+ nointremap [X86-64, Intel-IOMMU] Do not enable interrupt
+ remapping.
+
nointroute [IA-64]
nojitter [IA64] Disables jitter checking for ITC timers.
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index f5b7127f54a..7f5809eddee 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -31,6 +31,7 @@ Contents:
- Locking functions.
- Interrupt disabling functions.
+ - Sleep and wake-up functions.
- Miscellaneous functions.
(*) Inter-CPU locking barrier effects.
@@ -1217,6 +1218,132 @@ barriers are required in such a situation, they must be provided from some
other means.
+SLEEP AND WAKE-UP FUNCTIONS
+---------------------------
+
+Sleeping and waking on an event flagged in global data can be viewed as an
+interaction between two pieces of data: the task state of the task waiting for
+the event and the global data used to indicate the event. To make sure that
+these appear to happen in the right order, the primitives to begin the process
+of going to sleep, and the primitives to initiate a wake up imply certain
+barriers.
+
+Firstly, the sleeper normally follows something like this sequence of events:
+
+ for (;;) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ if (event_indicated)
+ break;
+ schedule();
+ }
+
+A general memory barrier is interpolated automatically by set_current_state()
+after it has altered the task state:
+
+ CPU 1
+ ===============================
+ set_current_state();
+ set_mb();
+ STORE current->state
+ <general barrier>
+ LOAD event_indicated
+
+set_current_state() may be wrapped by:
+
+ prepare_to_wait();
+ prepare_to_wait_exclusive();
+
+which therefore also imply a general memory barrier after setting the state.
+The whole sequence above is available in various canned forms, all of which
+interpolate the memory barrier in the right place:
+
+ wait_event();
+ wait_event_interruptible();
+ wait_event_interruptible_exclusive();
+ wait_event_interruptible_timeout();
+ wait_event_killable();
+ wait_event_timeout();
+ wait_on_bit();
+ wait_on_bit_lock();
+
+
+Secondly, code that performs a wake up normally follows something like this:
+
+ event_indicated = 1;
+ wake_up(&event_wait_queue);
+
+or:
+
+ event_indicated = 1;
+ wake_up_process(event_daemon);
+
+A write memory barrier is implied by wake_up() and co. if and only if they wake
+something up. The barrier occurs before the task state is cleared, and so sits
+between the STORE to indicate the event and the STORE to set TASK_RUNNING:
+
+ CPU 1 CPU 2
+ =============================== ===============================
+ set_current_state(); STORE event_indicated
+ set_mb(); wake_up();
+ STORE current->state <write barrier>
+ <general barrier> STORE current->state
+ LOAD event_indicated
+
+The available waker functions include:
+
+ complete();
+ wake_up();
+ wake_up_all();
+ wake_up_bit();
+ wake_up_interruptible();
+ wake_up_interruptible_all();
+ wake_up_interruptible_nr();
+ wake_up_interruptible_poll();
+ wake_up_interruptible_sync();
+ wake_up_interruptible_sync_poll();
+ wake_up_locked();
+ wake_up_locked_poll();
+ wake_up_nr();
+ wake_up_poll();
+ wake_up_process();
+
+
+[!] Note that the memory barriers implied by the sleeper and the waker do _not_
+order multiple stores before the wake-up with respect to loads of those stored
+values after the sleeper has called set_current_state(). For instance, if the
+sleeper does:
+
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (event_indicated)
+ break;
+ __set_current_state(TASK_RUNNING);
+ do_something(my_data);
+
+and the waker does:
+
+ my_data = value;
+ event_indicated = 1;
+ wake_up(&event_wait_queue);
+
+there's no guarantee that the change to event_indicated will be perceived by
+the sleeper as coming after the change to my_data. In such a circumstance, the
+code on both sides must interpolate its own memory barriers between the
+separate data accesses. Thus the above sleeper ought to do:
+
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (event_indicated) {
+ smp_rmb();
+ do_something(my_data);
+ }
+
+and the waker should do:
+
+ my_data = value;
+ smp_wmb();
+ event_indicated = 1;
+ wake_up(&event_wait_queue);
+
+
MISCELLANEOUS FUNCTIONS
-----------------------
@@ -1366,7 +1493,7 @@ WHERE ARE MEMORY BARRIERS NEEDED?
Under normal operation, memory operation reordering is generally not going to
be a problem as a single-threaded linear piece of code will still appear to
-work correctly, even if it's in an SMP kernel. There are, however, three
+work correctly, even if it's in an SMP kernel. There are, however, four
circumstances in which reordering definitely _could_ be a problem:
(*) Interprocessor interaction.
diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt
index 5ba4d3fc625..1df7f9cdab0 100644
--- a/Documentation/scheduler/sched-rt-group.txt
+++ b/Documentation/scheduler/sched-rt-group.txt
@@ -4,6 +4,7 @@
CONTENTS
========
+0. WARNING
1. Overview
1.1 The problem
1.2 The solution
@@ -14,6 +15,23 @@ CONTENTS
3. Future plans
+0. WARNING
+==========
+
+ Fiddling with these settings can result in an unstable system, the knobs are
+ root only and assumes root knows what he is doing.
+
+Most notable:
+
+ * very small values in sched_rt_period_us can result in an unstable
+ system when the period is smaller than either the available hrtimer
+ resolution, or the time it takes to handle the budget refresh itself.
+
+ * very small values in sched_rt_runtime_us can result in an unstable
+ system when the runtime is so small the system has difficulty making
+ forward progress (NOTE: the migration thread and kstopmachine both
+ are real-time processes).
+
1. Overview
===========
@@ -169,7 +187,7 @@ get their allocated time.
Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
the biggest challenge as the current linux PI infrastructure is geared towards
-the limited static priority levels 0-139. With deadline scheduling you need to
+the limited static priority levels 0-99. With deadline scheduling you need to
do deadline inheritance (since priority is inversely proportional to the
deadline delta (deadline - now).
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index fd9a3e69381..e362f50c496 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -518,9 +518,18 @@ priority with zero (0) being the highest priority and the nice
values starting at 100 (nice -20). Below is a quick chart to map
the kernel priority to user land priorities.
- Kernel priority: 0 to 99 ==> user RT priority 99 to 0
- Kernel priority: 100 to 139 ==> user nice -20 to 19
- Kernel priority: 140 ==> idle task priority
+ Kernel Space User Space
+ ===============================================================
+ 0(high) to 98(low) user RT priority 99(high) to 1(low)
+ with SCHED_RR or SCHED_FIFO
+ ---------------------------------------------------------------
+ 99 sched_priority is not used in scheduling
+ decisions(it must be specified as 0)
+ ---------------------------------------------------------------
+ 100(high) to 139(low) user nice -20(high) to 19(low)
+ ---------------------------------------------------------------
+ 140 idle task priority
+ ---------------------------------------------------------------
The task states are:
diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index e0203662f9e..8da3a795083 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -50,6 +50,10 @@ Protocol 2.08: (Kernel 2.6.26) Added crc32 checksum and ELF format
Protocol 2.09: (Kernel 2.6.26) Added a field of 64-bit physical
pointer to single linked list of struct setup_data.
+Protocol 2.10: (Kernel 2.6.31) Added a protocol for relaxed alignment
+ beyond the kernel_alignment added, new init_size and
+ pref_address fields. Added extended boot loader IDs.
+
**** MEMORY LAYOUT
The traditional memory map for the kernel loader, used for Image or
@@ -168,12 +172,13 @@ Offset Proto Name Meaning
021C/4 2.00+ ramdisk_size initrd size (set by boot loader)
0220/4 2.00+ bootsect_kludge DO NOT USE - for bootsect.S use only
0224/2 2.01+ heap_end_ptr Free memory after setup end
-0226/2 N/A pad1 Unused
+0226/1 2.02+(3 ext_loader_ver Extended boot loader version
+0227/1 2.02+(3 ext_loader_type Extended boot loader ID
0228/4 2.02+ cmd_line_ptr 32-bit pointer to the kernel command line
022C/4 2.03+ ramdisk_max Highest legal initrd address
0230/4 2.05+ kernel_alignment Physical addr alignment required for kernel
0234/1 2.05+ relocatable_kernel Whether kernel is relocatable or not
-0235/1 N/A pad2 Unused
+0235/1 2.10+ min_alignment Minimum alignment, as a power of two
0236/2 N/A pad3 Unused
0238/4 2.06+ cmdline_size Maximum size of the kernel command line
023C/4 2.07+ hardware_subarch Hardware subarchitecture
@@ -182,6 +187,8 @@ Offset Proto Name Meaning
024C/4 2.08+ payload_length Length of kernel payload
0250/8 2.09+ setup_data 64-bit physical pointer to linked list
of struct setup_data
+0258/8 2.10+ pref_address Preferred loading address
+0260/4 2.10+ init_size Linear memory required during initialization
(1) For backwards compatibility, if the setup_sects field contains 0, the
real value is 4.
@@ -190,6 +197,8 @@ Offset Proto Name Meaning
field are unusable, which means the size of a bzImage kernel
cannot be determined.
+(3) Ignored, but safe to set, for boot protocols 2.02-2.09.
+
If the "HdrS" (0x53726448) magic number is not found at offset 0x202,
the boot protocol version is "old". Loading an old kernel, the
following parameters should be assumed:
@@ -343,18 +352,32 @@ Protocol: 2.00+
0xTV here, where T is an identifier for the boot loader and V is
a version number. Otherwise, enter 0xFF here.
+ For boot loader IDs above T = 0xD, write T = 0xE to this field and
+ write the extended ID minus 0x10 to the ext_loader_type field.
+ Similarly, the ext_loader_ver field can be used to provide more than
+ four bits for the bootloader version.
+
+ For example, for T = 0x15, V = 0x234, write:
+
+ type_of_loader <- 0xE4
+ ext_loader_type <- 0x05
+ ext_loader_ver <- 0x23
+
Assigned boot loader ids:
0 LILO (0x00 reserved for pre-2.00 bootloader)
1 Loadlin
2 bootsect-loader (0x20, all other values reserved)
- 3 SYSLINUX
- 4 EtherBoot
+ 3 Syslinux
+ 4 Etherboot/gPXE
5 ELILO
7 GRUB
- 8 U-BOOT
+ 8 U-Boot
9 Xen
A Gujin
B Qemu
+ C Arcturus Networks uCbootloader
+ E Extended (see ext_loader_type)
+ F Special (0xFF = undefined)
Please contact <hpa@zytor.com> if you need a bootloader ID
value assigned.
@@ -453,6 +476,35 @@ Protocol: 2.01+
Set this field to the offset (from the beginning of the real-mode
code) of the end of the setup stack/heap, minus 0x0200.
+Field name: ext_loader_ver
+Type: write (optional)
+Offset/size: 0x226/1
+Protocol: 2.02+
+
+ This field is used as an extension of the version number in the
+ type_of_loader field. The total version number is considered to be
+ (type_of_loader & 0x0f) + (ext_loader_ver << 4).
+
+ The use of this field is boot loader specific. If not written, it
+ is zero.
+
+ Kernels prior to 2.6.31 did not recognize this field, but it is safe
+ to write for protocol version 2.02 or higher.
+
+Field name: ext_loader_type
+Type: write (obligatory if (type_of_loader & 0xf0) == 0xe0)
+Offset/size: 0x227/1
+Protocol: 2.02+
+
+ This field is used as an extension of the type number in
+ type_of_loader field. If the type in type_of_loader is 0xE, then
+ the actual type is (ext_loader_type + 0x10).
+
+ This field is ignored if the type in type_of_loader is not 0xE.
+
+ Kernels prior to 2.6.31 did not recognize this field, but it is safe
+ to write for protocol version 2.02 or higher.
+
Field name: cmd_line_ptr
Type: write (obligatory)
Offset/size: 0x228/4
@@ -482,11 +534,19 @@ Protocol: 2.03+
0x37FFFFFF, you can start your ramdisk at 0x37FE0000.)
Field name: kernel_alignment
-Type: read (reloc)
+Type: read/modify (reloc)
Offset/size: 0x230/4
-Protocol: 2.05+
+Protocol: 2.05+ (read), 2.10+ (modify)
+
+ Alignment unit required by the kernel (if relocatable_kernel is
+ true.) A relocatable kernel that is loaded at an alignment
+ incompatible with the value in this field will be realigned during
+ kernel initialization.
- Alignment unit required by the kernel (if relocatable_kernel is true.)
+ Starting with protocol version 2.10, this reflects the kernel
+ alignment preferred for optimal performance; it is possible for the
+ loader to modify this field to permit a lesser alignment. See the
+ min_alignment and pref_address field below.
Field name: relocatable_kernel
Type: read (reloc)
@@ -498,6 +558,22 @@ Protocol: 2.05+
After loading, the boot loader must set the code32_start field to
point to the loaded code, or to a boot loader hook.
+Field name: min_alignment
+Type: read (reloc)
+Offset/size: 0x235/1
+Protocol: 2.10+
+
+ This field, if nonzero, indicates as a power of two the minimum
+ alignment required, as opposed to preferred, by the kernel to boot.
+ If a boot loader makes use of this field, it should update the
+ kernel_alignment field with the alignment unit desired; typically:
+
+ kernel_alignment = 1 << min_alignment
+
+ There may be a considerable performance cost with an excessively
+ misaligned kernel. Therefore, a loader should typically try each
+ power-of-two alignment from kernel_alignment down to this alignment.
+
Field name: cmdline_size
Type: read
Offset/size: 0x238/4
@@ -582,6 +658,36 @@ Protocol: 2.09+
sure to consider the case where the linked list already contains
entries.
+Field name: pref_address
+Type: read (reloc)
+Offset/size: 0x258/8
+Protocol: 2.10+
+
+ This field, if nonzero, represents a preferred load address for the
+ kernel. A relocating bootloader should attempt to load at this
+ address if possible.
+
+ A non-relocatable kernel will unconditionally move itself and to run
+ at this address.
+
+Field name: init_size
+Type: read
+Offset/size: 0x25c/4
+
+ This field indicates the amount of linear contiguous memory starting
+ at the kernel runtime start address that the kernel needs before it
+ is capable of examining its memory map. This is not the same thing
+ as the total amount of memory the kernel needs to boot, but it can
+ be used by a relocating boot loader to help select a safe load
+ address for the kernel.
+
+ The kernel runtime start address is determined by the following algorithm:
+
+ if (relocatable_kernel)
+ runtime_start = align_up(load_address, kernel_alignment)
+ else
+ runtime_start = pref_address
+
**** THE IMAGE CHECKSUM
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index 34c13040a71..2db5893d6c9 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -150,11 +150,6 @@ NUMA
Otherwise, the remaining system RAM is allocated to an
additional node.
- numa=hotadd=percent
- Only allow hotadd memory to preallocate page structures upto
- percent of already available memory.
- numa=hotadd=0 will disable hotadd memory.
-
ACPI
acpi=off Don't enable ACPI
diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 29b52b14d0b..d6498e3cd71 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -6,10 +6,11 @@ Virtual memory map with 4 level page tables:
0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
hole caused by [48:63] sign extension
ffff800000000000 - ffff80ffffffffff (=40 bits) guard hole
-ffff880000000000 - ffffc0ffffffffff (=57 TB) direct mapping of all phys. memory
-ffffc10000000000 - ffffc1ffffffffff (=40 bits) hole
-ffffc20000000000 - ffffe1ffffffffff (=45 bits) vmalloc/ioremap space
-ffffe20000000000 - ffffe2ffffffffff (=40 bits) virtual memory map (1TB)
+ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
+ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
+ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
+ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
+ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
... unused hole ...
ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping, from phys 0
ffffffffa0000000 - fffffffffff00000 (=1536 MB) module mapping space