aboutsummaryrefslogtreecommitdiff
path: root/include/asm-x86_64
AgeCommit message (Collapse)Author
2006-09-26[PATCH] Replace mp bus array with bitmap for bus not pciAndi Kleen
Since we only support PCI and ISA legacy busses now there is no need to have an full array with checking. Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Move early chipset quirks out to new fileAndi Kleen
They did not really belong into io_apic.c. Move them into a new file and clean it up a bit. Also remove outdated ATI quirk that was obsolete, Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Remove obsolete PIC modeAndi Kleen
PIC mode is an outdated way to drive the APICs that was used on some early MP boards. It is not supported in the ACPI model. It is unlikely to be ever configured by any x86-64 system Remove it thus. Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Clean up and minor fixes to TLB flushAndi Kleen
- Convert CR* accesses to dedicated inline functions and rewrite the rest as C inlines - Don't do a double flush for global flushes (pointed out by Zach Amsden) This was a bug workaround for old CPUs that don't do 64bit and is obsolete. - Add a proper memory clobber to invlpg - Remove an unused extern Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Remove all ifdefs for local/io apicAndi Kleen
IO-APIC or local APIC can only be disabled at runtime anyways and Kconfig has forced these options on for a long time now. The Kconfigs are kept only now for the benefit of the shared acpi boot.c code. Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Add proper alignment to ENTRYAndi Kleen
Previously it didn't align. Use the same one as the C compiler in blended mode, which is good for K8 and Core2 and doesn't hurt on P4. Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Clean up read write lock assemblyAndi Kleen
- Move the slow path fallbacks to their own assembly files This makes them much easier to read and is needed for the next change. - Add CFI annotations for unwinding (XXX need review) - Remove constant case which can never happen with out of line spinlocks - Use patchable LOCK prefixes - Don't use lock sections anymore for inline code because they can't be expressed by the unwinder (this adds one taken jump to the lock fast path) Cc: jbeulich@novell.com Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Support patchable lock prefix for pure assembly filesAndi Kleen
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] x86-64 TIF flags for debug regs and io bitmap in ctxswStephane Eranian
Hello, Following my discussion with Andi. Here is a patch that introduces two new TIF flags to simplify the context switch code in __switch_to(). The idea is to minimize the number of cache lines accessed in the common case, i.e., when neither the debug registers nor the I/O bitmap are used. This patch covers the x86-64 modifications. A patch for i386 follows. Changelog: - add TIF_DEBUG to track when debug registers are active - add TIF_IO_BITMAP to track when I/O bitmap is used - modify __switch_to() to use the new TIF flags <signed-off-by>: eranian@hpl.hp.com Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Clean up asm/smp.h includesAndi Kleen
No need to include it from entry.S Drop all the #ifdef __ASSEMBLY__ Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Add the vgetcpu vsyscallVojtech Pavlik
This patch adds a vgetcpu vsyscall, which depending on the CPU RDTSCP capability uses either the RDTSCP or CPUID to obtain a CPU and node numbers and pass them to the program. AK: Lots of changes over Vojtech's original code: Better prototype for vgetcpu() It's better to pass the cpu / node numbers as separate arguments to avoid mistakes when going from SMP to NUMA. Also add a fast time stamp based cache using a user supplied argument to speed things more up. Use fast method from Chuck Ebbert to retrieve node/cpu from GDT limit instead of CPUID Made sure RDTSCP init is always executed after node is known. Drop printk Signed-off-by: Vojtech Pavlik <vojtech@suse.cz> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Add initalization of the RDTSCP auxilliary valuesVojtech Pavlik
This patch adds initalization of the RDTSCP auxilliary values to CPU numbers to time.c. If RDTSCP is available, the MSRs are written with the respective values. It can be later used to initalize per-cpu timekeeping variables. AK: Some cleanups. Move externs into headers and fix CPU hotplug. Signed-off-by: Vojtech Pavlik <vojtech@suse.cz> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Add macros for rdtscpVojtech Pavlik
This patch adds macros for reading tsc via the RDTSCP instruction, as well as writing the auxilliary MSR read by RDTSCP to msr.h [AK: changed rdtscp definition for old binutils] Signed-off-by: Vojtech Pavlik <vojtech@suse.cz> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] x86: i386/x86-64 Add nmi watchdog support for new Intel CPUsVenkatesh Pallipadi
AK: This redoes the changes I temporarily reverted. Intel now has support for Architectural Performance Monitoring Counters ( Refer to IA-32 Intel Architecture Software Developer's Manual http://www.intel.com/design/pentium4/manuals/253669.htm ). This feature is present starting from Intel Core Duo and Intel Core Solo processors. What this means is, the performance monitoring counters and some performance monitoring events are now defined in an architectural way (using cpuid). And there will be no need to check for family/model etc for these architectural events. Below is the patch to use this performance counters in nmi watchdog driver. Patch handles both i386 and x86-64 kernels. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Fix up panic messages for different NMI panicsAndi Kleen
When a unknown NMI happened the panic would claim a NMI watchdog timeout. Also it would check the variable set by nmi_watchdog=panic and panic then. Fix up the panic message to be generic Unconditionally panic on unknown NMI when panic on unknown nmi is enabled. Noticed by Jan Beulich Cc: jbeulich@novell.com Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] i386/x86-64: Fix NMI watchdog suspend/resumeShaohua Li
Making NMI suspend/resume work with SMP. We use CPU hotplug to offline APs in SMP suspend/resume. Only BSP executes sysdev's .suspend/.resume method. APs should follow CPU hotplug code path. And: +From: Don Zickus <dzickus@redhat.com> Makes the start/stop paths of nmi watchdog more robust to handle the suspend/resume cases more gracefully. AK: I merged the two patches together Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Don Zickus <dzickus@redhat.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-26[PATCH] x86: Add abilty to enable/disable nmi watchdog with sysctlDon Zickus
Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi watchdog. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] i386/x86-64: Remove un/set_nmi_callback and ↵Don Zickus
reserve/release_lapic_nmi functions Removes the un/set_nmi_callback and reserve/release_lapic_nmi functions as they are no longer needed. The various subsystems are modified to register with the die_notifier instead. Also includes compile fixes by Andrew Morton. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Add ppoll/pselect syscallsAndi Kleen
Needed TIF_RESTORE_SIGMASK first Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Add TIF_RESTORE_SIGMASKAndi Kleen
We need TIF_RESTORE_SIGMASK in order to support ppoll() and pselect() system calls. This patch originally came from Andi, and was based heavily on David Howells' implementation of same on i386. I fixed a typo which was causing do_signal() to use the wrong signal mask. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] x86: Cleanup NMI interrupt pathDon Zickus
This patch cleans up the NMI interrupt path. Instead of being gated by if the 'nmi callback' is set, the interrupt handler now calls everyone who is registered on the die_chain and additionally checks the nmi watchdog, reseting it if enabled. This allows more subsystems to hook into the NMI if they need to (without being block by set_nmi_callback). Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Add SMP support on x86_64 to reservation frameworkDon Zickus
This patch includes the changes to make the nmi watchdog on x86_64 SMP aware. A bunch of code was moved around to make it simpler to read. In addition, it is now possible to determine if a particular NMI was the result of the watchdog or not. This feature allows the kernel to filter out unknown NMIs easier. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] x86: Add performance counter reservation framework for UP kernelsDon Zickus
Adds basic infrastructure to allow subsystems to reserve performance counters on the x86 chips. Only UP kernels are supported in this patch to make reviewing easier. The SMP portion makes a lot more changes. Think of this as a locking mechanism where each bit represents a different counter. In addition, each subsystem should also reserve an appropriate event selection register that will correspond to the performance counter it will be using (this is mainly neccessary for the Pentium 4 chips as they break the 1:1 relationship to performance counters). This will help prevent subsystems like oprofile from interfering with the nmi watchdog. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] x86: Temporarily revert parts of the Core 2 nmi nmi watchdog supportAndi Kleen
This makes merging easier. They are readded a few patches later. Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-25[libata] No need for all those arch libata-portmap.h headersJeff Garzik
They all contain the same thing. Instead, have a single generic one in include/asm-generic, and permit an arch to override as needed. Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-09-24Merge branch 'master' into upstreamJeff Garzik
2006-09-19[HEADERS] One line per header in Kbuild files to reduce conflictsDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-09-19Merge branch 'master' into upstreamJeff Garzik
2006-09-16[PATCH] Fix 'make headers_check' on x86_64David Woodhouse
On Tue, 2006-09-12 at 17:44 +0100, David Woodhouse wrote: > asm-x86_64/elf.h requires asm/processor.h, which does not exist > asm-x86_64/signal.h requires linux/linkage.h, which does not exist > asm-x86_64/unistd.h requires linux/linkage.h, which does not exist > asm-x86_64/vsyscall.h requires linux/seqlock.h, which does not exist Again, move stuff which shouldn't be visible inside (mostly already existing) #ifdef __KERNEL__. This fixes a bunch of mislabelled and unlabelled #endifs in unistd.h and also cleans that up to conform with what's visible on other architectures, since the minimal fix for the error reported about would have involved a more intrusive patch, renesting other ifdefs. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-04Merge branch 'master' into upstreamJeff Garzik
2006-08-30[PATCH] x86_64: Save original IST values for checking stack addressesKeith Owens
The values in init_tss.ist[] can change when an IST event occurs. Save the original IST values for checking stack addresses when debugging or doing stack traces. Signed-off-by: Keith Owens <kaos@ocs.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-30[PATCH] x86_64: Remove __KERNEL__ ifdef around _syscall*()Andi Kleen
After all their only point is having them in user space. On x86-64 they don't even work in kernel space. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-30[PATCH] x86_64: Remove alternative_smpAndi Kleen
The .fill causes miscompilations with some binutils version. Instead just patch the lock prefix in the lock constructs. That is the majority of the cost and should be good enough. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-30[PATCH] x86: Make backtracer fallback logic more bullet-proofJan Beulich
The unwinder fallback logic still had potential for falling through to the legacy stack trace code without printing an indication (at once serving as a separator) of this. Further, the stack pointer retrieval for the fallback should be as restrictive as possible (in order to avoid having the legacy stack tracer try to access invalid memory). The patch tightens that, but this could certainly be further improved. Also making the call_trace command line option now conditional upon CONFIG_STACK_UNWIND (as it's meaningless otherwise). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-30[PATCH] x86: fix x86 cpuid keys used in alternative_smp()Jan Beulich
By hard-coding the cpuid keys for alternative_smp() rather than using the symbolic constant it turned out that incorrect values were used on both i386 (0x68 instead of 0x69) and x86-64 (0x66 instead of 0x68). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-10[PATCH] libata: rework legacy handling to remove much of the cruftAlan Cox
Kill host_set->next Fix simplex support Allow per platform setting of IDE legacy bases Some of this can be tidied further later on, in particular all the legacy port gunge belongs as a PCI quirk/PCI header decode to understand the special legacy IDE rules in the PCI spec. Longer term Jeff also wants to move the request_irq/free_irq out of core which will make this even cleaner. tj: folded in three followup patches - ata_piix-fix, broken-arch-fix and fix-new-legacy-handling, and separated per-dev xfermask into separate patch preceding this one. Folded in fixes are... * ata_piix-fix: fix build failure due to host_set->next removal * broken-arch-fix: add missing include/asm-*/libata-portmap.h * fix-new-legacy-handling: * In ata_pci_init_legacy_port(), probe_num was incorrectly incremented during initialization of the secondary port and probe_ent->n_ports was incorrectly fixed to 1. * Both legacy ports ended up having the same hard_port_no. * When printing port information, both legacy ports printed the first irq. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Tejun Heo <htejun@gmail.com>
2006-07-31[PATCH] IA64: kprobe invalidate icache of jump bufferbibo, mao
Kprobe inserts breakpoint instruction in probepoint and then jumps to instruction slot when breakpoint is hit, the instruction slot icache must be consistent with dcache. Here is the patch which invalidates instruction slot icache area. Without this patch, in some machines there will be fault when executing instruction slot where icache content is inconsistent with dcache. Signed-off-by: bibo,mao <bibo.mao@intel.com> Acked-by: "Luck, Tony" <tony.luck@intel.com> Acked-by: Keshavamurthy Anil S <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-29[PATCH] x86_64: Fix swiotlb=forceAndi Kleen
It was broken before. But having it is important as possible hardware bug workaround. And previously there was no way to force swiotlb if there is another IOMMU. Side effect is that iommu=force won't force swiotlb anymore even if there isn't another IOMMU. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-29[PATCH] x86_64: Calgary IOMMU - Multi-Node NULL pointer dereference fixJon Mason
Calgary hits a NULL pointer dereference when booting in a multi-chassis NUMA system. See Redhat bugzilla number 198498, found by Konrad Rzeszutek (konradr@redhat.com). There are many issues that had to be resolved to fix this problem. Firstly when I originally wrote the code to handle NUMA systems, I had a large misunderstanding that was not corrected until now. That was that I thought the "number of nodes online" referred to number of physical systems connected. So that if NUMA was disabled, there would only be 1 node and it would only show that node's PCI bus. In reality if NUMA is disabled, the system displays all of the connected chassis as one node but is only ignorant of the delays in accessing main memory. Therefore, references to num_online_nodes() and MAX_NUMNODES are incorrect and need to be set to the maximum number of nodes that can be accessed (which are 8). I created a variable, MAX_NUM_CHASSIS, and set it to 8 to fix this. Secondly, when walking the PCI in detect_calgary, the code only checked the first "slot" when looking to see if a device is present. This will work for most cases, but unfortunately it isn't always the case. In the NUMA MXE drawers, there are USB devices present on the 3rd slot (with slot 1 being empty). So, to work around this, all slots (up to 8) are scanned to see if there are any devices present. Lastly, the bus is being enumerated on large systems in a different way the we originally thought. This throws the ugly logic we had out the window. To more elegantly handle this, I reorganized the kva array to be sparse (which removed the need to have any bus number to kva slot logic in tce.c) and created a secondary space array to contain the bus number to phb mapping. With these changes Calgary boots on an x460 with 4 nodes with and without NUMA enabled. Signed-off-by: Jon Mason <jdmason@us.ibm.com> Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-28[PATCH] x86_64: Enlarge debug stack for nested kprobesbibo mao
In x86_64 platform, INT1 and INT3 trap stack is IST stack called DEBUG_STACK, when INT1/INT3 trap happens, system will switch to DEBUG_STACK by hardware. Current DEBUG_STACK size is 4K, when int1/int3 trap happens, kernel will minus current DEBUG_STACK IST value by 4k. But if int3/int1 trap is nested, it will destroy other vector's IST stack. This patch modifies this, it sets DEBUG_STACK size as 8K and allows two level of nested int1/int3 trap. Kprobe DEBUG_STACK may be nested, because kprobe handler may be probed by other kprobes. Thanks jbeulich for pointing out error in the first patch. [AK: nested kprobes are pretty dubious. Hopefully one nest will be enough. This will cost 8K per CPU (4K more than before)] Signed-off-by: bibo, mao <bibo.mao@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14[PATCH] remove set_wmb - arch removalSteven Rostedt
set_wmb should not be used in the kernel because it just confuses the code more and has no benefit. Since it is not currently used in the kernel this patch removes it so that new code does not include it. All archs define set_wmb(var, value) to do { var = value; wmb(); } while(0) except ia64 and sparc which use a mb() instead. But this is still moot since it is not used anyway. Hasn't been tested on any archs but x86 and x86_64 (and only compiled tested) Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10[PATCH] x86_64: Fix Calgary copyright statements per IBM guidelinesMuli Ben-Yehuda
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Jon Mason <jdmason@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-04Merge git://git.infradead.org/hdrinstall-2.6Linus Torvalds
* git://git.infradead.org/hdrinstall-2.6: Remove export of include/linux/isdn/tpam.h Remove <linux/i2c-id.h> and <linux/i2c-algo-ite.h> from userspace export Restrict headers exported to userspace for SPARC and SPARC64 Add empty Kbuild files for 'make headers_install' in remaining arches. Add Kbuild file for Alpha 'make headers_install' Add Kbuild file for SPARC 'make headers_install' Add Kbuild file for IA64 'make headers_install' Add Kbuild file for S390 'make headers_install' Add Kbuild file for i386 'make headers_install' Add Kbuild file for x86_64 'make headers_install' Add Kbuild file for PowerPC 'make headers_install' Add generic Kbuild files for 'make headers_install' Basic implementation of 'make headers_check' Basic implementation of 'make headers_install'
2006-07-03[PATCH] lockdep: irqtrace cleanup of include/asm-x86_64/irqflags.hIngo Molnar
Clean up the x86-64 irqflags.h file: - macro => inline function transformation - simplifications - style fixes Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03[PATCH] lockdep: irqtrace subsystem, x86_64 supportIngo Molnar
Add irqflags-tracing support to x86_64. [akpm@osdl.org: build fix] Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03[PATCH] lockdep: beautify x86_64 stacktracesIngo Molnar
Beautify x86_64 stacktraces to be more readable. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03[PATCH] lockdep: add per_cpu_offset()Ingo Molnar
Add the per_cpu_offset() generic method. (used by the lock validator) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-02[PATCH] irq-flags: x86_64: Use the new IRQF_ constantsThomas Gleixner
Use the new IRQF_ constants and remove the SA_INTERRUPT define Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01[PATCH] SMP alternatives: skip with UP kernelsGerd Hoffmann
Hide the magic in alternative.h and provide some dummy inline functions for the UP case (gcc should manage to optimize away these calls). No changes in module.c. Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[AF_UNIX]: Datagram getpeersecCatherine Zhang
This patch implements an API whereby an application can determine the label of its peer's Unix datagram sockets via the auxiliary data mechanism of recvmsg. Patch purpose: This patch enables a security-aware application to retrieve the security context of the peer of a Unix datagram socket. The application can then use this security context to determine the security context for processing on behalf of the peer who sent the packet. Patch design and implementation: The design and implementation is very similar to the UDP case for INET sockets. Basically we build upon the existing Unix domain socket API for retrieving user credentials. Linux offers the API for obtaining user credentials via ancillary messages (i.e., out of band/control messages that are bundled together with a normal message). To retrieve the security context, the application first indicates to the kernel such desire by setting the SO_PASSSEC option via getsockopt. Then the application retrieves the security context using the auxiliary data mechanism. An example server application for Unix datagram socket should look like this: toggle = 1; toggle_len = sizeof(toggle); setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, &toggle_len); recvmsg(sockfd, &msg_hdr, 0); if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) { cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr); if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) && cmsg_hdr->cmsg_level == SOL_SOCKET && cmsg_hdr->cmsg_type == SCM_SECURITY) { memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext)); } } sock_setsockopt is enhanced with a new socket option SOCK_PASSSEC to allow a server socket to receive security context of the peer. Testing: We have tested the patch by setting up Unix datagram client and server applications. We verified that the server can retrieve the security context using the auxiliary data mechanism of recvmsg. Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com> Acked-by: Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>