diff options
author | Jeff Garzik <jgarzik@pobox.com> | 2005-08-29 16:40:27 -0400 |
---|---|---|
committer | Jeff Garzik <jgarzik@pobox.com> | 2005-08-29 16:40:27 -0400 |
commit | c1b054d03f5b31c33eaa0b267c629b118eaf3790 (patch) | |
tree | 9333907ca767be24fcb3667877242976c3e3c8dd /Documentation | |
parent | 559fb51ba7e66fe298b8355fabde1275b7def35f (diff) | |
parent | bf4e70e54cf31dcca48d279c7f7e71328eebe749 (diff) |
Merge /spare/repo/linux-2.6/
Diffstat (limited to 'Documentation')
109 files changed, 4589 insertions, 2149 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX index 8de8a01a247..f28a24e0279 100644 --- a/Documentation/00-INDEX +++ b/Documentation/00-INDEX @@ -138,6 +138,8 @@ java.txt - info on the in-kernel binary support for Java(tm). kbuild/ - directory with info about the kernel build process. +kdumpt.txt + - mini HowTo on getting the crash dump code to work. kernel-doc-nano-HOWTO.txt - mini HowTo on generation and location of kernel documentation files. kernel-docs.txt diff --git a/Documentation/Changes b/Documentation/Changes index 57542bc25ed..5eaab0441d7 100644 --- a/Documentation/Changes +++ b/Documentation/Changes @@ -44,9 +44,9 @@ running, the suggested command should tell you. Again, keep in mind that this list assumes you are already functionally running a Linux 2.4 kernel. Also, not all tools are -necessary on all systems; obviously, if you don't have any PCMCIA (PC -Card) hardware, for example, you probably needn't concern yourself -with pcmcia-cs. +necessary on all systems; obviously, if you don't have any ISDN +hardware, for example, you probably needn't concern yourself with +isdn4k-utils. o Gnu C 2.95.3 # gcc --version o Gnu make 3.79.1 # make --version @@ -57,13 +57,15 @@ o e2fsprogs 1.29 # tune2fs o jfsutils 1.1.3 # fsck.jfs -V o reiserfsprogs 3.6.3 # reiserfsck -V 2>&1|grep reiserfsprogs o xfsprogs 2.6.0 # xfs_db -V +o pcmciautils 004 o pcmcia-cs 3.1.21 # cardmgr -V o quota-tools 3.09 # quota -V o PPP 2.4.0 # pppd --version o isdn4k-utils 3.1pre1 # isdnctrl 2>&1|grep version o nfs-utils 1.0.5 # showmount --version o procps 3.2.0 # ps --version -o oprofile 0.5.3 # oprofiled --version +o oprofile 0.9 # oprofiled --version +o udev 058 # udevinfo -V Kernel compilation ================== @@ -186,13 +188,20 @@ architecture independent and any version from 2.0.0 onward should work correctly with this version of the XFS kernel code (2.6.0 or later is recommended, due to some significant improvements). +PCMCIAutils +----------- + +PCMCIAutils replaces pcmcia-cs (see below). It properly sets up +PCMCIA sockets at system startup and loads the appropriate modules +for 16-bit PCMCIA devices if the kernel is modularized and the hotplug +subsystem is used. Pcmcia-cs --------- PCMCIA (PC Card) support is now partially implemented in the main -kernel source. Pay attention when you recompile your kernel ;-). -Also, be sure to upgrade to the latest pcmcia-cs release. +kernel source. The "pcmciautils" package (see above) replaces pcmcia-cs +for newest kernels. Quota-tools ----------- @@ -349,9 +358,13 @@ Xfsprogs -------- o <ftp://oss.sgi.com/projects/xfs/download/> +Pcmciautils +----------- +o <ftp://ftp.kernel.org/pub/linux/utils/kernel/pcmcia/> + Pcmcia-cs --------- -o <ftp://pcmcia-cs.sourceforge.net/pub/pcmcia-cs/pcmcia-cs-3.1.21.tar.gz> +o <http://pcmcia-cs.sourceforge.net/> Quota-tools ---------- diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile index 87da3478fad..fa3e29ad8a4 100644 --- a/Documentation/DocBook/Makefile +++ b/Documentation/DocBook/Makefile @@ -49,7 +49,7 @@ installmandocs: mandocs KERNELDOC = scripts/kernel-doc DOCPROC = scripts/basic/docproc -XMLTOFLAGS = -m Documentation/DocBook/stylesheet.xsl +XMLTOFLAGS = -m $(srctree)/Documentation/DocBook/stylesheet.xsl #XMLTOFLAGS += --skip-validation ### diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl index bb6a0106be1..d650ce36485 100644 --- a/Documentation/DocBook/kernel-api.tmpl +++ b/Documentation/DocBook/kernel-api.tmpl @@ -266,7 +266,7 @@ X!Ekernel/module.c <chapter id="hardware"> <title>Hardware Interfaces</title> <sect1><title>Interrupt Handling</title> -!Iarch/i386/kernel/irq.c +!Ikernel/irq/manage.c </sect1> <sect1><title>Resources Management</title> diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl index 6df1dfd18b6..375ae760dc1 100644 --- a/Documentation/DocBook/libata.tmpl +++ b/Documentation/DocBook/libata.tmpl @@ -84,6 +84,14 @@ void (*port_disable) (struct ata_port *); Called from ata_bus_probe() and ata_bus_reset() error paths, as well as when unregistering from the SCSI module (rmmod, hot unplug). + This function should do whatever needs to be done to take the + port out of use. In most cases, ata_port_disable() can be used + as this hook. + </para> + <para> + Called from ata_bus_probe() on a failed probe. + Called from ata_bus_reset() on a failed bus reset. + Called from ata_scsi_release(). </para> </sect2> @@ -98,6 +106,13 @@ void (*dev_config) (struct ata_port *, struct ata_device *); found. Typically used to apply device-specific fixups prior to issue of SET FEATURES - XFER MODE, and prior to operation. </para> + <para> + Called by ata_device_add() after ata_dev_identify() determines + a device is present. + </para> + <para> + This entry may be specified as NULL in ata_port_operations. + </para> </sect2> @@ -135,6 +150,8 @@ void (*tf_read) (struct ata_port *ap, struct ata_taskfile *tf); registers / DMA buffers. ->tf_read() is called to read the hardware registers / DMA buffers, to obtain the current set of taskfile register values. + Most drivers for taskfile-based hardware (PIO or MMIO) use + ata_tf_load() and ata_tf_read() for these hooks. </para> </sect2> @@ -147,6 +164,8 @@ void (*exec_command)(struct ata_port *ap, struct ata_taskfile *tf); <para> causes an ATA command, previously loaded with ->tf_load(), to be initiated in hardware. + Most drivers for taskfile-based hardware use ata_exec_command() + for this hook. </para> </sect2> @@ -161,6 +180,10 @@ Allow low-level driver to filter ATA PACKET commands, returning a status indicating whether or not it is OK to use DMA for the supplied PACKET command. </para> + <para> + This hook may be specified as NULL, in which case libata will + assume that atapi dma can be supported. + </para> </sect2> @@ -175,6 +198,14 @@ u8 (*check_err)(struct ata_port *ap); Reads the Status/AltStatus/Error ATA shadow register from hardware. On some hardware, reading the Status register has the side effect of clearing the interrupt condition. + Most drivers for taskfile-based hardware use + ata_check_status() for this hook. + </para> + <para> + Note that because this is called from ata_device_add(), at + least a dummy function that clears device interrupts must be + provided for all drivers, even if the controller doesn't + actually have a taskfile status register. </para> </sect2> @@ -188,7 +219,13 @@ void (*dev_select)(struct ata_port *ap, unsigned int device); Issues the low-level hardware command(s) that causes one of N hardware devices to be considered 'selected' (active and available for use) on the ATA bus. This generally has no -meaning on FIS-based devices. + meaning on FIS-based devices. + </para> + <para> + Most drivers for taskfile-based hardware use + ata_std_dev_select() for this hook. Controllers which do not + support second drives on a port (such as SATA contollers) will + use ata_noop_dev_select(). </para> </sect2> @@ -204,6 +241,8 @@ void (*phy_reset) (struct ata_port *ap); for device presence (PATA and SATA), typically a soft reset (SRST) will be performed. Drivers typically use the helper functions ata_bus_reset() or sata_phy_reset() for this hook. + Many SATA drivers use sata_phy_reset() or call it from within + their own phy_reset() functions. </para> </sect2> @@ -227,6 +266,25 @@ PCI IDE DMA Status register. These hooks are typically either no-ops, or simply not implemented, in FIS-based drivers. </para> + <para> +Most legacy IDE drivers use ata_bmdma_setup() for the bmdma_setup() +hook. ata_bmdma_setup() will write the pointer to the PRD table to +the IDE PRD Table Address register, enable DMA in the DMA Command +register, and call exec_command() to begin the transfer. + </para> + <para> +Most legacy IDE drivers use ata_bmdma_start() for the bmdma_start() +hook. ata_bmdma_start() will write the ATA_DMA_START flag to the DMA +Command register. + </para> + <para> +Many legacy IDE drivers use ata_bmdma_stop() for the bmdma_stop() +hook. ata_bmdma_stop() clears the ATA_DMA_START flag in the DMA +command register. + </para> + <para> +Many legacy IDE drivers use ata_bmdma_status() as the bmdma_status() hook. + </para> </sect2> @@ -250,6 +308,10 @@ int (*qc_issue) (struct ata_queued_cmd *qc); helper function ata_qc_issue_prot() for taskfile protocol-based dispatch. More advanced drivers implement their own ->qc_issue. </para> + <para> + ata_qc_issue_prot() calls ->tf_load(), ->bmdma_setup(), and + ->bmdma_start() as necessary to initiate a transfer. + </para> </sect2> @@ -279,6 +341,21 @@ void (*irq_clear) (struct ata_port *); before the interrupt handler is registered, to be sure hardware is quiet. </para> + <para> + The second argument, dev_instance, should be cast to a pointer + to struct ata_host_set. + </para> + <para> + Most legacy IDE drivers use ata_interrupt() for the + irq_handler hook, which scans all ports in the host_set, + determines which queued command was active (if any), and calls + ata_host_intr(ap,qc). + </para> + <para> + Most legacy IDE drivers use ata_bmdma_irq_clear() for the + irq_clear() hook, which simply clears the interrupt and error + flags in the DMA status register. + </para> </sect2> @@ -292,6 +369,7 @@ void (*scr_write) (struct ata_port *ap, unsigned int sc_reg, <para> Read and write standard SATA phy registers. Currently only used if ->phy_reset hook called the sata_phy_reset() helper function. + sc_reg is one of SCR_STATUS, SCR_CONTROL, SCR_ERROR, or SCR_ACTIVE. </para> </sect2> @@ -307,17 +385,29 @@ void (*host_stop) (struct ata_host_set *host_set); ->port_start() is called just after the data structures for each port are initialized. Typically this is used to alloc per-port DMA buffers / tables / rings, enable DMA engines, and similar - tasks. + tasks. Some drivers also use this entry point as a chance to + allocate driver-private memory for ap->private_data. + </para> + <para> + Many drivers use ata_port_start() as this hook or call + it from their own port_start() hooks. ata_port_start() + allocates space for a legacy IDE PRD table and returns. </para> <para> ->port_stop() is called after ->host_stop(). It's sole function is to release DMA/memory resources, now that they are no longer - actively being used. + actively being used. Many drivers also free driver-private + data from port at this time. + </para> + <para> + Many drivers use ata_port_stop() as this hook, which frees the + PRD table. </para> <para> ->host_stop() is called after all ->port_stop() calls have completed. The hook must finalize hardware shutdown, release DMA and other resources, etc. + This hook may be specified as NULL, in which case it is not called. </para> </sect2> diff --git a/Documentation/DocBook/stylesheet.xsl b/Documentation/DocBook/stylesheet.xsl index e14c21dda40..64be9f7ee3b 100644 --- a/Documentation/DocBook/stylesheet.xsl +++ b/Documentation/DocBook/stylesheet.xsl @@ -2,4 +2,5 @@ <stylesheet xmlns="http://www.w3.org/1999/XSL/Transform" version="1.0"> <param name="chunk.quietly">1</param> <param name="funcsynopsis.style">ansi</param> +<param name="funcsynopsis.tabular.threshold">80</param> </stylesheet> diff --git a/Documentation/IPMI.txt b/Documentation/IPMI.txt index 90d10e708ca..84d3d4d10c1 100644 --- a/Documentation/IPMI.txt +++ b/Documentation/IPMI.txt @@ -25,9 +25,10 @@ subject and I can't cover it all here! Configuration ------------- -The LinuxIPMI driver is modular, which means you have to pick several +The Linux IPMI driver is modular, which means you have to pick several things to have it work right depending on your hardware. Most of -these are available in the 'Character Devices' menu. +these are available in the 'Character Devices' menu then the IPMI +menu. No matter what, you must pick 'IPMI top-level message handler' to use IPMI. What you do beyond that depends on your needs and hardware. @@ -35,33 +36,30 @@ IPMI. What you do beyond that depends on your needs and hardware. The message handler does not provide any user-level interfaces. Kernel code (like the watchdog) can still use it. If you need access from userland, you need to select 'Device interface for IPMI' if you -want access through a device driver. Another interface is also -available, you may select 'IPMI sockets' in the 'Networking Support' -main menu. This provides a socket interface to IPMI. You may select -both of these at the same time, they will both work together. - -The driver interface depends on your hardware. If you have a board -with a standard interface (These will generally be either "KCS", -"SMIC", or "BT", consult your hardware manual), choose the 'IPMI SI -handler' option. A driver also exists for direct I2C access to the -IPMI management controller. Some boards support this, but it is -unknown if it will work on every board. For this, choose 'IPMI SMBus -handler', but be ready to try to do some figuring to see if it will -work. - -There is also a KCS-only driver interface supplied, but it is -depracated in favor of the SI interface. +want access through a device driver. + +The driver interface depends on your hardware. If your system +properly provides the SMBIOS info for IPMI, the driver will detect it +and just work. If you have a board with a standard interface (These +will generally be either "KCS", "SMIC", or "BT", consult your hardware +manual), choose the 'IPMI SI handler' option. A driver also exists +for direct I2C access to the IPMI management controller. Some boards +support this, but it is unknown if it will work on every board. For +this, choose 'IPMI SMBus handler', but be ready to try to do some +figuring to see if it will work on your system if the SMBIOS/APCI +information is wrong or not present. It is fairly safe to have both +these enabled and let the drivers auto-detect what is present. You should generally enable ACPI on your system, as systems with IPMI -should have ACPI tables describing them. +can have ACPI tables describing them. If you have a standard interface and the board manufacturer has done their job correctly, the IPMI controller should be automatically -detect (via ACPI or SMBIOS tables) and should just work. Sadly, many -boards do not have this information. The driver attempts standard -defaults, but they may not work. If you fall into this situation, you -need to read the section below named 'The SI Driver' on how to -hand-configure your system. +detected (via ACPI or SMBIOS tables) and should just work. Sadly, +many boards do not have this information. The driver attempts +standard defaults, but they may not work. If you fall into this +situation, you need to read the section below named 'The SI Driver' or +"The SMBus Driver" on how to hand-configure your system. IPMI defines a standard watchdog timer. You can enable this with the 'IPMI Watchdog Timer' config option. If you compile the driver into @@ -73,6 +71,18 @@ closed (by default it is disabled on close). Go into the 'Watchdog Cards' menu, enable 'Watchdog Timer Support', and enable the option 'Disable watchdog shutdown on close'. +IPMI systems can often be powered off using IPMI commands. Select +'IPMI Poweroff' to do this. The driver will auto-detect if the system +can be powered off by IPMI. It is safe to enable this even if your +system doesn't support this option. This works on ATCA systems, the +Radisys CPI1 card, and any IPMI system that supports standard chassis +management commands. + +If you want the driver to put an event into the event log on a panic, +enable the 'Generate a panic event to all BMCs on a panic' option. If +you want the whole panic string put into the event log using OEM +events, enable the 'Generate OEM events containing the panic string' +option. Basic Design ------------ @@ -80,7 +90,7 @@ Basic Design The Linux IPMI driver is designed to be very modular and flexible, you only need to take the pieces you need and you can use it in many different ways. Because of that, it's broken into many chunks of -code. These chunks are: +code. These chunks (by module name) are: ipmi_msghandler - This is the central piece of software for the IPMI system. It handles all messages, message timing, and responses. The @@ -93,18 +103,26 @@ ipmi_devintf - This provides a userland IOCTL interface for the IPMI driver, each open file for this device ties in to the message handler as an IPMI user. -ipmi_si - A driver for various system interfaces. This supports -KCS, SMIC, and may support BT in the future. Unless you have your own -custom interface, you probably need to use this. +ipmi_si - A driver for various system interfaces. This supports KCS, +SMIC, and BT interfaces. Unless you have an SMBus interface or your +own custom interface, you probably need to use this. ipmi_smb - A driver for accessing BMCs on the SMBus. It uses the I2C kernel driver's SMBus interfaces to send and receive IPMI messages over the SMBus. -af_ipmi - A network socket interface to IPMI. This doesn't take up -a character device in your system. +ipmi_watchdog - IPMI requires systems to have a very capable watchdog +timer. This driver implements the standard Linux watchdog timer +interface on top of the IPMI message handler. + +ipmi_poweroff - Some systems support the ability to be turned off via +IPMI commands. -Note that the KCS-only interface ahs been removed. +These are all individually selectable via configuration options. + +Note that the KCS-only interface has been removed. The af_ipmi driver +is no longer supported and has been removed because it was impossible +to do 32 bit emulation on 64-bit kernels with it. Much documentation for the interface is in the include files. The IPMI include files are: @@ -424,7 +442,7 @@ at module load time (for a module) with: modprobe ipmi_smb.o addr=<adapter1>,<i2caddr1>[,<adapter2>,<i2caddr2>[,...]] dbg=<flags1>,<flags2>... - [defaultprobe=0] [dbg_probe=1] + [defaultprobe=1] [dbg_probe=1] The addresses are specified in pairs, the first is the adapter ID and the second is the I2C address on that adapter. @@ -532,3 +550,67 @@ Once you open the watchdog timer, you must write a 'V' character to the device to close it, or the timer will not stop. This is a new semantic for the driver, but makes it consistent with the rest of the watchdog drivers in Linux. + + +Panic Timeouts +-------------- + +The OpenIPMI driver supports the ability to put semi-custom and custom +events in the system event log if a panic occurs. if you enable the +'Generate a panic event to all BMCs on a panic' option, you will get +one event on a panic in a standard IPMI event format. If you enable +the 'Generate OEM events containing the panic string' option, you will +also get a bunch of OEM events holding the panic string. + + +The field settings of the events are: +* Generator ID: 0x21 (kernel) +* EvM Rev: 0x03 (this event is formatting in IPMI 1.0 format) +* Sensor Type: 0x20 (OS critical stop sensor) +* Sensor #: The first byte of the panic string (0 if no panic string) +* Event Dir | Event Type: 0x6f (Assertion, sensor-specific event info) +* Event Data 1: 0xa1 (Runtime stop in OEM bytes 2 and 3) +* Event data 2: second byte of panic string +* Event data 3: third byte of panic string +See the IPMI spec for the details of the event layout. This event is +always sent to the local management controller. It will handle routing +the message to the right place + +Other OEM events have the following format: +Record ID (bytes 0-1): Set by the SEL. +Record type (byte 2): 0xf0 (OEM non-timestamped) +byte 3: The slave address of the card saving the panic +byte 4: A sequence number (starting at zero) +The rest of the bytes (11 bytes) are the panic string. If the panic string +is longer than 11 bytes, multiple messages will be sent with increasing +sequence numbers. + +Because you cannot send OEM events using the standard interface, this +function will attempt to find an SEL and add the events there. It +will first query the capabilities of the local management controller. +If it has an SEL, then they will be stored in the SEL of the local +management controller. If not, and the local management controller is +an event generator, the event receiver from the local management +controller will be queried and the events sent to the SEL on that +device. Otherwise, the events go nowhere since there is nowhere to +send them. + + +Poweroff +-------- + +If the poweroff capability is selected, the IPMI driver will install +a shutdown function into the standard poweroff function pointer. This +is in the ipmi_poweroff module. When the system requests a powerdown, +it will send the proper IPMI commands to do this. This is supported on +several platforms. + +There is a module parameter named "poweroff_control" that may either be zero +(do a power down) or 2 (do a power cycle, power the system off, then power +it on in a few seconds). Setting ipmi_poweroff.poweroff_control=x will do +the same thing on the kernel command line. The parameter is also available +via the proc filesystem in /proc/ipmi/poweroff_control. Note that if the +system does not support power cycling, it will always to the power off. + +Note that if you have ACPI enabled, the system will prefer using ACPI to +power off. diff --git a/Documentation/SubmittingDrivers b/Documentation/SubmittingDrivers index de3b252e717..c3cca924e94 100644 --- a/Documentation/SubmittingDrivers +++ b/Documentation/SubmittingDrivers @@ -13,13 +13,14 @@ Allocating Device Numbers ------------------------- Major and minor numbers for block and character devices are allocated -by the Linux assigned name and number authority (currently better -known as H Peter Anvin). The site is http://www.lanana.org/. This +by the Linux assigned name and number authority (currently this is +Torben Mathiasen). The site is http://www.lanana.org/. This also deals with allocating numbers for devices that are not going to be submitted to the mainstream kernel. +See Documentation/devices.txt for more information on this. -If you don't use assigned numbers then when you device is submitted it will -get given an assigned number even if that is different from values you may +If you don't use assigned numbers then when your device is submitted it will +be given an assigned number even if that is different from values you may have shipped to customers before. Who To Submit Drivers To @@ -32,7 +33,8 @@ Linux 2.2: If the code area has a general maintainer then please submit it to the maintainer listed in MAINTAINERS in the kernel file. If the maintainer does not respond or you cannot find the appropriate - maintainer then please contact Alan Cox <alan@lxorguk.ukuu.org.uk> + maintainer then please contact the 2.2 kernel maintainer: + Marc-Christian Petersen <m.c.p@wolk-project.de>. Linux 2.4: The same rules apply as 2.2. The final contact point for Linux 2.4 @@ -48,7 +50,7 @@ What Criteria Determine Acceptance Licensing: The code must be released to us under the GNU General Public License. We don't insist on any kind - of exclusively GPL licensing, and if you wish the driver + of exclusive GPL licensing, and if you wish the driver to be useful to other communities such as BSD you may well wish to release under multiple licenses. diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index 4d35562b1cf..7f43b040311 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -35,7 +35,7 @@ not in any lower subdirectory. To create a patch for a single file, it is often sufficient to do: - SRCTREE= linux-2.4 + SRCTREE= linux-2.6 MYFILE= drivers/net/mydriver.c cd $SRCTREE @@ -48,17 +48,18 @@ To create a patch for multiple files, you should unpack a "vanilla", or unmodified kernel source tree, and generate a diff against your own source tree. For example: - MYSRC= /devel/linux-2.4 + MYSRC= /devel/linux-2.6 - tar xvfz linux-2.4.0-test11.tar.gz - mv linux linux-vanilla - wget http://www.moses.uklinux.net/patches/dontdiff - diff -uprN -X dontdiff linux-vanilla $MYSRC > /tmp/patch - rm -f dontdiff + tar xvfz linux-2.6.12.tar.gz + mv linux-2.6.12 linux-2.6.12-vanilla + diff -uprN -X linux-2.6.12-vanilla/Documentation/dontdiff \ + linux-2.6.12-vanilla $MYSRC > /tmp/patch "dontdiff" is a list of files which are generated by the kernel during the build process, and should be ignored in any diff(1)-generated -patch. dontdiff is maintained by Tigran Aivazian <tigran@veritas.com> +patch. The "dontdiff" file is included in the kernel tree in +2.6.12 and later. For earlier kernel versions, you can get it +from <http://www.xenotime.net/linux/doc/dontdiff>. Make sure your patch does not include any extra files which do not belong in a patch submission. Make sure to review your patch -after- @@ -66,18 +67,20 @@ generated it with diff(1), to ensure accuracy. If your changes produce a lot of deltas, you may want to look into splitting them into individual patches which modify things in -logical stages, this will facilitate easier reviewing by other +logical stages. This will facilitate easier reviewing by other kernel developers, very important if you want your patch accepted. -There are a number of scripts which can aid in this; +There are a number of scripts which can aid in this: Quilt: http://savannah.nongnu.org/projects/quilt Randy Dunlap's patch scripts: -http://developer.osdl.org/rddunlap/scripts/patching-scripts.tgz +http://www.xenotime.net/linux/scripts/patching-scripts-002.tar.gz Andrew Morton's patch scripts: -http://www.zip.com.au/~akpm/linux/patches/patch-scripts-0.16 +http://www.zip.com.au/~akpm/linux/patches/patch-scripts-0.20 + + 2) Describe your changes. @@ -132,21 +135,6 @@ which require discussion or do not have a clear advantage should usually be sent first to linux-kernel. Only after the patch is discussed should the patch then be submitted to Linus. -For small patches you may want to CC the Trivial Patch Monkey -trivial@rustcorp.com.au set up by Rusty Russell; which collects "trivial" -patches. Trivial patches must qualify for one of the following rules: - Spelling fixes in documentation - Spelling fixes which could break grep(1). - Warning fixes (cluttering with useless warnings is bad) - Compilation fixes (only if they are actually correct) - Runtime fixes (only if they actually fix things) - Removing use of deprecated functions/macros (eg. check_region). - Contact detail and documentation fixes - Non-portable code replaced by portable code (even in arch-specific, - since people copy, as long as it's trivial) - Any fix by the author/maintainer of the file. (ie. patch monkey - in re-transmission mode) - 5) Select your CC (e-mail carbon copy) list. @@ -161,6 +149,11 @@ USB, framebuffer devices, the VFS, the SCSI subsystem, etc. See the MAINTAINERS file for a mailing list that relates specifically to your change. +If changes affect userland-kernel interfaces, please send +the MAN-PAGES maintainer (as listed in the MAINTAINERS file) +a man-pages patch, or at least a notification of the change, +so that some information makes its way into the manual pages. + Even if the maintainer did not respond in step #4, make sure to ALWAYS copy the maintainer when you change their code. @@ -178,6 +171,8 @@ patches. Trivial patches must qualify for one of the following rules: since people copy, as long as it's trivial) Any fix by the author/maintainer of the file. (ie. patch monkey in re-transmission mode) +URL: <http://www.kernel.org/pub/linux/kernel/people/rusty/trivial/> + @@ -299,13 +294,24 @@ can certify the below: then you just add a line saying - Signed-off-by: Random J Developer <random@developer.org> + Signed-off-by: Random J Developer <random@developer.example.org> Some people also put extra tags at the end. They'll just be ignored for now, but you can do this to mark internal company procedures or just point out some special detail about the sign-off. + +12) More references for submitting patches + +Andrew Morton, "The perfect patch" (tpp). + <http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt> + +Jeff Garzik, "Linux kernel patch submission format." + <http://linux.yyz.us/patch-format.html> + + + ----------------------------------- SECTION 2 - HINTS, TIPS, AND TRICKS ----------------------------------- @@ -374,7 +380,5 @@ and 'extern __inline__'. 4) Don't over-design. Don't try to anticipate nebulous future cases which may or may not -be useful: "Make it as simple as you can, and no simpler" - - +be useful: "Make it as simple as you can, and no simpler." diff --git a/Documentation/acpi-hotkey.txt b/Documentation/acpi-hotkey.txt new file mode 100644 index 00000000000..0acdc80c30c --- /dev/null +++ b/Documentation/acpi-hotkey.txt @@ -0,0 +1,38 @@ +driver/acpi/hotkey.c implement: +1. /proc/acpi/hotkey/event_config +(event based hotkey or event config interface): +a. add a event based hotkey(event) : +echo "0:bus::action:method:num:num" > event_config + +b. delete a event based hotkey(event): +echo "1:::::num:num" > event_config + +c. modify a event based hotkey(event): +echo "2:bus::action:method:num:num" > event_config + +2. /proc/acpi/hotkey/poll_config +(polling based hotkey or event config interface): +a.add a polling based hotkey(event) : +echo "0:bus:method:action:method:num" > poll_config +this adding command will create a proc file +/proc/acpi/hotkey/method, which is used to get +result of polling. + +b.delete a polling based hotkey(event): +echo "1:::::num" > event_config + +c.modify a polling based hotkey(event): +echo "2:bus:method:action:method:num" > poll_config + +3./proc/acpi/hotkey/action +(interface to call aml method associated with a +specific hotkey(event)) +echo "event_num:event_type:event_argument" > + /proc/acpi/hotkey/action. +The result of the execution of this aml method is +attached to /proc/acpi/hotkey/poll_method, which is dnyamically +created. Please use command "cat /proc/acpi/hotkey/polling_method" +to retrieve it. + +Note: Use cmdline "acpi_generic_hotkey" to over-ride +loading any platform specific drivers. diff --git a/Documentation/arm/Samsung-S3C24XX/USB-Host.txt b/Documentation/arm/Samsung-S3C24XX/USB-Host.txt new file mode 100644 index 00000000000..b93b68e2b14 --- /dev/null +++ b/Documentation/arm/Samsung-S3C24XX/USB-Host.txt @@ -0,0 +1,93 @@ + S3C24XX USB Host support + ======================== + + + +Introduction +------------ + + This document details the S3C2410/S3C2440 in-built OHCI USB host support. + +Configuration +------------- + + Enable at least the following kernel options: + + menuconfig: + + Device Drivers ---> + USB support ---> + <*> Support for Host-side USB + <*> OHCI HCD support + + + .config: + CONFIG_USB + CONFIG_USB_OHCI_HCD + + + Once these options are configured, the standard set of USB device + drivers can be configured and used. + + +Board Support +------------- + + The driver attaches to a platform device, which will need to be + added by the board specific support file in linux/arch/arm/mach-s3c2410, + such as mach-bast.c or mach-smdk2410.c + + The platform device's platform_data field is only needed if the + board implements extra power control or over-current monitoring. + + The OHCI driver does not ensure the state of the S3C2410's MISCCTRL + register, so if both ports are to be used for the host, then it is + the board support file's responsibility to ensure that the second + port is configured to be connected to the OHCI core. + + +Platform Data +------------- + + See linux/include/asm-arm/arch-s3c2410/usb-control.h for the + descriptions of the platform device data. An implementation + can be found in linux/arch/arm/mach-s3c2410/usb-simtec.c . + + The `struct s3c2410_hcd_info` contains a pair of functions + that get called to enable over-current detection, and to + control the port power status. + + The ports are numbered 0 and 1. + + power_control: + + Called to enable or disable the power on the port. + + enable_oc: + + Called to enable or disable the over-current monitoring. + This should claim or release the resources being used to + check the power condition on the port, such as an IRQ. + + report_oc: + + The OHCI driver fills this field in for the over-current code + to call when there is a change to the over-current state on + an port. The ports argument is a bitmask of 1 bit per port, + with bit X being 1 for an over-current on port X. + + The function s3c2410_usb_report_oc() has been provided to + ensure this is called correctly. + + port[x]: + + This is struct describes each port, 0 or 1. The platform driver + should set the flags field of each port to S3C_HCDFLG_USED if + the port is enabled. + + + +Document Author +--------------- + +Ben Dooks, (c) 2005 Simtec Electronics diff --git a/Documentation/basic_profiling.txt b/Documentation/basic_profiling.txt index 65e3dc2d443..8764e9f7082 100644 --- a/Documentation/basic_profiling.txt +++ b/Documentation/basic_profiling.txt @@ -27,9 +27,13 @@ dump output readprofile -m /boot/System.map > captured_profile Oprofile -------- -Get the source (I use 0.8) from http://oprofile.sourceforge.net/ -and add "idle=poll" to the kernel command line + +Get the source (see Changes for required version) from +http://oprofile.sourceforge.net/ and add "idle=poll" to the kernel command +line. + Configure with CONFIG_PROFILING=y and CONFIG_OPROFILE=y & reboot on new kernel + ./configure --with-kernel-support make install @@ -46,7 +50,7 @@ start opcontrol --start stop opcontrol --stop dump output opreport > output_file -To only report on the kernel, run opreport /boot/vmlinux > output_file +To only report on the kernel, run opreport -l /boot/vmlinux > output_file A reset is needed to clear old statistics, which survive a reboot. diff --git a/Documentation/block/ioprio.txt b/Documentation/block/ioprio.txt new file mode 100644 index 00000000000..96ccf681075 --- /dev/null +++ b/Documentation/block/ioprio.txt @@ -0,0 +1,176 @@ +Block io priorities +=================== + + +Intro +----- + +With the introduction of cfq v3 (aka cfq-ts or time sliced cfq), basic io +priorities is supported for reads on files. This enables users to io nice +processes or process groups, similar to what has been possible to cpu +scheduling for ages. This document mainly details the current possibilites +with cfq, other io schedulers do not support io priorities so far. + +Scheduling classes +------------------ + +CFQ implements three generic scheduling classes that determine how io is +served for a process. + +IOPRIO_CLASS_RT: This is the realtime io class. This scheduling class is given +higher priority than any other in the system, processes from this class are +given first access to the disk every time. Thus it needs to be used with some +care, one io RT process can starve the entire system. Within the RT class, +there are 8 levels of class data that determine exactly how much time this +process needs the disk for on each service. In the future this might change +to be more directly mappable to performance, by passing in a wanted data +rate instead. + +IOPRIO_CLASS_BE: This is the best-effort scheduling class, which is the default +for any process that hasn't set a specific io priority. The class data +determines how much io bandwidth the process will get, it's directly mappable +to the cpu nice levels just more coarsely implemented. 0 is the highest +BE prio level, 7 is the lowest. The mapping between cpu nice level and io +nice level is determined as: io_nice = (cpu_nice + 20) / 5. + +IOPRIO_CLASS_IDLE: This is the idle scheduling class, processes running at this +level only get io time when no one else needs the disk. The idle class has no +class data, since it doesn't really apply here. + +Tools +----- + +See below for a sample ionice tool. Usage: + +# ionice -c<class> -n<level> -p<pid> + +If pid isn't given, the current process is assumed. IO priority settings +are inherited on fork, so you can use ionice to start the process at a given +level: + +# ionice -c2 -n0 /bin/ls + +will run ls at the best-effort scheduling class at the highest priority. +For a running process, you can give the pid instead: + +# ionice -c1 -n2 -p100 + +will change pid 100 to run at the realtime scheduling class, at priority 2. + +---> snip ionice.c tool <--- + +#include <stdio.h> +#include <stdlib.h> +#include <errno.h> +#include <getopt.h> +#include <unistd.h> +#include <sys/ptrace.h> +#include <asm/unistd.h> + +extern int sys_ioprio_set(int, int, int); +extern int sys_ioprio_get(int, int); + +#if defined(__i386__) +#define __NR_ioprio_set 289 +#define __NR_ioprio_get 290 +#elif defined(__ppc__) +#define __NR_ioprio_set 273 +#define __NR_ioprio_get 274 +#elif defined(__x86_64__) +#define __NR_ioprio_set 251 +#define __NR_ioprio_get 252 +#elif defined(__ia64__) +#define __NR_ioprio_set 1274 +#define __NR_ioprio_get 1275 +#else +#error "Unsupported arch" +#endif + +_syscall3(int, ioprio_set, int, which, int, who, int, ioprio); +_syscall2(int, ioprio_get, int, which, int, who); + +enum { + IOPRIO_CLASS_NONE, + IOPRIO_CLASS_RT, + IOPRIO_CLASS_BE, + IOPRIO_CLASS_IDLE, +}; + +enum { + IOPRIO_WHO_PROCESS = 1, + IOPRIO_WHO_PGRP, + IOPRIO_WHO_USER, +}; + +#define IOPRIO_CLASS_SHIFT 13 + +const char *to_prio[] = { "none", "realtime", "best-effort", "idle", }; + +int main(int argc, char *argv[]) +{ + int ioprio = 4, set = 0, ioprio_class = IOPRIO_CLASS_BE; + int c, pid = 0; + + while ((c = getopt(argc, argv, "+n:c:p:")) != EOF) { + switch (c) { + case 'n': + ioprio = strtol(optarg, NULL, 10); + set = 1; + break; + case 'c': + ioprio_class = strtol(optarg, NULL, 10); + set = 1; + break; + case 'p': + pid = strtol(optarg, NULL, 10); + break; + } + } + + switch (ioprio_class) { + case IOPRIO_CLASS_NONE: + ioprio_class = IOPRIO_CLASS_BE; + break; + case IOPRIO_CLASS_RT: + case IOPRIO_CLASS_BE: + break; + case IOPRIO_CLASS_IDLE: + ioprio = 7; + break; + default: + printf("bad prio class %d\n", ioprio_class); + return 1; + } + + if (!set) { + if (!pid && argv[optind]) + pid = strtol(argv[optind], NULL, 10); + + ioprio = ioprio_get(IOPRIO_WHO_PROCESS, pid); + + printf("pid=%d, %d\n", pid, ioprio); + + if (ioprio == -1) + perror("ioprio_get"); + else { + ioprio_class = ioprio >> IOPRIO_CLASS_SHIFT; + ioprio = ioprio & 0xff; + printf("%s: prio %d\n", to_prio[ioprio_class], ioprio); + } + } else { + if (ioprio_set(IOPRIO_WHO_PROCESS, pid, ioprio | ioprio_class << IOPRIO_CLASS_SHIFT) == -1) { + perror("ioprio_set"); + return 1; + } + + if (argv[optind]) + execvp(argv[optind], &argv[optind]); + } + + return 0; +} + +---> snip ionice.c tool <--- + + +March 11 2005, Jens Axboe <axboe@suse.de> diff --git a/Documentation/cciss.txt b/Documentation/cciss.txt index d599beb9df8..c8f9a73111d 100644 --- a/Documentation/cciss.txt +++ b/Documentation/cciss.txt @@ -17,6 +17,7 @@ This driver is known to work with the following cards: * SA P600 * SA P800 * SA E400 + * SA E300 If nodes are not already created in the /dev/cciss directory, run as root: diff --git a/Documentation/cdrom/sbpcd b/Documentation/cdrom/sbpcd index d1825dffca3..b3ba63f4ce3 100644 --- a/Documentation/cdrom/sbpcd +++ b/Documentation/cdrom/sbpcd @@ -419,6 +419,7 @@ into the file "track01": */ #include <stdio.h> #include <sys/ioctl.h> +#include <sys/types.h> #include <linux/cdrom.h> static struct cdrom_tochdr hdr; @@ -429,7 +430,7 @@ static int datafile, drive; static int i, j, limit, track, err; static char filename[32]; -main(int argc, char *argv[]) +int main(int argc, char *argv[]) { /* * open /dev/cdrom @@ -516,6 +517,7 @@ entry[track+1].cdte_addr.lba=entry[track].cdte_addr.lba+300; } arg.addr.lba++; } + return 0; } /*===================== end program ========================================*/ @@ -564,15 +566,16 @@ Appendix -- the "cdtester" utility: #include <stdio.h> #include <malloc.h> #include <sys/ioctl.h> +#include <sys/types.h> #include <linux/cdrom.h> #ifdef AZT_PRIVATE_IOCTLS #include <linux/../../drivers/cdrom/aztcd.h> -#endif AZT_PRIVATE_IOCTLS +#endif /* AZT_PRIVATE_IOCTLS */ #ifdef SBP_PRIVATE_IOCTLS #include <linux/../../drivers/cdrom/sbpcd.h> #include <linux/fs.h> -#endif SBP_PRIVATE_IOCTLS +#endif /* SBP_PRIVATE_IOCTLS */ struct cdrom_tochdr hdr; struct cdrom_tochdr tocHdr; @@ -590,7 +593,7 @@ union struct cdrom_msf msf; unsigned char buf[CD_FRAMESIZE_RAW]; } azt; -#endif AZT_PRIVATE_IOCTLS +#endif /* AZT_PRIVATE_IOCTLS */ int i, i1, i2, i3, j, k; unsigned char sequence=0; unsigned char command[80]; @@ -738,7 +741,7 @@ void display(int size,unsigned char *buffer) } } -main(int argc, char *argv[]) +int main(int argc, char *argv[]) { printf("\nTesting tool for a CDROM driver's audio functions V0.1\n"); printf("(C) 1995 Eberhard Moenkeberg <emoenke@gwdg.de>\n"); @@ -1046,12 +1049,13 @@ main(int argc, char *argv[]) rc=ioctl(drive,CDROMAUDIOBUFSIZ,j); printf("%d frames granted.\n",rc); break; -#endif SBP_PRIVATE_IOCTLS +#endif /* SBP_PRIVATE_IOCTLS */ default: printf("unknown command: \"%s\".\n",command); break; } } + return 0; } /*==========================================================================*/ diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt index b85481acd0c..933fae74c33 100644 --- a/Documentation/cpu-freq/governors.txt +++ b/Documentation/cpu-freq/governors.txt @@ -9,6 +9,7 @@ Dominik Brodowski <linux@brodo.de> + some additions and corrections by Nico Golde <nico@ngolde.de> @@ -25,6 +26,7 @@ Contents: 2.1 Performance 2.2 Powersave 2.3 Userspace +2.4 Ondemand 3. The Governor Interface in the CPUfreq Core @@ -86,7 +88,7 @@ highest frequency within the borders of scaling_min_freq and scaling_max_freq. -2.1 Powersave +2.2 Powersave ------------- The CPUfreq governor "powersave" sets the CPU statically to the @@ -94,7 +96,7 @@ lowest frequency within the borders of scaling_min_freq and scaling_max_freq. -2.2 Userspace +2.3 Userspace ------------- The CPUfreq governor "userspace" allows the user, or any userspace @@ -103,6 +105,14 @@ by making a sysfs file "scaling_setspeed" available in the CPU-device directory. +2.4 Ondemand +------------ + +The CPUfreq govenor "ondemand" sets the CPU depending on the +current usage. To do this the CPU must have the capability to +switch the frequency very fast. + + 3. The Governor Interface in the CPUfreq Core ============================================= diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt index 2f8f24eaefd..ad944c06031 100644 --- a/Documentation/cpusets.txt +++ b/Documentation/cpusets.txt @@ -51,6 +51,14 @@ mems_allowed vector. If a cpuset is cpu or mem exclusive, no other cpuset, other than a direct ancestor or descendent, may share any of the same CPUs or Memory Nodes. +A cpuset that is cpu exclusive has a sched domain associated with it. +The sched domain consists of all cpus in the current cpuset that are not +part of any exclusive child cpusets. +This ensures that the scheduler load balacing code only balances +against the cpus that are in the sched domain as defined above and not +all of the cpus in the system. This removes any overhead due to +load balancing code trying to pull tasks outside of the cpu exclusive +cpuset only to be prevented by the tasks' cpus_allowed mask. User level code may create and destroy cpusets by name in the cpuset virtual file system, manage the attributes and permissions of these @@ -84,6 +92,9 @@ This can be especially valuable on: and a database), or * NUMA systems running large HPC applications with demanding performance characteristics. + * Also cpu_exclusive cpusets are useful for servers running orthogonal + workloads such as RT applications requiring low latency and HPC + applications that are throughput sensitive These subsets, or "soft partitions" must be able to be dynamically adjusted, as the job mix changes, without impacting other concurrently @@ -125,6 +136,8 @@ Cpusets extends these two mechanisms as follows: - A cpuset may be marked exclusive, which ensures that no other cpuset (except direct ancestors and descendents) may contain any overlapping CPUs or Memory Nodes. + Also a cpu_exclusive cpuset would be associated with a sched + domain. - You can list all the tasks (by pid) attached to any cpuset. The implementation of cpusets requires a few, simple hooks @@ -136,6 +149,9 @@ into the rest of the kernel, none in performance critical paths: allowed in that tasks cpuset. - in sched.c migrate_all_tasks(), to keep migrating tasks within the CPUs allowed by their cpuset, if possible. + - in sched.c, a new API partition_sched_domains for handling + sched domain changes associated with cpu_exclusive cpusets + and related changes in both sched.c and arch/ia64/kernel/domain.c - in the mbind and set_mempolicy system calls, to mask the requested Memory Nodes by what's allowed in that tasks cpuset. - in page_alloc, to restrict memory to allowed nodes. diff --git a/Documentation/devices.txt b/Documentation/devices.txt index bb67cf25010..0f515175c72 100644 --- a/Documentation/devices.txt +++ b/Documentation/devices.txt @@ -94,6 +94,7 @@ Your cooperation is appreciated. 9 = /dev/urandom Faster, less secure random number gen. 10 = /dev/aio Asyncronous I/O notification interface 11 = /dev/kmsg Writes to this come out as printk's + 12 = /dev/oldmem Access to crash dump from kexec kernel 1 block RAM disk 0 = /dev/ram0 First RAM disk 1 = /dev/ram1 Second RAM disk diff --git a/Documentation/dontdiff b/Documentation/dontdiff index 9a33bb94f74..96bea278bbf 100644 --- a/Documentation/dontdiff +++ b/Documentation/dontdiff @@ -41,6 +41,7 @@ COPYING CREDITS CVS ChangeSet +Image Kerntypes MODS.txt Module.symvers @@ -103,6 +104,8 @@ logo_*.c logo_*_clut224.c logo_*_mono.c lxdialog +mach-types +mach-types.h make_times_h map maui_boot.h @@ -111,6 +114,7 @@ mkdep mktables modpost modversions.h* +offset.h offsets.h oui.c* parse.c* diff --git a/Documentation/dvb/README.dibusb b/Documentation/dvb/README.dibusb deleted file mode 100644 index 7a9e958513f..00000000000 --- a/Documentation/dvb/README.dibusb +++ /dev/null @@ -1,285 +0,0 @@ -Documentation for dib3000* frontend drivers and dibusb device driver -==================================================================== - -Copyright (C) 2004-5 Patrick Boettcher (patrick.boettcher@desy.de), - -dibusb and dib3000mb/mc drivers based on GPL code, which has - -Copyright (C) 2004 Amaury Demol for DiBcom (ademol@dibcom.fr) - -This program is free software; you can redistribute it and/or -modify it under the terms of the GNU General Public License as -published by the Free Software Foundation, version 2. - - -Supported devices USB1.1 -======================== - -Produced and reselled by Twinhan: ---------------------------------- -- TwinhanDTV USB-Ter DVB-T Device (VP7041) - http://www.twinhan.com/product_terrestrial_3.asp - -- TwinhanDTV Magic Box (VP7041e) - http://www.twinhan.com/product_terrestrial_4.asp - -- HAMA DVB-T USB device - http://www.hama.de/portal/articleId*110620/action*2598 - -- CTS Portable (Chinese Television System) (2) - http://www.2cts.tv/ctsportable/ - -- Unknown USB DVB-T device with vendor ID Hyper-Paltek - - -Produced and reselled by KWorld: --------------------------------- -- KWorld V-Stream XPERT DTV DVB-T USB - http://www.kworld.com.tw/en/product/DVBT-USB/DVBT-USB.html - -- JetWay DTV DVB-T USB - http://www.jetway.com.tw/evisn/product/lcd-tv/DVT-USB/dtv-usb.htm - -- ADSTech Instant TV DVB-T USB - http://www.adstech.com/products/PTV-333/intro/PTV-333_intro.asp?pid=PTV-333 - - -Others: -------- -- Ultima Electronic/Artec T1 USB TVBOX (AN2135, AN2235, AN2235 with Panasonic Tuner) - http://82.161.246.249/products-tvbox.html - -- Compro Videomate DVB-U2000 - DVB-T USB (2) - http://www.comprousa.com/products/vmu2000.htm - -- Grandtec USB DVB-T - http://www.grand.com.tw/ - -- Avermedia AverTV DVBT USB (2) - http://www.avermedia.com/ - -- DiBcom USB DVB-T reference device (non-public) - - -Supported devices USB2.0 -======================== -- Twinhan MagicBox II (2) - http://www.twinhan.com/product_terrestrial_7.asp - -- Hanftek UMT-010 (1) - http://www.globalsources.com/si/6008819757082/ProductDetail/Digital-TV/product_id-100046529 - -- Typhoon/Yakumo/HAMA DVB-T mobile USB2.0 (1) - http://www.yakumo.de/produkte/index.php?pid=1&ag=DVB-T - -- Artec T1 USB TVBOX (FX2) (2) - -- Hauppauge WinTV NOVA-T USB2 - http://www.hauppauge.com/ - -- KWorld/ADSTech Instant DVB-T USB2.0 (DiB3000M-B) - -- DiBcom USB2.0 DVB-T reference device (non-public) - -1) It is working almost. -2) No test reports received yet. - - -0. NEWS: - 2005-02-11 - added support for the KWorld/ADSTech Instant DVB-T USB2.0. Thanks a lot to Joachim von Caron - 2005-02-02 - added support for the Hauppauge Win-TV Nova-T USB2 - 2005-01-31 - distorted streaming is finally gone for USB1.1 devices - 2005-01-13 - moved the mirrored pid_filter_table back to dvb-dibusb - - first almost working version for HanfTek UMT-010 - - found out, that Yakumo/HAMA/Typhoon are predessors of the HanfTek UMT-010 - 2005-01-10 - refactoring completed, now everything is very delightful - - tuner quirks for some weird devices (Artec T1 AN2235 device has sometimes a - Panasonic Tuner assembled). Tunerprobing implemented. Thanks a lot to Gunnar Wittich. - 2004-12-29 - after several days of struggling around bug of no returning URBs fixed. - 2004-12-26 - refactored the dibusb-driver, splitted into separate files - - i2c-probing enabled - 2004-12-06 - possibility for demod i2c-address probing - - new usb IDs (Compro,Artec) - 2004-11-23 - merged changes from DiB3000MC_ver2.1 - - revised the debugging - - possibility to deliver the complete TS for USB2.0 - 2004-11-21 - first working version of the dib3000mc/p frontend driver. - 2004-11-12 - added additional remote control keys. Thanks to Uwe Hanke. - 2004-11-07 - added remote control support. Thanks to David Matthews. - 2004-11-05 - added support for a new devices (Grandtec/Avermedia/Artec) - - merged my changes (for dib3000mb/dibusb) to the FE_REFACTORING, because it became HEAD - - moved transfer control (pid filter, fifo control) from usb driver to frontend, it seems - better settled there (added xfer_ops-struct) - - created a common files for frontends (mc/p/mb) - 2004-09-28 - added support for a new device (Unkown, vendor ID is Hyper-Paltek) - 2004-09-20 - added support for a new device (Compro DVB-U2000), thanks - to Amaury Demol for reporting - - changed usb TS transfer method (several urbs, stopping transfer - before setting a new pid) - 2004-09-13 - added support for a new device (Artec T1 USB TVBOX), thanks - to Christian Motschke for reporting - 2004-09-05 - released the dibusb device and dib3000mb-frontend driver - - (old news for vp7041.c) - 2004-07-15 - found out, by accident, that the device has a TUA6010XS for - PLL - 2004-07-12 - figured out, that the driver should also work with the - CTS Portable (Chinese Television System) - 2004-07-08 - firmware-extraction-2.422-problem solved, driver is now working - properly with firmware extracted from 2.422 - - #if for 2.6.4 (dvb), compile issue - - changed firmware handling, see vp7041.txt sec 1.1 - 2004-07-02 - some tuner modifications, v0.1, cleanups, first public - 2004-06-28 - now using the dvb_dmx_swfilter_packets, everything - runs fine now - 2004-06-27 - able to watch and switching channels (pre-alpha) - - no section filtering yet - 2004-06-06 - first TS received, but kernel oops :/ - 2004-05-14 - firmware loader is working - 2004-05-11 - start writing the driver - -1. How to use? -NOTE: This driver was developed using Linux 2.6.6., -it is working with 2.6.7 and above. - -Linux 2.4.x support is not planned, but patches are very welcome. - -NOTE: I'm using Debian testing, so the following explaination (especially -the hotplug-path) needn't match your system, but probably it will :). - -The driver is included in the kernel since Linux 2.6.10. - -1.1. Firmware - -The USB driver needs to download a firmware to start working. - -You can either use "get_dvb_firmware dibusb" to download the firmware or you -can get it directly via - -for USB1.1 (AN2135) -http://www.linuxtv.org/downloads/firmware/dvb-dibusb-5.0.0.11.fw - -for USB1.1 (AN2235) (a few Artec T1 devices) -http://www.linuxtv.org/downloads/firmware/dvb-dibusb-an2235-1.fw - -for USB2.0 (FX2) Hauppauge, DiBcom -http://www.linuxtv.org/downloads/firmware/dvb-dibusb-6.0.0.5.fw - -for USB2.0 ADSTech/Kworld USB2.0 -http://www.linuxtv.org/downloads/firmware/dvb-dibusb-adstech-usb2-1.fw - -for USB2.0 HanfTek -http://www.linuxtv.org/downloads/firmware/dvb-dibusb-an2235-1.fw - - -1.2. Compiling - -Since the driver is in the linux kernel, activating the driver in -your favorite config-environment should sufficient. I recommend -to compile the driver as module. Hotplug does the rest. - -1.3. Loading the drivers - -Hotplug is able to load the driver, when it is needed (because you plugged -in the device). - -If you want to enable debug output, you have to load the driver manually and -from withing the dvb-kernel cvs repository. - -first have a look, which debug level are available: - -modinfo dib3000mb -modinfo dib3000-common -modinfo dib3000mc -modinfo dvb-dibusb - -modprobe dib3000-common debug=<level> -modprobe dib3000mb debug=<level> -modprobe dib3000mc debug=<level> -modprobe dvb-dibusb debug=<level> - -should do the trick. - -When the driver is loaded successfully, the firmware file was in -the right place and the device is connected, the "Power"-LED should be -turned on. - -At this point you should be able to start a dvb-capable application. For myself -I used mplayer, dvbscan, tzap and kaxtv, they are working. Using the device -in vdr is working now also. - -2. Known problems and bugs - -- Don't remove the USB device while running an DVB application, your system will die. - -2.1. Adding support for devices - -It is not possible to determine the range of devices based on the DiBcom -reference designs. This is because the reference design of DiBcom can be sold -to thirds, without telling DiBcom (so done with the Twinhan VP7041 and -the HAMA device). - -When you think you have a device like this and the driver does not recognizes it, -please send the ****load*.inf and the ****cap*.inf of the Windows driver to me. - -Sometimes the Vendor or Product ID is identical to the ones of Twinhan, even -though it is not a Twinhan device (e.g. HAMA), then please send me the name -of the device. I will add it to this list in order to make this clear to -others. - -If you are familar with C you can also add the VID and PID of the device to -the dvb-dibusb-core.c-file and create a patch and send it over to me or to -the linux-dvb mailing list, _after_ you have tried compiling and modprobing -it. - -2.2. USB1.1 Bandwidth limitation - -Most of the currently supported devices are USB1.1 and thus they have a -maximum bandwidth of about 5-6 MBit/s when connected to a USB2.0 hub. -This is not enough for receiving the complete transport stream of a -DVB-T channel (which can be about 16 MBit/s). Normally this is not a -problem, if you only want to watch TV (this does not apply for HDTV), -but watching a channel while recording another channel on the same -frequency simply does not work very well. This applies to all USB1.1 -DVB-T devices, not just dibusb) - -Update: For the USB1.1 and VDR some work has been done (patches and comments -are still very welcome). Maybe the problem is solved in the meantime because I -now use the dmx_sw_filter function instead of dmx_sw_filter_packet. I hope the -linux-dvb software filter is able to get the best of the garbled TS. - -The bug, where the TS is distorted by a heavy usage of the device is gone -definitely. All dibusb-devices I was using (Twinhan, Kworld, DiBcom) are -working like charm now with VDR. Sometimes I even was able to record a channel -and watch another one. - -2.3. Comments - -Patches, comments and suggestions are very very welcome. - -3. Acknowledgements - Amaury Demol (ademol@dibcom.fr) and Francois Kanounnikoff from DiBcom for - providing specs, code and help, on which the dvb-dibusb, dib3000mb and - dib3000mc are based. - - David Matthews for identifying a new device type (Artec T1 with AN2235) - and for extending dibusb with remote control event handling. Thank you. - - Alex Woods for frequently answering question about usb and dvb - stuff, a big thank you. - - Bernd Wagner for helping with huge bug reports and discussions. - - Gunnar Wittich and Joachim von Caron for their trust for giving me - root-shells on their machines to implement support for new devices. - - Some guys on the linux-dvb mailing list for encouraging me - - Peter Schildmann >peter.schildmann-nospam-at-web.de< for his - user-level firmware loader, which saves a lot of time - (when writing the vp7041 driver) - - Ulf Hermenau for helping me out with traditional chinese. - - André Smoktun and Christian Frömmel for supporting me with - hardware and listening to my problems very patient diff --git a/Documentation/dvb/README.dvb-usb b/Documentation/dvb/README.dvb-usb new file mode 100644 index 00000000000..ac0797ea646 --- /dev/null +++ b/Documentation/dvb/README.dvb-usb @@ -0,0 +1,232 @@ +Documentation for dvb-usb-framework module and its devices + +Idea behind the dvb-usb-framework +================================= + +In March 2005 I got the new Twinhan USB2.0 DVB-T device. They provided specs and a firmware. + +Quite keen I wanted to put the driver (with some quirks of course) into dibusb. +After reading some specs and doing some USB snooping, it realized, that the +dibusb-driver would be a complete mess afterwards. So I decided to do it in a +different way: With the help of a dvb-usb-framework. + +The framework provides generic functions (mostly kernel API calls), such as: + +- Transport Stream URB handling in conjunction with dvb-demux-feed-control + (bulk and isoc are supported) +- registering the device for the DVB-API +- registering an I2C-adapter if applicable +- remote-control/input-device handling +- firmware requesting and loading (currently just for the Cypress USB + controllers) +- other functions/methods which can be shared by several drivers (such as + functions for bulk-control-commands) +- TODO: a I2C-chunker. It creates device-specific chunks of register-accesses + depending on length of a register and the number of values that can be + multi-written and multi-read. + +The source code of the particular DVB USB devices does just the communication +with the device via the bus. The connection between the DVB-API-functionality +is done via callbacks, assigned in a static device-description (struct +dvb_usb_device) each device-driver has to have. + +For an example have a look in drivers/media/dvb/dvb-usb/vp7045*. + +Objective is to migrate all the usb-devices (dibusb, cinergyT2, maybe the +ttusb; flexcop-usb already benefits from the generic flexcop-device) to use +the dvb-usb-lib. + +TODO: dynamic enabling and disabling of the pid-filter in regard to number of +feeds requested. + +Supported devices +======================== + +See the LinuxTV DVB Wiki at www.linuxtv.org for a complete list of +cards/drivers/firmwares: + +http://www.linuxtv.org/wiki/index.php/DVB_USB + +0. History & News: + 2005-06-30 - added support for WideView WT-220U (Thanks to Steve Chang) + 2005-05-30 - added basic isochronous support to the dvb-usb-framework + added support for Conexant Hybrid reference design and Nebula DigiTV USB + 2005-04-17 - all dibusb devices ported to make use of the dvb-usb-framework + 2005-04-02 - re-enabled and improved remote control code. + 2005-03-31 - ported the Yakumo/Hama/Typhoon DVB-T USB2.0 device to dvb-usb. + 2005-03-30 - first commit of the dvb-usb-module based on the dibusb-source. First device is a new driver for the + TwinhanDTV Alpha / MagicBox II USB2.0-only DVB-T device. + + (change from dvb-dibusb to dvb-usb) + 2005-03-28 - added support for the AVerMedia AverTV DVB-T USB2.0 device (Thanks to Glen Harris and Jiun-Kuei Jung, AVerMedia) + 2005-03-14 - added support for the Typhoon/Yakumo/HAMA DVB-T mobile USB2.0 + 2005-02-11 - added support for the KWorld/ADSTech Instant DVB-T USB2.0. Thanks a lot to Joachim von Caron + 2005-02-02 - added support for the Hauppauge Win-TV Nova-T USB2 + 2005-01-31 - distorted streaming is gone for USB1.1 devices + 2005-01-13 - moved the mirrored pid_filter_table back to dvb-dibusb + - first almost working version for HanfTek UMT-010 + - found out, that Yakumo/HAMA/Typhoon are predecessors of the HanfTek UMT-010 + 2005-01-10 - refactoring completed, now everything is very delightful + - tuner quirks for some weird devices (Artec T1 AN2235 device has sometimes a + Panasonic Tuner assembled). Tunerprobing implemented. Thanks a lot to Gunnar Wittich. + 2004-12-29 - after several days of struggling around bug of no returning URBs fixed. + 2004-12-26 - refactored the dibusb-driver, splitted into separate files + - i2c-probing enabled + 2004-12-06 - possibility for demod i2c-address probing + - new usb IDs (Compro, Artec) + 2004-11-23 - merged changes from DiB3000MC_ver2.1 + - revised the debugging + - possibility to deliver the complete TS for USB2.0 + 2004-11-21 - first working version of the dib3000mc/p frontend driver. + 2004-11-12 - added additional remote control keys. Thanks to Uwe Hanke. + 2004-11-07 - added remote control support. Thanks to David Matthews. + 2004-11-05 - added support for a new devices (Grandtec/Avermedia/Artec) + - merged my changes (for dib3000mb/dibusb) to the FE_REFACTORING, because it became HEAD + - moved transfer control (pid filter, fifo control) from usb driver to frontend, it seems + better settled there (added xfer_ops-struct) + - created a common files for frontends (mc/p/mb) + 2004-09-28 - added support for a new device (Unkown, vendor ID is Hyper-Paltek) + 2004-09-20 - added support for a new device (Compro DVB-U2000), thanks + to Amaury Demol for reporting + - changed usb TS transfer method (several urbs, stopping transfer + before setting a new pid) + 2004-09-13 - added support for a new device (Artec T1 USB TVBOX), thanks + to Christian Motschke for reporting + 2004-09-05 - released the dibusb device and dib3000mb-frontend driver + + (old news for vp7041.c) + 2004-07-15 - found out, by accident, that the device has a TUA6010XS for + PLL + 2004-07-12 - figured out, that the driver should also work with the + CTS Portable (Chinese Television System) + 2004-07-08 - firmware-extraction-2.422-problem solved, driver is now working + properly with firmware extracted from 2.422 + - #if for 2.6.4 (dvb), compile issue + - changed firmware handling, see vp7041.txt sec 1.1 + 2004-07-02 - some tuner modifications, v0.1, cleanups, first public + 2004-06-28 - now using the dvb_dmx_swfilter_packets, everything + runs fine now + 2004-06-27 - able to watch and switching channels (pre-alpha) + - no section filtering yet + 2004-06-06 - first TS received, but kernel oops :/ + 2004-05-14 - firmware loader is working + 2004-05-11 - start writing the driver + +1. How to use? +1.1. Firmware + +Most of the USB drivers need to download a firmware to the device before start +working. + +Have a look at the Wikipage for the DVB-USB-drivers to find out, which firmware +you need for your device: + +http://www.linuxtv.org/wiki/index.php/DVB_USB + +1.2. Compiling + +Since the driver is in the linux kernel, activating the driver in +your favorite config-environment should sufficient. I recommend +to compile the driver as module. Hotplug does the rest. + +If you use dvb-kernel enter the build-2.6 directory run 'make' and 'insmod.sh +load' afterwards. + +1.3. Loading the drivers + +Hotplug is able to load the driver, when it is needed (because you plugged +in the device). + +If you want to enable debug output, you have to load the driver manually and +from withing the dvb-kernel cvs repository. + +first have a look, which debug level are available: + +modinfo dvb-usb +modinfo dvb-usb-vp7045 +etc. + +modprobe dvb-usb debug=<level> +modprobe dvb-usb-vp7045 debug=<level> +etc. + +should do the trick. + +When the driver is loaded successfully, the firmware file was in +the right place and the device is connected, the "Power"-LED should be +turned on. + +At this point you should be able to start a dvb-capable application. I'm use +(t|s)zap, mplayer and dvbscan to test the basics. VDR-xine provides the +long-term test scenario. + +2. Known problems and bugs + +- Don't remove the USB device while running an DVB application, your system + will go crazy or die most likely. + +2.1. Adding support for devices + +TODO + +2.2. USB1.1 Bandwidth limitation + +A lot of the currently supported devices are USB1.1 and thus they have a +maximum bandwidth of about 5-6 MBit/s when connected to a USB2.0 hub. +This is not enough for receiving the complete transport stream of a +DVB-T channel (which is about 16 MBit/s). Normally this is not a +problem, if you only want to watch TV (this does not apply for HDTV), +but watching a channel while recording another channel on the same +frequency simply does not work very well. This applies to all USB1.1 +DVB-T devices, not just the dvb-usb-devices) + +The bug, where the TS is distorted by a heavy usage of the device is gone +definitely. All dvb-usb-devices I was using (Twinhan, Kworld, DiBcom) are +working like charm now with VDR. Sometimes I even was able to record a channel +and watch another one. + +2.3. Comments + +Patches, comments and suggestions are very very welcome. + +3. Acknowledgements + Amaury Demol (ademol@dibcom.fr) and Francois Kanounnikoff from DiBcom for + providing specs, code and help, on which the dvb-dibusb, dib3000mb and + dib3000mc are based. + + David Matthews for identifying a new device type (Artec T1 with AN2235) + and for extending dibusb with remote control event handling. Thank you. + + Alex Woods for frequently answering question about usb and dvb + stuff, a big thank you. + + Bernd Wagner for helping with huge bug reports and discussions. + + Gunnar Wittich and Joachim von Caron for their trust for providing + root-shells on their machines to implement support for new devices. + + Allan Third and Michael Hutchinson for their help to write the Nebula + digitv-driver. + + Glen Harris for bringing up, that there is a new dibusb-device and Jiun-Kuei + Jung from AVerMedia who kindly provided a special firmware to get the device + up and running in Linux. + + Jennifer Chen, Jeff and Jack from Twinhan for kindly supporting by + writing the vp7045-driver. + + Steve Chang from WideView for providing information for new devices and + firmware files. + + Michael Paxton for submitting remote control keymaps. + + Some guys on the linux-dvb mailing list for encouraging me. + + Peter Schildmann >peter.schildmann-nospam-at-web.de< for his + user-level firmware loader, which saves a lot of time + (when writing the vp7041 driver) + + Ulf Hermenau for helping me out with traditional chinese. + + André Smoktun and Christian Frömmel for supporting me with + hardware and listening to my problems very patiently. diff --git a/Documentation/dvb/bt8xx.txt b/Documentation/dvb/bt8xx.txt index d64430bf4bb..e6b8d05bc08 100644 --- a/Documentation/dvb/bt8xx.txt +++ b/Documentation/dvb/bt8xx.txt @@ -1,69 +1,55 @@ -How to get the Nebula, PCTV and Twinhan DST cards working -========================================================= +How to get the Nebula Electronics DigiTV, Pinnacle PCTV Sat, Twinhan DST + clones working +========================================================================================= -This class of cards has a bt878a as the PCI interface, and -require the bttv driver. +1) General information +====================== -Please pay close attention to the warning about the bttv module -options below for the DST card. +This class of cards has a bt878a chip as the PCI interface. +The different card drivers require the bttv driver to provide the means +to access the i2c bus and the gpio pins of the bt8xx chipset. -1) General informations -======================= +2) Compilation rules for Kernel >= 2.6.12 +========================================= -These drivers require the bttv driver to provide the means to access -the i2c bus and the gpio pins of the bt8xx chipset. +Enable the following options: -Because of this, you need to enable "Device drivers" => "Multimedia devices" - => "Video For Linux" => "BT848 Video For Linux" - -Furthermore you need to enable + => "Video For Linux" => "BT848 Video For Linux" "Device drivers" => "Multimedia devices" => "Digital Video Broadcasting Devices" - => "DVB for Linux" "DVB Core Support" "Nebula/Pinnacle PCTV/TwinHan PCI Cards" + => "DVB for Linux" "DVB Core Support" "Nebula/Pinnacle PCTV/TwinHan PCI Cards" -2) Loading Modules -================== +3) Loading Modules, described by two approaches +=============================================== In general you need to load the bttv driver, which will handle the gpio and -i2c communication for us, plus the common dvb-bt8xx device driver. -The frontends for Nebula (nxt6000), Pinnacle PCTV (cx24110) and -TwinHan (dst) are loaded automatically by the dvb-bt8xx device driver. +i2c communication for us, plus the common dvb-bt8xx device driver, +which is called the backend. +The frontends for Nebula DigiTV (nxt6000), Pinnacle PCTV Sat (cx24110), +TwinHan DST + clones (dst and dst-ca) are loaded automatically by the backend. +For further details about TwinHan DST + clones see /Documentation/dvb/ci.txt. -3a) Nebula / Pinnacle PCTV --------------------------- +3a) The manual approach +----------------------- - $ modprobe bttv (normally bttv is being loaded automatically by kmod) - $ modprobe dvb-bt8xx (or just place dvb-bt8xx in /etc/modules for automatic loading) +Loading modules: +modprobe bttv +modprobe dvb-bt8xx +Unloading modules: +modprobe -r dvb-bt8xx +modprobe -r bttv -3b) TwinHan and Clones +3b) The automatic approach -------------------------- - $ modprobe bttv i2c_hw=1 card=0x71 - $ modprobe dvb-bt8xx - $ modprobe dst - -The value 0x71 will override the PCI type detection for dvb-bt8xx, -which is necessary for TwinHan cards. - -If you're having an older card (blue color circuit) and card=0x71 locks -your machine, try using 0x68, too. If that does not work, ask on the -mailing list. - -The DST module takes a couple of useful parameters. - -verbose takes values 0 to 5. These values control the verbosity level. - -debug takes values 0 and 1. You can either disable or enable debugging. - -dst_addons takes values 0 and 0x20. A value of 0 means it is a FTA card. -0x20 means it has a Conditional Access slot. - -The autodected values are determined bythe cards 'response -string' which you can see in your logs e.g. +If not already done by installation, place a line either in +/etc/modules.conf or in /etc/modprobe.conf containing this text: +alias char-major-81 bttv -dst_get_device_id: Recognise [DSTMCI] +Then place a line in /etc/modules containing this text: +dvb-bt8xx +Reboot your system and have fun! -- -Authors: Richard Walker, Jamie Honan, Michael Hunold, Manu Abraham +Authors: Richard Walker, Jamie Honan, Michael Hunold, Manu Abraham, Uwe Bugla diff --git a/Documentation/fb/vesafb.txt b/Documentation/fb/vesafb.txt index 814e2f56a6a..62db6758d1c 100644 --- a/Documentation/fb/vesafb.txt +++ b/Documentation/fb/vesafb.txt @@ -144,7 +144,21 @@ vgapal Use the standard vga registers for palette changes. This is the default. pmipal Use the protected mode interface for palette changes. -mtrr setup memory type range registers for the vesafb framebuffer. +mtrr:n setup memory type range registers for the vesafb framebuffer + where n: + 0 - disabled (equivalent to nomtrr) + 1 - uncachable + 2 - write-back + 3 - write-combining (default) + 4 - write-through + + If you see the following in dmesg, choose the type that matches the + old one. In this example, use "mtrr:2". +... +mtrr: type mismatch for e0000000,8000000 old: write-back new: write-combining +... + +nomtrr disable mtrr vremap:n remap 'n' MiB of video RAM. If 0 or not specified, remap memory diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 26414bc87c6..8b1430b4665 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -43,6 +43,14 @@ Who: Randy Dunlap <rddunlap@osdl.org> --------------------------- +What: RAW driver (CONFIG_RAW_DRIVER) +When: December 2005 +Why: declared obsolete since kernel 2.6.3 + O_DIRECT can be used instead +Who: Adrian Bunk <bunk@stusta.de> + +--------------------------- + What: register_ioctl32_conversion() / unregister_ioctl32_conversion() When: April 2005 Why: Replaced by ->compat_ioctl in file_operations and other method @@ -66,6 +74,14 @@ Who: Paul E. McKenney <paulmck@us.ibm.com> --------------------------- +What: remove verify_area() +When: July 2006 +Files: Various uaccess.h headers. +Why: Deprecated and redundant. access_ok() should be used instead. +Who: Jesper Juhl <juhl-lkml@dif.dk> + +--------------------------- + What: IEEE1394 Audio and Music Data Transmission Protocol driver, Connection Management Procedures driver When: November 2005 @@ -86,6 +102,16 @@ Who: Jody McIntyre <scjody@steamballoon.com> --------------------------- +What: register_serial/unregister_serial +When: September 2005 +Why: This interface does not allow serial ports to be registered against + a struct device, and as such does not allow correct power management + of such ports. 8250-based ports should use serial8250_register_port + and serial8250_unregister_port, or platform devices instead. +Who: Russell King <rmk@arm.linux.org.uk> + +--------------------------- + What: i2c sysfs name change: in1_ref, vid deprecated in favour of cpu0_vid When: November 2005 Files: drivers/i2c/chips/adm1025.c, drivers/i2c/chips/adm1026.c @@ -93,3 +119,19 @@ Why: Match the other drivers' name for the same function, duplicate names will be available until removal of old names. Who: Grant Coady <gcoady@gmail.com> +--------------------------- + +What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl]) +When: November 2005 +Files: drivers/pcmcia/: pcmcia_ioctl.c +Why: With the 16-bit PCMCIA subsystem now behaving (almost) like a + normal hotpluggable bus, and with it using the default kernel + infrastructure (hotplug, driver core, sysfs) keeping the PCMCIA + control ioctl needed by cardmgr and cardctl from pcmcia-cs is + unnecessary, and makes further cleanups and integration of the + PCMCIA subsystem into the Linux kernel device driver model more + difficult. The features provided by cardmgr and cardctl are either + handled by the kernel itself now or are available in the new + pcmciautils package available at + http://kernel.org/pub/linux/utils/kernel/pcmcia/ +Who: Dominik Brodowski <linux@brodo.de> diff --git a/Documentation/filesystems/ext2.txt b/Documentation/filesystems/ext2.txt index b5cb9110cc6..d16334ec48b 100644 --- a/Documentation/filesystems/ext2.txt +++ b/Documentation/filesystems/ext2.txt @@ -58,6 +58,8 @@ noacl Don't support POSIX ACLs. nobh Do not attach buffer_heads to file pagecache. +xip Use execute in place (no caching) if possible + grpquota,noquota,quota,usrquota Quota options are silently ignored by ext2. diff --git a/Documentation/filesystems/inotify.txt b/Documentation/filesystems/inotify.txt new file mode 100644 index 00000000000..6d501903f68 --- /dev/null +++ b/Documentation/filesystems/inotify.txt @@ -0,0 +1,151 @@ + inotify + a powerful yet simple file change notification system + + + +Document started 15 Mar 2005 by Robert Love <rml@novell.com> + + +(i) User Interface + +Inotify is controlled by a set of three system calls and normal file I/O on a +returned file descriptor. + +First step in using inotify is to initialise an inotify instance: + + int fd = inotify_init (); + +Each instance is associated with a unique, ordered queue. + +Change events are managed by "watches". A watch is an (object,mask) pair where +the object is a file or directory and the mask is a bit mask of one or more +inotify events that the application wishes to receive. See <linux/inotify.h> +for valid events. A watch is referenced by a watch descriptor, or wd. + +Watches are added via a path to the file. + +Watches on a directory will return events on any files inside of the directory. + +Adding a watch is simple: + + int wd = inotify_add_watch (fd, path, mask); + +Where "fd" is the return value from inotify_init(), path is the path to the +object to watch, and mask is the watch mask (see <linux/inotify.h>). + +You can update an existing watch in the same manner, by passing in a new mask. + +An existing watch is removed via + + int ret = inotify_rm_watch (fd, wd); + +Events are provided in the form of an inotify_event structure that is read(2) +from a given inotify instance. The filename is of dynamic length and follows +the struct. It is of size len. The filename is padded with null bytes to +ensure proper alignment. This padding is reflected in len. + +You can slurp multiple events by passing a large buffer, for example + + size_t len = read (fd, buf, BUF_LEN); + +Where "buf" is a pointer to an array of "inotify_event" structures at least +BUF_LEN bytes in size. The above example will return as many events as are +available and fit in BUF_LEN. + +Each inotify instance fd is also select()- and poll()-able. + +You can find the size of the current event queue via the standard FIONREAD +ioctl on the fd returned by inotify_init(). + +All watches are destroyed and cleaned up on close. + + +(ii) + +Prototypes: + + int inotify_init (void); + int inotify_add_watch (int fd, const char *path, __u32 mask); + int inotify_rm_watch (int fd, __u32 mask); + + +(iii) Internal Kernel Implementation + +Each inotify instance is associated with an inotify_device structure. + +Each watch is associated with an inotify_watch structure. Watches are chained +off of each associated device and each associated inode. + +See fs/inotify.c for the locking and lifetime rules. + + +(iv) Rationale + +Q: What is the design decision behind not tying the watch to the open fd of + the watched object? + +A: Watches are associated with an open inotify device, not an open file. + This solves the primary problem with dnotify: keeping the file open pins + the file and thus, worse, pins the mount. Dnotify is therefore infeasible + for use on a desktop system with removable media as the media cannot be + unmounted. Watching a file should not require that it be open. + +Q: What is the design decision behind using an-fd-per-instance as opposed to + an fd-per-watch? + +A: An fd-per-watch quickly consumes more file descriptors than are allowed, + more fd's than are feasible to manage, and more fd's than are optimally + select()-able. Yes, root can bump the per-process fd limit and yes, users + can use epoll, but requiring both is a silly and extraneous requirement. + A watch consumes less memory than an open file, separating the number + spaces is thus sensible. The current design is what user-space developers + want: Users initialize inotify, once, and add n watches, requiring but one + fd and no twiddling with fd limits. Initializing an inotify instance two + thousand times is silly. If we can implement user-space's preferences + cleanly--and we can, the idr layer makes stuff like this trivial--then we + should. + + There are other good arguments. With a single fd, there is a single + item to block on, which is mapped to a single queue of events. The single + fd returns all watch events and also any potential out-of-band data. If + every fd was a separate watch, + + - There would be no way to get event ordering. Events on file foo and + file bar would pop poll() on both fd's, but there would be no way to tell + which happened first. A single queue trivially gives you ordering. Such + ordering is crucial to existing applications such as Beagle. Imagine + "mv a b ; mv b a" events without ordering. + + - We'd have to maintain n fd's and n internal queues with state, + versus just one. It is a lot messier in the kernel. A single, linear + queue is the data structure that makes sense. + + - User-space developers prefer the current API. The Beagle guys, for + example, love it. Trust me, I asked. It is not a surprise: Who'd want + to manage and block on 1000 fd's via select? + + - No way to get out of band data. + + - 1024 is still too low. ;-) + + When you talk about designing a file change notification system that + scales to 1000s of directories, juggling 1000s of fd's just does not seem + the right interface. It is too heavy. + + Additionally, it _is_ possible to more than one instance and + juggle more than one queue and thus more than one associated fd. There + need not be a one-fd-per-process mapping; it is one-fd-per-queue and a + process can easily want more than one queue. + +Q: Why the system call approach? + +A: The poor user-space interface is the second biggest problem with dnotify. + Signals are a terrible, terrible interface for file notification. Or for + anything, for that matter. The ideal solution, from all perspectives, is a + file descriptor-based one that allows basic file I/O and poll/select. + Obtaining the fd and managing the watches could have been done either via a + device file or a family of new system calls. We decided to implement a + family of system calls because that is the preffered approach for new kernel + interfaces. The only real difference was whether we wanted to use open(2) + and ioctl(2) or a couple of new system calls. System calls beat ioctls. + diff --git a/Documentation/filesystems/ntfs.txt b/Documentation/filesystems/ntfs.txt index f89b440fad1..eef4aca0c75 100644 --- a/Documentation/filesystems/ntfs.txt +++ b/Documentation/filesystems/ntfs.txt @@ -21,7 +21,7 @@ Overview ======== Linux-NTFS comes with a number of user-space programs known as ntfsprogs. -These include mkntfs, a full-featured ntfs file system format utility, +These include mkntfs, a full-featured ntfs filesystem format utility, ntfsundelete used for recovering files that were unintentionally deleted from an NTFS volume and ntfsresize which is used to resize an NTFS partition. See the web site for more information. @@ -149,7 +149,14 @@ case_sensitive=<BOOL> If case_sensitive is specified, treat all file names as name, if it exists. If case_sensitive, you will need to provide the correct case of the short file name. -errors=opt What to do when critical file system errors are found. +disable_sparse=<BOOL> If disable_sparse is specified, creation of sparse + regions, i.e. holes, inside files is disabled for the + volume (for the duration of this mount only). By + default, creation of sparse regions is enabled, which + is consistent with the behaviour of traditional Unix + filesystems. + +errors=opt What to do when critical filesystem errors are found. Following values can be used for "opt": continue: DEFAULT, try to clean-up as much as possible, e.g. marking a corrupt inode as @@ -432,6 +439,24 @@ ChangeLog Note, a technical ChangeLog aimed at kernel hackers is in fs/ntfs/ChangeLog. +2.1.23: + - Stamp the user space journal, aka transaction log, aka $UsnJrnl, if + it is present and active thus telling Windows and applications using + the transaction log that changes can have happened on the volume + which are not recorded in $UsnJrnl. + - Detect the case when Windows has been hibernated (suspended to disk) + and if this is the case do not allow (re)mounting read-write to + prevent data corruption when you boot back into the suspended + Windows session. + - Implement extension of resident files using the normal file write + code paths, i.e. most very small files can be extended to be a little + bit bigger but not by much. + - Add new mount option "disable_sparse". (See list of mount options + above for details.) + - Improve handling of ntfs volumes with errors and strange boot sectors + in particular. + - Fix various bugs including a nasty deadlock that appeared in recent + kernels (around 2.6.11-2.6.12 timeframe). 2.1.22: - Improve handling of ntfs volumes with errors. - Fix various bugs and race conditions. diff --git a/Documentation/filesystems/xip.txt b/Documentation/filesystems/xip.txt new file mode 100644 index 00000000000..6c0cef10eb4 --- /dev/null +++ b/Documentation/filesystems/xip.txt @@ -0,0 +1,67 @@ +Execute-in-place for file mappings +---------------------------------- + +Motivation +---------- +File mappings are performed by mapping page cache pages to userspace. In +addition, read&write type file operations also transfer data from/to the page +cache. + +For memory backed storage devices that use the block device interface, the page +cache pages are in fact copies of the original storage. Various approaches +exist to work around the need for an extra copy. The ramdisk driver for example +does read the data into the page cache, keeps a reference, and discards the +original data behind later on. + +Execute-in-place solves this issue the other way around: instead of keeping +data in the page cache, the need to have a page cache copy is eliminated +completely. With execute-in-place, read&write type operations are performed +directly from/to the memory backed storage device. For file mappings, the +storage device itself is mapped directly into userspace. + +This implementation was initialy written for shared memory segments between +different virtual machines on s390 hardware to allow multiple machines to +share the same binaries and libraries. + +Implementation +-------------- +Execute-in-place is implemented in three steps: block device operation, +address space operation, and file operations. + +A block device operation named direct_access is used to retrieve a +reference (pointer) to a block on-disk. The reference is supposed to be +cpu-addressable, physical address and remain valid until the release operation +is performed. A struct block_device reference is used to address the device, +and a sector_t argument is used to identify the individual block. As an +alternative, memory technology devices can be used for this. + +The block device operation is optional, these block devices support it as of +today: +- dcssblk: s390 dcss block device driver + +An address space operation named get_xip_page is used to retrieve reference +to a struct page. To address the target page, a reference to an address_space, +and a sector number is provided. A 3rd argument indicates whether the +function should allocate blocks if needed. + +This address space operation is mutually exclusive with readpage&writepage that +do page cache read/write operations. +The following filesystems support it as of today: +- ext2: the second extended filesystem, see Documentation/filesystems/ext2.txt + +A set of file operations that do utilize get_xip_page can be found in +mm/filemap_xip.c . The following file operation implementations are provided: +- aio_read/aio_write +- readv/writev +- sendfile + +The generic file operations do_sync_read/do_sync_write can be used to implement +classic synchronous IO calls. + +Shortcomings +------------ +This implementation is limited to storage devices that are cpu addressable at +all times (no highmem or such). It works well on rom/ram, but enhancements are +needed to make it work with flash in read+write mode. +Putting the Linux kernel and/or its modules on a xip filesystem does not mean +they are not copied. diff --git a/Documentation/i2c/chips/adm1021 b/Documentation/hwmon/adm1021 index 03d02bfb3df..03d02bfb3df 100644 --- a/Documentation/i2c/chips/adm1021 +++ b/Documentation/hwmon/adm1021 diff --git a/Documentation/i2c/chips/adm1025 b/Documentation/hwmon/adm1025 index 39d2b781b5d..39d2b781b5d 100644 --- a/Documentation/i2c/chips/adm1025 +++ b/Documentation/hwmon/adm1025 diff --git a/Documentation/i2c/chips/adm1026 b/Documentation/hwmon/adm1026 index 473c689d792..473c689d792 100644 --- a/Documentation/i2c/chips/adm1026 +++ b/Documentation/hwmon/adm1026 diff --git a/Documentation/i2c/chips/adm1031 b/Documentation/hwmon/adm1031 index 130a38382b9..130a38382b9 100644 --- a/Documentation/i2c/chips/adm1031 +++ b/Documentation/hwmon/adm1031 diff --git a/Documentation/i2c/chips/adm9240 b/Documentation/hwmon/adm9240 index 35f618f3289..35f618f3289 100644 --- a/Documentation/i2c/chips/adm9240 +++ b/Documentation/hwmon/adm9240 diff --git a/Documentation/i2c/chips/asb100 b/Documentation/hwmon/asb100 index ab7365e139b..ab7365e139b 100644 --- a/Documentation/i2c/chips/asb100 +++ b/Documentation/hwmon/asb100 diff --git a/Documentation/i2c/chips/ds1621 b/Documentation/hwmon/ds1621 index 1fee6f1e6bc..1fee6f1e6bc 100644 --- a/Documentation/i2c/chips/ds1621 +++ b/Documentation/hwmon/ds1621 diff --git a/Documentation/i2c/chips/fscher b/Documentation/hwmon/fscher index 64031659aff..64031659aff 100644 --- a/Documentation/i2c/chips/fscher +++ b/Documentation/hwmon/fscher diff --git a/Documentation/i2c/chips/gl518sm b/Documentation/hwmon/gl518sm index ce0881883bc..ce0881883bc 100644 --- a/Documentation/i2c/chips/gl518sm +++ b/Documentation/hwmon/gl518sm diff --git a/Documentation/i2c/chips/it87 b/Documentation/hwmon/it87 index 0d0195040d8..0d0195040d8 100644 --- a/Documentation/i2c/chips/it87 +++ b/Documentation/hwmon/it87 diff --git a/Documentation/i2c/chips/lm63 b/Documentation/hwmon/lm63 index 31660bf9797..31660bf9797 100644 --- a/Documentation/i2c/chips/lm63 +++ b/Documentation/hwmon/lm63 diff --git a/Documentation/i2c/chips/lm75 b/Documentation/hwmon/lm75 index 8e6356fe05d..8e6356fe05d 100644 --- a/Documentation/i2c/chips/lm75 +++ b/Documentation/hwmon/lm75 diff --git a/Documentation/i2c/chips/lm77 b/Documentation/hwmon/lm77 index 57c3a46d637..57c3a46d637 100644 --- a/Documentation/i2c/chips/lm77 +++ b/Documentation/hwmon/lm77 diff --git a/Documentation/i2c/chips/lm78 b/Documentation/hwmon/lm78 index 357086ed7f6..357086ed7f6 100644 --- a/Documentation/i2c/chips/lm78 +++ b/Documentation/hwmon/lm78 diff --git a/Documentation/i2c/chips/lm80 b/Documentation/hwmon/lm80 index cb5b407ba3e..cb5b407ba3e 100644 --- a/Documentation/i2c/chips/lm80 +++ b/Documentation/hwmon/lm80 diff --git a/Documentation/i2c/chips/lm83 b/Documentation/hwmon/lm83 index 061d9ed8ff4..061d9ed8ff4 100644 --- a/Documentation/i2c/chips/lm83 +++ b/Documentation/hwmon/lm83 diff --git a/Documentation/i2c/chips/lm85 b/Documentation/hwmon/lm85 index 9549237530c..9549237530c 100644 --- a/Documentation/i2c/chips/lm85 +++ b/Documentation/hwmon/lm85 diff --git a/Documentation/i2c/chips/lm87 b/Documentation/hwmon/lm87 index c952c57f0e1..c952c57f0e1 100644 --- a/Documentation/i2c/chips/lm87 +++ b/Documentation/hwmon/lm87 diff --git a/Documentation/i2c/chips/lm90 b/Documentation/hwmon/lm90 index 2c4cf39471f..2c4cf39471f 100644 --- a/Documentation/i2c/chips/lm90 +++ b/Documentation/hwmon/lm90 diff --git a/Documentation/i2c/chips/lm92 b/Documentation/hwmon/lm92 index 7705bfaa070..7705bfaa070 100644 --- a/Documentation/i2c/chips/lm92 +++ b/Documentation/hwmon/lm92 diff --git a/Documentation/i2c/chips/max1619 b/Documentation/hwmon/max1619 index d6f8d9cd7d7..d6f8d9cd7d7 100644 --- a/Documentation/i2c/chips/max1619 +++ b/Documentation/hwmon/max1619 diff --git a/Documentation/i2c/chips/pc87360 b/Documentation/hwmon/pc87360 index 89a8fcfa78d..89a8fcfa78d 100644 --- a/Documentation/i2c/chips/pc87360 +++ b/Documentation/hwmon/pc87360 diff --git a/Documentation/i2c/chips/sis5595 b/Documentation/hwmon/sis5595 index b7ae36b8cdf..b7ae36b8cdf 100644 --- a/Documentation/i2c/chips/sis5595 +++ b/Documentation/hwmon/sis5595 diff --git a/Documentation/i2c/chips/smsc47b397 b/Documentation/hwmon/smsc47b397 index da9d80c9643..da9d80c9643 100644 --- a/Documentation/i2c/chips/smsc47b397 +++ b/Documentation/hwmon/smsc47b397 diff --git a/Documentation/i2c/chips/smsc47m1 b/Documentation/hwmon/smsc47m1 index 34e6478c142..34e6478c142 100644 --- a/Documentation/i2c/chips/smsc47m1 +++ b/Documentation/hwmon/smsc47m1 diff --git a/Documentation/i2c/sysfs-interface b/Documentation/hwmon/sysfs-interface index 346400519d0..346400519d0 100644 --- a/Documentation/i2c/sysfs-interface +++ b/Documentation/hwmon/sysfs-interface diff --git a/Documentation/i2c/userspace-tools b/Documentation/hwmon/userspace-tools index 2622aac6542..2622aac6542 100644 --- a/Documentation/i2c/userspace-tools +++ b/Documentation/hwmon/userspace-tools diff --git a/Documentation/i2c/chips/via686a b/Documentation/hwmon/via686a index b82014cb7c5..b82014cb7c5 100644 --- a/Documentation/i2c/chips/via686a +++ b/Documentation/hwmon/via686a diff --git a/Documentation/i2c/chips/w83627hf b/Documentation/hwmon/w83627hf index 78f37c2d602..78f37c2d602 100644 --- a/Documentation/i2c/chips/w83627hf +++ b/Documentation/hwmon/w83627hf diff --git a/Documentation/i2c/chips/w83781d b/Documentation/hwmon/w83781d index e5459333ba6..e5459333ba6 100644 --- a/Documentation/i2c/chips/w83781d +++ b/Documentation/hwmon/w83781d diff --git a/Documentation/i2c/chips/w83l785ts b/Documentation/hwmon/w83l785ts index 1841cedc25b..1841cedc25b 100644 --- a/Documentation/i2c/chips/w83l785ts +++ b/Documentation/hwmon/w83l785ts diff --git a/Documentation/i2c/chips/max6875 b/Documentation/i2c/chips/max6875 index b4fb49b4181..b02002898a0 100644 --- a/Documentation/i2c/chips/max6875 +++ b/Documentation/i2c/chips/max6875 @@ -2,10 +2,10 @@ Kernel driver max6875 ===================== Supported chips: - * Maxim max6874, max6875 - Prefixes: 'max6875' + * Maxim MAX6874, MAX6875 + Prefix: 'max6875' Addresses scanned: 0x50, 0x52 - Datasheets: + Datasheet: http://pdfserv.maxim-ic.com/en/ds/MAX6874-MAX6875.pdf Author: Ben Gardner <bgardner@wabtec.com> @@ -23,14 +23,26 @@ Module Parameters Description ----------- -The MAXIM max6875 is a EEPROM-programmable power-supply sequencer/supervisor. +The Maxim MAX6875 is an EEPROM-programmable power-supply sequencer/supervisor. It provides timed outputs that can be used as a watchdog, if properly wired. It also provides 512 bytes of user EEPROM. -At reset, the max6875 reads the configuration eeprom into its configuration +At reset, the MAX6875 reads the configuration EEPROM into its configuration registers. The chip then begins to operate according to the values in the registers. +The Maxim MAX6874 is a similar, mostly compatible device, with more intputs +and outputs: + + vin gpi vout +MAX6874 6 4 8 +MAX6875 4 3 5 + +MAX6874 chips can have four different addresses (as opposed to only two for +the MAX6875). The additional addresses (0x54 and 0x56) are not probed by +this driver by default, but the probe module parameter can be used if +needed. + See the datasheet for details on how to program the EEPROM. diff --git a/Documentation/i2c/dev-interface b/Documentation/i2c/dev-interface index 09d6cda2a1f..b849ad63658 100644 --- a/Documentation/i2c/dev-interface +++ b/Documentation/i2c/dev-interface @@ -14,9 +14,12 @@ C example ========= So let's say you want to access an i2c adapter from a C program. The -first thing to do is `#include <linux/i2c.h>" and "#include <linux/i2c-dev.h>. -Yes, I know, you should never include kernel header files, but until glibc -knows about i2c, there is not much choice. +first thing to do is "#include <linux/i2c-dev.h>". Please note that +there are two files named "i2c-dev.h" out there, one is distributed +with the Linux kernel and is meant to be included from kernel +driver code, the other one is distributed with lm_sensors and is +meant to be included from user-space programs. You obviously want +the second one here. Now, you have to decide which adapter you want to access. You should inspect /sys/class/i2c-dev/ to decide this. Adapter numbers are assigned @@ -78,7 +81,7 @@ Full interface description ========================== The following IOCTLs are defined and fully supported -(see also i2c-dev.h and i2c.h): +(see also i2c-dev.h): ioctl(file,I2C_SLAVE,long addr) Change slave address. The address is passed in the 7 lower bits of the @@ -97,10 +100,10 @@ ioctl(file,I2C_PEC,long select) ioctl(file,I2C_FUNCS,unsigned long *funcs) Gets the adapter functionality and puts it in *funcs. -ioctl(file,I2C_RDWR,struct i2c_ioctl_rdwr_data *msgset) +ioctl(file,I2C_RDWR,struct i2c_rdwr_ioctl_data *msgset) Do combined read/write transaction without stop in between. - The argument is a pointer to a struct i2c_ioctl_rdwr_data { + The argument is a pointer to a struct i2c_rdwr_ioctl_data { struct i2c_msg *msgs; /* ptr to array of simple messages */ int nmsgs; /* number of messages to exchange */ diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients index f482dae81de..91664be91ff 100644 --- a/Documentation/i2c/writing-clients +++ b/Documentation/i2c/writing-clients @@ -27,7 +27,6 @@ address. static struct i2c_driver foo_driver = { .owner = THIS_MODULE, .name = "Foo version 2.3 driver", - .id = I2C_DRIVERID_FOO, /* from i2c-id.h, optional */ .flags = I2C_DF_NOTIFY, .attach_adapter = &foo_attach_adapter, .detach_client = &foo_detach_client, @@ -37,12 +36,6 @@ static struct i2c_driver foo_driver = { The name can be chosen freely, and may be upto 40 characters long. Please use something descriptive here. -If used, the id should be a unique ID. The range 0xf000 to 0xffff is -reserved for local use, and you can use one of those until you start -distributing the driver, at which time you should contact the i2c authors -to get your own ID(s). Note that most of the time you don't need an ID -at all so you can just omit it. - Don't worry about the flags field; just put I2C_DF_NOTIFY into it. This means that your driver will be notified when new adapters are found. This is almost always what you want. diff --git a/Documentation/infiniband/core_locking.txt b/Documentation/infiniband/core_locking.txt new file mode 100644 index 00000000000..e1678542279 --- /dev/null +++ b/Documentation/infiniband/core_locking.txt @@ -0,0 +1,114 @@ +INFINIBAND MIDLAYER LOCKING + + This guide is an attempt to make explicit the locking assumptions + made by the InfiniBand midlayer. It describes the requirements on + both low-level drivers that sit below the midlayer and upper level + protocols that use the midlayer. + +Sleeping and interrupt context + + With the following exceptions, a low-level driver implementation of + all of the methods in struct ib_device may sleep. The exceptions + are any methods from the list: + + create_ah + modify_ah + query_ah + destroy_ah + bind_mw + post_send + post_recv + poll_cq + req_notify_cq + map_phys_fmr + + which may not sleep and must be callable from any context. + + The corresponding functions exported to upper level protocol + consumers: + + ib_create_ah + ib_modify_ah + ib_query_ah + ib_destroy_ah + ib_bind_mw + ib_post_send + ib_post_recv + ib_req_notify_cq + ib_map_phys_fmr + + are therefore safe to call from any context. + + In addition, the function + + ib_dispatch_event + + used by low-level drivers to dispatch asynchronous events through + the midlayer is also safe to call from any context. + +Reentrancy + + All of the methods in struct ib_device exported by a low-level + driver must be fully reentrant. The low-level driver is required to + perform all synchronization necessary to maintain consistency, even + if multiple function calls using the same object are run + simultaneously. + + The IB midlayer does not perform any serialization of function calls. + + Because low-level drivers are reentrant, upper level protocol + consumers are not required to perform any serialization. However, + some serialization may be required to get sensible results. For + example, a consumer may safely call ib_poll_cq() on multiple CPUs + simultaneously. However, the ordering of the work completion + information between different calls of ib_poll_cq() is not defined. + +Callbacks + + A low-level driver must not perform a callback directly from the + same callchain as an ib_device method call. For example, it is not + allowed for a low-level driver to call a consumer's completion event + handler directly from its post_send method. Instead, the low-level + driver should defer this callback by, for example, scheduling a + tasklet to perform the callback. + + The low-level driver is responsible for ensuring that multiple + completion event handlers for the same CQ are not called + simultaneously. The driver must guarantee that only one CQ event + handler for a given CQ is running at a time. In other words, the + following situation is not allowed: + + CPU1 CPU2 + + low-level driver -> + consumer CQ event callback: + /* ... */ + ib_req_notify_cq(cq, ...); + low-level driver -> + /* ... */ consumer CQ event callback: + /* ... */ + return from CQ event handler + + The context in which completion event and asynchronous event + callbacks run is not defined. Depending on the low-level driver, it + may be process context, softirq context, or interrupt context. + Upper level protocol consumers may not sleep in a callback. + +Hot-plug + + A low-level driver announces that a device is ready for use by + consumers when it calls ib_register_device(), all initialization + must be complete before this call. The device must remain usable + until the driver's call to ib_unregister_device() has returned. + + A low-level driver must call ib_register_device() and + ib_unregister_device() from process context. It must not hold any + semaphores that could cause deadlock if a consumer calls back into + the driver across these calls. + + An upper level protocol consumer may begin using an IB device as + soon as the add method of its struct ib_client is called for that + device. A consumer must finish all cleanup and free all resources + relating to a device before returning from the remove method. + + A consumer is permitted to sleep in its add and remove methods. diff --git a/Documentation/infiniband/user_mad.txt b/Documentation/infiniband/user_mad.txt index cae0c83f1ee..750fe5e80eb 100644 --- a/Documentation/infiniband/user_mad.txt +++ b/Documentation/infiniband/user_mad.txt @@ -28,13 +28,37 @@ Creating MAD agents Receiving MADs - MADs are received using read(). The buffer passed to read() must be - large enough to hold at least one struct ib_user_mad. For example: - - struct ib_user_mad mad; - ret = read(fd, &mad, sizeof mad); - if (ret != sizeof mad) + MADs are received using read(). The receive side now supports + RMPP. The buffer passed to read() must be at least one + struct ib_user_mad + 256 bytes. For example: + + If the buffer passed is not large enough to hold the received + MAD (RMPP), the errno is set to ENOSPC and the length of the + buffer needed is set in mad.length. + + Example for normal MAD (non RMPP) reads: + struct ib_user_mad *mad; + mad = malloc(sizeof *mad + 256); + ret = read(fd, mad, sizeof *mad + 256); + if (ret != sizeof mad + 256) { + perror("read"); + free(mad); + } + + Example for RMPP reads: + struct ib_user_mad *mad; + mad = malloc(sizeof *mad + 256); + ret = read(fd, mad, sizeof *mad + 256); + if (ret == -ENOSPC)) { + length = mad.length; + free(mad); + mad = malloc(sizeof *mad + length); + ret = read(fd, mad, sizeof *mad + length); + } + if (ret < 0) { perror("read"); + free(mad); + } In addition to the actual MAD contents, the other struct ib_user_mad fields will be filled in with information on the received MAD. For @@ -50,18 +74,21 @@ Sending MADs MADs are sent using write(). The agent ID for sending should be filled into the id field of the MAD, the destination LID should be - filled into the lid field, and so on. For example: + filled into the lid field, and so on. The send side does support + RMPP so arbitrary length MAD can be sent. For example: + + struct ib_user_mad *mad; - struct ib_user_mad mad; + mad = malloc(sizeof *mad + mad_length); - /* fill in mad.data */ + /* fill in mad->data */ - mad.id = my_agent; /* req.id from agent registration */ - mad.lid = my_dest; /* in network byte order... */ + mad->hdr.id = my_agent; /* req.id from agent registration */ + mad->hdr.lid = my_dest; /* in network byte order... */ /* etc. */ - ret = write(fd, &mad, sizeof mad); - if (ret != sizeof mad) + ret = write(fd, &mad, sizeof *mad + mad_length); + if (ret != sizeof *mad + mad_length) perror("write"); Setting IsSM Capability Bit diff --git a/Documentation/infiniband/user_verbs.txt b/Documentation/infiniband/user_verbs.txt new file mode 100644 index 00000000000..f847501e50b --- /dev/null +++ b/Documentation/infiniband/user_verbs.txt @@ -0,0 +1,69 @@ +USERSPACE VERBS ACCESS + + The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS, + enables direct userspace access to IB hardware via "verbs," as + described in chapter 11 of the InfiniBand Architecture Specification. + + To use the verbs, the libibverbs library, available from + <http://openib.org/>, is required. libibverbs contains a + device-independent API for using the ib_uverbs interface. + libibverbs also requires appropriate device-dependent kernel and + userspace driver for your InfiniBand hardware. For example, to use + a Mellanox HCA, you will need the ib_mthca kernel module and the + libmthca userspace driver be installed. + +User-kernel communication + + Userspace communicates with the kernel for slow path, resource + management operations via the /dev/infiniband/uverbsN character + devices. Fast path operations are typically performed by writing + directly to hardware registers mmap()ed into userspace, with no + system call or context switch into the kernel. + + Commands are sent to the kernel via write()s on these device files. + The ABI is defined in drivers/infiniband/include/ib_user_verbs.h. + The structs for commands that require a response from the kernel + contain a 64-bit field used to pass a pointer to an output buffer. + Status is returned to userspace as the return value of the write() + system call. + +Resource management + + Since creation and destruction of all IB resources is done by + commands passed through a file descriptor, the kernel can keep track + of which resources are attached to a given userspace context. The + ib_uverbs module maintains idr tables that are used to translate + between kernel pointers and opaque userspace handles, so that kernel + pointers are never exposed to userspace and userspace cannot trick + the kernel into following a bogus pointer. + + This also allows the kernel to clean up when a process exits and + prevent one process from touching another process's resources. + +Memory pinning + + Direct userspace I/O requires that memory regions that are potential + I/O targets be kept resident at the same physical address. The + ib_uverbs module manages pinning and unpinning memory regions via + get_user_pages() and put_page() calls. It also accounts for the + amount of memory pinned in the process's locked_vm, and checks that + unprivileged processes do not exceed their RLIMIT_MEMLOCK limit. + + Pages that are pinned multiple times are counted each time they are + pinned, so the value of locked_vm may be an overestimate of the + number of pages pinned by a process. + +/dev files + + To create the appropriate character device files automatically with + udev, a rule like + + KERNEL="uverbs*", NAME="infiniband/%k" + + can be used. This will create device nodes named + + /dev/infiniband/uverbs0 + + and so on. Since the InfiniBand userspace verbs should be safe for + use by non-privileged processes, it may be useful to add an + appropriate MODE or GROUP to the udev rule. diff --git a/Documentation/kdump/gdbmacros.txt b/Documentation/kdump/gdbmacros.txt new file mode 100644 index 00000000000..bc1b9eb92ae --- /dev/null +++ b/Documentation/kdump/gdbmacros.txt @@ -0,0 +1,179 @@ +# +# This file contains a few gdb macros (user defined commands) to extract +# useful information from kernel crashdump (kdump) like stack traces of +# all the processes or a particular process and trapinfo. +# +# These macros can be used by copying this file in .gdbinit (put in home +# directory or current directory) or by invoking gdb command with +# --command=<command-file-name> option +# +# Credits: +# Alexander Nyberg <alexn@telia.com> +# V Srivatsa <vatsa@in.ibm.com> +# Maneesh Soni <maneesh@in.ibm.com> +# + +define bttnobp + set $tasks_off=((size_t)&((struct task_struct *)0)->tasks) + set $pid_off=((size_t)&((struct task_struct *)0)->pids[1].pid_list.next) + set $init_t=&init_task + set $next_t=(((char *)($init_t->tasks).next) - $tasks_off) + while ($next_t != $init_t) + set $next_t=(struct task_struct *)$next_t + printf "\npid %d; comm %s:\n", $next_t.pid, $next_t.comm + printf "===================\n" + set var $stackp = $next_t.thread.esp + set var $stack_top = ($stackp & ~4095) + 4096 + + while ($stackp < $stack_top) + if (*($stackp) > _stext && *($stackp) < _sinittext) + info symbol *($stackp) + end + set $stackp += 4 + end + set $next_th=(((char *)$next_t->pids[1].pid_list.next) - $pid_off) + while ($next_th != $next_t) + set $next_th=(struct task_struct *)$next_th + printf "\npid %d; comm %s:\n", $next_t.pid, $next_t.comm + printf "===================\n" + set var $stackp = $next_t.thread.esp + set var $stack_top = ($stackp & ~4095) + 4096 + + while ($stackp < $stack_top) + if (*($stackp) > _stext && *($stackp) < _sinittext) + info symbol *($stackp) + end + set $stackp += 4 + end + set $next_th=(((char *)$next_th->pids[1].pid_list.next) - $pid_off) + end + set $next_t=(char *)($next_t->tasks.next) - $tasks_off + end +end +document bttnobp + dump all thread stack traces on a kernel compiled with !CONFIG_FRAME_POINTER +end + +define btt + set $tasks_off=((size_t)&((struct task_struct *)0)->tasks) + set $pid_off=((size_t)&((struct task_struct *)0)->pids[1].pid_list.next) + set $init_t=&init_task + set $next_t=(((char *)($init_t->tasks).next) - $tasks_off) + while ($next_t != $init_t) + set $next_t=(struct task_struct *)$next_t + printf "\npid %d; comm %s:\n", $next_t.pid, $next_t.comm + printf "===================\n" + set var $stackp = $next_t.thread.esp + set var $stack_top = ($stackp & ~4095) + 4096 + set var $stack_bot = ($stackp & ~4095) + + set $stackp = *($stackp) + while (($stackp < $stack_top) && ($stackp > $stack_bot)) + set var $addr = *($stackp + 4) + info symbol $addr + set $stackp = *($stackp) + end + + set $next_th=(((char *)$next_t->pids[1].pid_list.next) - $pid_off) + while ($next_th != $next_t) + set $next_th=(struct task_struct *)$next_th + printf "\npid %d; comm %s:\n", $next_t.pid, $next_t.comm + printf "===================\n" + set var $stackp = $next_t.thread.esp + set var $stack_top = ($stackp & ~4095) + 4096 + set var $stack_bot = ($stackp & ~4095) + + set $stackp = *($stackp) + while (($stackp < $stack_top) && ($stackp > $stack_bot)) + set var $addr = *($stackp + 4) + info symbol $addr + set $stackp = *($stackp) + end + set $next_th=(((char *)$next_th->pids[1].pid_list.next) - $pid_off) + end + set $next_t=(char *)($next_t->tasks.next) - $tasks_off + end +end +document btt + dump all thread stack traces on a kernel compiled with CONFIG_FRAME_POINTER +end + +define btpid + set var $pid = $arg0 + set $tasks_off=((size_t)&((struct task_struct *)0)->tasks) + set $pid_off=((size_t)&((struct task_struct *)0)->pids[1].pid_list.next) + set $init_t=&init_task + set $next_t=(((char *)($init_t->tasks).next) - $tasks_off) + set var $pid_task = 0 + + while ($next_t != $init_t) + set $next_t=(struct task_struct *)$next_t + + if ($next_t.pid == $pid) + set $pid_task = $next_t + end + + set $next_th=(((char *)$next_t->pids[1].pid_list.next) - $pid_off) + while ($next_th != $next_t) + set $next_th=(struct task_struct *)$next_th + if ($next_th.pid == $pid) + set $pid_task = $next_th + end + set $next_th=(((char *)$next_th->pids[1].pid_list.next) - $pid_off) + end + set $next_t=(char *)($next_t->tasks.next) - $tasks_off + end + + printf "\npid %d; comm %s:\n", $pid_task.pid, $pid_task.comm + printf "===================\n" + set var $stackp = $pid_task.thread.esp + set var $stack_top = ($stackp & ~4095) + 4096 + set var $stack_bot = ($stackp & ~4095) + + set $stackp = *($stackp) + while (($stackp < $stack_top) && ($stackp > $stack_bot)) + set var $addr = *($stackp + 4) + info symbol $addr + set $stackp = *($stackp) + end +end +document btpid + backtrace of pid +end + + +define trapinfo + set var $pid = $arg0 + set $tasks_off=((size_t)&((struct task_struct *)0)->tasks) + set $pid_off=((size_t)&((struct task_struct *)0)->pids[1].pid_list.next) + set $init_t=&init_task + set $next_t=(((char *)($init_t->tasks).next) - $tasks_off) + set var $pid_task = 0 + + while ($next_t != $init_t) + set $next_t=(struct task_struct *)$next_t + + if ($next_t.pid == $pid) + set $pid_task = $next_t + end + + set $next_th=(((char *)$next_t->pids[1].pid_list.next) - $pid_off) + while ($next_th != $next_t) + set $next_th=(struct task_struct *)$next_th + if ($next_th.pid == $pid) + set $pid_task = $next_th + end + set $next_th=(((char *)$next_th->pids[1].pid_list.next) - $pid_off) + end + set $next_t=(char *)($next_t->tasks.next) - $tasks_off + end + + printf "Trapno %ld, cr2 0x%lx, error_code %ld\n", $pid_task.thread.trap_no, \ + $pid_task.thread.cr2, $pid_task.thread.error_code + +end +document trapinfo + Run info threads and lookup pid of thread #1 + 'trapinfo <pid>' will tell you by which trap & possibly + addresthe kernel paniced. +end diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt new file mode 100644 index 00000000000..7ff213f4bec --- /dev/null +++ b/Documentation/kdump/kdump.txt @@ -0,0 +1,141 @@ +Documentation for kdump - the kexec-based crash dumping solution +================================================================ + +DESIGN +====== + +Kdump uses kexec to reboot to a second kernel whenever a dump needs to be taken. +This second kernel is booted with very little memory. The first kernel reserves +the section of memory that the second kernel uses. This ensures that on-going +DMA from the first kernel does not corrupt the second kernel. + +All the necessary information about Core image is encoded in ELF format and +stored in reserved area of memory before crash. Physical address of start of +ELF header is passed to new kernel through command line parameter elfcorehdr=. + +On i386, the first 640 KB of physical memory is needed to boot, irrespective +of where the kernel loads. Hence, this region is backed up by kexec just before +rebooting into the new kernel. + +In the second kernel, "old memory" can be accessed in two ways. + +- The first one is through a /dev/oldmem device interface. A capture utility + can read the device file and write out the memory in raw format. This is raw + dump of memory and analysis/capture tool should be intelligent enough to + determine where to look for the right information. ELF headers (elfcorehdr=) + can become handy here. + +- The second interface is through /proc/vmcore. This exports the dump as an ELF + format file which can be written out using any file copy command + (cp, scp, etc). Further, gdb can be used to perform limited debugging on + the dump file. This method ensures methods ensure that there is correct + ordering of the dump pages (corresponding to the first 640 KB that has been + relocated). + +SETUP +===== + +1) Download http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz + and apply http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-kdump.patch + and after that build the source. + +2) Download and build the appropriate (latest) kexec/kdump (-mm) kernel + patchset and apply it to the vanilla kernel tree. + + Two kernels need to be built in order to get this feature working. + + A) First kernel: + a) Enable "kexec system call" feature (in Processor type and features). + CONFIG_KEXEC=y + b) This kernel's physical load address should be the default value of + 0x100000 (0x100000, 1 MB) (in Processor type and features). + CONFIG_PHYSICAL_START=0x100000 + c) Enable "sysfs file system support" (in Pseudo filesystems). + CONFIG_SYSFS=y + d) Boot into first kernel with the command line parameter "crashkernel=Y@X". + Use appropriate values for X and Y. Y denotes how much memory to reserve + for the second kernel, and X denotes at what physical address the reserved + memory section starts. For example: "crashkernel=64M@16M". + + B) Second kernel: + a) Enable "kernel crash dumps" feature (in Processor type and features). + CONFIG_CRASH_DUMP=y + b) Specify a suitable value for "Physical address where the kernel is + loaded" (in Processor type and features). Typically this value + should be same as X (See option d) above, e.g., 16 MB or 0x1000000. + CONFIG_PHYSICAL_START=0x1000000 + c) Enable "/proc/vmcore support" (Optional, in Pseudo filesystems). + CONFIG_PROC_VMCORE=y + d) Disable SMP support and build a UP kernel (Until it is fixed). + CONFIG_SMP=n + e) Enable "Local APIC support on uniprocessors". + CONFIG_X86_UP_APIC=y + f) Enable "IO-APIC support on uniprocessors" + CONFIG_X86_UP_IOAPIC=y + + Note: i) Options a) and b) depend upon "Configure standard kernel features + (for small systems)" (under General setup). + ii) Option a) also depends on CONFIG_HIGHMEM (under Processor + type and features). + iii) Both option a) and b) are under "Processor type and features". + +3) Boot into the first kernel. You are now ready to try out kexec-based crash + dumps. + +4) Load the second kernel to be booted using: + + kexec -p <second-kernel> --crash-dump --args-linux --append="root=<root-dev> + init 1 irqpoll" + + Note: i) <second-kernel> has to be a vmlinux image. bzImage will not work, + as of now. + ii) By default ELF headers are stored in ELF32 format (for i386). This + is sufficient to represent the physical memory up to 4GB. To store + headers in ELF64 format, specifiy "--elf64-core-headers" on the + kexec command line additionally. + iii) Specify "irqpoll" as command line parameter. This reduces driver + initialization failures in second kernel due to shared interrupts. + +5) System reboots into the second kernel when a panic occurs. A module can be + written to force the panic or "ALT-SysRq-c" can be used initiate a crash + dump for testing purposes. + +6) Write out the dump file using + + cp /proc/vmcore <dump-file> + + Dump memory can also be accessed as a /dev/oldmem device for a linear/raw + view. To create the device, type: + + mknod /dev/oldmem c 1 12 + + Use "dd" with suitable options for count, bs and skip to access specific + portions of the dump. + + Entire memory: dd if=/dev/oldmem of=oldmem.001 + +ANALYSIS +======== + +Limited analysis can be done using gdb on the dump file copied out of +/proc/vmcore. Use vmlinux built with -g and run + + gdb vmlinux <dump-file> + +Stack trace for the task on processor 0, register display, memory display +work fine. + +Note: gdb cannot analyse core files generated in ELF64 format for i386. + +TODO +==== + +1) Provide a kernel pages filtering mechanism so that core file size is not + insane on systems having huge memory banks. +2) Modify "crash" tool to make it recognize this dump. + +CONTACT +======= + +Vivek Goyal (vgoyal@in.ibm.com) +Maneesh Soni (maneesh@in.ibm.com) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 4924d387a65..3d5cd7a09b2 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -37,7 +37,7 @@ restrictions referred to are that the relevant option is valid if: IA-32 IA-32 aka i386 architecture is enabled. IA-64 IA-64 architecture is enabled. IOSCHED More than one I/O scheduler is enabled. - IP_PNP IP DCHP, BOOTP, or RARP is enabled. + IP_PNP IP DHCP, BOOTP, or RARP is enabled. ISAPNP ISA PnP code is enabled. ISDN Appropriate ISDN support is enabled. JOY Appropriate joystick support is enabled. @@ -159,6 +159,11 @@ running once the system is up. acpi_fake_ecdt [HW,ACPI] Workaround failure due to BIOS lacking ECDT + acpi_generic_hotkey [HW,ACPI] + Allow consolidated generic hotkey driver to + over-ride platform specific driver. + See also Documentation/acpi-hotkey.txt. + ad1816= [HW,OSS] Format: <io>,<irq>,<dma>,<dma2> See also Documentation/sound/oss/AD1816. @@ -358,6 +363,10 @@ running once the system is up. cpia_pp= [HW,PPT] Format: { parport<nr> | auto | none } + crashkernel=nn[KMG]@ss[KMG] + [KNL] Reserve a chunk of physical memory to + hold a kernel to switch to with kexec on panic. + cs4232= [HW,OSS] Format: <io>,<irq>,<dma>,<dma2>,<mpuio>,<mpuirq> @@ -447,6 +456,10 @@ running once the system is up. Format: {"as"|"cfq"|"deadline"|"noop"} See Documentation/block/as-iosched.txt and Documentation/block/deadline-iosched.txt for details. + elfcorehdr= [IA-32] + Specifies physical address of start of kernel core image + elf header. + See Documentation/kdump.txt for details. enforcing [SELINUX] Set initial enforcing status. Format: {"0" | "1"} @@ -548,6 +561,9 @@ running once the system is up. i810= [HW,DRM] + i8k.ignore_dmi [HW] Continue probing hardware even if DMI data + indicates that the driver is running on unsupported + hardware. i8k.force [HW] Activate i8k driver even if SMM BIOS signature does not match list of supported models. i8k.power_status @@ -611,6 +627,17 @@ running once the system is up. ips= [HW,SCSI] Adaptec / IBM ServeRAID controller See header of drivers/scsi/ips.c. + irqfixup [HW] + When an interrupt is not handled search all handlers + for it. Intended to get systems with badly broken + firmware running. + + irqpoll [HW] + When an interrupt is not handled search all handlers + for it. Also check all handlers each timer + interrupt. Intended to get systems with badly broken + firmware running. + isapnp= [ISAPNP] Format: <RDP>, <reset>, <pci_scan>, <verbosity> @@ -736,6 +763,9 @@ running once the system is up. maxcpus= [SMP] Maximum number of processors that an SMP kernel should make use of + max_addr=[KMG] [KNL,BOOT,ia64] All physical memory greater than or + equal to this physical address is ignored. + max_luns= [SCSI] Maximum number of LUNs to probe Should be between 1 and 2^32-1. @@ -1019,6 +1049,10 @@ running once the system is up. irqmask=0xMMMM [IA-32] Set a bit mask of IRQs allowed to be assigned automatically to PCI devices. You can make the kernel exclude IRQs of your ISA cards this way. + pirqaddr=0xAAAAA [IA-32] Specify the physical address + of the PIRQ table (normally generated + by the BIOS) if it is outside the + F0000h-100000h range. lastbus=N [IA-32] Scan all buses till bus #N. Can be useful if the kernel is unable to find your secondary buses and you want to tell it explicitly which ones they are. @@ -1104,7 +1138,7 @@ running once the system is up. See Documentation/ramdisk.txt. psmouse.proto= [HW,MOUSE] Highest PS2 mouse protocol extension to - probe for (bare|imps|exps). + probe for (bare|imps|exps|lifebook|any). psmouse.rate= [HW,MOUSE] Set desired mouse report rate, in reports per second. psmouse.resetafter= diff --git a/Documentation/keys.txt b/Documentation/keys.txt index 36d80aeeaf2..0321ded4b9a 100644 --- a/Documentation/keys.txt +++ b/Documentation/keys.txt @@ -22,6 +22,7 @@ This document has the following sections: - New procfs files - Userspace system call interface - Kernel services + - Notes on accessing payload contents - Defining a key type - Request-key callback service - Key access filesystem @@ -45,27 +46,26 @@ Each key has a number of attributes: - State. - (*) Each key is issued a serial number of type key_serial_t that is unique - for the lifetime of that key. All serial numbers are positive non-zero - 32-bit integers. + (*) Each key is issued a serial number of type key_serial_t that is unique for + the lifetime of that key. All serial numbers are positive non-zero 32-bit + integers. Userspace programs can use a key's serial numbers as a way to gain access to it, subject to permission checking. (*) Each key is of a defined "type". Types must be registered inside the - kernel by a kernel service (such as a filesystem) before keys of that - type can be added or used. Userspace programs cannot define new types - directly. + kernel by a kernel service (such as a filesystem) before keys of that type + can be added or used. Userspace programs cannot define new types directly. - Key types are represented in the kernel by struct key_type. This defines - a number of operations that can be performed on a key of that type. + Key types are represented in the kernel by struct key_type. This defines a + number of operations that can be performed on a key of that type. Should a type be removed from the system, all the keys of that type will be invalidated. (*) Each key has a description. This should be a printable string. The key - type provides an operation to perform a match between the description on - a key and a criterion string. + type provides an operation to perform a match between the description on a + key and a criterion string. (*) Each key has an owner user ID, a group ID and a permissions mask. These are used to control what a process may do to a key from userspace, and @@ -74,10 +74,10 @@ Each key has a number of attributes: (*) Each key can be set to expire at a specific time by the key type's instantiation function. Keys can also be immortal. - (*) Each key can have a payload. This is a quantity of data that represent - the actual "key". In the case of a keyring, this is a list of keys to - which the keyring links; in the case of a user-defined key, it's an - arbitrary blob of data. + (*) Each key can have a payload. This is a quantity of data that represent the + actual "key". In the case of a keyring, this is a list of keys to which + the keyring links; in the case of a user-defined key, it's an arbitrary + blob of data. Having a payload is not required; and the payload can, in fact, just be a value stored in the struct key itself. @@ -92,8 +92,8 @@ Each key has a number of attributes: (*) Each key can be in one of a number of basic states: - (*) Uninstantiated. The key exists, but does not have any data - attached. Keys being requested from userspace will be in this state. + (*) Uninstantiated. The key exists, but does not have any data attached. + Keys being requested from userspace will be in this state. (*) Instantiated. This is the normal state. The key is fully formed, and has data attached. @@ -140,10 +140,10 @@ The key service provides a number of features besides keys: clone, fork, vfork or execve occurs. A new keyring is created only when required. - The process-specific keyring is replaced with an empty one in the child - on clone, fork, vfork unless CLONE_THREAD is supplied, in which case it - is shared. execve also discards the process's process keyring and creates - a new one. + The process-specific keyring is replaced with an empty one in the child on + clone, fork, vfork unless CLONE_THREAD is supplied, in which case it is + shared. execve also discards the process's process keyring and creates a + new one. The session-specific keyring is persistent across clone, fork, vfork and execve, even when the latter executes a set-UID or set-GID binary. A @@ -177,11 +177,11 @@ The key service provides a number of features besides keys: If a system call that modifies a key or keyring in some way would put the user over quota, the operation is refused and error EDQUOT is returned. - (*) There's a system call interface by which userspace programs can create - and manipulate keys and keyrings. + (*) There's a system call interface by which userspace programs can create and + manipulate keys and keyrings. - (*) There's a kernel interface by which services can register types and - search for keys. + (*) There's a kernel interface by which services can register types and search + for keys. (*) There's a way for the a search done from the kernel to call back to userspace to request a key that can't be found in a process's keyrings. @@ -194,9 +194,9 @@ The key service provides a number of features besides keys: KEY ACCESS PERMISSIONS ====================== -Keys have an owner user ID, a group access ID, and a permissions mask. The -mask has up to eight bits each for user, group and other access. Only five of -each set of eight bits are defined. These permissions granted are: +Keys have an owner user ID, a group access ID, and a permissions mask. The mask +has up to eight bits each for user, group and other access. Only five of each +set of eight bits are defined. These permissions granted are: (*) View @@ -210,8 +210,8 @@ each set of eight bits are defined. These permissions granted are: (*) Write - This permits a key's payload to be instantiated or updated, or it allows - a link to be added to or removed from a keyring. + This permits a key's payload to be instantiated or updated, or it allows a + link to be added to or removed from a keyring. (*) Search @@ -238,8 +238,8 @@ about the status of the key service: (*) /proc/keys This lists all the keys on the system, giving information about their - type, description and permissions. The payload of the key is not - available this way: + type, description and permissions. The payload of the key is not available + this way: SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY 00000001 I----- 39 perm 1f0000 0 0 keyring _uid_ses.0: 1/4 @@ -318,21 +318,21 @@ The main syscalls are: If a key of the same type and description as that proposed already exists in the keyring, this will try to update it with the given payload, or it will return error EEXIST if that function is not supported by the key - type. The process must also have permission to write to the key to be - able to update it. The new key will have all user permissions granted and - no group or third party permissions. + type. The process must also have permission to write to the key to be able + to update it. The new key will have all user permissions granted and no + group or third party permissions. - Otherwise, this will attempt to create a new key of the specified type - and description, and to instantiate it with the supplied payload and - attach it to the keyring. In this case, an error will be generated if the - process does not have permission to write to the keyring. + Otherwise, this will attempt to create a new key of the specified type and + description, and to instantiate it with the supplied payload and attach it + to the keyring. In this case, an error will be generated if the process + does not have permission to write to the keyring. The payload is optional, and the pointer can be NULL if not required by the type. The payload is plen in size, and plen can be zero for an empty payload. - A new keyring can be generated by setting type "keyring", the keyring - name as the description (or NULL) and setting the payload to NULL. + A new keyring can be generated by setting type "keyring", the keyring name + as the description (or NULL) and setting the payload to NULL. User defined keys can be created by specifying type "user". It is recommended that a user defined key's description by prefixed with a type @@ -369,9 +369,9 @@ The keyctl syscall functions are: key_serial_t keyctl(KEYCTL_GET_KEYRING_ID, key_serial_t id, int create); - The special key specified by "id" is looked up (with the key being - created if necessary) and the ID of the key or keyring thus found is - returned if it exists. + The special key specified by "id" is looked up (with the key being created + if necessary) and the ID of the key or keyring thus found is returned if + it exists. If the key does not yet exist, the key will be created if "create" is non-zero; and the error ENOKEY will be returned if "create" is zero. @@ -402,8 +402,8 @@ The keyctl syscall functions are: This will try to update the specified key with the given payload, or it will return error EOPNOTSUPP if that function is not supported by the key - type. The process must also have permission to write to the key to be - able to update it. + type. The process must also have permission to write to the key to be able + to update it. The payload is of length plen, and may be absent or empty as for add_key(). @@ -422,8 +422,8 @@ The keyctl syscall functions are: long keyctl(KEYCTL_CHOWN, key_serial_t key, uid_t uid, gid_t gid); - This function permits a key's owner and group ID to be changed. Either - one of uid or gid can be set to -1 to suppress that change. + This function permits a key's owner and group ID to be changed. Either one + of uid or gid can be set to -1 to suppress that change. Only the superuser can change a key's owner to something other than the key's current owner. Similarly, only the superuser can change a key's @@ -484,12 +484,12 @@ The keyctl syscall functions are: long keyctl(KEYCTL_LINK, key_serial_t keyring, key_serial_t key); - This function creates a link from the keyring to the key. The process - must have write permission on the keyring and must have link permission - on the key. + This function creates a link from the keyring to the key. The process must + have write permission on the keyring and must have link permission on the + key. - Should the keyring not be a keyring, error ENOTDIR will result; and if - the keyring is full, error ENFILE will result. + Should the keyring not be a keyring, error ENOTDIR will result; and if the + keyring is full, error ENFILE will result. The link procedure checks the nesting of the keyrings, returning ELOOP if it appears to deep or EDEADLK if the link would introduce a cycle. @@ -503,8 +503,8 @@ The keyctl syscall functions are: specified key, and removes it if found. Subsequent links to that key are ignored. The process must have write permission on the keyring. - If the keyring is not a keyring, error ENOTDIR will result; and if the - key is not present, error ENOENT will be the result. + If the keyring is not a keyring, error ENOTDIR will result; and if the key + is not present, error ENOENT will be the result. (*) Search a keyring tree for a key: @@ -513,9 +513,9 @@ The keyctl syscall functions are: const char *type, const char *description, key_serial_t dest_keyring); - This searches the keyring tree headed by the specified keyring until a - key is found that matches the type and description criteria. Each keyring - is checked for keys before recursion into its children occurs. + This searches the keyring tree headed by the specified keyring until a key + is found that matches the type and description criteria. Each keyring is + checked for keys before recursion into its children occurs. The process must have search permission on the top level keyring, or else error EACCES will result. Only keyrings that the process has search @@ -549,8 +549,8 @@ The keyctl syscall functions are: As much of the data as can be fitted into the buffer will be copied to userspace if the buffer pointer is not NULL. - On a successful return, the function will always return the amount of - data available rather than the amount copied. + On a successful return, the function will always return the amount of data + available rather than the amount copied. (*) Instantiate a partially constructed key. @@ -568,8 +568,8 @@ The keyctl syscall functions are: it, and the key must be uninstantiated. If a keyring is specified (non-zero), the key will also be linked into - that keyring, however all the constraints applying in KEYCTL_LINK apply - in this case too. + that keyring, however all the constraints applying in KEYCTL_LINK apply in + this case too. The payload and plen arguments describe the payload data as for add_key(). @@ -587,8 +587,39 @@ The keyctl syscall functions are: it, and the key must be uninstantiated. If a keyring is specified (non-zero), the key will also be linked into - that keyring, however all the constraints applying in KEYCTL_LINK apply - in this case too. + that keyring, however all the constraints applying in KEYCTL_LINK apply in + this case too. + + + (*) Set the default request-key destination keyring. + + long keyctl(KEYCTL_SET_REQKEY_KEYRING, int reqkey_defl); + + This sets the default keyring to which implicitly requested keys will be + attached for this thread. reqkey_defl should be one of these constants: + + CONSTANT VALUE NEW DEFAULT KEYRING + ====================================== ====== ======================= + KEY_REQKEY_DEFL_NO_CHANGE -1 No change + KEY_REQKEY_DEFL_DEFAULT 0 Default[1] + KEY_REQKEY_DEFL_THREAD_KEYRING 1 Thread keyring + KEY_REQKEY_DEFL_PROCESS_KEYRING 2 Process keyring + KEY_REQKEY_DEFL_SESSION_KEYRING 3 Session keyring + KEY_REQKEY_DEFL_USER_KEYRING 4 User keyring + KEY_REQKEY_DEFL_USER_SESSION_KEYRING 5 User session keyring + KEY_REQKEY_DEFL_GROUP_KEYRING 6 Group keyring + + The old default will be returned if successful and error EINVAL will be + returned if reqkey_defl is not one of the above values. + + The default keyring can be overridden by the keyring indicated to the + request_key() system call. + + Note that this setting is inherited across fork/exec. + + [1] The default default is: the thread keyring if there is one, otherwise + the process keyring if there is one, otherwise the session keyring if + there is one, otherwise the user default session keyring. =============== @@ -601,17 +632,14 @@ be broken down into two areas: keys and key types. Dealing with keys is fairly straightforward. Firstly, the kernel service registers its type, then it searches for a key of that type. It should retain the key as long as it has need of it, and then it should release it. For a -filesystem or device file, a search would probably be performed during the -open call, and the key released upon close. How to deal with conflicting keys -due to two different users opening the same file is left to the filesystem -author to solve. - -When accessing a key's payload data, key->lock should be at least read locked, -or else the data may be changed by an update being performed from userspace -whilst the driver or filesystem is trying to access it. If no update method is -supplied, then the key's payload may be accessed without holding a lock as -there is no way to change it, provided it can be guaranteed that the key's -type definition won't go away. +filesystem or device file, a search would probably be performed during the open +call, and the key released upon close. How to deal with conflicting keys due to +two different users opening the same file is left to the filesystem author to +solve. + +When accessing a key's payload contents, certain precautions must be taken to +prevent access vs modification races. See the section "Notes on accessing +payload contents" for more information. (*) To search for a key, call: @@ -629,6 +657,9 @@ type definition won't go away. Should the function fail error ENOKEY, EKEYEXPIRED or EKEYREVOKED will be returned. + If successful, the key will have been attached to the default keyring for + implicitly obtained request-key keys, as set by KEYCTL_SET_REQKEY_KEYRING. + (*) When it is no longer required, the key should be released using: @@ -690,6 +721,54 @@ type definition won't go away. void unregister_key_type(struct key_type *type); +=================================== +NOTES ON ACCESSING PAYLOAD CONTENTS +=================================== + +The simplest payload is just a number in key->payload.value. In this case, +there's no need to indulge in RCU or locking when accessing the payload. + +More complex payload contents must be allocated and a pointer to them set in +key->payload.data. One of the following ways must be selected to access the +data: + + (1) Unmodifyable key type. + + If the key type does not have a modify method, then the key's payload can + be accessed without any form of locking, provided that it's known to be + instantiated (uninstantiated keys cannot be "found"). + + (2) The key's semaphore. + + The semaphore could be used to govern access to the payload and to control + the payload pointer. It must be write-locked for modifications and would + have to be read-locked for general access. The disadvantage of doing this + is that the accessor may be required to sleep. + + (3) RCU. + + RCU must be used when the semaphore isn't already held; if the semaphore + is held then the contents can't change under you unexpectedly as the + semaphore must still be used to serialise modifications to the key. The + key management code takes care of this for the key type. + + However, this means using: + + rcu_read_lock() ... rcu_dereference() ... rcu_read_unlock() + + to read the pointer, and: + + rcu_dereference() ... rcu_assign_pointer() ... call_rcu() + + to set the pointer and dispose of the old contents after a grace period. + Note that only the key type should ever modify a key's payload. + + Furthermore, an RCU controlled payload must hold a struct rcu_head for the + use of call_rcu() and, if the payload is of variable size, the length of + the payload. key->datalen cannot be relied upon to be consistent with the + payload just dereferenced if the key's semaphore is not held. + + =================== DEFINING A KEY TYPE =================== @@ -717,15 +796,15 @@ The structure has a number of fields, some of which are mandatory: int key_payload_reserve(struct key *key, size_t datalen); - With the revised data length. Error EDQUOT will be returned if this is - not viable. + With the revised data length. Error EDQUOT will be returned if this is not + viable. (*) int (*instantiate)(struct key *key, const void *data, size_t datalen); This method is called to attach a payload to a key during construction. - The payload attached need not bear any relation to the data passed to - this function. + The payload attached need not bear any relation to the data passed to this + function. If the amount of data attached to the key differs from the size in keytype->def_datalen, then key_payload_reserve() should be called. @@ -734,38 +813,47 @@ The structure has a number of fields, some of which are mandatory: The fact that KEY_FLAG_INSTANTIATED is not set in key->flags prevents anything else from gaining access to the key. - This method may sleep if it wishes. + It is safe to sleep in this method. (*) int (*duplicate)(struct key *key, const struct key *source); If this type of key can be duplicated, then this method should be - provided. It is called to copy the payload attached to the source into - the new key. The data length on the new key will have been updated and - the quota adjusted already. + provided. It is called to copy the payload attached to the source into the + new key. The data length on the new key will have been updated and the + quota adjusted already. This method will be called with the source key's semaphore read-locked to - prevent its payload from being changed. It is safe to sleep here. + prevent its payload from being changed, thus RCU constraints need not be + applied to the source key. + + This method does not have to lock the destination key in order to attach a + payload. The fact that KEY_FLAG_INSTANTIATED is not set in key->flags + prevents anything else from gaining access to the key. + + It is safe to sleep in this method. (*) int (*update)(struct key *key, const void *data, size_t datalen); - If this type of key can be updated, then this method should be - provided. It is called to update a key's payload from the blob of data - provided. + If this type of key can be updated, then this method should be provided. + It is called to update a key's payload from the blob of data provided. key_payload_reserve() should be called if the data length might change - before any changes are actually made. Note that if this succeeds, the - type is committed to changing the key because it's already been altered, - so all memory allocation must be done first. + before any changes are actually made. Note that if this succeeds, the type + is committed to changing the key because it's already been altered, so all + memory allocation must be done first. + + The key will have its semaphore write-locked before this method is called, + but this only deters other writers; any changes to the key's payload must + be made under RCU conditions, and call_rcu() must be used to dispose of + the old payload. - key_payload_reserve() should be called with the key->lock write locked, - and the changes to the key's attached payload should be made before the - key is locked. + key_payload_reserve() should be called before the changes are made, but + after all allocations and other potentially failing function calls are + made. - The key will have its semaphore write-locked before this method is - called. Any changes to the key should be made with the key's rwlock - write-locked also. It is safe to sleep here. + It is safe to sleep in this method. (*) int (*match)(const struct key *key, const void *desc); @@ -782,12 +870,12 @@ The structure has a number of fields, some of which are mandatory: (*) void (*destroy)(struct key *key); - This method is optional. It is called to discard the payload data on a - key when it is being destroyed. + This method is optional. It is called to discard the payload data on a key + when it is being destroyed. - This method does not need to lock the key; it can consider the key as - being inaccessible. Note that the key's type may have changed before this - function is called. + This method does not need to lock the key to access the payload; it can + consider the key as being inaccessible at this time. Note that the key's + type may have been changed before this function is called. It is not safe to sleep in this method; the caller may hold spinlocks. @@ -797,26 +885,31 @@ The structure has a number of fields, some of which are mandatory: This method is optional. It is called during /proc/keys reading to summarise a key's description and payload in text form. - This method will be called with the key's rwlock read-locked. This will - prevent the key's payload and state changing; also the description should - not change. This also means it is not safe to sleep in this method. + This method will be called with the RCU read lock held. rcu_dereference() + should be used to read the payload pointer if the payload is to be + accessed. key->datalen cannot be trusted to stay consistent with the + contents of the payload. + + The description will not change, though the key's state may. + + It is not safe to sleep in this method; the RCU read lock is held by the + caller. (*) long (*read)(const struct key *key, char __user *buffer, size_t buflen); This method is optional. It is called by KEYCTL_READ to translate the - key's payload into something a blob of data for userspace to deal - with. Ideally, the blob should be in the same format as that passed in to - the instantiate and update methods. + key's payload into something a blob of data for userspace to deal with. + Ideally, the blob should be in the same format as that passed in to the + instantiate and update methods. If successful, the blob size that could be produced should be returned rather than the size copied. - This method will be called with the key's semaphore read-locked. This - will prevent the key's payload changing. It is not necessary to also - read-lock key->lock when accessing the key's payload. It is safe to sleep - in this method, such as might happen when the userspace buffer is - accessed. + This method will be called with the key's semaphore read-locked. This will + prevent the key's payload changing. It is not necessary to use RCU locking + when accessing the key's payload. It is safe to sleep in this method, such + as might happen when the userspace buffer is accessed. ============================ @@ -853,8 +946,8 @@ If it returns with the key remaining in the unconstructed state, the key will be marked as being negative, it will be added to the session keyring, and an error will be returned to the key requestor. -Supplementary information may be provided from whoever or whatever invoked -this service. This will be passed as the <callout_info> parameter. If no such +Supplementary information may be provided from whoever or whatever invoked this +service. This will be passed as the <callout_info> parameter. If no such information was made available, then "-" will be passed as this parameter instead. diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt new file mode 100644 index 00000000000..0541fe1de70 --- /dev/null +++ b/Documentation/kprobes.txt @@ -0,0 +1,588 @@ +Title : Kernel Probes (Kprobes) +Authors : Jim Keniston <jkenisto@us.ibm.com> + : Prasanna S Panchamukhi <prasanna@in.ibm.com> + +CONTENTS + +1. Concepts: Kprobes, Jprobes, Return Probes +2. Architectures Supported +3. Configuring Kprobes +4. API Reference +5. Kprobes Features and Limitations +6. Probe Overhead +7. TODO +8. Kprobes Example +9. Jprobes Example +10. Kretprobes Example + +1. Concepts: Kprobes, Jprobes, Return Probes + +Kprobes enables you to dynamically break into any kernel routine and +collect debugging and performance information non-disruptively. You +can trap at almost any kernel code address, specifying a handler +routine to be invoked when the breakpoint is hit. + +There are currently three types of probes: kprobes, jprobes, and +kretprobes (also called return probes). A kprobe can be inserted +on virtually any instruction in the kernel. A jprobe is inserted at +the entry to a kernel function, and provides convenient access to the +function's arguments. A return probe fires when a specified function +returns. + +In the typical case, Kprobes-based instrumentation is packaged as +a kernel module. The module's init function installs ("registers") +one or more probes, and the exit function unregisters them. A +registration function such as register_kprobe() specifies where +the probe is to be inserted and what handler is to be called when +the probe is hit. + +The next three subsections explain how the different types of +probes work. They explain certain things that you'll need to +know in order to make the best use of Kprobes -- e.g., the +difference between a pre_handler and a post_handler, and how +to use the maxactive and nmissed fields of a kretprobe. But +if you're in a hurry to start using Kprobes, you can skip ahead +to section 2. + +1.1 How Does a Kprobe Work? + +When a kprobe is registered, Kprobes makes a copy of the probed +instruction and replaces the first byte(s) of the probed instruction +with a breakpoint instruction (e.g., int3 on i386 and x86_64). + +When a CPU hits the breakpoint instruction, a trap occurs, the CPU's +registers are saved, and control passes to Kprobes via the +notifier_call_chain mechanism. Kprobes executes the "pre_handler" +associated with the kprobe, passing the handler the addresses of the +kprobe struct and the saved registers. + +Next, Kprobes single-steps its copy of the probed instruction. +(It would be simpler to single-step the actual instruction in place, +but then Kprobes would have to temporarily remove the breakpoint +instruction. This would open a small time window when another CPU +could sail right past the probepoint.) + +After the instruction is single-stepped, Kprobes executes the +"post_handler," if any, that is associated with the kprobe. +Execution then continues with the instruction following the probepoint. + +1.2 How Does a Jprobe Work? + +A jprobe is implemented using a kprobe that is placed on a function's +entry point. It employs a simple mirroring principle to allow +seamless access to the probed function's arguments. The jprobe +handler routine should have the same signature (arg list and return +type) as the function being probed, and must always end by calling +the Kprobes function jprobe_return(). + +Here's how it works. When the probe is hit, Kprobes makes a copy of +the saved registers and a generous portion of the stack (see below). +Kprobes then points the saved instruction pointer at the jprobe's +handler routine, and returns from the trap. As a result, control +passes to the handler, which is presented with the same register and +stack contents as the probed function. When it is done, the handler +calls jprobe_return(), which traps again to restore the original stack +contents and processor state and switch to the probed function. + +By convention, the callee owns its arguments, so gcc may produce code +that unexpectedly modifies that portion of the stack. This is why +Kprobes saves a copy of the stack and restores it after the jprobe +handler has run. Up to MAX_STACK_SIZE bytes are copied -- e.g., +64 bytes on i386. + +Note that the probed function's args may be passed on the stack +or in registers (e.g., for x86_64 or for an i386 fastcall function). +The jprobe will work in either case, so long as the handler's +prototype matches that of the probed function. + +1.3 How Does a Return Probe Work? + +When you call register_kretprobe(), Kprobes establishes a kprobe at +the entry to the function. When the probed function is called and this +probe is hit, Kprobes saves a copy of the return address, and replaces +the return address with the address of a "trampoline." The trampoline +is an arbitrary piece of code -- typically just a nop instruction. +At boot time, Kprobes registers a kprobe at the trampoline. + +When the probed function executes its return instruction, control +passes to the trampoline and that probe is hit. Kprobes' trampoline +handler calls the user-specified handler associated with the kretprobe, +then sets the saved instruction pointer to the saved return address, +and that's where execution resumes upon return from the trap. + +While the probed function is executing, its return address is +stored in an object of type kretprobe_instance. Before calling +register_kretprobe(), the user sets the maxactive field of the +kretprobe struct to specify how many instances of the specified +function can be probed simultaneously. register_kretprobe() +pre-allocates the indicated number of kretprobe_instance objects. + +For example, if the function is non-recursive and is called with a +spinlock held, maxactive = 1 should be enough. If the function is +non-recursive and can never relinquish the CPU (e.g., via a semaphore +or preemption), NR_CPUS should be enough. If maxactive <= 0, it is +set to a default value. If CONFIG_PREEMPT is enabled, the default +is max(10, 2*NR_CPUS). Otherwise, the default is NR_CPUS. + +It's not a disaster if you set maxactive too low; you'll just miss +some probes. In the kretprobe struct, the nmissed field is set to +zero when the return probe is registered, and is incremented every +time the probed function is entered but there is no kretprobe_instance +object available for establishing the return probe. + +2. Architectures Supported + +Kprobes, jprobes, and return probes are implemented on the following +architectures: + +- i386 +- x86_64 (AMD-64, E64MT) +- ppc64 +- ia64 (Support for probes on certain instruction types is still in progress.) +- sparc64 (Return probes not yet implemented.) + +3. Configuring Kprobes + +When configuring the kernel using make menuconfig/xconfig/oldconfig, +ensure that CONFIG_KPROBES is set to "y". Under "Kernel hacking", +look for "Kprobes". You may have to enable "Kernel debugging" +(CONFIG_DEBUG_KERNEL) before you can enable Kprobes. + +You may also want to ensure that CONFIG_KALLSYMS and perhaps even +CONFIG_KALLSYMS_ALL are set to "y", since kallsyms_lookup_name() +is a handy, version-independent way to find a function's address. + +If you need to insert a probe in the middle of a function, you may find +it useful to "Compile the kernel with debug info" (CONFIG_DEBUG_INFO), +so you can use "objdump -d -l vmlinux" to see the source-to-object +code mapping. + +4. API Reference + +The Kprobes API includes a "register" function and an "unregister" +function for each type of probe. Here are terse, mini-man-page +specifications for these functions and the associated probe handlers +that you'll write. See the latter half of this document for examples. + +4.1 register_kprobe + +#include <linux/kprobes.h> +int register_kprobe(struct kprobe *kp); + +Sets a breakpoint at the address kp->addr. When the breakpoint is +hit, Kprobes calls kp->pre_handler. After the probed instruction +is single-stepped, Kprobe calls kp->post_handler. If a fault +occurs during execution of kp->pre_handler or kp->post_handler, +or during single-stepping of the probed instruction, Kprobes calls +kp->fault_handler. Any or all handlers can be NULL. + +register_kprobe() returns 0 on success, or a negative errno otherwise. + +User's pre-handler (kp->pre_handler): +#include <linux/kprobes.h> +#include <linux/ptrace.h> +int pre_handler(struct kprobe *p, struct pt_regs *regs); + +Called with p pointing to the kprobe associated with the breakpoint, +and regs pointing to the struct containing the registers saved when +the breakpoint was hit. Return 0 here unless you're a Kprobes geek. + +User's post-handler (kp->post_handler): +#include <linux/kprobes.h> +#include <linux/ptrace.h> +void post_handler(struct kprobe *p, struct pt_regs *regs, + unsigned long flags); + +p and regs are as described for the pre_handler. flags always seems +to be zero. + +User's fault-handler (kp->fault_handler): +#include <linux/kprobes.h> +#include <linux/ptrace.h> +int fault_handler(struct kprobe *p, struct pt_regs *regs, int trapnr); + +p and regs are as described for the pre_handler. trapnr is the +architecture-specific trap number associated with the fault (e.g., +on i386, 13 for a general protection fault or 14 for a page fault). +Returns 1 if it successfully handled the exception. + +4.2 register_jprobe + +#include <linux/kprobes.h> +int register_jprobe(struct jprobe *jp) + +Sets a breakpoint at the address jp->kp.addr, which must be the address +of the first instruction of a function. When the breakpoint is hit, +Kprobes runs the handler whose address is jp->entry. + +The handler should have the same arg list and return type as the probed +function; and just before it returns, it must call jprobe_return(). +(The handler never actually returns, since jprobe_return() returns +control to Kprobes.) If the probed function is declared asmlinkage, +fastcall, or anything else that affects how args are passed, the +handler's declaration must match. + +register_jprobe() returns 0 on success, or a negative errno otherwise. + +4.3 register_kretprobe + +#include <linux/kprobes.h> +int register_kretprobe(struct kretprobe *rp); + +Establishes a return probe for the function whose address is +rp->kp.addr. When that function returns, Kprobes calls rp->handler. +You must set rp->maxactive appropriately before you call +register_kretprobe(); see "How Does a Return Probe Work?" for details. + +register_kretprobe() returns 0 on success, or a negative errno +otherwise. + +User's return-probe handler (rp->handler): +#include <linux/kprobes.h> +#include <linux/ptrace.h> +int kretprobe_handler(struct kretprobe_instance *ri, struct pt_regs *regs); + +regs is as described for kprobe.pre_handler. ri points to the +kretprobe_instance object, of which the following fields may be +of interest: +- ret_addr: the return address +- rp: points to the corresponding kretprobe object +- task: points to the corresponding task struct +The handler's return value is currently ignored. + +4.4 unregister_*probe + +#include <linux/kprobes.h> +void unregister_kprobe(struct kprobe *kp); +void unregister_jprobe(struct jprobe *jp); +void unregister_kretprobe(struct kretprobe *rp); + +Removes the specified probe. The unregister function can be called +at any time after the probe has been registered. + +5. Kprobes Features and Limitations + +As of Linux v2.6.12, Kprobes allows multiple probes at the same +address. Currently, however, there cannot be multiple jprobes on +the same function at the same time. + +In general, you can install a probe anywhere in the kernel. +In particular, you can probe interrupt handlers. Known exceptions +are discussed in this section. + +For obvious reasons, it's a bad idea to install a probe in +the code that implements Kprobes (mostly kernel/kprobes.c and +arch/*/kernel/kprobes.c). A patch in the v2.6.13 timeframe instructs +Kprobes to reject such requests. + +If you install a probe in an inline-able function, Kprobes makes +no attempt to chase down all inline instances of the function and +install probes there. gcc may inline a function without being asked, +so keep this in mind if you're not seeing the probe hits you expect. + +A probe handler can modify the environment of the probed function +-- e.g., by modifying kernel data structures, or by modifying the +contents of the pt_regs struct (which are restored to the registers +upon return from the breakpoint). So Kprobes can be used, for example, +to install a bug fix or to inject faults for testing. Kprobes, of +course, has no way to distinguish the deliberately injected faults +from the accidental ones. Don't drink and probe. + +Kprobes makes no attempt to prevent probe handlers from stepping on +each other -- e.g., probing printk() and then calling printk() from a +probe handler. As of Linux v2.6.12, if a probe handler hits a probe, +that second probe's handlers won't be run in that instance. + +In Linux v2.6.12 and previous versions, Kprobes' data structures are +protected by a single lock that is held during probe registration and +unregistration and while handlers are run. Thus, no two handlers +can run simultaneously. To improve scalability on SMP systems, +this restriction will probably be removed soon, in which case +multiple handlers (or multiple instances of the same handler) may +run concurrently on different CPUs. Code your handlers accordingly. + +Kprobes does not use semaphores or allocate memory except during +registration and unregistration. + +Probe handlers are run with preemption disabled. Depending on the +architecture, handlers may also run with interrupts disabled. In any +case, your handler should not yield the CPU (e.g., by attempting to +acquire a semaphore). + +Since a return probe is implemented by replacing the return +address with the trampoline's address, stack backtraces and calls +to __builtin_return_address() will typically yield the trampoline's +address instead of the real return address for kretprobed functions. +(As far as we can tell, __builtin_return_address() is used only +for instrumentation and error reporting.) + +If the number of times a function is called does not match the +number of times it returns, registering a return probe on that +function may produce undesirable results. We have the do_exit() +and do_execve() cases covered. do_fork() is not an issue. We're +unaware of other specific cases where this could be a problem. + +6. Probe Overhead + +On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0 +microseconds to process. Specifically, a benchmark that hits the same +probepoint repeatedly, firing a simple handler each time, reports 1-2 +million hits per second, depending on the architecture. A jprobe or +return-probe hit typically takes 50-75% longer than a kprobe hit. +When you have a return probe set on a function, adding a kprobe at +the entry to that function adds essentially no overhead. + +Here are sample overhead figures (in usec) for different architectures. +k = kprobe; j = jprobe; r = return probe; kr = kprobe + return probe +on same function; jr = jprobe + return probe on same function + +i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips +k = 0.57 usec; j = 1.00; r = 0.92; kr = 0.99; jr = 1.40 + +x86_64: AMD Opteron 246, 1994 MHz, 3971.48 bogomips +k = 0.49 usec; j = 0.76; r = 0.80; kr = 0.82; jr = 1.07 + +ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU) +k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99 + +7. TODO + +a. SystemTap (http://sourceware.org/systemtap): Work in progress +to provide a simplified programming interface for probe-based +instrumentation. +b. Improved SMP scalability: Currently, work is in progress to handle +multiple kprobes in parallel. +c. Kernel return probes for sparc64. +d. Support for other architectures. +e. User-space probes. + +8. Kprobes Example + +Here's a sample kernel module showing the use of kprobes to dump a +stack trace and selected i386 registers when do_fork() is called. +----- cut here ----- +/*kprobe_example.c*/ +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/kprobes.h> +#include <linux/kallsyms.h> +#include <linux/sched.h> + +/*For each probe you need to allocate a kprobe structure*/ +static struct kprobe kp; + +/*kprobe pre_handler: called just before the probed instruction is executed*/ +int handler_pre(struct kprobe *p, struct pt_regs *regs) +{ + printk("pre_handler: p->addr=0x%p, eip=%lx, eflags=0x%lx\n", + p->addr, regs->eip, regs->eflags); + dump_stack(); + return 0; +} + +/*kprobe post_handler: called after the probed instruction is executed*/ +void handler_post(struct kprobe *p, struct pt_regs *regs, unsigned long flags) +{ + printk("post_handler: p->addr=0x%p, eflags=0x%lx\n", + p->addr, regs->eflags); +} + +/* fault_handler: this is called if an exception is generated for any + * instruction within the pre- or post-handler, or when Kprobes + * single-steps the probed instruction. + */ +int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr) +{ + printk("fault_handler: p->addr=0x%p, trap #%dn", + p->addr, trapnr); + /* Return 0 because we don't handle the fault. */ + return 0; +} + +int init_module(void) +{ + int ret; + kp.pre_handler = handler_pre; + kp.post_handler = handler_post; + kp.fault_handler = handler_fault; + kp.addr = (kprobe_opcode_t*) kallsyms_lookup_name("do_fork"); + /* register the kprobe now */ + if (!kp.addr) { + printk("Couldn't find %s to plant kprobe\n", "do_fork"); + return -1; + } + if ((ret = register_kprobe(&kp) < 0)) { + printk("register_kprobe failed, returned %d\n", ret); + return -1; + } + printk("kprobe registered\n"); + return 0; +} + +void cleanup_module(void) +{ + unregister_kprobe(&kp); + printk("kprobe unregistered\n"); +} + +MODULE_LICENSE("GPL"); +----- cut here ----- + +You can build the kernel module, kprobe-example.ko, using the following +Makefile: +----- cut here ----- +obj-m := kprobe-example.o +KDIR := /lib/modules/$(shell uname -r)/build +PWD := $(shell pwd) +default: + $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules +clean: + rm -f *.mod.c *.ko *.o +----- cut here ----- + +$ make +$ su - +... +# insmod kprobe-example.ko + +You will see the trace data in /var/log/messages and on the console +whenever do_fork() is invoked to create a new process. + +9. Jprobes Example + +Here's a sample kernel module showing the use of jprobes to dump +the arguments of do_fork(). +----- cut here ----- +/*jprobe-example.c */ +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/fs.h> +#include <linux/uio.h> +#include <linux/kprobes.h> +#include <linux/kallsyms.h> + +/* + * Jumper probe for do_fork. + * Mirror principle enables access to arguments of the probed routine + * from the probe handler. + */ + +/* Proxy routine having the same arguments as actual do_fork() routine */ +long jdo_fork(unsigned long clone_flags, unsigned long stack_start, + struct pt_regs *regs, unsigned long stack_size, + int __user * parent_tidptr, int __user * child_tidptr) +{ + printk("jprobe: clone_flags=0x%lx, stack_size=0x%lx, regs=0x%p\n", + clone_flags, stack_size, regs); + /* Always end with a call to jprobe_return(). */ + jprobe_return(); + /*NOTREACHED*/ + return 0; +} + +static struct jprobe my_jprobe = { + .entry = (kprobe_opcode_t *) jdo_fork +}; + +int init_module(void) +{ + int ret; + my_jprobe.kp.addr = (kprobe_opcode_t *) kallsyms_lookup_name("do_fork"); + if (!my_jprobe.kp.addr) { + printk("Couldn't find %s to plant jprobe\n", "do_fork"); + return -1; + } + + if ((ret = register_jprobe(&my_jprobe)) <0) { + printk("register_jprobe failed, returned %d\n", ret); + return -1; + } + printk("Planted jprobe at %p, handler addr %p\n", + my_jprobe.kp.addr, my_jprobe.entry); + return 0; +} + +void cleanup_module(void) +{ + unregister_jprobe(&my_jprobe); + printk("jprobe unregistered\n"); +} + +MODULE_LICENSE("GPL"); +----- cut here ----- + +Build and insert the kernel module as shown in the above kprobe +example. You will see the trace data in /var/log/messages and on +the console whenever do_fork() is invoked to create a new process. +(Some messages may be suppressed if syslogd is configured to +eliminate duplicate messages.) + +10. Kretprobes Example + +Here's a sample kernel module showing the use of return probes to +report failed calls to sys_open(). +----- cut here ----- +/*kretprobe-example.c*/ +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/kprobes.h> +#include <linux/kallsyms.h> + +static const char *probed_func = "sys_open"; + +/* Return-probe handler: If the probed function fails, log the return value. */ +static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs) +{ + // Substitute the appropriate register name for your architecture -- + // e.g., regs->rax for x86_64, regs->gpr[3] for ppc64. + int retval = (int) regs->eax; + if (retval < 0) { + printk("%s returns %d\n", probed_func, retval); + } + return 0; +} + +static struct kretprobe my_kretprobe = { + .handler = ret_handler, + /* Probe up to 20 instances concurrently. */ + .maxactive = 20 +}; + +int init_module(void) +{ + int ret; + my_kretprobe.kp.addr = + (kprobe_opcode_t *) kallsyms_lookup_name(probed_func); + if (!my_kretprobe.kp.addr) { + printk("Couldn't find %s to plant return probe\n", probed_func); + return -1; + } + if ((ret = register_kretprobe(&my_kretprobe)) < 0) { + printk("register_kretprobe failed, returned %d\n", ret); + return -1; + } + printk("Planted return probe at %p\n", my_kretprobe.kp.addr); + return 0; +} + +void cleanup_module(void) +{ + unregister_kretprobe(&my_kretprobe); + printk("kretprobe unregistered\n"); + /* nmissed > 0 suggests that maxactive was set too low. */ + printk("Missed probing %d instances of %s\n", + my_kretprobe.nmissed, probed_func); +} + +MODULE_LICENSE("GPL"); +----- cut here ----- + +Build and insert the kernel module as shown in the above kprobe +example. You will see the trace data in /var/log/messages and on the +console whenever sys_open() returns a negative value. (Some messages +may be suppressed if syslogd is configured to eliminate duplicate +messages.) + +For additional information on Kprobes, refer to the following URLs: +http://www-106.ibm.com/developerworks/library/l-kprobes.html?ca=dgr-lnxw42Kprobe +http://www.redhat.com/magazine/005mar05/features/kprobes/ diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index 834993d2673..5b01d5cc4e9 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX @@ -114,9 +114,7 @@ tuntap.txt vortex.txt - info on using 3Com Vortex (3c590, 3c592, 3c595, 3c597) Ethernet cards. wan-router.txt - - Wan router documentation -wanpipe.txt - - WANPIPE(tm) Multiprotocol WAN Driver for Linux WAN Router + - WAN router documentation wavelan.txt - AT&T GIS (nee NCR) WaveLAN card: An Ethernet-like radio transceiver x25.txt diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 0bc2ed136a3..24d029455ba 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -1,5 +1,7 @@ - Linux Ethernet Bonding Driver HOWTO + Linux Ethernet Bonding Driver HOWTO + + Latest update: 21 June 2005 Initial release : Thomas Davis <tadavis at lbl.gov> Corrections, HA extensions : 2000/10/03-15 : @@ -11,15 +13,22 @@ Corrections, HA extensions : 2000/10/03-15 : Reorganized and updated Feb 2005 by Jay Vosburgh -Note : ------- +Introduction +============ + + The Linux bonding driver provides a method for aggregating +multiple network interfaces into a single logical "bonded" interface. +The behavior of the bonded interfaces depends upon the mode; generally +speaking, modes provide either hot standby or load balancing services. +Additionally, link integrity monitoring may be performed. -The bonding driver originally came from Donald Becker's beowulf patches for -kernel 2.0. It has changed quite a bit since, and the original tools from -extreme-linux and beowulf sites will not work with this version of the driver. + The bonding driver originally came from Donald Becker's +beowulf patches for kernel 2.0. It has changed quite a bit since, and +the original tools from extreme-linux and beowulf sites will not work +with this version of the driver. -For new versions of the driver, patches for older kernels and the updated -userspace tools, please follow the links at the end of this file. + For new versions of the driver, updated userspace tools, and +who to ask for help, please follow the links at the end of this file. Table of Contents ================= @@ -30,9 +39,13 @@ Table of Contents 3. Configuring Bonding Devices 3.1 Configuration with sysconfig support +3.1.1 Using DHCP with sysconfig +3.1.2 Configuring Multiple Bonds with sysconfig 3.2 Configuration with initscripts support +3.2.1 Using DHCP with initscripts +3.2.2 Configuring Multiple Bonds with initscripts 3.3 Configuring Bonding Manually -3.4 Configuring Multiple Bonds +3.3.1 Configuring Multiple Bonds Manually 5. Querying Bonding Configuration 5.1 Bonding Configuration @@ -56,21 +69,30 @@ Table of Contents 11. Promiscuous mode -12. High Availability Information +12. Configuring Bonding for High Availability 12.1 High Availability in a Single Switch Topology -12.1.1 Bonding Mode Selection for Single Switch Topology -12.1.2 Link Monitoring for Single Switch Topology 12.2 High Availability in a Multiple Switch Topology -12.2.1 Bonding Mode Selection for Multiple Switch Topology -12.2.2 Link Monitoring for Multiple Switch Topology -12.3 Switch Behavior Issues for High Availability +12.2.1 HA Bonding Mode Selection for Multiple Switch Topology +12.2.2 HA Link Monitoring for Multiple Switch Topology + +13. Configuring Bonding for Maximum Throughput +13.1 Maximum Throughput in a Single Switch Topology +13.1.1 MT Bonding Mode Selection for Single Switch Topology +13.1.2 MT Link Monitoring for Single Switch Topology +13.2 Maximum Throughput in a Multiple Switch Topology +13.2.1 MT Bonding Mode Selection for Multiple Switch Topology +13.2.2 MT Link Monitoring for Multiple Switch Topology -13. Hardware Specific Considerations -13.1 IBM BladeCenter +14. Switch Behavior Issues +14.1 Link Establishment and Failover Delays +14.2 Duplicated Incoming Packets -14. Frequently Asked Questions +15. Hardware Specific Considerations +15.1 IBM BladeCenter -15. Resources and Links +16. Frequently Asked Questions + +17. Resources and Links 1. Bonding Driver Installation @@ -86,16 +108,10 @@ the following steps: 1.1 Configure and build the kernel with bonding ----------------------------------------------- - The latest version of the bonding driver is available in the + The current version of the bonding driver is available in the drivers/net/bonding subdirectory of the most recent kernel source -(which is available on http://kernel.org). - - Prior to the 2.4.11 kernel, the bonding driver was maintained -largely outside the kernel tree; patches for some earlier kernels are -available on the bonding sourceforge site, although those patches are -still several years out of date. Most users will want to use either -the most recent kernel from kernel.org or whatever kernel came with -their distro. +(which is available on http://kernel.org). Most users "rolling their +own" will want to use the most recent kernel from kernel.org. Configure kernel with "make menuconfig" (or "make xconfig" or "make config"), then select "Bonding driver support" in the "Network @@ -103,8 +119,8 @@ device support" section. It is recommended that you configure the driver as module since it is currently the only way to pass parameters to the driver or configure more than one bonding device. - Build and install the new kernel and modules, then proceed to -step 2. + Build and install the new kernel and modules, then continue +below to install ifenslave. 1.2 Install ifenslave Control Utility ------------------------------------- @@ -147,9 +163,9 @@ default kernel source include directory. Options for the bonding driver are supplied as parameters to the bonding module at load time. They may be given as command line arguments to the insmod or modprobe command, but are usually specified -in either the /etc/modprobe.conf configuration file, or in a -distro-specific configuration file (some of which are detailed in the -next section). +in either the /etc/modules.conf or /etc/modprobe.conf configuration +file, or in a distro-specific configuration file (some of which are +detailed in the next section). The available bonding driver parameters are listed below. If a parameter is not specified the default value is used. When initially @@ -162,34 +178,34 @@ degradation will occur during link failures. Very few devices do not support at least miimon, so there is really no reason not to use it. Options with textual values will accept either the text name - or, for backwards compatibility, the option value. E.g., - "mode=802.3ad" and "mode=4" set the same mode. +or, for backwards compatibility, the option value. E.g., +"mode=802.3ad" and "mode=4" set the same mode. The parameters are as follows: arp_interval - Specifies the ARP monitoring frequency in milli-seconds. If - ARP monitoring is used in a load-balancing mode (mode 0 or 2), - the switch should be configured in a mode that evenly - distributes packets across all links - such as round-robin. If - the switch is configured to distribute the packets in an XOR + Specifies the ARP link monitoring frequency in milliseconds. + If ARP monitoring is used in an etherchannel compatible mode + (modes 0 and 2), the switch should be configured in a mode + that evenly distributes packets across all links. If the + switch is configured to distribute the packets in an XOR fashion, all replies from the ARP targets will be received on the same link which could cause the other team members to - fail. ARP monitoring should not be used in conjunction with - miimon. A value of 0 disables ARP monitoring. The default + fail. ARP monitoring should not be used in conjunction with + miimon. A value of 0 disables ARP monitoring. The default value is 0. arp_ip_target - Specifies the ip addresses to use when arp_interval is > 0. - These are the targets of the ARP request sent to determine the - health of the link to the targets. Specify these values in - ddd.ddd.ddd.ddd format. Multiple ip adresses must be - seperated by a comma. At least one IP address must be given - for ARP monitoring to function. The maximum number of targets - that can be specified is 16. The default value is no IP - addresses. + Specifies the IP addresses to use as ARP monitoring peers when + arp_interval is > 0. These are the targets of the ARP request + sent to determine the health of the link to the targets. + Specify these values in ddd.ddd.ddd.ddd format. Multiple IP + addresses must be separated by a comma. At least one IP + address must be given for ARP monitoring to function. The + maximum number of targets that can be specified is 16. The + default value is no IP addresses. downdelay @@ -207,11 +223,13 @@ lacp_rate are: slow or 0 - Request partner to transmit LACPDUs every 30 seconds (default) + Request partner to transmit LACPDUs every 30 seconds fast or 1 Request partner to transmit LACPDUs every 1 second + The default is slow. + max_bonds Specifies the number of bonding devices to create for this @@ -221,10 +239,11 @@ max_bonds miimon - Specifies the frequency in milli-seconds that MII link - monitoring will occur. A value of zero disables MII link - monitoring. A value of 100 is a good starting point. The - use_carrier option, below, affects how the link state is + Specifies the MII link monitoring frequency in milliseconds. + This determines how often the link state of each slave is + inspected for link failures. A value of zero disables MII + link monitoring. A value of 100 is a good starting point. + The use_carrier option, below, affects how the link state is determined. See the High Availability section for additional information. The default value is 0. @@ -246,17 +265,31 @@ mode active. A different slave becomes active if, and only if, the active slave fails. The bond's MAC address is externally visible on only one port (network adapter) - to avoid confusing the switch. This mode provides - fault tolerance. The primary option affects the - behavior of this mode. + to avoid confusing the switch. + + In bonding version 2.6.2 or later, when a failover + occurs in active-backup mode, bonding will issue one + or more gratuitous ARPs on the newly active slave. + One gratutious ARP is issued for the bonding master + interface and each VLAN interfaces configured above + it, provided that the interface has at least one IP + address configured. Gratuitous ARPs issued for VLAN + interfaces are tagged with the appropriate VLAN id. + + This mode provides fault tolerance. The primary + option, documented below, affects the behavior of this + mode. balance-xor or 2 - XOR policy: Transmit based on [(source MAC address - XOR'd with destination MAC address) modulo slave - count]. This selects the same slave for each - destination MAC address. This mode provides load - balancing and fault tolerance. + XOR policy: Transmit based on the selected transmit + hash policy. The default policy is a simple [(source + MAC address XOR'd with destination MAC address) modulo + slave count]. Alternate transmit policies may be + selected via the xmit_hash_policy option, described + below. + + This mode provides load balancing and fault tolerance. broadcast or 3 @@ -270,7 +303,17 @@ mode duplex settings. Utilizes all slaves in the active aggregator according to the 802.3ad specification. - Pre-requisites: + Slave selection for outgoing traffic is done according + to the transmit hash policy, which may be changed from + the default simple XOR policy via the xmit_hash_policy + option, documented below. Note that not all transmit + policies may be 802.3ad compliant, particularly in + regards to the packet mis-ordering requirements of + section 43.2.4 of the 802.3ad standard. Differing + peer implementations will have varying tolerances for + noncompliance. + + Prerequisites: 1. Ethtool support in the base drivers for retrieving the speed and duplex of each slave. @@ -333,7 +376,7 @@ mode When a link is reconnected or a new slave joins the bond the receive traffic is redistributed among all - active slaves in the bond by intiating ARP Replies + active slaves in the bond by initiating ARP Replies with the selected mac address to each of the clients. The updelay parameter (detailed below) must be set to a value equal or greater than the switch's @@ -396,6 +439,60 @@ use_carrier 0 will use the deprecated MII / ETHTOOL ioctls. The default value is 1. +xmit_hash_policy + + Selects the transmit hash policy to use for slave selection in + balance-xor and 802.3ad modes. Possible values are: + + layer2 + + Uses XOR of hardware MAC addresses to generate the + hash. The formula is + + (source MAC XOR destination MAC) modulo slave count + + This algorithm will place all traffic to a particular + network peer on the same slave. + + This algorithm is 802.3ad compliant. + + layer3+4 + + This policy uses upper layer protocol information, + when available, to generate the hash. This allows for + traffic to a particular network peer to span multiple + slaves, although a single connection will not span + multiple slaves. + + The formula for unfragmented TCP and UDP packets is + + ((source port XOR dest port) XOR + ((source IP XOR dest IP) AND 0xffff) + modulo slave count + + For fragmented TCP or UDP packets and all other IP + protocol traffic, the source and destination port + information is omitted. For non-IP traffic, the + formula is the same as for the layer2 transmit hash + policy. + + This policy is intended to mimic the behavior of + certain switches, notably Cisco switches with PFC2 as + well as some Foundry and IBM products. + + This algorithm is not fully 802.3ad compliant. A + single TCP or UDP conversation containing both + fragmented and unfragmented packets will see packets + striped across two interfaces. This may result in out + of order delivery. Most traffic types will not meet + this criteria, as TCP rarely fragments traffic, and + most UDP traffic is not involved in extended + conversations. Other implementations of 802.3ad may + or may not tolerate this noncompliance. + + The default value is layer2. This option was added in bonding +version 2.6.3. In earlier versions of bonding, this parameter does +not exist, and the layer2 policy is the only policy. 3. Configuring Bonding Devices @@ -448,8 +545,9 @@ Bonding devices can be managed by hand, however, as follows. slave devices. On SLES 9, this is most easily done by running the yast2 sysconfig configuration utility. The goal is for to create an ifcfg-id file for each slave device. The simplest way to accomplish -this is to configure the devices for DHCP. The name of the -configuration file for each device will be of the form: +this is to configure the devices for DHCP (this is only to get the +file ifcfg-id file created; see below for some issues with DHCP). The +name of the configuration file for each device will be of the form: ifcfg-id-xx:xx:xx:xx:xx:xx @@ -459,7 +557,7 @@ the device's permanent MAC address. Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been created, it is necessary to edit the configuration files for the slave devices (the MAC addresses correspond to those of the slave devices). -Before editing, the file will contain muliple lines, and will look +Before editing, the file will contain multiple lines, and will look something like this: BOOTPROTO='dhcp' @@ -496,16 +594,11 @@ STARTMODE="onboot" BONDING_MASTER="yes" BONDING_MODULE_OPTS="mode=active-backup miimon=100" BONDING_SLAVE0="eth0" -BONDING_SLAVE1="eth1" +BONDING_SLAVE1="bus-pci-0000:06:08.1" Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK values with the appropriate values for your network. - Note that configuring the bonding device with BOOTPROTO='dhcp' -does not work; the scripts attempt to obtain the device address from -DHCP prior to adding any of the slave devices. Without active slaves, -the DHCP requests are not sent to the network. - The STARTMODE specifies when the device is brought online. The possible values are: @@ -531,9 +624,17 @@ for the bonding mode, link monitoring, and so on here. Do not include the max_bonds bonding parameter; this will confuse the configuration system if you have multiple bonding devices. - Finally, supply one BONDING_SLAVEn="ethX" for each slave, -where "n" is an increasing value, one for each slave, and "ethX" is -the name of the slave device (eth0, eth1, etc). + Finally, supply one BONDING_SLAVEn="slave device" for each +slave. where "n" is an increasing value, one for each slave. The +"slave device" is either an interface name, e.g., "eth0", or a device +specifier for the network device. The interface name is easier to +find, but the ethN names are subject to change at boot time if, e.g., +a device early in the sequence has failed. The device specifiers +(bus-pci-0000:06:08.1 in the example above) specify the physical +network device, and will not change unless the device's bus location +changes (for example, it is moved from one PCI slot to another). The +example above uses one of each type for demonstration purposes; most +configurations will choose one or the other for all slave devices. When all configuration files have been modified or created, networking must be restarted for the configuration changes to take @@ -544,7 +645,7 @@ effect. This can be accomplished via the following: Note that the network control script (/sbin/ifdown) will remove the bonding module as part of the network shutdown processing, so it is not necessary to remove the module by hand if, e.g., the -module paramters have changed. +module parameters have changed. Also, at this writing, YaST/YaST2 will not manage bonding devices (they do not show bonding interfaces on its list of network @@ -559,12 +660,37 @@ format can be found in an example ifcfg template file: Note that the template does not document the various BONDING_ settings described above, but does describe many of the other options. +3.1.1 Using DHCP with sysconfig +------------------------------- + + Under sysconfig, configuring a device with BOOTPROTO='dhcp' +will cause it to query DHCP for its IP address information. At this +writing, this does not function for bonding devices; the scripts +attempt to obtain the device address from DHCP prior to adding any of +the slave devices. Without active slaves, the DHCP requests are not +sent to the network. + +3.1.2 Configuring Multiple Bonds with sysconfig +----------------------------------------------- + + The sysconfig network initialization system is capable of +handling multiple bonding devices. All that is necessary is for each +bonding instance to have an appropriately configured ifcfg-bondX file +(as described above). Do not specify the "max_bonds" parameter to any +instance of bonding, as this will confuse sysconfig. If you require +multiple bonding devices with identical parameters, create multiple +ifcfg-bondX files. + + Because the sysconfig scripts supply the bonding module +options in the ifcfg-bondX file, it is not necessary to add them to +the system /etc/modules.conf or /etc/modprobe.conf configuration file. + 3.2 Configuration with initscripts support ------------------------------------------ This section applies to distros using a version of initscripts with bonding support, for example, Red Hat Linux 9 or Red Hat -Enterprise Linux version 3. On these systems, the network +Enterprise Linux version 3 or 4. On these systems, the network initialization scripts have some knowledge of bonding, and can be configured to control bonding devices. @@ -614,10 +740,11 @@ USERCTL=no Be sure to change the networking specific lines (IPADDR, NETMASK, NETWORK and BROADCAST) to match your network configuration. - Finally, it is necessary to edit /etc/modules.conf to load the -bonding module when the bond0 interface is brought up. The following -sample lines in /etc/modules.conf will load the bonding module, and -select its options: + Finally, it is necessary to edit /etc/modules.conf (or +/etc/modprobe.conf, depending upon your distro) to load the bonding +module with your desired options when the bond0 interface is brought +up. The following lines in /etc/modules.conf (or modprobe.conf) will +load the bonding module, and select its options: alias bond0 bonding options bond0 mode=balance-alb miimon=100 @@ -629,6 +756,33 @@ options for your configuration. will restart the networking subsystem and your bond link should be now up and running. +3.2.1 Using DHCP with initscripts +--------------------------------- + + Recent versions of initscripts (the version supplied with +Fedora Core 3 and Red Hat Enterprise Linux 4 is reported to work) do +have support for assigning IP information to bonding devices via DHCP. + + To configure bonding for DHCP, configure it as described +above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp" +and add a line consisting of "TYPE=Bonding". Note that the TYPE value +is case sensitive. + +3.2.2 Configuring Multiple Bonds with initscripts +------------------------------------------------- + + At this writing, the initscripts package does not directly +support loading the bonding driver multiple times, so the process for +doing so is the same as described in the "Configuring Multiple Bonds +Manually" section, below. + + NOTE: It has been observed that some Red Hat supplied kernels +are apparently unable to rename modules at load time (the "-obonding1" +part). Attempts to pass that option to modprobe will produce an +"Operation not permitted" error. This has been reported on some +Fedora Core kernels, and has been seen on RHEL 4 as well. On kernels +exhibiting this problem, it will be impossible to configure multiple +bonds with differing parameters. 3.3 Configuring Bonding Manually -------------------------------- @@ -638,10 +792,11 @@ scripts (the sysconfig or initscripts package) do not have specific knowledge of bonding. One such distro is SuSE Linux Enterprise Server version 8. - The general methodology for these systems is to place the -bonding module parameters into /etc/modprobe.conf, then add modprobe -and/or ifenslave commands to the system's global init script. The -name of the global init script differs; for sysconfig, it is + The general method for these systems is to place the bonding +module parameters into /etc/modules.conf or /etc/modprobe.conf (as +appropriate for the installed distro), then add modprobe and/or +ifenslave commands to the system's global init script. The name of +the global init script differs; for sysconfig, it is /etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local. For example, if you wanted to make a simple bond of two e100 @@ -649,7 +804,7 @@ devices (presumed to be eth0 and eth1), and have it persist across reboots, edit the appropriate file (/etc/init.d/boot.local or /etc/rc.d/rc.local), and add the following: -modprobe bonding -obond0 mode=balance-alb miimon=100 +modprobe bonding mode=balance-alb miimon=100 modprobe e100 ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up ifenslave bond0 eth0 @@ -657,11 +812,7 @@ ifenslave bond0 eth1 Replace the example bonding module parameters and bond0 network configuration (IP address, netmask, etc) with the appropriate -values for your configuration. The above example loads the bonding -module with the name "bond0," this simplifies the naming if multiple -bonding modules are loaded (each successive instance of the module is -given a different name, and the module instance names match the -bonding interface names). +values for your configuration. Unfortunately, this method will not provide support for the ifup and ifdown scripts on the bond devices. To reload the bonding @@ -684,20 +835,23 @@ appropriate device driver modules. For our example above, you can do the following: # ifconfig bond0 down -# rmmod bond0 +# rmmod bonding # rmmod e100 Again, for convenience, it may be desirable to create a script with these commands. -3.4 Configuring Multiple Bonds ------------------------------- +3.3.1 Configuring Multiple Bonds Manually +----------------------------------------- This section contains information on configuring multiple -bonding devices with differing options. If you require multiple -bonding devices, but all with the same options, see the "max_bonds" -module paramter, documented above. +bonding devices with differing options for those systems whose network +initialization scripts lack support for configuring multiple bonds. + + If you require multiple bonding devices, but all with the same +options, you may wish to use the "max_bonds" module parameter, +documented above. To create multiple bonding devices with differing options, it is necessary to load the bonding driver multiple times. Note that @@ -724,11 +878,16 @@ named "bond0" and creates the bond0 device in balance-rr mode with an miimon of 100. The second instance is named "bond1" and creates the bond1 device in balance-alb mode with an miimon of 50. + In some circumstances (typically with older distributions), +the above does not work, and the second bonding instance never sees +its options. In that case, the second options line can be substituted +as follows: + +install bonding1 /sbin/modprobe bonding -obond1 mode=balance-alb miimon=50 + This may be repeated any number of times, specifying a new and -unique name in place of bond0 or bond1 for each instance. +unique name in place of bond1 for each subsequent instance. - When the appropriate module paramters are in place, then -configure bonding according to the instructions for your distro. 5. Querying Bonding Configuration ================================= @@ -846,8 +1005,8 @@ tagged internally by bonding itself. As a result, bonding must self generated packets. For reasons of simplicity, and to support the use of adapters -that can do VLAN hardware acceleration offloding, the bonding -interface declares itself as fully hardware offloaing capable, it gets +that can do VLAN hardware acceleration offloading, the bonding +interface declares itself as fully hardware offloading capable, it gets the add_vid/kill_vid notifications to gather the necessary information, and it propagates those actions to the slaves. In case of mixed adapter types, hardware accelerated tagged packets that @@ -880,7 +1039,7 @@ bond interface: matches the hardware address of the VLAN interfaces. Note that changing a VLAN interface's HW address would set the -underlying device -- i.e. the bonding interface -- to promiscouos +underlying device -- i.e. the bonding interface -- to promiscuous mode, which might not be what you want. @@ -923,7 +1082,7 @@ down or have a problem making it unresponsive to ARP requests. Having an additional target (or several) increases the reliability of the ARP monitoring. - Multiple ARP targets must be seperated by commas as follows: + Multiple ARP targets must be separated by commas as follows: # example options for ARP monitoring with three targets alias bond0 bonding @@ -1045,7 +1204,7 @@ install bonding /sbin/modprobe tg3; /sbin/modprobe e1000; This will, when loading the bonding module, rather than performing the normal action, instead execute the provided command. This command loads the device drivers in the order needed, then calls -modprobe with --ingore-install to cause the normal action to then take +modprobe with --ignore-install to cause the normal action to then take place. Full documentation on this can be found in the modprobe.conf and modprobe manual pages. @@ -1130,14 +1289,14 @@ association. common to enable promiscuous mode on the device, so that all traffic is seen (instead of seeing only traffic destined for the local host). The bonding driver handles promiscuous mode changes to the bonding -master device (e.g., bond0), and propogates the setting to the slave +master device (e.g., bond0), and propagates the setting to the slave devices. For the balance-rr, balance-xor, broadcast, and 802.3ad modes, -the promiscuous mode setting is propogated to all slaves. +the promiscuous mode setting is propagated to all slaves. For the active-backup, balance-tlb and balance-alb modes, the -promiscuous mode setting is propogated only to the active slave. +promiscuous mode setting is propagated only to the active slave. For balance-tlb mode, the active slave is the slave currently receiving inbound traffic. @@ -1148,46 +1307,182 @@ sending to peers that are unassigned or if the load is unbalanced. For the active-backup, balance-tlb and balance-alb modes, when the active slave changes (e.g., due to a link failure), the -promiscuous setting will be propogated to the new active slave. +promiscuous setting will be propagated to the new active slave. -12. High Availability Information -================================= +12. Configuring Bonding for High Availability +============================================= High Availability refers to configurations that provide maximum network availability by having redundant or backup devices, -links and switches between the host and the rest of the world. - - There are currently two basic methods for configuring to -maximize availability. They are dependent on the network topology and -the primary goal of the configuration, but in general, a configuration -can be optimized for maximum available bandwidth, or for maximum -network availability. +links or switches between the host and the rest of the world. The +goal is to provide the maximum availability of network connectivity +(i.e., the network always works), even though other configurations +could provide higher throughput. 12.1 High Availability in a Single Switch Topology -------------------------------------------------- - If two hosts (or a host and a switch) are directly connected -via multiple physical links, then there is no network availability -penalty for optimizing for maximum bandwidth: there is only one switch -(or peer), so if it fails, you have no alternative access to fail over -to. + If two hosts (or a host and a single switch) are directly +connected via multiple physical links, then there is no availability +penalty to optimizing for maximum bandwidth. In this case, there is +only one switch (or peer), so if it fails, there is no alternative +access to fail over to. Additionally, the bonding load balance modes +support link monitoring of their members, so if individual links fail, +the load will be rebalanced across the remaining devices. + + See Section 13, "Configuring Bonding for Maximum Throughput" +for information on configuring bonding with one peer device. + +12.2 High Availability in a Multiple Switch Topology +---------------------------------------------------- + + With multiple switches, the configuration of bonding and the +network changes dramatically. In multiple switch topologies, there is +a trade off between network availability and usable bandwidth. + + Below is a sample network, configured to maximize the +availability of the network: -Example 1 : host to switch (or other host) + | | + |port3 port3| + +-----+----+ +-----+----+ + | |port2 ISL port2| | + | switch A +--------------------------+ switch B | + | | | | + +-----+----+ +-----++---+ + |port1 port1| + | +-------+ | + +-------------+ host1 +---------------+ + eth0 +-------+ eth1 - +----------+ +----------+ - | |eth0 eth0| switch | - | Host A +--------------------------+ or | - | +--------------------------+ other | - | |eth1 eth1| host | - +----------+ +----------+ + In this configuration, there is a link between the two +switches (ISL, or inter switch link), and multiple ports connecting to +the outside world ("port3" on each switch). There is no technical +reason that this could not be extended to a third switch. +12.2.1 HA Bonding Mode Selection for Multiple Switch Topology +------------------------------------------------------------- -12.1.1 Bonding Mode Selection for single switch topology --------------------------------------------------------- + In a topology such as the example above, the active-backup and +broadcast modes are the only useful bonding modes when optimizing for +availability; the other modes require all links to terminate on the +same peer for them to behave rationally. + +active-backup: This is generally the preferred mode, particularly if + the switches have an ISL and play together well. If the + network configuration is such that one switch is specifically + a backup switch (e.g., has lower capacity, higher cost, etc), + then the primary option can be used to insure that the + preferred link is always used when it is available. + +broadcast: This mode is really a special purpose mode, and is suitable + only for very specific needs. For example, if the two + switches are not connected (no ISL), and the networks beyond + them are totally independent. In this case, if it is + necessary for some specific one-way traffic to reach both + independent networks, then the broadcast mode may be suitable. + +12.2.2 HA Link Monitoring Selection for Multiple Switch Topology +---------------------------------------------------------------- + + The choice of link monitoring ultimately depends upon your +switch. If the switch can reliably fail ports in response to other +failures, then either the MII or ARP monitors should work. For +example, in the above example, if the "port3" link fails at the remote +end, the MII monitor has no direct means to detect this. The ARP +monitor could be configured with a target at the remote end of port3, +thus detecting that failure without switch support. + + In general, however, in a multiple switch topology, the ARP +monitor can provide a higher level of reliability in detecting end to +end connectivity failures (which may be caused by the failure of any +individual component to pass traffic for any reason). Additionally, +the ARP monitor should be configured with multiple targets (at least +one for each switch in the network). This will insure that, +regardless of which switch is active, the ARP monitor has a suitable +target to query. + + +13. Configuring Bonding for Maximum Throughput +============================================== + +13.1 Maximizing Throughput in a Single Switch Topology +------------------------------------------------------ + + In a single switch configuration, the best method to maximize +throughput depends upon the application and network environment. The +various load balancing modes each have strengths and weaknesses in +different environments, as detailed below. + + For this discussion, we will break down the topologies into +two categories. Depending upon the destination of most traffic, we +categorize them into either "gatewayed" or "local" configurations. + + In a gatewayed configuration, the "switch" is acting primarily +as a router, and the majority of traffic passes through this router to +other networks. An example would be the following: + + + +----------+ +----------+ + | |eth0 port1| | to other networks + | Host A +---------------------+ router +-------------------> + | +---------------------+ | Hosts B and C are out + | |eth1 port2| | here somewhere + +----------+ +----------+ + + The router may be a dedicated router device, or another host +acting as a gateway. For our discussion, the important point is that +the majority of traffic from Host A will pass through the router to +some other network before reaching its final destination. + + In a gatewayed network configuration, although Host A may +communicate with many other systems, all of its traffic will be sent +and received via one other peer on the local network, the router. + + Note that the case of two systems connected directly via +multiple physical links is, for purposes of configuring bonding, the +same as a gatewayed configuration. In that case, it happens that all +traffic is destined for the "gateway" itself, not some other network +beyond the gateway. + + In a local configuration, the "switch" is acting primarily as +a switch, and the majority of traffic passes through this switch to +reach other stations on the same network. An example would be the +following: + + +----------+ +----------+ +--------+ + | |eth0 port1| +-------+ Host B | + | Host A +------------+ switch |port3 +--------+ + | +------------+ | +--------+ + | |eth1 port2| +------------------+ Host C | + +----------+ +----------+port4 +--------+ + + + Again, the switch may be a dedicated switch device, or another +host acting as a gateway. For our discussion, the important point is +that the majority of traffic from Host A is destined for other hosts +on the same local network (Hosts B and C in the above example). + + In summary, in a gatewayed configuration, traffic to and from +the bonded device will be to the same MAC level peer on the network +(the gateway itself, i.e., the router), regardless of its final +destination. In a local configuration, traffic flows directly to and +from the final destinations, thus, each destination (Host B, Host C) +will be addressed directly by their individual MAC addresses. + + This distinction between a gatewayed and a local network +configuration is important because many of the load balancing modes +available use the MAC addresses of the local network source and +destination to make load balancing decisions. The behavior of each +mode is described below. + + +13.1.1 MT Bonding Mode Selection for Single Switch Topology +----------------------------------------------------------- This configuration is the easiest to set up and to understand, although you will have to decide which bonding mode best suits your -needs. The tradeoffs for each mode are detailed below: +needs. The trade offs for each mode are detailed below: balance-rr: This mode is the only mode that will permit a single TCP/IP connection to stripe traffic across multiple @@ -1206,6 +1501,23 @@ balance-rr: This mode is the only mode that will permit a single interface's worth of throughput, even after adjusting tcp_reordering. + Note that this out of order delivery occurs when both the + sending and receiving systems are utilizing a multiple + interface bond. Consider a configuration in which a + balance-rr bond feeds into a single higher capacity network + channel (e.g., multiple 100Mb/sec ethernets feeding a single + gigabit ethernet via an etherchannel capable switch). In this + configuration, traffic sent from the multiple 100Mb devices to + a destination connected to the gigabit device will not see + packets out of order. However, traffic sent from the gigabit + device to the multiple 100Mb devices may or may not see + traffic out of order, depending upon the balance policy of the + switch. Many switches do not support any modes that stripe + traffic (instead choosing a port based upon IP or MAC level + addresses); for those devices, traffic flowing from the + gigabit device to the many 100Mb devices will only utilize one + interface. + If you are utilizing protocols other than TCP/IP, UDP for example, and your application can tolerate out of order delivery, then this mode can allow for single stream datagram @@ -1220,16 +1532,21 @@ active-backup: There is not much advantage in this network topology to connected to the same peer as the primary. In this case, a load balancing mode (with link monitoring) will provide the same level of network availability, but with increased - available bandwidth. On the plus side, it does not require - any configuration of the switch. + available bandwidth. On the plus side, active-backup mode + does not require any configuration of the switch, so it may + have value if the hardware available does not support any of + the load balance modes. balance-xor: This mode will limit traffic such that packets destined for specific peers will always be sent over the same interface. Since the destination is determined by the MAC - addresses involved, this may be desirable if you have a large - network with many hosts. It is likely to be suboptimal if all - your traffic is passed through a single router, however. As - with balance-rr, the switch ports need to be configured for + addresses involved, this mode works best in a "local" network + configuration (as described above), with destinations all on + the same local network. This mode is likely to be suboptimal + if all your traffic is passed through a single router (i.e., a + "gatewayed" network configuration, as described above). + + As with balance-rr, the switch ports need to be configured for "etherchannel" or "trunking." broadcast: Like active-backup, there is not much advantage to this @@ -1241,122 +1558,131 @@ broadcast: Like active-backup, there is not much advantage to this protocol includes automatic configuration of the aggregates, so minimal manual configuration of the switch is needed (typically only to designate that some set of devices is - usable for 802.3ad). The 802.3ad standard also mandates that - frames be delivered in order (within certain limits), so in - general single connections will not see misordering of + available for 802.3ad). The 802.3ad standard also mandates + that frames be delivered in order (within certain limits), so + in general single connections will not see misordering of packets. The 802.3ad mode does have some drawbacks: the standard mandates that all devices in the aggregate operate at the same speed and duplex. Also, as with all bonding load balance modes other than balance-rr, no single connection will be able to utilize more than a single interface's worth of - bandwidth. Additionally, the linux bonding 802.3ad - implementation distributes traffic by peer (using an XOR of - MAC addresses), so in general all traffic to a particular - destination will use the same interface. Finally, the 802.3ad - mode mandates the use of the MII monitor, therefore, the ARP - monitor is not available in this mode. - -balance-tlb: This mode is also a good choice for this type of - topology. It has no special switch configuration - requirements, and balances outgoing traffic by peer, in a - vaguely intelligent manner (not a simple XOR as in balance-xor - or 802.3ad mode), so that unlucky MAC addresses will not all - "bunch up" on a single interface. Interfaces may be of - differing speeds. On the down side, in this mode all incoming - traffic arrives over a single interface, this mode requires - certain ethtool support in the network device driver of the - slave interfaces, and the ARP monitor is not available. - -balance-alb: This mode is everything that balance-tlb is, and more. It - has all of the features (and restrictions) of balance-tlb, and - will also balance incoming traffic from peers (as described in - the Bonding Module Options section, above). The only extra - down side to this mode is that the network device driver must - support changing the hardware address while the device is - open. - -12.1.2 Link Monitoring for Single Switch Topology -------------------------------------------------- + bandwidth. + + Additionally, the linux bonding 802.3ad implementation + distributes traffic by peer (using an XOR of MAC addresses), + so in a "gatewayed" configuration, all outgoing traffic will + generally use the same device. Incoming traffic may also end + up on a single device, but that is dependent upon the + balancing policy of the peer's 8023.ad implementation. In a + "local" configuration, traffic will be distributed across the + devices in the bond. + + Finally, the 802.3ad mode mandates the use of the MII monitor, + therefore, the ARP monitor is not available in this mode. + +balance-tlb: The balance-tlb mode balances outgoing traffic by peer. + Since the balancing is done according to MAC address, in a + "gatewayed" configuration (as described above), this mode will + send all traffic across a single device. However, in a + "local" network configuration, this mode balances multiple + local network peers across devices in a vaguely intelligent + manner (not a simple XOR as in balance-xor or 802.3ad mode), + so that mathematically unlucky MAC addresses (i.e., ones that + XOR to the same value) will not all "bunch up" on a single + interface. + + Unlike 802.3ad, interfaces may be of differing speeds, and no + special switch configuration is required. On the down side, + in this mode all incoming traffic arrives over a single + interface, this mode requires certain ethtool support in the + network device driver of the slave interfaces, and the ARP + monitor is not available. + +balance-alb: This mode is everything that balance-tlb is, and more. + It has all of the features (and restrictions) of balance-tlb, + and will also balance incoming traffic from local network + peers (as described in the Bonding Module Options section, + above). + + The only additional down side to this mode is that the network + device driver must support changing the hardware address while + the device is open. + +13.1.2 MT Link Monitoring for Single Switch Topology +---------------------------------------------------- The choice of link monitoring may largely depend upon which mode you choose to use. The more advanced load balancing modes do not support the use of the ARP monitor, and are thus restricted to using -the MII monitor (which does not provide as high a level of assurance -as the ARP monitor). - - -12.2 High Availability in a Multiple Switch Topology ----------------------------------------------------- - - With multiple switches, the configuration of bonding and the -network changes dramatically. In multiple switch topologies, there is -a tradeoff between network availability and usable bandwidth. - - Below is a sample network, configured to maximize the -availability of the network: - - | | - |port3 port3| - +-----+----+ +-----+----+ - | |port2 ISL port2| | - | switch A +--------------------------+ switch B | - | | | | - +-----+----+ +-----++---+ - |port1 port1| - | +-------+ | - +-------------+ host1 +---------------+ - eth0 +-------+ eth1 - - In this configuration, there is a link between the two -switches (ISL, or inter switch link), and multiple ports connecting to -the outside world ("port3" on each switch). There is no technical -reason that this could not be extended to a third switch. - -12.2.1 Bonding Mode Selection for Multiple Switch Topology ----------------------------------------------------------- - - In a topology such as this, the active-backup and broadcast -modes are the only useful bonding modes; the other modes require all -links to terminate on the same peer for them to behave rationally. - -active-backup: This is generally the preferred mode, particularly if - the switches have an ISL and play together well. If the - network configuration is such that one switch is specifically - a backup switch (e.g., has lower capacity, higher cost, etc), - then the primary option can be used to insure that the - preferred link is always used when it is available. - -broadcast: This mode is really a special purpose mode, and is suitable - only for very specific needs. For example, if the two - switches are not connected (no ISL), and the networks beyond - them are totally independant. In this case, if it is - necessary for some specific one-way traffic to reach both - independent networks, then the broadcast mode may be suitable. - -12.2.2 Link Monitoring Selection for Multiple Switch Topology +the MII monitor (which does not provide as high a level of end to end +assurance as the ARP monitor). + +13.2 Maximum Throughput in a Multiple Switch Topology +----------------------------------------------------- + + Multiple switches may be utilized to optimize for throughput +when they are configured in parallel as part of an isolated network +between two or more systems, for example: + + +-----------+ + | Host A | + +-+---+---+-+ + | | | + +--------+ | +---------+ + | | | + +------+---+ +-----+----+ +-----+----+ + | Switch A | | Switch B | | Switch C | + +------+---+ +-----+----+ +-----+----+ + | | | + +--------+ | +---------+ + | | | + +-+---+---+-+ + | Host B | + +-----------+ + + In this configuration, the switches are isolated from one +another. One reason to employ a topology such as this is for an +isolated network with many hosts (a cluster configured for high +performance, for example), using multiple smaller switches can be more +cost effective than a single larger switch, e.g., on a network with 24 +hosts, three 24 port switches can be significantly less expensive than +a single 72 port switch. + + If access beyond the network is required, an individual host +can be equipped with an additional network device connected to an +external network; this host then additionally acts as a gateway. + +13.2.1 MT Bonding Mode Selection for Multiple Switch Topology ------------------------------------------------------------- - The choice of link monitoring ultimately depends upon your -switch. If the switch can reliably fail ports in response to other -failures, then either the MII or ARP monitors should work. For -example, in the above example, if the "port3" link fails at the remote -end, the MII monitor has no direct means to detect this. The ARP -monitor could be configured with a target at the remote end of port3, -thus detecting that failure without switch support. + In actual practice, the bonding mode typically employed in +configurations of this type is balance-rr. Historically, in this +network configuration, the usual caveats about out of order packet +delivery are mitigated by the use of network adapters that do not do +any kind of packet coalescing (via the use of NAPI, or because the +device itself does not generate interrupts until some number of +packets has arrived). When employed in this fashion, the balance-rr +mode allows individual connections between two hosts to effectively +utilize greater than one interface's bandwidth. - In general, however, in a multiple switch topology, the ARP -monitor can provide a higher level of reliability in detecting link -failures. Additionally, it should be configured with multiple targets -(at least one for each switch in the network). This will insure that, -regardless of which switch is active, the ARP monitor has a suitable -target to query. +13.2.2 MT Link Monitoring for Multiple Switch Topology +------------------------------------------------------ + Again, in actual practice, the MII monitor is most often used +in this configuration, as performance is given preference over +availability. The ARP monitor will function in this topology, but its +advantages over the MII monitor are mitigated by the volume of probes +needed as the number of systems involved grows (remember that each +host in the network is configured with bonding). -12.3 Switch Behavior Issues for High Availability -------------------------------------------------- +14. Switch Behavior Issues +========================== - You may encounter issues with the timing of link up and down -reporting by the switch. +14.1 Link Establishment and Failover Delays +------------------------------------------- + + Some switches exhibit undesirable behavior with regard to the +timing of link up and down reporting by the switch. First, when a link comes up, some switches may indicate that the link is up (carrier available), but not pass traffic over the @@ -1370,30 +1696,70 @@ relevant interface(s). Second, some switches may "bounce" the link state one or more times while a link is changing state. This occurs most commonly while the switch is initializing. Again, an appropriate updelay value may -help, but note that if all links are down, then updelay is ignored -when any link becomes active (the slave closest to completing its -updelay is chosen). +help. Note that when a bonding interface has no active links, the -driver will immediately reuse the first link that goes up, even if -updelay parameter was specified. If there are slave interfaces -waiting for the updelay timeout to expire, the interface that first -went into that state will be immediately reused. This reduces down -time of the network if the value of updelay has been overestimated. +driver will immediately reuse the first link that goes up, even if the +updelay parameter has been specified (the updelay is ignored in this +case). If there are slave interfaces waiting for the updelay timeout +to expire, the interface that first went into that state will be +immediately reused. This reduces down time of the network if the +value of updelay has been overestimated, and since this occurs only in +cases with no connectivity, there is no additional penalty for +ignoring the updelay. In addition to the concerns about switch timings, if your switches take a long time to go into backup mode, it may be desirable to not activate a backup interface immediately after a link goes down. Failover may be delayed via the downdelay bonding module option. -13. Hardware Specific Considerations +14.2 Duplicated Incoming Packets +-------------------------------- + + It is not uncommon to observe a short burst of duplicated +traffic when the bonding device is first used, or after it has been +idle for some period of time. This is most easily observed by issuing +a "ping" to some other host on the network, and noticing that the +output from ping flags duplicates (typically one per slave). + + For example, on a bond in active-backup mode with five slaves +all connected to one switch, the output may appear as follows: + +# ping -n 10.0.4.2 +PING 10.0.4.2 (10.0.4.2) from 10.0.3.10 : 56(84) bytes of data. +64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.7 ms +64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) +64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) +64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) +64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) +64 bytes from 10.0.4.2: icmp_seq=2 ttl=64 time=0.216 ms +64 bytes from 10.0.4.2: icmp_seq=3 ttl=64 time=0.267 ms +64 bytes from 10.0.4.2: icmp_seq=4 ttl=64 time=0.222 ms + + This is not due to an error in the bonding driver, rather, it +is a side effect of how many switches update their MAC forwarding +tables. Initially, the switch does not associate the MAC address in +the packet with a particular switch port, and so it may send the +traffic to all ports until its MAC forwarding table is updated. Since +the interfaces attached to the bond may occupy multiple ports on a +single switch, when the switch (temporarily) floods the traffic to all +ports, the bond device receives multiple copies of the same packet +(one per slave device). + + The duplicated packet behavior is switch dependent, some +switches exhibit this, and some do not. On switches that display this +behavior, it can be induced by clearing the MAC forwarding table (on +most Cisco switches, the privileged command "clear mac address-table +dynamic" will accomplish this). + +15. Hardware Specific Considerations ==================================== This section contains additional information for configuring bonding on specific hardware platforms, or for interfacing bonding with particular switches or other devices. -13.1 IBM BladeCenter +15.1 IBM BladeCenter -------------------- This applies to the JS20 and similar systems. @@ -1407,12 +1773,12 @@ JS20 network adapter information -------------------------------- All JS20s come with two Broadcom Gigabit Ethernet ports -integrated on the planar. In the BladeCenter chassis, the eth0 port -of all JS20 blades is hard wired to I/O Module #1; similarly, all eth1 -ports are wired to I/O Module #2. An add-on Broadcom daughter card -can be installed on a JS20 to provide two more Gigabit Ethernet ports. -These ports, eth2 and eth3, are wired to I/O Modules 3 and 4, -respectively. +integrated on the planar (that's "motherboard" in IBM-speak). In the +BladeCenter chassis, the eth0 port of all JS20 blades is hard wired to +I/O Module #1; similarly, all eth1 ports are wired to I/O Module #2. +An add-on Broadcom daughter card can be installed on a JS20 to provide +two more Gigabit Ethernet ports. These ports, eth2 and eth3, are +wired to I/O Modules 3 and 4, respectively. Each I/O Module may contain either a switch or a passthrough module (which allows ports to be directly connected to an external @@ -1432,29 +1798,30 @@ BladeCenter networking configuration of ways, this discussion will be confined to describing basic configurations. - Normally, Ethernet Switch Modules (ESM) are used in I/O + Normally, Ethernet Switch Modules (ESMs) are used in I/O modules 1 and 2. In this configuration, the eth0 and eth1 ports of a JS20 will be connected to different internal switches (in the respective I/O modules). - An optical passthru module (OPM) connects the I/O module -directly to an external switch. By using OPMs in I/O module #1 and -#2, the eth0 and eth1 interfaces of a JS20 can be redirected to the -outside world and connected to a common external switch. - - Depending upon the mix of ESM and OPM modules, the network -will appear to bonding as either a single switch topology (all OPM -modules) or as a multiple switch topology (one or more ESM modules, -zero or more OPM modules). It is also possible to connect ESM modules -together, resulting in a configuration much like the example in "High -Availability in a multiple switch topology." - -Requirements for specifc modes ------------------------------- - - The balance-rr mode requires the use of OPM modules for -devices in the bond, all connected to an common external switch. That -switch must be configured for "etherchannel" or "trunking" on the + A passthrough module (OPM or CPM, optical or copper, +passthrough module) connects the I/O module directly to an external +switch. By using PMs in I/O module #1 and #2, the eth0 and eth1 +interfaces of a JS20 can be redirected to the outside world and +connected to a common external switch. + + Depending upon the mix of ESMs and PMs, the network will +appear to bonding as either a single switch topology (all PMs) or as a +multiple switch topology (one or more ESMs, zero or more PMs). It is +also possible to connect ESMs together, resulting in a configuration +much like the example in "High Availability in a Multiple Switch +Topology," above. + +Requirements for specific modes +------------------------------- + + The balance-rr mode requires the use of passthrough modules +for devices in the bond, all connected to an common external switch. +That switch must be configured for "etherchannel" or "trunking" on the appropriate ports, as is usual for balance-rr. The balance-alb and balance-tlb modes will function with @@ -1484,17 +1851,18 @@ connected to the JS20 system. Other concerns -------------- - The Serial Over LAN link is established over the primary + The Serial Over LAN (SoL) link is established over the primary ethernet (eth0) only, therefore, any loss of link to eth0 will result in losing your SoL connection. It will not fail over with other -network traffic. +network traffic, as the SoL system is beyond the control of the +bonding driver. It may be desirable to disable spanning tree on the switch (either the internal Ethernet Switch Module, or an external switch) to -avoid fail-over delays issues when using bonding. +avoid fail-over delay issues when using bonding. -14. Frequently Asked Questions +16. Frequently Asked Questions ============================== 1. Is it SMP safe? @@ -1505,8 +1873,8 @@ The new driver was designed to be SMP safe from the start. 2. What type of cards will work with it? Any Ethernet type cards (you can even mix cards - a Intel -EtherExpress PRO/100 and a 3com 3c905b, for example). They need not -be of the same speed. +EtherExpress PRO/100 and a 3com 3c905b, for example). For most modes, +devices need not be of the same speed. 3. How many bonding devices can I have? @@ -1524,11 +1892,12 @@ system. disabled. The active-backup mode will fail over to a backup link, and other modes will ignore the failed link. The link will continue to be monitored, and should it recover, it will rejoin the bond (in whatever -manner is appropriate for the mode). See the section on High -Availability for additional information. +manner is appropriate for the mode). See the sections on High +Availability and the documentation for each mode for additional +information. Link monitoring can be enabled via either the miimon or -arp_interval paramters (described in the module paramters section, +arp_interval parameters (described in the module parameters section, above). In general, miimon monitors the carrier state as sensed by the underlying network device, and the arp monitor (arp_interval) monitors connectivity to another host on the local network. @@ -1536,7 +1905,7 @@ monitors connectivity to another host on the local network. If no link monitoring is configured, the bonding driver will be unable to detect link failures, and will assume that all links are always available. This will likely result in lost packets, and a -resulting degredation of performance. The precise performance loss +resulting degradation of performance. The precise performance loss depends upon the bonding mode and network configuration. 6. Can bonding be used for High Availability? @@ -1550,12 +1919,12 @@ depends upon the bonding mode and network configuration. In the basic balance modes (balance-rr and balance-xor), it works with any system that supports etherchannel (also called trunking). Most managed switches currently available have such -support, and many unmananged switches as well. +support, and many unmanaged switches as well. The advanced balance modes (balance-tlb and balance-alb) do not have special switch requirements, but do need device drivers that support specific features (described in the appropriate section under -module paramters, above). +module parameters, above). In 802.3ad mode, it works with with systems that support IEEE 802.3ad Dynamic Link Aggregation. Most managed and many unmanaged @@ -1565,17 +1934,19 @@ switches currently available support 802.3ad. 8. Where does a bonding device get its MAC address from? - If not explicitly configured with ifconfig, the MAC address of -the bonding device is taken from its first slave device. This MAC -address is then passed to all following slaves and remains persistent -(even if the the first slave is removed) until the bonding device is -brought down or reconfigured. + If not explicitly configured (with ifconfig or ip link), the +MAC address of the bonding device is taken from its first slave +device. This MAC address is then passed to all following slaves and +remains persistent (even if the the first slave is removed) until the +bonding device is brought down or reconfigured. If you wish to change the MAC address, you can set it with -ifconfig: +ifconfig or ip link: # ifconfig bond0 hw ether 00:11:22:33:44:55 +# ip link set bond0 address 66:77:88:99:aa:bb + The MAC address can be also changed by bringing down/up the device and then changing its slaves (or their order): @@ -1591,23 +1962,28 @@ from the bond (`ifenslave -d bond0 eth0'). The bonding driver will then restore the MAC addresses that the slaves had before they were enslaved. -15. Resources and Links +16. Resources and Links ======================= The latest version of the bonding driver can be found in the latest version of the linux kernel, found on http://kernel.org +The latest version of this document can be found in either the latest +kernel source (named Documentation/networking/bonding.txt), or on the +bonding sourceforge site: + +http://www.sourceforge.net/projects/bonding + Discussions regarding the bonding driver take place primarily on the bonding-devel mailing list, hosted at sourceforge.net. If you have -questions or problems, post them to the list. +questions or problems, post them to the list. The list address is: bonding-devel@lists.sourceforge.net -https://lists.sourceforge.net/lists/listinfo/bonding-devel - -There is also a project site on sourceforge. + The administrative interface (to subscribe or unsubscribe) can +be found at: -http://www.sourceforge.net/projects/bonding +https://lists.sourceforge.net/lists/listinfo/bonding-devel Donald Becker's Ethernet Drivers and diag programs may be found at : - http://www.scyld.com/network/ diff --git a/Documentation/networking/dmfe.txt b/Documentation/networking/dmfe.txt index c0e8398674e..046363552d0 100644 --- a/Documentation/networking/dmfe.txt +++ b/Documentation/networking/dmfe.txt @@ -1,59 +1,65 @@ - dmfe.c: Version 1.28 01/18/2000 +Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver for Linux. - A Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver for Linux. - Copyright (C) 1997 Sten Wang +This program is free software; you can redistribute it and/or +modify it under the terms of the GNU General Public License +as published by the Free Software Foundation; either version 2 +of the License, or (at your option) any later version. - This program is free software; you can redistribute it and/or - modify it under the terms of the GNU General Public License - as published by the Free Software Foundation; either version 2 - of the License, or (at your option) any later version. +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. +This driver provides kernel support for Davicom DM9102(A)/DM9132/DM9801 ethernet cards ( CNET +10/100 ethernet cards uses Davicom chipset too, so this driver supports CNET cards too ).If you +didn't compile this driver as a module, it will automatically load itself on boot and print a +line similar to : - A. Compiler command: + dmfe: Davicom DM9xxx net driver, version 1.36.4 (2002-01-17) - A-1: For normal single or multiple processor kernel - "gcc -DMODULE -D__KERNEL__ -I/usr/src/linux/net/inet -Wall - -Wstrict-prototypes -O6 -c dmfe.c" +If you compiled this driver as a module, you have to load it on boot.You can load it with command : - A-2: For single or multiple processor with kernel module version function - "gcc -DMODULE -DMODVERSIONS -D__KERNEL__ -I/usr/src/linux/net/inet - -Wall -Wstrict-prototypes -O6 -c dmfe.c" + insmod dmfe +This way it will autodetect the device mode.This is the suggested way to load the module.Or you can pass +a mode= setting to module while loading, like : - B. The following steps teach you how to activate a DM9102 board: + insmod dmfe mode=0 # Force 10M Half Duplex + insmod dmfe mode=1 # Force 100M Half Duplex + insmod dmfe mode=4 # Force 10M Full Duplex + insmod dmfe mode=5 # Force 100M Full Duplex - 1. Used the upper compiler command to compile dmfe.c +Next you should configure your network interface with a command similar to : - 2. Insert dmfe module into kernel - "insmod dmfe" ;;Auto Detection Mode (Suggest) - "insmod dmfe mode=0" ;;Force 10M Half Duplex - "insmod dmfe mode=1" ;;Force 100M Half Duplex - "insmod dmfe mode=4" ;;Force 10M Full Duplex - "insmod dmfe mode=5" ;;Force 100M Full Duplex + ifconfig eth0 172.22.3.18 + ^^^^^^^^^^^ + Your IP Adress - 3. Config a dm9102 network interface - "ifconfig eth0 172.22.3.18" - ^^^^^^^^^^^ Your IP address +Then you may have to modify the default routing table with command : - 4. Activate the IP routing table. For some distributions, it is not - necessary. You can type "route" to check. + route add default eth0 - "route add default eth0" +Now your ethernet card should be up and running. - 5. Well done. Your DM9102 adapter is now activated. +TODO: - C. Object files description: - 1. dmfe_rh61.o: For Redhat 6.1 +Implement pci_driver::suspend() and pci_driver::resume() power management methods. +Check on 64 bit boxes. +Check and fix on big endian boxes. +Test and make sure PCI latency is now correct for all cases. - If you can make sure your kernel version, you can rename - to dmfe.o and directly use it without re-compiling. +Authors: - Author: Sten Wang, 886-3-5798797-8517, E-mail: sten_wang@davicom.com.tw +Sten Wang <sten_wang@davicom.com.tw > : Original Author +Tobias Ringstrom <tori@unhappy.mine.nu> : Current Maintainer + +Contributors: + +Marcelo Tosatti <marcelo@conectiva.com.br> +Alan Cox <alan@redhat.com> +Jeff Garzik <jgarzik@pobox.com> +Vojtech Pavlik <vojtech@suse.cz> diff --git a/Documentation/networking/fib_trie.txt b/Documentation/networking/fib_trie.txt new file mode 100644 index 00000000000..f50d0c673c5 --- /dev/null +++ b/Documentation/networking/fib_trie.txt @@ -0,0 +1,145 @@ + LC-trie implementation notes. + +Node types +---------- +leaf + An end node with data. This has a copy of the relevant key, along + with 'hlist' with routing table entries sorted by prefix length. + See struct leaf and struct leaf_info. + +trie node or tnode + An internal node, holding an array of child (leaf or tnode) pointers, + indexed through a subset of the key. See Level Compression. + +A few concepts explained +------------------------ +Bits (tnode) + The number of bits in the key segment used for indexing into the + child array - the "child index". See Level Compression. + +Pos (tnode) + The position (in the key) of the key segment used for indexing into + the child array. See Path Compression. + +Path Compression / skipped bits + Any given tnode is linked to from the child array of its parent, using + a segment of the key specified by the parent's "pos" and "bits" + In certain cases, this tnode's own "pos" will not be immediately + adjacent to the parent (pos+bits), but there will be some bits + in the key skipped over because they represent a single path with no + deviations. These "skipped bits" constitute Path Compression. + Note that the search algorithm will simply skip over these bits when + searching, making it necessary to save the keys in the leaves to + verify that they actually do match the key we are searching for. + +Level Compression / child arrays + the trie is kept level balanced moving, under certain conditions, the + children of a full child (see "full_children") up one level, so that + instead of a pure binary tree, each internal node ("tnode") may + contain an arbitrarily large array of links to several children. + Conversely, a tnode with a mostly empty child array (see empty_children) + may be "halved", having some of its children moved downwards one level, + in order to avoid ever-increasing child arrays. + +empty_children + the number of positions in the child array of a given tnode that are + NULL. + +full_children + the number of children of a given tnode that aren't path compressed. + (in other words, they aren't NULL or leaves and their "pos" is equal + to this tnode's "pos"+"bits"). + + (The word "full" here is used more in the sense of "complete" than + as the opposite of "empty", which might be a tad confusing.) + +Comments +--------- + +We have tried to keep the structure of the code as close to fib_hash as +possible to allow verification and help up reviewing. + +fib_find_node() + A good start for understanding this code. This function implements a + straightforward trie lookup. + +fib_insert_node() + Inserts a new leaf node in the trie. This is bit more complicated than + fib_find_node(). Inserting a new node means we might have to run the + level compression algorithm on part of the trie. + +trie_leaf_remove() + Looks up a key, deletes it and runs the level compression algorithm. + +trie_rebalance() + The key function for the dynamic trie after any change in the trie + it is run to optimize and reorganize. Tt will walk the trie upwards + towards the root from a given tnode, doing a resize() at each step + to implement level compression. + +resize() + Analyzes a tnode and optimizes the child array size by either inflating + or shrinking it repeatedly until it fullfills the criteria for optimal + level compression. This part follows the original paper pretty closely + and there may be some room for experimentation here. + +inflate() + Doubles the size of the child array within a tnode. Used by resize(). + +halve() + Halves the size of the child array within a tnode - the inverse of + inflate(). Used by resize(); + +fn_trie_insert(), fn_trie_delete(), fn_trie_select_default() + The route manipulation functions. Should conform pretty closely to the + corresponding functions in fib_hash. + +fn_trie_flush() + This walks the full trie (using nextleaf()) and searches for empty + leaves which have to be removed. + +fn_trie_dump() + Dumps the routing table ordered by prefix length. This is somewhat + slower than the corresponding fib_hash function, as we have to walk the + entire trie for each prefix length. In comparison, fib_hash is organized + as one "zone"/hash per prefix length. + +Locking +------- + +fib_lock is used for an RW-lock in the same way that this is done in fib_hash. +However, the functions are somewhat separated for other possible locking +scenarios. It might conceivably be possible to run trie_rebalance via RCU +to avoid read_lock in the fn_trie_lookup() function. + +Main lookup mechanism +--------------------- +fn_trie_lookup() is the main lookup function. + +The lookup is in its simplest form just like fib_find_node(). We descend the +trie, key segment by key segment, until we find a leaf. check_leaf() does +the fib_semantic_match in the leaf's sorted prefix hlist. + +If we find a match, we are done. + +If we don't find a match, we enter prefix matching mode. The prefix length, +starting out at the same as the key length, is reduced one step at a time, +and we backtrack upwards through the trie trying to find a longest matching +prefix. The goal is always to reach a leaf and get a positive result from the +fib_semantic_match mechanism. + +Inside each tnode, the search for longest matching prefix consists of searching +through the child array, chopping off (zeroing) the least significant "1" of +the child index until we find a match or the child index consists of nothing but +zeros. + +At this point we backtrack (t->stats.backtrack++) up the trie, continuing to +chop off part of the key in order to find the longest matching prefix. + +At this point we will repeatedly descend subtries to look for a match, and there +are some optimizations available that can provide us with "shortcuts" to avoid +descending into dead ends. Look for "HL_OPTIMIZE" sections in the code. + +To alleviate any doubts about the correctness of the route selection process, +a new netlink operation has been added. Look for NETLINK_FIB_LOOKUP, which +gives userland access to fib_lookup(). diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index a2c893a7475..ab65714d95f 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -304,57 +304,6 @@ tcp_low_latency - BOOLEAN changed would be a Beowulf compute cluster. Default: 0 -tcp_westwood - BOOLEAN - Enable TCP Westwood+ congestion control algorithm. - TCP Westwood+ is a sender-side only modification of the TCP Reno - protocol stack that optimizes the performance of TCP congestion - control. It is based on end-to-end bandwidth estimation to set - congestion window and slow start threshold after a congestion - episode. Using this estimation, TCP Westwood+ adaptively sets a - slow start threshold and a congestion window which takes into - account the bandwidth used at the time congestion is experienced. - TCP Westwood+ significantly increases fairness wrt TCP Reno in - wired networks and throughput over wireless links. - Default: 0 - -tcp_vegas_cong_avoid - BOOLEAN - Enable TCP Vegas congestion avoidance algorithm. - TCP Vegas is a sender-side only change to TCP that anticipates - the onset of congestion by estimating the bandwidth. TCP Vegas - adjusts the sending rate by modifying the congestion - window. TCP Vegas should provide less packet loss, but it is - not as aggressive as TCP Reno. - Default:0 - -tcp_bic - BOOLEAN - Enable BIC TCP congestion control algorithm. - BIC-TCP is a sender-side only change that ensures a linear RTT - fairness under large windows while offering both scalability and - bounded TCP-friendliness. The protocol combines two schemes - called additive increase and binary search increase. When the - congestion window is large, additive increase with a large - increment ensures linear RTT fairness as well as good - scalability. Under small congestion windows, binary search - increase provides TCP friendliness. - Default: 0 - -tcp_bic_low_window - INTEGER - Sets the threshold window (in packets) where BIC TCP starts to - adjust the congestion window. Below this threshold BIC TCP behaves - the same as the default TCP Reno. - Default: 14 - -tcp_bic_fast_convergence - BOOLEAN - Forces BIC TCP to more quickly respond to changes in congestion - window. Allows two flows sharing the same connection to converge - more rapidly. - Default: 1 - -tcp_default_win_scale - INTEGER - Sets the minimum window scale TCP will negotiate for on all - conections. - Default: 7 - tcp_tso_win_divisor - INTEGER This allows control over what percentage of the congestion window can be consumed by a single TSO frame. @@ -368,6 +317,11 @@ tcp_frto - BOOLEAN where packet loss is typically due to random radio interference rather than intermediate router congestion. +tcp_congestion_control - STRING + Set the congestion control algorithm to be used for new + connections. The algorithm "reno" is always available, but + additional choices may be available based on kernel configuration. + somaxconn - INTEGER Limit of socket listen() backlog, known in userspace as SOMAXCONN. Defaults to 128. See also tcp_max_syn_backlog for additional tuning diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt new file mode 100644 index 00000000000..29ccae40903 --- /dev/null +++ b/Documentation/networking/phy.txt @@ -0,0 +1,288 @@ + +------- +PHY Abstraction Layer +(Updated 2005-07-21) + +Purpose + + Most network devices consist of set of registers which provide an interface + to a MAC layer, which communicates with the physical connection through a + PHY. The PHY concerns itself with negotiating link parameters with the link + partner on the other side of the network connection (typically, an ethernet + cable), and provides a register interface to allow drivers to determine what + settings were chosen, and to configure what settings are allowed. + + While these devices are distinct from the network devices, and conform to a + standard layout for the registers, it has been common practice to integrate + the PHY management code with the network driver. This has resulted in large + amounts of redundant code. Also, on embedded systems with multiple (and + sometimes quite different) ethernet controllers connected to the same + management bus, it is difficult to ensure safe use of the bus. + + Since the PHYs are devices, and the management busses through which they are + accessed are, in fact, busses, the PHY Abstraction Layer treats them as such. + In doing so, it has these goals: + + 1) Increase code-reuse + 2) Increase overall code-maintainability + 3) Speed development time for new network drivers, and for new systems + + Basically, this layer is meant to provide an interface to PHY devices which + allows network driver writers to write as little code as possible, while + still providing a full feature set. + +The MDIO bus + + Most network devices are connected to a PHY by means of a management bus. + Different devices use different busses (though some share common interfaces). + In order to take advantage of the PAL, each bus interface needs to be + registered as a distinct device. + + 1) read and write functions must be implemented. Their prototypes are: + + int write(struct mii_bus *bus, int mii_id, int regnum, u16 value); + int read(struct mii_bus *bus, int mii_id, int regnum); + + mii_id is the address on the bus for the PHY, and regnum is the register + number. These functions are guaranteed not to be called from interrupt + time, so it is safe for them to block, waiting for an interrupt to signal + the operation is complete + + 2) A reset function is necessary. This is used to return the bus to an + initialized state. + + 3) A probe function is needed. This function should set up anything the bus + driver needs, setup the mii_bus structure, and register with the PAL using + mdiobus_register. Similarly, there's a remove function to undo all of + that (use mdiobus_unregister). + + 4) Like any driver, the device_driver structure must be configured, and init + exit functions are used to register the driver. + + 5) The bus must also be declared somewhere as a device, and registered. + + As an example for how one driver implemented an mdio bus driver, see + drivers/net/gianfar_mii.c and arch/ppc/syslib/mpc85xx_devices.c + +Connecting to a PHY + + Sometime during startup, the network driver needs to establish a connection + between the PHY device, and the network device. At this time, the PHY's bus + and drivers need to all have been loaded, so it is ready for the connection. + At this point, there are several ways to connect to the PHY: + + 1) The PAL handles everything, and only calls the network driver when + the link state changes, so it can react. + + 2) The PAL handles everything except interrupts (usually because the + controller has the interrupt registers). + + 3) The PAL handles everything, but checks in with the driver every second, + allowing the network driver to react first to any changes before the PAL + does. + + 4) The PAL serves only as a library of functions, with the network device + manually calling functions to update status, and configure the PHY + + +Letting the PHY Abstraction Layer do Everything + + If you choose option 1 (The hope is that every driver can, but to still be + useful to drivers that can't), connecting to the PHY is simple: + + First, you need a function to react to changes in the link state. This + function follows this protocol: + + static void adjust_link(struct net_device *dev); + + Next, you need to know the device name of the PHY connected to this device. + The name will look something like, "phy0:0", where the first number is the + bus id, and the second is the PHY's address on that bus. + + Now, to connect, just call this function: + + phydev = phy_connect(dev, phy_name, &adjust_link, flags); + + phydev is a pointer to the phy_device structure which represents the PHY. If + phy_connect is successful, it will return the pointer. dev, here, is the + pointer to your net_device. Once done, this function will have started the + PHY's software state machine, and registered for the PHY's interrupt, if it + has one. The phydev structure will be populated with information about the + current state, though the PHY will not yet be truly operational at this + point. + + flags is a u32 which can optionally contain phy-specific flags. + This is useful if the system has put hardware restrictions on + the PHY/controller, of which the PHY needs to be aware. + + Now just make sure that phydev->supported and phydev->advertising have any + values pruned from them which don't make sense for your controller (a 10/100 + controller may be connected to a gigabit capable PHY, so you would need to + mask off SUPPORTED_1000baseT*). See include/linux/ethtool.h for definitions + for these bitfields. Note that you should not SET any bits, or the PHY may + get put into an unsupported state. + + Lastly, once the controller is ready to handle network traffic, you call + phy_start(phydev). This tells the PAL that you are ready, and configures the + PHY to connect to the network. If you want to handle your own interrupts, + just set phydev->irq to PHY_IGNORE_INTERRUPT before you call phy_start. + Similarly, if you don't want to use interrupts, set phydev->irq to PHY_POLL. + + When you want to disconnect from the network (even if just briefly), you call + phy_stop(phydev). + +Keeping Close Tabs on the PAL + + It is possible that the PAL's built-in state machine needs a little help to + keep your network device and the PHY properly in sync. If so, you can + register a helper function when connecting to the PHY, which will be called + every second before the state machine reacts to any changes. To do this, you + need to manually call phy_attach() and phy_prepare_link(), and then call + phy_start_machine() with the second argument set to point to your special + handler. + + Currently there are no examples of how to use this functionality, and testing + on it has been limited because the author does not have any drivers which use + it (they all use option 1). So Caveat Emptor. + +Doing it all yourself + + There's a remote chance that the PAL's built-in state machine cannot track + the complex interactions between the PHY and your network device. If this is + so, you can simply call phy_attach(), and not call phy_start_machine or + phy_prepare_link(). This will mean that phydev->state is entirely yours to + handle (phy_start and phy_stop toggle between some of the states, so you + might need to avoid them). + + An effort has been made to make sure that useful functionality can be + accessed without the state-machine running, and most of these functions are + descended from functions which did not interact with a complex state-machine. + However, again, no effort has been made so far to test running without the + state machine, so tryer beware. + + Here is a brief rundown of the functions: + + int phy_read(struct phy_device *phydev, u16 regnum); + int phy_write(struct phy_device *phydev, u16 regnum, u16 val); + + Simple read/write primitives. They invoke the bus's read/write function + pointers. + + void phy_print_status(struct phy_device *phydev); + + A convenience function to print out the PHY status neatly. + + int phy_clear_interrupt(struct phy_device *phydev); + int phy_config_interrupt(struct phy_device *phydev, u32 interrupts); + + Clear the PHY's interrupt, and configure which ones are allowed, + respectively. Currently only supports all on, or all off. + + int phy_enable_interrupts(struct phy_device *phydev); + int phy_disable_interrupts(struct phy_device *phydev); + + Functions which enable/disable PHY interrupts, clearing them + before and after, respectively. + + int phy_start_interrupts(struct phy_device *phydev); + int phy_stop_interrupts(struct phy_device *phydev); + + Requests the IRQ for the PHY interrupts, then enables them for + start, or disables then frees them for stop. + + struct phy_device * phy_attach(struct net_device *dev, const char *phy_id, + u32 flags); + + Attaches a network device to a particular PHY, binding the PHY to a generic + driver if none was found during bus initialization. Passes in + any phy-specific flags as needed. + + int phy_start_aneg(struct phy_device *phydev); + + Using variables inside the phydev structure, either configures advertising + and resets autonegotiation, or disables autonegotiation, and configures + forced settings. + + static inline int phy_read_status(struct phy_device *phydev); + + Fills the phydev structure with up-to-date information about the current + settings in the PHY. + + void phy_sanitize_settings(struct phy_device *phydev) + + Resolves differences between currently desired settings, and + supported settings for the given PHY device. Does not make + the changes in the hardware, though. + + int phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd); + int phy_ethtool_gset(struct phy_device *phydev, struct ethtool_cmd *cmd); + + Ethtool convenience functions. + + int phy_mii_ioctl(struct phy_device *phydev, + struct mii_ioctl_data *mii_data, int cmd); + + The MII ioctl. Note that this function will completely screw up the state + machine if you write registers like BMCR, BMSR, ADVERTISE, etc. Best to + use this only to write registers which are not standard, and don't set off + a renegotiation. + + +PHY Device Drivers + + With the PHY Abstraction Layer, adding support for new PHYs is + quite easy. In some cases, no work is required at all! However, + many PHYs require a little hand-holding to get up-and-running. + +Generic PHY driver + + If the desired PHY doesn't have any errata, quirks, or special + features you want to support, then it may be best to not add + support, and let the PHY Abstraction Layer's Generic PHY Driver + do all of the work. + +Writing a PHY driver + + If you do need to write a PHY driver, the first thing to do is + make sure it can be matched with an appropriate PHY device. + This is done during bus initialization by reading the device's + UID (stored in registers 2 and 3), then comparing it to each + driver's phy_id field by ANDing it with each driver's + phy_id_mask field. Also, it needs a name. Here's an example: + + static struct phy_driver dm9161_driver = { + .phy_id = 0x0181b880, + .name = "Davicom DM9161E", + .phy_id_mask = 0x0ffffff0, + ... + } + + Next, you need to specify what features (speed, duplex, autoneg, + etc) your PHY device and driver support. Most PHYs support + PHY_BASIC_FEATURES, but you can look in include/mii.h for other + features. + + Each driver consists of a number of function pointers: + + config_init: configures PHY into a sane state after a reset. + For instance, a Davicom PHY requires descrambling disabled. + probe: Does any setup needed by the driver + suspend/resume: power management + config_aneg: Changes the speed/duplex/negotiation settings + read_status: Reads the current speed/duplex/negotiation settings + ack_interrupt: Clear a pending interrupt + config_intr: Enable or disable interrupts + remove: Does any driver take-down + + Of these, only config_aneg and read_status are required to be + assigned by the driver code. The rest are optional. Also, it is + preferred to use the generic phy driver's versions of these two + functions if at all possible: genphy_read_status and + genphy_config_aneg. If this is not possible, it is likely that + you only need to perform some actions before and after invoking + these functions, and so your functions will wrap the generic + ones. + + Feel free to look at the Marvell, Cicada, and Davicom drivers in + drivers/net/phy/ for examples (the lxt and qsemi drivers have + not been tested as of this writing) diff --git a/Documentation/networking/tcp.txt b/Documentation/networking/tcp.txt index 71749007091..0fa30042557 100644 --- a/Documentation/networking/tcp.txt +++ b/Documentation/networking/tcp.txt @@ -1,5 +1,72 @@ -How the new TCP output machine [nyi] works. +TCP protocol +============ + +Last updated: 21 June 2005 + +Contents +======== + +- Congestion control +- How the new TCP output machine [nyi] works + +Congestion control +================== + +The following variables are used in the tcp_sock for congestion control: +snd_cwnd The size of the congestion window +snd_ssthresh Slow start threshold. We are in slow start if + snd_cwnd is less than this. +snd_cwnd_cnt A counter used to slow down the rate of increase + once we exceed slow start threshold. +snd_cwnd_clamp This is the maximum size that snd_cwnd can grow to. +snd_cwnd_stamp Timestamp for when congestion window last validated. +snd_cwnd_used Used as a highwater mark for how much of the + congestion window is in use. It is used to adjust + snd_cwnd down when the link is limited by the + application rather than the network. + +As of 2.6.13, Linux supports pluggable congestion control algorithms. +A congestion control mechanism can be registered through functions in +tcp_cong.c. The functions used by the congestion control mechanism are +registered via passing a tcp_congestion_ops struct to +tcp_register_congestion_control. As a minimum name, ssthresh, +cong_avoid, min_cwnd must be valid. +Private data for a congestion control mechanism is stored in tp->ca_priv. +tcp_ca(tp) returns a pointer to this space. This is preallocated space - it +is important to check the size of your private data will fit this space, or +alternatively space could be allocated elsewhere and a pointer to it could +be stored here. + +There are three kinds of congestion control algorithms currently: The +simplest ones are derived from TCP reno (highspeed, scalable) and just +provide an alternative the congestion window calculation. More complex +ones like BIC try to look at other events to provide better +heuristics. There are also round trip time based algorithms like +Vegas and Westwood+. + +Good TCP congestion control is a complex problem because the algorithm +needs to maintain fairness and performance. Please review current +research and RFC's before developing new modules. + +The method that is used to determine which congestion control mechanism is +determined by the setting of the sysctl net.ipv4.tcp_congestion_control. +The default congestion control will be the last one registered (LIFO); +so if you built everything as modules. the default will be reno. If you +build with the default's from Kconfig, then BIC will be builtin (not a module) +and it will end up the default. + +If you really want a particular default value then you will need +to set it with the sysctl. If you use a sysctl, the module will be autoloaded +if needed and you will get the expected protocol. If you ask for an +unknown congestion method, then the sysctl attempt will fail. + +If you remove a tcp congestion control module, then you will get the next +available one. Since reno can not be built as a module, and can not be +deleted, it will always be available. + +How the new TCP output machine [nyi] works. +=========================================== Data is kept on a single queue. The skb->users flag tells us if the frame is one that has been queued already. To add a frame we throw it on the end. Ack diff --git a/Documentation/networking/wanpipe.txt b/Documentation/networking/wanpipe.txt deleted file mode 100644 index aea20cd2a56..00000000000 --- a/Documentation/networking/wanpipe.txt +++ /dev/null @@ -1,622 +0,0 @@ ------------------------------------------------------------------------------- -Linux WAN Router Utilities Package ------------------------------------------------------------------------------- -Version 2.2.1 -Mar 28, 2001 -Author: Nenad Corbic <ncorbic@sangoma.com> -Copyright (c) 1995-2001 Sangoma Technologies Inc. ------------------------------------------------------------------------------- - -INTRODUCTION - -Wide Area Networks (WANs) are used to interconnect Local Area Networks (LANs) -and/or stand-alone hosts over vast distances with data transfer rates -significantly higher than those achievable with commonly used dial-up -connections. - -Usually an external device called `WAN router' sitting on your local network -or connected to your machine's serial port provides physical connection to -WAN. Although router's job may be as simple as taking your local network -traffic, converting it to WAN format and piping it through the WAN link, these -devices are notoriously expensive, with prices as much as 2 - 5 times higher -then the price of a typical PC box. - -Alternatively, considering robustness and multitasking capabilities of Linux, -an internal router can be built (most routers use some sort of stripped down -Unix-like operating system anyway). With a number of relatively inexpensive WAN -interface cards available on the market, a perfectly usable router can be -built for less than half a price of an external router. Yet a Linux box -acting as a router can still be used for other purposes, such as fire-walling, -running FTP, WWW or DNS server, etc. - -This kernel module introduces the notion of a WAN Link Driver (WLD) to Linux -operating system and provides generic hardware-independent services for such -drivers. Why can existing Linux network device interface not be used for -this purpose? Well, it can. However, there are a few key differences between -a typical network interface (e.g. Ethernet) and a WAN link. - -Many WAN protocols, such as X.25 and frame relay, allow for multiple logical -connections (known as `virtual circuits' in X.25 terminology) over a single -physical link. Each such virtual circuit may (and almost always does) lead -to a different geographical location and, therefore, different network. As a -result, it is the virtual circuit, not the physical link, that represents a -route and, therefore, a network interface in Linux terms. - -To further complicate things, virtual circuits are usually volatile in nature -(excluding so called `permanent' virtual circuits or PVCs). With almost no -time required to set up and tear down a virtual circuit, it is highly desirable -to implement on-demand connections in order to minimize network charges. So -unlike a typical network driver, the WAN driver must be able to handle multiple -network interfaces and cope as multiple virtual circuits come into existence -and go away dynamically. - -Last, but not least, WAN configuration is much more complex than that of say -Ethernet and may well amount to several dozens of parameters. Some of them -are "link-wide" while others are virtual circuit-specific. The same holds -true for WAN statistics which is by far more extensive and extremely useful -when troubleshooting WAN connections. Extending the ifconfig utility to suit -these needs may be possible, but does not seem quite reasonable. Therefore, a -WAN configuration utility and corresponding application programmer's interface -is needed for this purpose. - -Most of these problems are taken care of by this module. Its goal is to -provide a user with more-or-less standard look and feel for all WAN devices and -assist a WAN device driver writer by providing common services, such as: - - o User-level interface via /proc file system - o Centralized configuration - o Device management (setup, shutdown, etc.) - o Network interface management (dynamic creation/destruction) - o Protocol encapsulation/decapsulation - -To ba able to use the Linux WAN Router you will also need a WAN Tools package -available from - - ftp.sangoma.com/pub/linux/current_wanpipe/wanpipe-X.Y.Z.tgz - -where vX.Y.Z represent the wanpipe version number. - -For technical questions and/or comments please e-mail to ncorbic@sangoma.com. -For general inquiries please contact Sangoma Technologies Inc. by - - Hotline: 1-800-388-2475 (USA and Canada, toll free) - Phone: (905) 474-1990 ext: 106 - Fax: (905) 474-9223 - E-mail: dm@sangoma.com (David Mandelstam) - WWW: http://www.sangoma.com - - -INSTALLATION - -Please read the WanpipeForLinux.pdf manual on how to -install the WANPIPE tools and drivers properly. - - -After installing wanpipe package: /usr/local/wanrouter/doc. -On the ftp.sangoma.com : /linux/current_wanpipe/doc - - -COPYRIGHT AND LICENSING INFORMATION - -This program is free software; you can redistribute it and/or modify it under -the terms of the GNU General Public License as published by the Free Software -Foundation; either version 2, or (at your option) any later version. - -This program is distributed in the hope that it will be useful, but WITHOUT -ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS -FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. - -You should have received a copy of the GNU General Public License along with -this program; if not, write to the Free Software Foundation, Inc., 675 Mass -Ave, Cambridge, MA 02139, USA. - - - -ACKNOWLEDGEMENTS - -This product is based on the WANPIPE(tm) Multiprotocol WAN Router developed -by Sangoma Technologies Inc. for Linux 2.0.x and 2.2.x. Success of the WANPIPE -together with the next major release of Linux kernel in summer 1996 commanded -adequate changes to the WANPIPE code to take full advantage of new Linux -features. - -Instead of continuing developing proprietary interface tied to Sangoma WAN -cards, we decided to separate all hardware-independent code into a separate -module and defined two levels of interfaces - one for user-level applications -and another for kernel-level WAN drivers. WANPIPE is now implemented as a -WAN driver compliant with the WAN Link Driver interface. Also a general -purpose WAN configuration utility and a set of shell scripts was developed to -support WAN router at the user level. - -Many useful ideas concerning hardware-independent interface implementation -were given by Mike McLagan <mike.mclagan@linux.org> and his implementation -of the Frame Relay router and drivers for Sangoma cards (dlci/sdla). - -With the new implementation of the APIs being incorporated into the WANPIPE, -a special thank goes to Alan Cox in providing insight into BSD sockets. - -Special thanks to all the WANPIPE users who performed field-testing, reported -bugs and made valuable comments and suggestions that help us to improve this -product. - - - -NEW IN THIS RELEASE - - o Updated the WANCFG utility - Calls the pppconfig to configure the PPPD - for async connections. - - o Added the PPPCONFIG utility - Used to configure the PPPD dameon for the - WANPIPE Async PPP and standard serial port. - The wancfg calls the pppconfig to configure - the pppd. - - o Fixed the PCI autodetect feature. - The SLOT 0 was used as an autodetect option - however, some high end PC's slot numbers start - from 0. - - o This release has been tested with the new backupd - daemon release. - - -PRODUCT COMPONENTS AND RELATED FILES - -/etc: (or user defined) - wanpipe1.conf default router configuration file - -/lib/modules/X.Y.Z/misc: - wanrouter.o router kernel loadable module - af_wanpipe.o wanpipe api socket module - -/lib/modules/X.Y.Z/net: - sdladrv.o Sangoma SDLA support module - wanpipe.o Sangoma WANPIPE(tm) driver module - -/proc/net/wanrouter - Config reads current router configuration - Status reads current router status - {name} reads WAN driver statistics - -/usr/sbin: - wanrouter wanrouter start-up script - wanconfig wanrouter configuration utility - sdladump WANPIPE adapter memory dump utility - fpipemon Monitor for Frame Relay - cpipemon Monitor for Cisco HDLC - ppipemon Monitor for PPP - xpipemon Monitor for X25 - wpkbdmon WANPIPE keyboard led monitor/debugger - -/usr/local/wanrouter: - README this file - COPYING GNU General Public License - Setup installation script - Filelist distribution definition file - wanrouter.rc meta-configuration file - (used by the Setup and wanrouter script) - -/usr/local/wanrouter/doc: - wanpipeForLinux.pdf WAN Router User's Manual - -/usr/local/wanrouter/patches: - wanrouter-v2213.gz patch for Linux kernels 2.2.11 up to 2.2.13. - wanrouter-v2214.gz patch for Linux kernel 2.2.14. - wanrouter-v2215.gz patch for Linux kernels 2.2.15 to 2.2.17. - wanrouter-v2218.gz patch for Linux kernels 2.2.18 and up. - wanrouter-v240.gz patch for Linux kernel 2.4.0. - wanrouter-v242.gz patch for Linux kernel 2.4.2 and up. - wanrouter-v2034.gz patch for Linux kernel 2.0.34 - wanrouter-v2036.gz patch for Linux kernel 2.0.36 and up. - -/usr/local/wanrouter/patches/kdrivers: - Sources of the latest WANPIPE device drivers. - These are used to UPGRADE the linux kernel to the newest - version if the kernel source has already been pathced with - WANPIPE drivers. - -/usr/local/wanrouter/samples: - interface sample interface configuration file - wanpipe1.cpri CHDLC primary port - wanpipe2.csec CHDLC secondary port - wanpipe1.fr Frame Relay protocol - wanpipe1.ppp PPP protocol ) - wanpipe1.asy CHDLC ASYNC protocol - wanpipe1.x25 X25 protocol - wanpipe1.stty Sync TTY driver (Used by Kernel PPPD daemon) - wanpipe1.atty Async TTY driver (Used by Kernel PPPD daemon) - wanrouter.rc sample meta-configuration file - -/usr/local/wanrouter/util: - * wan-tools utilities source code - -/usr/local/wanrouter/api/x25: - * x25 api sample programs. -/usr/local/wanrouter/api/chdlc: - * chdlc api sample programs. -/usr/local/wanrouter/api/fr: - * fr api sample programs. -/usr/local/wanrouter/config/wancfg: - wancfg WANPIPE GUI configuration program. - Creates wanpipe#.conf files. -/usr/local/wanrouter/config/cfgft1: - cfgft1 GUI CSU/DSU configuration program. - -/usr/include/linux: - wanrouter.h router API definitions - wanpipe.h WANPIPE API definitions - sdladrv.h SDLA support module API definitions - sdlasfm.h SDLA firmware module definitions - if_wanpipe.h WANPIPE Socket definitions - if_wanpipe_common.h WANPIPE Socket/Driver common definitions. - sdlapci.h WANPIPE PCI definitions - - -/usr/src/linux/net/wanrouter: - * wanrouter source code - -/var/log: - wanrouter wanrouter start-up log (created by the Setup script) - -/var/lock: (or /var/lock/subsys for RedHat) - wanrouter wanrouter lock file (created by the Setup script) - -/usr/local/wanrouter/firmware: - fr514.sfm Frame relay firmware for Sangoma S508/S514 card - cdual514.sfm Dual Port Cisco HDLC firmware for Sangoma S508/S514 card - ppp514.sfm PPP Firmware for Sangoma S508 and S514 cards - x25_508.sfm X25 Firmware for Sangoma S508 card. - - -REVISION HISTORY - -1.0.0 December 31, 1996 Initial version - -1.0.1 January 30, 1997 Status and statistics can be read via /proc - filesystem entries. - -1.0.2 April 30, 1997 Added UDP management via monitors. - -1.0.3 June 3, 1997 UDP management for multiple boards using Frame - Relay and PPP - Enabled continuous transmission of Configure - Request Packet for PPP (for 508 only) - Connection Timeout for PPP changed from 900 to 0 - Flow Control Problem fixed for Frame Relay - -1.0.4 July 10, 1997 S508/FT1 monitoring capability in fpipemon and - ppipemon utilities. - Configurable TTL for UDP packets. - Multicast and Broadcast IP source addresses are - silently discarded. - -1.0.5 July 28, 1997 Configurable T391,T392,N391,N392,N393 for Frame - Relay in router.conf. - Configurable Memory Address through router.conf - for Frame Relay, PPP and X.25. (commenting this - out enables auto-detection). - Fixed freeing up received buffers using kfree() - for Frame Relay and X.25. - Protect sdla_peek() by calling save_flags(), - cli() and restore_flags(). - Changed number of Trace elements from 32 to 20 - Added DLCI specific data monitoring in FPIPEMON. -2.0.0 Nov 07, 1997 Implemented protection of RACE conditions by - critical flags for FRAME RELAY and PPP. - DLCI List interrupt mode implemented. - IPX support in FRAME RELAY and PPP. - IPX Server Support (MARS) - More driver specific stats included in FPIPEMON - and PIPEMON. - -2.0.1 Nov 28, 1997 Bug Fixes for version 2.0.0. - Protection of "enable_irq()" while - "disable_irq()" has been enabled from any other - routine (for Frame Relay, PPP and X25). - Added additional Stats for Fpipemon and Ppipemon - Improved Load Sharing for multiple boards - -2.0.2 Dec 09, 1997 Support for PAP and CHAP for ppp has been - implemented. - -2.0.3 Aug 15, 1998 New release supporting Cisco HDLC, CIR for Frame - relay, Dynamic IP assignment for PPP and Inverse - Arp support for Frame-relay. Man Pages are - included for better support and a new utility - for configuring FT1 cards. - -2.0.4 Dec 09, 1998 Dual Port support for Cisco HDLC. - Support for HDLC (LAPB) API. - Supports BiSync Streaming code for S502E - and S503 cards. - Support for Streaming HDLC API. - Provides a BSD socket interface for - creating applications using BiSync - streaming. - -2.0.5 Aug 04, 1999 CHDLC initializatin bug fix. - PPP interrupt driven driver: - Fix to the PPP line hangup problem. - New PPP firmware - Added comments to the startup SYSTEM ERROR messages - Xpipemon debugging application for the X25 protocol - New USER_MANUAL.txt - Fixed the odd boundary 4byte writes to the board. - BiSync Streaming code has been taken out. - Available as a patch. - Streaming HDLC API has been taken out. - Available as a patch. - -2.0.6 Aug 17, 1999 Increased debugging in statup scripts - Fixed insallation bugs from 2.0.5 - Kernel patch works for both 2.2.10 and 2.2.11 kernels. - There is no functional difference between the two packages - -2.0.7 Aug 26, 1999 o Merged X25API code into WANPIPE. - o Fixed a memeory leak for X25API - o Updated the X25API code for 2.2.X kernels. - o Improved NEM handling. - -2.1.0 Oct 25, 1999 o New code for S514 PCI Card - o New CHDLC and Frame Relay drivers - o PPP and X25 are not supported in this release - -2.1.1 Nov 30, 1999 o PPP support for S514 PCI Cards - -2.1.3 Apr 06, 2000 o Socket based x25api - o Socket based chdlc api - o Socket based fr api - o Dual Port Receive only CHDLC support. - o Asynchronous CHDLC support (Secondary Port) - o cfgft1 GUI csu/dsu configurator - o wancfg GUI configuration file - configurator. - o Architectual directory changes. - -beta-2.1.4 Jul 2000 o Dynamic interface configuration: - Network interfaces reflect the state - of protocol layer. If the protocol becomes - disconnected, driver will bring down - the interface. Once the protocol reconnects - the interface will be brought up. - - Note: This option is turned off by default. - - o Dynamic wanrouter setup using 'wanconfig': - wanconfig utility can be used to - shutdown,restart,start or reconfigure - a virtual circuit dynamically. - - Frame Relay: Each DLCI can be: - created,stopped,restarted and reconfigured - dynamically using wanconfig. - - ex: wanconfig card wanpipe1 dev wp1_fr16 up - - o Wanrouter startup via command line arguments: - wanconfig also supports wanrouter startup via command line - arguments. Thus, there is no need to create a wanpipe#.conf - configuration file. - - o Socket based x25api update/bug fixes. - Added support for LCN numbers greater than 255. - Option to pass up modem messages. - Provided a PCI IRQ check, so a single S514 - card is guaranteed to have a non-sharing interrupt. - - o Fixes to the wancfg utility. - o New FT1 debugging support via *pipemon utilities. - o Frame Relay ARP support Enabled. - -beta3-2.1.4 Jul 2000 o X25 M_BIT Problem fix. - o Added the Multi-Port PPP - Updated utilites for the Multi-Port PPP. - -2.1.4 Aut 2000 - o In X25API: - Maximum packet an application can send - to the driver has been extended to 4096 bytes. - - Fixed the x25 startup bug. Enable - communications only after all interfaces - come up. HIGH SVC/PVC is used to calculate - the number of channels. - Enable protocol only after all interfaces - are enabled. - - o Added an extra state to the FT1 config, kernel module. - o Updated the pipemon debuggers. - - o Blocked the Multi-Port PPP from running on kernels - 2.2.16 or greater, due to syncppp kernel module - change. - -beta1-2.1.5 Nov 15 2000 - o Fixed the MulitPort PPP Support for kernels 2.2.16 and above. - 2.2.X kernels only - - o Secured the driver UDP debugging calls - - All illegal netowrk debugging calls are reported to - the log. - - Defined a set of allowed commands, all other denied. - - o Cpipemon - - Added set FT1 commands to the cpipemon. Thus CSU/DSU - configuraiton can be performed using cpipemon. - All systems that cannot run cfgft1 GUI utility should - use cpipemon to configure the on board CSU/DSU. - - - o Keyboard Led Monitor/Debugger - - A new utilty /usr/sbin/wpkbdmon uses keyboard leds - to convey operatinal statistic information of the - Sangoma WANPIPE cards. - NUM_LOCK = Line State (On=connected, Off=disconnected) - CAPS_LOCK = Tx data (On=transmitting, Off=no tx data) - SCROLL_LOCK = Rx data (On=receiving, Off=no rx data - - o Hardware probe on module load and dynamic device allocation - - During WANPIPE module load, all Sangoma cards are probed - and found information is printed in the /var/log/messages. - - If no cards are found, the module load fails. - - Appropriate number of devices are dynamically loaded - based on the number of Sangoma cards found. - - Note: The kernel configuraiton option - CONFIG_WANPIPE_CARDS has been taken out. - - o Fixed the Frame Relay and Chdlc network interfaces so they are - compatible with libpcap libraries. Meaning, tcpdump, snort, - ethereal, and all other packet sniffers and debuggers work on - all WANPIPE netowrk interfaces. - - Set the network interface encoding type to ARPHRD_PPP. - This tell the sniffers that data obtained from the - network interface is in pure IP format. - Fix for 2.2.X kernels only. - - o True interface encoding option for Frame Relay and CHDLC - - The above fix sets the network interface encoding - type to ARPHRD_PPP, however some customers use - the encoding interface type to determine the - protocol running. Therefore, the TURE ENCODING - option will set the interface type back to the - original value. - - NOTE: If this option is used with Frame Relay and CHDLC - libpcap library support will be broken. - i.e. tcpdump will not work. - Fix for 2.2.x Kernels only. - - o Ethernet Bridgind over Frame Relay - - The Frame Relay bridging has been developed by - Kristian Hoffmann and Mark Wells. - - The Linux kernel bridge is used to send ethernet - data over the frame relay links. - For 2.2.X Kernels only. - - o Added extensive 2.0.X support. Most new features of - 2.1.5 for protocols Frame Relay, PPP and CHDLC are - supported under 2.0.X kernels. - -beta1-2.2.0 Dec 30 2000 - o Updated drivers for 2.4.X kernels. - o Updated drivers for SMP support. - o X25API is now able to share PCI interrupts. - o Took out a general polling routine that was used - only by X25API. - o Added appropriate locks to the dynamic reconfiguration - code. - o Fixed a bug in the keyboard debug monitor. - -beta2-2.2.0 Jan 8 2001 - o Patches for 2.4.0 kernel - o Patches for 2.2.18 kernel - o Minor updates to PPP and CHLDC drivers. - Note: No functinal difference. - -beta3-2.2.9 Jan 10 2001 - o I missed the 2.2.18 kernel patches in beta2-2.2.0 - release. They are included in this release. - -Stable Release -2.2.0 Feb 01 2001 - o Bug fix in wancfg GUI configurator. - The edit function didn't work properly. - - -bata1-2.2.1 Feb 09 2001 - o WANPIPE TTY Driver emulation. - Two modes of operation Sync and Async. - Sync: Using the PPPD daemon, kernel SyncPPP layer - and the Wanpipe sync TTY driver: a PPP protocol - connection can be established via Sangoma adapter, over - a T1 leased line. - - The 2.4.0 kernel PPP layer supports MULTILINK - protocol, that can be used to bundle any number of Sangoma - adapters (T1 lines) into one, under a single IP address. - Thus, efficiently obtaining multiple T1 throughput. - - NOTE: The remote side must also implement MULTILINK PPP - protocol. - - Async:Using the PPPD daemon, kernel AsyncPPP layer - and the WANPIPE async TTY driver: a PPP protocol - connection can be established via Sangoma adapter and - a modem, over a telephone line. - - Thus, the WANPIPE async TTY driver simulates a serial - TTY driver that would normally be used to interface the - MODEM to the linux kernel. - - o WANPIPE PPP Backup Utility - This utility will monitor the state of the PPP T1 line. - In case of failure, a dial up connection will be established - via pppd daemon, ether via a serial tty driver (serial port), - or a WANPIPE async TTY driver (in case serial port is unavailable). - - Furthermore, while in dial up mode, the primary PPP T1 link - will be monitored for signs of life. - - If the PPP T1 link comes back to life, the dial up connection - will be shutdown and T1 line re-established. - - - o New Setup installation script. - Option to UPGRADE device drivers if the kernel source has - already been patched with WANPIPE. - - Option to COMPILE WANPIPE modules against the currently - running kernel, thus no need for manual kernel and module - re-compilatin. - - o Updates and Bug Fixes to wancfg utility. - -bata2-2.2.1 Feb 20 2001 - - o Bug fixes to the CHDLC device drivers. - The driver had compilation problems under kernels - 2.2.14 or lower. - - o Bug fixes to the Setup installation script. - The device drivers compilation options didn't work - properly. - - o Update to the wpbackupd daemon. - Optimized the cross-over times, between the primary - link and the backup dialup. - -beta3-2.2.1 Mar 02 2001 - o Patches for 2.4.2 kernel. - - o Bug fixes to util/ make files. - o Bug fixes to the Setup installation script. - - o Took out the backupd support and made it into - as separate package. - -beta4-2.2.1 Mar 12 2001 - - o Fix to the Frame Relay Device driver. - IPSAC sends a packet of zero length - header to the frame relay driver. The - driver tries to push its own 2 byte header - into the packet, which causes the driver to - crash. - - o Fix the WANPIPE re-configuration code. - Bug was found by trying to run the cfgft1 while the - interface was already running. - - o Updates to cfgft1. - Writes a wanpipe#.cfgft1 configuration file - once the CSU/DSU is configured. This file can - holds the current CSU/DSU configuration. - - - ->>>>>> END OF README <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< - - diff --git a/Documentation/pci.txt b/Documentation/pci.txt index 62b1dc5d97e..76d28d03365 100644 --- a/Documentation/pci.txt +++ b/Documentation/pci.txt @@ -266,20 +266,6 @@ port an old driver to the new PCI interface. They are no longer present in the kernel as they aren't compatible with hotplug or PCI domains or having sane locking. -pcibios_present() and Since ages, you don't need to test presence -pci_present() of PCI subsystem when trying to talk to it. - If it's not there, the list of PCI devices - is empty and all functions for searching for - devices just return NULL. -pcibios_(read|write)_* Superseded by their pci_(read|write)_* - counterparts. -pcibios_find_* Superseded by their pci_get_* counterparts. -pci_for_each_dev() Superseded by pci_get_device() -pci_for_each_dev_reverse() Superseded by pci_find_device_reverse() -pci_for_each_bus() Superseded by pci_find_next_bus() pci_find_device() Superseded by pci_get_device() pci_find_subsys() Superseded by pci_get_subsys() pci_find_slot() Superseded by pci_get_slot() -pcibios_find_class() Superseded by pci_get_class() -pci_find_class() Superseded by pci_get_class() -pci_(read|write)_*_nodev() Superseded by pci_bus_(read|write)_*() diff --git a/Documentation/pcmcia/devicetable.txt b/Documentation/pcmcia/devicetable.txt new file mode 100644 index 00000000000..3351c035514 --- /dev/null +++ b/Documentation/pcmcia/devicetable.txt @@ -0,0 +1,63 @@ +Matching of PCMCIA devices to drivers is done using one or more of the +following criteria: + +- manufactor ID +- card ID +- product ID strings _and_ hashes of these strings +- function ID +- device function (actual and pseudo) + +You should use the helpers in include/pcmcia/device_id.h for generating the +struct pcmcia_device_id[] entries which match devices to drivers. + +If you want to match product ID strings, you also need to pass the crc32 +hashes of the string to the macro, e.g. if you want to match the product ID +string 1, you need to use + +PCMCIA_DEVICE_PROD_ID1("some_string", 0x(hash_of_some_string)), + +If the hash is incorrect, the kernel will inform you about this in "dmesg" +upon module initialization, and tell you of the correct hash. + +You can determine the hash of the product ID strings by catting the file +"modalias" in the sysfs directory of the PCMCIA device. It generates a string +in the following form: +pcmcia:m0149cC1ABf06pfn00fn00pa725B842DpbF1EFEE84pc0877B627pd00000000 + +The hex value after "pa" is the hash of product ID string 1, after "pb" for +string 2 and so on. + +Alternatively, you can use this small tool to determine the crc32 hash. +simply pass the string you want to evaluate as argument to this program, +e.g. +$ ./crc32hash "Dual Speed" + +------------------------------------------------------------------------- +/* crc32hash.c - derived from linux/lib/crc32.c, GNU GPL v2 */ +#include <string.h> +#include <stdio.h> +#include <ctype.h> +#include <stdlib.h> + +unsigned int crc32(unsigned char const *p, unsigned int len) +{ + int i; + unsigned int crc = 0; + while (len--) { + crc ^= *p++; + for (i = 0; i < 8; i++) + crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320 : 0); + } + return crc; +} + +int main(int argc, char **argv) { + unsigned int result; + if (argc != 2) { + printf("no string passed as argument\n"); + return -1; + } + result = crc32(argv[1], strlen(argv[1])); + printf("0x%x\n", result); + return 0; +} diff --git a/Documentation/pcmcia/driver-changes.txt b/Documentation/pcmcia/driver-changes.txt new file mode 100644 index 00000000000..403e7b4dcdd --- /dev/null +++ b/Documentation/pcmcia/driver-changes.txt @@ -0,0 +1,67 @@ +This file details changes in 2.6 which affect PCMCIA card driver authors: + +* event handler initialization in struct pcmcia_driver (as of 2.6.13) + The event handler is notified of all events, and must be initialized + as the event() callback in the driver's struct pcmcia_driver. + +* pcmcia/version.h should not be used (as of 2.6.13) + This file will be removed eventually. + +* in-kernel device<->driver matching (as of 2.6.13) + PCMCIA devices and their correct drivers can now be matched in + kernelspace. See 'devicetable.txt' for details. + +* Device model integration (as of 2.6.11) + A struct pcmcia_device is registered with the device model core, + and can be used (e.g. for SET_NETDEV_DEV) by using + handle_to_dev(client_handle_t * handle). + +* Convert internal I/O port addresses to unsigned long (as of 2.6.11) + ioaddr_t should be replaced by kio_addr_t in PCMCIA card drivers. + +* irq_mask and irq_list parameters (as of 2.6.11) + The irq_mask and irq_list parameters should no longer be used in + PCMCIA card drivers. Instead, it is the job of the PCMCIA core to + determine which IRQ should be used. Therefore, link->irq.IRQInfo2 + is ignored. + +* client->PendingEvents is gone (as of 2.6.11) + client->PendingEvents is no longer available. + +* client->Attributes are gone (as of 2.6.11) + client->Attributes is unused, therefore it is removed from all + PCMCIA card drivers + +* core functions no longer available (as of 2.6.11) + The following functions have been removed from the kernel source + because they are unused by all in-kernel drivers, and no external + driver was reported to rely on them: + pcmcia_get_first_region() + pcmcia_get_next_region() + pcmcia_modify_window() + pcmcia_set_event_mask() + pcmcia_get_first_window() + pcmcia_get_next_window() + +* device list iteration upon module removal (as of 2.6.10) + It is no longer necessary to iterate on the driver's internal + client list and call the ->detach() function upon module removal. + +* Resource management. (as of 2.6.8) + Although the PCMCIA subsystem will allocate resources for cards, + it no longer marks these resources busy. This means that driver + authors are now responsible for claiming your resources as per + other drivers in Linux. You should use request_region() to mark + your IO regions in-use, and request_mem_region() to mark your + memory regions in-use. The name argument should be a pointer to + your driver name. Eg, for pcnet_cs, name should point to the + string "pcnet_cs". + +* CardServices is gone + CardServices() in 2.4 is just a big switch statement to call various + services. In 2.6, all of those entry points are exported and called + directly (except for pcmcia_report_error(), just use cs_error() instead). + +* struct pcmcia_driver + You need to use struct pcmcia_driver and pcmcia_{un,}register_driver + instead of {un,}register_pccard_driver diff --git a/Documentation/power/kernel_threads.txt b/Documentation/power/kernel_threads.txt index 60b548105ed..fb57784986b 100644 --- a/Documentation/power/kernel_threads.txt +++ b/Documentation/power/kernel_threads.txt @@ -12,8 +12,7 @@ refrigerator. Code to do this looks like this: do { hub_events(); wait_event_interruptible(khubd_wait, !list_empty(&hub_event_list)); - if (current->flags & PF_FREEZE) - refrigerator(PF_FREEZE); + try_to_freeze(); } while (!signal_pending(current)); from drivers/usb/core/hub.c::hub_thread() diff --git a/Documentation/power/pci.txt b/Documentation/power/pci.txt index 35b1a7dae34..6fc9d511fc3 100644 --- a/Documentation/power/pci.txt +++ b/Documentation/power/pci.txt @@ -291,6 +291,44 @@ a request to enable wake events from D3, two calls should be made to pci_enable_wake (one for both D3hot and D3cold). +A reference implementation +------------------------- +.suspend() +{ + /* driver specific operations */ + + /* Disable IRQ */ + free_irq(); + /* If using MSI */ + pci_disable_msi(); + + pci_save_state(); + pci_enable_wake(); + /* Disable IO/bus master/irq router */ + pci_disable_device(); + pci_set_power_state(pci_choose_state()); +} + +.resume() +{ + pci_set_power_state(PCI_D0); + pci_restore_state(); + /* device's irq possibly is changed, driver should take care */ + pci_enable_device(); + pci_set_master(); + + /* if using MSI, device's vector possibly is changed */ + pci_enable_msi(); + + request_irq(); + /* driver specific operations; */ +} + +This is a typical implementation. Drivers can slightly change the order +of the operations in the implementation, ignore some operations or add +more deriver specific operations in it, but drivers should do something like +this on the whole. + 5. Resources ~~~~~~~~~~~~ diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt index c7c3459fde4..7a6b7896645 100644 --- a/Documentation/power/swsusp.txt +++ b/Documentation/power/swsusp.txt @@ -164,11 +164,11 @@ place where the thread is safe to be frozen (no kernel semaphores should be held at that point and it must be safe to sleep there), and add: - if (current->flags & PF_FREEZE) - refrigerator(PF_FREEZE); + try_to_freeze(); If the thread is needed for writing the image to storage, you should -instead set the PF_NOFREEZE process flag when creating the thread. +instead set the PF_NOFREEZE process flag when creating the thread (and +be very carefull). Q: What is the difference between between "platform", "shutdown" and @@ -233,3 +233,81 @@ A: Try running cat `cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u` > /dev/null after resume. swapoff -a; swapon -a may also be usefull. + +Q: What happens to devices during swsusp? They seem to be resumed +during system suspend? + +A: That's correct. We need to resume them if we want to write image to +disk. Whole sequence goes like + + Suspend part + ~~~~~~~~~~~~ + running system, user asks for suspend-to-disk + + user processes are stopped + + suspend(PMSG_FREEZE): devices are frozen so that they don't interfere + with state snapshot + + state snapshot: copy of whole used memory is taken with interrupts disabled + + resume(): devices are woken up so that we can write image to swap + + write image to swap + + suspend(PMSG_SUSPEND): suspend devices so that we can power off + + turn the power off + + Resume part + ~~~~~~~~~~~ + (is actually pretty similar) + + running system, user asks for suspend-to-disk + + user processes are stopped (in common case there are none, but with resume-from-initrd, noone knows) + + read image from disk + + suspend(PMSG_FREEZE): devices are frozen so that they don't interfere + with image restoration + + image restoration: rewrite memory with image + + resume(): devices are woken up so that system can continue + + thaw all user processes + +Q: What is this 'Encrypt suspend image' for? + +A: First of all: it is not a replacement for dm-crypt encrypted swap. +It cannot protect your computer while it is suspended. Instead it does +protect from leaking sensitive data after resume from suspend. + +Think of the following: you suspend while an application is running +that keeps sensitive data in memory. The application itself prevents +the data from being swapped out. Suspend, however, must write these +data to swap to be able to resume later on. Without suspend encryption +your sensitive data are then stored in plaintext on disk. This means +that after resume your sensitive data are accessible to all +applications having direct access to the swap device which was used +for suspend. If you don't need swap after resume these data can remain +on disk virtually forever. Thus it can happen that your system gets +broken in weeks later and sensitive data which you thought were +encrypted and protected are retrieved and stolen from the swap device. +To prevent this situation you should use 'Encrypt suspend image'. + +During suspend a temporary key is created and this key is used to +encrypt the data written to disk. When, during resume, the data was +read back into memory the temporary key is destroyed which simply +means that all data written to disk during suspend are then +inaccessible so they can't be stolen later on. The only thing that +you must then take care of is that you call 'mkswap' for the swap +partition used for suspend as early as possible during regular +boot. This asserts that any temporary key from an oopsed suspend or +from a failed or aborted resume is erased from the swap device. + +As a rule of thumb use encrypted swap to protect your data while your +system is shut down or suspended. Additionally use the encrypted +suspend image to prevent sensitive data from being stolen after +resume. diff --git a/Documentation/power/video.txt b/Documentation/power/video.txt index 68734355d7c..7a4a5036d12 100644 --- a/Documentation/power/video.txt +++ b/Documentation/power/video.txt @@ -83,8 +83,10 @@ Compaq Armada E500 - P3-700 none (1) (S1 also works OK) Compaq Evo N620c vga=normal, s3_bios (2) Dell 600m, ATI R250 Lf none (1), but needs xorg-x11-6.8.1.902-1 Dell D600, ATI RV250 vga=normal and X, or try vbestate (6) +Dell D610 vga=normal and X (possibly vbestate (6) too, but not tested) Dell Inspiron 4000 ??? (*) Dell Inspiron 500m ??? (*) +Dell Inspiron 510m ??? Dell Inspiron 600m ??? (*) Dell Inspiron 8200 ??? (*) Dell Inspiron 8500 ??? (*) @@ -115,6 +117,7 @@ IBM Thinkpad X40 Type 2371-7JG s3_bios,s3_mode (4) Medion MD4220 ??? (*) Samsung P35 vbetool needed (6) Sharp PC-AR10 (ATI rage) none (1) +Sony Vaio PCG-C1VRX/K s3_bios (2) Sony Vaio PCG-F403 ??? (*) Sony Vaio PCG-N505SN ??? (*) Sony Vaio vgn-s260 X or boot-radeon can init it (5) @@ -123,6 +126,7 @@ Toshiba Satellite 4030CDT s3_mode (3) Toshiba Satellite 4080XCDT s3_mode (3) Toshiba Satellite 4090XCDT ??? (*) Toshiba Satellite P10-554 s3_bios,s3_mode (4)(****) +Toshiba M30 (2) xor X with nvidia driver using internal AGP Uniwill 244IIO ??? (*) diff --git a/Documentation/power/video_extension.txt b/Documentation/power/video_extension.txt index 8e33d7c82c4..b2f9b1598ac 100644 --- a/Documentation/power/video_extension.txt +++ b/Documentation/power/video_extension.txt @@ -1,13 +1,16 @@ -This driver implement the ACPI Extensions For Display Adapters -for integrated graphics devices on motherboard, as specified in -ACPI 2.0 Specification, Appendix B, allowing to perform some basic -control like defining the video POST device, retrieving EDID information -or to setup a video output, etc. Note that this is an ref. implementation only. -It may or may not work for your integrated video device. +ACPI video extensions +~~~~~~~~~~~~~~~~~~~~~ + +This driver implement the ACPI Extensions For Display Adapters for +integrated graphics devices on motherboard, as specified in ACPI 2.0 +Specification, Appendix B, allowing to perform some basic control like +defining the video POST device, retrieving EDID information or to +setup a video output, etc. Note that this is an ref. implementation +only. It may or may not work for your integrated video device. Interfaces exposed to userland through /proc/acpi/video: -VGA/info : display the supported video bus device capability like ,Video ROM, CRT/LCD/TV. +VGA/info : display the supported video bus device capability like Video ROM, CRT/LCD/TV. VGA/ROM : Used to get a copy of the display devices' ROM data (up to 4k). VGA/POST_info : Used to determine what options are implemented. VGA/POST : Used to get/set POST device. @@ -15,7 +18,7 @@ VGA/DOS : Used to get/set ownership of output switching: Please refer ACPI spec B.4.1 _DOS VGA/CRT : CRT output VGA/LCD : LCD output -VGA/TV : TV output +VGA/TVO : TV output VGA/*/brightness : Used to get/set brightness of output device Notify event through /proc/acpi/event: diff --git a/Documentation/s390/s390dbf.txt b/Documentation/s390/s390dbf.txt index 2d1cd939b4d..e24fdeada97 100644 --- a/Documentation/s390/s390dbf.txt +++ b/Documentation/s390/s390dbf.txt @@ -12,8 +12,8 @@ where log records can be stored efficiently in memory, where each component One purpose of this is to inspect the debug logs after a production system crash in order to analyze the reason for the crash. If the system still runs but only a subcomponent which uses dbf failes, -it is possible to look at the debug logs on a live system via the Linux proc -filesystem. +it is possible to look at the debug logs on a live system via the Linux +debugfs filesystem. The debug feature may also very useful for kernel and driver development. Design: @@ -52,16 +52,18 @@ Each debug entry contains the following data: - Flag, if entry is an exception or not The debug logs can be inspected in a live system through entries in -the proc-filesystem. Under the path /proc/s390dbf there is +the debugfs-filesystem. Under the toplevel directory "s390dbf" there is a directory for each registered component, which is named like the -corresponding component. +corresponding component. The debugfs normally should be mounted to +/sys/kernel/debug therefore the debug feature can be accessed unter +/sys/kernel/debug/s390dbf. The content of the directories are files which represent different views to the debug log. Each component can decide which views should be used through registering them with the function debug_register_view(). Predefined views for hex/ascii, sprintf and raw binary data are provided. It is also possible to define other views. The content of -a view can be inspected simply by reading the corresponding proc file. +a view can be inspected simply by reading the corresponding debugfs file. All debug logs have an an actual debug level (range from 0 to 6). The default level is 3. Event and Exception functions have a 'level' @@ -69,14 +71,14 @@ parameter. Only debug entries with a level that is lower or equal than the actual level are written to the log. This means, when writing events, high priority log entries should have a low level value whereas low priority entries should have a high one. -The actual debug level can be changed with the help of the proc-filesystem -through writing a number string "x" to the 'level' proc file which is +The actual debug level can be changed with the help of the debugfs-filesystem +through writing a number string "x" to the 'level' debugfs file which is provided for every debug log. Debugging can be switched off completely -by using "-" on the 'level' proc file. +by using "-" on the 'level' debugfs file. Example: -> echo "-" > /proc/s390dbf/dasd/level +> echo "-" > /sys/kernel/debug/s390dbf/dasd/level It is also possible to deactivate the debug feature globally for every debug log. You can change the behavior using 2 sysctl parameters in @@ -99,11 +101,11 @@ Kernel Interfaces: ------------------ ---------------------------------------------------------------------------- -debug_info_t *debug_register(char *name, int pages_index, int nr_areas, +debug_info_t *debug_register(char *name, int pages, int nr_areas, int buf_size); -Parameter: name: Name of debug log (e.g. used for proc entry) - pages_index: 2^pages_index pages will be allocated per area +Parameter: name: Name of debug log (e.g. used for debugfs entry) + pages: number of pages, which will be allocated per area nr_areas: number of debug areas buf_size: size of data area in each debug entry @@ -134,7 +136,7 @@ Return Value: none Description: Sets new actual debug level if new_level is valid. --------------------------------------------------------------------------- -+void debug_stop_all(void); +void debug_stop_all(void); Parameter: none @@ -270,7 +272,7 @@ Parameter: id: handle for debug log Return Value: 0 : ok < 0: Error -Description: registers new debug view and creates proc dir entry +Description: registers new debug view and creates debugfs dir entry --------------------------------------------------------------------------- int debug_unregister_view (debug_info_t * id, struct debug_view *view); @@ -281,7 +283,7 @@ Parameter: id: handle for debug log Return Value: 0 : ok < 0: Error -Description: unregisters debug view and removes proc dir entry +Description: unregisters debug view and removes debugfs dir entry @@ -308,7 +310,7 @@ static int init(void) { /* register 4 debug areas with one page each and 4 byte data field */ - debug_info = debug_register ("test", 0, 4, 4 ); + debug_info = debug_register ("test", 1, 4, 4 ); debug_register_view(debug_info,&debug_hex_ascii_view); debug_register_view(debug_info,&debug_raw_view); @@ -343,7 +345,7 @@ static int init(void) /* register 4 debug areas with one page each and data field for */ /* format string pointer + 2 varargs (= 3 * sizeof(long)) */ - debug_info = debug_register ("test", 0, 4, sizeof(long) * 3); + debug_info = debug_register ("test", 1, 4, sizeof(long) * 3); debug_register_view(debug_info,&debug_sprintf_view); debug_sprintf_event(debug_info, 2 , "first event in %s:%i\n",__FILE__,__LINE__); @@ -362,16 +364,16 @@ module_exit(cleanup); -ProcFS Interface +Debugfs Interface ---------------- Views to the debug logs can be investigated through reading the corresponding -proc-files: +debugfs-files: Example: -> ls /proc/s390dbf/dasd -flush hex_ascii level raw -> cat /proc/s390dbf/dasd/hex_ascii | sort +1 +> ls /sys/kernel/debug/s390dbf/dasd +flush hex_ascii level pages raw +> cat /sys/kernel/debug/s390dbf/dasd/hex_ascii | sort +1 00 00974733272:680099 2 - 02 0006ad7e 07 ea 4a 90 | .... 00 00974733272:682210 2 - 02 0006ade6 46 52 45 45 | FREE 00 00974733272:682213 2 - 02 0006adf6 07 ea 4a 90 | .... @@ -391,25 +393,36 @@ Changing the debug level Example: -> cat /proc/s390dbf/dasd/level +> cat /sys/kernel/debug/s390dbf/dasd/level 3 -> echo "5" > /proc/s390dbf/dasd/level -> cat /proc/s390dbf/dasd/level +> echo "5" > /sys/kernel/debug/s390dbf/dasd/level +> cat /sys/kernel/debug/s390dbf/dasd/level 5 Flushing debug areas -------------------- Debug areas can be flushed with piping the number of the desired -area (0...n) to the proc file "flush". When using "-" all debug areas +area (0...n) to the debugfs file "flush". When using "-" all debug areas are flushed. Examples: 1. Flush debug area 0: -> echo "0" > /proc/s390dbf/dasd/flush +> echo "0" > /sys/kernel/debug/s390dbf/dasd/flush 2. Flush all debug areas: -> echo "-" > /proc/s390dbf/dasd/flush +> echo "-" > /sys/kernel/debug/s390dbf/dasd/flush + +Changing the size of debug areas +------------------------------------ +It is possible the change the size of debug areas through piping +the number of pages to the debugfs file "pages". The resize request will +also flush the debug areas. + +Example: + +Define 4 pages for the debug areas of debug feature "dasd": +> echo "4" > /sys/kernel/debug/s390dbf/dasd/pages Stooping the debug feature -------------------------- @@ -491,7 +504,7 @@ Defining views -------------- Views are specified with the 'debug_view' structure. There are defined -callback functions which are used for reading and writing the proc files: +callback functions which are used for reading and writing the debugfs files: struct debug_view { char name[DEBUG_MAX_PROCF_LEN]; @@ -525,7 +538,7 @@ typedef int (debug_input_proc_t) (debug_info_t* id, The "private_data" member can be used as pointer to view specific data. It is not used by the debug feature itself. -The output when reading a debug-proc file is structured like this: +The output when reading a debugfs file is structured like this: "prolog_proc output" @@ -534,13 +547,13 @@ The output when reading a debug-proc file is structured like this: "header_proc output 3" "format_proc output 3" ... -When a view is read from the proc fs, the Debug Feature calls the +When a view is read from the debugfs, the Debug Feature calls the 'prolog_proc' once for writing the prolog. Then 'header_proc' and 'format_proc' are called for each existing debug entry. The input_proc can be used to implement functionality when it is written to -the view (e.g. like with 'echo "0" > /proc/s390dbf/dasd/level). +the view (e.g. like with 'echo "0" > /sys/kernel/debug/s390dbf/dasd/level). For header_proc there can be used the default function debug_dflt_header_fn() which is defined in in debug.h. @@ -602,7 +615,7 @@ debug_info = debug_register ("test", 0, 4, 4 )); debug_register_view(debug_info, &debug_test_view); for(i = 0; i < 10; i ++) debug_int_event(debug_info, 1, i); -> cat /proc/s390dbf/test/myview +> cat /sys/kernel/debug/s390dbf/test/myview 00 00964419734:611402 1 - 00 88042ca This error........... 00 00964419734:611405 1 - 00 88042ca That error........... 00 00964419734:611408 1 - 00 88042ca Problem.............. diff --git a/Documentation/scsi/scsi_mid_low_api.txt b/Documentation/scsi/scsi_mid_low_api.txt index da176c95d0f..7536823c0cb 100644 --- a/Documentation/scsi/scsi_mid_low_api.txt +++ b/Documentation/scsi/scsi_mid_low_api.txt @@ -388,7 +388,6 @@ Summary: scsi_remove_device - detach and remove a SCSI device scsi_remove_host - detach and remove all SCSI devices owned by host scsi_report_bus_reset - report scsi _bus_ reset observed - scsi_set_device - place device reference in host structure scsi_track_queue_full - track successive QUEUE_FULL events scsi_unblock_requests - allow further commands to be queued to given host scsi_unregister - [calls scsi_host_put()] @@ -741,20 +740,6 @@ void scsi_report_bus_reset(struct Scsi_Host * shost, int channel) /** - * scsi_set_device - place device reference in host structure - * @shost: a pointer to a scsi host instance - * @pdev: pointer to device instance to assign - * - * Returns nothing - * - * Might block: no - * - * Defined in: include/scsi/scsi_host.h . - **/ -void scsi_set_device(struct Scsi_Host * shost, struct device * dev) - - -/** * scsi_track_queue_full - track successive QUEUE_FULL events on given * device to determine if and when there is a need * to adjust the queue depth on the device. diff --git a/Documentation/serial/driver b/Documentation/serial/driver index e9c0178cd20..ac7eabbf662 100644 --- a/Documentation/serial/driver +++ b/Documentation/serial/driver @@ -107,8 +107,8 @@ hardware. indicate that the signal is permanently active. If RI is not available, the signal should not be indicated as active. - Locking: none. - Interrupts: caller dependent. + Locking: port->lock taken. + Interrupts: locally disabled. This call must not sleep stop_tx(port,tty_stop) diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt index 104a994b828..a18ecb92b35 100644 --- a/Documentation/sound/alsa/ALSA-Configuration.txt +++ b/Documentation/sound/alsa/ALSA-Configuration.txt @@ -636,11 +636,16 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. 3stack-digout 3-jack in back, a HP out and a SPDIF out 5stack 5-jack in back, 2-jack in front 5stack-digout 5-jack in back, 2-jack in front, a SPDIF out + 6stack 6-jack in back, 2-jack in front + 6stack-digout 6-jack with a SPDIF out w810 3-jack z71v 3-jack (HP shared SPDIF) asus 3-jack uniwill 3-jack F1734 2-jack + test for testing/debugging purpose, almost all controls can be + adjusted. Appearing only when compiled with + $CONFIG_SND_DEBUG=y CMI9880 minimal 3-jack in back @@ -1054,6 +1059,13 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. The power-management is supported. + Module snd-pxa2xx-ac97 (on arm only) + ------------------------------------ + + Module for AC97 driver for the Intel PXA2xx chip + + For ARM architecture only. + Module snd-rme32 ---------------- @@ -1173,6 +1185,13 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. Module supports up to 8 cards. + Module snd-sun-dbri (on sparc only) + ----------------------------------- + + Module for DBRI sound chips found on Sparcs. + + Module supports up to 8 cards. + Module snd-wavefront -------------------- @@ -1371,7 +1390,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. Module snd-vxpocket ------------------- - Module for Digigram VX-Pocket VX2 PCMCIA card. + Module for Digigram VX-Pocket VX2 and 440 PCMCIA cards. ibl - Capture IBL size. (default = 0, minimum size) @@ -1391,29 +1410,6 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. Note: the driver is build only when CONFIG_ISA is set. - Module snd-vxp440 - ----------------- - - Module for Digigram VX-Pocket 440 PCMCIA card. - - ibl - Capture IBL size. (default = 0, minimum size) - - Module supports up to 8 cards. The module is compiled only when - PCMCIA is supported on kernel. - - To activate the driver via the card manager, you'll need to set - up /etc/pcmcia/vxp440.conf. See the sound/pcmcia/vx/vxp440.c. - - When the driver is compiled as a module and the hotplug firmware - is supported, the firmware data is loaded via hotplug automatically. - Install the necessary firmware files in alsa-firmware package. - When no hotplug fw loader is available, you need to load the - firmware via vxloader utility in alsa-tools package. - - About capture IBL, see the description of snd-vx222 module. - - Note: the driver is build only when CONFIG_ISA is set. - Module snd-ymfpci ----------------- diff --git a/Documentation/stable_api_nonsense.txt b/Documentation/stable_api_nonsense.txt index 3cea1387527..f39c9d714db 100644 --- a/Documentation/stable_api_nonsense.txt +++ b/Documentation/stable_api_nonsense.txt @@ -132,7 +132,7 @@ to extra work for the USB developers. Since all Linux USB developers do their work on their own time, asking programmers to do extra work for no gain, for free, is not a possibility. -Security issues are also a very important for Linux. When a +Security issues are also very important for Linux. When a security issue is found, it is fixed in a very short amount of time. A number of times this has caused internal kernel interfaces to be reworked to prevent the security problem from occurring. When this diff --git a/Documentation/stable_kernel_rules.txt b/Documentation/stable_kernel_rules.txt new file mode 100644 index 00000000000..2c81305090d --- /dev/null +++ b/Documentation/stable_kernel_rules.txt @@ -0,0 +1,58 @@ +Everything you ever wanted to know about Linux 2.6 -stable releases. + +Rules on what kind of patches are accepted, and what ones are not, into +the "-stable" tree: + + - It must be obviously correct and tested. + - It can not bigger than 100 lines, with context. + - It must fix only one thing. + - It must fix a real bug that bothers people (not a, "This could be a + problem..." type thing.) + - It must fix a problem that causes a build error (but not for things + marked CONFIG_BROKEN), an oops, a hang, data corruption, a real + security issue, or some "oh, that's not good" issue. In short, + something critical. + - No "theoretical race condition" issues, unless an explanation of how + the race can be exploited. + - It can not contain any "trivial" fixes in it (spelling changes, + whitespace cleanups, etc.) + - It must be accepted by the relevant subsystem maintainer. + - It must follow Documentation/SubmittingPatches rules. + + +Procedure for submitting patches to the -stable tree: + + - Send the patch, after verifying that it follows the above rules, to + stable@kernel.org. + - The sender will receive an ack when the patch has been accepted into + the queue, or a nak if the patch is rejected. This response might + take a few days, according to the developer's schedules. + - If accepted, the patch will be added to the -stable queue, for review + by other developers. + - Security patches should not be sent to this alias, but instead to the + documented security@kernel.org. + + +Review cycle: + + - When the -stable maintainers decide for a review cycle, the patches + will be sent to the review committee, and the maintainer of the + affected area of the patch (unless the submitter is the maintainer of + the area) and CC: to the linux-kernel mailing list. + - The review committee has 48 hours in which to ack or nak the patch. + - If the patch is rejected by a member of the committee, or linux-kernel + members object to the patch, bringing up issues that the maintainers + and members did not realize, the patch will be dropped from the + queue. + - At the end of the review cycle, the acked patches will be added to + the latest -stable release, and a new -stable release will happen. + - Security patches will be accepted into the -stable tree directly from + the security kernel team, and not go through the normal review cycle. + Contact the kernel security team for more details on this procedure. + + +Review committe: + + - This will be made up of a number of kernel developers who have + volunteered for this task, and a few that haven't. + diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 35159176997..9f11d36a8c1 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -49,6 +49,7 @@ show up in /proc/sys/kernel: - shmmax [ sysv ipc ] - shmmni - stop-a [ SPARC only ] +- suid_dumpable - sysrq ==> Documentation/sysrq.txt - tainted - threads-max @@ -300,6 +301,25 @@ kernel. This value defaults to SHMMAX. ============================================================== +suid_dumpable: + +This value can be used to query and set the core dump mode for setuid +or otherwise protected/tainted binaries. The modes are + +0 - (default) - traditional behaviour. Any process which has changed + privilege levels or is execute only will not be dumped +1 - (debug) - all processes dump core when possible. The core dump is + owned by the current user and no security is applied. This is + intended for system debugging situations only. Ptrace is unchecked. +2 - (suidsafe) - any binary which normally would not be dumped is dumped + readable by root only. This allows the end user to remove + such a dump but not access it directly. For security reasons + core dumps in this mode will not overwrite one another or + other files. This mode is appropriate when adminstrators are + attempting to debug problems in a normal environment. + +============================================================== + tainted: Non-zero if the kernel has been tainted. Numeric values, which diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt index f98c2e31c14..136d817c01b 100644 --- a/Documentation/sysrq.txt +++ b/Documentation/sysrq.txt @@ -72,6 +72,8 @@ On all - write a character to /proc/sysrq-trigger. eg: 'b' - Will immediately reboot the system without syncing or unmounting your disks. +'c' - Will perform a kexec reboot in order to take a crashdump. + 'o' - Will shut your system off (if configured and supported). 's' - Will attempt to sync all mounted filesystems. @@ -122,6 +124,9 @@ useful when you want to exit a program that will not let you switch consoles. re'B'oot is good when you're unable to shut down. But you should also 'S'ync and 'U'mount first. +'C'rashdump can be used to manually trigger a crashdump when the system is hung. +The kernel needs to have been built with CONFIG_KEXEC enabled. + 'S'ync is great when your system is locked up, it allows you to sync your disks and will certainly lessen the chance of data loss and fscking. Note that the sync hasn't taken place until you see the "OK" and "Done" appear diff --git a/Documentation/tty.txt b/Documentation/tty.txt index 3958cf746dd..8ff7bc2a081 100644 --- a/Documentation/tty.txt +++ b/Documentation/tty.txt @@ -22,7 +22,7 @@ copy of the structure. You must not re-register over the top of the line discipline even with the same data or your computer again will be eaten by demons. -In order to remove a line discipline call tty_register_ldisc passing NULL. +In order to remove a line discipline call tty_unregister_ldisc(). In ancient times this always worked. In modern times the function will return -EBUSY if the ldisc is currently in use. Since the ldisc referencing code manages the module counts this should not usually be a concern. diff --git a/Documentation/usb/sn9c102.txt b/Documentation/usb/sn9c102.txt index cf9a1187edc..3f8a119db31 100644 --- a/Documentation/usb/sn9c102.txt +++ b/Documentation/usb/sn9c102.txt @@ -297,6 +297,7 @@ Vendor ID Product ID 0x0c45 0x602a 0x0c45 0x602b 0x0c45 0x602c +0x0c45 0x602d 0x0c45 0x6030 0x0c45 0x6080 0x0c45 0x6082 @@ -333,6 +334,7 @@ Model Manufacturer ----- ------------ HV7131D Hynix Semiconductor, Inc. MI-0343 Micron Technology, Inc. +OV7630 OmniVision Technologies, Inc. PAS106B PixArt Imaging, Inc. PAS202BCB PixArt Imaging, Inc. TAS5110C1B Taiwan Advanced Sensor Corporation @@ -470,9 +472,11 @@ order): - Luca Capello for the donation of a webcam; - Joao Rodrigo Fuzaro, Joao Limirio, Claudio Filho and Caio Begotti for the donation of a webcam; +- Jon Hollstrom for the donation of a webcam; - Carlos Eduardo Medaglia Dyonisio, who added the support for the PAS202BCB image sensor; - Stefano Mozzi, who donated 45 EU; +- Andrew Pearce for the donation of a webcam; - Bertrik Sikken, who reverse-engineered and documented the Huffman compression algorithm used in the SN9C10x controllers and implemented the first decoder; - Mizuno Takafumi for the donation of a webcam; diff --git a/Documentation/usb/usbmon.txt b/Documentation/usb/usbmon.txt index 2f8431f92b7..63cb7edd177 100644 --- a/Documentation/usb/usbmon.txt +++ b/Documentation/usb/usbmon.txt @@ -101,6 +101,13 @@ Here is the list of words, from left to right: or 3 and 2 positions, correspondingly. - URB Status. This field makes no sense for submissions, but is present to help scripts with parsing. In error case, it contains the error code. + In case of a setup packet, it contains a Setup Tag. If scripts read a number + in this field, they proceed to read Data Length. Otherwise, they read + the setup packet before reading the Data Length. +- Setup packet, if present, consists of 5 words: one of each for bmRequestType, + bRequest, wValue, wIndex, wLength, as specified by the USB Specification 2.0. + These words are safe to decode if Setup Tag was 's'. Otherwise, the setup + packet was present, but not captured, and the fields contain filler. - Data Length. This is the actual length in the URB. - Data tag. The usbmon may not always capture data, even if length is nonzero. Only if tag is '=', the data words are present. @@ -125,25 +132,31 @@ class ParsedLine { String data_str = st.nextToken(); int len = data_str.length() / 2; int i; + int b; // byte is signed, apparently?! XXX for (i = 0; i < len; i++) { - data[data_len] = Byte.parseByte( - data_str.substring(i*2, i*2 + 2), - 16); + // data[data_len] = Byte.parseByte( + // data_str.substring(i*2, i*2 + 2), + // 16); + b = Integer.parseInt( + data_str.substring(i*2, i*2 + 2), + 16); + if (b >= 128) + b *= -1; + data[data_len] = (byte) b; data_len++; } } } } -This format is obviously deficient. For example, the setup packet for control -transfers is not delivered. This will change in the future. +This format may be changed in the future. Examples: -An input control transfer to get a port status: +An input control transfer to get a port status. -d74ff9a0 2640288196 S Ci:001:00 -115 4 < -d74ff9a0 2640288202 C Ci:001:00 0 4 = 01010100 +d5ea89a0 3575914555 S Ci:001:00 s a3 00 0000 0003 0004 4 < +d5ea89a0 3575914560 C Ci:001:00 0 4 = 01050000 An output bulk transfer to send a SCSI command 0x5E in a 31-byte Bulk wrapper to a storage device at address 5: diff --git a/Documentation/video4linux/API.html b/Documentation/video4linux/API.html index 4b3d8f640a4..441407b12a9 100644 --- a/Documentation/video4linux/API.html +++ b/Documentation/video4linux/API.html @@ -1,399 +1,16 @@ -<HTML><HEAD> -<TITLE>Video4Linux Kernel API Reference v0.1:19990430</TITLE> -</HEAD> -<! Revision History: > -<! 4/30/1999 - Fred Gleason (fredg@wava.com)> -<! Documented extensions for the Radio Data System (RDS) extensions > -<BODY bgcolor="#ffffff"> -<H3>Devices</H3> -Video4Linux provides the following sets of device files. These live on the -character device formerly known as "/dev/bttv". /dev/bttv should be a -symlink to /dev/video0 for most people. -<P> -<TABLE> -<TR><TH>Device Name</TH><TH>Minor Range</TH><TH>Function</TH> -<TR><TD>/dev/video</TD><TD>0-63</TD><TD>Video Capture Interface</TD> -<TR><TD>/dev/radio</TD><TD>64-127</TD><TD>AM/FM Radio Devices</TD> -<TR><TD>/dev/vtx</TD><TD>192-223</TD><TD>Teletext Interface Chips</TD> -<TR><TD>/dev/vbi</TD><TD>224-239</TD><TD>Raw VBI Data (Intercast/teletext)</TD> -</TABLE> -<P> -Video4Linux programs open and scan the devices to find what they are looking -for. Capability queries define what each interface supports. The -described API is only defined for video capture cards. The relevant subset -applies to radio cards. Teletext interfaces talk the existing VTX API. -<P> -<H3>Capability Query Ioctl</H3> -The <B>VIDIOCGCAP</B> ioctl call is used to obtain the capability -information for a video device. The <b>struct video_capability</b> object -passed to the ioctl is completed and returned. It contains the following -information -<P> -<TABLE> -<TR><TD><b>name[32]</b><TD>Canonical name for this interface</TD> -<TR><TD><b>type</b><TD>Type of interface</TD> -<TR><TD><b>channels</b><TD>Number of radio/tv channels if appropriate</TD> -<TR><TD><b>audios</b><TD>Number of audio devices if appropriate</TD> -<TR><TD><b>maxwidth</b><TD>Maximum capture width in pixels</TD> -<TR><TD><b>maxheight</b><TD>Maximum capture height in pixels</TD> -<TR><TD><b>minwidth</b><TD>Minimum capture width in pixels</TD> -<TR><TD><b>minheight</b><TD>Minimum capture height in pixels</TD> -</TABLE> -<P> -The type field lists the capability flags for the device. These are -as follows -<P> -<TABLE> -<TR><TH>Name</TH><TH>Description</TH> -<TR><TD><b>VID_TYPE_CAPTURE</b><TD>Can capture to memory</TD> -<TR><TD><b>VID_TYPE_TUNER</b><TD>Has a tuner of some form</TD> -<TR><TD><b>VID_TYPE_TELETEXT</b><TD>Has teletext capability</TD> -<TR><TD><b>VID_TYPE_OVERLAY</b><TD>Can overlay its image onto the frame buffer</TD> -<TR><TD><b>VID_TYPE_CHROMAKEY</b><TD>Overlay is Chromakeyed</TD> -<TR><TD><b>VID_TYPE_CLIPPING</b><TD>Overlay clipping is supported</TD> -<TR><TD><b>VID_TYPE_FRAMERAM</b><TD>Overlay overwrites frame buffer memory</TD> -<TR><TD><b>VID_TYPE_SCALES</b><TD>The hardware supports image scaling</TD> -<TR><TD><b>VID_TYPE_MONOCHROME</b><TD>Image capture is grey scale only</TD> -<TR><TD><b>VID_TYPE_SUBCAPTURE</b><TD>Capture can be of only part of the image</TD> -</TABLE> -<P> -The minimum and maximum sizes listed for a capture device do not imply all -that all height/width ratios or sizes within the range are possible. A -request to set a size will be honoured by the largest available capture -size whose capture is no large than the requested rectangle in either -direction. For example the quickcam has 3 fixed settings. -<P> -<H3>Frame Buffer</H3> -Capture cards that drop data directly onto the frame buffer must be told the -base address of the frame buffer, its size and organisation. This is a -privileged ioctl and one that eventually X itself should set. -<P> -The <b>VIDIOCSFBUF</b> ioctl sets the frame buffer parameters for a capture -card. If the card does not do direct writes to the frame buffer then this -ioctl will be unsupported. The <b>VIDIOCGFBUF</b> ioctl returns the -currently used parameters. The structure used in both cases is a -<b>struct video_buffer</b>. -<P> -<TABLE> -<TR><TD><b>void *base</b></TD><TD>Base physical address of the buffer</TD> -<TR><TD><b>int height</b></TD><TD>Height of the frame buffer</TD> -<TR><TD><b>int width</b></TD><TD>Width of the frame buffer</TD> -<TR><TD><b>int depth</b></TD><TD>Depth of the frame buffer</TD> -<TR><TD><b>int bytesperline</b></TD><TD>Number of bytes of memory between the start of two adjacent lines</TD> -</TABLE> -<P> -Note that these values reflect the physical layout of the frame buffer. -The visible area may be smaller. In fact under XFree86 this is commonly the -case. XFree86 DGA can provide the parameters required to set up this ioctl. -Setting the base address to NULL indicates there is no physical frame buffer -access. -<P> -<H3>Capture Windows</H3> -The capture area is described by a <b>struct video_window</b>. This defines -a capture area and the clipping information if relevant. The -<b>VIDIOCGWIN</b> ioctl recovers the current settings and the -<b>VIDIOCSWIN</b> sets new values. A successful call to <b>VIDIOCSWIN</b> -indicates that a suitable set of parameters have been chosen. They do not -indicate that exactly what was requested was granted. The program should -call <b>VIDIOCGWIN</b> to check if the nearest match was suitable. The -<b>struct video_window</b> contains the following fields. -<P> -<TABLE> -<TR><TD><b>x</b><TD>The X co-ordinate specified in X windows format.</TD> -<TR><TD><b>y</b><TD>The Y co-ordinate specified in X windows format.</TD> -<TR><TD><b>width</b><TD>The width of the image capture.</TD> -<TR><TD><b>height</b><TD>The height of the image capture.</TD> -<TR><TD><b>chromakey</b><TD>A host order RGB32 value for the chroma key.</TD> -<TR><TD><b>flags</b><TD>Additional capture flags.</TD> -<TR><TD><b>clips</b><TD>A list of clipping rectangles. <em>(Set only)</em></TD> -<TR><TD><b>clipcount</b><TD>The number of clipping rectangles. <em>(Set only)</em></TD> -</TABLE> -<P> -Clipping rectangles are passed as an array. Each clip consists of the following -fields available to the user. -<P> -<TABLE> -<TR><TD><b>x</b></TD><TD>X co-ordinate of rectangle to skip</TD> -<TR><TD><b>y</b></TD><TD>Y co-ordinate of rectangle to skip</TD> -<TR><TD><b>width</b></TD><TD>Width of rectangle to skip</TD> -<TR><TD><b>height</b></TD><TD>Height of rectangle to skip</TD> -</TABLE> -<P> -Merely setting the window does not enable capturing. Overlay capturing -(i.e. PCI-PCI transfer to the frame buffer of the video card) -is activated by passing the <b>VIDIOCCAPTURE</b> ioctl a value of 1, and -disabled by passing it a value of 0. -<P> -Some capture devices can capture a subfield of the image they actually see. -This is indicated when VIDEO_TYPE_SUBCAPTURE is defined. -The video_capture describes the time and special subfields to capture. -The video_capture structure contains the following fields. -<P> -<TABLE> -<TR><TD><b>x</b></TD><TD>X co-ordinate of source rectangle to grab</TD> -<TR><TD><b>y</b></TD><TD>Y co-ordinate of source rectangle to grab</TD> -<TR><TD><b>width</b></TD><TD>Width of source rectangle to grab</TD> -<TR><TD><b>height</b></TD><TD>Height of source rectangle to grab</TD> -<TR><TD><b>decimation</b></TD><TD>Decimation to apply</TD> -<TR><TD><b>flags</b></TD><TD>Flag settings for grabbing</TD> -</TABLE> -The available flags are -<P> -<TABLE> -<TR><TH>Name</TH><TH>Description</TH> -<TR><TD><b>VIDEO_CAPTURE_ODD</b><TD>Capture only odd frames</TD> -<TR><TD><b>VIDEO_CAPTURE_EVEN</b><TD>Capture only even frames</TD> -</TABLE> -<P> -<H3>Video Sources</H3> -Each video4linux video or audio device captures from one or more -source <b>channels</b>. Each channel can be queries with the -<b>VDIOCGCHAN</b> ioctl call. Before invoking this function the caller -must set the channel field to the channel that is being queried. On return -the <b>struct video_channel</b> is filled in with information about the -nature of the channel itself. -<P> -The <b>VIDIOCSCHAN</b> ioctl takes an integer argument and switches the -capture to this input. It is not defined whether parameters such as colour -settings or tuning are maintained across a channel switch. The caller should -maintain settings as desired for each channel. (This is reasonable as -different video inputs may have different properties). -<P> -The <b>struct video_channel</b> consists of the following -<P> -<TABLE> -<TR><TD><b>channel</b></TD><TD>The channel number</TD> -<TR><TD><b>name</b></TD><TD>The input name - preferably reflecting the label -on the card input itself</TD> -<TR><TD><b>tuners</b></TD><TD>Number of tuners for this input</TD> -<TR><TD><b>flags</b></TD><TD>Properties the tuner has</TD> -<TR><TD><b>type</b></TD><TD>Input type (if known)</TD> -<TR><TD><b>norm</b><TD>The norm for this channel</TD> -</TABLE> -<P> -The flags defined are -<P> -<TABLE> -<TR><TD><b>VIDEO_VC_TUNER</b><TD>Channel has tuners.</TD> -<TR><TD><b>VIDEO_VC_AUDIO</b><TD>Channel has audio.</TD> -<TR><TD><b>VIDEO_VC_NORM</b><TD>Channel has norm setting.</TD> -</TABLE> -<P> -The types defined are -<P> -<TABLE> -<TR><TD><b>VIDEO_TYPE_TV</b><TD>The input is a TV input.</TD> -<TR><TD><b>VIDEO_TYPE_CAMERA</b><TD>The input is a camera.</TD> -</TABLE> -<P> -<H3>Image Properties</H3> -The image properties of the picture can be queried with the <b>VIDIOCGPICT</b> -ioctl which fills in a <b>struct video_picture</b>. The <b>VIDIOCSPICT</b> -ioctl allows values to be changed. All values except for the palette type -are scaled between 0-65535. -<P> -The <b>struct video_picture</b> consists of the following fields -<P> -<TABLE> -<TR><TD><b>brightness</b><TD>Picture brightness</TD> -<TR><TD><b>hue</b><TD>Picture hue (colour only)</TD> -<TR><TD><b>colour</b><TD>Picture colour (colour only)</TD> -<TR><TD><b>contrast</b><TD>Picture contrast</TD> -<TR><TD><b>whiteness</b><TD>The whiteness (greyscale only)</TD> -<TR><TD><b>depth</b><TD>The capture depth (may need to match the frame buffer depth)</TD> -<TR><TD><b>palette</b><TD>Reports the palette that should be used for this image</TD> -</TABLE> -<P> -The following palettes are defined -<P> -<TABLE> -<TR><TD><b>VIDEO_PALETTE_GREY</b><TD>Linear intensity grey scale (255 is brightest).</TD> -<TR><TD><b>VIDEO_PALETTE_HI240</b><TD>The BT848 8bit colour cube.</TD> -<TR><TD><b>VIDEO_PALETTE_RGB565</b><TD>RGB565 packed into 16 bit words.</TD> -<TR><TD><b>VIDEO_PALETTE_RGB555</b><TD>RGV555 packed into 16 bit words, top bit undefined.</TD> -<TR><TD><b>VIDEO_PALETTE_RGB24</b><TD>RGB888 packed into 24bit words.</TD> -<TR><TD><b>VIDEO_PALETTE_RGB32</b><TD>RGB888 packed into the low 3 bytes of 32bit words. The top 8bits are undefined.</TD> -<TR><TD><b>VIDEO_PALETTE_YUV422</b><TD>Video style YUV422 - 8bits packed 4bits Y 2bits U 2bits V</TD> -<TR><TD><b>VIDEO_PALETTE_YUYV</b><TD>Describe me</TD> -<TR><TD><b>VIDEO_PALETTE_UYVY</b><TD>Describe me</TD> -<TR><TD><b>VIDEO_PALETTE_YUV420</b><TD>YUV420 capture</TD> -<TR><TD><b>VIDEO_PALETTE_YUV411</b><TD>YUV411 capture</TD> -<TR><TD><b>VIDEO_PALETTE_RAW</b><TD>RAW capture (BT848)</TD> -<TR><TD><b>VIDEO_PALETTE_YUV422P</b><TD>YUV 4:2:2 Planar</TD> -<TR><TD><b>VIDEO_PALETTE_YUV411P</b><TD>YUV 4:1:1 Planar</TD> -</TABLE> -<P> -<H3>Tuning</H3> -Each video input channel can have one or more tuners associated with it. Many -devices will not have tuners. TV cards and radio cards will have one or more -tuners attached. -<P> -Tuners are described by a <b>struct video_tuner</b> which can be obtained by -the <b>VIDIOCGTUNER</b> ioctl. Fill in the tuner number in the structure -then pass the structure to the ioctl to have the data filled in. The -tuner can be switched using <b>VIDIOCSTUNER</b> which takes an integer argument -giving the tuner to use. A struct tuner has the following fields -<P> -<TABLE> -<TR><TD><b>tuner</b><TD>Number of the tuner</TD> -<TR><TD><b>name</b><TD>Canonical name for this tuner (eg FM/AM/TV)</TD> -<TR><TD><b>rangelow</b><TD>Lowest tunable frequency</TD> -<TR><TD><b>rangehigh</b><TD>Highest tunable frequency</TD> -<TR><TD><b>flags</b><TD>Flags describing the tuner</TD> -<TR><TD><b>mode</b><TD>The video signal mode if relevant</TD> -<TR><TD><b>signal</b><TD>Signal strength if known - between 0-65535</TD> -</TABLE> -<P> -The following flags exist -<P> -<TABLE> -<TR><TD><b>VIDEO_TUNER_PAL</b><TD>PAL tuning is supported</TD> -<TR><TD><b>VIDEO_TUNER_NTSC</b><TD>NTSC tuning is supported</TD> -<TR><TD><b>VIDEO_TUNER_SECAM</b><TD>SECAM tuning is supported</TD> -<TR><TD><b>VIDEO_TUNER_LOW</b><TD>Frequency is in a lower range</TD> -<TR><TD><b>VIDEO_TUNER_NORM</b><TD>The norm for this tuner is settable</TD> -<TR><TD><b>VIDEO_TUNER_STEREO_ON</b><TD>The tuner is seeing stereo audio</TD> -<TR><TD><b>VIDEO_TUNER_RDS_ON</b><TD>The tuner is seeing a RDS datastream</TD> -<TR><TD><b>VIDEO_TUNER_MBS_ON</b><TD>The tuner is seeing a MBS datastream</TD> -</TABLE> -<P> -The following modes are defined -<P> -<TABLE> -<TR><TD><b>VIDEO_MODE_PAL</b><TD>The tuner is in PAL mode</TD> -<TR><TD><b>VIDEO_MODE_NTSC</b><TD>The tuner is in NTSC mode</TD> -<TR><TD><b>VIDEO_MODE_SECAM</b><TD>The tuner is in SECAM mode</TD> -<TR><TD><b>VIDEO_MODE_AUTO</b><TD>The tuner auto switches, or mode does not apply</TD> -</TABLE> -<P> -Tuning frequencies are an unsigned 32bit value in 1/16th MHz or if the -<b>VIDEO_TUNER_LOW</b> flag is set they are in 1/16th KHz. The current -frequency is obtained as an unsigned long via the <b>VIDIOCGFREQ</b> ioctl and -set by the <b>VIDIOCSFREQ</b> ioctl. -<P> -<H3>Audio</H3> -TV and Radio devices have one or more audio inputs that may be selected. -The audio properties are queried by passing a <b>struct video_audio</b> to <b>VIDIOCGAUDIO</b> ioctl. The -<b>VIDIOCSAUDIO</b> ioctl sets audio properties. -<P> -The structure contains the following fields -<P> -<TABLE> -<TR><TD><b>audio</b><TD>The channel number</TD> -<TR><TD><b>volume</b><TD>The volume level</TD> -<TR><TD><b>bass</b><TD>The bass level</TD> -<TR><TD><b>treble</b><TD>The treble level</TD> -<TR><TD><b>flags</b><TD>Flags describing the audio channel</TD> -<TR><TD><b>name</b><TD>Canonical name for the audio input</TD> -<TR><TD><b>mode</b><TD>The mode the audio input is in</TD> -<TR><TD><b>balance</b><TD>The left/right balance</TD> -<TR><TD><b>step</b><TD>Actual step used by the hardware</TD> -</TABLE> -<P> -The following flags are defined -<P> -<TABLE> -<TR><TD><b>VIDEO_AUDIO_MUTE</b><TD>The audio is muted</TD> -<TR><TD><b>VIDEO_AUDIO_MUTABLE</b><TD>Audio muting is supported</TD> -<TR><TD><b>VIDEO_AUDIO_VOLUME</b><TD>The volume is controllable</TD> -<TR><TD><b>VIDEO_AUDIO_BASS</b><TD>The bass is controllable</TD> -<TR><TD><b>VIDEO_AUDIO_TREBLE</b><TD>The treble is controllable</TD> -<TR><TD><b>VIDEO_AUDIO_BALANCE</b><TD>The balance is controllable</TD> -</TABLE> -<P> -The following decoding modes are defined -<P> -<TABLE> -<TR><TD><b>VIDEO_SOUND_MONO</b><TD>Mono signal</TD> -<TR><TD><b>VIDEO_SOUND_STEREO</b><TD>Stereo signal (NICAM for TV)</TD> -<TR><TD><b>VIDEO_SOUND_LANG1</b><TD>European TV alternate language 1</TD> -<TR><TD><b>VIDEO_SOUND_LANG2</b><TD>European TV alternate language 2</TD> -</TABLE> -<P> -<H3>Reading Images</H3> -Each call to the <b>read</b> syscall returns the next available image -from the device. It is up to the caller to set format and size (using -the VIDIOCSPICT and VIDIOCSWIN ioctls) and then to pass a suitable -size buffer and length to the function. Not all devices will support -read operations. -<P> -A second way to handle image capture is via the mmap interface if supported. -To use the mmap interface a user first sets the desired image size and depth -properties. Next the VIDIOCGMBUF ioctl is issued. This reports the size -of buffer to mmap and the offset within the buffer for each frame. The -number of frames supported is device dependent and may only be one. -<P> -The video_mbuf structure contains the following fields -<P> -<TABLE> -<TR><TD><b>size</b><TD>The number of bytes to map</TD> -<TR><TD><b>frames</b><TD>The number of frames</TD> -<TR><TD><b>offsets</b><TD>The offset of each frame</TD> -</TABLE> -<P> -Once the mmap has been made the VIDIOCMCAPTURE ioctl starts the -capture to a frame using the format and image size specified in the -video_mmap (which should match or be below the initial query size). -When the VIDIOCMCAPTURE ioctl returns the frame is <em>not</em> -captured yet, the driver just instructed the hardware to start the -capture. The application has to use the VIDIOCSYNC ioctl to wait -until the capture of a frame is finished. VIDIOCSYNC takes the frame -number you want to wait for as argument. -<p> -It is allowed to call VIDIOCMCAPTURE multiple times (with different -frame numbers in video_mmap->frame of course) and thus have multiple -outstanding capture requests. A simple way do to double-buffering -using this feature looks like this: -<pre> -/* setup everything */ -VIDIOCMCAPTURE(0) -while (whatever) { - VIDIOCMCAPTURE(1) - VIDIOCSYNC(0) - /* process frame 0 while the hardware captures frame 1 */ - VIDIOCMCAPTURE(0) - VIDIOCSYNC(1) - /* process frame 1 while the hardware captures frame 0 */ -} -</pre> -Note that you are <em>not</em> limited to only two frames. The API -allows up to 32 frames, the VIDIOCGMBUF ioctl returns the number of -frames the driver granted. Thus it is possible to build deeper queues -to avoid loosing frames on load peaks. -<p> -While capturing to memory the driver will make a "best effort" attempt -to capture to screen as well if requested. This normally means all -frames that "miss" memory mapped capture will go to the display. -<P> -A final ioctl exists to allow a device to obtain related devices if a -driver has multiple components (for example video0 may not be associated -with vbi0 which would cause an intercast display program to make a bad -mistake). The VIDIOCGUNIT ioctl reports the unit numbers of the associated -devices if any exist. The video_unit structure has the following fields. -<P> -<TABLE> -<TR><TD><b>video</b><TD>Video capture device</TD> -<TR><TD><b>vbi</b><TD>VBI capture device</TD> -<TR><TD><b>radio</b><TD>Radio device</TD> -<TR><TD><b>audio</b><TD>Audio mixer</TD> -<TR><TD><b>teletext</b><TD>Teletext device</TD> -</TABLE> -<P> -<H3>RDS Datastreams</H3> -For radio devices that support it, it is possible to receive Radio Data -System (RDS) data by means of a read() on the device. The data is packed in -groups of three, as follows: -<TABLE> -<TR><TD>First Octet</TD><TD>Least Significant Byte of RDS Block</TD></TR> -<TR><TD>Second Octet</TD><TD>Most Significant Byte of RDS Block -<TR><TD>Third Octet</TD><TD>Bit 7:</TD><TD>Error bit. Indicates that -an uncorrectable error occurred during reception of this block.</TD></TR> -<TR><TD> </TD><TD>Bit 6:</TD><TD>Corrected bit. Indicates that -an error was corrected for this data block.</TD></TR> -<TR><TD> </TD><TD>Bits 5-3:</TD><TD>Received Offset. Indicates the -offset received by the sync system.</TD></TR> -<TR><TD> </TD><TD>Bits 2-0:</TD><TD>Offset Name. Indicates the -offset applied to this data.</TD></TR> -</TABLE> -</BODY> -</HTML> +<TITLE>V4L API</TITLE> +<H1>Video For Linux APIs</H1> +<table border=0> +<tr> +<td> +<A HREF=http://www.linuxtv.org/downloads/video4linux/API/V4L1_API.html> +V4L original API</a> +</td><td> +Obsoleted by V4L2 API +</td></tr><tr><td> +<A HREF=http://www.linuxtv.org/downloads/video4linux/API/V4L2_API.html> +V4L2 API</a> +</td><td> +Should be used for new projects +</td></tr> +</table> diff --git a/Documentation/video4linux/CARDLIST.bttv b/Documentation/video4linux/CARDLIST.bttv index e46761c39e3..62a12a08e2a 100644 --- a/Documentation/video4linux/CARDLIST.bttv +++ b/Documentation/video4linux/CARDLIST.bttv @@ -1,4 +1,4 @@ -card=0 - *** UNKNOWN/GENERIC *** +card=0 - *** UNKNOWN/GENERIC *** card=1 - MIRO PCTV card=2 - Hauppauge (bt848) card=3 - STB, Gateway P/N 6000699 (bt848) @@ -119,3 +119,17 @@ card=117 - NGS NGSTV+ card=118 - LMLBT4 card=119 - Tekram M205 PRO card=120 - Conceptronic CONTVFMi +card=121 - Euresys Picolo Tetra +card=122 - Spirit TV Tuner +card=123 - AVerMedia AVerTV DVB-T 771 +card=124 - AverMedia AverTV DVB-T 761 +card=125 - MATRIX Vision Sigma-SQ +card=126 - MATRIX Vision Sigma-SLC +card=127 - APAC Viewcomp 878(AMAX) +card=128 - DVICO FusionHDTV DVB-T Lite +card=129 - V-Gear MyVCD +card=130 - Super TV Tuner +card=131 - Tibet Systems 'Progress DVR' CS16 +card=132 - Kodicom 4400R (master) +card=133 - Kodicom 4400R (slave) +card=134 - Adlink RTV24 diff --git a/Documentation/video4linux/CARDLIST.cx88 b/Documentation/video4linux/CARDLIST.cx88 new file mode 100644 index 00000000000..03deb0726aa --- /dev/null +++ b/Documentation/video4linux/CARDLIST.cx88 @@ -0,0 +1,32 @@ +card=0 - UNKNOWN/GENERIC +card=1 - Hauppauge WinTV 34xxx models +card=2 - GDI Black Gold +card=3 - PixelView +card=4 - ATI TV Wonder Pro +card=5 - Leadtek Winfast 2000XP Expert +card=6 - AverTV Studio 303 (M126) +card=7 - MSI TV-@nywhere Master +card=8 - Leadtek Winfast DV2000 +card=9 - Leadtek PVR 2000 +card=10 - IODATA GV-VCP3/PCI +card=11 - Prolink PlayTV PVR +card=12 - ASUS PVR-416 +card=13 - MSI TV-@nywhere +card=14 - KWorld/VStream XPert DVB-T +card=15 - DViCO FusionHDTV DVB-T1 +card=16 - KWorld LTV883RF +card=17 - DViCO FusionHDTV 3 Gold-Q +card=18 - Hauppauge Nova-T DVB-T +card=19 - Conexant DVB-T reference design +card=20 - Provideo PV259 +card=21 - DViCO FusionHDTV DVB-T Plus +card=22 - digitalnow DNTV Live! DVB-T +card=23 - pcHDTV HD3000 HDTV +card=24 - Hauppauge WinTV 28xxx (Roslyn) models +card=25 - Digital-Logic MICROSPACE Entertainment Center (MEC) +card=26 - IODATA GV/BCTV7E +card=27 - PixelView PlayTV Ultra Pro (Stereo) +card=28 - DViCO FusionHDTV 3 Gold-T +card=29 - ADS Tech Instant TV DVB-T PCI +card=30 - TerraTec Cinergy 1400 DVB-T +card=31 - DViCO FusionHDTV 5 Gold diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134 index a6c82fa4de0..1b5a3a9ffbe 100644 --- a/Documentation/video4linux/CARDLIST.saa7134 +++ b/Documentation/video4linux/CARDLIST.saa7134 @@ -1,10 +1,10 @@ - 0 -> UNKNOWN/GENERIC + 0 -> UNKNOWN/GENERIC 1 -> Proteus Pro [philips reference design] [1131:2001,1131:2001] 2 -> LifeView FlyVIDEO3000 [5168:0138,4e42:0138] 3 -> LifeView FlyVIDEO2000 [5168:0138] 4 -> EMPRESS [1131:6752] 5 -> SKNet Monster TV [1131:4e85] - 6 -> Tevion MD 9717 + 6 -> Tevion MD 9717 7 -> KNC One TV-Station RDS / Typhoon TV Tuner RDS [1131:fe01,1894:fe01] 8 -> Terratec Cinergy 400 TV [153B:1142] 9 -> Medion 5044 @@ -20,16 +20,45 @@ 19 -> Compro VideoMate TV [185b:c100] 20 -> Matrox CronosPlus [102B:48d0] 21 -> 10MOONS PCI TV CAPTURE CARD [1131:2001] - 22 -> Medion 2819/ AverMedia M156 [1461:a70b,1461:2115] + 22 -> AverMedia M156 / Medion 2819 [1461:a70b] 23 -> BMK MPEX Tuner 24 -> KNC One TV-Station DVR [1894:a006] 25 -> ASUS TV-FM 7133 [1043:4843] 26 -> Pinnacle PCTV Stereo (saa7134) [11bd:002b] - 27 -> Manli MuchTV M-TV002 - 28 -> Manli MuchTV M-TV001 + 27 -> Manli MuchTV M-TV002/Behold TV 403 FM + 28 -> Manli MuchTV M-TV001/Behold TV 401 29 -> Nagase Sangyo TransGear 3000TV [1461:050c] 30 -> Elitegroup ECS TVP3XP FM1216 Tuner Card(PAL-BG,FM) [1019:4cb4] 31 -> Elitegroup ECS TVP3XP FM1236 Tuner Card (NTSC,FM) [1019:4cb5] 32 -> AVACS SmartTV 33 -> AVerMedia DVD EZMaker [1461:10ff] - 34 -> LifeView FlyTV Platinum33 mini [5168:0212] + 34 -> Noval Prime TV 7133 + 35 -> AverMedia AverTV Studio 305 [1461:2115] + 36 -> UPMOST PURPLE TV [12ab:0800] + 37 -> Items MuchTV Plus / IT-005 + 38 -> Terratec Cinergy 200 TV [153B:1152] + 39 -> LifeView FlyTV Platinum Mini [5168:0212] + 40 -> Compro VideoMate TV PVR/FM [185b:c100] + 41 -> Compro VideoMate TV Gold+ [185b:c100] + 42 -> Sabrent SBT-TVFM (saa7130) + 43 -> :Zolid Xpert TV7134 + 44 -> Empire PCI TV-Radio LE + 45 -> Avermedia AVerTV Studio 307 [1461:9715] + 46 -> AVerMedia Cardbus TV/Radio (E500) [1461:d6ee] + 47 -> Terratec Cinergy 400 mobile [153b:1162] + 48 -> Terratec Cinergy 600 TV MK3 [153B:1158] + 49 -> Compro VideoMate Gold+ Pal [185b:c200] + 50 -> Pinnacle PCTV 300i DVB-T + PAL [11bd:002d] + 51 -> ProVideo PV952 [1540:9524] + 52 -> AverMedia AverTV/305 [1461:2108] + 53 -> ASUS TV-FM 7135 [1043:4845] + 54 -> LifeView FlyTV Platinum FM [5168:0214,1489:0214] + 55 -> LifeView FlyDVB-T DUO [5168:0502,5168:0306] + 56 -> Avermedia AVerTV 307 [1461:a70a] + 57 -> Avermedia AVerTV GO 007 FM [1461:f31f] + 58 -> ADS Tech Instant TV (saa7135) [1421:0350,1421:0370] + 59 -> Kworld/Tevion V-Stream Xpert TV PVR7134 + 60 -> Typhoon DVB-T Duo Digital/Analog Cardbus [4e42:0502] + 61 -> Philips TOUGH DVB-T reference design [1131:2004] + 62 -> Compro VideoMate TV Gold+II + 63 -> Kworld Xpert TV PVR7134 diff --git a/Documentation/video4linux/CARDLIST.tuner b/Documentation/video4linux/CARDLIST.tuner index f7bafe862ba..f3302e1b1b9 100644 --- a/Documentation/video4linux/CARDLIST.tuner +++ b/Documentation/video4linux/CARDLIST.tuner @@ -44,3 +44,23 @@ tuner=42 - Philips 1236D ATSC/NTSC daul in tuner=43 - Philips NTSC MK3 (FM1236MK3 or FM1236/F) tuner=44 - Philips 4 in 1 (ATI TV Wonder Pro/Conexant) tuner=45 - Microtune 4049 FM5 +tuner=46 - Panasonic VP27s/ENGE4324D +tuner=47 - LG NTSC (TAPE series) +tuner=48 - Tenna TNF 8831 BGFF) +tuner=49 - Microtune 4042 FI5 ATSC/NTSC dual in +tuner=50 - TCL 2002N +tuner=51 - Philips PAL/SECAM_D (FM 1256 I-H3) +tuner=52 - Thomson DDT 7610 (ATSC/NTSC) +tuner=53 - Philips FQ1286 +tuner=54 - tda8290+75 +tuner=55 - LG PAL (TAPE series) +tuner=56 - Philips PAL/SECAM multi (FQ1216AME MK4) +tuner=57 - Philips FQ1236A MK4 +tuner=58 - Ymec TVision TVF-8531MF/8831MF/8731MF +tuner=59 - Ymec TVision TVF-5533MF +tuner=60 - Thomson DDT 7611 (ATSC/NTSC) +tuner=61 - Tena TNF9533-D/IF/TNF9533-B/DF +tuner=62 - Philips TEA5767HN FM Radio +tuner=63 - Philips FMD1216ME MK3 Hybrid Tuner +tuner=64 - LG TDVS-H062F/TUA6034 +tuner=65 - Ymec TVF66T5-B/DFF diff --git a/Documentation/video4linux/README.saa7134 b/Documentation/video4linux/README.saa7134 index 1a446c65365..1f788e498ef 100644 --- a/Documentation/video4linux/README.saa7134 +++ b/Documentation/video4linux/README.saa7134 @@ -57,6 +57,15 @@ Cards can use either of these two crystals (xtal): - 24.576MHz -> .audio_clock=0x200000 (xtal * .audio_clock = 51539600) +Some details about 30/34/35: + + - saa7130 - low-price chip, doesn't have mute, that is why all those + cards should have .mute field defined in their tuner structure. + + - saa7134 - usual chip + + - saa7133/35 - saa7135 is probably a marketing decision, since all those + chips identifies itself as 33 on pci. Credits ======= diff --git a/Documentation/video4linux/bttv/Cards b/Documentation/video4linux/bttv/Cards index 7f8c7eb70ab..8f1941ede4d 100644 --- a/Documentation/video4linux/bttv/Cards +++ b/Documentation/video4linux/bttv/Cards @@ -20,7 +20,7 @@ All other cards only differ by additional components as tuners, sound decoders, EEPROMs, teletext decoders ... -Unsupported Cards: +Unsupported Cards: ------------------ Cards with Zoran (ZR) or Philips (SAA) or ISA are not supported by @@ -50,11 +50,11 @@ Bt848a/Bt849 single crytal operation support possible!!! Miro/Pinnacle PCTV ------------------ -- Bt848 - some (all??) come with 2 crystals for PAL/SECAM and NTSC +- Bt848 + some (all??) come with 2 crystals for PAL/SECAM and NTSC - PAL, SECAM or NTSC TV tuner (Philips or TEMIC) - MSP34xx sound decoder on add on board - decoder is supported but AFAIK does not yet work + decoder is supported but AFAIK does not yet work (other sound MUX setting in GPIO port needed??? somebody who fixed this???) - 1 tuner, 1 composite and 1 S-VHS input - tuner type is autodetected @@ -70,7 +70,7 @@ in 1997! Hauppauge Win/TV pci -------------------- -There are many different versions of the Hauppauge cards with different +There are many different versions of the Hauppauge cards with different tuners (TV+Radio ...), teletext decoders. Note that even cards with same model numbers have (depending on the revision) different chips on it. @@ -80,22 +80,22 @@ different chips on it. - PAL, SECAM, NTSC or tuner with or without Radio support e.g.: - PAL: + PAL: TDA5737: VHF, hyperband and UHF mixer/oscillator for TV and VCR 3-band tuners TSA5522: 1.4 GHz I2C-bus controlled synthesizer, I2C 0xc2-0xc3 - + NTSC: TDA5731: VHF, hyperband and UHF mixer/oscillator for TV and VCR 3-band tuners TSA5518: no datasheet available on Philips site -- Philips SAA5246 or SAA5284 ( or no) Teletext decoder chip +- Philips SAA5246 or SAA5284 ( or no) Teletext decoder chip with buffer RAM (e.g. Winbond W24257AS-35: 32Kx8 CMOS static RAM) SAA5246 (I2C 0x22) is supported -- 256 bytes EEPROM: Microchip 24LC02B or Philips 8582E2Y +- 256 bytes EEPROM: Microchip 24LC02B or Philips 8582E2Y with configuration information I2C address 0xa0 (24LC02B also responds to 0xa2-0xaf) - 1 tuner, 1 composite and (depending on model) 1 S-VHS input - 14052B: mux for selection of sound source -- sound decoder: TDA9800, MSP34xx (stereo cards) +- sound decoder: TDA9800, MSP34xx (stereo cards) Askey CPH-Series @@ -108,17 +108,17 @@ Developed by TelSignal(?), OEMed by many vendors (Typhoon, Anubis, Dynalink) CPH05x: BT878 with FM CPH06x: BT878 (w/o FM) CPH07x: BT878 capture only - + TV standards: CPH0x0: NTSC-M/M CPH0x1: PAL-B/G CPH0x2: PAL-I/I CPH0x3: PAL-D/K - CPH0x4: SECAM-L/L - CPH0x5: SECAM-B/G - CPH0x6: SECAM-D/K - CPH0x7: PAL-N/N - CPH0x8: PAL-B/H + CPH0x4: SECAM-L/L + CPH0x5: SECAM-B/G + CPH0x6: SECAM-D/K + CPH0x7: PAL-N/N + CPH0x8: PAL-B/H CPH0x9: PAL-M/M CPH03x was often sold as "TV capturer". @@ -174,7 +174,7 @@ Lifeview Flyvideo Series: "The FlyVideo2000 and FlyVideo2000s product name have renamed to FlyVideo98." Their Bt8x8 cards are listed as discontinued. Flyvideo 2000S was probably sold as Flyvideo 3000 in some contries(Europe?). - The new Flyvideo 2000/3000 are SAA7130/SAA7134 based. + The new Flyvideo 2000/3000 are SAA7130/SAA7134 based. "Flyvideo II" had been the name for the 848 cards, nowadays (in Germany) this name is re-used for LR50 Rev.W. @@ -235,12 +235,12 @@ Prolink Multimedia TV packages (card + software pack): PixelView Play TV Theater - (Model: PV-M4200) = PixelView Play TV pro + Software PixelView Play TV PAK - (Model: PV-BT878P+ REV 4E) - PixelView Play TV/VCR - (Model: PV-M3200 REV 4C / 8D / 10A ) + PixelView Play TV/VCR - (Model: PV-M3200 REV 4C / 8D / 10A ) PixelView Studio PAK - (Model: M2200 REV 4C / 8D / 10A ) PixelView PowerStudio PAK - (Model: PV-M3600 REV 4E) PixelView DigitalVCR PAK - (Model: PV-M2400 REV 4C / 8D / 10A ) - PixelView PlayTV PAK II (TV/FM card + usb camera) PV-M3800 + PixelView PlayTV PAK II (TV/FM card + usb camera) PV-M3800 PixelView PlayTV XP PV-M4700,PV-M4700(w/FM) PixelView PlayTV DVR PV-M4600 package contents:PixelView PlayTV pro, windvr & videoMail s/w @@ -254,7 +254,7 @@ Prolink DTV3000 PV-DTV3000P+ DVB-S CI = Twinhan VP-1030 DTV2000 DVB-S = Twinhan VP-1020 - + Video Conferencing: PixelView Meeting PAK - (Model: PV-BT878P) PixelView Meeting PAK Lite - (Model: PV-BT878P) @@ -308,7 +308,7 @@ KNC One newer Cards have saa7134, but model name stayed the same? -Provideo +Provideo -------- PV951 or PV-951 (also are sold as: Boeder TV-FM Video Capture Card @@ -353,7 +353,7 @@ AVerMedia AVerTV AVerTV Stereo AVerTV Studio (w/FM) - AVerMedia TV98 with Remote + AVerMedia TV98 with Remote AVerMedia TV/FM98 Stereo AVerMedia TVCAM98 TVCapture (Bt848) @@ -373,7 +373,7 @@ AVerMedia (1) Daughterboard MB68-A with TDA9820T and TDA9840T (2) Sony NE41S soldered (stereo sound?) (3) Daughterboard M118-A w/ pic 16c54 and 4 MHz quartz - + US site has different drivers for (as of 09/2002): EZ Capture/InterCam PCI (BT-848 chip) EZ Capture/InterCam PCI (BT-878 chip) @@ -437,7 +437,7 @@ Terratec Terra TValueRadio, "LR102 Rev.C" printed on the PCB Terra TV/Radio+ Version 1.0, "80-CP2830100-0" TTTV3 printed on the PCB, "CPH010-E83" on the back, SAA6588T, TDA9873H - Terra TValue Version BT878, "80-CP2830110-0 TTTV4" printed on the PCB, + Terra TValue Version BT878, "80-CP2830110-0 TTTV4" printed on the PCB, "CPH011-D83" on back Terra TValue Version 1.0 "ceb105.PCB" (really identical to Terra TV+ Version 1.0) Terra TValue New Revision "LR102 Rec.C" @@ -528,7 +528,7 @@ Koutech KW-606RSF KW-607A (capture only) KW-608 (Zoran capture only) - + IODATA (jp) ------ GV-BCTV/PCI @@ -542,15 +542,15 @@ Canopus (jp) ------- WinDVR = Kworld "KW-TVL878RF" -www.sigmacom.co.kr +www.sigmacom.co.kr ------------------ - Sigma Cyber TV II + Sigma Cyber TV II www.sasem.co.kr --------------- Litte OnAir TV -hama +hama ---- TV/Radio-Tuner Card, PCI (Model 44677) = CPH051 @@ -638,7 +638,7 @@ Media-Surfer (esc-kathrein.de) Jetway (www.jetway.com.tw) -------------------------- - JW-TV 878M + JW-TV 878M JW-TV 878 = KWorld KW-TV878RF Galaxis @@ -715,7 +715,7 @@ Hauppauge 809 MyVideo 872 MyTV2Go FM - + 546 WinTV Nova-S CI 543 WinTV Nova 907 Nova-S USB @@ -739,7 +739,7 @@ Hauppauge 832 MyTV2Go 869 MyTV2Go-FM 805 MyVideo (USB) - + Matrix-Vision ------------- @@ -764,7 +764,7 @@ Gallant (www.gallantcom.com) www.minton.com.tw Intervision IV-550 (bt8x8) Intervision IV-100 (zoran) Intervision IV-1000 (bt8x8) - + Asonic (www.asonic.com.cn) (website down) ----------------------------------------- SkyEye tv 878 @@ -804,11 +804,11 @@ Kworld (www.kworld.com.tw) JTT/ Justy Corp.http://www.justy.co.jp/ (www.jtt.com.jp website down) --------------------------------------------------------------------- - JTT-02 (JTT TV) "TV watchmate pro" (bt848) + JTT-02 (JTT TV) "TV watchmate pro" (bt848) ADS www.adstech.com ------------------- - Channel Surfer TV ( CHX-950 ) + Channel Surfer TV ( CHX-950 ) Channel Surfer TV+FM ( CHX-960FM ) AVEC www.prochips.com @@ -874,7 +874,7 @@ www.ids-imaging.de ------------------ Falcon Series (capture only) In USA: http://www.theimagingsource.com/ - DFG/LC1 + DFG/LC1 www.sknet-web.co.jp ------------------- @@ -890,7 +890,7 @@ Cybertainment CyberMail Xtreme These are Flyvideo -VCR (http://www.vcrinc.com/) +VCR (http://www.vcrinc.com/) --- Video Catcher 16 @@ -920,7 +920,7 @@ Sdisilk www.sdisilk.com/ SDI Silk 200 SDI Input Card www.euresys.com - PICOLO series + PICOLO series PMC/Pace www.pacecom.co.uk website closed diff --git a/Documentation/video4linux/bttv/Insmod-options b/Documentation/video4linux/bttv/Insmod-options index 7bb5a50b077..fc94ff235ff 100644 --- a/Documentation/video4linux/bttv/Insmod-options +++ b/Documentation/video4linux/bttv/Insmod-options @@ -44,6 +44,9 @@ bttv.o push used by bttv. bttv will disable overlay by default on this hardware to avoid crashes. With this insmod option you can override this. + no_overlay=1 Disable overlay. It should be used by broken + hardware that doesn't support PCI2PCI direct + transfers. automute=0/1 Automatically mutes the sound if there is no TV signal, on by default. You might try to disable this if you have bad input signal diff --git a/Documentation/video4linux/hauppauge-wintv-cx88-ir.txt b/Documentation/video4linux/hauppauge-wintv-cx88-ir.txt new file mode 100644 index 00000000000..93fec32a118 --- /dev/null +++ b/Documentation/video4linux/hauppauge-wintv-cx88-ir.txt @@ -0,0 +1,54 @@ +The controls for the mux are GPIO [0,1] for source, and GPIO 2 for muting. + +GPIO0 GPIO1 + 0 0 TV Audio + 1 0 FM radio + 0 1 Line-In + 1 1 Mono tuner bypass or CD passthru (tuner specific) + +GPIO 16(i believe) is tied to the IR port (if present). + +------------------------------------------------------------------------------------ + +>From the data sheet: + Register 24'h20004 PCI Interrupt Status + bit [18] IR_SMP_INT Set when 32 input samples have been collected over + gpio[16] pin into GP_SAMPLE register. + +What's missing from the data sheet: + +Setup 4KHz sampling rate (roughly 2x oversampled; good enough for our RC5 +compat remote) +set register 0x35C050 to 0xa80a80 + +enable sampling +set register 0x35C054 to 0x5 + +Of course, enable the IRQ bit 18 in the interrupt mask register .(and +provide for a handler) + +GP_SAMPLE register is at 0x35C058 + +Bits are then right shifted into the GP_SAMPLE register at the specified +rate; you get an interrupt when a full DWORD is recieved. +You need to recover the actual RC5 bits out of the (oversampled) IR sensor +bits. (Hint: look for the 0/1and 1/0 crossings of the RC5 bi-phase data) An +actual raw RC5 code will span 2-3 DWORDS, depending on the actual alignment. + +I'm pretty sure when no IR signal is present the receiver is always in a +marking state(1); but stray light, etc can cause intermittent noise values +as well. Remember, this is a free running sample of the IR receiver state +over time, so don't assume any sample starts at any particular place. + +http://www.atmel.com/dyn/resources/prod_documents/doc2817.pdf +This data sheet (google search) seems to have a lovely description of the +RC5 basics + +http://users.pandora.be/nenya/electronics/rc5/ and more data + +http://www.ee.washington.edu/circuit_archive/text/ir_decode.txt +and even a reference to how to decode a bi-phase data stream. + +http://www.xs4all.nl/~sbp/knowledge/ir/rc5.htm +still more info + diff --git a/Documentation/video4linux/lifeview.txt b/Documentation/video4linux/lifeview.txt new file mode 100644 index 00000000000..b07ea79c2b7 --- /dev/null +++ b/Documentation/video4linux/lifeview.txt @@ -0,0 +1,42 @@ +collecting data about the lifeview models and the config coding on +gpio pins 0-9 ... +================================================================== + +bt878: + LR50 rev. Q ("PARTS: 7031505116), Tuner wurde als Nr. 5 erkannt, Eingänge + SVideo, TV, Composite, Audio, Remote. CP9..1=100001001 (1: 0-Ohm-Widerstand + gegen GND unbestückt; 0: bestückt) + +------------------------------------------------------------------------------ + +saa7134: + /* LifeView FlyTV Platinum FM (LR214WF) */ + /* "Peter Missel <peter.missel@onlinehome.de> */ + .name = "LifeView FlyTV Platinum FM", + /* GP27 MDT2005 PB4 pin 10 */ + /* GP26 MDT2005 PB3 pin 9 */ + /* GP25 MDT2005 PB2 pin 8 */ + /* GP23 MDT2005 PB1 pin 7 */ + /* GP22 MDT2005 PB0 pin 6 */ + /* GP21 MDT2005 PB5 pin 11 */ + /* GP20 MDT2005 PB6 pin 12 */ + /* GP19 MDT2005 PB7 pin 13 */ + /* nc MDT2005 PA3 pin 2 */ + /* Remote MDT2005 PA2 pin 1 */ + /* GP18 MDT2005 PA1 pin 18 */ + /* nc MDT2005 PA0 pin 17 strap low */ + + /* GP17 Strap "GP7"=High */ + /* GP16 Strap "GP6"=High + 0=Radio 1=TV + Drives SA630D ENCH1 and HEF4052 A1 pins + to do FM radio through SIF input */ + /* GP15 nc */ + /* GP14 nc */ + /* GP13 nc */ + /* GP12 Strap "GP5" = High */ + /* GP11 Strap "GP4" = High */ + /* GP10 Strap "GP3" = High */ + /* GP09 Strap "GP2" = Low */ + /* GP08 Strap "GP1" = Low */ + /* GP07.00 nc */ diff --git a/Documentation/video4linux/not-in-cx2388x-datasheet.txt b/Documentation/video4linux/not-in-cx2388x-datasheet.txt new file mode 100644 index 00000000000..edbfe744d21 --- /dev/null +++ b/Documentation/video4linux/not-in-cx2388x-datasheet.txt @@ -0,0 +1,41 @@ +================================================================================= +MO_OUTPUT_FORMAT (0x310164) + + Previous default from DScaler: 0x1c1f0008 + Digit 8: 31-28 + 28: PREVREMOD = 1 + + Digit 7: 27-24 (0xc = 12 = b1100 ) + 27: COMBALT = 1 + 26: PAL_INV_PHASE + (DScaler apparently set this to 1, resulted in sucky picture) + + Digits 6,5: 23-16 + 25-16: COMB_RANGE = 0x1f [default] (9 bits -> max 512) + + Digit 4: 15-12 + 15: DISIFX = 0 + 14: INVCBF = 0 + 13: DISADAPT = 0 + 12: NARROWADAPT = 0 + + Digit 3: 11-8 + 11: FORCE2H + 10: FORCEREMD + 9: NCHROMAEN + 8: NREMODEN + + Digit 2: 7-4 + 7-6: YCORE + 5-4: CCORE + + Digit 1: 3-0 + 3: RANGE = 1 + 2: HACTEXT + 1: HSFMT + +0x47 is the sync byte for MPEG-2 transport stream packets. +Datasheet incorrectly states to use 47 decimal. 188 is the length. +All DVB compliant frontends output packets with this start code. + +================================================================================= diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt index b9e6be00cad..678e8f192db 100644 --- a/Documentation/x86_64/boot-options.txt +++ b/Documentation/x86_64/boot-options.txt @@ -6,6 +6,11 @@ only the AMD64 specific ones are listed here. Machine check mce=off disable machine check + mce=bootlog Enable logging of machine checks left over from booting. + Disabled by default because some BIOS leave bogus ones. + If your BIOS doesn't do that it's a good idea to enable though + to make sure you log even machine check events that result + in a reboot. nomce (for compatibility with i386): same as mce=off @@ -47,7 +52,7 @@ Timing notsc Don't use the CPU time stamp counter to read the wall time. This can be used to work around timing problems on multiprocessor systems - with not properly synchronized CPUs. Only useful with a SMP kernel + with not properly synchronized CPUs. report_lost_ticks Report when timer interrupts are lost because some code turned off @@ -74,6 +79,9 @@ Idle loop event. This will make the CPUs eat a lot more power, but may be useful to get slightly better performance in multiprocessor benchmarks. It also makes some profiling using performance counters more accurate. + Please note that on systems with MONITOR/MWAIT support (like Intel EM64T + CPUs) this option has no performance advantage over the normal idle loop. + It may also interact badly with hyperthreading. Rebooting @@ -178,6 +186,5 @@ Debugging Misc noreplacement Don't replace instructions with more appropiate ones - for the CPU. This may be useful on asymmetric MP systems - where some CPU have less capabilities than the others. - + for the CPU. This may be useful on asymmetric MP systems + where some CPU have less capabilities than the others. |