Age | Commit message (Collapse) | Author |
|
Yanmin Zhang reported:
| Comparing with kernel 2.6.25, AIM7 (use tmpfs) has more th
| regression under 2.6.26-rc1 on my 8-core stoakley, 16-core tigerton,
| and Itanium Montecito. Bisect located the patch below:
|
| 64ac24e738823161693bf791f87adc802cf529ff is first bad commit
| commit 64ac24e738823161693bf791f87adc802cf529ff
| Author: Matthew Wilcox <matthew@wil.cx>
| Date: Fri Mar 7 21:55:58 2008 -0500
|
| Generic semaphore implementation
|
| After I manually reverted the patch against 2.6.26-rc1 while fixing
| lots of conflicts/errors, aim7 regression became less than 2%.
i reproduced the AIM7 workload and can confirm Yanmin's findings that
-.26-rc1 regresses over .25 - by over 67% here.
Looking at the workload i found and fixed what i believe to be the real
bug causing the AIM7 regression: it was inefficient wakeup / scheduling
/ locking behavior of the new generic semaphore code, causing suboptimal
performance.
The problem comes from the following code. The new semaphore code does
this on down():
spin_lock_irqsave(&sem->lock, flags);
if (likely(sem->count > 0))
sem->count--;
else
__down(sem);
spin_unlock_irqrestore(&sem->lock, flags);
and this on up():
spin_lock_irqsave(&sem->lock, flags);
if (likely(list_empty(&sem->wait_list)))
sem->count++;
else
__up(sem);
spin_unlock_irqrestore(&sem->lock, flags);
where __up() does:
list_del(&waiter->list);
waiter->up = 1;
wake_up_process(waiter->task);
and where __down() does this in essence:
list_add_tail(&waiter.list, &sem->wait_list);
waiter.task = task;
waiter.up = 0;
for (;;) {
[...]
spin_unlock_irq(&sem->lock);
timeout = schedule_timeout(timeout);
spin_lock_irq(&sem->lock);
if (waiter.up)
return 0;
}
the fastpath looks good and obvious, but note the following property of
the contended path: if there's a task on the ->wait_list, the up() of
the current owner will "pass over" ownership to that waiting task, in a
wake-one manner, via the waiter->up flag and by removing the waiter from
the wait list.
That is all and fine in principle, but as implemented in
kernel/semaphore.c it also creates a nasty, hidden source of contention!
The contention comes from the following property of the new semaphore
code: the new owner owns the semaphore exclusively, even if it is not
running yet.
So if the old owner, even if just a few instructions later, does a
down() [lock_kernel()] again, it will be blocked and will have to wait
on the new owner to eventually be scheduled (possibly on another CPU)!
Or if another task gets to lock_kernel() sooner than the "new owner"
scheduled, it will be blocked unnecessarily and for a very long time
when there are 2000 tasks running.
I.e. the implementation of the new semaphores code does wake-one and
lock ownership in a very restrictive way - it does not allow
opportunistic re-locking of the lock at all and keeps the scheduler from
picking task order intelligently.
This kind of scheduling, with 2000 AIM7 processes running, creates awful
cross-scheduling between those 2000 tasks, causes reduced parallelism, a
throttled runqueue length and a lot of idle time. With increasing number
of CPUs it causes an exponentially worse behavior in AIM7, as the chance
for a newly woken new-owner task to actually run anytime soon is less
and less likely.
Note that it takes just a tiny bit of contention for the 'new-semaphore
catastrophy' to happen: the wakeup latencies get added to whatever small
contention there is, and quickly snowball out of control!
I believe Yanmin's findings and numbers support this analysis too.
The best fix for this problem is to use the same scheduling logic that
the kernel/mutex.c code uses: keep the wake-one behavior (that is OK and
wanted because we do not want to over-schedule), but also allow
opportunistic locking of the lock even if a wakee is already "in
flight".
The patch below implements this new logic. With this patch applied the
AIM7 regression is largely fixed on my quad testbox:
# v2.6.25 vanilla:
..................
Tasks Jobs/Min JTI Real CPU Jobs/sec/task
2000 56096.4 91 207.5 789.7 0.4675
2000 55894.4 94 208.2 792.7 0.4658
# v2.6.26-rc1-166-gc0a1811 vanilla:
...................................
Tasks Jobs/Min JTI Real CPU Jobs/sec/task
2000 33230.6 83 350.3 784.5 0.2769
2000 31778.1 86 366.3 783.6 0.2648
# v2.6.26-rc1-166-gc0a1811 + semaphore-speedup:
...............................................
Tasks Jobs/Min JTI Real CPU Jobs/sec/task
2000 55707.1 92 209.0 795.6 0.4642
2000 55704.4 96 209.0 796.0 0.4642
i.e. a 67% speedup. We are now back to within 1% of the v2.6.25
performance levels and have zero idle time during the test, as expected.
Btw., interactivity also improved dramatically with the fix - for
example console-switching became almost instantaneous during this
workload (which after all is running 2000 tasks at once!), without the
patch it was stuck for a minute at times.
There's another nice side-effect of this speedup patch, the new generic
semaphore code got even smaller:
text data bss dec hex filename
1241 0 0 1241 4d9 semaphore.o.before
1207 0 0 1207 4b7 semaphore.o.after
(because the waiter.up complication got removed.)
Longer-term we should look into using the mutex code for the generic
semaphore code as well - but i's not easy due to legacies and it's
outside of the scope of v2.6.26 and outside the scope of this patch as
well.
Bisected-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Move the scattered checks for PAT support to a single function. Its
moved to addon_cpuid_features.c as this file is shared between 32 and
64 bit.
Remove the manipulation of the PAT feature bit and just disable PAT in
the PAT layer, based on the PAT bit provided by the CPU and the
current CPU version/model white list.
Change the boot CPU check so it works on Voyager somewhere in the
future as well :) Also panic, when a secondary has PAT disabled but
the primary one has alrady switched to PAT. We have no way to undo
that.
The white list is kept for now to ensure that we can rely on known to
work CPU types and concentrate on the software induced problems
instead of fighthing CPU erratas and subtle wreckage caused by not yet
verified CPUs. Once the PAT code has stabilized enough, we can remove
the white list and open the can of worms.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
We want drivers to be able to use geode_has_vsa2 without having to worry
about what model geode is being compiled for. This patch ensures that
geode_has_vsa2 is always defined.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
This moves geode_has_vsa2 into a .c file, caches the result we get from
the VSA virtual registers, and causes the function to no longer be inline.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
commit e26a28d190304d910ee49b81cbfe6d9241f56e86
x86: olpc build fix
was a fix to a patch that was withdrawn/delayed and then erroneously
commited to x86.git. Revert it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
This reverts commit c3270e577c18b3d0e984c3371493205a4807db9d.
|
|
Found these two bugs while browsing through the code. The first one is
a cut-n-paste bug, instead of disabling the clock when request_irq()
fails, it enabled it once more. The second one fixes a debug printout,
AT91_SSC_IER is write only, AT91_SSC_IMR is readable (the printed string
actually says imr).
Frank Mandarino was busy so he asked me to send these to this list.
/Patrik
Signed-off-by: Patrik Sevallius <patrik.sevallius@enea.com>
Acked-by: Frank Mandarino <fmandarino@endrelia.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
There's more checkpatch stuff to fix in the driver, this just fixes the
minimum required for the following patch to be clean.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
arch/arm/mach-pxa/lubbock.c:399: error: expected '}' before ';' token
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
dev_open() and dev_close() must be called holding the RTNL, since they
call device functions and netdevice notifiers that are promised the RTNL.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The tx packet counting and the local loopback of CAN frames should
only happen in the case that the CAN frame has been enqueued to the
netdevice tx queue successfully.
Thanks to Andre Naujoks <nautsch@gmail.com> for reporting this issue.
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Urs Thuermann <urs@isnogud.escape.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
|
|
When a net namespace is destroyed, some devices (those, not killed
on ns stop explicitly) are moved back to init_net.
The problem, is that this net_ns change has one point of failure -
the __dev_alloc_name() may be called if a name collision occurs (and
this is easy to trigger). This allocator performs a likely-to-fail
GFP_ATOMIC allocation to find a suitable number. Other possible
conditions that may cause error (for device being ns local or not
registered) are always false in this case.
So, when this call fails, the device is unregistered. But this is
*not* the right thing to do, since after this the device may be
released (and kfree-ed) improperly. E. g. bridges require more
actions (sysfs update, timer disarming, etc.), some other devices
want to remove their private areas from lists, etc.
I. e. arbitrary use-after-free cases may occur.
The proposed fix is the following: since the only reason for the
dev_change_net_namespace to fail is the name generation, we may
give it a unique fall-back name w/o %d-s in it - the dev<ifindex>
one, since ifindexes are still unique.
So make this change, raise the failure-case printk loglevel to
EMERG and replace the unregister_netdevice call with BUG().
[ Use snprintf() -DaveM ]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
config values
When conntrack and DCCP/SCTP protocols are enabled, chances are good
that people also want DCCP/SCTP conntrack and NAT support.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
request
Some Inovaphone PBXs exhibit very stange behaviour: when dialing for
example "123", the device sends INVITE requests for "1", "12" and
"123" back to back. The first requests will elicit error responses
from the receiver, causing the SIP helper to flush the RTP
expectations even though we might still see a positive response.
Note the sequence number of the last INVITE request that contained a
media description and only flush the expectations when receiving a
negative response for that sequence number.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As noticed by Ben Greear, macvlan crashes the kernel when unloading the
module. The reason is that it tries to clean up the macvlan_port pointer
on the macvlan device itself instead of the underlying device. A non-NULL
pointer is taken as indication that the macvlan_handle_frame_hook is
valid, when receiving the next packet on the underlying device it tries
to call the NULL hook and crashes.
Clean up the macvlan_port on the correct device to fix this.
Signed-off-by; Patrick McHardy <kaber@trash.net>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
RFC 1122 does not have a section 3.1.2.2. The requirement to silently
discard datagrams with a bad checksum is in section 3.2.1.2 instead.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=10611
Signed-off-by: J.H.M. Dassen (Ray) <jdassen@debian.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Note: there's actually another bug in FRTO's SACK variant, which
is the causing failure in NewReno case because of the error
that's fixed here. I'll fix the SACK case separately (it's
a separate bug really, though related, but in order to fix that
I need to audit tp->snd_nxt usage a bit).
There were two places where SACK variant of FRTO is getting
incorrectly used even if SACK wasn't negotiated by the TCP flow.
This leads to incorrect setting of frto_highmark with NewReno
if a previous recovery was interrupted by another RTO.
An eventual fallback to conventional recovery then incorrectly
considers one or couple of segments as forward transmissions
though they weren't, which then are not LOST marked during
fallback making them "non-retransmittable" until the next RTO.
In a bad case, those segments are really lost and are the only
one left in the window. Thus TCP needs another RTO to continue.
The next FRTO, however, could again repeat the same events
making the progress of the TCP flow extremely slow.
In order for these events to occur at all, FRTO must occur
again in FRTOs step 3 while the key segments must be lost as
well, which is not too likely in practice. It seems to most
frequently with some small devices such as network printers
that *seem* to accept TCP segments only in-order. In cases
were key segments weren't lost, things get automatically
resolved because those wrongly marked segments don't need to be
retransmitted in order to continue.
I found a reproducer after digging up relevant reports (few
reports in total, none at netdev or lkml I know of), some
cases seemed to indicate middlebox issues which seems now
to be a false assumption some people had made. Bugzilla
#10063 _might_ be related. Damon L. Chesser <damon@damtek.com>
had a reproducable case and was kind enough to tcpdump it
for me. With the tcpdump log it was quite trivial to figure
out.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need to acquire the parent i_mutex with I_MUTEX_PARENT to keep
lockdep happy.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
|
|
spu_run
We should not requeue the victim context in find_victim if the owner is
not in spu_run. It's first not needed because leaving the context on
the spu is an optimization and second is harmful because it means the
owner could re-enter spu_run when the context is on the runqueue and
trip the BUG_ON in __spu_update_sched_info.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
|
|
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Make sure register modifications stay atomic. Fixes processors with
shared priority register masking. Dual bitmap masking is unaffected.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
The processor models sh7706, sh7707 and sh7709 don't support high
level trigger sense configuration. And the intc code looks like
crap these days so what's the difference.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Clean up the intc tables by removing unneeded #ifdefs. The vector
list is what selects which interrupt sources that should be added,
having unsupported bitfields listed is ok as long as the vector
is excluded from the list.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
This patch adds interrupt acknowledge code for external interrupt
sources on sh3 processors. Only really required for edge triggered
interrupts, but we ack regardless of sense configuration.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
This patch unifies the sh3 external irq pin code. It buys us some
savings with reduced code redundancy, but the main feature with
this change is irq sense selection support for all sh3 processors.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Only write to hardware in SCI_OUT() if the register size is valid.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
The sh7722 serial ports all lack SCSPTR registers, so mark them as
nonexistent in the register table.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Improve sh7722 support for SCIF1 and SCIF2 and separate code
from sh7366 implementation.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Reset the transmitter and receiver when setting up early printk.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Drain by waiting for all characters to be sent, and make sure to
wait a little bit after setting up the baud rate.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Use sci_out() instead of ctrl_outw() for early printk setup code.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Add physical memory resources such as System RAM, Kernel code/data/bss
and reserved crash dump area to /proc/iomem. Same strategy as on x86.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Do like everyone else and have a struct resource for kernel bss.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Fix sh7705 interrupt sources for vectors 0xc80 and 0xca0.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Select smc91x bus width using platform data for se7722 now when the
smc91x header file is in place.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Select smc91x bus width and irg flags using platform data for MigoR
now when the smc91x header file is in place.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Sam Ravnborg <sam@ravnborg.org>
CC: linux-sh@vger.kernel.org
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
ioremap() and friends get the size information right, so force everything
to go through there.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
SH_MPC1211 has been marked as BROKEN for some time.
Unless someone is working on reviving it now, I'd therefore suggest this
patch to remove it.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|