Age | Commit message (Collapse) | Author |
|
No measurable difference on cairoperf.
|
|
Everything has been constant-sized until now, but constant buffer
handling changes will make us want some additional variable sized
array.
|
|
|
|
This keeps the individual state files from having to export their
structures for brw_state_cache initialization.
|
|
This appears to shave about 3% off the CPU usage in cairo-gl for firefox.
|
|
Once we've freed a miptree, we won't see any more state cache requests
that would hit the things that pointed at it until we've let the miptree
get released back into the BO cache to be reused. By leaving those
surface state and binding table pointers that pointed at it around, we
would end up with up to (500 * texture size) in memory uselessly consumed
by the state cache.
Bug #20057
Bug #23530
|
|
Looking for memory leaks that were causing crashes in my environment
in a situation where valgrind would not work, I ended up improving
the i965 debug traces so I could better see where the memory was
being allocated and where it was going, in the regions and miptrees
code, and in the state caches. These traces were specific enough
that external scripts could determine what elements were not being
released, and where the memory leaks were.
I also ended up creating my own backtrace code in intel_regions.c,
to determine exactly where regions were being allocated and for what,
since valgrind wasn't working. Because it was useful, I left it in,
but disabled and compiled out. It can be activated by changing a flag
at the top of the file.
|
|
These types are only found in the new surface state cache now.
|
|
|
|
The new, second cache will only be used for surface-related items.
Since we can create many surfaces the original, single cache could get
filled quickly. When we cleared it, we had to regenerate shaders, etc.
With two caches, we can avoid doing that.
|
|
|
|
Makefile.template
|
|
This reverts commit 7c81124d7c4a4d1da9f48cbf7e82ab1a3a970a7a.
|
|
This reverts commit 53675e5c05c0598b7ea206d5c27dbcae786a2c03.
Conflicts:
src/mesa/drivers/dri/i965/brw_wm_surface_state.c
|
|
|
|
It was flagging a last_bo update even when last_bo didn't change, but
another part was failing to update last_bo when it should have.
|
|
|
|
This was around 3% improvement in OA.
|
|
Without this, the WM binding tables would all collide, for example. Improves
openarena performance by around 2%.
|
|
The user-space suballocator that was used avoided relocation computations by
using the general and surface state base registers and allocating those types
of buffers out of pools built on top of single buffer objects. It also
avoided calls into the buffer manager for these small state allocations, since
only one buffer object was being used.
However, the buffer allocation cost appears to be low, and with relocation
caching, computing relocations for buffers is essentially free. Additionally,
implementing the suballocator required a don't-fence-subdata flag to disable
waiting on buffer maps so that writing new data didn't block on rendering using
old data, and careful handling when mapping to update old data (which we need
to do for unavoidable relocations with FBOs). More importantly, when the
suballocator filled, it had no replacement algorithm and just threw out all
of the contents and forced them to be recomputed, which is a significant cost.
This is the first step, which just changes the buffer type, but doesn't yet
improve the hash table to not result in full recompute on overflow. Because
the buffers are all allocated out of the general buffer allocator, we can
no longer use the general/surface state bases to avoid relocations, and they
are set to 0 instead.
|
|
This is currently believed to work but be a significant performance loss.
Performance recovery should be soon to follow.
The dri_bo_fake_disable_backing_store() call was added to allow backing store
disable like bufmgr_fake.c did, which is a significant performance win (though
it's missing the no-fence-subdata part).
This commit is a squash merge of the 965-ttm branch, which had some history
I wanted to avoid pulling due to noisiness and brokenness at many points
for git-bisecting.
|
|
In the process, fix some alignment issues:
- Scratch space allocation was aligned into units of 1KB, while the allocation
wanted units of bytes, so we never allocated enough space for scratch.
- GRF register count was programmed as ALIGN(val - 1, 16) / 16 instead of
ALIGN(val, 16) / 16 - 1, which overcounted for val != 16n+1.
|
|
This code existed to dump logs of hardware access to be replayed in simulation.
Since we have real hardware now, it's not really needed.
|
|
This driver comes from Tungsten Graphics, with a few further modifications by
Intel.
|