Age | Commit message (Collapse) | Author |
|
Everything has been constant-sized until now, but constant buffer
handling changes will make us want some additional variable sized
array.
|
|
Saves ~2KB of code.
|
|
Saves ~480 bytes of code.
|
|
Add a GLbitfield64 type and several macros to operate on 64-bit
fields. The OutputsWritten field of gl_program is changed to use that
type. This results in a fair amount of fallout in drivers that use
programs.
No changes are strictly necessary at this point as all bits used are
below the 32-bit boundary. Fairly soon several bits will be added for
clip distances written by a vertex shader. This will cause several
bits used for varyings to be pushed above the 32-bit boundary. This
will affect any drivers that support GLSL.
At this point, only the i965 driver has been modified to support this
eventuality.
I did this as a "squash" merge. There were several places through the
outputswritten64 branch where things were broken. I foresee this
causing difficulties later for bisecting. The history is still
available in the branch.
Conflicts:
src/mesa/drivers/dri/i965/brw_wm.h
|
|
|
|
1. new PCI ids
2. fix some 3D commands on new chipset
3. fix send instruction on new chipset
4. new VUE vertex header
5. ff_sync message (added by Zou Nan Hai <nanhai.zou@intel.com>)
6. the offset in JMPI is in unit of 64bits on new chipset
7. new cube map layout
|
|
Hook up a constant buffer, binding table, etc for the VS unit.
This will allow using large constant buffers with vertex shaders.
The new code is disabled at this time (use_const_buffer=FALSE).
|
|
|
|
|
|
Turns out that XXX comment was important. We weren't flagging the WM to
re-update with the statistics enable, so we got zeroes out of our query.
Bug #20740, fixes piglit occlusion_query test.
Signed-off-by: Eric Anholt <eric@anholt.net>
|
|
This function scans the shader to see if it has any GLSL features like
conditionals and loops. Calling this during state validation is expensive.
Just call it when the shader is given to the driver and save the result.
There's some new/temporary assertions to be sure we don't get out of sync
on this.
|
|
s/FRAG_RESULT_DEPR/FRAG_RESULT_DEPTH/
s/FRAG_RESULT_COLR/FRAG_RESULT/COLOR/
Remove FRAG_RESULT_COLH (NV half-precision) output since we never used it.
Next, we might merge the COLOR and DATA outputs (COLOR0, COLOR1, etc).
|
|
|
|
The CACHE_NEW_SURFACE bit always gets spammed since we get many different
surface BOs per state emit, but the only consumer of it wanted to just know
how many surfaces were enabled.
|
|
This was causing a prepare of wm state at every primitive emit.
|
|
|
|
|
|
This reverts commit 7c81124d7c4a4d1da9f48cbf7e82ab1a3a970a7a.
|
|
This reverts commit 53675e5c05c0598b7ea206d5c27dbcae786a2c03.
Conflicts:
src/mesa/drivers/dri/i965/brw_wm_surface_state.c
|
|
To do this, I had to clean up some of 965 state upload stuff. We may end
up over-emitting state in the aperture overflow case, but that should be rare,
and I'd rather have the simplification of state management.
|
|
This is an API breakage only.
|
|
|
|
The GEM flags are much more descriptive for what we need. Since this makes
bufmgr_fake rather device-specific, move it to the intel common directory.
We've wanted to do device-specific stuff to it before.
|
|
Makes state emission into a 2 phase, prepare sets things up and accounts
the size of all referenced buffer objects. The emit stage then actually
does the batchbuffer touching for emitting the objects.
There is an assert in dri_emit_reloc if a reloc occurs for a buffer
that hasn't been accounted yet.
|
|
|
|
We have two consumers of relocations. One is static state buffers, which
want the same relocation every time. The other is the batchbuffer, which gets
thrown out immediately after submit. This lets us reduce repeated computation
for static state buffers, and clean up the code by moving relocations nearer
to where the state buffer is computed.
|
|
|
|
|
|
The user-space suballocator that was used avoided relocation computations by
using the general and surface state base registers and allocating those types
of buffers out of pools built on top of single buffer objects. It also
avoided calls into the buffer manager for these small state allocations, since
only one buffer object was being used.
However, the buffer allocation cost appears to be low, and with relocation
caching, computing relocations for buffers is essentially free. Additionally,
implementing the suballocator required a don't-fence-subdata flag to disable
waiting on buffer maps so that writing new data didn't block on rendering using
old data, and careful handling when mapping to update old data (which we need
to do for unavoidable relocations with FBOs). More importantly, when the
suballocator filled, it had no replacement algorithm and just threw out all
of the contents and forced them to be recomputed, which is a significant cost.
This is the first step, which just changes the buffer type, but doesn't yet
improve the hash table to not result in full recompute on overflow. Because
the buffers are all allocated out of the general buffer allocator, we can
no longer use the general/surface state bases to avoid relocations, and they
are set to 0 instead.
|
|
Putting the bufmgr in the screen is not thread-safe since the emit_reloc
changes. It also led to a significant performance hit from pthread usage
for the attempted thread-safety (up to 12% of a cpu spent on refcounting
protection in single-threaded 965). The motivation had been to allow
multi-context bufmgr sharing in classic mode, but it wasn't worth the cost.
|
|
This is currently believed to work but be a significant performance loss.
Performance recovery should be soon to follow.
The dri_bo_fake_disable_backing_store() call was added to allow backing store
disable like bufmgr_fake.c did, which is a significant performance win (though
it's missing the no-fence-subdata part).
This commit is a squash merge of the 965-ttm branch, which had some history
I wanted to avoid pulling due to noisiness and brokenness at many points
for git-bisecting.
|
|
Conflicts:
src/mesa/drivers/dri/i965/brw_sf.h
src/mesa/drivers/dri/i965/intel_context.c
|
|
In the process, fix some alignment issues:
- Scratch space allocation was aligned into units of 1KB, while the allocation
wanted units of bytes, so we never allocated enough space for scratch.
- GRF register count was programmed as ALIGN(val - 1, 16) / 16 instead of
ALIGN(val, 16) / 16 - 1, which overcounted for val != 16n+1.
|
|
This reverts commit b2f1aa2389473ed09170713301b042661d70a48e.
Somehow I ended up with my branch's save-this-while-I-work-on-master commit
actually on master.
|
|
|
|
|
|
most of the sample working with some small modification
|
|
vertex/fragment programs provided as const.
bmSetFenceLock should return bmSetFence value.
|
|
Signed-off-by: Keith Packard <keithp@neko.keithp.com>
|
|
a multiple of alignment to avoid accumulating unusable free blocks.
|
|
we render. Currenly requires that some state be re-examined after
every LOCK_HARDWARE().
|
|
This driver comes from Tungsten Graphics, with a few further modifications by
Intel.
|