Age | Commit message (Collapse) | Author |
|
If we can't fit all the VS outputs into the MRF, we need to overflow into
temporary GRF registers, then use some MOVs and a second brw_urb_WRITE()
instruction to place the overflow vertex results into the URB.
This is hit when a vertex/fragment shader pair has a large number of varying
variables (12 or more).
There's still something broken here, but it seems close...
|
|
|
|
|
|
|
|
|
|
|
|
|
|
the driver used to overwrite grf0 then use implicit move by send instruction
to move contents of grf0 to mrf1. However, we must not overwrite grf0 since
it's still used later for fb write.
Instead, do the move directly do mrf1 (we could use implicit move from another
grf reg to mrf1 but since we need a mov to encode the data anyway it doesn't
seem to make sense).
I think the dp_READ/WRITE_16 functions may suffer from the same issue.
While here also remove unnecessary msg_reg_nr parameter from the dataport
functions since always message register 1 is used.
|
|
It's the last addressable byte, not the byte after the end of the buffer.
|
|
I noticed this when this MI_FLUSH showed up in IPEHR for the ut2004 hang.
Not setting the reserved bit didn't help, though.
|
|
Fixes shadowtex.c. And an assert is added to catch this sooner next time.
|
|
|
|
We could have mapped the wrong set of draw buffers. Noticed while looking
into a DRI2 glean ReadPixels issue.
|
|
|
|
|
|
|
|
|
|
We were packing according to the pitch, while the hardware appears to base
it on the base level width.
With this and the previous commit, fbo-cubemap now matches untiled behavior.
|
|
3D rendering to tiled textures was being done with non-tile-aligned offsets.
The G4X hardware has fields to let us support it easily and correctly, while
the pre-G4X hardware requires a path full of suffering, so we just fall back.
|
|
Conflicts:
src/mesa/main/api_validate.c
|
|
For the TXP instruction we check if the texcoord is really a 4-component
atttibute which requires the divide by W step. This check involved the
projtex_mask field. However, the projtex_mask field was being miscalculated
because of some confusion between vertex program outputs and fragment
program inputs.
1. Rework the size_masks calculation so we correctly set bits corresponding
to fragment program input attributes.
2. Rename projtex_mask to proj_attrib_mask since we're interested in more
than just texcoords (generic varying vars too).
3. Simply the indexing of the size_masks and proj_attrib_mask fields.
4. The tracker::active[] array was mis-dimensioned. Use MAX_PROGRAM_TEMPS
instead of a magic number.
5. Update comments, add new assertions.
With these changes the Lightsmark demo/benchmark renders correctly, until
we eventually hit a GPU lockup...
|
|
glsl compiler will not generate OPCODE_SWZ, and as a first step it would
be translated away to a MOV anyway (why?), but later internally this opcode is
generated (for EXT_texture_swizzling).
|
|
...rather than with linear interpolation. Modern hardware should use
perspective-corrected interpolation for colors (as for texcoords).
glHint(GL_PERSPECTIVE_CORRECTION_HINT, mode) can be used to get
linear interpolation if mode = GL_FASTEST.
|
|
This is about a 30% performance win in OA with high settings on my GM45,
and experiments with 915GM indicate that it'll be around a 20% win there.
Currently, 915-class hardware is seriously hurt by the fact that we use
fence regs to control the tiling even for 3D instructions that could live
without them, so we spend a bunch of time waiting on previous rendering in
order to pull fences off. Thus, the texture_tiling driconf option defaults
off there for now.
|
|
This gets two more glean glsl1 tests using the non-GLSL path.
|
|
The broken indentation was driving me crazy, so fix other stuff while
I'm here.
|
|
Noticed while debugging a weird 1D FBO testcase that left its existing
viewport and projection matrix in place when switching drawbuffers. Didn't
fix the testcase, though.
|
|
|
|
I don't have a testcase for this, but it seems clearly wrong.
|
|
Before, if the VP output something that is in the attributes coming into
the WM but which isn't used by the WM, then WM would end up reading subsequent
varyings from the wrong places. This was visible with a GLSL demo
using gl_PointSize in the VS and a varying in the WM, as point size is in
the VUE but not used by the WM. There is now a regression test in piglit,
glsl-unused-varying.
|
|
With 1D textures, GL_TEXTURE_WRAP_T should be ignored (only
GL_TEXTURE_WRAP_S should be respected). But the i965 hardware
seems to follow the value of GL_TEXTURE_WRAP_T even when sampling
1D textures.
This fix forces GL_TEXTURE_WRAP_T to be GL_REPEAT whenever 1D
textures are used; this allows the texture to be sampled
correctly, avoiding "imaginary" border elements in the T direction.
This bug was demonstrated in the Piglit tex1d-2dborder test.
With this fix, that test passes.
|
|
Not 100% sure this is right, but the invalid assertion is fixed...
|
|
|
|
|
|
Fixes failed assertion in progs/glsl/twoside.c (but still wrong rendering).
|
|
Looking for memory leaks that were causing crashes in my environment
in a situation where valgrind would not work, I ended up improving
the i965 debug traces so I could better see where the memory was
being allocated and where it was going, in the regions and miptrees
code, and in the state caches. These traces were specific enough
that external scripts could determine what elements were not being
released, and where the memory leaks were.
I also ended up creating my own backtrace code in intel_regions.c,
to determine exactly where regions were being allocated and for what,
since valgrind wasn't working. Because it was useful, I left it in,
but disabled and compiled out. It can be activated by changing a flag
at the top of the file.
|
|
When out of memory (in at least one case, triggered by a longrunning
memory leak), this code will segfault and crash. By checking for the
out-of-memory condition, the system can continue, and will report
the out-of-memory error later, a much preferable outcome.
|
|
In addition to being HW accelerated, it avoids the incorrect
(black) rendering of the mipmaps that SW was doing in fbo-generatemipmap.
Improves the performance of the mipmap generation and drawing in
fbo-generatemipmap by 30%.
|
|
|
|
They seem to be used for something else and using them for shader temps
seems to lead to GPU lock-ups.
Call _mesa_warning() when we run out of temps.
Also, clean up some debug code.
|
|
|
|
Thanks to branching, the state of c->current_const[i].index at the point
of emitting constant loads for this instruction may not match the actual
constant currently loaded in the reg at runtime. Fixes a regression in my
GLSL program for idr's class since b58b3a786aa38dcc9d72144c2cc691151e46e3d5.
|
|
This snuck in with the multi-draw-buffers commit, and is a major penalty
to performance. It doesn't appear to be required, as the only dependency
the surface BO has is on the state key (and if there's some other dependency,
it should just be in the key).
This brings openarena performance up to almost 2% faster than Mesa 7.4.
|
|
This was a leftover from the brw_wm_constant_buffer change.
|
|
This can avoid re-uploading constant data when it isn't necessary, and is
a step towards not updating other surfaces just because constants change.
It also brings the upload of the constant buffer next to the creation.
This brings openarena performance up another 4%, to 91% of the Mesa 7.4 branch.
|
|
Also, only create VS surface state if there's a VS constant buffer to be
uploaded, and set the contents of the buffer at the same time as creation.
|
|
Really, the creation and upload of constants should be in the same place,
since they should only happen together, and a state flag should be
triggered by them so that we don't thrash state around so much for just
updating constants. But this still recovers openarena performance by
another 19%, leaving us 16% behind Mesa 7.4 branch.
|
|
Conflicts:
src/mesa/drivers/dri/i965/brw_curbe.c
src/mesa/drivers/dri/i965/brw_vs_emit.c
src/mesa/drivers/dri/i965/brw_wm_glsl.c
|
|
|
|
Make the use_const_buffer field per-program and only call the code which
updates the constant buffer's data if the flag is set.
This should undo the perf regression from 20f3497e4b6756e330f7b3f54e8acaa1d6c92052
(cherry picked from master, commit dc9705d12d162ba6d087eb762e315de9f97bc456)
|