Age | Commit message | Author |
|
For each FP input, don't assume that the VP output will be
at the same position, but scan the semantics instead, then
put the correct output reg indices into VP_RESULT_MAP.
Position is still assumed to be the first output/input.
See 07fafc7c9346aa260829603bf3188596481e9e62, which renders
previous assumptions incorrect.
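
The semantic scan described above could look roughly like the following. This is a minimal sketch, not the driver's code: `struct sem`, its fields and `find_vp_output` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor of one VP output; not the driver's actual type. */
struct sem {
    unsigned name;   /* semantic name, e.g. a TGSI_SEMANTIC_* style value */
    unsigned index;  /* semantic index */
    unsigned hw_reg; /* first hardware output register of this output */
};

/* Scan the VP outputs for the semantic an FP input asks for.
 * Returns the matching output's hardware register, or -1 if the VP
 * does not write that semantic at all. */
int find_vp_output(const struct sem *vp_out, size_t n_out,
                   unsigned name, unsigned index)
{
    assert(vp_out != NULL || n_out == 0);

    for (size_t i = 0; i < n_out; ++i)
        if (vp_out[i].name == name && vp_out[i].index == index)
            return (int)vp_out[i].hw_reg;
    return -1;
}
```

The value returned for each FP input is what would then be written into the result map, instead of assuming output i feeds input i.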
|
|
|
|
Simplifies things since the second to last one will then
be converted in the subsequent pass that ensures alignment
automatically.
|
|
|
|
|
|
|
|
Don't assume that a SET that writes to IF's argument
directly precedes the IF.
|
|
Will use AND for gl_FrontFacing, the face input
is either 0 or 0xffffffff.
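
As an illustration of why an all-zeros / all-ones input suits AND, here is a sketch that masks the bit pattern of 1.0f with the face value. The function name and the choice of 1.0f / 0.0f as results are assumptions for the example, not necessarily what the driver emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* face is assumed to be 0 or 0xffffffff, as stated above.
 * ANDing it with the bits of 1.0f yields 1.0f or 0.0f without a branch. */
float face_to_float(uint32_t face)
{
    const float one = 1.0f;
    uint32_t bits;
    float result;

    assert(face == 0u || face == 0xffffffffu);
    memcpy(&bits, &one, sizeof(bits));     /* reinterpret 1.0f as integer bits */
    bits &= face;                          /* keep all bits or none */
    memcpy(&result, &bits, sizeof(result));
    return result;
}
```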
|
|
Adds a more generic SIFC transfer function.
|
|
We have to indicate to the hw whether the FP exports
multiple colour results.
Method 0x121c is used to specify the number of RTs.
Also deactivate zeta explicitly if there's no zsbuf.
|
|
|
|
|
|
We should really learn to not waste so many though.
|
|
Contained some rather obvious thinking errors before,
and didn't consider offsets from TGSI ADDRESS regs.
|
|
These haven't been used by the mesa state tracker since the
conversion to tgsi_ureg, and it seems that none of the
other state trackers are using them either.
This helps remove one of the biggest surprises when starting off with
TGSI shaders.
|
|
Allow indirect uniform access and increase the
limit on parameters from 128 to 512.
|
|
|
|
|
|
We only have a per-nv50_reg negation flag; if an
nv50_reg is used more than once in a TGSI op with
different sign modes, we'd generate wrong code.
We probably can't do much better without more
invasive changes.
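
The problematic case can be detected with a simple pairwise check over the sources of one op. This is a sketch under assumed types: `struct src_use` and `has_sign_conflict` are hypothetical and only model "same register, different sign mode".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one source operand of an op. */
struct src_use {
    int  reg;     /* identifies the underlying register */
    bool negate;  /* sign mode requested for this use */
};

/* True if some register is used twice with different sign modes, which
 * a single per-register negation flag cannot represent. */
bool has_sign_conflict(const struct src_use *src, size_t n)
{
    assert(src != NULL || n == 0);

    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j)
            if (src[i].reg == src[j].reg && src[i].negate != src[j].negate)
                return true;
    return false;
}
```

When the check fires, one of the uses would have to go through a separate register or an explicit negation; which of the two is chosen is not specified here.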
|
|
|
|
If you e.g. only need alpha, it ends up in the first reg,
not the last, as it would when reading rgb too.
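
A sketch of the resulting register position, assuming requested components are packed in component order with no gaps (the source only states the alpha-only and rgba cases; the general packing rule and the function name are assumptions):

```c
#include <assert.h>

/* mask: bit c set if component c (0 = r .. 3 = a) is requested.
 * Returns the offset from the first result register at which component
 * comp lands, or -1 if it was not requested. */
int packed_result_slot(unsigned mask, unsigned comp)
{
    int slot = 0;

    if (comp > 3 || !(mask & (1u << comp)))
        return -1;
    for (unsigned c = 0; c < comp; ++c)
        if (mask & (1u << c))
            ++slot;  /* each requested lower component occupies one reg */
    assert(slot <= 3);
    return slot;
}
```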
|
|
|
|
Separated the integer rounding mode flag for cvt.
|
|
There's a good chance a loop won't execute correctly
though since our TEMP allocation assumes programs to
be executed linearly. Will fix later.
|
|
|
|
When swapping sources 0 and 1, EQ of course does *not*
become NE, etc.
Introduced in 2b963f5c723401aa2646bd48eefe065cd335e280.
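
For reference, the correct mapping when the two operands of a comparison are exchanged, as a small sketch with a hypothetical `enum cmp` (not the driver's condition-code encoding):

```c
#include <assert.h>

enum cmp { CMP_LT, CMP_LE, CMP_GT, CMP_GE, CMP_EQ, CMP_NE };

/* Condition to use after swapping sources 0 and 1:
 * a < b is b > a, a <= b is b >= a, but a == b stays b == a
 * and a != b stays b != a. */
enum cmp cmp_swap_operands(enum cmp c)
{
    switch (c) {
    case CMP_LT: return CMP_GT;
    case CMP_LE: return CMP_GE;
    case CMP_GT: return CMP_LT;
    case CMP_GE: return CMP_LE;
    case CMP_EQ: return CMP_EQ;
    case CMP_NE: return CMP_NE;
    }
    assert(!"invalid comparison");
    return c;
}
```

Swapping operands mirrors the ordering relations only; logical negation of the condition (EQ to NE, LT to GE) is a different transformation.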
|
|
Allocation is unnecessary since all uniforms are
uploaded on every constant buffer change anyway.
|
|
|
|
|
|
|
|
This moves construction of the mapping between VP outputs
and FP inputs into validation.
The map also contains slots for special outputs like clip
distance and point size, so we need to at least merge the
VP related and FP related parts on validation if we want
to support those.
Now we match every single FP input component with results
from the VP and leave those not read out of the map, or
replace those not written by 0 (xyz) or 1 (w).
The bitmap indicating linear interpolants is also filled,
and flat FP inputs are mapped in only after non-flat ones,
as is required.
Furthermore, we can save some space by only fetching VP
attrs we actually use, and avoid wasting any output regs
because of TGSI using less than 4 components.
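
Two of the rules above, the 0/1 defaults for unwritten components and the non-flat-before-flat ordering, can be sketched as follows. The types and function names are hypothetical and the real map layout is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Default for an FP input component the VP never writes:
 * 0.0 for x, y, z (components 0..2) and 1.0 for w (component 3). */
float default_component(unsigned comp)
{
    assert(comp < 4);
    return comp == 3 ? 1.0f : 0.0f;
}

/* Hypothetical FP input record. */
struct fp_input {
    unsigned id;
    bool     flat;  /* flat-shaded, i.e. not interpolated */
};

/* Write input ids to out_ids with all non-flat inputs first and flat
 * inputs after them, preserving relative order within each group.
 * Returns the number of ids written. */
size_t order_inputs(const struct fp_input *in, size_t n, unsigned *out_ids)
{
    size_t k = 0;

    for (int pass = 0; pass < 2; ++pass)       /* pass 0: non-flat, pass 1: flat */
        for (size_t i = 0; i < n; ++i)
            if (in[i].flat == (pass == 1))
                out_ids[k++] = in[i].id;
    return k;
}
```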
|
|
Make use of tgsi_shader_info to determine how many nv50_regs we
need to allocate, and whether the program uses KIL or writes DEPR.
|
|
|
|
|
|
|
|
|
|
Makes some opcode cases nicer and might reduce the total
nr of TEMPs required, or save some MOVs.
|
|
|
|
We're going to try to reorder the scalar ops of a vector instr
to accommodate swizzles that would otherwise require us to emit
to an additional TEMP first (like MOV R0.xy, R0.zx).
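
One way to find such an order is to emit a component only once no other pending scalar op still reads it. The sketch below assumes source and destination are the same register and uses a hypothetical `order_scalar_ops` helper; it is not the driver's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* mask:   bit c set if destination component c is written.
 * swz[c]: source component read by the scalar op writing component c.
 * Fills order[] with a safe emission order and returns the number of
 * ops, or -1 if the reads and writes form a cycle (e.g. a swap), in
 * which case an additional TEMP is still needed. */
int order_scalar_ops(unsigned mask, const unsigned char swz[4], int order[4])
{
    unsigned pending = mask & 0xfu;
    int n = 0;

    while (pending) {
        int pick = -1;

        for (int c = 0; c < 4 && pick < 0; ++c) {
            bool still_read = false;

            if (!(pending & (1u << c)))
                continue;
            /* Writing c now is unsafe if another pending op reads c. */
            for (int e = 0; e < 4; ++e)
                if (e != c && (pending & (1u << e)) && swz[e] == c)
                    still_read = true;
            if (!still_read)
                pick = c;
        }
        if (pick < 0)
            return -1;
        assert(n < 4);
        order[n++] = pick;
        pending &= ~(1u << pick);
    }
    return n;
}
```

For MOV R0.xy, R0.zx this emits the y op (reading x) before the x op (reading z), so x is still intact when it is read.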
|
|
Extend its usage to avoiding e.g. emission of negation
instructions in tx_insn for sources we don't need.
|
|
Before this, just the perspective divide bit was moved in
convert_to_long of the load interpolant instruction.
|
|
|
|
|
|
The TEX instruction is passed the first index of a contiguous
range of 4 TEMP registers that contain coordinates / LOD and,
after execution, the texel values.
It seems the first index is required to be a multiple of 4 on
some (older?) cards.
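
A sketch of allocating such a range from a bitmask of used TEMPs, only ever considering base indices that are multiples of 4. The 32-register limit and the `alloc_tex_quad` name are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* used: bit i set if TEMP i is occupied (32 TEMPs assumed here).
 * Reserves 4 contiguous free TEMPs whose first index is a multiple
 * of 4 and returns that index, or -1 if no aligned group is free. */
int alloc_tex_quad(uint32_t *used)
{
    assert(used != 0);

    for (int base = 0; base + 4 <= 32; base += 4) {
        uint32_t quad = (uint32_t)0xf << base;

        if (!(*used & quad)) {
            *used |= quad;
            return base;
        }
    }
    return -1;
}
```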
|
|
|
|
Remove the need to have a pointer in this struct by just including
the immediate data inline. Having a pointer in the struct introduces
complications like needing to alloc/free the data pointed to, uncertainty
about who owns the data, etc. There doesn't seem to be a need for it,
and it is unlikely to make much difference plus or minus to performance.
Added some asserts as we now will trip up on immediates with more
than four elements. There were actually already quite a few such asserts,
but the >4 case could be used in the future to specify indexable immediate
ranges, such as lookup tables.
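
The ownership argument can be illustrated with a small sketch. `struct imm_token` and `make_immediate` are hypothetical and do not reproduce the actual TGSI struct layout; the point is only that inline storage makes copies self-contained.

```c
#include <assert.h>
#include <string.h>

#define IMM_MAX 4  /* at most four elements, as asserted above */

/* Hypothetical immediate token with its data stored inline. */
struct imm_token {
    unsigned nr;              /* number of valid elements */
    float    value[IMM_MAX];  /* no pointer, so nothing to alloc or free */
};

struct imm_token make_immediate(const float *data, unsigned nr)
{
    struct imm_token t;

    assert(nr <= IMM_MAX);
    memset(&t, 0, sizeof(t));
    t.nr = nr;
    if (nr > 0)
        memcpy(t.value, data, nr * sizeof(float));
    return t;
}
```

Copying the struct by value copies the data with it, so there is no question of which copy owns a separately allocated buffer.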
|
|
|
|
|
|
libdrm_nouveau is linked with the winsys, there's no good reason to do all
this through yet another layer.
|
|
|