Pre-Grant Publication Number: 20080046686
Please help the USPTO examine the application by evaluating the relevance of the publicly submitted prior art to the patent application.
Peer-to-Patent forwards the Top 10 most relevant prior art submissions and their annotations to the United States Patent and Trademark Office.

Prior Art Detail
Summary / Description
| Summary / Description | We describe the design, implementation, and evaluation of a dual-issue SIMD-like extension of the PowerPC 440 floating-point unit (FPU) core. It has several novel features, such as a computational crossbar and cross-load/store instructions, which enhance the performance of numerical codes. |
Basic Information
| Type of Prior Art | Print Publication |
| Publication Title * | A High-Performance SIMD Floating Point Unit for BlueGene/L: Architecture, Compilation, and Algorithm |
| Author | Leonardo Bachega et al. |
| ISBN | Proceedings of the 13th I |
| Page Range | |
| Medium | Other printed publication |
| Publication Date * | January 1, 2004 |
| URL | |
Notes / To Do
| Notes | |
Excerpt
| Excerpt | Along with the two register files, there are also primary and secondary pairs of datapaths, each consisting of a computational datapath and a load/store datapath. The primary (resp., secondary) datapath pair write their results only to the primary (resp., secondary) register file. Likewise, for each computational datapath, the B operand of the FMA is fed from the corresponding register file. However, the real power comes from the operand crossbar that allows the primary computational datapath to get its A and C operands from either register file. This crossbar mechanism enabled us to create useful operations that accelerate matrix and complex-arithmetic operations. The power of the computational crossbar is enhanced by cross-load and cross-store instructions, which add flexibility by allowing the primary and secondary operands to be swapped as they are moved between the register files and memory. |
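To make the crossbar idea in the excerpt concrete, here is a minimal Python sketch of complex multiplication done with two crossbar-style operations. It assumes each SIMD register is a (primary, secondary) pair of doubles holding the real and imaginary parts; the names `Pair`, `cross_load`, `cross_mul_primary`, and `cross_madd_secondary_swapped` are hypothetical and are not BlueGene/L instruction mnemonics. The excerpt only says the primary datapath can read A and C from either register file, so letting both lanes cross here is a simplifying assumption.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    """One SIMD register modeled as a (primary, secondary) pair of doubles."""
    p: float  # element held in the primary register file
    s: float  # element held in the secondary register file


def cross_load(mem_first: float, mem_second: float) -> Pair:
    """Hypothetical cross-load: the two memory words land swapped."""
    return Pair(p=mem_second, s=mem_first)


def cross_mul_primary(a: Pair, c: Pair) -> Pair:
    """Both lanes read A from the primary half: (a.p*c.p, a.p*c.s)."""
    return Pair(a.p * c.p, a.p * c.s)


def cross_madd_secondary_swapped(a: Pair, c: Pair, b: Pair) -> Pair:
    """Both lanes read A from the secondary half and C crossed, then add B
    lane-wise: (b.p - a.s*c.s, b.s + a.s*c.p)."""
    return Pair(b.p - a.s * c.s, b.s + a.s * c.p)


def complex_mul(x: Pair, y: Pair) -> Pair:
    """(x.p + i*x.s) * (y.p + i*y.s) using two crossbar operations."""
    t = cross_mul_primary(x, y)                   # (ac, ad)
    return cross_madd_secondary_swapped(x, y, t)  # (ac - bd, ad + bc)


if __name__ == "__main__":
    print(complex_mul(Pair(1.0, 2.0), Pair(3.0, 4.0)))  # Pair(p=-5.0, s=10.0)
    print(cross_load(1.0, 2.0))                         # Pair(p=2.0, s=1.0)
```

Neither step needs a separate shuffle instruction: the operation itself determines which half of each register feeds each lane.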
Relevance
Claims
1
A processor that processes Single Instruction Multiple Data (SIMD) instructions, which processor comprises:
an Instruction Fetch Unit that loads a SIMD instruction and applies it as input to a SIMD Instruction Decode Unit; and wherein
the SIMD Instruction Decode Unit decodes the applied SIMD instruction and produces output signals including: (a) SIMD field width identification signals; (b) SIMD source operand identification signals; and (c) SIMD half-operand modifier signals.
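Claim 1 names three groups of decode outputs but no instruction format. The sketch below assumes a hypothetical 16-bit encoding (the bit layout, `FIELD_WIDTHS`, `DecodedSignals`, and `decode` are illustrative inventions, not taken from the application) simply to show how one instruction word could yield field width, source operand, and (high, low) half-operand modifier signals.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical 16-bit encoding used only for illustration:
#   bits 0-1   field-width code (0 -> 8, 1 -> 16, 2 -> 32, 3 -> 64 bits)
#   bits 2-6   first source register number
#   bits 7-11  second source register number
#   bits 12-13 (high, low) half-operand modifier bits for source 1
#   bits 14-15 (high, low) half-operand modifier bits for source 2
FIELD_WIDTHS = (8, 16, 32, 64)


@dataclass(frozen=True)
class DecodedSignals:
    field_width: int                          # (a) field width identification
    sources: Tuple[int, int]                  # (b) source operand identification
    modifiers: Tuple[Tuple[bool, bool], ...]  # (c) (high, low) per source operand


def decode(word: int) -> DecodedSignals:
    """Split an instruction word into the three signal groups of claim 1."""
    width = FIELD_WIDTHS[word & 0b11]
    rs1 = (word >> 2) & 0x1F
    rs2 = (word >> 7) & 0x1F
    modifiers = tuple(
        (bool((word >> (12 + 2 * i)) & 1), bool((word >> (13 + 2 * i)) & 1))
        for i in range(2)
    )
    return DecodedSignals(width, (rs1, rs2), modifiers)


if __name__ == "__main__":
    word = 1 | (3 << 2) | (4 << 7) | (1 << 12)  # 16-bit fields, r3, r4, high half of r3
    print(decode(word))
```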
Relevance
Bachega et al.’s work is another example of a permute/crossbar inserted in the path between the register file and the processing unit, as seen in Figure 1 of their paper. In this particular case, the granularity of selection is 64-bit quantities. As in this disclosure, it is the operation being fetched and decoded that drives the permute/crossbar to select which of the high/low values is used (here, high and low are the real and imaginary parts of a complex number).
Prior work on permute operations handles finer granularities (down to 8-bit quantities on VMX or the Cell BE/SPE), and such permute logic could be further extended to 1-bit quantities.
Furthermore, Bachega et al.’s work does not insert the “zero” values required by the “shift” or “AND with zero bits” modifications. Adding this is trivial in light of permute logic such as that found in the Cell BE/SPU shuffle instruction (for references, see [Power Efficient Processor Architecture and The Cell Processor, Hofstee, HPCA-11 2005]).
As a result, Bachega et al.’s work fully covers this claim.
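The relevance argument holds that zero insertion is trivial once a finer-grained permute exists. A minimal sketch of that idea, assuming a single source vector and a hypothetical `ZERO` control value (the real Cell SPU shuffle uses its own encoding for constant bytes, which is not reproduced here): one byte permute per mode expresses both the “AND with zero bits” and the “shift” forms of a half-operand modification on 16-bit fields. The names `permute`, `half_select_control`, and `to_fields` are illustrative.

```python
from typing import List

ZERO = 0x80  # hypothetical control value meaning "insert a zero byte here"


def permute(src: bytes, control: List[int]) -> bytes:
    """Byte permute: each control entry names a source byte, or ZERO for 0x00."""
    return bytes(0 if c == ZERO else src[c] for c in control)


def half_select_control(num_fields: int, mode: str) -> List[int]:
    """Control vector for 16-bit little-endian fields (byte 2k = low, 2k+1 = high).

    "and_high":   keep the high byte in place, zero the low byte
    "shift_high": move the high byte into the low byte, zero the high byte
    "and_low":    keep the low byte in place, zero the high byte
    """
    control: List[int] = []
    for k in range(num_fields):
        lo, hi = 2 * k, 2 * k + 1
        if mode == "and_high":
            control += [ZERO, hi]
        elif mode == "shift_high":
            control += [hi, ZERO]
        elif mode == "and_low":
            control += [lo, ZERO]
        else:
            raise ValueError(f"unknown mode: {mode}")
    return control


def to_fields(data: bytes) -> List[int]:
    """Reassemble 16-bit little-endian fields for inspection."""
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    src = b"".join(v.to_bytes(2, "little") for v in (0x1234, 0xABCD))
    for mode in ("and_high", "shift_high", "and_low"):
        out = permute(src, half_select_control(2, mode))
        print(mode, [hex(f) for f in to_fields(out)])
```

The selection granularity is fixed by the control vector, so moving from 16-bit to 8-bit or wider fields only changes how the control entries are generated, not the permute itself.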
Claim Chart
All
2
The processor of Claim 1 that further comprises a SIMD Operand Fetch Unit that loads SIMD operand values wherein:
the SIMD field width identification signals, SIMD source operand identification signals and SIMD half-operand modifier signals are supplied as input to the SIMD Operand Fetch Unit;
in response to the SIMD source operand identification signals, the SIMD Operand Fetch Unit loads operand values;
in response to the SIMD field-width identification signals and the SIMD half-operand modifier signals, the SIMD Operand Fetch Unit applies half-operand modifications indicated by the SIMD half-operand modifier signals to the operand values to form output operand values; and
the SIMD Operand Fetch Unit outputs the output operand values.
Relevance
see claim 1
Claim Chart
All
3
The processor of Claim 2 wherein the SIMD half-operand modifier signals comprise two signals for each operand, the first signal having a first value whenever a high half-operand modification is to be applied to the operand value and another value otherwise, and the second signal having a first value whenever a low half-operand modification is to be applied to the operand value and another value otherwise.
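Claims 2 and 3 together describe an operand fetch stage that loads each source register and, driven by a (high, low) signal pair per operand, applies a half-operand modification. A minimal sketch, assuming 64-bit registers, a modification implemented as an AND with a per-field mask that leaves the selected half in place, and that asserting both signals at once is illegal; `half_mask` and `fetch_operands` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def half_mask(reg_bits: int, n: int, high: bool) -> int:
    """Replicate an n/2-bit mask (high or low half) across every n-bit field."""
    half = n // 2
    one_field = ((1 << half) - 1) << (half if high else 0)
    mask = 0
    for i in range(reg_bits // n):
        mask |= one_field << (i * n)
    return mask


def fetch_operands(
    regfile: Sequence[int],
    sources: Sequence[int],
    field_width: int,
    modifiers: Sequence[Tuple[bool, bool]],
    reg_bits: int = 64,
) -> List[int]:
    """Load each source register and apply its (high, low) half-operand modifier."""
    outputs = []
    for reg, (high, low) in zip(sources, modifiers):
        value = regfile[reg]  # load operand value selected by the source signals
        if high and low:
            # Assumption: the claims do not define this combination.
            raise ValueError("high and low modifiers both asserted")
        if high or low:
            value &= half_mask(reg_bits, field_width, high)
        outputs.append(value)  # output operand value
    return outputs


if __name__ == "__main__":
    regs = [0x1122334455667788, 0x0123456789ABCDEF]
    out = fetch_operands(regs, [0, 1], 16, [(True, False), (False, True)])
    print([hex(v) for v in out])
```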
Relevance
see claim 1
Claim Chart
All
4
A method for performing Single Instruction Multiple Data (SIMD) instructions on a processor wherein one or more data values of total size N bits are partitioned into j fields of size n bits, where N=j*n for some integer j≧1, and a single instruction is applied to each field of the j fields; wherein the method comprises:
modifying each of the j fields of the one or more data values according to a half-operand modification prior to applying the instruction; and
applying the instruction to each field of the j fields.
Relevance
see claim 1
Claim Chart
All
5
The method of Claim 4 wherein:
the half-operand modification may be either a high half-operand modification or a low half-operand modification;
the high half-operand modification selects only the high n/2 bits of each n-bit field; and
the low half-operand modification selects only the low n/2 bits of each field.
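Claims 4 and 5 amount to a short procedure: split N bits into j fields of n bits, reduce each field to its high or low n/2 bits, then apply one operation to every field. The sketch below assumes “selects” means returning the chosen half as a zero-extended n/2-bit value (the claims do not say whether the high half is shifted down or masked in place) and uses a lane-wise multiply purely as an example instruction; `split_fields`, `join_fields`, `half_operand`, and `simd_apply` are hypothetical names.

```python
import operator
from typing import Callable, List, Optional


def split_fields(value: int, N: int, n: int) -> List[int]:
    """Partition an N-bit value into j = N // n fields of n bits (field 0 = lowest bits)."""
    if n <= 0 or N % n:
        raise ValueError("N must equal j * n for an integer j >= 1")
    mask = (1 << n) - 1
    return [(value >> (i * n)) & mask for i in range(N // n)]


def join_fields(fields: List[int], n: int) -> int:
    """Pack n-bit fields back into one value, truncating each result to n bits."""
    mask = (1 << n) - 1
    out = 0
    for i, f in enumerate(fields):
        out |= (f & mask) << (i * n)
    return out


def half_operand(field: int, n: int, mode: Optional[str]) -> int:
    """Select the high or low n/2 bits of a field as a zero-extended value."""
    half = n // 2
    if mode == "high":
        return field >> half
    if mode == "low":
        return field & ((1 << half) - 1)
    return field  # no modification requested


def simd_apply(op: Callable[[int, int], int], a: int, b: int, N: int, n: int,
               mode_a: Optional[str] = None, mode_b: Optional[str] = None) -> int:
    """Modify every field of both operands, then apply op to each field pair."""
    fa = [half_operand(f, n, mode_a) for f in split_fields(a, N, n)]
    fb = [half_operand(f, n, mode_b) for f in split_fields(b, N, n)]
    return join_fields([op(x, y) for x, y in zip(fa, fb)], n)


if __name__ == "__main__":
    # Two 16-bit fields: multiply the high byte of each field of a
    # by the low byte of the corresponding field of b.
    r = simd_apply(operator.mul, 0x1234ABCD, 0x00020003, N=32, n=16,
                   mode_a="high", mode_b="low")
    print(hex(r))
```

Each per-field result is truncated to n bits by `join_fields`, so with no modification the operation wraps modulo 2^n exactly as a fixed-width SIMD lane would.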
Relevance
see claim 1
Claim Chart
All






