A dual-issue 32b PA-RISC microprocessor features dual 64kB, two-way set-associative caches with 2.7ns read latency. In 0.5um 3.3V CMOS, it operates at up to 200MHz and exceeds 7.5SPECint95 and 7.5SPECfp95 at 160MHz.
A 250MHz 5W 32b superscalar RISC microprocessor contains two 32kB 8-way set-associative caches, 6 execution units and an integrated L2 cache controller and tag RAM. Dynamic power management and 3 operating modes reduce power dissipation. The 6.35M-transistor 67mm² die is fabricated in 2.5V 0.25um 5-layer metal CMOS.
A 32-entry two-port cache enhances superscalar microprocessor performance by dynamically predicting and reducing load/store hazards. Conflict prediction incorporates a dynamic-logic greater-than-or-equal-to compare circuit to achieve 5ns performance in 0.5um CMOS.
A 54x54b multiplier with 4.1ns latency at a 2.5V supply and a 1.27mm² active area is implemented in 0.25um CMOS with 5.5nm gate oxide and a Co salicide process. Using sign-select Booth encoders, the total transistor count is reduced by 24% from that of a conventional multiplier.
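As background for the Booth encoding mentioned above, the following is a minimal sketch of textbook radix-4 Booth recoding in software. It is not the sign-select encoder circuit of the abstract, whose details are not given here. The function names `booth_radix4_digits` and `booth_multiply` are hypothetical, and the sketch assumes a signed two's-complement multiplier with an even bit width.

```python
def booth_radix4_digits(multiplier: int, width: int) -> list[int]:
    """Recode a signed `width`-bit multiplier into radix-4 Booth digits.

    Returns digits in {-2, -1, 0, 1, 2}, least significant first.
    Assumes `width` is even and `multiplier` fits in `width` signed bits.
    """
    if width % 2 != 0:
        raise ValueError("width must be even for this sketch")
    bits = multiplier & ((1 << width) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit 0 to the right of bit 0
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        # Each overlapping bit triple (b1, b0, prev) selects one partial product.
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(multiplicand: int, multiplier: int, width: int) -> int:
    """Sum the Booth-selected partial products, each weighted by 4**i."""
    digits = booth_radix4_digits(multiplier, width)
    return sum((d * multiplicand) << (2 * i) for i, d in enumerate(digits))
```

With this recoding a 54-bit multiplier yields 27 partial products rather than 54, which is the usual motivation for Booth encoding in a multiplier array.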
An ALU datapath with register file uses 0.25um 5-layer metal CMOS. With a completion detector that determines when carry propagation is short, a latency of one 1ns cycle is achieved for 87% of operands, resulting in an average latency of 1.125ns.
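The average-latency figure can be checked with a short weighted-average calculation. This is a sketch under a stated assumption: the abstract does not give the latency of the remaining operands, so the hypothetical parameter `slow_cycles` is set to two cycles here.

```python
def average_latency_ns(fast_fraction: float,
                       cycle_ns: float = 1.0,
                       slow_cycles: int = 2) -> float:
    """Weighted average of a one-cycle fast path and a multi-cycle slow path.

    `slow_cycles = 2` is an assumption; it is not stated in the abstract.
    """
    fast = fast_fraction * cycle_ns
    slow = (1.0 - fast_fraction) * slow_cycles * cycle_ns
    return fast + slow


# Under the two-cycle assumption, 87% gives about 1.13ns,
# while a fraction of 87.5% (7/8) gives 1.125ns.
latency_at_87 = average_latency_ns(0.87)
latency_at_87_5 = average_latency_ns(0.875)
```

If the two-cycle assumption holds, the quoted 1.125ns corresponds to a fast-path fraction of 87.5%, so the 87% figure may simply be rounded.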
A power-reduction technique called HRDL reuses charge from the evaluation cycle in the precharge cycle. Power is reduced by 30% to 50% compared to DCVS. An adder implementation shows a 5-fold improvement in power-delay product. Power efficiency and delay are comparable to CRDL, but HRDL does not require a separate supply to raise the n-well voltage.
Self-timed design principles are applied to synchronously clocked domino circuits. Overlapping clock phases tolerate clock skew, eliminate latches from the critical path, and allow time borrowing to balance pipelines and compensate for timing uncertainties.