SESSION SP25

SALON 9

PROCESSORS AND LOGIC

Chair: D. Wingard, San Carlos, CA
Associate Chair: T. Williams, Los Gatos, CA

25.1 - A 200MHz RISC Microprocessor with 128kB On-Chip Caches - 1:30 PM

W. Kever, S. Ziai, D. Weiss, M. Hill Hewlett Packard Co., Ft. Collins, CO

A dual-issue 32b PA-RISC microprocessor features dual 64kB, two-way set-associative caches with 2.7ns read latency. In 0.5um 3.3V CMOS, it operates at up to 200MHz and exceeds 7.5SPECint95 and 7.5SPECfp95 at 160MHz.

25.2 - A 250MHz 5W RISC Microprocessor with On-Chip L2 Cache Controller - 2:00 PM

P. Reed, M. Alexander, J. Alvarez, M. Brauer, C. Chao, C. Croxton, L. Eisen¹, T. Le, T. Ngo¹, C. Nicoletta, H. Sanchez, S. Taylor¹, N. Vanderschaaf¹, G. Gerosa Motorola, Inc., Austin, TX ¹IBM Corp., Austin, TX

A 250MHz 5W 32b superscalar RISC microprocessor contains two 32kB 8-way set-associative caches, 6 execution units, and an integrated L2 cache controller and tag RAM. Dynamic power management and 3 operating modes reduce power dissipation. The 6.35M-transistor 67mm² die is fabricated in 2.5V 0.25um 5-layer metal CMOS.

25.3 - A 5ns Store Barrier Cache with Dynamic Prediction of Load/Store Conflicts in Superscalar Processors - 2:30 PM

D. Adams, A. Allen, J. Bergkvist, R. Flaker, J. Hesson, J. LeBlanc IBM Microelectronics Division, Essex Junction, VT

A 32-entry two-port cache enhances superscalar microprocessor performance by dynamically predicting and reducing load/store hazards. Conflict prediction incorporates a dynamic-logic greater-than-or-equal-to compare circuit to achieve 5ns performance in 0.5um CMOS.

BREAK 3:00 PM

25.4 - A 4.1ns Compact 54x54b Multiplier Using Sign Select Booth Encoders - 3:15 PM

A. Inoue, R. Ohe, S. Kashiwakura, S. Mitarai¹, T. Tsuru¹, T. Izawa¹, G. Goto Fujitsu Labs Ltd., Nakahara, Japan ¹Fujitsu Ltd., Nakahara, Japan

A 54x54b multiplier with 4.1ns latency at a 2.5V supply and a 1.27mm² active area was implemented in a 0.25um CMOS process with 5.5nm gate oxide and Co salicidation. Using sign select Booth encoders, the total transistor count is reduced by 24% from that of a conventional multiplier.

25.5 - An Early-Completion-Detecting ALU for a 1GHz 64b Datapath - 3:45 PM

Y. Kondo, N. Ikumi, K. Ueno, J. Mori, M. Hirano Toshiba Corporation, Kanagawa, Japan

An ALU datapath with register file uses 0.25um 5-layer metal CMOS. With a completion detector to determine when carry-propagation is short, a latency of one 1ns cycle is achieved for 87% of operands, resulting in an average latency of 1.125ns.

25.6 - Half Rail Differential Logic - 4:00 PM

S. Choe, G. Rigby University of New South Wales, Sydney, Australia

A power-reduction technique called HRDL recycles charge from the evaluation cycle for use in the precharge cycle. Power is reduced by 30% to 50% compared with DCVS. An adder implementation shows a 5-fold improvement in power-delay product. Power efficiency and delay are comparable to CRDL, but HRDL does not require a separate supply to raise the n-well voltage.

25.7 - Skew-Tolerant Domino Circuits - 4:15 PM

D. Harris¹, M. Horowitz Stanford University, Stanford, CA ¹also with Intel Corp., Santa Clara, CA

Self-timed design principles are applied to synchronously clocked domino circuits. Overlapping clock phases tolerate clock skew, eliminate latches from the critical path, and allow time borrowing to balance pipelines and compensate for timing uncertainties.

CONCLUSION 4:45 PM

