USC-SIPI REPORT #346

Technical Report USC-SIPI-346

“Optoelectronic Enhancement to Single Instruction Multiple Data Processing Architectures”

by Bogdan Hoanca

May 1999

Electronic single instruction multiple data (SIMD) architectures are used for parallel computation, such as image processing and real-time array processing. Such architectures are often communication-limited. Using optoelectronic interconnections for global connectivity can alleviate the communication bottleneck. We present optimizations of the optoelectronic architecture at four levels: interconnection topology, optical system, electrical system, and inter-chip network.

In optimizing the interconnection topology, we concentrate on cellular optoelectronic interconnects because they are space-invariant. Earlier cellular interconnect designs demonstrated incremental improvements, but their design methods relied on trial and error and could not predict the ultimate achievable performance. We present a deterministic algorithm for designing optimal cellular interconnections (OCIs). According to formal proofs and numerical simulations, OCIs require the minimum number of clock cycles per data shift for a given number of optical links.
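The metric named above, clock cycles per data shift for a given set of links, can be illustrated with a small sketch. This is not the OCI design algorithm from the report. It only evaluates a candidate link set under simplifying assumptions that are not in the source: a 1-D ring of n processors with wrap-around, one link traversal per clock cycle, and hypothetical function names.

```python
from collections import deque


def cycles_per_shift(n, offsets):
    """Return a list c where c[d] is the fewest clock cycles needed to shift
    data by d positions on a ring of n processors.

    Assumption (not from the report): every processor has the same set of
    links, given as signed offsets, and one link is traversed per cycle.
    Entries stay None for shifts that cannot be reached at all.
    """
    dist = [None] * n
    dist[0] = 0
    queue = deque([0])
    # Breadth-first search over shift distances: because the link pattern is
    # space-invariant, the cost depends only on the net displacement.
    while queue:
        pos = queue.popleft()
        for off in offsets:
            nxt = (pos + off) % n
            if dist[nxt] is None:
                dist[nxt] = dist[pos] + 1
                queue.append(nxt)
    return dist


def worst_case_cycles(n, offsets):
    """Largest number of cycles over all shift distances on the ring."""
    dist = cycles_per_shift(n, offsets)
    if any(d is None for d in dist):
        raise ValueError("some shifts are unreachable with these offsets")
    return max(dist)
```

For example, on a 16-processor ring, nearest-neighbor links alone (`[1, -1]`) need up to 8 cycles for the worst-case shift, while adding links of offset ±4 reduces the worst case to 3 cycles at the cost of two more links per processor.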

A wavelength- and polarization-multiplexed optical interconnect can distribute parallel instructions and clock information in the SIMD array. The feasibility of such a multiplexed interconnection, the architecture of the optical system, and its thermal management all depend heavily on the choice of optical source. Optical devices allow faster switching than electronic devices, but their advantages are often masked by a bottleneck at the interface between the optical and electronic domains. An on-chip finite state machine acting as a rate converter can eliminate this bottleneck. We also discuss the design of timing circuitry and skew issues in parallel optical communication channels.
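As a rough illustration of the rate-converter idea, the sketch below models a finite state machine that accepts one bit per fast clock tick and emits one parallel word every `width` ticks. The abstract does not describe the actual circuit, so the serial-to-parallel behavior, the most-significant-bit-first ordering, and all names here are assumptions made for illustration only.

```python
class RateConverter:
    """Hypothetical serial-to-parallel rate converter (an assumption, not the
    report's circuit). The state is the pair (bit count, partial word)."""

    def __init__(self, width):
        if width < 1:
            raise ValueError("width must be at least 1")
        self.width = width
        self.count = 0  # bits collected so far in the current word
        self.word = 0   # partial word, most significant bit first

    def tick(self, bit):
        """Advance one fast clock cycle with one input bit.

        Returns the completed word as an int on every `width`-th tick,
        otherwise None.
        """
        self.word = (self.word << 1) | (bit & 1)
        self.count += 1
        if self.count == self.width:
            out = self.word
            self.count = 0
            self.word = 0
            return out
        return None


def convert(bits, width):
    """Feed a fast bit stream through the converter and collect the words
    that would be handed to the slower parallel side."""
    fsm = RateConverter(width)
    words = []
    for b in bits:
        w = fsm.tick(b)
        if w is not None:
            words.append(w)
    return words
```

With `width=4`, the slow side sees one word for every four fast ticks, so it can run at a quarter of the serial bit rate; an incomplete trailing word is simply held in the state and not emitted.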

Stacking SIMD processing arrays in a network provides extremely high computational and communications throughput by passing 2-D packets over multiple parallel channels distributed across the whole area of the chip. In optimizing widely parallel networks, traffic considerations interact with requirements on allowable skew. In general, optimum network performance is attained with a large number of channels. Clocked translucent nodes minimize the network delay.

To demonstrate the optimization techniques described, we designed and fabricated TRANSPAR, a smart pixel integrated circuit with networking and SIMD processing applications. TRANSPAR chips are interconnected into a high-throughput ring network and combine their SIMD processing power, operating as a massively parallel pipeline computational system.


The report is available for download in PDF format: USC-SIPI-346.pdf (24.9Mb)