Programmable Spatiotemporal Image Processor
A 16 x 16 pixel pseudo general-purpose vision chip for spatiotemporal focal-plane processing was designed and fabricated. The convolution of the image with programmable kernels is realized with area-efficient, real-time circuits. The chip's architecture allows the photoreceptor cells to be small and densely packed by performing all analog computation on the read-out path, away from the array. The size, configuration and coefficients of the kernels can be varied on the fly. In addition to the raw intensity image, the chip outputs four processed images in parallel. The convolution is implemented with a digitally programmable analog processor, resulting in very low power consumption at high computation rates.
The key motivation behind this design is to perform all computation on readout. The benefits of computing the convolution kernels at readout are:
- each pixel can be as small as possible to allow high resolution imaging.
- a single processor unit is used for the entire retina, thus reducing mismatch problems.
- programmability can be obtained with no impact on the density of the imaging array.
- compact pseudo general-purpose focal-plane visual processing is realizable.

The space constraints are thereby transformed into temporal restrictions, since the scanning clock speed and the response time of the processing circuits must scale with the size of the array.
The GIP consists of three main components: a 16-row by 16-column photo pixel array, three vertical and three horizontal scanning registers, and a single processing unit. The processing unit is composed of four independent sub-processors (see figure below). Each of the four sub-processors can be independently programmed in parallel, allowing different spatial and/or temporal convolutions to be performed on the incident image. The independent images can be combined, however, to realize complicated non-separable filters.
Block diagram of the chip.
The photocell is composed of a photodiode, a photocurrent amplifier, a delay element and pixel selection switches (see figure below). The photodiode is implemented as the source diffusion extension of an NFET load transistor. The photo-voltage at the source of the load transistor is transformed into a photo-current using a PFET. This photocurrent amplifier magnifies the photocurrent by up to two orders of magnitude, depending on the gate bias of the load NFET. Four amplified photo-currents are produced and directed to the processing unit. Three of the photo-currents, Ix, Iy, and Iorg, are used for spatial processing, while the fourth copy is used for temporal processing. The latter is a low-pass filtered image, realized by an RC time constant within each pixel. The RC time constant is implemented as an externally controlled diffuser and the gate capacitance of the read-out transistor. This time constant can be varied up to greater than 1 second, even under strong illumination.
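The in-pixel delay element behaves like a first-order low-pass filter. A minimal discrete-time sketch of that behavior (the 16x16 array size and the tunable time constant come from the text; the specific `dt` and `tau` values and all names are illustrative):

```python
import numpy as np

def lowpass_update(i_del, i_photo, dt, tau):
    """One discrete-time update of the in-pixel RC low-pass filter.

    i_del:   state of the delayed (low-pass filtered) image
    i_photo: instantaneous amplified photocurrent image
    dt:      time between updates
    tau:     the RC time constant, which the chip lets you tune
             externally up to more than 1 second
    """
    alpha = dt / (tau + dt)              # first-order IIR coefficient
    return i_del + alpha * (i_photo - i_del)

# A step change in illumination settles toward the new level with
# time constant tau, giving each pixel its short-term memory.
frame = np.full((16, 16), 1.0)           # bright scene
delayed = np.zeros((16, 16))             # filter state starts dark
for _ in range(100):
    delayed = lowpass_update(delayed, frame, dt=0.01, tau=0.1)
```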
The six scanning registers are used to select groups of pixels and direct their
photo-currents to the eight global current buses. Selecting the group of pixels
is accomplished in two phases. In the first phase, the top (left) scanning register
selects all the pixels in the given columns (rows) of interest, (see figure 1). The
photo-current values of these pixels are then summed horizontally (vertically), providing
the summed photo-current values on each of the 16 rows (columns). In the second phase,
the right (bottom) scanning registers select three of the previously activated 16 rows
(columns) and direct each one of them to a separate vertical (horizontal) bus. This phase
is achieved through a single analog multiplexer per row (column), where the control bits of
the multiplexer are specified by the two registers on the right (bottom). Since there is a
total of three global vertical and three global horizontal buses on the right and bottom of
the photo array, a total of six different groups of pixels are selected and passed to the
processing unit for further processing.
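The two-phase selection can be checked in software. In the sketch below, the column mask and row picks stand in for the register bit patterns; the function and variable names are illustrative, not the chip's:

```python
import numpy as np

def select_groups(image, col_mask, row_picks):
    """Two-phase pixel-group selection for the vertical buses.

    col_mask:  16-entry pattern loaded into the top scanning register
               (which columns contribute their photocurrents)
    row_picks: the three rows that the right-hand registers route,
               one per global vertical bus
    """
    # Phase 1: activate the chosen columns and sum horizontally,
    # yielding one summed photocurrent per row.
    row_sums = (image * np.asarray(col_mask, dtype=float)).sum(axis=1)
    # Phase 2: per-row analog multiplexers route three of the sixteen
    # row sums onto the three global vertical buses.
    return [row_sums[r] for r in row_picks]

# Example: columns 0 and 2 active, rows 0..2 routed to the buses,
# matching the 3x3 grouping used later in the text (0-based here).
img = np.arange(256, dtype=float).reshape(16, 16)
cols = np.zeros(16, dtype=bool)
cols[[0, 2]] = True
Ix1, Ix2, Ix3 = select_groups(img, cols, [0, 1, 2])
```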
The bottom two registers and the two right registers are used to select one additional
group of pixels. The selection of this group of pixels is achieved in two steps. First,
the bits of the same section of the bottom (right) two registers are NANDed. Second, the
results from the previous two operations activate the PFET switches for the Idel and Iorg
currents in the desired pixels of interest. Hence, the selected group of pixels provides a
copy of the photo-current on the global original current bus, as well as a copy of the
delayed photo-current on the global delayed current bus, (see figure 1 and 2). These two
currents are also passed to the processing unit, where spatiotemporal processing is performed.
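A small sketch of this active-low selection logic, with hypothetical names: the NAND of the two register bits drives each pixel's PFET gates, so the switches turn on only where both bits are set.

```python
def iorg_idel_gates(bottom_bits, right_bits):
    """Gate levels for the Iorg/Idel PFET switches in each pixel.

    The column bit (bottom registers) and row bit (right registers)
    are NANDed; a low (False) gate level turns the PFET switches on.
    """
    return [[not (b and r) for b in bottom_bits] for r in right_bits]

# Select only pixel (2, 2), e.g. the centre of a 3x3 block
bottom = [False] * 16
right = [False] * 16
bottom[2] = True
right[2] = True
gates = iorg_idel_gates(bottom, right)
selected = [(r, c) for r in range(16) for c in range(16)
            if not gates[r][c]]          # PFET is on where the gate is low
```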
The processing unit is a digitally controlled analog processor consisting of
four sub-units. All four sub-units are identical in structure and consist of a digital
control memory of 40 cells per sub-unit and analog adders and multipliers. Each of the
eight input currents is first mirrored four times and then passed to the sub-processors
for individual computation. The digital memory assigns a 5-bit signed-magnitude control
word per current, which specifies the kernel coefficient for this current. The coefficient
can vary between +/-3.75 in increments of 0.25. The appropriate weight factors will vary
depending on the given mask of interest. After each current is weighted by the appropriate
factor, all currents are summed together to produce the desired processed image.
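The weighting scheme can be sketched as follows; the encoding range and step size come from the text, while the function names are illustrative:

```python
def quantize_coeff(w):
    """Quantize a kernel coefficient to 5-bit signed-magnitude:
    one sign bit plus a 4-bit magnitude with an LSB of 0.25,
    covering -3.75 to +3.75."""
    mag = min(round(abs(w) / 0.25), 15)      # saturate at 15 * 0.25 = 3.75
    return (-1.0 if w < 0 else 1.0) * mag * 0.25

def sub_processor(currents, weights):
    """One sub-unit: weight each mirrored input current by its stored
    coefficient and sum the results into one processed output current."""
    return sum(quantize_coeff(w) * i for w, i in zip(weights, currents))
```

Coefficients outside the representable range saturate, e.g. `quantize_coeff(5.0)` yields 3.75.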
In order to capitalize on the parallel processing capabilities of the GIP, each pixel group should contain as few pixels as possible. This allows multiple recombinations of the pixel groups in each of the sub-units with different weight factors, realizing different convolution kernels. Different groupings of pixels can also lead to similar results, although maximum parallelism may be limited.

Consider a 3x3 block with the scanning registers loaded with the patterns shown in figure 1. After loading the scanning registers with the appropriate bit patterns, the global current buses will carry the following groups of pixel currents: Ix1=I(1,1)+I(1,3), Ix2=I(2,1)+I(2,3), Ix3=I(3,1)+I(3,3), Iy1=I(1,1)+I(3,1), Iy2=I(1,2)+I(3,2), Iy3=I(1,3)+I(3,3), Iorg=I(2,2) and Idel=I(2,2,t-1). Other combinations are possible, but the presented grouping yields the maximum number of processed images. In addition, the current I(i,j) can itself be a sum of multiple individual pixel currents when realizing larger kernels. Using these currents as the basis, various kernels can be constructed in the processing unit. Table I gives some example kernels.
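The grouping can be checked numerically. The two example kernels below are hypothetical coefficient choices (Table I itself is not reproduced here), using 0-based indices in place of the text's 1-based I(i,j):

```python
import numpy as np

def bus_currents(block, idel_center):
    """The eight global bus currents for a 3x3 block (0-based indices)."""
    I = block
    return {
        "Ix1": I[0, 0] + I[0, 2], "Ix2": I[1, 0] + I[1, 2],
        "Ix3": I[2, 0] + I[2, 2], "Iy1": I[0, 0] + I[2, 0],
        "Iy2": I[0, 1] + I[2, 1], "Iy3": I[0, 2] + I[2, 2],
        "Iorg": I[1, 1], "Idel": idel_center,
    }

def vertical_edge(c):
    """Right column minus left column (corner rows only)."""
    return c["Iy3"] - c["Iy1"]

def laplacian(c):
    """4x the centre minus the 4-neighbour cross."""
    return 4.0 * c["Iorg"] - c["Ix2"] - c["Iy2"]

block = np.array([[1.0, 2.0, 3.0],
                  [1.0, 2.0, 3.0],
                  [1.0, 2.0, 3.0]])   # intensity ramp increasing rightward
c = bus_currents(block, idel_center=2.0)
```

On this ramp the vertical edge output is nonzero while the Laplacian cancels, as expected for a linear gradient.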
The GIP was tested using the programmed bit pattern discussed in the previous section. The figure below shows examples of the chip's outputs. The first four images (1-4) are computed using 3x3 convolution kernels, while the last four images (5-8) are computed using 5x5 convolution kernels. The top row of images shows the vertical and horizontal edge detection images, respectively, computed with the two kernel sizes. The bottom row of images shows the Laplacian and Gaussian images, respectively. The vertical black line in images 1 and 5 is not
|Basic filtered images: 1. 3x3 vertical edge detection, 2. 3x3 horizontal edge detection, 3. 3x3 Laplacian edge detection, 4. 3x3 Gaussian smoothing, 5. 5x5 vertical edge detection, 6. 5x5 horizontal edge detection, 7. 5x5 Laplacian edge detection, 8. 5x5 Gaussian smoothing.|
visible in the horizontal edge images, 2 and 6. Both horizontal and vertical edges are visible in the Laplacian images, while the Gaussian images provide a smoothed version of the scene.
The figure below shows further examples of various non-separable filters. Image 1 uses a spatial mask that computes only the diagonal edges of the image, while image 2 shows a combination of horizontal and vertical edges with the diagonal edges suppressed. Images 4 and 5 show +45-degree and -45-degree edge detection, respectively, while image 6 shows a low-pass filtered version of the image. The orientation selectivity of the 2D edge detectors is clearly visible in these figures: the horizontal edge image highlights horizontal edges, the vertical edge image highlights vertical edges, and the +/-45-degree edge images highlight the corresponding diagonal edges. Other non-separable filters can also be implemented with the given architecture of the GIP.
|Non-separable filtered images. 1. Diagonal edge detection, 2. Horizontal and vertical edge detection, 3. High-pass filtered image, 4. +45-degree edge detection, 5. -45-degree edge detection, 6. Low-pass filtered image.|
The figure below illustrates the temporal processing capabilities of the GIP. For this experiment a vertical black line was moved back and forth while the original intensity image, the delayed image and the difference of the two, a temporal derivative, were recorded over several time intervals. At time T1, the original and delayed images are the same, hence their difference is zero, i.e. a blank gray image is recorded. At time T2 the stimulus was moved to a new position. The change in position is immediately registered by the original image, while the delayed image retains the previous position of the line in its memory. Hence, the difference in position is noticeable in the third image, where the white trailing line behind the black line indicates that the line has moved to the right. Here, white indicates a negative derivative and black a positive one. After time T3, the memory of the delayed image has updated and the difference is zero again. At time T4 the stimulus was moved back to its original position. The delayed image still contains the previous frame in its memory, and a difference is recorded in the third image. The trailing white line behind the black line is now on the opposite side compared to T2, indicating that the line has moved in the opposite direction, i.e. to the left. After time T5 the memory is updated and the difference is zero again.
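The arithmetic of this experiment can be sketched as follows, with the sign convention inferred from the description (white = negative at the vacated position, black = positive at the new one); the helper names are illustrative:

```python
import numpy as np

def bar_frame(col):
    """A bright 16x16 scene with a dark vertical bar at the given column."""
    frame = np.ones((16, 16))
    frame[:, col] = 0.0
    return frame

# Between T1 and T2 the bar moves one column to the right; the delayed
# image still holds the old frame in its in-pixel memory.
i_org = bar_frame(6)    # current frame: bar at column 6
i_del = bar_frame(5)    # delayed frame: bar still at column 5
diff = i_del - i_org    # sign chosen so that negative (white) trails
                        # at the vacated position, positive (black)
                        # marks the bar's new position
```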
|Temporal processing as a vertical bar is moved back and forth.|
Table II shows the general characteristics of the chip. A much larger array is possible with no impact on the performance of the scanning circuits and processing unit. In a larger array, all the additional area goes to the photo array, since the overhead of the scanning and processing circuits remains identical to that of this chip.