WO1991010182A1

WO1991010182A1 - Generator of multiple uncorrelated noise sources

Info

Publication number: WO1991010182A1
Application number: PCT/US1990/004407
Authority: WO
Inventors: Joshua Alspector; Robert Ray Chu; Joel Wright Gannett; Stuart Alan Haber; Michael Benjamin Parker
Original assignee: Bell Communications Research, Inc.
Priority date: 1989-12-21
Filing date: 1990-08-07
Publication date: 1991-07-11

Abstract

Plural, arbitrarily-shifted, pseudo-random bit streams are generated from a single linear feedback shift register (LFSR) (201). Each bit stream is obtained by tapping the outputs of selected LFSR cells (202) and feeding these tapped cell outputs through a set of exclusive-OR gates (206). The taps are selected in order to achieve the desired shift between bit streams. In addition, the tap patterns can be selected so that the number of inputs (fan-in) to each bit stream is within predetermined bounds and the number of taps per cell (cell load) is within predetermined bounds. A disclosed computer program generates the tap patterns as a function of the number of cells and the structure of the LFSR, the number of output bit streams, the maximum allowed shift variation of the bit streams, and the bounds on fan-in and cell load. Each pseudo-random bit stream serves as an input to a low-pass filter which produces an essentially Gaussian noise output. The plural noise outputs are relatively uncorrelated and can be used in a parallel stochastic learning neural network for purposes such as annealing.

Description

GENERATOR OF MULTIPLE UNCORRELATED NOISE SOURCES

BACKGROUND OF THE INVENTION

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

In a prior art neural network test chip, a stochastic learning technique with a local learning rule was implemented in VLSI (see, for example, U.S. Patent No. 4,874,964, issued October 17, 1989 to J. Alspector and R. B. Allen; J. Alspector and R. B. Allen, "A neuromorphic vlsi learning system," in Advanced Research in VLSI: Proceedings of the 1987 Stanford Conference, P. Losleben, Ed. Cambridge, MA: MIT Press, pp. 313-349, 1987; J. Alspector, R. B. Allen, V. Hu, and S. Satyanarayanna, "Stochastic learning networks and their electronic implementation," Proceedings of the conference on Neural Information Processing Systems, Denver, CO, pp. 9-21, Nov. 1987, D. Anderson, Ed. New York, NY: Am. Inst. of Phys., 1988; and J. Alspector, B. Gupta, and R. B. Allen, "Performance of a stochastic learning microchip" in Advances in Neural Information Processing Systems 1, Denver, CO, pp. 748-760, November 1988, D. S. Touretzky, Ed. San Mateo, CA: Morgan Kaufmann, 1989). The Boltzmann algorithm (D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, "A learning algorithm for Boltzmann machines," Cognitive Science 9, pp. 147-169, 1985) depends on the stochastic settling of the neural system using the process of simulated annealing (S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, 220, pp. 671-680, 1983) to avoid local minima in the energy function that describes its evolution. In the aforenoted prior art neural network prototype test chip, highly amplified Gaussian thermal noise generated by electrons in a transistor was used for annealing. Each neuron was fed by a separate thermal noise generator, so that its state would be unaffected by the noise seen by the others.

Neural learning algorithms such as this capture correlations seen by neural states to perform classification based on input data. For local learning rules, stochastic elements are necessary for, among other reasons, performing unbiased averaging over neural states elsewhere in the network. Correlations in the noise they see would cause errors in the learning since these undesired correlations would be captured by the learning rule. Other reasons for stochastic elements in neural networks include the search of a large solution space, helping a network settle while avoiding local minima, and interpolating between discrete values of weights by time averaging.

Although a thermal noise generator seems simple and unbiased, it has implementation problems. In particular, it exacts a substantial area penalty and, in fact, occupies much more area than the neuron itself. More significantly, the large gain needed to amplify thermal noise can lead to cross coupling of the on-chip amplifiers, thereby frustrating the original purpose of using separate noise amplifiers to obtain zero cross correlation. Despite this, the small network on the prior art test chip demonstrated satisfactory learning for small problems. To scale this network to a larger size, it would have to be sensitive to more subtle correlations, and therefore the noise sources must show minimal correlation.

A linear feedback shift register (LFSR) produces a pseudo-random bit stream (PRBS) that can be used to make an analog noise source. The PRBS is processed by a low-pass filter with a cutoff frequency just a few percent of the clock frequency. This has the effect of performing a time integration over many bits. If each bit's value is randomly distributed with a probability of 0.5 for 0 or 1, then the value of this integration follows a binomial distribution that approaches a Gaussian distribution for a large number of bits. This creates a Gaussian analog pseudo-random noise source whose statistical properties are similar to the thermal noise which is to be modeled with a simulated annealing technique. Variable amplifiers with gains low enough to avoid coupling problems are then sufficient to perform the annealing process. An $N$-stage LFSR creates a PRBS of maximal length, $2^N - 1$, when the feedback taps are chosen appropriately. One useful property of such a PRBS is that it has cross correlation $-1/(2^N - 1)$ (effectively negligible) with a time shifted version of itself, assuming the cross correlation is calculated after replacing each 1 of the binary bit stream with $-1$ and each 0 with $1$ (see, for example, S. W. Golomb, Shift Register Sequences, revised ed. Laguna Hills, CA: Aegean Park Press, 1982). For neural network purposes, this time shift must be large enough for the network to settle sufficiently to "forget" the sequence during the anneal cycle before it sees another version of it later. In practice, this is obtained easily with relatively small shift registers because the length of the sequence grows exponentially with the shift register size.
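By way of illustration only, the cross-correlation property can be checked numerically. The sketch below assumes a 5-stage register with feedback taken from cells 0 and 3 (one maximal choice for that length); the function names and the integer bit-packing of the register state are assumptions made for this example and are not part of the disclosure.

```python
def lfsr_bits(n, feedback_cells, count, state=1):
    """Return `count` output bits of an n-stage LFSR.

    Bit i of `state` is the output of cell i. Cell 0 is the output cell, and
    the feedback bit (XOR of the cells in `feedback_cells`) enters cell n-1.
    """
    bits = []
    for _ in range(count):
        bits.append(state & 1)
        feedback = 0
        for cell in feedback_cells:
            feedback ^= (state >> cell) & 1
        state = (state >> 1) | (feedback << (n - 1))
    return bits


def cross_correlation(bits, shift):
    """Correlate one period with its cyclic shift, mapping 1 -> -1 and 0 -> +1."""
    period = len(bits)
    symbols = [1 - 2 * b for b in bits]
    total = sum(symbols[k] * symbols[(k + shift) % period] for k in range(period))
    return total / period


N = 5                       # assumed register length for the demonstration
PERIOD = 2**N - 1
sequence = lfsr_bits(N, (0, 3), PERIOD)
correlations = [cross_correlation(sequence, s) for s in range(1, PERIOD)]
print(min(correlations), max(correlations), -1 / PERIOD)
```

Every nonzero cyclic shift gives the same value, which is why any two sufficiently separated shifts of one maximal sequence can serve as nearly uncorrelated sources.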

This shifting could be accomplished by using a collection of identical LFSRs, one per neuron. Each would be loaded with a specified initial state to obtain a desired shift relative to the other LFSRs. All LFSRs would be clocked simultaneously. The overhead of such an approach, however, is unacceptable. For instance, a single 25-stage shift register (with a maximal period of 34 million clock cycles) would require approximately 400,000 square microns in 2 micron CMOS technology, which is considerably larger even than the thermal noise amplifier of the prior art implementation in the same technology.

Various techniques for generating plural PRBS have been reported. For example, P. D. Hortensius, R. D. McLeod, W. Pries, D. M. Miller, and H. C. Card describe "Cellular automata-based pseudorandom number generators for built-in self-test," in IEEE Trans. Computer-Aided Design, vol. 8, no. 8, pp. 842-859, Aug. 1989. As disclosed therein, cellular automata are employed to generate pseudo-random bits in parallel. W. J. McFarland, K. H. Springer, and C.-S. Yen describe a "1-Gword/s pseudorandom word generator," in IEEE J. Solid-State Circuits, vol. 24, no. 3, pp. 747-751, June 1989. This pseudorandom word generator uses a feedback/feedforward technique with exclusive-OR gates at each shift register stage. This technique requires at least as many shift register stages as outputs. A wideband digital pseudo-Gaussian noise generator is disclosed in U.S. Patent No. 3,747,381, issued June 26, 1973 to W. J. Hurd. This noise generator requires at least two feedback shift registers of relatively prime lengths. Disadvantageously, in all these prior art noise and/or PRBS generators, the number of cells required increases linearly with the number of required bit streams, P.

An object of the present invention is to reduce to a minimum the hardware necessary to generate multiple pseudo-random noise sources required for annealing in neural networks. An additional object of the present invention is to amortize the space required for a single generator of plural noise sources amongst many neurons in a neural network so that an acceptably small area overhead for VLSI implementations results.

SUMMARY OF THE INVENTION

In accordance with the present invention a single maximal length linear feedback shift register is used to generate multiple, arbitrarily-shifted, pseudo-random bit streams. Each bit stream is converted to an analog noise source by filtering. In particular, each bit stream is obtained by tapping the outputs of selected LFSR cells and feeding these tapped cell outputs through a parity tree consisting of exclusive-OR gates. In accordance with the invention, the particular cells of the LFSR tapped to form each bit stream are selected to meet certain constraints. In particular the taps are chosen so that: (1) the shift variation between bit streams is within a set limit; (2) each cell is tapped to provide an input to no fewer than and no greater than preset numbers of bit streams; and (3) each bit stream is formed from no fewer than and no greater than preset numbers of cell outputs.

An advantage of the present invention is that the number of cells needed to produce P bit streams grows as log(P) rather than linearly with P.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a schematic diagram of a conventional prior art linear feedback shift register used to make an analog noise source; and

FIG. 2 is a schematic diagram of a single linear feedback shift register used to generate multiple pseudo-random bit streams in accordance with the present invention.

DETAILED DESCRIPTION

With reference to FIG. 1, the single N-stage LFSR 101, also denoted Y in the equations derived hereinbelow, consists of N clocked D-type flip-flops 102-(N-1) - 102-0. The N stages, also called cells, are arrayed horizontally with the shift direction from left to right, i.e., the input of every cell except the leftmost cell is connected directly to the output of the cell on its left. The cells are numbered consecutively from (N - 1) to 0, with the (N - 1)th cell, 102-(N-1), on the left and the zeroth cell, 102-0, on the right. The signal fed to the D input of the (N - 1)th (leftmost) cell, 102-(N-1), is obtained from the feedback function H. This is the modulo 2 sum of the outputs belonging to a subset of the N cells, that is,

$$H \triangleq \sum_{i=0}^{N-1} c_i z_i \pmod{2} \qquad (1)$$

where $\triangleq$ denotes "is defined to be equal to," $z_i$ is the output of cell $i$, and each feedback coefficient $c_i$ is either 0 or 1. In the embodiment of FIG. 1, $c_0$ and $c_3$ equal 1 and the other $c_i$ equal 0. These are chosen just for illustration and in reality would be determined as a function of $N$ and the primitive polynomial thereof, to be defined hereinbelow. Exclusive-OR gate 103 forms the modulo 2 sum of the two fed back outputs of cells 102-0 and 102-3. The output of gate 103 provides the D-input to cell 102-(N-1).

To shift register Y 101 means to apply one or more clock pulses simultaneously to the CK clock inputs of the cells of Y 101. The clock is not shown. The PRBS generated by Y is, by definition, the sequence of bits generated by the zeroth (rightmost) cell, one bit per clock cycle, as Y is shifted. The sequence of states that Y evolves through as it is shifted is determined by the initial state and the feedback function H. Thus, the PRBS for a given LFSR depends on its initial state. If Y sequences through all possible nonzero states whenever it starts in a nonzero initial state, Y is said to be maximal. Maximality occurs only for certain choices of the feedback coefficients $c_i$, namely, if the polynomial $c(x)$, where $c(x)$ is defined by the expression

$$c(x) \triangleq x^N + \sum_{i=0}^{N-1} c_i x^i \qquad (2)$$

is primitive in GF($2^N$), where GF($2^N$) denotes the Galois field with $2^N$ elements. A PRBS generated by a maximal N-stage LFSR starting in a nonzero initial state is called an N-maximal PRBS. Some straightforward implications of maximality include (a) an N-maximal PRBS has period $2^N - 1$, and (b) every possible combination of N consecutive bits, except the all-zero combination, occurs somewhere in an N-maximal PRBS.
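The definition of maximality can be tested directly for small registers by counting how many shifts it takes to return to the starting state. The following is a minimal sketch under assumed conventions (register state packed into an integer with bit i holding cell i, feedback given as a tuple of cell indices); it is a brute-force check for illustration, not a primitivity test suitable for large N.

```python
def step(state, n, feedback_cells):
    """Advance the LFSR one clock cycle: shift right, feed H into cell n-1."""
    feedback = 0
    for cell in feedback_cells:
        feedback ^= (state >> cell) & 1
    return (state >> 1) | (feedback << (n - 1))


def cycle_length(n, feedback_cells, start=1):
    """Number of shifts needed to return to `start`, or None if it never does."""
    state = start
    for count in range(1, 2**n + 1):
        state = step(state, n, feedback_cells)
        if state == start:
            return count
    return None


def is_maximal(n, feedback_cells):
    """True if the register visits all 2^n - 1 nonzero states before repeating."""
    return cycle_length(n, feedback_cells) == 2**n - 1


for n, cells in [(4, (0, 3)), (5, (0, 3)), (10, (0, 3)), (4, (0, 2))]:
    print(n, cells, cycle_length(n, cells), is_maximal(n, cells))
```

With feedback from cells 0 and 3 the register is maximal for the lengths tried here, whereas feedback from cells 0 and 2 in a 4-stage register yields a cycle of only 6 states.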

In the prior art analog noise generator in FIG. 1, the pseudo-random bit sequence is taken at the Q output of the rightmost cell, 102-0. This digital bit sequence on lead 104 is processed by a low pass filter having a cutoff frequency just a few percent of the clock frequency, and consisting of resistor 105 and capacitor 106. An essentially Gaussian analog pseudo-random noise source is thus created at output 107.
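The filtering step can be imitated in software with a first-order discrete-time low-pass filter standing in for the resistor-capacitor network. This is only a rough sketch: the 10-stage register, the feedback cells (0, 3), and the smoothing factor `ALPHA` are assumptions chosen for the example, and no claim is made that they match the component values of FIG. 1.

```python
def lfsr_bits(n, feedback_cells, count, state=1):
    """Return `count` output bits of an n-stage LFSR (bit i of state = cell i)."""
    bits = []
    for _ in range(count):
        bits.append(state & 1)
        feedback = 0
        for cell in feedback_cells:
            feedback ^= (state >> cell) & 1
        state = (state >> 1) | (feedback << (n - 1))
    return bits


def low_pass(bits, alpha):
    """First-order low-pass filter: y[k] = y[k-1] + alpha * (x[k] - y[k-1])."""
    out = []
    y = 0.5
    for b in bits:
        y += alpha * (b - y)
        out.append(y)
    return out


N = 10
PERIOD = 2**N - 1
ALPHA = 0.05   # assumed smoothing factor; smaller values mean a lower cutoff

bits = lfsr_bits(N, (0, 3), 2 * PERIOD)
noise = low_pass(bits, ALPHA)[PERIOD:]        # discard one period as warm-up
mean = sum(noise) / len(noise)
variance = sum((y - mean) ** 2 for y in noise) / len(noise)
print(mean, variance)
```

The filtered samples cluster around one half, with a spread that shrinks as the smoothing factor is reduced, which mirrors the time integration over many bits described above.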

With reference to FIG. 2, a single maximal length LFSR 201 is used to derive plural pseudo-random bit streams. As in the prior art described hereinabove, LFSR 201 consists of N cells, 202-(N-1) - 202-0. Feedback is provided to the D input of cell 202-(N-1) as determined by a primitive polynomial of the N-stage register. As in the prior art structure, feedback is provided in this illustrative example from the Q outputs of the 0th and 3rd cells, 202-0 and 202-3, respectively, which are modulo 2 summed by exclusive-OR gate 203. As above, these particular cells are selected just for purposes of illustration.

It has been determined and mathematically proven by the inventors herein that by tapping and modulo 2 combining the outputs of particularly selected cells of the maximal length LFSR, shifted versions of the basic bit pattern can be generated. By proper selection of the cells tapped, in fact, any of the $2^N - 1$ possible shifts can be generated. If the shifts are sufficiently far apart, each combination of cell outputs can serve as a separate source of noise that is essentially uncorrelated with the other sources generated from the same LFSR. It is thus necessary to know which cells to tap to generate the plural bit streams that are shifted sufficiently apart to ensure low correlation. As will be described, the cells tapped can be chosen such that in addition to meeting a shift constraint, other constraints can be met that affect the physicality of a VLSI implementation. Advantageously, the number of bit streams that can be generated from the single LFSR is not limited to the number of cells in the shift register.

In the purely illustrative example in FIG. 2, P sources of random Gaussian noise are generated. As just noted, these noise sources are generated from P pseudo-random bit streams, which are shifted versions of each other, by modulo 2 combining the Q outputs of selected cells in the register. In the illustrative example of FIG. 2, the first bit stream on lead 205-1 is produced from the modulo 2 combination of the Q outputs of cells 202-0, 202-1, and 202-3, which are combined by exclusive-OR gates 206-1,1 and 206-1,2. The bit stream on lead 205-1 is low-pass filtered by the RC filter, consisting of resistor 207-1 and capacitor 208-1, to produce the random noise source on lead 209-1. The other bit streams on leads 205-j, for 2 ≤ j ≤ P, are similarly produced by modulo 2 combining, through exclusive-OR gates 206-j,1 and 206-j,2, the outputs of selected ones of the cells. Each resultant bit stream is then filtered through a low-pass filter consisting of resistor 207-j and capacitor 208-j, to produce random noise at output 209-j. In this illustrative example, each output bit stream is generated from three cell outputs. As will be noted hereinafter, the minimum and maximum number of cells needed to be tapped to form any of the bit streams from the LFSR, called the minimum and maximum allowed fan-in, respectively, is a factor that can be controlled in selecting the tap patterns. Also, the minimum and maximum number of taps on any one cell in the LFSR, called the minimum and maximum allowed cell load, respectively, is controllable.

In what follows, an algorithm for determining the taps will be provided. First, however, a mathematical foundation will be presented for the technique of the present invention. Two lemmas that are keys to the technique of the present invention for generating multiple bit streams from a single LFSR will be proven. The first lemma loosely says that the bit stream obtained from the modulo 2 combination of the outputs of the cells of a maximal LFSR gives a shifted version of the basic LFSR bit stream. The second lemma says that any desired shift can be obtained by appropriate choice of the taps.

Preceding the rigorous mathematical foundation, let $A_0$ denote an $N$-maximal PRBS. For each nonnegative integer $k$, let $a_k \in \{0, 1\}$ denote the value of the sequence $A_0$ at clock cycle $k$. This is indicated with the notation $A_0 = \{a_0, a_1, a_2, \ldots\}$. For every positive integer $m$, define $A_m \triangleq \{a_m, a_{m+1}, a_{m+2}, \ldots\}$. Note that $A_m$ is obtained from $A_0$ by shifting forward in time by $m$ clock cycles. Finally, for a given nonzero initial state of $Y$, let $S$ denote the set containing the all-zero sequence along with the shifted sequences $A_m$, where $0 \le m \le 2^N - 2$. Lemma 1, which says, in general terms, that the bitwise exclusive-OR of two shifted versions of a given $N$-maximal PRBS generates another shifted version of the same PRBS, can now be stated:

Lemma 1: Let $m$ and $n$ be nonnegative integers. Let $B = \{b_0, b_1, b_2, \ldots\}$ denote the sequence obtained by a bitwise exclusive-OR of $A_m \in S$ and $A_n \in S$, that is, $b_i \triangleq a_{m+i} + a_{n+i} \pmod{2}$. Then $B \in S$.

Proof: $A_0$ is generated by a recursion relation of the form

$$a_{k+N} = \sum_{i=0}^{N-1} c_i a_{k+i} \pmod{2} \qquad (3)$$

where the feedback coefficients $c_i$ are either 0 or 1. Clearly, $A_m$ and $A_n$ also satisfy this recursion relation. Since Eq. (3) is linear, $B$ (which equals the bitwise modulo 2 sum of $A_m$ and $A_n$) satisfies Eq. (3) as well. Thus, the entire sequence $B$ is determined by its first $N$ bits. Suppose $m$ and $n$ are equal modulo $2^N - 1$. Then $A_m$ and $A_n$ are identical sequences; thus, $B$ is the all-zero sequence and is therefore a member of $S$. Now suppose $m$ and $n$ are unequal modulo $2^N - 1$. Then $A_m$ and $A_n$ are not identical sequences and $B$ is not the all-zero sequence. In particular, the first $N$ bits of $B$ cannot all be zero (otherwise, Eq. (3) would imply that $B$ is the all-zero sequence). Since $A_0$ is an $N$-maximal PRBS, all possible combinations of $N$ consecutive bits except the all-zero combination must occur in $A_0$. Thus, there must be some nonnegative $r$ such that the first $N$ bits of $A_r$ equal the first $N$ bits of $B$. Since $A_0$ is periodic with period $2^N - 1$, there is no loss of generality in assuming $r < 2^N - 1$. Thus $B = A_r \in S$. Q.E.D.
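Lemma 1 lends itself to an exhaustive check on a small register. The sketch below assumes a 5-stage maximal LFSR with feedback from cells 0 and 3 and uses hypothetical helper names; it forms the exclusive-OR of every pair of distinct shifts and looks up which shift of the same sequence results.

```python
def lfsr_bits(n, feedback_cells, count, state=1):
    """Return `count` output bits of an n-stage LFSR (bit i of state = cell i)."""
    bits = []
    for _ in range(count):
        bits.append(state & 1)
        feedback = 0
        for cell in feedback_cells:
            feedback ^= (state >> cell) & 1
        state = (state >> 1) | (feedback << (n - 1))
    return bits


N = 5
PERIOD = 2**N - 1
A0 = lfsr_bits(N, (0, 3), PERIOD)          # one period of the PRBS


def shifted(m):
    """One period of A_m, the sequence shifted forward in time by m cycles."""
    return [A0[(k + m) % PERIOD] for k in range(PERIOD)]


def shift_of(seq):
    """Return r such that seq equals A_r, or None if no such r exists."""
    for r in range(PERIOD):
        if seq == shifted(r):
            return r
    return None


def xor_shift(m, n):
    """Shift of the bitwise exclusive-OR of A_m and A_n (None if all-zero)."""
    combined = [x ^ y for x, y in zip(shifted(m), shifted(n))]
    return shift_of(combined)


print(xor_shift(0, 3))   # the recursion a[k+5] = a[k] + a[k+3] predicts 5
```

For this register the exclusive-OR of any two distinct shifts is again a shift, and the exclusive-OR of a shift with itself is the all-zero sequence, as the lemma states.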

Lemma 1 is a special case of the more general Abelian group property of S under bitwise modulo 2 addition. A pair of taps from an LFSR gives two particular shifted sequences from a restricted set. Their exclusive-OR gives a third sequence by Lemma 1. This sequence, in turn, can be exclusive-ORed with another tap to give still other shifted versions of the main sequence. Lemma 1 thus implies that given a maximal LFSR generating a PRBS, the outputs of a collection of cells of this LFSR can be tapped and the mod 2 sum of these outputs taken to obtain a shifted version of the PRBS. The question arises whether any specified shift can be obtained by appropriate choice of the taps. Lemma 2 hereinbelow answers this question in the affirmative.

Lemma 2: Let $Y$ denote an $N$-stage maximal LFSR that is initialized to a nonzero state, and let $z_i^k$, $0 \le i \le N - 1$, $k \ge 0$, denote the output of cell $i$ of $Y$ at clock cycle $k$. For a collection of coefficients $d_i \in \{0, 1\}$, $0 \le i \le N - 1$, define a sequence $G \triangleq \{g_0, g_1, g_2, \ldots\}$ such that

$$g_k \triangleq \sum_{i=0}^{N-1} d_i z_i^k \pmod{2} \qquad (4)$$

Then for every $A_r \in S$, there exists a collection of coefficients $d_i$ such that $G = A_r$.

Note: In what follows, the coefficients $d_i$ are called the tap coefficients. $F^N$ denotes the set of $N$-dimensional vectors with components 0 and 1. $F$ can be identified with GF(2).

Proof: From Lemma 1, $G \in S$. Each collection of tap coefficients $\mathbf{d} = [d_0, d_1, \ldots, d_{N-1}]^T$ ($T$ denotes transpose) is identified with a member of $F^N$. Consider the function $Q : F^N \to S$ that maps (according to Eq. (4)) each tap coefficient vector $\mathbf{d} \in F^N$ to its corresponding sequence $G \in S$. It will be shown that $Q$ is injective. Since $Q$ is a linear map, it is injective if and only if it maps all nonzero points in its domain to nonzero points in its range. Let $\mathbf{d}^*$ be a nonzero point of $F^N$. Then there exists an $m$ such that the $m$th element of $\mathbf{d}^*$ is not zero, that is, $d_m^* = 1$. Since the shift register $Y$ is maximal, there exists a clock cycle $k$ such that $z_m^k = 1$ and $z_i^k = 0$ for all $i \ne m$. By Eq. (4), the bit value of $G^* \triangleq Q(\mathbf{d}^*)$ at clock cycle $k$ is $d_m^* = 1$. Thus, $G^*$ is not the all-zero sequence. Note that $F^N$ has $2^N$ elements and $S$ has $2^N$ sequences. Since the function $Q$ is injective, it follows that $Q$ is surjective because its domain and range have a finite and equal number of elements. Q.E.D.

It is thus proven that for a maximal LFSR, the $2^N - 1$ nonzero tap patterns map uniquely to the $2^N - 1$ possible shift values $(0, 1, \ldots, 2^N - 2)$. Therefore, any shift is possible if the right tap pattern is found, and each tap pattern can be identified with a unique shift. These two viewpoints form the basis for the practical problem to be solved; namely, that of generating properly shifted versions of the original bit stream in a hardware-efficient manner.
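The one-to-one correspondence between nonzero tap patterns and shifts can be tabulated by brute force for a small register. In the sketch below, a tap pattern is represented as an integer mask whose bit i is the tap coefficient for cell i; this encoding, the 5-stage register, and the feedback cells (0, 3) are assumptions made for the example.

```python
def step(state, n, feedback_cells):
    """Advance the LFSR one clock cycle (bit i of state = output of cell i)."""
    feedback = 0
    for cell in feedback_cells:
        feedback ^= (state >> cell) & 1
    return (state >> 1) | (feedback << (n - 1))


def tapped_sequence(n, feedback_cells, d, count, state=1):
    """Bit k is the modulo 2 sum of the cells selected by tap mask d at cycle k."""
    out = []
    for _ in range(count):
        out.append(bin(state & d).count("1") & 1)
        state = step(state, n, feedback_cells)
    return tuple(out)


N = 5
PERIOD = 2**N - 1
FEEDBACK = (0, 3)

# Tapping cell 0 alone reproduces the PRBS itself (shift 0).
a0 = tapped_sequence(N, FEEDBACK, 1, PERIOD)
rotation_to_shift = {a0[r:] + a0[:r]: r for r in range(PERIOD)}

# Look up the shift produced by every nonzero tap mask.
shift_of_taps = {
    d: rotation_to_shift[tapped_sequence(N, FEEDBACK, d, PERIOD)]
    for d in range(1, 2**N)
}
print(shift_of_taps[0b01011])   # taps on cells 0, 1 and 3, as drawn in FIG. 2
```

Each of the 31 nonzero masks lands on a distinct shift, so the table can be inverted to find the taps for any desired shift of this small register.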

The constraints due to VLSI implementation of a neural net model are first described:

1. The bit streams should be shifted far enough apart so that the network can settle without seeing a shifted version of a noise source in two places. This implies close to equal spacing of the bit streams. In practice, this constraint can be relaxed considerably or eased by simply increasing the shift register size.

2. For performance reasons, the fan-out per cell is limited; that is, loading any flip-flop in the register more than is necessary should be avoided.

3. As few inputs as possible to each set of exclusive-OR gates associated with a bit stream are desired. This reduces silicon area and improves performance. In fact, layout simplicity may require an equal number for all sets.

To formulate a precise problem statement, again let $Y$ denote a maximal $N$-stage LFSR, and let $p \triangleq 2^N - 1$ denote the period of the PRBS $A_0$ generated by $Y$ for some specified nonzero initial state. Let $\underline{L}$ be a nonnegative integer, let $\overline{L}$, $\underline{F}$, $\overline{F}$, and $P$ be positive integers, and let $r$ be a real number such that $0 \le r < 1$. Let $\mathbf{d}_0, \mathbf{d}_1, \ldots, \mathbf{d}_{P-1} \in F^N$ denote a collection of $P$ tap coefficient vectors to be determined, and let $G_i \in S$ denote the sequence corresponding to $\mathbf{d}_i$, as in the proof of Lemma 2. Let $s_i$ denote the shift of $G_i$ relative to $A_0$, where $0 \le s_i < p$. Without loss of generality, assume $s_i \le s_{i+1}$ for all $i < P - 1$. Define the shift differences $t_i$, $0 \le i \le P - 1$, as follows:

$$t_i \triangleq \begin{cases} s_{i+1} - s_i, & \text{if } 0 \le i \le P - 2 \\ p + s_0 - s_{P-1}, & \text{if } i = P - 1 \end{cases} \qquad (5)$$

Let $\mathbf{u} \in \mathbb{R}^P$ denote the $P$-vector whose $i$th component is $u_i = |t_i/(p/P) - 1|$. This is the normalized version of the shift difference. Two vectors, $\mathbf{l} \in \mathbb{R}^N$ and $\mathbf{f} \in \mathbb{R}^P$, both of which have integer-valued components, are associated with a given collection of tap coefficient vectors $\mathbf{d}_0, \mathbf{d}_1, \ldots, \mathbf{d}_{P-1} \in F^N$. The component $l_i$ of $\mathbf{l}$ is the number of taps connected to cell $i$ of $Y$. The component $f_i$ of $\mathbf{f}$ is the number of 1s in (i.e., the number of cell taps represented by) the tap vector $\mathbf{d}_i$. Let $C : \mathbb{R}^P \times \mathbb{R}^P \times \mathbb{R}^N \to [0, \infty)$ denote a cost function. $C(\mathbf{u}, \mathbf{f}, \mathbf{l})$ is the cost associated with a collection of tap coefficient vectors $\mathbf{d}_0, \mathbf{d}_1, \ldots, \mathbf{d}_{P-1} \in F^N$. With these definitions, the problem can be stated precisely. The implementation constraints noted above can be restated in mathematical terms as follows:

Problem Statement: A collection of tap coefficient vectors $\mathbf{d}_0, \mathbf{d}_1, \ldots, \mathbf{d}_{P-1} \in F^N$ needs to be found that minimizes the cost $C(\mathbf{u}, \mathbf{f}, \mathbf{l})$ subject to the following conditions:

1. $u_i = |t_i/(p/P) - 1| \le r$ for all $i$. The parameter $r$ is the maximum allowed shift variation.

2. No cell of $Y$ has fewer than $\underline{L}$ taps or more than $\overline{L}$ taps ($\underline{L} \le l_i \le \overline{L}$ for $0 \le i \le N - 1$). The integer $\underline{L}$ (resp., $\overline{L}$) is the aforenoted minimum (resp., maximum) allowed cell load.

3. No tap coefficient vector $\mathbf{d}_i$ has fewer than $\underline{F}$ components equal to 1 or more than $\overline{F}$ components equal to 1 ($\underline{F} \le f_i \le \overline{F}$ for $0 \le i \le P - 1$). The integer $\underline{F}$ (resp., $\overline{F}$) is the aforenoted minimum (resp., maximum) allowed fan-in.

Note: if an $N \times P$ matrix is formed such that column $i$ equals vector $\mathbf{d}_i$, then condition 2 says that no row has fewer than $\underline{L}$ 1s or more than $\overline{L}$ 1s, and condition 3 says that no column has fewer than $\underline{F}$ 1s or more than $\overline{F}$ 1s.
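The quantities entering the three conditions are simple to compute once the shifts and tap patterns of a candidate solution are known. The sketch below is one possible arrangement, with tap vectors encoded as integer masks (bit i set when cell i is tapped); the function names and the example numbers are hypothetical and serve only to make the definitions concrete.

```python
def shift_differences(shifts, period):
    """Differences t_i between consecutive sorted shifts, wrapping at the period."""
    s = sorted(shifts)
    diffs = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    diffs.append(period + s[0] - s[-1])
    return diffs


def shift_variation(shifts, period):
    """Normalized deviations u_i = |t_i / (p / P) - 1| from uniform spacing."""
    nominal = period / len(shifts)
    return [abs(t / nominal - 1) for t in shift_differences(shifts, period)]


def fan_ins(tap_masks):
    """f_i: number of cells tapped by each bit stream."""
    return [bin(d).count("1") for d in tap_masks]


def cell_loads(tap_masks, n):
    """l_i: number of bit streams that tap each cell."""
    return [sum((d >> i) & 1 for d in tap_masks) for i in range(n)]


def feasible(shifts, tap_masks, n, r, load_bounds, fan_in_bounds):
    """Check conditions 1-3 for a candidate collection of tap vectors."""
    period = 2**n - 1
    lo_l, hi_l = load_bounds
    lo_f, hi_f = fan_in_bounds
    return (all(u <= r for u in shift_variation(shifts, period))
            and all(lo_l <= l <= hi_l for l in cell_loads(tap_masks, n))
            and all(lo_f <= f <= hi_f for f in fan_ins(tap_masks)))


print(shift_differences([0, 10, 20], 31))
print(shift_variation([0, 10, 20], 31))
```

A cost function could then be any nonnegative combination of these three lists, weighted according to what matters most in a given layout.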

The cost function $C$ is chosen so that minimizing it tends to minimize the components of $\mathbf{l}$, $\mathbf{f}$, and $\mathbf{u}$. Minimizing the components of $\mathbf{l}$ alleviates the speed degradation caused by capacitive loading on the cells of $Y$. Minimizing $\mathbf{f}$ minimizes the fan-in (number of inputs) of the exclusive-OR gates whose outputs form the bit streams.

Clearly, $\mathbf{l}$ and $\mathbf{f}$ are strongly correlated (minimizing the components of one tends to minimize those of the other). Minimizing the components of $\mathbf{u}$ tends to keep the bit streams uniformly separated in time. The exact form of the cost function $C$ depends on the relative importance of minimizing these various quantities in a particular application.

If the loads on the cells of the shift register or the fan-ins of the exclusive-OR gates are of no concern, then the cost function $C$ does not depend on $\mathbf{l}$ or $\mathbf{f}$; moreover, $\underline{L}$ and $\underline{F}$ are small enough and $\overline{L}$ and $\overline{F}$ are large enough so that conditions 2 and 3 are satisfied trivially. The problem then reduces to generating $P$ bit streams with specified, exact time separations. This problem has a simple analytical solution. To see this, first note that the evolution of the shift register's state is governed by the following equation:

$$\mathbf{z}^{k+1} = M \mathbf{z}^k \qquad (6)$$

where the state transition matrix M is defined as follows:

$$M \triangleq \begin{bmatrix} \mathbf{0}_{N-1} & I_{(N-1)\times(N-1)} \\ c_0 & c_1 \; \cdots \; c_{N-1} \end{bmatrix} \qquad (7)$$

Here $I_{(N-1)\times(N-1)}$ denotes the $(N - 1) \times (N - 1)$ identity matrix, $\mathbf{0}_{N-1}$ denotes the $(N - 1)$-component all-zero column vector, and the $c_i$ are the feedback coefficients from Eq. (1).

Lemma 3 hereinbelow says that the taps for a given shift $t$ are obtained explicitly by merely calculating the matrix $M^t$ and inspecting its first row.

Lemma 3: Let $Y$ denote a maximal LFSR initialized in a nonzero state and with state transition matrix $M$. Let $t$ be a nonnegative integer. Then the tap coefficient vector $\mathbf{d}$ for $Y$ that gives a shift forward in time by $t$ clock cycles is the transpose of the first row of the matrix $M^t$.

Proof: Let $\mathbf{z}^k$ denote the vector with components $z_i^k$ (cf. Eq. (6)). Let $\mathbf{e}_0 \in F^N$ denote the column vector with 1 as its zeroth component and 0s for the remaining $N - 1$ components. Then the value of the PRBS generated by $Y$ at clock cycle $k$ is

$$a_k = \mathbf{e}_0^T \mathbf{z}^k = \mathbf{e}_0^T M^k \mathbf{z}^0 \qquad (8)$$

For any tap coefficient vector $\mathbf{d}$, the output generated at clock cycle $k$ is $\mathbf{d}^T \mathbf{z}^k = \mathbf{d}^T M^k \mathbf{z}^0$. If $\mathbf{d}^T = \mathbf{e}_0^T M^t$ is chosen, then the output at clock cycle $k$ is $\mathbf{e}_0^T M^t M^k \mathbf{z}^0 = \mathbf{e}_0^T M^{k+t} \mathbf{z}^0$. But this is the same as $a_{k+t}$, by Eq. (8). Q.E.D.

Lemma 3 provides a solution when the loads or fan-ins are of no concern. Note that $M^t$ can be calculated in $\log(t)$ CPU time. One can calculate a table containing the matrix powers $M^0, M^1, M^2, M^4, M^8, \ldots, M^{2^{N-1}}$. Then the binary representation of $t$ can be used to choose the powers of $M$ to multiply together to calculate $M^t$.
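A compact way to try Lemma 3 is to store each row of the state transition matrix as an integer bit mask and to raise the matrix to the power t by repeated squaring over GF(2). The sketch below follows that approach; the row-mask encoding, the helper names, and the 5-stage register with feedback from cells 0 and 3 are assumptions for illustration. It squares on the fly rather than storing a table of powers.

```python
def transition_matrix(n, feedback_cells):
    """Rows of M as bit masks: row i has bit j set when M[i][j] = 1."""
    rows = [1 << (i + 1) for i in range(n - 1)]      # cell i takes cell i+1
    rows.append(sum(1 << c for c in feedback_cells))  # cell n-1 takes H
    return rows


def mat_mul(a, b):
    """Product of two GF(2) matrices stored as lists of row masks."""
    result = []
    for row in a:
        acc, j = 0, 0
        while row:
            if row & 1:
                acc ^= b[j]      # add row j of b wherever a[i][j] = 1
            row >>= 1
            j += 1
        result.append(acc)
    return result


def mat_pow(m, t):
    """M^t by repeated squaring, using O(log t) matrix multiplications."""
    result = [1 << i for i in range(len(m))]          # identity matrix
    while t:
        if t & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        t >>= 1
    return result


def taps_for_shift(n, feedback_cells, t):
    """Tap mask d (bit i = d_i) read from the first row of M^t."""
    return mat_pow(transition_matrix(n, feedback_cells), t)[0]


def verify_shift(n, feedback_cells, t, state=1):
    """Check over one period that the tapped output equals the PRBS t cycles on."""
    period = 2**n - 1
    m = transition_matrix(n, feedback_cells)
    d = taps_for_shift(n, feedback_cells, t)
    states = []
    for _ in range(period):
        states.append(state)
        # One clock cycle: new cell i is the parity of (row i of M AND state).
        state = sum((bin(row & state).count("1") & 1) << i for i, row in enumerate(m))
    prbs = [s & 1 for s in states]
    tapped = [bin(s & d).count("1") & 1 for s in states]
    return all(tapped[k] == prbs[(k + t) % period] for k in range(period))


print(bin(taps_for_shift(5, (0, 3), 5)))
```

For shifts smaller than N the result is a single tap on cell t, and for t = N the taps coincide with the feedback cells, both of which follow from the structure of M.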

The previous special case showed that it is easy to calculate the taps necessary to obtain exact shifts when the load or fan-in are not a concern. When they are a consideration, the shifts must be allowed to vary from their nominal value (i.e., select a nonzero value for r) and a heuristic technique must be used to find a "good" set of taps. Since a fairly wide variance in the shift values can be allowed for this noise-generating application, solution candidates are abundant and a large state space may be searched to find a solution with acceptably low fan-in and cell load.

The software solution implemented for this problem can be described as follows. First, consider the set of tap patterns with $K$ taps for an $N$-cell shift register. The number of such patterns is

$$\binom{N}{K} = \frac{N!}{K!\,(N - K)!} \qquad (9)$$

The set of essential $K$-tap patterns is defined to be the smallest subset from which all $K$-tap patterns can be obtained by right-shifting a pattern from this subset by zero or more positions. When right-shifting a pattern, zeros are padded on the left. The set of essential patterns has only $\binom{N-1}{K-1}$ members, or $K/N$ times the number of total patterns. For example, the number of 2-tap patterns for a 10-cell register is $\binom{10}{2} = 45$, while there are only $\binom{9}{1} = 9$ essential patterns, viz.,

1100000000
1010000000
1001000000
1000100000
1000010000
1000001000
1000000100
1000000010
1000000001

Note that once the bit stream shifts for the essential $K$-tap patterns are found, the bit stream shifts for all other $K$-tap patterns can be found trivially. For example, if the shift of 1010000000 is $q$, then the shift of 0001010000 is $q - 3$ because the latter pattern is obtained from the former by right-shifting three bit positions.
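The essential patterns are exactly those whose leftmost cell is tapped, which makes them easy to enumerate. The following sketch lists them as strings written left to right in the same orientation as the example above and regenerates the full set by right-shifting; the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def essential_patterns(n, k):
    """All k-tap patterns of an n-cell register whose leftmost cell is tapped."""
    patterns = []
    for rest in combinations(range(1, n), k - 1):
        bits = ["0"] * n
        bits[0] = "1"
        for pos in rest:
            bits[pos] = "1"
        patterns.append("".join(bits))
    return patterns


def right_shifts(pattern):
    """Yield the pattern and every right shift that keeps all of its taps."""
    while True:
        yield pattern
        if pattern.endswith("1"):
            return
        pattern = "0" + pattern[:-1]


essential = essential_patterns(10, 2)
everything = {p for e in essential for p in right_shifts(e)}
print(len(essential), len(everything), comb(10, 2))
```

Storing only the essential patterns and deriving the rest by subtraction of the shift amount is what keeps the memory footprint of the search small.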

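Let $X$ denote the collection of all essential tap coefficient vectors $\mathbf{d} \in F^N$ with at least $\underline{F}$ 1s but not more than $\overline{F}$ 1s. The number of elements in $X$, $\|X\|$, is

$$\|X\| = \sum_{i=\underline{F}}^{\overline{F}} \binom{N-1}{i-1} \qquad (10)$$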

(This is a polynomial in $N$ of order $\overline{F} - 1$.) For each $\mathbf{d} \in X$, a record is stored in main memory that contains a representation of $\mathbf{d}$ along with the shift of its corresponding sequence (see hereinbelow regarding the calculation of this shift). Note that memory usage is greatly reduced by including only the essential tap patterns in the set $X$. Simulated annealing, or any desired random or deterministic technique, is used to search $X$ to find a collection of tap coefficient vectors that minimizes the cost function and satisfies conditions 1 and 2 (condition 3 is satisfied by construction). If a solution exists, this method will find it given enough CPU time.
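As a worked instance of Eq. (10), take the 25-stage register mentioned earlier and assume, purely for illustration, a minimum allowed fan-in of 2 and a maximum allowed fan-in of 3 (these bounds are not stated in the text):

```latex
\|X\| = \sum_{i=2}^{3} \binom{24}{i-1}
      = \binom{24}{1} + \binom{24}{2}
      = 24 + 276
      = 300
```

For comparison, the unrestricted number of 2-tap and 3-tap patterns for the same register is $\binom{25}{2} + \binom{25}{3} = 300 + 2300 = 2600$, so under these assumed bounds only the 300 essential patterns need records in memory.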

In practice, it was discovered that even the set of tap coefficient vectors with only two 1s produces shifts that are fairly well distributed throughout the interval $[0, 2^N - 2]$. Thus, the procedure is normally tried first with $X$ containing just the tap coefficient vectors with $\underline{F}$ 1s. The members of $X$ are bucket-sorted according to the nominal shift value to which they are closest. If all the buckets contain at least one tap pattern, a solution is sought. If no satisfactory solution can be found, then the tap coefficient vectors with $\underline{F} + 1$ 1s are added to $X$ and bucket-sorted, and the best solution is sought again. This process (of adding a new set of tap coefficient vectors to $X$ then searching $X$ for the best solution) is continued, if necessary, until the tap coefficient vectors with $\overline{F}$ 1s have been added to $X$.

Finding the shift associated with each $\mathbf{d} \in X$ can take significant CPU time. One straightforward way to do this is a method called simple shifting. Here an efficient representation $Y$ of the shift register is implemented using the word operations of the host computer. For a given nonzero initial state $\mathbf{z}^0$ of the shift register, the first $N$ bits of the sequence $G$ corresponding to a given tap coefficient vector $\mathbf{d}$ can be calculated easily using $Y$. Let $\mathbf{g} \in F^N$ denote the first $N$ bits of $G$. Note that $\mathbf{g}$ represents the state of $Y$ at the clock cycle that equals the shift of $G$. Thus, starting at the given initial state $\mathbf{z}^0$, $Y$ is shifted one clock cycle at a time until its state is found to equal $\mathbf{g}$. The clock cycle where this equality occurs is the shift associated with $\mathbf{d}$.
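Simple shifting can be sketched in a few lines when the register state is held in one machine word. The code below assumes bit i of the word holds cell i and that the tap vector is given as a mask in the same layout; the 5-stage register, the feedback cells, and the function names are assumptions for the example.

```python
def step(state, n, feedback_cells):
    """Advance the LFSR one clock cycle (bit i of state = output of cell i)."""
    feedback = 0
    for cell in feedback_cells:
        feedback ^= (state >> cell) & 1
    return (state >> 1) | (feedback << (n - 1))


def state_for_taps(n, feedback_cells, d, z0=1):
    """Pack the first n bits of the tapped sequence G into a state word g."""
    state, g = z0, 0
    for k in range(n):
        g |= (bin(state & d).count("1") & 1) << k
        state = step(state, n, feedback_cells)
    return g


def simple_shift(n, feedback_cells, d, z0=1):
    """Shift the register from z0 until its state equals g; return the count."""
    if d == 0:
        raise ValueError("the all-zero tap pattern has no shift")
    g = state_for_taps(n, feedback_cells, d, z0)
    state, shift = z0, 0
    while state != g:
        state = step(state, n, feedback_cells)
        shift += 1
    return shift


print(simple_shift(5, (0, 3), 0b01011))   # taps on cells 0, 1 and 3
```

The loop terminates for every nonzero tap mask only because the register is maximal, so that every nonzero state g is eventually reached from z0.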

The simple shifting method uses O(1) (i.e., constant) memory and O(2^N) CPU time. It can exact a large time penalty for practical problems. For example, a maximal 25-stage shift register has a sequence length of about 34 million clock cycles. Thus, it would be expected that it would be necessary to shift Y an average of about 17 million times for each d ∈ X. In practice, it has indeed been found that simple shifting is too slow for problems of "practical" size, i.e., when the shift register has more than about 20 cells.
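A minimal sketch of simple shifting on a small register is given below. The register model is an assumption made for illustration: cell i takes the value of cell i + 1 on each clock, the highest cell receives the modulo 2 sum of the feedback cells, and the reference sequence is the output of cell 0. The 7-stage register with feedback from cells 0 and 3, and all function names, are illustrative choices and are not taken from APPENDIX A.

```python
def parity(x):
    """Modulo 2 sum of the bits of x."""
    return bin(x).count("1") & 1


def step(state, n, fb_mask):
    """Advance the register one clock cycle (bit i of state = cell i)."""
    feedback = parity(state & fb_mask)
    return (state >> 1) | (feedback << (n - 1))


def first_bits(d, z0, n, fb_mask):
    """First n bits of the stream tapped by d, packed so bit i = i-th output."""
    g, state = 0, z0
    for i in range(n):
        g |= parity(state & d) << i
        state = step(state, n, fb_mask)
    return g


def shift_simple(d, z0, n, fb_mask):
    """Shift of tap vector d: clock from z0 until the state equals g."""
    if d == 0:
        raise ValueError("tap vector must be nonzero")
    g = first_bits(d, z0, n, fb_mask)
    state, t = z0, 0
    while state != g:          # up to 2^n - 2 iterations: O(2^n) time
        state = step(state, n, fb_mask)
        t += 1
    return t


N = 7              # illustrative register size
FB = 0b0001001     # feedback from cells 0 and 3
Z0 = 1             # any nonzero initial state
```

Under this model, tapping cell i alone yields shift i, and tapping exactly the feedback cells yields shift N, which gives a quick sanity check on the routine.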

Faster calculation at the expense of increased main memory usage can be obtained with a variant of what is known as Shanks' giant step/baby step method (see, for example, D. E. Knuth, The Art of Computer Programming, Vol. III: Sorting and Searching. Reading, MA: Addison-Wesley, 1973, p. 9). Here, a collection of bit patterns representing the states of the shift register at uniformly spaced clock cycle intervals is stored in a table. Then, given a tap coefficient vector d, the associated shift register state g is calculated, as was done for the simple shifting method. The shift register representation Y is started in the state g. It is then shifted one clock cycle at a time until its state is found to equal one of the bit patterns stored in the table. The shift associated with d is then the shift of the table bit pattern less the number of shifts needed to bring Y to that state.

In more detail, this method proceeds as follows. First, a "reasonable" giant step size h is chosen. As will be noted, a small h implies a fast calculation of the shift for each tap pattern, but the cost in memory usage and table setup time grows as h becomes smaller. Therefore, a compromise value of h must be chosen. For the example of a 25-stage shift register, h = 5000 might be chosen. Next, for all integers i such that i ≥ 0 and ih ≤ 2^N - 2, a hash table is filled with records, each containing the integer ih and the bit pattern M^{ih}z^0 for some specified nonzero initial state z^0. For the example, this means that v = ⌊(2^N - 2)/h⌋ + 1 = 6711 bit patterns are calculated and installed in the hash table, where ⌊x⌋ denotes the greatest integer not greater than x. If E = M^h is initially calculated, then the hash table building takes v matrix multiplications. Once w = M^{ih}z^0 has been calculated for some value of i, M^{(i+1)h}z^0 is simply Ew.
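The table construction can be sketched as follows, under the same assumed register model as before (cell i takes cell i + 1, the top cell takes the modulo 2 sum of the feedback cells). The matrix M is held as a list of row bitmasks over GF(2); this representation and all names are illustrative choices, not taken from APPENDIX A.

```python
def parity(x):
    """Modulo 2 sum of the bits of x."""
    return bin(x).count("1") & 1


def transition_matrix(n, fb_mask):
    """Rows of M as bitmasks: new cell i = parity(row[i] & state)."""
    return [1 << (i + 1) for i in range(n - 1)] + [fb_mask]


def matvec(rows, v):
    """Matrix-vector product over GF(2)."""
    out = 0
    for i, row in enumerate(rows):
        out |= parity(row & v) << i
    return out


def matmul(a, b):
    """Matrix product over GF(2): row i of a*b is the XOR of selected rows of b."""
    prod = []
    for row in a:
        acc, j = 0, 0
        while row:
            if row & 1:
                acc ^= b[j]
            row >>= 1
            j += 1
        prod.append(acc)
    return prod


def matpow(m, e):
    """m raised to the power e by repeated squaring."""
    result = [1 << i for i in range(len(m))]   # identity matrix
    while e:
        if e & 1:
            result = matmul(result, m)
        m = matmul(m, m)
        e >>= 1
    return result


def table_size(n, h):
    """v = floor((2^n - 2) / h) + 1."""
    return ((1 << n) - 2) // h + 1


def build_table(n, fb_mask, z0, h):
    """Map each state M^(ih) z0 to its clock cycle ih, for ih <= 2^n - 2."""
    e_matrix = matpow(transition_matrix(n, fb_mask), h)   # E = M^h
    table, w, t = {}, z0, 0
    while t <= (1 << n) - 2:
        table[w] = t
        w = matvec(e_matrix, w)    # next giant step: E w
        t += h
    return table
```

For example, `table_size(25, 5000)` reproduces the figure of 6711 table entries quoted above.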

As in the case of the simple shifting method, let Y denote an efficient representation of the shift register, let d denote a tap pattern, and let g denote the first N bits of the bit stream corresponding to d when the initial state of Y is z^0. To find the shift t associated with d, Y is initialized to the state g. Also, counter r is initialized to zero. Then t is found as follows:

1. Look up the bit pattern that represents the state of Y in the hash table. If this bit pattern, which equals M^{ih}z^0 for some i, is in the hash table, set t = ih - r and exit from the loop; otherwise, go to step 2.

2. Shift Y by one clock cycle and increment counter r by 1. Go to step 1.

Note that the loop will never be executed more than h times, and, on average, it is executed h/2 times for each calculation of t. That is, the time complexity of each t calculation is O(h). This results in a significant savings in CPU time for each t calculation relative to the simple shift method. Since v ≈ 2^N/h bit patterns must be stored in the hash table, the memory complexity in terms of N and h is O(2^N/h). The time required to calculate the v bit patterns in the hash table is also proportional to v and is therefore O(2^N/h). Clearly, the value of h must be chosen based on N and the number of t calculations to minimize the total time (setup time plus t calculation time) while keeping the memory usage within reasonable limits.
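The two-step loop can be sketched as follows on a small register. The register model (cell i takes cell i + 1, the top cell takes the modulo 2 sum of the feedback cells, reference sequence at cell 0), the 7-stage example, and all names are assumptions made for illustration. For brevity the table is filled here by plain stepping rather than by multiplying with a precomputed matrix power, and the result is reduced modulo the sequence length so that a match on the table entry for clock cycle 0 after wrap-around still gives a nonnegative shift; that reduction is an addition of this sketch.

```python
def parity(x):
    """Modulo 2 sum of the bits of x."""
    return bin(x).count("1") & 1


def step(state, n, fb_mask):
    """Advance the register one clock cycle (bit i of state = cell i)."""
    return (state >> 1) | (parity(state & fb_mask) << (n - 1))


def first_bits(d, z0, n, fb_mask):
    """First n bits of the stream tapped by d, packed so bit i = i-th output."""
    g, state = 0, z0
    for i in range(n):
        g |= parity(state & d) << i
        state = step(state, n, fb_mask)
    return g


def build_table(z0, n, fb_mask, h):
    """Hash table mapping the state at clock cycle ih to ih, for ih <= 2^n - 2."""
    table, state = {}, z0
    for t in range((1 << n) - 1):
        if t % h == 0:
            table[state] = t
        state = step(state, n, fb_mask)
    return table


def shift_giant_baby(d, z0, n, fb_mask, table):
    """Shift of tap vector d using at most h baby steps."""
    period = (1 << n) - 1
    state = first_bits(d, z0, n, fb_mask)   # Y starts in state g
    r = 0
    while state not in table:               # step 1: look up the state
        state = step(state, n, fb_mask)     # step 2: shift and count
        r += 1
    return (table[state] - r) % period      # t = ih - r


N = 7              # illustrative register size
FB = 0b0001001     # feedback from cells 0 and 3
Z0 = 1             # any nonzero initial state
H = 10             # giant step size
TABLE = build_table(Z0, N, FB, H)
```

A Python dictionary plays the role of the hash table, so each lookup in step 1 takes constant expected time and the cost per tap vector is governed by the number of baby steps.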

The tap-calculating procedure described above has been implemented in the C programming language (B. W. Kernighan and D. M. Ritchie, The C Programming Language, Prentice-Hall, Inc., 1978). For the shift registers of interest (i.e., those having fewer than 30 stages), it was found that the giant step/baby step algorithm was adequate for tap shift calculations. Even after the floating-point intensive code for tap cost calculation was optimized for efficient execution on a vector processor machine, it was found that the CPU time bottleneck was the solution search (optimization) step, not the tap shift calculation. For shift registers larger than 30 stages, other algorithms may be needed for tap shift calculation.

A listing of the program appears in APPENDIX A. The user inputs the number of cells in the register, the feedback pattern, and the number of bit streams required. Also input are the maximum and minimum allowable loading on a cell, the maximum and minimum allowed fan-in, and the maximum allowed shift variation. In minimizing the cost function C, weighting factors are assigned to the u, l, and f components; these weighting factors are also specified by the user. In addition, the user specifies the coefficients of a penalty function used when a potential solution falls outside the specified ranges.

The program was used to derive the tap patterns for 32 bit streams as generated from a 25-stage shift register. Since a 25-stage shift register produces a PRBS of maximal length 33,554,431 (2^25 - 1), the time separation between bit streams is approximately one million clock cycles. The solution is shown in TABLE I. This solution search was run with the maximum and minimum fan-in set equal to three (F_ = F = 3). The minimum cell load was set at three and the maximum cell load was set at four. The maximum allowed shift variation was 0.4. The resulting solution had an average load per shift register cell of 3.8, with four cells having three connections and 21 cells having four connections. The actual maximum shift variation for this solution (maximum u_f from condition 1 of problem statement hereinabove) was 0.32. Each tap pattern line in TABLE I indicates the cells to be tapped to produce the bit stream having the particular shift. As an example, the first bit stream is generated from the modulo 2 combination of the outputs of cells 9, 13, and 21, the cells being numbered from 0 to 24 from right to left, as hereinabove.
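The summary figures of TABLE I can be cross-checked directly from its tap patterns and shifts, as in the sketch below. The fan-in, tapped cells, and loading pattern follow from the patterns alone. For the shift variation, the sketch assumes that it means the relative deviation of the spacing between adjacent bit streams from the nominal spacing (2^25 - 1)/32; that definition and the function names are assumptions made here.

```python
TABLE_I = [
    ("0001000000010001000000000", 813),
    ("0000100000000001000100000", 977154),
    ("0000000100000000101000000", 1918423),
    ("1000000001000000001000000", 3065848),
    ("0000000000100000110000000", 4202921),
    ("0001010000000010000000000", 5199153),
    ("0000000000100000010000001", 6561452),
    ("1000000000000001000000001", 7319152),
    ("0000001000000000001010000", 8405832),
    ("0000000000110000010000000", 9153942),
    ("0000000001001000000100000", 10119558),
    ("0000001000000000000000110", 11177548),
    ("0100000010000000001000000", 12240651),
    ("0000000110000100000000000", 13588669),
    ("0000000000000010000000110", 14519726),
    ("0010000000010000000001000", 15641663),
    ("0000010000000000100100000", 16777233),
    ("1100000000000000000001000", 17769307),
    ("0001010000000000000000100", 18739334),
    ("1000000010000000000000001", 19774592),
    ("0000101000001000000000000", 20796598),
    ("0000100100000000000000010", 22115686),
    ("0000000100000000100000010", 23150710),
    ("0000000001001000000010000", 24450889),
    ("0000001000000100010000000", 25165828),
    ("0000000000001100000000100", 26245895),
    ("0010100000000010000000000", 27177326),
    ("0100000000010001000000000", 28164123),
    ("0010000000000010000100000", 29180947),
    ("0010000010000100000000000", 29900123),
    ("0100000001000000000001000", 31268791),
    ("0001010000000000000010000", 32533378),
]
PERIOD = 2**25 - 1


def tapped_cells(pattern):
    """Cells tapped by a pattern, with cells numbered 0..24 from right to left."""
    n = len(pattern)
    return sorted(n - 1 - i for i, bit in enumerate(pattern) if bit == "1")


def loading_pattern(table):
    """Number of taps on each cell, written in the same order as the patterns."""
    n = len(table[0][0])
    return "".join(str(sum(p[i] == "1" for p, _ in table)) for i in range(n))


def max_shift_variation(table, period):
    """Largest relative deviation of adjacent-stream spacing (assumed definition)."""
    shifts = sorted(s for _, s in table)
    nominal = period / len(shifts)
    gaps = [b - a for a, b in zip(shifts, shifts[1:])]
    gaps.append(period - shifts[-1] + shifts[0])   # wrap-around gap
    return max(abs(g - nominal) / nominal for g in gaps)


FAN_INS = {p.count("1") for p, _ in TABLE_I}
AVERAGE_LOAD = sum(p.count("1") for p, _ in TABLE_I) / 25
```

Under the assumed definition, `max_shift_variation(TABLE_I, PERIOD)` comes out close to the stated value of 0.32, and `AVERAGE_LOAD` is 96/25 = 3.84, which the text rounds to 3.8.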

TABLE I
Tap Pattern Solution

number of bit streams: 32
feedback cells: 0 and 3
feedback pattern: 0000000000000000000001001
sequence length: 33,554,431
fan-in = 3, all bit streams
maximum number of taps on a cell: 4
minimum number of taps on a cell: 3
average number of taps per cell: 3.8
maximum shift variation: 0.32

Tap Pattern                Shift

0001000000010001000000000  813
0000100000000001000100000  977154
0000000100000000101000000  1918423
1000000001000000001000000  3065848
0000000000100000110000000  4202921
0001010000000010000000000  5199153
0000000000100000010000001  6561452
1000000000000001000000001  7319152
0000001000000000001010000  8405832
0000000000110000010000000  9153942
0000000001001000000100000  10119558
0000001000000000000000110  11177548
0100000010000000001000000  12240651
0000000110000100000000000  13588669
0000000000000010000000110  14519726
0010000000010000000001000  15641663
0000010000000000100100000  16777233
1100000000000000000001000  17769307
0001010000000000000000100  18739334
1000000010000000000000001  19774592
0000101000001000000000000  20796598
0000100100000000000000010  22115686
0000000100000000100000010  23150710
0000000001001000000010000  24450889
0000001000000100010000000  25165828
0000000000001100000000100  26245895
0010100000000010000000000  27177326
0100000000010001000000000  28164123
0010000000000010000100000  29180947
0010000010000100000000000  29900123
0100000001000000000001000  31268791
0001010000000000000010000  32533378

4444444444344444444433443  loading pattern

By using the techniques of the present invention in a CMOS implementation of a neural network to generate plural uncorrelated analog noise sources for annealing, the substantial cost in silicon area of the LFSR can be amortized over many neurons while the incremental cost per neuron is limited to some simple combinatorial logic. The function of the low-pass filter could also be served by the frequency response of the neuron itself, thereby saving the area cost associated with the filter. In addition to the area advantage, a single LFSR avoids the control and synchronization problems of multiple LFSRs. In the example of TABLE I, the maximal length sequence becomes 32 separate bit streams with an average separation of about one million clock cycles. By clocking the LFSR at 100 MHz, each bit stream is separated from repetition by its nearest neighbor by approximately 10 milliseconds. Each bit stream is low-pass filtered to about 5 MHz. An anneal cycle of 10 microseconds would therefore have about 50 analog zero crossings. In a network containing 32 neurons, each neuron would see noise that would not be repeated anywhere else in the network for about 1000 anneal cycles, which is a substantially greater separation than is required. This same LFSR could conceivably be used for 1000 times as many neurons. For neural network applications, therefore, shift spacing is less important for design than fan-in or fan-out considerations.
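The timing figures quoted for the TABLE I example can be reproduced with a few lines of arithmetic. The sketch below simply restates the numbers given in the text; the variable names are illustrative, and the zero-crossing count follows the text's own estimate of filter bandwidth multiplied by anneal time.

```python
PERIOD = 2**25 - 1        # sequence length in clock cycles
STREAMS = 32              # number of bit streams
CLOCK_HZ = 100e6          # LFSR clock rate
FILTER_HZ = 5e6           # low-pass filter bandwidth
ANNEAL_S = 10e-6          # duration of one anneal cycle

# Average spacing between neighboring bit streams, in clock cycles.
spacing_cycles = PERIOD / STREAMS

# Time before a stream repeats what its nearest neighbor produced.
separation_s = spacing_cycles / CLOCK_HZ

# Estimate of analog zero crossings per anneal cycle, as in the text.
zero_crossings = FILTER_HZ * ANNEAL_S

# Anneal cycles that fit into the separation time.
anneal_cycles = separation_s / ANNEAL_S

print(f"spacing    : {spacing_cycles:,.0f} clock cycles")
print(f"separation : {separation_s * 1e3:.2f} ms")
print(f"crossings  : {zero_crossings:.0f} per anneal cycle")
print(f"repetition : after about {anneal_cycles:.0f} anneal cycles")
```

The exact spacing is slightly above one million cycles, so the separation is slightly above 10 milliseconds and the repetition interval slightly above 1000 anneal cycles, in line with the rounded figures in the text.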

The hardware advantage of the present invention is particularly important when such large numbers of bit streams need to be generated. In the present invention, for a given relative shift spacing between the bit streams, the number of cells in the shift register grows as log(P), where P is the number of bit streams. In contrast, the hardware requirement for prior art methods grows directly with P. In future generations of neural network chips, it is envisioned that hundreds and perhaps thousands of bit streams will be required. Accordingly, the hardware advantage of the present invention will be significant.
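The logarithmic growth can be illustrated with a short calculation. The sketch assumes that the register must be long enough for its maximal sequence length 2^N - 1 to cover P bit streams at the desired spacing; the function name `cells_needed` and the spacing of one million clock cycles are illustrative.

```python
def cells_needed(num_streams, spacing_cycles):
    """Smallest N such that 2^N - 1 >= num_streams * spacing_cycles.

    Both arguments are positive integers. For a positive integer x, the
    smallest N with 2^N > x is x.bit_length().
    """
    return (num_streams * spacing_cycles).bit_length()


# Register size for a spacing of one million clock cycles per stream.
for p in (32, 320, 3200, 32000):
    print(p, "bit streams ->", cells_needed(p, 10**6), "cells")
```

With this spacing, 32 bit streams call for 25 cells, while 1000 times as many bit streams call for only 10 additional cells.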

Although described in connection with providing noise sources for stochastic neural networks, the present invention has other applications. For example, bit-error rate testers use a pseudo-random bit stream to test communication systems at high speed. The speed is limited by the rate at which the shift register can be clocked. By providing multiple uncorrelated noise sources and then multiplexing them, a new pseudo-random bit stream at higher speed can be provided because a multiplexer can operate faster than a shift register in a given technology. Alternatively, the bit-error rate tester can provide multiple outputs for parallel testing, a capability that is generally not offered by currently available equipment.

The above-described embodiment is illustrative of the principles of the present invention. Other embodiments could be devised by those skilled in the art without departing from the spirit and scope of the present invention.

Claims

What is claimed is:

1. A generator of plural pseudo-random bit streams comprising a single maximal length linear feedback shift register having a plurality of cells; for each bit stream, means for modulo 2 combining the tapped outputs of selected ones of said cells to produce the bit stream, the cell outputs selected to be tapped and combined being determined so that the bit streams are separated by predetermined shifts.

2. A generator of plural pseudo-random bit streams in accordance with claim 1 further comprising means for low-pass filtering each of the plural bit streams to produce plural sources of essentially Gaussian noise.

3. A generator of plural pseudo-random bit streams comprising a single maximal length linear feedback shift register having a plurality of cells; for each bit stream, means for modulo 2 combining the tapped outputs of selected ones of said cells to produce the bit stream, the cell outputs selected to be tapped and combined being determined so that the maximum allowed shift variation between bit streams, the maximum and minimum allowed fan-in, and the maximum and minimum cell load are within predetermined limits.

4. A generator of plural pseudo-random bit streams in accordance with claim 3 further comprising means for low-pass filtering each of the plural bit streams to produce plural sources of essentially Gaussian noise.

5. A stochastic element for a neural network comprising a single maximal length linear feedback shift register having a plurality of cells; means for producing plural pseudo-random bit streams from said single shift register by modulo 2 combining for each bit stream the tapped outputs of selected ones of said cells, the cell outputs selected to be tapped and combined being determined so that the bit streams are separated by predetermined shifts.

6. A stochastic element for a neural network in accordance with claim 5 further comprising means for low-pass filtering each of the plural bit streams to produce plural sources of essentially Gaussian noise.

7. A stochastic element for a neural network comprising a single maximal length linear feedback shift register having a plurality of cells; means for producing plural pseudo-random bit streams from said single shift register by modulo 2 combining for each bit stream the tapped outputs of selected ones of said cells, the cell outputs selected to be tapped and combined being determined so that the maximum allowed shift variation between bit streams, the maximum and minimum allowed fan-in, and the maximum and minimum cell load are within predetermined limits.

8. A stochastic element for a neural network in accordance with claim 7 further comprising means for low-pass filtering each of the plural bit streams to produce plural sources of essentially Gaussian noise.