WO1991010182A1 - Generator of multiple uncorrelated noise sources - Google Patents

Publication number
WO1991010182A1
Authority
WO
WIPO (PCT)
Prior art keywords
cells
bit streams
plural
outputs
cell
Prior art date
Application number
PCT/US1990/004407
Other languages
French (fr)
Inventor
Joshua Alspector
Robert Ray Chu
Joel Wright Gannett
Stuart Alan Haber
Michael Benjamin Parker
Original Assignee
Bell Communications Research, Inc.
Priority date
Filing date
Publication date
Application filed by Bell Communications Research, Inc.
Publication of WO1991010182A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/58 Random or pseudo-random number generators
    • G06F7/582 Pseudo-random number generators
    • G06F7/584 Pseudo-random number generators using finite field arithmetic, e.g. using a linear feedback shift register
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2207/00 Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F2207/58 Indexing scheme relating to groups G06F7/58 - G06F7/588
    • G06F2207/581 Generating an LFSR sequence, e.g. an m-sequence; sequence may be generated without LFSR, e.g. using Galois Field arithmetic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2207/00 Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F2207/58 Indexing scheme relating to groups G06F7/58 - G06F7/588
    • G06F2207/583 Serial finite field implementation, i.e. serial implementation of finite field arithmetic, generating one new bit or trit per step, e.g. using an LFSR or several independent LFSRs; also includes PRNGs with parallel operation between LFSR and outputs

Definitions

  • Neural learning algorithms such as this capture correlations seen by neural states to perform classification based on input data.
  • stochastic elements are necessary for, among other reasons, performing unbiased averaging over neural states elsewhere in the network. Correlations in the noise they see would cause errors in the learning since these undesired correlations would be captured by the learning rule.
  • Other reasons for stochastic elements in neural networks include the search of a large solution space, helping a network settle while avoiding local minima, and interpolating between discrete values of weights by time averaging.
  • thermal noise generator seems simple and unbiased it has implementation problems. In particular, it exacts a substantial area penalty; and, in fact, occupies much more area than the neuron itself. More significantly, the large gain needed to amplify thermal noise can lead to cross coupling of the on-chip amplifiers thereby frustrating the original purpose of using separate noise amplifiers to obtain zero cross correlation. Despite this, the small network on the prior art test chip demonstrated satisfactory learning for small problems. To scale this network to larger size, it would have to be sensitive to more subtle correlations and therefore the noise sources must show minimal correlation.
  • a linear feedback shift register produces a pseudo-random bit stream (PRBS) that can be used to make an analog noise source.
  • PRBS pseudo-random bit stream
  • the PRBS is processed by a low-pass filter with cutoff frequency just a few percent of the clock frequency. This has the effect of performing a time integration over many bits. If each bit's value is randomly distributed with a probability of 0.5 for 0 or 1, then the value of this integration follows a binomial distribution that approaches a Gaussian distribution for a large number of bits. This creates a Gaussian analog pseudo-random noise source whose statistical properties are similar to the thermal noise which is to be modeled with a simulated annealing technique. Variable amplifiers with gains low enough to avoid coupling problems are then sufficient to perform the annealing process.
  • An N-stage LFSR creates a PRBS of maximal length, 2^N − 1, when the feedback taps are chosen appropriately.
  • One useful property of such a PRBS is that it has cross correlation −1/(2^N − 1) with a time-shifted version of itself.
  • This shifting could be accomplished by using a collection of identical LFSRs, one per neuron. Each would be loaded with a specified initial state to obtain a desired shift relative to the other LFSRs. All LFSRs would be clocked simultaneously. The overhead of such an approach, however, is unacceptable. For instance, a single 25-stage shift register (with a maximal period of 34 million clock cycles) would require approximately 400,000 square microns in 2 micron CMOS technology, which is considerably larger even than the thermal noise amplifier of the prior art implementation in the same technology.
  • An object of the present invention is to reduce to a minimum the hardware necessary to generate multiple pseudo-random noise sources required for annealing in neural networks.
  • An additional object of the present invention is to amortize the space required for a single generator of plural noise sources amongst many neurons in a neural network so that an acceptable small area overhead for VLSI implementations results.
  • a single maximal length linear feedback shift register is used to generate multiple, arbitrarily-shifted, pseudo-random bit streams.
  • Each bit stream is converted to an analog noise source by filtering.
  • each bit stream is obtained by tapping the outputs of selected LFSR cells and feeding these tapped cell outputs through a parity tree consisting of exclusive-OR gates.
  • the particular cells of the LFSR tapped to form each bit stream are selected to meet certain constraints.
  • the taps are chosen so that: (1) the shift variation between bit streams is within a set limit; (2) each cell is tapped to provide an input to no fewer than and no greater than preset numbers of bit streams; and (3) each bit stream is formed from no fewer than and no greater than preset numbers of cell outputs.
  • An advantage of the present invention is that the number of cells needed to produce P bit streams grows as log( P ) rather than linearly with P.
  • FIG. 1 is a schematic diagram of a conventional prior art linear feedback shift register used to make an analog noise source
  • FIG. 2 is a schematic diagram of a single linear feedback shift register used to generate multiple pseudo-random bit streams in accordance with the present invention.
  • the single N-stage LFSR 101, also denoted Y in the equations derived hereinbelow, consists of N clocked D-type flip-flops 102-(N-1) through 102-0.
  • the N stages, also called cells, are arrayed horizontally with the shift direction from left to right, i.e., the input of every cell except the leftmost cell is connected directly to the output of the cell on its left.
  • the cells are numbered consecutively from (N − 1) to 0, with the (N − 1)th cell, 102-(N-1), on the left and the zeroth cell, 102-0, on the right.
  • the signal fed to the D input of the (N − 1)th (leftmost) cell, 102-(N-1), is obtained from the feedback function H.
  • This is the modulo 2 sum of the outputs belonging to a subset of the N cells, that is,
  • Exclusive-OR gate 103 forms the modulo 2 sum of the two fed back outputs of cells 102-0 and 102-3. The output of gate 103 provides the D-input to cell 102-(N-1).
  • To shift register Y 101 means to apply one or more clock pulses simultaneously to the CK clock inputs of the cells of Y 101.
  • the clock is not shown.
  • the PRBS generated by Y is, by definition, the sequence of bits generated by the zeroth (rightmost) cell, one bit per clock cycle, as Y is shifted.
  • the sequence of states that Y evolves through as it is shifted is determined by the initial state and the feedback function H.
  • the PRBS for a given LFSR depends on its initial state. If Y sequences through all possible nonzero states whenever it starts in a nonzero initial state, Y is said to be maximal. Maximality occurs only for certain choices of the feedback coefficients c_i, namely, if the polynomial c(x), where c(x) is defined by the expression
  • N-maximal PRBS: A PRBS generated by a maximal N-stage LFSR starting in a nonzero initial state is called an N-maximal PRBS.
  • the pseudo-random bit sequence is taken at the Q output of the rightmost cell, 102-0.
  • This digital bit sequence on lead 104 is processed by a low pass filter having a cutoff frequency just a few percent of the clock frequency, and consisting of resistor 105 and capacitor 106.
  • An essentially Gaussian analog pseudo-random noise source is thus created at output 107.
  • LFSR consists of N cells, 202-(N-1) through 202-0.
  • Feedback is provided to the D input of cell 202-(N-1) as determined by a primitive polynomial of the N-stage register.
  • feedback is provided in this illustrative example from the Q outputs of the 0th and 3rd cells, 202-0 and 202-3, respectively, which are modulo 2 summed by exclusive-OR gate 203.
  • these particular cells are selected just for purposes of illustration.
  • the cells tapped can be chosen such that in addition to meeting a shift constraint, other constraints can be met that affect the physicality of a VLSI implementation.
  • the number of bit streams that can be generated from the single LFSR is not limited to the number of cells in the shift register.
  • P sources of random Gaussian noise are generated.
  • these noise sources are generated from P pseudo-random bit streams, which are shifted versions of each other, by modulo 2 combining the Q outputs of selected cells in the register.
  • the first bit stream on lead 205-1 is produced from the modulo 2 combination of the Q outputs of cells 202-0, 202-1, and 202-3 which are combined by exclusive-OR gates 206-1,1 and 206-1,2.
  • the bit stream on lead 205-1 is low-pass filtered by the RC filter, consisting of resistor 207-1 and capacitor 208-1, to produce the random noise source on lead 209-1.
  • the other noise sources on leads 205-j, for 2 ≤ j ≤ P, are similarly produced by modulo 2 combining, through exclusive-OR gates 206-j,1 and 206-j,2, the outputs of selected cells.
  • the resultant bit stream is then filtered through a low-pass filter consisting of resistor 207-j and capacitor 208-j, to produce random noise at output
  • each output bit stream is generated from three cell outputs.
  • the minimum and maximum number of cells needed to be tapped to form any of the bit streams from the LFSR is a factor that can be controlled in selecting the tap patterns.
  • the minimum and maximum number of taps on any one cell in the LFSR is controllable.
  • A_0 denote an N-maximal PRBS.
  • A_m is obtained from A_0 by shifting forward in time by m clock cycles.
  • Lemma 1: Let m and n be nonnegative integers. Let B = {b_0, b_1, b_2, ...} denote the sequence obtained by a bitwise exclusive-OR of A_m ∈ S and A_n ∈ S, that is, b_i = a_{m+i} + a_{n+i} (mod 2). Then B ∈ S.
  • the first N bits of B cannot all be zero (otherwise, Eq. (3) would imply that B is the all-zero sequence).
  • A_0 is an N-maximal PRBS, all possible combinations of N consecutive bits except the all-zero combination must occur in A_0.
  • there must be some nonnegative r such that the first N bits of A_r equal the first N bits of B.
  • A_0 is periodic with period 2^N − 1, there is no loss of generality in assuming r < 2^N − 1.
  • B = A_r ∈ S. Q.E.D.
  • Lemma 1 is a special case of the more general Abelian group property of S under bitwise modulo 2 addition.
  • a pair of taps from an LFSR gives two particular shifted sequences from a restricted set. Their exclusive-OR gives a third sequence by Lemma 1.
  • This sequence, in turn, can be exclusive-ORed with another tap to give still other shifted versions of the main sequence.
  • Lemma 1 thus implies that given a maximal LFSR generating a PRBS, the outputs of a collection of cells of this LFSR can be tapped and the mod 2 sum of these outputs taken to obtain a shifted version of the PRBS. The question arises whether any specified shift can be obtained by appropriate choice of the taps.
  • Lemma 2 hereinbelow answers this question in the affirmative.
  • Lemma 2: Let Y denote an N-stage maximal LFSR that is initialized to a nonzero state, and let z_i^k, 0 ≤ i ≤ N − 1, k ≥ 0, denote the output of cell i of Y at clock cycle k.
  • F^N denotes the set of N-dimensional vectors with components 0 and 1.
  • F can be identified with GF( 2 ).
  • bit streams should be shifted far enough apart so that the network can settle without seeing a shifted version of a noise source in two places. This implies close to equal spacing of the bit streams. In practice, this constraint can be relaxed considerably or eased by simply increasing the shift register size.
  • the fan-out per cell is limited; that is, loading any flip-flop in the register more than is necessary should be avoided.
  • Y denote a maximal N-stage LFSR
  • p = 2^N − 1 denote the period of the PRBS A_0 generated by Y for some specified nonzero initial state.
  • L_min be a nonnegative integer
  • L_max, F_min, F_max, and P be positive integers
  • r be a real number such that 0 ⁇ r ⁇ 1.
  • d_0, d_1, ..., d_{P−1} ∈ F^N denote a collection of P tap coefficient vectors to be determined, and let G_i ∈ S denote the sequence corresponding to d_i, as in the proof of Lemma 2.
  • Two vectors, l ∈ R^N and f ∈ R^P, both of which have integer-valued components, are associated with a given collection of tap coefficient vectors d_0, d_1, ..., d_{P−1} ∈ F^N.
  • the component l_i of l is the number of taps connected to cell i of Y.
  • the component f_i of f is the number of 1s in (i.e., the number of cell taps represented by) the tap vector d_i.
  • No cell of Y has fewer than L_min taps or more than L_max taps (L_min ≤ l_i ≤ L_max for
  • the integer L_min (resp., L_max) is the aforenoted minimum (resp., maximum) allowed cell load.
  • No tap coefficient vector d_i has fewer than F_min components equal to 1 or more than F_max components equal to 1 (F_min ≤ f_i ≤ F_max for 0 ≤ i ≤ P − 1).
  • the integer F_min (resp., F_max) is the aforenoted minimum (resp., maximum) allowed fan-in. Note: if an N × P matrix is formed such that column i equals vector d_i, then condition 2 says that no row has fewer than L_min 1s or more than L_max 1s, and condition 3 says that no column has fewer than F_min 1s or more than F_max 1s.
  • the cost function C is chosen so that minimizing it tends to minimize the components of l, f, and u. Minimizing the components of l alleviates the speed degradation caused by capacitive loading on the cells of Y. Minimizing f minimizes the fan-in (number of inputs) of the exclusive-OR gates whose outputs form the bit streams.
  • state transition matrix M is defined as follows:
  • I_{(N−1)×(N−1)} denotes the (N − 1) × (N − 1) identity matrix
  • 0_{N−1} denotes the (N − 1)-component all-zero column vector
  • the c_i are the feedback coefficients from Eq. (1).
  • Lemma 3 hereinbelow says that the taps for a given shift t are obtained explicitly by merely calculating the matrix M^t and inspecting its first row.
  • Lemma 3: Let Y denote a maximal LFSR initialized in a nonzero state and with state transition matrix M. Let t be a nonnegative integer. Then the tap coefficient vector d for Y that gives a shift forward in time by t clock cycles is the transpose of the first row of the matrix M^t. Proof: Let z^k denote the vector with components z
  • the set of essential K-tap patterns is defined to be the smallest subset from which all K-tap patterns can be obtained by right-shifting a pattern from this subset by zero or more positions. When right-shifting a pattern, zeros are padded on the left.
  • bit stream shifts for the essential K-tap patterns are found, the bit stream shifts for all other K-tap patterns can be found trivially. For example, if the shift of 1010000000 is q, then the shift of 0001010000 is q − 3 because the latter pattern is obtained from the former by right-shifting three bit positions.
  • Finding the shift associated with each d ∈ X can take significant CPU time.
  • One straightforward way to do this is a method called simple shifting.
  • an efficient representation Y of the shift register is implemented using the word operations of the host computer.
  • the first N bits of the sequence G corresponding to a given tap coefficient vector d can be calculated easily using Y.
  • g ∈ F^N denote the first N bits of G.
  • g represents the state of Y at the clock cycle that equals the shift of G.
  • Y is shifted one clock cycle at a time until its state is found to equal g .
  • the clock cycle where this equality occurs is the shift associated with d.
  • the simple shifting method uses O(1) (i.e., constant) memory and O(2^N) CPU time. It can exact a large time penalty for practical problems. For example, a maximal 25-stage shift register has a sequence length of 34 million clock cycles. Thus, it would be expected that it would be necessary to shift Y an average of 17 million times for each d ∈ X. In practice, it has been found that simple shifting is too slow for problems of "practical" size, i.e., when the shift register has more than about 20 cells.
  • Shanks' giant step/baby step method (see, for example, D. E. Knuth, The Art of Computer Programming, Vol. III: Sorting and Searching. Reading, MA: Addison-Wesley, 1973, p. 9.)
  • bit patterns representing the states of the shift register at uniformly-spaced clock cycle intervals.
  • d the associated shift register state g is calculated, as was done for the simple shifting method.
  • the shift register representation Y is started in the state g . It is then shifted one clock cycle at a time until its state is found to equal one of the bit patterns stored in the table.
  • the shift associated with d is then the shift of the table bit pattern less the number of shifts needed to bring Y to that state.
  • this method proceeds as follows.
  • a "reasonable" giant step size h is chosen.
  • a small h implies a fast calculation of the shift for each tap pattern, but the cost in memory usage and table setup time grows as h becomes smaller. Therefore, a compromise value of h must be chosen.
  • h = 5000 might be chosen.
  • a hash table is filled with records, each containing the integer ih and the bit pattern M^{ih} z^0 for some specified nonzero initial state z^0.
  • Y denote an efficient representation of the shift register
  • d denote a tap pattern
  • g* denote the first N bits of the bit stream corresponding to d when the initial state of Y is z^0.
  • Y is initialized to the state g*.
  • counter r is initialized to zero. Then t is found as follows:
  • a listing of the program appears in APPENDIX A.
  • the user inputs the number of cells in the register, the feedback pattern, and the number of bit streams required. Also input are the maximum and minimum allowable loading on a cell, the maximum and minimum allowed fan-in, and the maximum allowed shift variation.
  • weighting factors are assigned to the u, l, and f components, which are also specified by the user.
  • the user specifies the coefficients of a penalty function used when a potential solution falls outside the specified ranges.
  • the program was used to derive the tap patterns for 32 bit streams as generated from a 25-stage shift register. Since a 25-stage shift register produces a PRBS of maximal length 33,554,431 (2^25 − 1), the time separation between bit streams is approximately one million clock cycles.
  • Each tap pattern line in TABLE I indicates the cells to be tapped to produce the bit stream having the particular shift.
  • the first bit stream is generated from the modulo 2 combination from the outputs of the cells 9, 13, and 21, the cells being numbered from 0 to 24 from right to left, as hereinabove.
  • each bit stream is separated from repetition by its nearest neighbor by approximately 10 milliseconds.
  • Each bit stream is low pass filtered to about 5 MHz.
  • An anneal cycle of 10 microseconds would therefore have about 50 analog zero crossings.
  • each neuron would see noise that would not be repeated anywhere else in the network for about 1000 anneal cycles which is a substantially greater separation than is required.
  • This same LFSR could conceivably be used for 1000 times as many neurons. For neural network applications, therefore, shift spacing is less important for design than fan-in or fan-out considerations.
  • the hardware advantage of the present invention is particularly important when such large numbers of bit streams need to be generated.
  • the number of cells in the shift register grows as log(P ), where P is the number of bit streams.
  • the hardware requirement for prior art methods grows directly with P.
  • the hardware advantage of the present invention will be significant.
  • bit-error rate testers use a pseudo-random bit stream to test communication systems at high speed. The speed is limited by the rate at which the shift register can be clocked. By providing multiple uncorrelated noise sources and then multiplexing them, a new pseudo-random bit stream at higher speed can be provided because a multiplexer can operate faster than a shift register in a given technology.
  • the bit-error rate tester can provide multiple outputs for parallel testing, which is generally not available in currently available equipment.
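
The two shift-finding searches listed above (simple shifting and the table-based giant step/baby step search) can be sketched in Python as follows. This is an illustrative sketch only, not the program of APPENDIX A. The 4-stage register with c_0 = c_3 = 1, the step size h = 4, and all function names are assumptions made for the example. For brevity the table is filled by clocking through one full period rather than by computing powers of the state transition matrix.

```python
def step(state, coeffs):
    """Shift the register one clock cycle (state[i] is the output of cell i)."""
    feedback = sum(c & z for c, z in zip(coeffs, state)) % 2
    return state[1:] + (feedback,)


def first_bits(taps, coeffs, start):
    """First N bits g of the stream formed by XOR-ing the tapped cells."""
    state, g = start, []
    for _ in range(len(coeffs)):
        g.append(sum(d & z for d, z in zip(taps, state)) % 2)
        state = step(state, coeffs)
    return tuple(g)


def simple_shift(taps, coeffs, start):
    """Simple shifting: clock Y from its initial state until its state equals g.

    Assumes a maximal register and a nonzero tap vector; otherwise g may
    never be reached.
    """
    g = first_bits(taps, coeffs, start)
    state, t = start, 0
    while state != g:
        state = step(state, coeffs)
        t += 1
    return t


def build_table(coeffs, start, h):
    """Record the register state at clock cycles 0, h, 2h, ... of one period."""
    period = 2 ** len(coeffs) - 1
    table, state = {}, start
    for k in range(period):
        if k % h == 0:
            table[state] = k
        state = step(state, coeffs)
    return table


def table_shift(taps, coeffs, start, table):
    """Start in state g and clock until a stored state is reached.

    The shift is the stored clock cycle less the number of steps taken.
    """
    period = 2 ** len(coeffs) - 1
    state, r = first_bits(taps, coeffs, start), 0
    while state not in table:
        state = step(state, coeffs)
        r += 1
    return (table[state] - r) % period


COEFFS = (1, 0, 0, 1)          # assumed feedback coefficients c_0 .. c_3
START = (1, 0, 0, 0)           # assumed nonzero initial state
TABLE = build_table(COEFFS, START, h=4)
t_simple = simple_shift((1, 1, 0, 1), COEFFS, START)
t_table = table_shift((1, 1, 0, 1), COEFFS, START, TABLE)
```

Both functions should return the same shift for any nonzero tap vector. The table-based version needs at most about h single steps per tap pattern, at the cost of storing roughly (2^N − 1)/h states.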

Abstract

Plural, arbitrarily-shifted, pseudo-random bit streams are generated from a single linear feedback shift register (LFSR) (201). Each bit stream is obtained by tapping the outputs of selected LFSR cells (202) and feeding these tapped cell outputs through a set of exclusive-OR gates (206). The taps are selected in order to achieve the desired shift between bit streams. In addition, the tap patterns can be selected so that the number of inputs (fan-in) to each bit stream is within predetermined bounds and that the number of taps per cell (cell load) is within predetermined bounds. A disclosed computer program generates the tap patterns as a function of the number of cells and the structure of the LFSR, the number of output bit streams, the maximum allowed shift variation of the bit streams, and the bounds on fan-in and cell load. Each pseudo-random bit stream serves as an input to a low-pass filter which produces an essentially Gaussian noise output. The plural noise outputs are relatively uncorrelated and can be used in a parallel stochastic learning neural network for purposes such as annealing.

Description

GENERATOR OF MULTIPLE UNCORRELATED NOISE SOURCES
BACKGROUND OF THE INVENTION
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
In a prior art neural network test chip, a stochastic learning technique with a local learning rule was implemented in VLSI (see, for example, U.S. Patent No. 4,874,964, issued October 17, 1989 to J. Alspector and R. B. Allen; J. Alspector and R. B. Allen, "A neuromorphic vlsi learning system," in Advanced Research in VLSI: Proceedings of the 1987 Stanford Conference, P. Losleben, Ed. Cambridge, MA: MIT Press, pp. 313-349, 1987; J. Alspector, R. B. Allen, V. Hu, and S. Satyanarayanna, "Stochastic learning networks and their electronic implementation," Proceedings of the conference on Neural Information Processing Systems, Denver, CO, pp. 9-21, Nov. 1987, D. Anderson, Ed. New York, NY: Am. Inst. of Phys., 1988; and J. Alspector, B. Gupta, and R. B. Allen, "Performance of a stochastic learning microchip" in Advances in Neural Information Processing Systems 1, Denver, CO, pp. 748-760, November 1988, D. S. Touretzky, Ed. San Mateo, CA: Morgan Kaufmann, 1989). The Boltzmann algorithm (D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, "A learning algorithm for Boltzmann machines," Cognitive Science 9, pp. 147-169, 1985) depends on the stochastic settling of the neural system using the process of simulated annealing (S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, 220, pp. 671-680, 1983) to avoid local minima in the energy function that describes its evolution. In the aforenoted prior art neural network prototype test chip, highly amplified Gaussian thermal noise generated by electrons in a transistor was used for annealing. Each neuron was fed by a separate thermal noise generator, so that its state would be unaffected by the noise seen by the others.
Neural learning algorithms such as this capture correlations seen by neural states to perform classification based on input data. For local learning rules, stochastic elements are necessary for, among other reasons, performing unbiased averaging over neural states elsewhere in the network. Correlations in the noise they see would cause errors in the learning since these undesired correlations would be captured by the learning rule. Other reasons for stochastic elements in neural networks include the search of a large solution space, helping a network settle while avoiding local minima, and interpolating between discrete values of weights by time averaging.
Although a thermal noise generator seems simple and unbiased, it has implementation problems. In particular, it exacts a substantial area penalty and, in fact, occupies much more area than the neuron itself. More significantly, the large gain needed to amplify thermal noise can lead to cross coupling of the on-chip amplifiers, thereby frustrating the original purpose of using separate noise amplifiers to obtain zero cross correlation. Despite this, the small network on the prior art test chip demonstrated satisfactory learning for small problems. To scale this network to larger size, it would have to be sensitive to more subtle correlations and therefore the noise sources must show minimal correlation.
A linear feedback shift register (LFSR) produces a pseudo-random bit stream (PRBS) that can be used to make an analog noise source. The PRBS is processed by a low-pass filter with cutoff frequency just a few percent of the clock frequency. This has the effect of performing a time integration over many bits. If each bit's value is randomly distributed with a probability of 0.5 for 0 or 1, then the value of this integration follows a binomial distribution that approaches a Gaussian distribution for a large number of bits. This creates a Gaussian analog pseudo-random noise source whose statistical properties are similar to the thermal noise which is to be modeled with a simulated annealing technique. Variable amplifiers with gains low enough to avoid coupling problems are then sufficient to perform the annealing process. An N-stage LFSR creates a PRBS of maximal length, 2^N − 1, when the feedback taps are chosen appropriately. One useful property of such a PRBS is that it has cross correlation −1/(2^N − 1) (effectively negligible) with a time-shifted version of itself, assuming the cross correlation is calculated after replacing each 1 of the binary bit stream with −1 and each 0 with 1 (see, for example, S. W. Golomb, Shift Register Sequences, revised ed. Laguna Hills, CA: Aegean Park Press, 1982). For neural network purposes, this time shift must be large enough for the network to settle sufficiently to "forget" the sequence during the anneal cycle before it sees another version of it later. In practice, this is obtained easily with relatively small shift registers because the length of the sequence grows exponentially with the shift register size.
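
The cross-correlation property can be checked numerically on a small register. The Python sketch below is only an illustration. It assumes a 4-stage register with feedback from cells 0 and 3 (period 15) and uses hypothetical function names. It generates one period of the PRBS, maps each 1 to −1 and each 0 to +1, and correlates the sequence with every cyclic shift of itself.

```python
def m_sequence(coeffs, start):
    """One full period of the PRBS taken from cell 0 (register assumed maximal)."""
    n = len(coeffs)
    state = list(start)
    bits = []
    for _ in range(2 ** n - 1):
        bits.append(state[0])
        feedback = sum(c & z for c, z in zip(coeffs, state)) % 2
        state = state[1:] + [feedback]      # shift right, feed cell N-1
    return bits


def cross_correlation(bits, shift):
    """Normalized periodic correlation of the sequence with a shifted copy.

    Each 1 is replaced by -1 and each 0 by +1 before multiplying.
    """
    period = len(bits)
    signs = [1 - 2 * b for b in bits]
    total = sum(signs[k] * signs[(k + shift) % period] for k in range(period))
    return total / period


# Assumed example register: N = 4, feedback coefficients c_0 = c_3 = 1.
bits = m_sequence([1, 0, 0, 1], [1, 0, 0, 0])
correlations = [cross_correlation(bits, s) for s in range(1, len(bits))]
```

Under these assumptions every entry of `correlations` should equal −1/15, matching −1/(2^N − 1) for N = 4, while a shift of zero gives a correlation of 1.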
This shifting could be accomplished by using a collection of identical LFSRs, one per neuron. Each would be loaded with a specified initial state to obtain a desired shift relative to the other LFSRs. All LFSRs would be clocked simultaneously. The overhead of such an approach, however, is unacceptable. For instance, a single 25-stage shift register (with a maximal period of 34 million clock cycles) would require approximately 400,000 square microns in 2 micron CMOS technology, which is considerably larger even than the thermal noise amplifier of the prior art implementation in the same technology.
Various techniques for generating plural PRBS have been reported. For example, P. D. Hortensius, R. D. McLeod, W. Pries, D. M. Miller, and H. C. Card describe "Cellular automata-based pseudorandom number generators for built-in self-test," in IEEE Trans. Computer-Aided Design, vol. 8, no. 8, pp. 842-859, Aug. 1989. As disclosed therein, cellular automata are employed to generate pseudo-random bits in parallel. W. J. McFarland, K. H. Springer, and C.-S. Yen describe a "1-Gword/s pseudorandom word generator," in IEEE J. Solid-State Circuits, vol. 24, no. 3, pp. 747-751, June 1989. This pseudorandom word generator uses a feedback/feedforward technique with exclusive-OR gates at each shift register stage. This technique requires at least as many shift register stages as outputs. A wideband digital pseudo-Gaussian noise generator is disclosed in U.S. Patent No. 3,747,381, issued June 26, 1973 to W. J. Hurd. This noise generator requires at least two feedback shift registers of relatively prime lengths. Disadvantageously, in all these prior art noise and/or PRBS generators, the number of cells required increases linearly with the number of required bit streams, P.
An object of the present invention is to reduce to a minimum the hardware necessary to generate multiple pseudo-random noise sources required for annealing in neural networks. An additional object of the present invention is to amortize the space required for a single generator of plural noise sources amongst many neurons in a neural network so that an acceptably small area overhead for VLSI implementations results.
SUMMARY OF THE INVENTION
In accordance with the present invention, a single maximal length linear feedback shift register is used to generate multiple, arbitrarily-shifted, pseudo-random bit streams. Each bit stream is converted to an analog noise source by filtering. In particular, each bit stream is obtained by tapping the outputs of selected LFSR cells and feeding these tapped cell outputs through a parity tree consisting of exclusive-OR gates. In accordance with the invention, the particular cells of the LFSR tapped to form each bit stream are selected to meet certain constraints. In particular, the taps are chosen so that: (1) the shift variation between bit streams is within a set limit; (2) each cell is tapped to provide an input to no fewer than and no greater than preset numbers of bit streams; and (3) each bit stream is formed from no fewer than and no greater than preset numbers of cell outputs.
An advantage of the present invention is that the number of cells needed to produce P bit streams grows as log( P ) rather than linearly with P.
BRIEF DESCRIPTION OF THE DRAWING
FIG. 1 is a schematic diagram of a conventional prior art linear feedback shift register used to make an analog noise source; and
FIG. 2 is a schematic diagram of a single linear feedback shift register used to generate multiple pseudo-random bit streams in accordance with the present invention.
DETAILED DESCRIPTION
With reference to FIG. 1, the single N-stage LFSR 101, also denoted Y in the equations derived hereinbelow, consists of N clocked D-type flip-flops 102-(N-1) - 102-0. The N stages, also called cells, are arrayed horizontally with the shift direction from left to right, i.e., the input of every cell except the leftmost cell is connected directly to the output of the cell on its left. The cells are numbered consecutively from (N - 1) to 0, with the (N - 1)th cell, 102-(N-1), on the left and the zeroth cell, 102-0, on the right. The signal fed to the D input of the (N - 1)th (leftmost) cell, 102-(N-1), is obtained from the feedback function H. This is the modulo 2 sum of the outputs belonging to a subset of the N cells, that is,

$$H \triangleq \sum_{i=0}^{N-1} c_i z_i \pmod 2 \qquad (1)$$

where $\triangleq$ denotes "is defined to be equal to," $z_i$ is the output of cell i, and each feedback coefficient $c_i$ is either 0 or 1. In the embodiment of FIG. 1, $c_0$ and $c_3$ equal 1 and the other $c_i$ equal 0. These are just chosen for illustration and in reality would be determined as a function of N and the primitive polynomial thereof, to be defined hereinbelow. Exclusive-OR gate 103 forms the modulo 2 sum of the two fed back outputs of cells 102-0 and 102-3. The output of gate 103 provides the D-input to cell 102-(N-1).
To shift register Y 101 means to apply one or more clock pulses simultaneously to the CK clock inputs of the cells of Y 101. The clock is not shown. The PRBS generated by Y is, by definition, the sequence of bits generated by the zeroth (rightmost) cell, one bit per clock cycle, as Y is shifted. The sequence of states that Y evolves through as it is shifted is determined by the initial state and the feedback function H. Thus, the PRBS for a given LFSR depends on its initial state. If Y sequences through all possible nonzero states whenever it starts in a nonzero initial state, Y is said to be maximal. Maximality occurs only for certain choices of the feedback coefficients $c_i$, namely, if the polynomial c(x), where c(x) is defined by the expression

$$c(x) \triangleq x^N + \sum_{i=0}^{N-1} c_i x^i \qquad (2)$$

is primitive in $GF(2^N)$, where $GF(2^N)$ denotes the Galois field with $2^N$ elements. A PRBS generated by a maximal N-stage LFSR starting in a nonzero initial state is called an N-maximal PRBS. Some straightforward implications of maximality include (a) an N-maximal PRBS has period $2^N - 1$, and (b) every possible combination of N consecutive bits, except the all-zero combination, occurs somewhere in an N-maximal PRBS.
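The shifting behaviour and the maximality property can be checked numerically for a small register. The following Python sketch is illustrative only: the names `lfsr_step`, `lfsr_period`, and `prbs` are hypothetical, the register state is packed into an integer whose bit i is the output of cell i, and a 5-stage register with feedback from cells 0 and 3 is assumed simply because its period is small enough to count directly.

```python
def lfsr_step(state, n, fb_mask):
    """Apply one clock pulse; bit i of `state` is the output of cell i."""
    fb = bin(state & fb_mask).count("1") & 1   # feedback function H of Eq. (1)
    return (state >> 1) | (fb << (n - 1))      # shift right, load H into cell N-1


def lfsr_period(n, fb_mask, start=1):
    """Count clock pulses until the register returns to the nonzero state `start`.

    Assumes c_0 = 1 (bit 0 of fb_mask set), so that the state map is
    invertible and the loop is guaranteed to terminate.
    """
    state, steps = lfsr_step(start, n, fb_mask), 1
    while state != start:
        state = lfsr_step(state, n, fb_mask)
        steps += 1
    return steps


def prbs(n, fb_mask, length, start=1):
    """Return `length` bits of the PRBS, i.e. successive outputs of cell 0."""
    bits, state = [], start
    for _ in range(length):
        bits.append(state & 1)
        state = lfsr_step(state, n, fb_mask)
    return bits


N = 5
FB = 0b01001                    # c_0 = c_3 = 1, all other c_i = 0
print(lfsr_period(N, FB))       # 31, which equals 2**5 - 1
```

A period of $2^N - 1$ confirms that this small register is maximal; a non-primitive choice of feedback coefficients would return a shorter period.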
In the prior art analog noise generator in FIG. 1, the pseudo-random bit sequence is taken at the Q output of the rightmost cell, 102-0. This digital bit sequence on lead 104 is processed by a low-pass filter consisting of resistor 105 and capacitor 106 and having a cutoff frequency of just a few percent of the clock frequency. An essentially Gaussian analog pseudo-random noise source is thus created at output 107.
With reference to FIG. 2, a single maximal length LFSR 201 is used to derive plural pseudo-random bit streams. As in the prior art described hereinabove, LFSR 201 consists of N cells, 202-(N-1) - 202-0. Feedback is provided to the D input of cell 202-(N-1) as determined by a primitive polynomial of the N-stage register. As in the prior art structure, feedback is provided in this illustrative example from the Q outputs of the 0th and 3rd cells, 202-0 and 202-3, respectively, which are modulo 2 summed by exclusive-OR gate 203. As above, these particular cells are selected just for purposes of illustration.
It has been determined and mathematically proven by the inventors herein that by tapping and modulo 2 combining the outputs of particularly selected cells of the maximal length LFSR, shifted versions of the basic bit pattern can be generated. By proper selection of the cells tapped, in fact, any of the $2^N - 1$ possible shifts can be generated. If the shifts are sufficiently far apart, each combination of cell outputs can serve as a separate source of noise that is essentially uncorrelated with the other sources generated from the same LFSR. It is thus necessary to know which cells to tap to generate the plural bit streams that are shifted sufficiently apart to ensure low correlation. As will be described, the cells tapped can be chosen such that in addition to meeting a shift constraint, other constraints can be met that affect the physicality of a VLSI implementation. Advantageously, the number of bit streams that can be generated from the single LFSR is not limited to the number of cells in the shift register.
In the purely illustrative example in FIG. 2, P sources of random Gaussian noise are generated. As just noted, these noise sources are generated from P pseudo-random bit streams, which are shifted versions of each other, by modulo 2 combining the Q outputs of selected cells in the register. In the illustrative example of FIG. 2, the first bit stream on lead 205-1 is produced from the modulo 2 combination of the Q outputs of cells 202-0, 202-1, and 202-3, which are combined by exclusive-OR gates 206-1,1 and 206-1,2. The bit stream on lead 205-1 is low-pass filtered by the RC filter, consisting of resistor 207-1 and capacitor 208-1, to produce the random noise source on lead 209-1. The other bit streams on leads 205-j, for 2 ≤ j ≤ P, are similarly produced by modulo 2 combining, through exclusive-OR gates 206-j,1 and 206-j,2, the outputs of selected ones of the cells. The resultant bit stream is then filtered through a low-pass filter consisting of resistor 207-j and capacitor 208-j, to produce random noise at output 209-j. In this illustrative example, each output bit stream is generated from three cell outputs. As will be noted hereinafter, the minimum and maximum number of cells needed to be tapped to form any of the bit streams from the LFSR, called the minimum and maximum allowed fan-in, respectively, is a factor that can be controlled in selecting the tap patterns. Also, the minimum and maximum number of taps on any one cell in the LFSR, called the minimum and maximum allowed cell load, respectively, is controllable.
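The arrangement of FIG. 2 can be mimicked in software. The sketch below is an illustration under stated assumptions rather than a model of the actual circuit: the helper names are hypothetical, a 5-stage register stands in for the N-stage one, and each RC network is approximated by a first-order discrete-time low-pass filter whose coefficient `alpha` is an arbitrary choice.

```python
def parity(x):
    """Modulo 2 sum of the bits of x (the job of an exclusive-OR parity tree)."""
    return bin(x).count("1") & 1


def lfsr_step(state, n, fb_mask):
    """One clock pulse; bit i of `state` is the Q output of cell i."""
    return (state >> 1) | (parity(state & fb_mask) << (n - 1))


def tapped_streams(n, fb_mask, tap_masks, length, start=1):
    """Produce one bit stream per tap mask from a single shared register."""
    streams = [[] for _ in tap_masks]
    state = start
    for _ in range(length):
        for stream, mask in zip(streams, tap_masks):
            stream.append(parity(state & mask))
        state = lfsr_step(state, n, fb_mask)
    return streams


def rc_lowpass(bits, alpha=0.05):
    """First-order low-pass filter standing in for the resistor and capacitor."""
    y, out = 0.0, []
    for b in bits:
        x = 1.0 if b else -1.0      # map logic levels to +/-1
        y += alpha * (x - y)
        out.append(y)
    return out


# Cell 0 alone gives the basic PRBS; cells 0, 1, and 3 mirror the taps of lead 205-1.
base, first = tapped_streams(5, 0b01001, [0b00001, 0b01011], 62)
noise = rc_lowpass(first)
```

Here `first` is a time-shifted copy of `base`, so adding further tap masks adds bit streams without adding register cells.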
In what follows, an algorithm for determining the taps will be provided. First, however, a mathematical foundation will be presented for the technique of the present invention. Two lemmas that are keys to the technique of the present invention for generating multiple bit streams from a single LFSR will be proven. The first lemma loosely says that the bit stream obtained from the modulo 2 combination of the outputs of the cells of a maximal LFSR gives a shifted version of the basic LFSR bit stream. The second lemma says that any desired shift can be obtained by appropriate choice of the taps.
Preceding the rigorous mathematical foundation, let $A_0$ denote an N-maximal PRBS. For each nonnegative integer k, let $a_k \in \{0, 1\}$ denote the value of the sequence $A_0$ at clock cycle k. This is indicated with the notation $A_0 = \{a_0, a_1, a_2, \ldots\}$. For every positive integer m, define $A_m \triangleq \{a_m, a_{m+1}, a_{m+2}, \ldots\}$. Note that $A_m$ is obtained from $A_0$ by shifting forward in time by m clock cycles. Finally, for a given nonzero initial state of Y, let S denote the set containing the all-zero sequence along with the shifted sequences $A_m$, where $0 \le m \le 2^N - 2$. Lemma 1, which says, in general terms, that the bitwise exclusive-OR of two shifted versions of a given N-maximal PRBS generates another shifted version of the same PRBS, can now be stated:
Lemma 1: Let m and n be nonnegative integers. Let $B = \{b_0, b_1, b_2, \ldots\}$ denote the sequence obtained by a bitwise exclusive-OR of $A_m \in S$ and $A_n \in S$, that is, $b_i \triangleq a_{m+i} + a_{n+i} \pmod 2$. Then $B \in S$.
Proof: $A_0$ is generated by a recursion relation of the form

$$a_{k+N} = \sum_{i=0}^{N-1} c_i a_{k+i} \pmod 2 \qquad (3)$$

where the feedback coefficients $c_i$ are either 0 or 1. Clearly, $A_m$ and $A_n$ also satisfy this recursion relation. Since Eq. (3) is linear, B (which equals the bitwise modulo 2 sum of $A_m$ and $A_n$) satisfies Eq. (3) as well. Thus, the entire sequence B is determined by its first N bits. Suppose m and n are equal modulo $2^N - 1$. Then $A_m$ and $A_n$ are identical sequences; thus, B is the all-zero sequence and is therefore a member of S. Now suppose m and n are unequal modulo $2^N - 1$. Then $A_m$ and $A_n$ are not identical sequences and B is not the all-zero sequence. In particular, the first N bits of B cannot all be zero (otherwise, Eq. (3) would imply that B is the all-zero sequence). Since $A_0$ is an N-maximal PRBS, all possible combinations of N consecutive bits except the all-zero combination must occur in $A_0$. Thus, there must be some nonnegative r such that the first N bits of $A_r$ equal the first N bits of B. Since $A_0$ is periodic with period $2^N - 1$, there is no loss of generality in assuming $r < 2^N - 1$. Thus $B = A_r \in S$. Q.E.D.
Lemma 1 is a special case of the more general Abelian group property of S under bitwise modulo 2 addition. A pair of taps from an LFSR gives two particular shifted sequences from a restricted set. Their exclusive-OR gives a third sequence by Lemma 1. This sequence, in turn, can be exclusive-ORed with another tap to give still other shifted versions of the main sequence. Lemma 1 thus implies that, given a maximal LFSR generating a PRBS, the outputs of a collection of cells of this LFSR can be tapped and the mod 2 sum of these outputs taken to obtain a shifted version of the PRBS. The question arises whether any specified shift can be obtained by appropriate choice of the taps. Lemma 2 hereinbelow answers this question in the affirmative.
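Lemma 1 can be observed directly on a short sequence. The following Python sketch is a brute-force check and not part of the proof; the function names are hypothetical and a 5-stage register with feedback from cells 0 and 3 is assumed.

```python
def one_period(n, fb_mask, start=1):
    """One full period of the PRBS (the output of cell 0) as a list of bits."""
    bits, state = [], start
    for _ in range(2**n - 1):
        bits.append(state & 1)
        fb = bin(state & fb_mask).count("1") & 1
        state = (state >> 1) | (fb << (n - 1))
    return bits


def xor_of_shifts(seq, m, n_shift):
    """Return r such that A_m xor A_n = A_r, or None for the all-zero sequence."""
    p = len(seq)
    b = [seq[(m + i) % p] ^ seq[(n_shift + i) % p] for i in range(p)]
    if not any(b):
        return None
    for r in range(p):
        if all(b[i] == seq[(r + i) % p] for i in range(p)):
            return r
    raise ValueError("not a shifted version; the register is not maximal")


A0 = one_period(5, 0b01001)
print(xor_of_shifts(A0, 0, 3))   # 5, since a_k + a_(k+3) = a_(k+5) by Eq. (3)
```

Trying every pair (m, n) in this way always returns either `None` (when m equals n modulo the period) or a valid shift r, as the lemma asserts.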
Lemma 2: Let Y denote an N-stage maximal LFSR that is initialized to a nonzero state, and let $z_i^k$, $0 \le i \le N - 1$, $k \ge 0$, denote the output of cell i of Y at clock cycle k. For a collection of coefficients $d_i \in \{0, 1\}$, $0 \le i \le N - 1$, define a sequence $G \triangleq \{g_0, g_1, g_2, \ldots\}$ such that

$$g_k \triangleq \sum_{i=0}^{N-1} d_i z_i^k \pmod 2 \qquad (4)$$

Then for every $A_r \in S$, there exists a collection of coefficients $d_i$ such that $G = A_r$.
Note: In what follows, the coefficients $d_i$ are called the tap coefficients. $\mathbb{F}^N$ denotes the set of N-dimensional vectors with components 0 and 1. $\mathbb{F}$ can be identified with GF(2).
Proof: From Lemma 1, $G \in S$. Each collection of tap coefficients $\mathbf{d} = [d_0, d_1, \ldots, d_{N-1}]^T$ (T denotes transpose) is identified with a member of $\mathbb{F}^N$. Consider the function $Q : \mathbb{F}^N \to S$ that maps (according to Eq. (4)) each tap coefficient vector $\mathbf{d} \in \mathbb{F}^N$ to its corresponding sequence $G \in S$. It will be shown that Q is injective. Since Q is a linear map, it is injective if and only if it maps all nonzero points in its domain to nonzero points in its range. Let $\mathbf{d}^*$ be a nonzero point of $\mathbb{F}^N$. Then there exists an m such that the mth element of $\mathbf{d}^*$ is not zero, that is, $d_m^* = 1$. Since the shift register Y is maximal, there exists a clock cycle k such that $z_m^k = 1$ and $z_i^k = 0$ for all $i \ne m$. By Eq. (4), the bit value of $G^* \triangleq Q(\mathbf{d}^*)$ at clock cycle k is $d_m^* = 1$. Thus, $G^*$ is not the all-zero sequence. Note that $\mathbb{F}^N$ has $2^N$ elements and S has $2^N$ sequences. Since the function Q is injective, it follows that Q is surjective because its domain and range have a finite and equal number of elements. Q.E.D.
It is thus proven that for a maximal LFSR, the $2^N - 1$ nonzero tap patterns map uniquely to the $2^N - 1$ possible shift values ($0, 1, \ldots, 2^N - 2$). Therefore, any shift is possible if the right tap pattern is found, and each tap pattern can be identified with a unique shift. These two viewpoints form the basis for the practical problem to be solved; namely, that of generating properly shifted versions of the original bit stream in a hardware-efficient manner.
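The one-to-one correspondence between tap patterns and shifts can be tabulated exhaustively for a small register. The sketch below is a brute-force illustration with hypothetical function names; it assumes a 5-stage register with feedback from cells 0 and 3, packs a tap pattern into an integer whose bit i is the tap coefficient $d_i$, and uses the fact that the first N bits of a shifted sequence coincide with the register state at the clock cycle equal to that shift.

```python
def parity(x):
    return bin(x).count("1") & 1


def lfsr_step(state, n, fb_mask):
    """One clock pulse; bit i of `state` is the output of cell i."""
    return (state >> 1) | (parity(state & fb_mask) << (n - 1))


def state_index(n, fb_mask, z0=1):
    """Map every state reached from z0 to the clock cycle at which it occurs."""
    index, state = {}, z0
    for k in range(2**n - 1):
        index[state] = k
        state = lfsr_step(state, n, fb_mask)
    return index


def first_bits(d, n, fb_mask, z0=1):
    """Pack g_0 .. g_(N-1) of Eq. (4) into an integer (bit k holds g_k)."""
    g, state = 0, z0
    for k in range(n):
        g |= parity(state & d) << k
        state = lfsr_step(state, n, fb_mask)
    return g


def shift_of_taps(d, n, fb_mask, index, z0=1):
    """Shift of the sequence produced by the nonzero tap pattern d."""
    return index[first_bits(d, n, fb_mask, z0)]


N, FB = 5, 0b01001
index = state_index(N, FB)
shifts = {d: shift_of_taps(d, N, FB, index) for d in range(1, 2**N)}
print(sorted(shifts.values()) == list(range(2**N - 1)))   # True
```

Each of the 31 nonzero tap patterns lands on a different shift, so every shift from 0 to 30 is reachable, in line with Lemma 2.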
The constraints due to VLSI implementation of a neural net model are first described:
1. The bit streams should be shifted far enough apart so that the network can settle without seeing a shifted version of a noise source in two places. This implies close to equal spacing of the bit streams. In practice, this constraint can be relaxed considerably or eased by simply increasing the shift register size.
2. For performance reasons, the fan-out per cell is limited; that is, loading any flip-flop in the register more than is necessary should be avoided.
3. Each set of exclusive-OR gates associated with a bit stream should have as few inputs as possible. This reduces silicon area and improves performance. In fact, layout simplicity may require an equal number for all sets.
To formulate a precise problem statement, again let Y denote a maximal N-stage LFSR, and let $p \triangleq 2^N - 1$ denote the period of the PRBS $A_0$ generated by Y for some specified nonzero initial state. Let $\underline{L}$ be a nonnegative integer, let $\overline{L}$, $\underline{F}$, $\overline{F}$, and P be positive integers, and let r be a real number such that $0 \le r < 1$. Let $\mathbf{d}_0, \mathbf{d}_1, \ldots, \mathbf{d}_{P-1} \in \mathbb{F}^N$ denote a collection of P tap coefficient vectors to be determined, and let $G_i \in S$ denote the sequence corresponding to $\mathbf{d}_i$, as in the proof of Lemma 2. Let $s_i$ denote the shift of $G_i$ relative to $A_0$, where $0 \le s_i < p$. Without loss of generality, assume $s_i \le s_{i+1}$ for all $i < P - 1$. Define the shift differences $t_i$, $0 \le i \le P - 1$, as follows:

$$t_i \triangleq \begin{cases} s_{i+1} - s_i, & \text{if } 0 \le i \le P - 2 \\ p + s_0 - s_{P-1}, & \text{if } i = P - 1 \end{cases} \qquad (5)$$

Let $\mathbf{u} \in \mathbb{R}^P$ denote the P-vector whose ith component is $u_i = |t_i/(p/P) - 1|$. This is the normalized version of the shift difference. Two vectors, $\mathbf{l} \in \mathbb{R}^N$ and $\mathbf{f} \in \mathbb{R}^P$, both of which have integer-valued components, are associated with a given collection of tap coefficient vectors $\mathbf{d}_0, \mathbf{d}_1, \ldots, \mathbf{d}_{P-1} \in \mathbb{F}^N$. The component $l_i$ of $\mathbf{l}$ is the number of taps connected to cell i of Y. The component $f_i$ of $\mathbf{f}$ is the number of 1s in (i.e., the number of cell taps represented by) the tap vector $\mathbf{d}_i$. Let $C : \mathbb{R}^P \times \mathbb{R}^P \times \mathbb{R}^N \to [0, \infty)$ denote a cost function. $C(\mathbf{u}, \mathbf{f}, \mathbf{l})$ is the cost associated with a collection of tap coefficient vectors $\mathbf{d}_0, \mathbf{d}_1, \ldots, \mathbf{d}_{P-1} \in \mathbb{F}^N$. With these definitions, the problem can be stated precisely. The implementation constraints noted above can be restated in mathematical terms as follows:
Problem Statement: A collection of tap coefficient vectors $\mathbf{d}_0, \mathbf{d}_1, \ldots, \mathbf{d}_{P-1} \in \mathbb{F}^N$ needs to be found that minimizes the cost $C(\mathbf{u}, \mathbf{f}, \mathbf{l})$ subject to the following conditions:
1. $u_i = |t_i/(p/P) - 1| \le r$ for all i. The parameter r is the maximum allowed shift variation.
2. No cell of Y has fewer than $\underline{L}$ taps or more than $\overline{L}$ taps ($\underline{L} \le l_i \le \overline{L}$ for $0 \le i \le N - 1$). The integer $\underline{L}$ (resp., $\overline{L}$) is the aforenoted minimum (resp., maximum) allowed cell load.
3. No tap coefficient vector $\mathbf{d}_i$ has fewer than $\underline{F}$ components equal to 1 or more than $\overline{F}$ components equal to 1 ($\underline{F} \le f_i \le \overline{F}$ for $0 \le i \le P - 1$). The integer $\underline{F}$ (resp., $\overline{F}$) is the aforenoted minimum (resp., maximum) allowed fan-in.
Note: if an N × P matrix is formed such that column i equals vector $\mathbf{d}_i$, then condition 2 says that no row has fewer than $\underline{L}$ 1s or more than $\overline{L}$ 1s, and condition 3 says that no column has fewer than $\underline{F}$ 1s or more than $\overline{F}$ 1s.
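The quantities entering the problem statement are simple to compute once a candidate set of tap patterns and their shifts is in hand. The following Python sketch is illustrative: the function and parameter names are hypothetical, tap patterns are integers whose bit i is the tap coefficient for cell i, and the example uses three tap patterns of a 5-stage register (feedback from cells 0 and 3) whose shifts 0, 5, and 6 follow directly from the recursion of Eq. (3).

```python
def shift_differences(shifts, p):
    """Shift differences t_i of Eq. (5); the last one wraps around the period p."""
    s = sorted(shifts)
    return [s[i + 1] - s[i] for i in range(len(s) - 1)] + [p + s[0] - s[-1]]


def shift_variation(shifts, p):
    """Normalized deviations u_i = |t_i / (p / P) - 1|."""
    nominal = p / len(shifts)
    return [abs(t / nominal - 1) for t in shift_differences(shifts, p)]


def fan_ins(taps):
    """f_i: number of cells tapped by each tap pattern."""
    return [bin(d).count("1") for d in taps]


def cell_loads(taps, n):
    """l_i: number of tap patterns that use cell i."""
    return [sum((d >> i) & 1 for d in taps) for i in range(n)]


def feasible(taps, shifts, n, p, r, l_min, l_max, f_min, f_max):
    """Check conditions 1 to 3 of the problem statement."""
    return (all(u <= r for u in shift_variation(shifts, p))
            and all(l_min <= l <= l_max for l in cell_loads(taps, n))
            and all(f_min <= f <= f_max for f in fan_ins(taps)))


taps = [0b00001, 0b01001, 0b10010]      # shifts 0, 5, and 6 respectively
shifts = [0, 5, 6]
print(shift_differences(shifts, 31))    # [5, 1, 25]
print(cell_loads(taps, 5))              # [2, 1, 0, 1, 1]
print(fan_ins(taps))                    # [1, 2, 2]
print(feasible(taps, shifts, 5, 31, 0.4, 0, 2, 1, 2))   # False: shifts too uneven
```

The example fails condition 1 because the three shifts are bunched together, which is the situation the search described later is meant to avoid.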
The cost function C is chosen so that minimizing it tends to minimize the components of $\mathbf{l}$, $\mathbf{f}$, and $\mathbf{u}$. Minimizing the components of $\mathbf{l}$ alleviates the speed degradation caused by capacitive loading on the cells of Y. Minimizing $\mathbf{f}$ minimizes the fan-in (number of inputs) of the exclusive-OR gates whose outputs form the bit streams.
Clearly, $\mathbf{l}$ and $\mathbf{f}$ are strongly correlated (minimizing the components of one tends to minimize those of the other). Minimizing the components of $\mathbf{u}$ tends to keep the bit streams uniformly separated in time. The exact form of the cost function C depends on the relative importance of minimizing these various quantities in a particular application.
If the loads on the cells of the shift register or the fan-ins of the exclusive-OR gates are of no concern, then the cost function C does not depend on $\mathbf{l}$ or $\mathbf{f}$; moreover, $\underline{L}$ and $\underline{F}$ are small enough and $\overline{L}$ and $\overline{F}$ are large enough so that conditions 2 and 3 are satisfied trivially. The problem then reduces to generating P bit streams with specified, exact time separations. This problem has a simple analytical solution. To see this, first note that the evolution of the shift register's state is governed by the following equation:
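$$\begin{bmatrix} z_0^{k+1} \\ z_1^{k+1} \\ \vdots \\ z_{N-1}^{k+1} \end{bmatrix} = M \begin{bmatrix} z_0^{k} \\ z_1^{k} \\ \vdots \\ z_{N-1}^{k} \end{bmatrix} \pmod 2 \qquad (6)$$

where the state transition matrix M is defined as follows:

$$M \triangleq \begin{bmatrix} \mathbf{0}_{N-1} & I_{(N-1)\times(N-1)} \\ c_0 & c_1 \;\cdots\; c_{N-1} \end{bmatrix} \qquad (7)$$

Here $I_{(N-1)\times(N-1)}$ denotes the (N - 1) × (N - 1) identity matrix, $\mathbf{0}_{N-1}$ denotes the (N - 1)-component all-zero column vector, and the $c_i$ are the feedback coefficients from Eq. (1).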
Lemma 3 hereinbelow says that the taps for a given shift t are obtained explicitly by merely calculating the matrix $M^t$ and inspecting its first row.

Lemma 3: Let Y denote a maximal LFSR initialized in a nonzero state and with state transition matrix M. Let t be a nonnegative integer. Then the tap coefficient vector $\mathbf{d}$ for Y that gives a shift forward in time by t clock cycles is the transpose of the first row of the matrix $M^t$.

Proof: Let $\mathbf{z}^k$ denote the vector with components $z_i^k$ (cf. Eq. (6)). Let $\mathbf{e}_0 \in \mathbb{F}^N$ denote the column vector with 1 as its zeroth component and 0s for the remaining N - 1 components. Then the value of the PRBS generated by Y at clock cycle k is

$$a_k = \mathbf{e}_0^T \mathbf{z}^k = \mathbf{e}_0^T M^k \mathbf{z}^0 \qquad (8)$$

For any tap coefficient vector $\mathbf{d}$, the output generated at clock cycle k is $\mathbf{d}^T \mathbf{z}^k = \mathbf{d}^T M^k \mathbf{z}^0$. If $\mathbf{d}^T = \mathbf{e}_0^T M^t$ is chosen, then the output at clock cycle k is $\mathbf{e}_0^T M^t M^k \mathbf{z}^0 = \mathbf{e}_0^T M^{k+t} \mathbf{z}^0$. But this is the same as $a_{k+t}$, by Eq. (8). Q.E.D.

Lemma 3 provides a solution when the loads or fan-ins are of no concern. Note that $M^t$ can be calculated in log(t) CPU time. One can calculate a table containing the matrix powers $M^0, M^1, M^2, M^4, M^8, \ldots, M^{2^{N-1}}$. Then the binary representation of t can be used to choose the powers of M to multiply together to calculate $M^t$.
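Lemma 3 translates into a few lines of code. The sketch below is one possible rendering under stated assumptions, not the program of the appendix: all names are hypothetical, a matrix over GF(2) is stored as a list of integers with bit j of entry i holding element (i, j), and $M^t$ is formed by repeated squaring on the fly rather than from a precomputed table of powers.

```python
def transition_matrix(n, fb_mask):
    """Rows of M from Eq. (7): row i selects cell i+1; the last row holds the c_i."""
    return [1 << (i + 1) for i in range(n - 1)] + [fb_mask]


def mat_mul(a, b):
    """Product a*b over GF(2); each row of a selects rows of b to exclusive-OR."""
    out = []
    for row in a:
        acc, j = 0, 0
        while row:
            if row & 1:
                acc ^= b[j]
            row >>= 1
            j += 1
        out.append(acc)
    return out


def mat_pow(m, t):
    """M**t by repeated squaring, following the binary representation of t."""
    result = [1 << i for i in range(len(m))]      # identity matrix
    while t:
        if t & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        t >>= 1
    return result


def taps_for_shift(n, fb_mask, t):
    """Tap pattern (bit i = d_i) giving a shift of t: the first row of M**t."""
    return mat_pow(transition_matrix(n, fb_mask), t)[0]


N, FB = 5, 0b01001
print(bin(taps_for_shift(N, FB, 1)))   # 0b10: tap cell 1 alone
print(bin(taps_for_shift(N, FB, 5)))   # 0b1001: cells 0 and 3, the feedback taps
```

The second result agrees with Eq. (3): with feedback from cells 0 and 3, the bit five clock cycles ahead is the modulo 2 sum of the current outputs of cells 0 and 3.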
The previous special case showed that it is easy to calculate the taps necessary to obtain exact shifts when the load or fan-in are not a concern. When they are a consideration, the shifts must be allowed to vary from their nominal value (i.e., select a nonzero value for r) and a heuristic technique must be used to find a "good" set of taps. Since a fairly wide variance in the shift values can be allowed for this noise-generating application, solution candidates are abundant and a large state space may be searched to find a solution with acceptably low fan-in and cell load.
The software solution implemented for this problem can be described as follows. First, consider the set of tap patterns with K taps for an N-cell shift register. The number of such patterns is

$$\binom{N}{K} = \frac{N!}{K!\,(N-K)!} \qquad (9)$$

The set of essential K-tap patterns is defined to be the smallest subset from which all K-tap patterns can be obtained by right-shifting a pattern from this subset by zero or more positions. When right-shifting a pattern, zeros are padded on the left. The set of essential patterns has only $\binom{N-1}{K-1}$ members, or K/N times the number of total patterns. For example, the number of 2-tap patterns for a 10-cell register is $\binom{10}{2} = 45$, while there are only $\binom{9}{1} = 9$ essential patterns, viz.,
1100000000 1010000000 1001000000 1000100000 1000010000 1000001000 1000000100 1000000010 1000000001
Note that once the bit stream shifts for the essential K-tap patterns are found, the bit stream shifts for all other K-tap patterns can be found trivially. For example, if the shift of 1010000000 is q, then the shift of 0001010000 is q - 3 because the latter pattern is obtained from the former by right-shifting three bit positions.
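The essential patterns are exactly those that tap the leftmost cell, which makes them easy to enumerate. The Python sketch below reproduces the 10-cell, 2-tap example; the function names are hypothetical, and a pattern is held as an integer whose bit i is the tap on cell i, so that its binary string reads with cell N-1 on the left as in the listing above.

```python
from itertools import combinations
from math import comb


def essential_patterns(n, k):
    """All k-tap patterns of an n-cell register that tap the leftmost cell."""
    top = 1 << (n - 1)
    return [top | sum(1 << c for c in rest)
            for rest in combinations(range(n - 1), k - 1)]


def right_shifts(pattern):
    """The pattern and every right shift of it that does not drop a tap."""
    out = [pattern]
    while not pattern & 1:
        pattern >>= 1
        out.append(pattern)
    return out


ess = essential_patterns(10, 2)
every = {q for pat in ess for q in right_shifts(pat)}
print(len(ess), len(every))                      # 9 45
print(sorted(format(pat, "010b") for pat in ess)[-1])   # 1100000000
```

Only the nine essential patterns need their shifts computed; the shift of a pattern right-shifted by j positions is the essential pattern's shift less j.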
Let X denote the collection of all essential tap coefficient vectors $\mathbf{d} \in \mathbb{F}^N$ with at least $\underline{F}$ 1s but not more than $\overline{F}$ 1s. The number of elements in X, $\|X\|$, is

$$\|X\| = \sum_{i=\underline{F}}^{\overline{F}} \binom{N-1}{i-1} \qquad (10)$$

(This is a polynomial in N of order $\overline{F} - 1$.) For each $\mathbf{d} \in X$, a record is stored in main memory that contains a representation of $\mathbf{d}$ along with the shift of its corresponding sequence (see hereinbelow regarding the calculation of this shift). Note that memory usage is greatly reduced by including only the essential tap patterns in the set X. Simulated annealing, or any desired random or deterministic technique, is used to search X to find a collection of tap coefficient vectors that minimizes the cost function and satisfies conditions 1 and 2 (condition 3 is satisfied by construction). If a solution exists, this method will find it given enough CPU time.
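The size of X, and hence the number of records to be held in memory, follows directly from Eq. (10). The short Python sketch below evaluates it for a 25-stage register; the function name and the chosen fan-in limits are illustrative assumptions.

```python
from math import comb


def essential_count(n, f_min, f_max):
    """Number of essential tap patterns with between f_min and f_max taps."""
    return sum(comb(n - 1, i - 1) for i in range(f_min, f_max + 1))


print(essential_count(25, 3, 3))   # 276 essential 3-tap patterns
print(essential_count(25, 2, 3))   # 300 when 2-tap patterns are allowed too
print(comb(25, 3))                 # 2300 three-tap patterns in total
```

For three taps on 25 cells, storing only essential patterns cuts the record count from 2300 to 276, the K/N = 3/25 ratio noted earlier.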
In practice, it was discovered that even the set of tap coefficient vectors with only two 1s produces shifts that are fairly well distributed throughout the interval $[0, 2^N - 2]$. Thus, the procedure is normally tried first with X containing just the tap coefficient vectors with $\underline{F}$ 1s. The members of X are bucket-sorted according to the nominal shift value to which they are closest. If all the buckets contain at least one tap pattern, a solution is sought. If no satisfactory solution can be found, then the tap coefficient vectors with $\underline{F} + 1$ 1s are added to X and bucket-sorted, and the best solution is sought again. This process (of adding a new set of tap coefficient vectors to X then searching X for the best solution) is continued, if necessary, until the tap coefficient vectors with $\overline{F}$ 1s have been added to X.
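The bucket sort can be sketched as follows. This is an illustration only: the function name is hypothetical, and the nominal shift values are assumed to be the P equally spaced points j·p/P for j = 0, ..., P - 1, with distances measured around the period so that a shift just below p falls into bucket 0. The text does not spell out these details.

```python
def bucket_sort(shift_of, p, num_streams):
    """Group tap patterns by the nominal shift j*p/P that each is closest to.

    `shift_of` maps a tap pattern to its shift in the range 0 .. p-1.
    """
    spacing = p / num_streams
    buckets = [[] for _ in range(num_streams)]
    for pattern, shift in shift_of.items():
        j = int(shift / spacing + 0.5) % num_streams   # nearest nominal point
        buckets[j].append(pattern)
    return buckets


def all_buckets_filled(buckets):
    """A solution is only worth seeking if every bucket has a candidate."""
    return all(buckets)


# Made-up shifts, used only to exercise the arithmetic for p = 31 and P = 3.
example = {"a": 0, "b": 10, "c": 30, "d": 16}
print(bucket_sort(example, 31, 3))   # [['a', 'c'], ['b'], ['d']]
```

A search over X then only has to pick one pattern per bucket and score the choice against the cost function.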
Finding the shift associated with each $\mathbf{d} \in X$ can take significant CPU time. One straightforward way to do this is a method called simple shifting. Here an efficient representation Y of the shift register is implemented using the word operations of the host computer. For a given nonzero initial state $\mathbf{z}^0$ of the shift register, the first N bits of the sequence G corresponding to a given tap coefficient vector $\mathbf{d}$ can be calculated easily using Y. Let $\mathbf{g} \in \mathbb{F}^N$ denote the first N bits of G. Note that $\mathbf{g}$ represents the state of Y at the clock cycle that equals the shift of G. Thus, starting at the given initial state $\mathbf{z}^0$, Y is shifted one clock cycle at a time until its state is found to equal $\mathbf{g}$. The clock cycle where this equality occurs is the shift associated with $\mathbf{d}$.
The simple shifting method uses O(1) (i.e., constant) memory and $O(2^N)$ CPU time. It can exact a large time penalty for practical problems. For example, a maximal 25-stage shift register has a sequence length of 34 million clock cycles. Thus, it would be expected that it would be necessary to shift Y an average of 17 million times for each $\mathbf{d} \in X$. In practice, it has been found that simple shifting is too slow for problems of "practical" size, i.e., when the shift register has more than about 20 cells.
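The simple shifting method is short enough to state in full. The Python sketch below is an assumed rendering with hypothetical names; the register state is an integer whose bit i is the output of cell i, the tap pattern `d` uses the same bit numbering, and the initial state defaults to 1.

```python
def parity(x):
    return bin(x).count("1") & 1


def lfsr_step(state, n, fb_mask):
    """One clock pulse; bit i of `state` is the output of cell i."""
    return (state >> 1) | (parity(state & fb_mask) << (n - 1))


def first_bits(d, n, fb_mask, z0=1):
    """First N bits of the tapped sequence G, packed with bit k holding g_k."""
    g, state = 0, z0
    for k in range(n):
        g |= parity(state & d) << k
        state = lfsr_step(state, n, fb_mask)
    return g


def simple_shift(d, n, fb_mask, z0=1):
    """Shift of tap pattern d: clock the register from z0 until its state equals g."""
    if d == 0:
        raise ValueError("the all-zero tap pattern has no shift")
    g = first_bits(d, n, fb_mask, z0)
    state, t = z0, 0
    while state != g:            # up to 2**N - 2 iterations, constant memory
        state = lfsr_step(state, n, fb_mask)
        t += 1
    return t


print(simple_shift(0b01001, 5, 0b01001))   # 5
```

The loop makes the cost visible: memory use is constant, but the number of iterations grows with the period of the register.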
Faster calculation at the expense of increased main memory usage can be obtained with a variant of what is known as Shanks' giant step/baby step method (see, for example, D. E. Knuth, The Art of Computer Programming, Vol. III: Sorting and Searching. Reading, MA: Addison-Wesley, 1973, p. 9). Here a collection of bit patterns representing the states of the shift register at uniformly spaced clock cycle intervals is stored. Then, given a tap coefficient vector $\mathbf{d}$, the associated shift register state $\mathbf{g}$ is calculated, as was done for the simple shifting method. The shift register representation Y is started in the state $\mathbf{g}$. It is then shifted one clock cycle at a time until its state is found to equal one of the bit patterns stored in the table. The shift associated with $\mathbf{d}$ is then the shift of the table bit pattern less the number of shifts needed to bring Y to that state.
In more detail, this method proceeds as follows. First, a "reasonable" giant step size h is chosen. As will be noted, a small h implies a fast calculation of the shift for each tap pattern, but the cost in memory usage and table setup time grows as h becomes smaller. Therefore, a compromise value of h must be chosen. For the example of a 25-stage shift register, h = 5000 might be chosen. Next, for all integers i such that $i \ge 0$ and $ih \le 2^N - 2$, a hash table is filled with records, each containing the integer ih and the bit pattern $M^{ih}\mathbf{z}^0$ for some specified nonzero initial state $\mathbf{z}^0$. For the example, this means that $v \triangleq \lfloor (2^N - 2)/h \rfloor + 1 = 6711$ bit patterns are calculated and installed in the hash table, where $\lfloor x \rfloor$ denotes the greatest integer not greater than x. If $E = M^h$ is initially calculated, then the hash table building takes v matrix multiplications. Once $\mathbf{w} = M^{ih}\mathbf{z}^0$ has been calculated for some value of i, $M^{(i+1)h}\mathbf{z}^0$ is simply $E\mathbf{w}$.
As in the case of the simple shifting method, let Y denote an efficient representation of the shift register, let $\mathbf{d}$ denote a tap pattern, and let $\mathbf{g}$ denote the first N bits of the bit stream corresponding to $\mathbf{d}$ when the initial state of Y is $\mathbf{z}^0$. To find the shift t associated with $\mathbf{d}$, Y is initialized to the state $\mathbf{g}$. Also, counter r is initialized to zero. Then t is found as follows:
1. Look up the bit pattern that represents the state of Y in the hash table. If this bit pattern, which equals $M^{ih}\mathbf{z}^0$ for some i, is in the hash table, set t = ih - r and exit from the loop; otherwise, go to step 2.
2. Shift Y by one clock cycle and increment counter r by 1. Go to step 1.
Note that the loop will never be executed more than h times, and, on average, it is executed h/2 times for each calculation of t. That is, the time complexity of each t calculation is O(h). This results in a significant savings in CPU time for each t calculation relative to the simple shifting method. Since $v \approx 2^N/h$ bit patterns must be stored in the hash table, the memory complexity in terms of N and h is $O(2^N/h)$. The time required to calculate the v bit patterns in the hash table is also proportional to v and is therefore $O(2^N/h)$. Clearly, the value of h must be chosen based on N and the number of t calculations to minimize the total time (setup time plus t calculation time) while keeping the memory usage within reasonable limits.
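The two-step loop above can be sketched in Python as follows. The names are hypothetical and two simplifications are assumed: the table is filled by clocking the register h times between entries instead of multiplying by $E = M^h$, which stores the same states but is slower to set up, and the result ih - r is reduced modulo the period so that a search that wraps past the end of the sequence onto the entry for clock cycle 0 still returns a shift in the range 0 to $2^N - 2$.

```python
def parity(x):
    return bin(x).count("1") & 1


def lfsr_step(state, n, fb_mask):
    """One clock pulse; bit i of `state` is the output of cell i."""
    return (state >> 1) | (parity(state & fb_mask) << (n - 1))


def first_bits(d, n, fb_mask, z0=1):
    """First N bits of the tapped sequence, packed with bit k holding g_k."""
    g, state = 0, z0
    for k in range(n):
        g |= parity(state & d) << k
        state = lfsr_step(state, n, fb_mask)
    return g


def build_table(n, fb_mask, h, z0=1):
    """Hash table mapping the state at clock cycle ih to ih, for ih <= 2**N - 2."""
    table, state = {}, z0
    for ih in range(0, 2**n - 1, h):
        table[state] = ih
        for _ in range(h):               # plain stepping in place of E = M**h
            state = lfsr_step(state, n, fb_mask)
    return table


def giant_baby_shift(d, n, fb_mask, table, z0=1):
    """Shift of the nonzero tap pattern d, using at most h baby steps."""
    if d == 0:
        raise ValueError("the all-zero tap pattern has no shift")
    state, r = first_bits(d, n, fb_mask, z0), 0
    while state not in table:            # step 1: look up the current state
        state = lfsr_step(state, n, fb_mask)   # step 2: shift and count
        r += 1
    return (table[state] - r) % (2**n - 1)     # t = ih - r


N, FB, H = 7, 0b0001001, 10
table = build_table(N, FB, H)
print(len(table))                                  # 13 table entries
print(giant_baby_shift(0b0001001, N, FB, table))   # 7
print((2**25 - 2) // 5000 + 1)                     # 6711, the value of v in the text
```

Choosing a larger `H` shrinks the table and lengthens each lookup; the 7-stage register here is only for demonstration.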
The tap-calculating procedure described above has been implemented in the C programming language (B. W. Kernighan and D. M. Ritchie, The C Programming Language, Prentice-Hall, Inc., 1978). For the shift registers of interest (i.e., those having fewer than 30 stages), it was found that the giant step/baby step algorithm was adequate for tap shift calculations. Even after the floating-point intensive code for tap cost calculation was optimized for efficient execution on a vector processor machine, it was found that the CPU time bottleneck was the solution search (optimization) step, not the tap shift calculation. For shift registers larger than 30 stages, other algorithms may be needed for tap shift calculation.
A listing of the program appears in APPENDIX A. The user inputs the number of cells in the register, the feedback pattern, and the number of bit streams required. Also input are the maximum and minimum allowable loading on a cell, the maximum and minimum allowed fan-in, and the maximum allowed shift variation. In minimizing the cost function C, weighting factors are assigned to the $\mathbf{u}$, $\mathbf{l}$, and $\mathbf{f}$ components, which are also specified by the user. In addition, the user specifies the coefficients of a penalty function used when a potential solution falls outside the specified ranges.
The program was used to derive the tap patterns for 32 bit streams as generated from a 25-stage shift register. Since a 25-stage shift register produces a PRBS of maximal length 33,554,431 ($2^{25} - 1$), the time separation between bit streams is approximately one million clock cycles. The solution is shown in TABLE I. This solution search was run with the maximum and minimum fan-in set equal to three ($\underline{F} = \overline{F} = 3$). The minimum cell load ($\underline{L}$) was set at three and the maximum cell load ($\overline{L}$) was set at four. The maximum allowed shift variation was 0.4. The resulting solution had an average load per shift register cell of 3.8, with four cells having three connections and 21 cells having four connections. The actual maximum shift variation for this solution (maximum $u_i$ from condition 1 of the problem statement hereinabove) was 0.32. Each tap pattern line in TABLE I indicates the cells to be tapped to produce the bit stream having the particular shift. As an example, the first bit stream is generated from the modulo 2 combination of the outputs of the cells 9, 13, and 21, the cells being numbered from 0 to 24 from right to left, as hereinabove.
TABLE I
Tap Pattern Solution

number of bit streams: 32
feedback cells: 0 and 3
feedback pattern: 0000000000000000000001001
sequence length: 33,554,431
fan-in = 3, all bit streams
maximum number of taps on a cell: 4
minimum number of taps on a cell: 3
average number of taps per cell: 3.8
maximum shift variation: 0.32

Tap Pattern    Shift

4444444444344444444433443    loading pattern

By using the techniques of the present invention in a CMOS implementation of a neural network to generate plural uncorrelated analog noise sources for annealing, the substantial cost in silicon area of the LFSR can be amortized over many neurons while the incremental cost per neuron is limited to some simple combinatorial logic. The function of the low-pass filter could also be served by the frequency response of the neuron itself, thereby saving the area cost associated with the filter. In addition to the area advantage, a single LFSR avoids the control and synchronization problems of multiple LFSRs. In the example of TABLE I, the maximal length sequence becomes 32 separate bit streams with an average separation of about one million clock cycles. By clocking the LFSR at 100 MHz, each bit stream is separated from repetition by its nearest neighbor by approximately 10 milliseconds. Each bit stream is low-pass filtered to about 5 MHz. An anneal cycle of 10 microseconds would therefore have about 50 analog zero crossings. In a network containing 32 neurons, each neuron would see noise that would not be repeated anywhere else in the network for about 1000 anneal cycles, which is a substantially greater separation than is required. This same LFSR could conceivably be used for 1000 times as many neurons. For neural network applications, therefore, shift spacing is less important for design than fan-in or fan-out considerations.
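The figures quoted for the TABLE I example follow from simple arithmetic, reproduced in the Python sketch below. The variable names are illustrative; the clock rate, filter bandwidth, and anneal time are the values given in the text.

```python
N_CELLS, N_STREAMS = 25, 32
CLOCK_HZ = 100e6          # LFSR clock rate
FILTER_HZ = 5e6           # low-pass bandwidth of each noise source
ANNEAL_S = 10e-6          # duration of one anneal cycle

period = 2**N_CELLS - 1                      # 33,554,431 clock cycles
sep_cycles = period / N_STREAMS              # about 1.05 million cycles
sep_seconds = sep_cycles / CLOCK_HZ          # about 10.5 milliseconds
anneals_before_repeat = sep_seconds / ANNEAL_S   # about 1050 anneal cycles
crossings_per_anneal = FILTER_HZ * ANNEAL_S      # about 50 zero crossings

# Cell loading implied by the loading pattern of TABLE I.
loading = "4444444444344444444433443"
total_taps = sum(int(c) for c in loading)    # 96 = 32 streams x fan-in of 3

print(period, round(sep_cycles), round(sep_seconds * 1e3, 2))
print(round(anneals_before_repeat), round(crossings_per_anneal))
print(total_taps, round(total_taps / N_CELLS, 2))
```

The tap count of 96 matches 32 bit streams with a fan-in of three each, and the exact average load of 3.84 taps per cell rounds to the 3.8 quoted in the table.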
The hardware advantage of the present invention is particularly important when such large numbers of bit streams need to be generated. In the present invention, for a given relative shift spacing between the bit streams, the number of cells in the shift register grows as log(P ), where P is the number of bit streams. In contrast, the hardware requirement for prior art methods grows directly with P. In future generations of neural network chips, it is envisioned that hundreds and perhaps thousands of bit streams will be required. Accordingly, the hardware advantage of the present invention will be significant.
Although described in connection with providing noise sources for stochastic neural networks, the present invention has other applications. For example, bit-error rate testers use a pseudo-random bit stream to test communication systems at high speed. The speed is limited by the rate at which the shift register can be clocked. By providing multiple uncorrelated noise sources and then multiplexing them, a new pseudo-random bit stream at higher speed can be provided because a multiplexer can operate faster than a shift register in a given technology. Alternatively, the bit-error rate tester can provide multiple outputs for parallel testing, which is generally not available in current equipment.
The above-described embodiment is illustrative of the principles of the present invention. Other embodiments could be devised by those skilled in the art without departing from the spirit and scope of the present invention.

Claims

What is claimed is:
1. A generator of plural pseudo-random bit streams comprising a single maximal length linear feedback shift register having a plurality of cells; for each bit stream, means for modulo 2 combining the tapped outputs of selected ones of said cells to produce the bit stream, the cell outputs selected to be tapped and combined being determined so that the bit streams are separated by predetermined shifts.
2. A generator of plural pseudo-random bit streams in accordance with claim 1 further comprising means for low-pass filtering each of the plural bit streams to produce plural sources of essentially Gaussian noise.
3. A generator of plural pseudo-random bit streams comprising a single maximal length linear feedback shift register having a plurality of cells; for each bit stream, means for modulo 2 combining the tapped outputs of selected ones of said cells to produce the bit stream, the cell outputs selected to be tapped and combined being determined so that the maximum allowed shift variation between bit streams, the maximum and minimum allowed fan-in, and the maximum and minimum cell load are within predetermined limits.
4. A generator of plural pseudo-random bit streams in accordance with claim 3 further comprising means for low-pass filtering each of the plural bit streams to produce plural sources of essentially Gaussian noise.
5. A stochastic element for a neural network comprising a single maximal length linear feedback shift register having a plurality of cells; means for producing plural pseudo-random bit streams from said single shift register by modulo 2 combining for each bit stream the tapped outputs of selected ones of said cells, the cell outputs selected to be tapped and combined being determined so that the bit streams are separated by predetermined shifts.
6. A stochastic element for a neural network in accordance with claim 5 further comprising means for low-pass filtering each of the plural bit streams to produce plural sources of essentially Gaussian noise.
7. A stochastic element for a neural network comprising a single maximal length linear feedback shift register having a plurality of cells; means for producing plural pseudo-random bit streams from said single shift register by modulo 2 combining for each bit stream the tapped outputs of selected ones of said cells, the cell outputs selected to be tapped and combined being determined so that the maximum allowed shift variation between bit streams, the maximum and minimum allowed fan-in, and the maximum and minimum cell load are within predetermined limits.
8. A stochastic element for a neural network in accordance with claim 7 further comprising means for low-pass filtering each of the plural bit streams to produce plural sources of essentially Gaussian noise.
PCT/US1990/004407 1989-12-21 1990-08-07 Generator of multiple uncorrelated noise sources WO1991010182A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US45476389A 1989-12-21 1989-12-21
US454,763 1989-12-21

Publications (1)

Publication Number Publication Date
WO1991010182A1 (en) 1991-07-11

Family

ID=23805984

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1990/004407 WO1991010182A1 (en) 1989-12-21 1990-08-07 Generator of multiple uncorrelated noise sources

Country Status (1)

Country Link
WO (1) WO1991010182A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2353155A (en) * 1999-08-05 2001-02-14 Mitsubishi Electric Inf Tech A random binary signal generator with a narrowed autocorrelation function
EP1242859A1 (en) * 1999-11-23 2002-09-25 Mentor Graphics Corporation Method for synthesizing linear finite state machines
US7302624B2 (en) 2003-02-13 2007-11-27 Janusz Rajski Adaptive fault diagnosis of compressed test responses
US7370254B2 (en) 2003-02-13 2008-05-06 Janusz Rajski Compressing test responses using a compactor
US7437640B2 (en) 2003-02-13 2008-10-14 Janusz Rajski Fault diagnosis of compressed test responses having one or more unknown states
US7509550B2 (en) 2003-02-13 2009-03-24 Janusz Rajski Fault diagnosis of compressed test responses
US7818644B2 (en) 2006-02-17 2010-10-19 Janusz Rajski Multi-stage test response compactors
US9134370B2 (en) 1999-11-23 2015-09-15 Mentor Graphics Corporation Continuous application and decompression of test patterns and selective compaction of test responses
US9664739B2 (en) 1999-11-23 2017-05-30 Mentor Graphics Corporation Continuous application and decompression of test patterns and selective compaction of test responses
WO2020025253A1 (en) * 2018-07-31 2020-02-06 Telefonaktiebolaget Lm Ericsson (Publ) Structure, method, transmitter, transceiver and access point suitable for low-complexity implementation
WO2020025252A1 (en) * 2018-07-31 2020-02-06 Telefonaktiebolaget Lm Ericsson (Publ) Method, transmitter, structure, transceiver and access point for provision of multicarrier on-off keying signal
US11281431B2 (en) * 2019-01-24 2022-03-22 Fujitsu Limited Random number generating circuit and semiconductor apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3811038A (en) * 1971-09-15 1974-05-14 Int Computers Ltd Pseudo-random number generators
US3881099A (en) * 1972-12-15 1975-04-29 Lannionnais Electronique Pseudo-random binary sequence generator
US4325129A (en) * 1980-05-01 1982-04-13 Motorola Inc. Non-linear logic module for increasing complexity of bit sequences
US4748576A (en) * 1984-02-06 1988-05-31 U.S. Philips Corporation Pseudo-random binary sequence generators

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7145933B1 (en) 1999-08-05 2006-12-05 Mitsubishi Denki Kabushiki Kaisha Method and apparatus for generating random signals
GB2353155A (en) * 1999-08-05 2001-02-14 Mitsubishi Electric Inf Tech A random binary signal generator with a narrowed autocorrelation function
EP2144134A1 (en) * 1999-11-23 2010-01-13 Mentor Graphics Corporation Method for synthesizing linear finite state machines
EP1242859A1 (en) * 1999-11-23 2002-09-25 Mentor Graphics Corporation Method for synthesizing linear finite state machines
EP1242859A4 (en) * 1999-11-23 2006-01-11 Mentor Graphics Corp Method for synthesizing linear finite state machines
US10234506B2 (en) 1999-11-23 2019-03-19 Mentor Graphics Corporation Continuous application and decompression of test patterns and selective compaction of test responses
US9664739B2 (en) 1999-11-23 2017-05-30 Mentor Graphics Corporation Continuous application and decompression of test patterns and selective compaction of test responses
US9134370B2 (en) 1999-11-23 2015-09-15 Mentor Graphics Corporation Continuous application and decompression of test patterns and selective compaction of test responses
US7509550B2 (en) 2003-02-13 2009-03-24 Janusz Rajski Fault diagnosis of compressed test responses
US7302624B2 (en) 2003-02-13 2007-11-27 Janusz Rajski Adaptive fault diagnosis of compressed test responses
US7743302B2 (en) 2003-02-13 2010-06-22 Janusz Rajski Compressing test responses using a compactor
US7370254B2 (en) 2003-02-13 2008-05-06 Janusz Rajski Compressing test responses using a compactor
US7437640B2 (en) 2003-02-13 2008-10-14 Janusz Rajski Fault diagnosis of compressed test responses having one or more unknown states
US8914694B2 (en) 2006-02-17 2014-12-16 Mentor Graphics Corporation On-chip comparison and response collection tools and techniques
US8418007B2 (en) 2006-02-17 2013-04-09 Mentor Graphics Corporation On-chip comparison and response collection tools and techniques
US9250287B2 (en) 2006-02-17 2016-02-02 Mentor Graphics Corporation On-chip comparison and response collection tools and techniques
US7913137B2 (en) 2006-02-17 2011-03-22 Mentor Graphics Corporation On-chip comparison and response collection tools and techniques
US9778316B2 (en) 2006-02-17 2017-10-03 Mentor Graphics Corporation Multi-stage test response compactors
US10120024B2 (en) 2006-02-17 2018-11-06 Mentor Graphics Corporation Multi-stage test response compactors
US7818644B2 (en) 2006-02-17 2010-10-19 Janusz Rajski Multi-stage test response compactors
WO2020025252A1 (en) * 2018-07-31 2020-02-06 Telefonaktiebolaget Lm Ericsson (Publ) Method, transmitter, structure, transceiver and access point for provision of multicarrier on-off keying signal
WO2020025253A1 (en) * 2018-07-31 2020-02-06 Telefonaktiebolaget Lm Ericsson (Publ) Structure, method, transmitter, transceiver and access point suitable for low-complexity implementation
KR20210028730A (en) * 2018-07-31 2021-03-12 텔레호낙티에볼라게트 엘엠 에릭슨(피유비엘) Methods, transmitters, structures, transceivers and access points for provisioning multicarrier on-off keying signals
JP2021532666A (en) * 2018-07-31 2021-11-25 テレフオンアクチーボラゲット エルエム エリクソン(パブル) Structures, methods, transmitters, transceivers and access points suitable for low complexity implementations
RU2761280C1 (en) * 2018-07-31 2021-12-06 Телефонактиеболагет Лм Эрикссон (Пабл) Structure, method, transmission device, receiver-transmission device and access point suitable for implementation with low difficulty
US11362869B2 (en) 2018-07-31 2022-06-14 Telefonaktiebolaget Lm Ericsson (Publ) Method, transmitter, structure, transceiver and access point for provision of multi-carrier on-off keying signal
US11398935B2 (en) 2018-07-31 2022-07-26 Telefonaktiebolaget Lm Ericsson (Publ) Structure, method, transmitter, transceiver and access point suitable for low-complexity implementation
KR102463555B1 (en) * 2018-07-31 2022-11-07 텔레호낙티에볼라게트 엘엠 에릭슨(피유비엘) Method, transmitter, structure, transceiver and access point for provision of multicarrier on-off keying signal
US11750425B2 (en) 2018-07-31 2023-09-05 Telefonaktiebolaget Lm Ericsson (Publ) Method, transmitter, structure, transceiver and access point for provision of multi-carrier on-off keying signal
EP4270173A3 (en) * 2018-07-31 2024-01-10 Telefonaktiebolaget LM Ericsson (publ) Method, transmitter, transceiver and access point for provision of multicarrier on-off keying signal
US11281431B2 (en) * 2019-01-24 2022-03-22 Fujitsu Limited Random number generating circuit and semiconductor apparatus

Similar Documents

Publication Publication Date Title
Parker et al. A VLSI-Efficient Technique for Generating Multiple Uncorrelated Noise Sources and Its Application to Stochastic
Hortensius et al. Parallel random number generation for VLSI systems using cellular automata
JP2598866B2 (en) Circuit for generating a controllable weighted binary sequence
Hortensius et al. Cellular automata-based pseudorandom number generators for built-in self-test
Serra et al. The analysis of one-dimensional linear cellular automata and their aliasing properties
US4691291A (en) Random sequence generators
Chattopadhyay et al. Highly regular, modular, and cascadable design of cellular automata-based pattern classifier
US8023649B2 (en) Method and apparatus for cellular automata based generation of pseudorandom sequences with controllable period
Tsalides et al. Pseudorandom number generators for VLSI systems based on linear cellular automata
EP1782181A1 (en) Method and apparatus for generating random data
WO1991010182A1 (en) Generator of multiple uncorrelated noise sources
WO2010034326A1 (en) State machine and generator for generating a description of a state machine feedback function
US6560727B1 (en) Bit error rate tester using fast parallel generation of linear recurring sequences
Baker et al. The hypergeometric distribution as a more accurate model for stochastic computing
US7263540B1 (en) Method of generating multiple random numbers
US6985918B2 (en) Random number generators implemented with cellular array
Kokolakis et al. Comparison between cellular automata and linear feedback shift registers based pseudo-random number generators
JP2940517B2 (en) Nonlinear feedback shift register circuit
Ackermann et al. Parallel random number generator for inexpensive configurable hardware cells
Goncu et al. Cellular automata with random memory and its implementations
US4998263A (en) Generation of trigger signals
Spencer Pseudorandom Bit Generators from Enhanced Cellular Automata.
Hortensius et al. Importance sampling for Ising computers using one-dimensional cellular automata
US6910057B2 (en) Truth table candidate reduction for cellular automata based random number generators
PV et al. Design and implementation of efficient stochastic number generator

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB IT LU NL SE