US20070157166A1 - System, method and software for static and dynamic programming and configuration of an adaptive computing architecture

Info

Publication number
US20070157166A1
Authority
US
United States
Prior art keywords
construct
data
module
task
node
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/707,301
Inventor
Cameron Stevens
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
QST Holdings LLC
Original Assignee
QST Holdings LLC

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/76: Architectures of general purpose stored program computers
    • G06F15/78: Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867: Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture

Definitions

  • the present invention relates, in general, to programming of integrated circuits and systems for particular applications, and more particularly, to a system, method and software for static and dynamic programming and configuration of an adaptive computing integrated circuit architecture.
  • the related application discloses a new form or type of integrated circuit, referred to as an adaptive computing engine (“ACE”) or adaptive computing machine (“ACM”), which is readily reconfigurable, in real time, and is capable of having corresponding, multiple modes of operation.
  • the ACM is a new and innovative hardware platform suitable for digital signal processing, Telematics, and other applications where small hardware footprint, low power consumption and high performance characteristics are highly desirable.
  • the ACE architecture for adaptive or reconfigurable computing includes a plurality of different or heterogeneous computational elements coupled to an interconnection network.
  • the plurality of heterogeneous computational elements include corresponding computational elements having fixed and differing architectures, such as fixed architectures for different functions such as memory, addition, multiplication, complex multiplication, subtraction, configuration, reconfiguration, control, input, output, and field programmability.
  • the interconnection network is operative in real time to adapt (configure and reconfigure) the plurality of heterogeneous computational elements for a plurality of different functional modes, including linear algorithmic operations, non-linear algorithmic operations, finite state machine operations, memory operations, and bit-level manipulations.
  • interconnection network and other ACE hardware need to be configured and generally also reconfigured, either statically or dynamically, to perform any given application or algorithm.
  • the ACE architecture also utilizes a data flow model for processing. More particularly, input operand data will be processed to produce output data (without other intervention such as interrupt signals, instruction fetching, etc.), whenever the input data is available and an output port (register or buffer) is available for any resulting output data. Controlling the data flow processing to implement an algorithm, however, presents unusual difficulties, including for controlling data flow in the communication and control algorithms used in a wide variety of applications, such as wideband CDMA (“WCDMA”) and cdma2000.
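  • The firing rule described above (process whenever input data is available and an output port is available) can be illustrated with a minimal software sketch. This is an illustration only, not the hardware mechanism: the `Port` class, its `capacity` parameter, and the `try_fire` function are hypothetical names introduced here.

```python
from collections import deque


class Port:
    """A bounded FIFO standing in for an input or output register/buffer."""

    def __init__(self, capacity=1):
        self.capacity = capacity   # assumed buffer depth (hypothetical)
        self.items = deque()

    def has_data(self):
        return len(self.items) > 0

    def has_room(self):
        return len(self.items) < self.capacity


def try_fire(inputs, output, operation):
    """Run `operation` once if every input holds data and the output has room.

    Returns True if the operation fired, False if it had to wait.
    """
    if not all(p.has_data() for p in inputs) or not output.has_room():
        return False
    operands = [p.items.popleft() for p in inputs]
    output.items.append(operation(*operands))
    return True


def multiply(x, y):
    return x * y


a, b, out = Port(), Port(), Port(capacity=1)

a.items.append(3)
fired_early = try_fire([a, b], out, multiply)    # b is empty, so it waits

b.items.append(4)
fired_ready = try_fire([a, b], out, multiply)    # both operands present

a.items.append(5)
b.items.append(6)
fired_blocked = try_fire([a, b], out, multiply)  # output port is still full
```

    No interrupt or instruction fetch appears in the sketch: the only conditions examined are operand availability and output space, which is the sense in which the processing is driven by data flow.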
  • the present invention provides a plurality of program constructs which enable the static or dynamic programming and configuration of an adaptive computing device, such as an ACE (ACM) having a plurality of heterogeneous nodes coupled through a matrix interconnect network.
  • a first program construct such as a “module”, having a correspondence to a selected node of the plurality of heterogeneous nodes
  • a second program construct such as a “process”, having a correspondence to an executable task of the selected node, and having at least one firing condition capable of determining a commencement of the executable task of the selected node;
  • a third program construct such as an “inpipe”, having a correspondence to at least one input port coupling the selected node to the matrix interconnect network for input data to be consumed by the executable task;
  • a fourth program construct such as an “outpipe”, having a correspondence to at least one output port coupling the selected node to the matrix interconnect network for output data to be produced by the executable task;
  • a fifth program construct such as a “notify” routine, having a correspondence to a notification of creation of output data
  • a sixth program construct such as a “release” routine, having a correspondence to a notification of consumption of input data, such that the fifth program construct and the sixth program construct provide for synchronization of production of output data with consumption of input data
  • a seventh program construct such as a “ready” routine, having a correspondence to a task manager of the selected node to provide for commencement of the executable task, which also provides initialization of a producer count table of the task manager or a consumer count table of the task manager within the selected node;
  • an eighth program construct such as a “link” routine, linking the fourth program construct to the third program construct, the eighth program construct corresponding to a selected configuration of the matrix interconnection network providing a communication path from a selected output port to a selected input port.
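  • The eight constructs listed above may be easier to follow as a toy software model. The sketch below is an assumption-laden illustration, not the syntax or semantics of the actual constructs: every class, function, and attribute name is hypothetical, the buffer `capacity` is invented, and the task manager's count tables are reduced to a single `free_slots` counter.

```python
class Module:
    """'module': stands in for one node; here it only collects results."""
    def __init__(self, name):
        self.name = name
        self.results = []


class InPipe:
    """'inpipe': an input port with a small buffer (capacity is assumed)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.source = None          # set by link()


class OutPipe:
    """'outpipe': an output port that tracks free space at its consumer."""
    def __init__(self):
        self.sink = None            # set by link()
        self.free_slots = 0         # set by ready()


def link(outpipe, inpipe):
    """'link': record a path from one output port to one input port."""
    outpipe.sink, inpipe.source = inpipe, outpipe


def ready(outpipe):
    """'ready': initialise the count used to synchronise the two ends."""
    outpipe.free_slots = outpipe.sink.capacity


def notify(outpipe, value):
    """'notify': announce newly created output data to the consumer."""
    outpipe.sink.buffer.append(value)
    outpipe.free_slots -= 1


def release(inpipe):
    """'release': announce that input data was consumed, freeing a slot."""
    value = inpipe.buffer.pop(0)
    inpipe.source.free_slots += 1
    return value


class Process:
    """'process': a task plus the firing condition that lets it start."""
    def __init__(self, module, body, inpipe=None, outpipe=None):
        self.module, self.body = module, body
        self.inpipe, self.outpipe = inpipe, outpipe

    def can_fire(self):
        has_input = self.inpipe is None or bool(self.inpipe.buffer)
        has_room = self.outpipe is None or self.outpipe.free_slots > 0
        return has_input and has_room

    def fire(self, *args):
        if not self.can_fire():
            return False
        operands = (release(self.inpipe),) if self.inpipe is not None else args
        result = self.body(*operands)
        if self.outpipe is not None:
            notify(self.outpipe, result)
        else:
            self.module.results.append(result)
        return True


producer_node, consumer_node = Module("node_a"), Module("node_b")
out_port, in_port = OutPipe(), InPipe(capacity=1)
link(out_port, in_port)
ready(out_port)
scale = Process(producer_node, lambda x: 2 * x, outpipe=out_port)
store = Process(consumer_node, lambda x: x + 1, inpipe=in_port)

first = scale.fire(5)    # fires: 10 is written into the consumer's buffer
second = scale.fire(6)   # stalls: the single buffer slot is still occupied
third = store.fire()     # fires: consumes 10 and releases the slot
fourth = scale.fire(6)   # fires now that the slot is free again
```

    The point of the sketch is the pairing of `notify` and `release`: the producer stalls until the consumer reports that earlier data has been consumed, which is the synchronization role the fifth and sixth constructs are described as providing.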
  • FIG. 1 is a block diagram illustrating an exemplary first apparatus embodiment in accordance with the invention of the related application.
  • FIG. 2 is a schematic diagram illustrating an exemplary data flow graph.
  • FIG. 3 is a block diagram illustrating a reconfigurable matrix (or node), a plurality of computation units, and a plurality of computational elements.
  • FIG. 4 is a block diagram illustrating, in greater detail, a computational unit of a reconfigurable matrix.
  • FIGS. 5A through 5E are block diagrams illustrating, in detail, exemplary fixed and specific computational elements, forming computational units.
  • FIG. 6 is a block diagram illustrating, in detail, an exemplary multi-function adaptive computational unit having a plurality of different, fixed computational elements.
  • FIG. 7 is a block diagram illustrating, in detail, an adaptive logic processor computational unit having a plurality of fixed computational elements.
  • FIG. 8 is a block diagram illustrating, in greater detail, an exemplary core cell of an adaptive logic processor computational unit with a fixed computational element.
  • FIG. 9 is a block diagram illustrating, in greater detail, an exemplary fixed computational element of a core cell of an adaptive logic processor computational unit.
  • FIG. 10 is a block diagram illustrating a second exemplary apparatus embodiment in accordance with the invention of the related application.
  • FIG. 11 is a block diagram illustrating an exemplary first system embodiment in accordance with the invention of the related application.
  • FIG. 12 is a block diagram illustrating an exemplary node quadrant with routing elements.
  • FIG. 13 is a block diagram illustrating exemplary network interconnections.
  • FIG. 14 is a block diagram illustrating an exemplary data structure embodiment.
  • FIG. 15 is a block diagram illustrating an exemplary second system embodiment 1000 in accordance with the invention of the related application.
  • the present invention provides a system, method and software for programming and configuring an adaptive computing device such as an ACE 100 .
  • the present invention provides such a programming methodology using a series of unique constructs which are capable of being mapped directly to the hardware features of the ACE 100 and which are also capable of configuring the matrix interconnect network of the ACE 100 for, among other things, the routing of output data and input data.
  • the various program constructs of the present invention have additional features, such as providing synchronization among the various tasks which may be executed within the ACE 100 .
  • A background of an exemplary adaptive computing architecture is provided with reference to FIGS. 1 through 15. Following this background discussion, the present invention is discussed in detail with reference to Examples 1 through 25.
  • FIG. 1 is a block diagram illustrating a first apparatus 100 embodiment in accordance with the invention of the related application.
  • the apparatus 100 referred to herein as an adaptive computing engine (“ACE”) 100 , is preferably embodied as an integrated circuit, or as a portion of an integrated circuit having other, additional components.
  • the ACE 100 includes one or more reconfigurable matrices (or nodes) 150 , such as matrices 150 A through 150 N as illustrated, and a matrix interconnection network 110 .
  • one or more of the matrices (nodes) 150 are configured for functionality as a controller 120, while other matrices, such as matrices 150 C and 150 D, are configured for functionality as a memory 140.
  • the various matrices 150 and matrix interconnection network 110 may also be implemented together as fractal subunits, which may be scaled from a few nodes to thousands of nodes.
  • the ACE 100 does not utilize traditional (and typically separate) data, direct memory access (DMA), random access, configuration and instruction busses for signaling and other transmission between and among the reconfigurable matrices 150 , the controller 120 , and the memory 140 , or for other input/output (“I/O”) functionality. Rather, data, control and configuration information are transmitted between and among these matrix 150 elements, utilizing the matrix interconnection network 110 , which may be configured and reconfigured, in real time, to provide any given connection between and among the reconfigurable matrices 150 , including those matrices 150 configured as the controller 120 and the memory 140 , as discussed in greater detail below.
  • the matrices 150 configured to function as memory 140 may be implemented in any desired or preferred way, utilizing computational elements (discussed below) of fixed memory elements, and may be included within the ACE 100 or incorporated within another IC or portion of an IC.
  • the memory 140 is included within the ACE 100, and preferably is comprised of computational elements which are low power consumption random access memory (RAM), but also may be comprised of computational elements of any other form of memory, such as flash, DRAM, SRAM, SDRAM, FRAM, MRAM, ROM, EPROM or E²PROM.
  • the memory 140 preferably includes DMA engines, not separately illustrated.
  • the controller 120 is preferably implemented, using matrices 150 A and 150 B configured as adaptive finite state machines, as a reduced instruction set (“RISC”) processor, controller or other device or IC capable of performing the two types of functionality discussed below. (Alternatively, these functions may be implemented utilizing a conventional RISC or other processor.)
  • the first control functionality, referred to as “kernel” control, is illustrated as kernel controller (“KARC”) of matrix 150 A, and the second control functionality, referred to as “matrix” control, is illustrated as matrix controller (“MARC”) of matrix 150 B.
  • the kernel and matrix control functions of the controller 120 are explained in greater detail below, with reference to the configurability and reconfigurability of the various matrices 150 , and with reference to the exemplary form of combined data, configuration and control information referred to herein as a “silverware” module.
  • the kernel controller is also referred to as a “K-node”, discussed in greater detail below with reference to FIGS. 10 and 11 .
  • the matrix interconnection network (“MIN”) 110 of FIG. 1 and its subset interconnection networks separately illustrated in FIGS. 3 and 4 (Boolean interconnection network 210 , data interconnection network 240 , and interconnect 220 ), individually, collectively and generally referred to herein as “interconnect”, “interconnection(s)” or “interconnection network(s)”, may be implemented generally as known in the art, such as utilizing FPGA interconnection networks or switching fabrics, albeit in a considerably more varied fashion.
  • the various interconnection networks are implemented as described, for example, in U.S. Pat. Nos. 5,218,240, 5,336,950, 5,245,227, and 5,144,166, and also as discussed below and as illustrated with reference to FIGS.
  • the various interconnection networks ( 110 , 210 , 240 and 220 ) provide selectable or switchable data, input, output, control and configuration paths, between and among the controller 120 , the memory 140 , the various matrices 150 , and the computational units 200 and computational elements 250 , in lieu of any form of traditional or separate input/output busses, data busses, DMA, RAM, configuration and instruction busses.
  • the various interconnection networks are implemented as described below with reference to FIGS. 12 and 13 , using various combinations of routing elements, such as token rings or arbiters, and multiplexers, at varying levels within the system and apparatus embodiments of the invention of the related application.
  • while any given level of switching or selecting operation of or within the various interconnection networks (110, 210, 240 and 220) may be implemented as known in the art, the combinations of routing elements and multiplexing elements, the use of different routing elements and multiplexing elements at differing levels within the system, and the design and layout of the various interconnection networks (110, 210, 240 and 220), are new and novel, as discussed in greater detail below.
  • varying levels of interconnection are provided to correspond to the varying levels of the matrices 150 , the computational units 200 , and the computational elements 250 , discussed below.
  • the matrix interconnection network 110 is considerably more limited and less “rich”, with lesser connection capability in a given area, to reduce capacitance and increase speed of operation.
  • the interconnection network ( 210 , 220 and 240 ) may be considerably more dense and rich, to provide greater adaptation and reconfiguration capability within a narrow or close locality of reference.
  • the various matrices or nodes 150 are reconfigurable and heterogeneous, namely, in general, and depending upon the desired configuration: reconfigurable matrix 150 A is generally different from reconfigurable matrices 150 B through 150 N; reconfigurable matrix 150 B is generally different from reconfigurable matrices 150 A and 150 C through 150 N; reconfigurable matrix 150 C is generally different from reconfigurable matrices 150 A, 150 B and 150 D through 150 N, and so on.
  • the various reconfigurable matrices 150 each generally contain a different or varied mix of adaptive and reconfigurable computational (or computation) units ( 200 ); the computational units 200 , in turn, generally contain a different or varied mix of fixed, application specific computational elements ( 250 ), discussed in greater detail below with reference to FIGS.
  • the various matrices 150 may be connected, configured and reconfigured at a higher level, with respect to each of the other matrices 150 , through the matrix interconnection network 110 , also as discussed in greater detail below.
  • the first novel concepts concern the adaptive and reconfigurable use of application specific, dedicated or fixed hardware units (computational elements 250), and the selection of particular functions for acceleration, to be included within these application specific, dedicated or fixed hardware units (computational elements 250) within the computational units 200 (FIG. 3) of the matrices 150, such as pluralities of multipliers, complex multipliers, and adders, each of which is designed for optimal execution of corresponding multiplication, complex multiplication, and addition functions.
  • the functions for acceleration are selected based upon power consumption. For example, for a given application such as mobile communication, corresponding C (C# or C++) or other code may be analyzed for power consumption.
  • Such empirical analysis may reveal, for example, that a small portion of such code, such as 10%, actually consumes 90% of the operating power when executed.
  • this small portion of code is selected for acceleration within certain types of the reconfigurable matrices 150 , with the remaining code, for example, adapted to run within matrices 150 configured as controller 120 .
  • Additional code may also be selected for acceleration, resulting in an optimization of power consumption by the ACE 100 , up to any potential trade-off resulting from design or operational complexity.
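  • The selection of code for acceleration based on power consumption might be sketched as a simple greedy ranking. This is a hypothetical illustration: the source does not specify a selection procedure, and the function name, the region names, the power figures, and the 90% target below are all invented for the example.

```python
def select_for_acceleration(power_by_region, target_percent=90):
    """Pick the most power-hungry code regions until the chosen set accounts
    for at least `target_percent` of the total measured power.

    `power_by_region` maps a region name to its measured power (integer
    units, e.g. milliwatts). Returns (chosen region names, power covered).
    """
    total = sum(power_by_region.values())
    ranked = sorted(power_by_region.items(), key=lambda kv: kv[1], reverse=True)
    chosen, covered = [], 0
    for name, power in ranked:
        if covered * 100 >= target_percent * total:
            break
        chosen.append(name)
        covered += power
    return chosen, covered


# Invented profile: a few regions dominate the power budget.
profile = {
    "correlator": 60,
    "fir_filter": 30,
    "user_interface": 4,
    "protocol_stack": 3,
    "logging": 2,
    "misc": 1,
}
accelerated, covered = select_for_acceleration(profile)
remaining = [name for name in profile if name not in accelerated]
```

    In this invented profile the two selected regions would be mapped onto reconfigurable matrices, while the `remaining` regions correspond to the code described above as running within matrices configured as the controller.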
  • other functionality such as control code, may be accelerated within matrices 150 when configured as finite state machines.
  • the ACE 100 utilizes a data flow model for all processes and computations.
  • Algorithms or other functions selected for acceleration may be converted into a form which may be represented as a “data flow graph” (“DFG”).
  • A schematic diagram of an exemplary data flow graph is illustrated in FIG. 2.
  • an algorithm or function useful for CDMA voice coding (QCELP (Qualcomm code excited linear prediction)) is implemented utilizing four multipliers 190 followed by four adders 195 .
  • the algorithms of this data flow graph are then implemented, at any given time, through the configuration and reconfiguration of fixed computational elements ( 250 ), namely, implemented within hardware which has been optimized and configured for efficiency, i.e., a “machine” is configured in real time which is optimized to perform the particular algorithm.
  • four fixed or dedicated multipliers, as computational elements 250 , and four fixed or dedicated adders, also as different computational elements 250 are configured in real time through the interconnect to perform the functions or algorithms of the particular DFG.
  • data which is produced, such as by the multipliers 190 is immediately consumed, such as by adders 195 .
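  • A data flow graph of four multipliers feeding four adders can be mimicked in software with chained generators, in which each product is consumed as soon as it is produced. The exact wiring of FIG. 2 is not reproduced in this text, so the arrangement below (each product added into a running sum) is an assumption, as are the function names.

```python
def multiply_stage(a, b):
    """Stand-in for the multipliers: emit one product per operand pair."""
    for x, y in zip(a, b):
        yield x * y          # each product is handed on as soon as it exists


def add_stage(products, initial=0):
    """Stand-in for the adders: add each incoming product to a running sum."""
    total = initial
    for p in products:       # pulls one product at a time from the multipliers
        total += p
        yield total


# Four operand pairs, matching the four multipliers and four adders.
partial_sums = list(add_stage(multiply_stage([1, 2, 3, 4], [5, 6, 7, 8])))
```

    No intermediate array of products is ever stored: the adder stage pulls each product directly from the multiplier stage, which loosely mirrors the statement that produced data is immediately consumed.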
  • different computational elements ( 250 ) are implemented directly as correspondingly different fixed (or dedicated) application specific hardware, such as dedicated multipliers, complex multipliers, accumulators, arithmetic logic units (ALUs), registers, and adders.
  • utilizing the interconnect (210 and 220), these differing, heterogeneous computational elements (250) may then be adaptively configured, in real time, to perform the selected algorithm, such as the performance of discrete cosine transformations often utilized in mobile communications.
  • four multipliers and four adders will be configured, i.e., connected in real time, to perform the particular algorithm.
  • heterogeneous computational elements are configured and reconfigured, at any given time, to optimally perform a given algorithm or other function.
  • a given instantiation or configuration of computational elements may also remain in place over time, i.e., unchanged, throughout the course of such repetitive calculations.
  • the temporal nature of the ACE 100 architecture should also be noted.
  • a particular configuration may exist within the ACE 100 which has been optimized to perform a given function or implement a particular algorithm.
  • the configuration may be changed, to interconnect other computational elements ( 250 ) or connect the same computational elements 250 differently, for the performance of another function or algorithm.
  • Two important features arise from this temporal reconfigurability.
  • as algorithms may change over time to, for example, implement a new technology standard, the ACE 100 may co-evolve and be reconfigured to implement the new algorithm. For a simplified example, a fifth multiplier and a fifth adder may be incorporated into the DFG of FIG.
  • This temporal reconfigurability of computational elements 250 also illustrates a conceptual distinction utilized herein between adaptation (configuration and reconfiguration), on the one hand, and programming or reprogrammability, on the other hand.
  • Typical programmability utilizes a pre-existing group or set of functions, which may be called in various orders, over time, to implement a particular algorithm.
  • configurability and reconfigurability (or adaptation) includes the additional capability of adding or creating new functions which were previously unavailable or non-existent.
  • the present and related inventions also utilize a tight coupling (or interdigitation) of data and configuration (or other control) information, within one, effectively continuous stream of information.
  • This coupling or commingling of data and configuration information referred to as a “silverware” module, is the subject of a separate, related patent application.
  • this coupling of data and configuration information into one information (or bit) stream helps to enable real time reconfigurability of the ACE 100 , without a need for the (often unused) multiple, overlaying networks of hardware interconnections of the prior art.
  • a particular, first configuration of computational elements at a particular, first period of time, as the hardware to execute a corresponding algorithm during or after that first period of time, may be viewed or conceptualized as a hardware analog of “calling” a subroutine in software which may perform the same algorithm.
  • once the configuration of the computational elements 250 has occurred (i.e., is in place), as directed by the configuration information, the data for use in the algorithm is immediately available as part of the silverware module.
  • the same computational elements may then be reconfigured for a second period of time, as directed by second configuration information, for execution of a second, different algorithm, also utilizing immediately available data.
  • the immediacy of the data, for use in the configured computational elements 250 provides a one or two clock cycle hardware analog to the multiple and separate software steps of determining a memory address and fetching stored data from the addressed registers. This has the further result of additional efficiency, as the configured computational elements may execute, in comparatively few clock cycles, an algorithm which may require orders of magnitude more clock cycles for execution if called as a subroutine in a conventional microprocessor or DSP.
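  • The commingling of configuration and data in a single stream, and the reuse of the same fixed elements for a second algorithm, can be sketched as follows. The record format (`"config"` and `"data"` tuples), the `ELEMENTS` table, and the `run_stream` function are hypothetical; the actual silverware format is the subject of the separate application and is not described here.

```python
import operator

# Fixed computational elements; only their interconnection changes.
ELEMENTS = {"mul": operator.mul, "add": operator.add, "sub": operator.sub}


def run_stream(stream):
    """Consume one interleaved stream of configuration and data words."""
    pipeline, outputs = [], []
    for kind, payload in stream:
        if kind == "config":
            # payload: ordered (element name, constant operand) pairs
            pipeline = [(ELEMENTS[name], k) for name, k in payload]
        elif kind == "data":
            value = payload
            for op, k in pipeline:   # data flows through the current wiring
                value = op(value, k)
            outputs.append(value)
        else:
            raise ValueError(f"unknown word kind: {kind!r}")
    return outputs


stream = [
    ("config", [("mul", 3), ("add", 1)]),   # first algorithm: 3x + 1
    ("data", 2),
    ("data", 5),
    ("config", [("add", 1), ("mul", 3)]),   # same elements rewired: 3(x + 1)
    ("data", 2),
]
results = run_stream(stream)
```

    Because each data word follows its configuration in the same stream, there is no separate address computation or fetch step between configuring the elements and supplying their operands.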
  • This use of silverware modules, as a commingling of data and configuration information, in conjunction with the real time reconfigurability of a plurality of heterogeneous and fixed computational elements 250 to form adaptive, different and heterogeneous computation units 200 and matrices 150 , enables the ACE 100 architecture to have multiple and different modes of operation.
  • the ACE 100 may have various and different operating modes as a cellular or other mobile telephone, a music player, a pager, a personal digital assistant, and other new or existing functionalities.
  • these operating modes may change based upon the physical location of the device; for example, when configured as a CDMA mobile telephone for use in the United States, the ACE 100 may be reconfigured as a GSM mobile telephone for use in Europe.
  • the functions of the controller 120 may be explained: (1) with reference to a silverware module, namely, the tight coupling of data and configuration information within a single stream of information; (2) with reference to multiple potential modes of operation; (3) with reference to the reconfigurable matrices 150; and (4) with reference to the reconfigurable computation units 200 and the computational elements 250 illustrated in FIG. 3.
  • the ACE 100 may be configured or reconfigured to perform a new or additional function, such as an upgrade to a new technology standard or the addition of an entirely new function, such as the addition of a music function to a mobile communication device.
  • Such a silverware module may be stored in the matrices 150 of memory 140 , or may be input from an external (wired or wireless) source through, for example, matrix interconnection network 110 .
  • one of the plurality of matrices 150 is configured to decrypt such a module and verify its validity, for security purposes.
  • the controller 120 through the matrix (KARC) 150 A, checks and verifies that the configuration or reconfiguration may occur without adversely affecting any pre-existing functionality, such as whether the addition of music functionality would adversely affect pre-existing mobile communications functionality.
  • the system requirements for such configuration or reconfiguration are included within the silverware module, for use by the matrix (KARC) 150 A in performing this evaluative function. If the configuration or reconfiguration may occur without such adverse effects, the silverware module is allowed to load into the matrices 150 of memory 140, with the matrix (KARC) 150 A setting up the DMA engines within the matrices 150 C and 150 D of the memory 140 (or other stand-alone DMA engines of a conventional memory). If the configuration or reconfiguration would or may have such adverse effects, the matrix (KARC) 150 A does not allow the new module to be incorporated within the ACE 100. Additional functions of the kernel controller, as a K-node, are discussed in greater detail below.
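  • The evaluative function of the kernel controller can be pictured as an admission check against a resource budget. The text does not say how the evaluation is performed, so modeling it as a comparison of stated requirements against remaining capacity is an assumption, and the function names, resource names, and numbers below are hypothetical.

```python
def admit_module(requirements, capacity, in_use):
    """Return True if every resource the new module needs is still available
    after accounting for what existing functionality already uses."""
    for resource, needed in requirements.items():
        available = capacity.get(resource, 0) - in_use.get(resource, 0)
        if needed > available:
            return False
    return True


def load_module(name, requirements, capacity, in_use, loaded):
    """Load the module only if admission succeeds; otherwise change nothing."""
    if not admit_module(requirements, capacity, in_use):
        return False
    for resource, needed in requirements.items():
        in_use[resource] = in_use.get(resource, 0) + needed
    loaded.append(name)
    return True


# Invented budget for a device already running a telephone function.
capacity = {"multipliers": 8, "memory_kb": 64}
in_use = {"multipliers": 6, "memory_kb": 32}
loaded = ["mobile_phone"]

music_ok = load_module("music_player", {"multipliers": 2, "memory_kb": 16},
                       capacity, in_use, loaded)
video_ok = load_module("video_decoder", {"multipliers": 1},
                       capacity, in_use, loaded)
```

    A rejected module leaves `in_use` and `loaded` untouched, mirroring the statement that a module with adverse effects is not incorporated into the ACE 100.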
  • the matrix (MARC) 150 B manages the scheduling of matrix 150 resources and the timing of any corresponding data, to synchronize any configuration or reconfiguration of the various computational elements 250 and computation units 200 with any corresponding input data and output data.
  • timing information is also included within a silverware module, to allow the matrix (MARC) 150 B through the various interconnection networks to direct a reconfiguration of the various matrices 150 in time, and preferably just in time, for the reconfiguration to occur before corresponding data has appeared at any inputs of the various reconfigured computation units 200 .
  • the matrix (MARC) 150 B may also perform any residual processing which has not been accelerated within any of the various matrices 150 .
  • the matrix (MARC) 150 B may be viewed as a control unit which “calls” the configurations and reconfigurations of the matrices 150 , computation units 200 and computational elements 250 , in real time, in synchronization with any corresponding data to be utilized by these various reconfigurable hardware units, and which performs any residual or other control processing.
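  • The "just in time" aspect of the matrix controller's scheduling can be sketched as computing the latest cycle at which each reconfiguration may start so that it finishes before the corresponding data arrives. The job format, unit names, and cycle counts are hypothetical, and real timing information carried in a silverware module is not described at this level of detail.

```python
def schedule_reconfigurations(jobs):
    """Plan the latest start cycle for each reconfiguration.

    `jobs` is a list of (unit, data_arrival_cycle, reconfig_cycles) tuples.
    Returns (start_cycle, unit) pairs sorted by start cycle.
    """
    plan = []
    for unit, arrival, duration in jobs:
        start = arrival - duration   # finish exactly as the data arrives
        if start < 0:
            raise ValueError(f"{unit}: data arrives before reconfiguration can finish")
        plan.append((start, unit))
    return sorted(plan)


# Invented example: two computation units awaiting data at different cycles.
plan = schedule_reconfigurations([
    ("unit_a", 100, 10),   # data at cycle 100, reconfiguration takes 10 cycles
    ("unit_b", 40, 25),    # data at cycle 40, reconfiguration takes 25 cycles
])
```

    Starting each reconfiguration as late as possible keeps the previous configuration in service for as long as possible, which is one plausible reading of "just in time" in this context.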
  • Other matrices 150 may also include this control functionality, with any given matrix 150 capable of calling and controlling a configuration and reconfiguration of other matrices 150 .
  • This matrix control functionality may also be combined with kernel control, such as in the K-node, discussed below.
  • FIG. 3 is a block diagram illustrating, in greater detail, a reconfigurable matrix (or node) 150 with a plurality of computation units 200 (illustrated as computation units 200 A through 200 N), and a plurality of computational elements 250 (illustrated as computational elements 250 A through 250 Z), and provides additional illustration of the exemplary types of computational elements 250 and a useful summary.
  • any matrix 150 generally includes a matrix controller 230 , a plurality of computation (or computational) units 200 , and as logical or conceptual subsets or portions of the matrix interconnect network 110 , a data interconnect network 240 and a Boolean interconnect network 210 .
  • the matrix controller 230 may also be implemented as a hardware task manager, discussed below with reference to FIG.
  • the Boolean interconnect network 210 provides the reconfiguration and data interconnection capability between and among the various computation units 200 , and is preferably small (i.e., only a few bits wide), while the data interconnect network 240 provides the reconfiguration and data interconnection capability for data input and output between and among the various computation units 200 , and is preferably comparatively large (i.e., many bits wide).
  • any given physical portion of the matrix interconnection network 110 may be operating as either the Boolean interconnect network 210 , the data interconnect network 240 , the lowest level interconnect 220 (between and among the various computational elements 250 ), or other input, output, or connection functionality. It should also be noted that other, exemplary forms of interconnect are discussed in greater detail below with reference to FIGS. 11-13 .
  • included within a computation unit 200 are a plurality of computational elements 250, illustrated as computational elements 250 A through 250 Z (individually and collectively referred to as computational elements 250), and additional interconnect 220.
  • the interconnect 220 provides the reconfigurable interconnection capability and input/output paths between and among the various computational elements 250 .
  • each of the various computational elements 250 consists of dedicated, application specific hardware designed to perform a given task or range of tasks, resulting in a plurality of different, fixed computational elements 250.
  • the fixed computational elements 250 may be reconfigurably connected together into adaptive and varied computational units 200 , which also may be further reconfigured and interconnected, to execute an algorithm or other function, at any given time, such as the quadruple multiplications and additions of the DFG of FIG. 2 , utilizing the interconnect 220 , the Boolean network 210 , and the matrix interconnection network 110 .
  • the inputs/outputs of a computational element 250 may be coupled to outputs/inputs of a first set of (other) computational elements 250 , for performance of a first function or algorithm, and subsequently adapted or reconfigured, such that these inputs/outputs are coupled to outputs/inputs of a second set of (other) computational elements 250 , for performance of a second function or algorithm.
  • the various computational elements 250 are designed and grouped together, into the various adaptive and reconfigurable computation units 200 (as illustrated, for example, in FIGS. 5A through 9 ).
  • computational elements 250 which are designed to execute a particular algorithm or function, such as multiplication or addition
  • other types of computational elements 250 are also utilized in the first apparatus embodiment.
  • computational elements 250 A and 250 B implement memory, to provide local memory elements for any given calculation or processing function (compared to the more “remote” memory 140 ).
  • computational elements 250 I, 250 J, 250 K and 250 L are configured to implement finite state machines (using, for example, the computational elements illustrated in FIGS. 7, 8 and 9 ), to provide local processing capability (compared to the more “remote” matrix (MARC) 150 B), especially suitable for complicated control processing, and which may be utilized within the hardware task manager, discussed below.
  • a first category of computation units 200 includes computational elements 250 performing linear operations, such as multiplication, addition, finite impulse response filtering, and so on (as illustrated below, for example, with reference to FIGS. 5A through 5E and FIG. 6 ).
  • a second category of computation units 200 includes computational elements 250 performing non-linear operations, such as discrete cosine transformation, trigonometric calculations, and complex multiplications.
  • a third type of computation unit 200 implements a finite state machine, such as computation unit 200 C as illustrated in FIG. 3 and as illustrated in greater detail below with respect to FIGS.
  • computation unit 200 A may be included to perform bit-level manipulation, such as for encryption, decryption, channel coding, Viterbi decoding, and packet and protocol processing (such as Internet Protocol processing).
  • a matrix controller 230 may also be included within any given matrix 150 , also to provide greater locality of reference and control of any reconfiguration processes and any corresponding data manipulations. For example, once a reconfiguration of computational elements 250 has occurred within any given computation unit 200 , the matrix controller 230 may direct that that particular instantiation (or configuration) remain intact for a certain period of time to, for example, continue repetitive data processing for a given application.
  • the plurality of heterogeneous computational elements 250 may be configured and reconfigured, through the levels of the interconnect network ( 110 , 210 , 220 , 240 ), for performance of a plurality of functional or operational modes, such as linear operations, non-linear operations, finite state machine operations, memory and memory management, and bit-level manipulation.
  • This configuration and reconfiguration of the plurality of heterogeneous computational elements 250 through the levels of the interconnect network ( 110 , 210 , 220 , 240 ) may be conceptualized on another, higher or more abstract level, namely, configuration and reconfiguration for the performance of a plurality of algorithmic elements.
  • the performance of any one of the algorithmic elements may be considered to require a simultaneous performance of a plurality of the lower-level functions or operations, such as move, input, output, add, subtract, multiply, complex multiply, divide, shift, multiply and accumulate, and so on, using a configuration (and reconfiguration) of computational elements having a plurality of fixed architectures such as memory, addition, multiplication, complex multiplication, subtraction, synchronization, queuing, over sampling, under sampling, adaptation, configuration, reconfiguration, control, input, output, and field programmability.
  • the algorithmic elements may be selected from a plurality of algorithmic elements comprising, for example: a radix-2 Fast Fourier Transformation (FFT), a radix-4 Fast Fourier Transformation (FFT), a radix-2 inverse Fast Fourier Transformation (IFFT), a radix IFFT, a one-dimensional Discrete Cosine Transformation (DCT), a multi-dimensional Discrete Cosine Transformation (DCT), finite impulse response (FIR) filtering, convolutional encoding, scrambling, puncturing, interleaving, modulation mapping, Golay correlation, OVSF code generation, Hadamard Transformation, Turbo Decoding, bit correlation, Griffiths LMS algorithm, variable length encoding, uplink scrambling code generation, downlink scrambling code generation, downlink despreading, uplink spreading, up
  • one or more of the matrices (or nodes) 150 may be designed to be application specific, having a fixed architecture with a corresponding fixed function (or predetermined application), rather than being comprised of a plurality of heterogeneous computational elements which may be configured and reconfigured for performance of a plurality of operations, functions, or algorithmic elements.
  • an analog-to-digital (A/D) or digital-to-analog (D/A) converter may be implemented without adaptive capability.
  • common node (matrix) functions also may be implemented without adaptive capability, such as the node wrapper functions discussed below. Under various circumstances, however, the fixed function node may be capable of parameter adjustment for performance of the predetermined application.
  • the parameter adjustment may comprise changing one or more of the following parameters: a number of filter coefficients, a number of parallel input bits, a number of parallel output bits, a number of selected points for Fast Fourier Transformation, a number of bits of precision, a code rate, a number of bits of interpolation of a trigonometric function, and real or complex number valuation.
  • This fixed function node (or matrix) 150 , which may be parameterizable, will typically be utilized in circumstances where an algorithmic element is used on a virtually continuous basis, such as in certain types of communications or computing applications.
  • the fixed function node 150 may be a microprocessor (such as a RISC processor), a digital signal processor (DSP), a co-processor, a parallel processor, a controller, a microcontroller, a finite state machine, and so on (with the term “processor” utilized herein to individually or collectively refer, generally and inclusively, to any of the types of processors mentioned above and their equivalents), and may or may not have an embedded operating system.
  • Such a controller or processor fixed function node 150 may be utilized for the various KARC 150 A or MARC 150 B applications mentioned above, such as providing configuration information to the interconnection network, directing and scheduling the configuration of the plurality of heterogeneous computational elements 250 of the other nodes 150 for performance of the various functional modes or algorithmic elements, or timing and scheduling the configuration and reconfiguration of the plurality of heterogeneous computational elements with corresponding data.
  • the fixed function node may be a cascaded integrated comb (CIC) filter or a parameterized, cascaded integrated comb (CIC) filter; a finite impulse response (FIR) filter or a finite impulse response (FIR) filter parameterized for variable filter length; or an A/D or D/A converter.
  • FIG. 4 is a block diagram illustrating, in greater detail, an exemplary or representative computation unit 200 of a reconfigurable matrix 150 .
  • a computation unit 200 typically includes a plurality of diverse, heterogeneous and fixed computational elements 250 , such as a plurality of memory computational elements 250 A and 250 B, and forming a computational unit (“CU”) core 260 , a plurality of algorithmic or finite state machine computational elements 250 C through 250 K.
  • each computational element 250 , of the plurality of diverse computational elements 250 is a fixed or dedicated, application specific circuit, designed and having a corresponding logic gate layout to perform a specific function or algorithm, such as addition or multiplication.
  • the various memory computational elements 250 A and 250 B may be implemented with various bit depths, such as RAM (having significant depth), or as a register, having a depth of 1 or 2 bits.
  • the exemplary computation unit 200 also includes a plurality of input multiplexers 280 , a plurality of input lines (or wires) 281 , and for the output of the CU core 260 (illustrated as line or wire 270 ), a plurality of output demultiplexers 285 and 290 , and a plurality of output lines (or wires) 291 .
  • an appropriate input line 281 may be selected for input use in data transformation and in the configuration and interconnection processes, and through the output demultiplexers 285 and 290 , an output or multiple outputs may be placed on a selected output line 291 , also for use in additional data transformation and in the configuration and interconnection processes.
  • the selection of various input and output lines 281 and 291 , and the creation of various connections through the interconnect ( 210 , 220 and 240 ), is under control of control bits 265 from a computational unit controller 255 , as discussed below. Based upon these control bits 265 , any of the various input enables 251 , input selects 252 , output selects 253 , MUX selects 254 , DEMUX enables 256 , DEMUX selects 257 , and DEMUX output selects 258 , may be activated or deactivated.
  • the exemplary computation unit 200 includes the computation unit controller 255 which provides control, through control bits 265 , over what each computational element 250 , interconnect ( 210 , 220 and 240 ), and other elements (above) does with every clock cycle. Not separately illustrated, through the interconnect ( 210 , 220 and 240 ), the various control bits 265 are distributed, as may be needed, to the various portions of the computation unit 200 , such as the various input enables 251 , input selects 252 , output selects 253 , MUX selects 254 , DEMUX enables 256 , DEMUX selects 257 , and DEMUX output selects 258 .
  • the CU controller 255 also includes one or more lines 295 for reception of control (or configuration) information and transmission of status information.
  • the interconnect may include a conceptual division into a data interconnect network 240 and a Boolean interconnect network 210 , of varying bit widths, as mentioned above.
  • the (wider) data interconnection network 240 is utilized for creating configurable and reconfigurable connections, for corresponding routing of data and configuration information.
  • the (narrower) Boolean interconnect network 210 while also utilized for creating configurable and reconfigurable connections, is utilized for control of logic (or Boolean) decisions of the various data flow graphs, generating decision nodes in such DFGs, and may also be used for data routing within such DFGs.
  • FIGS. 5A through 5E are block diagrams illustrating, in detail, exemplary fixed and specific computational elements, forming computational units. As will be apparent from review of these Figures, many of the same fixed computational elements are utilized, with varying configurations, for the performance of different algorithms.
  • FIG. 5A is a block diagram illustrating a four-point asymmetric finite impulse response (FIR) filter computational unit 300 .
  • this exemplary computational unit 300 includes a particular, first configuration of a plurality of fixed computational elements, including coefficient memory 305 , data memory 310 , registers 315 , 320 and 325 , multiplier 330 , adder 335 , and accumulator registers 340 , 345 , 350 and 355 , with multiplexers (MUXes) 360 and 365 forming a portion of the interconnection network ( 210 , 220 and 240 ).
  • FIG. 5B is a block diagram illustrating a two-point symmetric finite impulse response (FIR) filter computational unit 370 .
  • this exemplary computational unit 370 includes a second configuration of a plurality of fixed computational elements, including coefficient memory 305 , data memory 310 , registers 315 , 320 and 325 , multiplier 330 , adder 335 , second adder 375 , and accumulator registers 340 and 345 , also with multiplexers (MUXes) 360 and 365 forming a portion of the interconnection network ( 210 , 220 and 240 ).
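The multiply-accumulate dataflow of FIR computational units such as 300 and 370 can be sketched in software. The following Python is an illustration only and is not taken from the specification: it models a direct-form FIR filter and a symmetric variant that pre-adds mirrored samples, which is one role a second adder such as 375 could play, so that each coefficient pair needs a single multiplication. The function names and the tap counts are hypothetical.

```python
def fir_direct(coeffs, samples):
    """Direct-form FIR: one multiply and one accumulate per tap."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


def fir_symmetric(half_coeffs, samples):
    """Symmetric FIR with an even tap count, where h[k] == h[N-1-k].

    Mirrored samples are added first (the pre-adder), so each
    coefficient pair costs one multiplication instead of two.
    """
    n_taps = 2 * len(half_coeffs)

    def x(i):
        # Samples before the start of the stream are treated as zero.
        return samples[i] if 0 <= i < len(samples) else 0

    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(half_coeffs):
            acc += h * (x(n - k) + x(n - (n_taps - 1 - k)))
        out.append(acc)
    return out
```

Under these assumptions, `fir_symmetric([1, 2], data)` gives the same result as `fir_direct([1, 2, 2, 1], data)` while performing half as many multiplications per output sample.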
  • FIG. 5C is a block diagram illustrating a subunit for a fast Fourier transform (FFT) computational unit 400 .
  • this exemplary computational unit 400 includes a third configuration of a plurality of fixed computational elements, including coefficient memory 305 , data memory 310 , registers 315 , 320 , 325 and 385 , multiplier 330 , adder 335 , and adder/subtracter 380 , with multiplexers (MUXes) 360 , 365 , 390 , 395 and 405 forming a portion of the interconnection network ( 210 , 220 and 240 ).
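A multiplier, an adder, and an adder/subtracter are the operations needed for a radix-2 butterfly, the usual building block of an FFT. The sketch below is a software illustration under that assumption; the specification does not state that computational unit 400 implements exactly this butterfly, and the names `butterfly` and `fft_radix2` are hypothetical.

```python
import cmath


def butterfly(a, b, w):
    """Radix-2 butterfly: one complex multiply, one add, one subtract."""
    t = w * b
    return a + t, a - t


def fft_radix2(x):
    """Recursive decimation-in-time FFT built from the butterfly above.

    len(x) must be a power of two.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out
```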
  • FIG. 5D is a block diagram illustrating a complex finite impulse response (FIR) filter computational unit 440 .
  • this exemplary computational unit 440 includes a fourth configuration of a plurality of fixed computational elements, including memory 410 , registers 315 and 320 , multiplier 330 , adder/subtracter 380 , and real and imaginary accumulator registers 415 and 420 , also with multiplexers (MUXes) 360 and 365 forming a portion of the interconnection network ( 210 , 220 and 240 ).
  • FIG. 5E is a block diagram illustrating a biquad infinite impulse response (IIR) filter computational unit 450 , with a corresponding data flow graph 460 .
  • this exemplary computational unit 450 includes a fifth configuration of a plurality of fixed computational elements, including coefficient memory 305 , input memory 490 , registers 470 , 475 , 480 and 485 , multiplier 330 , and adder 335 , with multiplexers (MUXes) 360 , 365 , 390 and 395 forming a portion of the interconnection network ( 210 , 220 and 240 ).
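For reference, a biquad IIR section computes the standard second-order difference equation using only multiplications, additions, and a few delay registers. The Python below is a minimal sketch of that equation in direct form I; it is not a model of the specific register and multiplexer arrangement of computational unit 450, and the function name and coefficient convention are assumptions.

```python
def biquad(b, a, samples):
    """Direct-form I biquad.

    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    with b = (b0, b1, b2) and a = (a1, a2).
    """
    b0, b1, b2 = b
    a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0  # delay registers, initially cleared
    out = []
    for x0 in samples:
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y0)
        x2, x1 = x1, x0  # shift the input delay line
        y2, y1 = y1, y0  # shift the output delay line
    return out
```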
  • FIG. 6 is a block diagram illustrating, in detail, an exemplary multi-function adaptive computational unit 500 having a plurality of different, fixed computational elements.
  • the adaptive computation unit 500 performs each of the various functions previously illustrated with reference to FIGS. 5A through 5 E, plus other functions such as discrete cosine transformation.
  • this multi-function adaptive computational unit 500 includes capability for a plurality of configurations of a plurality of fixed computational elements, including input memory 520 , data memory 525 , registers 530 (illustrated as registers 530 A through 530 Q), multipliers 540 (illustrated as multipliers 540 A through 540 D), adder 545 , first arithmetic logic unit (ALU) 550 (illustrated as ALU_ 1 s 550 A through 550 D), second arithmetic logic unit (ALU) 555 (illustrated as ALU_ 2 s 555 A through 555 D), and pipeline (length 1) register 560 , with inputs 505 , lines 515 , outputs 570 , and multiplexers (MUXes or MXes) 510 (illustrated as MUXes and MXes 510 A through 510 KK) forming an interconnection network ( 210 , 220 and 240 ).
  • the two different ALUs 550 and 555 are
  • FIG. 7 is a block diagram illustrating, in detail, an exemplary adaptive logic processor (ALP) computational unit 600 having a plurality of fixed computational elements.
  • the ALP 600 is highly adaptable, and is preferably utilized for input/output configuration, finite state machine implementation, general field programmability, and bit manipulation.
  • the fixed computational element of ALP 600 is a portion ( 650 ) of each of the plurality of adaptive core cells (CCs) 610 ( FIG. 8 ), as separately illustrated in FIG. 9 .
  • An interconnection network ( 210 , 220 and 240 ) is formed from various combinations and permutations of the pluralities of vertical inputs (VIs) 615 , vertical repeaters (VRs) 620 , vertical outputs (VOs) 625 , horizontal repeaters (HRs) 630 , horizontal terminators (HTs) 635 , and horizontal controllers (HCs) 640 .
  • FIG. 8 is a block diagram illustrating, in greater detail, an exemplary core cell 610 of an adaptive logic processor computational unit 600 with a fixed computational element 650 .
  • the fixed computational element is a 3 input-2 output function generator 650 , separately illustrated in FIG. 9 .
  • the preferred core cell 610 also includes control logic 655 , control inputs 665 , control outputs 670 (providing output interconnect), output 675 , and inputs (with interconnect muxes) 660 (providing input interconnect).
  • FIG. 9 is a block diagram illustrating, in greater detail, an exemplary fixed computational element 650 of a core cell 610 of an adaptive logic processor computational unit 600 .
  • the fixed computational element 650 is comprised of a fixed layout of pluralities of exclusive NOR (XNOR) gates 680 , NOR gates 685 , NAND gates 690 , and exclusive OR (XOR) gates 695 , with three inputs 720 and two outputs 710 . Configuration and interconnection is provided through MUX 705 and interconnect inputs 730 .
  • FIG. 10 is a block diagram illustrating a prototypical node or matrix 800 comprising the second apparatus embodiment of the invention of the related application. The node 800 is connected to other nodes 150 within the ACE 100 through the matrix interconnection network 110 .
  • the prototypical node 800 includes a fixed (and non-reconfigurable) “node wrapper”, an adaptive (reconfigurable) execution unit 840 , and a memory 845 (which also may be variable).
  • This fixed and non-reconfigurable “node wrapper” includes an input pipeline register 815 , a data decoder and distributor 820 , a hardware task manager 810 , an address register 825 (optional), a DMA engine 830 (optional), a data aggregator and selector 850 , and an output pipeline register 855 .
  • These components comprising the node wrapper are generally common to all nodes of the ACE 100 , and are comprised of fixed architectures (i.e., application-specific or non-reconfigurable architectures).
  • the node or matrix 800 is a unique blend of fixed, non-reconfigurable node wrapper components, memory, and the reconfigurable components of an adaptive execution unit 840 (which, in turn, are comprised of fixed computational elements and an interconnection network).
  • Various nodes 800 in general, will have a distinctive and variably-sized adaptive execution unit 840 , tailored for one or more particular applications or algorithms, and a memory 845 , also implemented in various sizes depending upon the requirements of the adaptive execution unit 840 .
  • An adaptive execution unit 840 for a given node 800 will generally be different than the adaptive execution units 840 of the other nodes 800 .
  • Each adaptive execution unit 840 is reconfigurable in response to configuration information, and is comprised of a plurality of computation units 200 , which are in turn further comprised of a plurality of computational elements 250 , and corresponding interconnect networks 210 , 220 and 240 .
  • Particular adaptive execution units 840 utilized in exemplary embodiments, and the operation of the node 800 and node wrapper, are discussed in greater detail below.
  • FIG. 11 is a block diagram illustrating a first system embodiment 900 in accordance with the invention of the related application.
  • This first system 900 may be included as part of a larger system or host environment, such as within a computer or communications device, for example.
  • FIG. 11 illustrates a “root” level of such a system 100 , where global resources have connectivity (or otherwise may be found).
  • the first system 900 includes one or more adaptive cores 950 , external (off-IC or off-chip) memory 905 (such as SDRAM), host (system) input and output connections, and network (MIN 110 ) input and output connections (for additional adaptive cores 950 ).
  • Each adaptive core 950 includes (on-IC or on-chip) memory 920 , a “K-node” 925 , and one or more sets of nodes ( 150 , 800 ) referred to as a node quadrant 930 .
  • the K-node 925 (like the kernel controller 150 A) provides an operating system for the adaptive core 950 .
  • each node quadrant 930 consists of 16 nodes in a scalable by-four (×4) fractal arrangement.
  • each of these (seven) illustrated elements has total connectivity with all other (six) elements.
  • the output of a root-level element is provided to (and may drive) all other root-level inputs, and the input of each root-level element is provided with the outputs of all other root-level elements.
  • the MIN 110 includes a network with routing (or switching) elements ( 935 ), such as round-robin, token ring, cross point switches, or other arbiter elements, and a network (or path) for real time data transfer (or transmission) (such as a data network 240 ).
  • FIG. 12 is a block diagram illustrating an exemplary node quadrant 930 with routing elements 935 .
  • the node quadrant 930 has a tree topology and consists of 16 nodes ( 150 or 800 ), with every four nodes connected as a node “quad” 940 having a routing (or switching) element 935 .
  • the routing elements may be implemented variously, such as through round-robin, token ring, cross point switches, (four-way) switching, (1/4, 1/3 or 1/2) arbitration or other arbiter or arbitration elements, or depending upon the degree of control overhead which may be tolerable, through other routing or switching elements such as multiplexers and demultiplexers.
  • This by-four fractal architecture provides for routing capability, scalability, and expansion, without logical limitation.
  • the node quadrant 930 is coupled within the first system 900 at the root-level, as illustrated.
  • This by-four fractal architecture also provides for significant and complete connectivity, with the worst-case distance between any two nodes being log base 4 of "k" hops, where "k" is the number of nodes (rather than a linear distance), and provides for avoiding the overhead and capacitance of, for example, busses or full crossbar switches.
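The logarithmic worst-case distance of a by-four tree can be checked with a short calculation. The following Python is a hedged sketch, assuming nodes are numbered so that the four nodes of a quad share the same index divided by four; the text does not spell out how a "hop" is counted, so the sketch counts tree levels climbed to reach a shared routing element. Both function names are hypothetical.

```python
def levels_to_common_router(src, dst):
    """Tree levels climbed before src and dst share a routing element."""
    levels = 0
    while src != dst:
        src //= 4  # move up to the parent routing element
        dst //= 4
        levels += 1
    return levels


def worst_case_levels(k):
    """Smallest L with 4**L >= k, i.e. ceil(log4(k)), using integers only."""
    levels = 0
    capacity = 1
    while capacity < k:
        capacity *= 4
        levels += 1
    return levels
```

For a 16-node quadrant this gives a worst case of two levels, and quadrupling the node count to 64 adds only one more level.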
  • the node quadrant 930 and node quad 940 structures exhibit a fractal self-similarity with regard to scalability, repeating structures, and expansion.
  • the node quadrant 930 and node quad 940 structures also exhibit a fractal self-similarity with regard to a heterogeneity of the plurality of heterogeneous and reconfigurable nodes 800 , heterogeneity of the plurality of heterogeneous computation units 200 , and heterogeneity of the plurality of heterogeneous computational elements 250 .
  • the adaptive computing integrated circuit 900 exhibits increasing heterogeneity from a first level of the plurality of heterogeneous and reconfigurable matrices, to a second level of the plurality of heterogeneous computation units, and further to a third level of the plurality of heterogeneous computational elements.
  • the plurality of interconnection levels also exhibits a fractal self-similarity with regard to each interconnection level of the plurality of interconnection levels.
  • the interconnection network is increasingly rich, providing an increasing amount of bandwidth and an increasing number of connections or connectability for a correspondingly increased level of reconfigurability.
  • the matrix-level interconnection network, the computation unit-level interconnection network, and the computational element-level interconnection network also constitute a fractal arrangement.
  • the system embodiment 900 utilizes point-to-point service for streaming data and configuration information transfer, using a data packet (or data structure) discussed below.
  • a packet-switched protocol is utilized for this communication, and in an exemplary embodiment the packet length is limited to a length of 51 bits, with a one word (32 bits) data payload, to obviate any need for data buffering.
  • the routing information within the data packet provides for selecting the particular adaptive core 950 , followed by selecting root-level (or not) of the selected adaptive core 950 , followed by selecting a particular node ( 150 or 800 ) of the selected adaptive core 950 . This selection path may be visualized by following the illustrated connections of FIGS. 11 and 12 . Routing of data packets out of a particular node may be performed similarly, or may be provided more directly, such as by switching or arbitrating within a node 800 or quad 940 , as discussed below.
  • FIG. 13 is a block diagram illustrating exemplary network interconnections into and out of nodes 800 and node quads 940 .
  • MIN 110 connections into a node, via a routing element 935 , include a common input 945 (provided to all four nodes 800 within a quad 940 ), and inputs from the other (three) “peer” nodes within the particular quad 940 .
  • the routing element 935 may be implemented, for example, as a round-robin, token ring, arbiter, cross point switch, or other four-way switching element.
  • the output from the routing element 935 is provided to a multiplexer 955 (or other switching element) for the corresponding node 800 , along with a feedback input 960 from the corresponding node 800 , and an input for real time data (from data network 240 ) (to provide a fast track for input of real time data into nodes 800 ).
  • the multiplexer 955 (or other switching element) provides selection (switching or arbitration) of one of 3 inputs, namely, selection of input from the selected peer or common 945 , selection of input from the same node as feedback, or selection of input of real time data, with the output of the multiplexer 955 provided as the network (MIN 110 ) input into the corresponding node 800 (via the node's pipeline register 815 ).
  • the various inputs into the pipeline register 815 of a node 800 and outputs from the pipeline register 855 from a node 800 are each in the form of a bus, preferably a 32-bit parallel bus.
  • Each separate line or input (output) of the (32-bit) bus is referred to herein as a “port”, and is assigned a port number (5 bits) which maps to memory 845 , which is referred to as a port identifier (or port ID).
  • the node 800 output is provided to the data aggregator and selector (“DAS”) 850 within the node 800 , which determines the routing of output information to the node itself (same node feedback), to the network (MIN 110 ) (for routing to another node or other system element), or to the data network 240 (for real time data output). As indicated above, this output is provided using a 32-bit output bus, with each output port of the bus also referred to using an (output) port identifier.
  • the output from the DAS 850 is provided to the corresponding output routing element 935 , which routes the output information to peer nodes within the quad 940 or to another, subsequent routing element 935 for routing out of the particular quad 940 through a common output 965 (such as for routing to another node quad 940 , node quadrant 930 , or adaptive core 950 ).
  • FIG. 14 is a block diagram illustrating an exemplary data structure embodiment.
  • the system embodiment 900 utilizes point-to-point data and configuration information transfer, using a data packet (as an exemplary data structure) 970 , and may be considered as an exemplary form of “silverware”, as previously described herein.
  • the exemplary data packet 970 provides for 51 bits per packet, with 8 bits provided for a routing field ( 971 ), 1 bit for a security field ( 972 ), 4 bits for a service code field ( 973 ), 6 bits for an auxiliary field ( 974 ), and 32 bits (one word length) for data (as a data payload or data field) ( 975 ).
  • the routing field 971 may be further divided into fields for adaptive core selection ( 976 ), root selection ( 977 ), and node selection ( 978 ). In this selected 51-bit embodiment, up to four adaptive cores may be selected, and up to 32 nodes per adaptive core. As the packet is being routed, the routing bits may be stripped from the packet as they are being used in the routing process.
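The 51-bit packet layout can be illustrated with a small pack/unpack sketch. The field widths (8, 1, 4, 6, and 32 bits) come from the description above; everything else is an assumption made for illustration: the bit ordering (routing field in the most significant bits), the split of the routing field into 2 bits for core selection, 1 bit for root selection, and 5 bits for node selection (inferred from "up to four adaptive cores" and "up to 32 nodes"), and all function names.

```python
# Field widths from the text; the ordering (routing first, data last) is assumed.
FIELDS = [("routing", 8), ("security", 1), ("service", 4), ("aux", 6), ("data", 32)]


def routing_field(core, root, node):
    """Build the 8-bit routing field, assuming a 2 + 1 + 5 bit split."""
    if not (0 <= core < 4 and 0 <= root < 2 and 0 <= node < 32):
        raise ValueError("routing component out of range")
    return (core << 6) | (root << 5) | node


def pack(**values):
    """Pack named fields into a single 51-bit integer."""
    word = 0
    for name, width in FIELDS:
        v = values.get(name, 0)
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | v
    return word


def unpack(word):
    """Split a 51-bit integer back into its named fields."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out
```

A round trip such as `unpack(pack(routing=routing_field(3, 1, 17), service=5, data=0xDEADBEEF))` recovers the original field values under these assumptions.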
  • the service code field 973 provides for designations such as point-to-point inter-process communication, acknowledgements for data flow control, “peeks” and “pokes” (as coined terminology referring to reads and writes by the K-node into memory 845 ), DMA operations (for memory moves), and random addressing for reads and writes to memory 845 .
  • the auxiliary (AUX) field 974 supports up to 32 streams for any of up to 32 tasks for execution on the adaptive execution unit 840 , as discussed below, and may be considered to be a configuration information payload.
  • the one word length (32-bit) data payload is then provided in the data field 975 .
  • the exemplary data structure 970 (as a data packet) illustrates the interdigitation of data and configuration/control information, as discussed above.
  • the input pipeline register 815 is utilized to receive data and configuration information from the network interconnect 110 , through a plurality of input ports.
  • the input pipeline register 815 does not permit any data stalls. More particularly, in accordance with the data flow modeling, the input pipeline register 815 should accept new data from the interconnection network 110 every clock period; consequently, the data should also be consumed as it is produced.
  • the data decoder and distributor 820 interfaces the input pipeline register 815 to the various memories (e.g., 845 ) and registers (e.g., 825 ) within the node 800 , the hardware task manager 810 , and the DMA engine 830 , based upon the values in the service and auxiliary fields of the 51-bit data structure.
  • the data decoder 820 also decodes security, service, and auxiliary fields of the 51-bit network data structure (of the configuration information or of operand data) to direct the received word to its intended destination within the node 800 .
  • data from the node 800 to the network (MIN 110 or to other nodes) is transferred through a plurality of output ports via the output pipeline register 855 , which holds data from one of the various memories ( 845 ) or registers (e.g., 825 or registers within the adaptive execution unit 840 ) of the node 800 , the adaptive execution unit 840 , the DMA engine 830 , and/or the hardware task manager 810 .
  • Permission to load data into the output pipeline register 855 is granted by the data aggregator and selector (DAS) 850 , which arbitrates or selects between and among any competing demands of the various (four) components of the node 800 (namely, requests from the hardware task manager 810 , the adaptive execution unit 840 , the memory 845 , and the DMA engine 830 ).
  • the data aggregator and selector 850 will issue one and only one grant whenever there are one or more requests and the output pipeline register 855 is available.
  • the priority for issuance of such a grant is, first, for K-node peek (read) data; second, for the adaptive execution unit 840 output data; third, for source DMA data; and fourth, for hardware task manager 810 message data.
  • the output pipeline register 855 is available when it is empty or when its contents will be transferred to another register at the end of the current clock cycle.
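  • By way of illustration only, the grant rule described above may be sketched as a fixed-priority arbiter. The names `PRIORITY`, `register_available`, and `grant`, and the string labels for the four requesters, are assumptions introduced for this sketch and do not appear in the specification.

```python
from typing import Optional, Set

# Priority order taken from the text: K-node peek (read) data first,
# then adaptive execution unit output, source DMA data, and HTM messages.
PRIORITY = ("peek", "aeu", "dma", "htm")


def register_available(empty: bool, draining_this_cycle: bool) -> bool:
    """The output register is usable when empty or about to be emptied."""
    return empty or draining_this_cycle


def grant(requests: Set[str], available: bool) -> Optional[str]:
    """Return the single requester granted this cycle, or None."""
    if not available:
        return None
    for source in PRIORITY:
        if source in requests:
            return source
    return None
```

  • In this sketch at most one grant is returned per call, and no grant is returned when the register is unavailable or when there are no requests.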
  • the DMA engine 830 of the node 800 is an optional component.
  • the DMA engine 830 will follow a five register model, providing a starting address register, an address stride register, a transfer count register, a duty cycle register, and a control register.
  • the control register within the DMA engine 830 utilizes a GO bit, a target node number and/or port number, and a DONE protocol.
  • the K-node 925 writes the registers, sets the GO bit, and receives a DONE message when the data transfer is complete.
  • the DMA engine 830 facilitates block moves from any of the memories of the node 800 to another memory, such as an on-chip bulk memory, external SDRAM memory, another node's memory, or a K-node memory for diagnostics and/or operational purposes.
  • the DMA engine 830 , in general, is controlled by the K-node 925 .
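  • As a non-limiting sketch, the five register model and the GO/DONE protocol may be modeled as follows. The class `DmaRegisters`, the function `run_transfer`, and the choice to clear the GO bit on completion are assumptions of this sketch; the meaning of the duty cycle register is not stated in this section, so the value is held but not interpreted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DmaRegisters:
    start_address: int
    address_stride: int
    transfer_count: int
    duty_cycle: int = 1   # held but not interpreted in this sketch
    go: bool = False      # control register: GO bit
    target_node: int = 0  # control register: target node number
    target_port: int = 0  # control register: target port number


def run_transfer(regs: DmaRegisters, memory: List[int]) -> dict:
    """Gather transfer_count words at the given stride, then report DONE."""
    if not regs.go:
        return {"done": False, "words": []}
    words = [
        memory[regs.start_address + i * regs.address_stride]
        for i in range(regs.transfer_count)
    ]
    regs.go = False  # assumption: GO is cleared once the block move completes
    return {
        "done": True,
        "node": regs.target_node,
        "port": regs.target_port,
        "words": words,
    }
```

  • A caller playing the role of the K-node would write the registers, set `go`, call `run_transfer`, and treat the returned `done` flag as the DONE message.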
  • the hardware task manager 810 is configured and controlled by the K-node 925 and interfaces to all node components except the DMA engine 830 .
  • the hardware task manager 810 executes on each node 800 , processing a task list and producing a task ready-to-run queue implemented as a first-in, first-out (FIFO) memory.
  • the hardware task manager 810 has a top level finite state machine that interfaces with a number of subordinate finite state machines that control the individual hardware task manager components.
  • the hardware task manager 810 controls the configuration and reconfiguration of the computational elements 250 within the adaptive execution unit 840 for the execution of any given task by the adaptive execution unit 840 .
  • the K-node 925 initializes the hardware task manager 810 and provides it with set up information for the tasks needed for a given operating mode, such as operating as a communication processor or an MP3 player.
  • the K-node 925 provides configuration information as stored tasks (i.e., stored tasks or programs) within memory 845 and within local memory within the adaptive execution unit 840 .
  • the K-node 925 initializes the hardware task manager 810 (as a parameter table) with designations of input ports, output ports, routing information, the type of operations (tasks) to be executed (e.g., FFT, DCT), and memory pointers.
  • the K-node 925 also initializes the DMA engine 830 .
  • the hardware task manager 810 maintains a port translation table and generates addresses for point-to-point data delivery, mapping input port numbers to a current address of where incoming data should be stored in memory 845 .
  • the hardware task manager 810 provides data flow control services, tracking both production and consumption of data, using corresponding production and consumption counters, and thereby determines whether a data buffer is available for a given task.
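  • By way of example only, the port translation table and the production and consumption counters may be sketched as below. The classes `PortBuffer` and `PortTranslationTable`, and the treatment of each buffer as a circular region of memory, are assumptions made for this sketch rather than details taken from the specification.

```python
class PortBuffer:
    """One input port mapped to a region of node memory."""

    def __init__(self, base: int, size: int):
        self.base = base      # first address of the buffer
        self.size = size      # capacity in words
        self.produced = 0     # production counter
        self.consumed = 0     # consumption counter

    def space_available(self) -> int:
        return self.size - (self.produced - self.consumed)

    def next_write_address(self) -> int:
        # Assumption: the buffer wraps around (circular addressing).
        return self.base + (self.produced % self.size)


class PortTranslationTable:
    def __init__(self):
        self.ports = {}

    def map_port(self, port: int, base: int, size: int) -> None:
        self.ports[port] = PortBuffer(base, size)

    def deliver(self, port: int, memory: list, word: int) -> int:
        """Store an incoming word at the port's current address."""
        buf = self.ports[port]
        if buf.space_available() == 0:
            raise BufferError("input buffer full")
        address = buf.next_write_address()
        memory[address] = word
        buf.produced += 1
        return address

    def consume(self, port: int, count: int) -> None:
        """Record that the consuming task has released count words."""
        self.ports[port].consumed += count
```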
  • the hardware task manager 810 maintains a state table for tasks and, in the selected embodiment, for up to 32 tasks.
  • the state table includes a GO bit (which is enabled or not enabled (suspended) by the K-node 925 ), a state bit for the task (idle, ready-to-run, run (running)), an input port count, and an output port count (for tracking input data and output data).
  • up to 32 tasks may be enabled at a given time. For a given enabled task, if its state is idle, and if sufficient input data (at the input ports) are available and sufficient output ports are available for output data, its state is changed to ready-to-run and queued for running (transferred into a ready-to-run FIFO or queue).
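  • The scan of the state table into the ready-to-run queue may be sketched, under stated assumptions, as follows. The names `TaskEntry` and `schedule`, and the reduction of the input and output port counts to two boolean flags, are simplifications introduced for this sketch.

```python
from collections import deque
from dataclasses import dataclass

MAX_TASKS = 32  # table size in the selected embodiment


@dataclass
class TaskEntry:
    go: bool = False             # enabled (or suspended) by the K-node
    state: str = "idle"          # "idle", "ready-to-run", or "run"
    inputs_ready: bool = False   # sufficient input data at the input ports
    outputs_ready: bool = False  # sufficient output ports available


def schedule(table, ready_fifo) -> None:
    """Queue every enabled idle task whose inputs and outputs are ready."""
    for task_id, entry in enumerate(table):
        if (entry.go and entry.state == "idle"
                and entry.inputs_ready and entry.outputs_ready):
            entry.state = "ready-to-run"
            ready_fifo.append(task_id)
```

  • Because a queued task leaves the idle state, a repeated scan does not queue the same task twice.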
  • the adaptive execution unit 840 is provided with configuration information (or code) and two data operands (x and y).
  • the task is transferred to an active task queue, the adaptive execution unit 840 is configured for the task (set up), the task is executed by the adaptive execution unit 840 , and output data is provided to the data aggregator and selector 850 .
  • the adaptive execution unit 840 provides an acknowledgement message to the hardware task manager 810 , requesting the next item.
  • the hardware task manager 810 may then direct the adaptive execution unit 840 to continue to process data with the same configuration in place, or to tear down the current configuration, acknowledge completion of the tear down and request the next task from the ready-to-run queue.
  • a module is a self-contained block of code (for execution by a processor) or a hardware-implemented function (embodied as configured computational elements 250 ), which is processed or performed by an execution unit 840 .
  • a task is an instance of a module, and has four states: suspend, idle, ready or run.
  • a task is created by associating the task to a specific module (computational elements 250 ) on a specific node 800 ; by associating physical memories and logical input buffers, logical output buffers, logical input ports and logical output ports of the module; and by initializing configuration parameters for the task.
  • a task is formed by the K-node writing the control registers in the node 800 where the task is being created (i.e., enabling the configuration of computational elements 250 to perform the task), and by the K-node writing to the control registers in other nodes, if any, that will be producing data for the task and/or consuming data from the task.
  • These registers are memory mapped into the K-node's address space, and “peek and poke” network services are used to read and write these values.
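  • As an illustrative sketch only, memory-mapped control registers accessed through peek and poke services may be modeled as below. The classes `NodeRegisters` and `KNode` and the per-node window size `WINDOW` are hypothetical; the specification does not give the actual address layout.

```python
class NodeRegisters:
    """Control registers of one node, addressed by offset."""

    def __init__(self):
        self._regs = {}

    def read(self, offset: int) -> int:
        return self._regs.get(offset, 0)

    def write(self, offset: int, value: int) -> None:
        self._regs[offset] = value


class KNode:
    """Maps the registers of every node into one flat address space."""

    WINDOW = 0x1000  # hypothetical size of each node's address window

    def __init__(self, nodes):
        self.nodes = nodes

    def _split(self, address: int):
        # The upper part of the address selects the node, the rest the register.
        return self.nodes[address // self.WINDOW], address % self.WINDOW

    def peek(self, address: int) -> int:
        node, offset = self._split(address)
        return node.read(offset)

    def poke(self, address: int, value: int) -> None:
        node, offset = self._split(address)
        node.write(offset, value)
```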
  • a newly created task starts in the “suspend” state.
  • the K-node can issue a “go” command, setting a bit in a control register in the hardware task manager 810 .
  • the action of this command is to move the task from the “suspend” state to the “idle” state.
  • the task is added to the “ready-to-run” queue which is implemented as a FIFO; and the task state is changed to “ready/run”. Buffers are available to the task when subsequent task execution will not consume more data than is present in its input buffers or will not produce more data than there is capacity in its output buffers.
  • the task (executed by the configured adaptive execution unit 840 ) consumes data from its input buffers and produces data for its output buffers.
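  • The task life cycle described above may be sketched, by way of example, as a small state machine. The class `Task`, the function `buffers_available`, and the per-execution quantities `in_needed` and `out_needed` are assumptions of this sketch; the combined “ready/run” state is represented by the single label "ready".

```python
def buffers_available(in_count, in_needed, out_free, out_needed) -> bool:
    """True when one execution neither underruns input nor overruns output."""
    return in_needed <= in_count and out_needed <= out_free


class Task:
    def __init__(self, in_needed: int, out_needed: int):
        self.state = "suspend"        # a newly created task starts suspended
        self.in_needed = in_needed    # words consumed per execution
        self.out_needed = out_needed  # words produced per execution

    def go(self) -> None:
        """K-node 'go' command: suspend -> idle."""
        if self.state == "suspend":
            self.state = "idle"

    def poll(self, in_count: int, out_free: int, ready_queue: list) -> None:
        """idle -> ready when buffers are available; append to the FIFO."""
        if self.state == "idle" and buffers_available(
                in_count, self.in_needed, out_free, self.out_needed):
            self.state = "ready"
            ready_queue.append(self)
```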
  • the adaptive execution units 840 will vary depending upon the type of node 800 implemented. Various adaptive execution units 840 may be specifically designed and implemented for use in heterogeneous nodes 800 , for example, for a programmable RISC processing node; for a programmable DSP node; for an adaptive or reconfigurable node for a particular domain, such as an arithmetic node; and for an adaptive bit-manipulation unit (RBU). Various adaptive execution units 840 are discussed in greater detail below.
  • a node 800 through its execution unit 840 , will perform an entire algorithmic element in a comparatively few clock cycles, such as one or two clock cycles, compared to performing a long sequence of separate operations, loads/stores, memory fetches, and so on, over many hundreds or thousands of clock cycles, to eventually achieve the same end result.
  • the execution unit 840 may then be reconfigured to perform another, different algorithmic element.
  • algorithmic elements are selected from a plurality of algorithmic elements comprising, for example: a radix-2 Fast Fourier Transformation (FFT), a radix-4 Fast Fourier Transformation (FFT), a radix-2 Inverse Fast Fourier Transformation (IFFT), a radix-4 Inverse Fast Fourier Transformation (IFFT), a one-dimensional Discrete Cosine Transformation (DCT), a multi-dimensional Discrete Cosine Transformation (DCT), finite impulse response (FIR) filtering, convolutional encoding, scrambling, puncturing, interleaving, modulation mapping, Golay correlation, OVSF code generation, Hadamard Transformation, Turbo Decoding, bit correlation, Griffiths LMS algorithm, variable length encoding, uplink scrambling code generation, downlink scrambling code generation, downlink despreading, uplink spreading, uplink concatenation, Viterbi encoding, Viterbi decoding, cyclic redundancy coding (CRC), complex multiplication, data compression, motion compensation, channel searching,
  • a plurality of different nodes 800 are created, by varying the type and amount of computational elements 250 (forming computational units 200 ), and varying the type, amount and location of interconnect (with switching or routing elements) which form the execution unit 840 of each such node 800 .
  • two different nodes 800 perform, generally, arithmetic or mathematical algorithms, and are referred to as adaptive (or reconfigurable) arithmetic nodes (AN), as AN 1 and AN 2 .
  • the AN 1 node as a first node 800 of the plurality of heterogeneous and reconfigurable nodes, comprises a first selection of computational elements 250 from the plurality of heterogeneous computational elements to form a first reconfigurable arithmetic node for performance of Fast Fourier Transformation (FFT) and Discrete Cosine Transformation (DCT).
  • the AN 2 node as a second node 800 of the plurality of heterogeneous and reconfigurable nodes, comprises a second selection of computational elements 250 from the plurality of heterogeneous computational elements to form a second reconfigurable arithmetic node, the second selection different than the first selection, for performance of at least two of the following algorithmic elements: multi-dimensional Discrete Cosine Transformation (DCT), finite impulse response (FIR) filtering, OVSF code generation, Hadamard Transformation, bit-wise WCDMA Turbo interleaving, WCDMA uplink concatenation, WCDMA uplink repeating, and WCDMA uplink real spreading and gain scaling.
  • nodes 800 are defined, such as, for example:
  • a bit manipulation node as a third node of the plurality of heterogeneous and reconfigurable nodes, comprising a third selection of computational elements 250 from the plurality of heterogeneous computational elements, the third selection different than the first selection, for performance of at least two of the following algorithmic elements: variable and multiple rate convolutional encoding, scrambling code generation, puncturing, interleaving, modulation mapping, complex multiplication, Viterbi algorithm, Turbo encoding, Turbo decoding, correlation, linear feedback shifting, downlink despreading, uplink spreading, CRC encoding, de-puncturing, and de-repeating.
  • a reconfigurable filter node as a fourth node of the plurality of heterogeneous and reconfigurable nodes, comprising a fourth selection of computational elements 250 from the plurality of heterogeneous computational elements, the fourth selection different than the first selection, for performance of at least two of the following algorithmic elements: adaptive finite impulse response (FIR) filtering, Griffith's LMS algorithm, and RRC filtering.
  • a reconfigurable finite state machine node as a fifth node of the plurality of heterogeneous and reconfigurable nodes, comprising a fifth selection of computational elements 250 from the plurality of heterogeneous computational elements, the fifth selection different than the first selection, for performance of at least two of the following processes: control processing; routing data and control information between and among the plurality of heterogeneous computational elements 250 ; directing and scheduling the configuration of the plurality of heterogeneous computational elements for performance of a first algorithmic element and the reconfiguration of the plurality of heterogeneous computational elements for performance of a second algorithmic element; timing and scheduling the configuration and reconfiguration of the plurality of heterogeneous computational elements with corresponding data; controlling power distribution to the plurality of heterogeneous computational elements and the interconnection network; and selecting the first configuration information and the second configuration information from a singular bit stream comprising data commingled with a plurality of configuration information.
  • a reconfigurable multimedia node as a sixth node of the plurality of heterogeneous and reconfigurable nodes, comprising a sixth selection of computational elements 250 from the plurality of heterogeneous computational elements, the sixth selection different than the first selection, for performance of at least two of the following algorithmic elements: radix-4 Fast Fourier Transformation (FFT); multi-dimensional radix-2 Discrete Cosine Transformation (DCT); Golay correlation; adaptive finite impulse response (FIR) filtering; Griffith's LMS algorithm; and RRC filtering.
  • a reconfigurable hybrid node as a seventh node of the plurality of heterogeneous and reconfigurable nodes, comprising a seventh selection of computational elements 250 from the plurality of heterogeneous computational elements, the seventh selection different than the first selection, for performance of arithmetic functions and bit manipulation functions.
  • a reconfigurable input and output (I/O) node as an eighth node of the plurality of heterogeneous and reconfigurable nodes, comprising an eighth selection of computational elements 250 from the plurality of heterogeneous computational elements, the eighth selection different than the first selection, for adaptation of input and output functionality for a plurality of types of I/O standards, the plurality of types of I/O standards comprising standards for at least two of the following: PCI busses, Universal Serial Bus types one and two (USB 1 and USB 2 ), and small computer systems interface (SCSI).
  • a reconfigurable operating system node as a ninth node of the plurality of heterogeneous and reconfigurable nodes, comprising a ninth selection of computational elements 250 from the plurality of heterogeneous computational elements, the ninth selection different than the first selection, for storing and executing a selected operating system of a plurality of operating systems.
  • FIG. 15 is a block diagram illustrating a second system embodiment 1000 in accordance with the invention of the related application.
  • the second system embodiment 1000 is comprised of a plurality of variably-sized nodes (or matrices) 1010 (illustrated as nodes 1010 through 1010 X), with the illustrated size of a given node 1010 also indicative of an amount of computational elements 250 within the node 1010 and an amount of memory included within the node 1010 itself.
  • the nodes 1010 are coupled to an interconnect network 110 , for configuration, reconfiguration, routing, and so on, as discussed above.
  • the second system embodiment 1000 illustrates node 800 and system configurations which are different and more varied than the quadrant 930 and quad 940 configurations discussed above.
  • the second system embodiment 1000 is designed for use with other circuits within a larger system and, as a consequence, includes configurable input/output (I/O) circuits 1025 , comprised of a plurality of heterogeneous computational elements configurable (through corresponding interconnect, not separately illustrated) for I/O functionality.
  • the configurable input/output (I/O) circuits 1025 provide connectivity to and communication with a system bus (external), external SDRAM, and provide for real time inputs and outputs.
  • a K-node (KARC) 1050 provides the K-node (KARC) functionality discussed above.
  • the second system embodiment 1000 further includes memory 1030 (as on-chip RAM, with a memory controller), and a memory controller 1035 (for use with the external memory (SDRAM)). Also included in the apparatus 1000 are an aggregator/formatter 1040 and a de-formatter/distributor 1045 , providing functions corresponding to the functions of the data aggregator and selector 850 and data distributor and decoder 820 , respectively, but for the larger system 1000 (rather than within a node 800 ).
  • one of the novel aspects of the ACE architecture is its heterogeneous collection of nodes 150 , 800 , which communicate via the matrix interconnection network (MIN) 110 .
  • the MIN 110 architecture allows data to be transmitted between tasks running on pairs of nodes 150 , 800 (or between pairs of tasks on the same node), with one task acting as the producer of the data, and the other as the consumer.
  • the producing task will provide data through one or more output ports coupled to the MIN 110 , via pipeline register 855 (for immediate consumption by a consuming task).
  • the consuming task will receive data through one or more input ports coupled to the MIN 110 , via pipeline register 815 .
  • These pairs of tasks can be configured either statically at the time of device initialization, or reconfigured dynamically.
  • the minimal information required to statically or dynamically reconfigure a MIN 110 connection consists of the following:
  • the nodes of the ACE are heterogeneous in nature, meaning their internal architectures differ from one another, allowing each node to optimize its performance for differing computational types.
  • a feature common to all nodes is the Hardware Task Manager (HTM) 810 , a component of the node that is responsible for interacting with the MIN 110 .
  • the HTM 810 is also responsible for keeping track of the tasks running on each node, and controlling when each task executes.
  • the HTM 810 employs a technique known as co-operative multitasking to control task scheduling.
  • only one task is allowed to execute on a node 150 , 800 at any given time. It is the running task's responsibility to yield the processor back to the Hardware Task Manager when it has completed its computation.
  • the HTM associates firing conditions with each task. These firing conditions are based on the availability of input data for a task to consume, and the availability of memory to store output data produced by a task. These firing conditions are represented as counters in a Consumer Count Table (CCT) and Producer Count Table (PCT).
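  • As a non-limiting sketch, co-operative multitasking with firing conditions may be modeled with Python generators, where each `yield` represents the running task handing control back to the task manager. The function names `run` and `demo`, the `max_rounds` guard, and the use of a callable in place of the CCT and PCT counters are assumptions of this sketch.

```python
from collections import deque


def run(tasks, max_rounds=100):
    """Resume one task at a time; each task yields to hand control back.

    tasks is a list of (can_fire, generator) pairs, where can_fire is a
    zero-argument callable standing in for the task's firing condition.
    """
    queue = deque(tasks)
    for _ in range(max_rounds):
        if not queue:
            break
        can_fire, task = queue.popleft()
        if can_fire():
            try:
                next(task)   # the task runs until its next yield
            except StopIteration:
                continue     # task completed; drop it from the queue
        queue.append((can_fire, task))


def demo():
    buffer, out = deque(), []

    def producer():
        for value in (1, 2, 3):
            buffer.append(value)
            yield            # yield the processor back to the manager

    def consumer():
        while True:
            out.append(buffer.popleft() * 10)
            yield

    # The consumer may fire only when input data is available.
    run([(lambda: len(buffer) > 0, consumer()), (lambda: True, producer())])
    return out
```

  • Only one generator is ever resumed at a time, and a task whose firing condition is not met is skipped and reconsidered later.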
  • the minimal information required to statically or dynamically configure a node's HTM 810 to specify task firing conditions consists of the following:
  • a new general purpose programming language (referred to herein as “SilverC”) is provided to facilitate static and dynamic configuration of the ACE 100 . While applicable to many hardware platforms and programming styles, it contains several constructs that directly support the static or dynamic reconfiguration of the MIN 110 and HTMs 810 of the ACE (ACM) 100 . These constructs are modules, processes, and pipes.
  • a “construct” or “program construct”, as used herein, means and refers to use of any programming language, of any kind, with any syntax or signatures, which provide or can be interpreted to provide a mapping or correspondence from the language to the hardware, such as a first program construct which maps to a node 800 , a second program construct which maps to a task to be executed on the node 800 , and so on. While exemplary constructs are illustrated as examples, it should be understood that other constructs which are correspondingly mapped or can be interpreted to be mapped, such as through a compiler, are within the scope of the present invention.
  • a SilverC module acts as a container for program instructions and data that will be used to perform some computation on some hardware platform, such as a node within the ACE (ACM) 100 .
  • a module corresponds to or maps to a selected node 800 .
  • a SilverC module may contain zero or more processes and pipes. SilverC modules add a layer of encapsulation to the SilverC programming language.
  • a module may be completely described by the input and output characteristics of its pipes. As such, developers incorporating a pre-existing module into their application may remain unaware of the details of its processes and how the actual computation is performed within the module.
  • a SilverC process is a collection of program instructions and data that is instantiated as an individual thread or task on some hardware platform, such as the ACE (ACM) 100 .
  • a process corresponds to or maps to a task to be performed by the adaptive execution unit (AEU) 840 under the control of the HTM 810 on a selected node 800 .
  • the process will only execute when its firing conditions are met, providing event-driven programming.
  • a process maps as a software analog to the hardware task, with the firing conditions mapping to the HTM 810 which provides that a task is ready-to-run when the input data is available and there are a sufficient number of output ports for the output data, as discussed above in greater detail.
  • Multiple processes may be aggregated within a single SilverC module and work cooperatively in order to perform the overall computation of that module.
  • a SilverC pipe represents communication between tasks, and acts as a conduit for data that is either produced or consumed by a process.
  • An inpipe acts as a conduit for data that is consumed by a process.
  • An outpipe acts as a conduit for data that is produced by a process.
  • While suitable as a general purpose programming language that is applicable to many hardware platforms, the language constructs of SilverC directly support the static and dynamic reconfiguration capabilities of the ACE (ACM) 100 hardware.
  • the SilverC module, process and pipe constructs are an efficient means to specify the static and dynamic reconfiguration parameters of the MIN 110 and HTM 810 .
  • the various modules, with their processes, pipes, and other SilverC constructs described below, may then be compiled to a bit file or other object code, by a compiler, for execution on the selected computing hardware, such as a bit file which provides configuration information (silverware) for execution on the ACE (ACM) 100 .
  • such compilation and resulting bit file may vary depending upon the particular node types available in the selected ACE 100 embodiment.
  • any module, with its processes, pipes, and other SilverC constructs of the preferred SilverC embodiment, is considered capable of being mapped or otherwise has a direct (1:1) correspondence to a selected node 800 of an ACE 100 (and associated system) with its associated HTM 810 , AEU 840 , and MIN 110 connections (ports).
  • SilverC modules are code containers that are mapped (by a compiler) to a single “execution unit” having computational elements on some hardware platform, such as to a node 800 on the ACE (ACM) 100 having an AEU 840 and HTM 810 .
  • the computational elements of the AEU 840 may support multiple modules at a time, but a module should not be distributed across multiple AEUs 840 (i.e., a single module is executed by a single node 800 ).
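  • The placement constraint above (a node may host several modules, but a module is not distributed across nodes) may be checked, for illustration, as follows. The function `place_modules` and the example module names used with it are hypothetical and are not part of any SilverC compiler described in the specification.

```python
from collections import defaultdict


def place_modules(assignments):
    """Group modules by node; reject a module assigned to two nodes.

    assignments is an iterable of (module_name, node_id) pairs.
    """
    home = {}
    by_node = defaultdict(list)
    for module, node in assignments:
        if module in home and home[module] != node:
            raise ValueError(
                f"module {module!r} is split across nodes "
                f"{home[module]} and {node}"
            )
        if module not in home:
            home[module] = node
            by_node[node].append(module)
    return dict(by_node)
```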
  • SilverC modules contain a configuration-time interface and a run-time interface.
  • the configuration-time interface consists of values that are used to parameterize the definition of the module and which are specified at the point when the module is instantiated.
  • Such instantiation may occur at either compile-time or run-time.
  • the run-time interface consists of input and output pipes that are used to dynamically transmit data to and from the module. These form the basis for the SilverC dataflow-style semantics.
  • SilverC modules are also composed of processes that define the computation performed by the module on its input data.
  • the code used to specify these processes can be C-like in nature, with some additions to support dataflow-style programming and specific hardware features. Equivalently, other coding languages and styles may be utilized, also with the additions to support dataflow-style programming and specific hardware features of the ACE 100 .
  • SilverC modules may contain constants that are global to the module, as well as some amount of state information shared between its processes, in the form of memory or registers. For example, memory may be shared across processes, and variables and constants may be declared and shared across processes.
  • the nodeType specifies for which type of node (or AEU 840 ) the module is targeted, such as an arithmetic node or a bit-manipulation node.
  • the moduleName is a placeholder for a unique identifier (or name) that identifies the module.
  • parameterList represents the list of configuration-time parameters for the module.
  • the parameter list of a module is preferably a comma-separated list of const identifier declarations, resembling a parameter list of a C function.
  • an exemplary parameter list would be (Example 2):
  • Modules that require no configuration-time parameters may be declared by omitting the parameter list, and optionally by omitting the angle brackets used to enclose it as well.
  • both of the following modules have no parameters (Example 3): module NoParametersHere <> { ... } module NorHere { ... }
  • the rest of the module definition is given in one or more module sections.
  • the preferred SilverC embodiment currently supports four different module sections, each identified by a keyword followed by a colon: constants, state, pipes, and processes.
  • the constants section is used to define constant values that are global to the module.
  • the state section declares shared state information between the module processes.
  • the pipes section defines the module run-time interface.
  • the processes section defines the processes themselves (i.e., algorithms to be performed).
  • Module sections may appear in any order, though each may only be defined in terms of identifiers declared in sections that precede it. Each module section type may be omitted, may contain no declarations at all, or may be used multiple times within a module. Modules whose pipes and/or processes sections are omitted or empty are relatively useless in a real system.
  • An exemplary module (named “Sample”, and omitting its nodeType) that has one instance of each type of module section is shown in the following code (Example 4): module Sample <const int16 blockSize> { constants: ... state: ... pipes: ... processes: ... }
  • a parameter “blockSize” was declared as a constant value of a 16-bit integer data type. As illustrated below, it will be used to determine the size of pipes (number of ports) and the amount of data to be consumed or produced in this module, and will be instantiated by other parts of the code of the module illustrated in other examples below. While illustrating a single parameter, it should be understood that a list of multiple parameters may be utilized.
  • the constants section of a module is used to declare constants that are global to the module scope. It consists of traditional constant variable declarations as in C, the initializers of which may be composed of any expression formed of literals, global constants defined at the file scope, the parameters of the module, and any module constants declared previously within the module. Module constants are often used to define the sizes of the input pipe buffers, as well as state variables declared within the state section.
  • The state section of a module is used to declare shared state information between module processes. It supports the declaration of global variables within the module scope whose values can be accessed by any of the module processes. If a module is instantiated multiple times, each instantiation receives its own copy of the state variables; in this sense, state variables are similar to the static variables declared within a process except that they are accessible by multiple processes.
  • Example 6: module Sample <const int16 blockSize> { ... state: ram fract16 dataCache[dataCacheSize]; ... }
  • the state section sets up random access memory (ram) (or another register) named dataCache, with a 16-bit fractional (fixed point) data type, having a size equal to the previously determined constant (dataCacheSize).
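  • By analogy only, the relationship among a configuration-time parameter, a module constant, and per-instantiation state may be sketched in Python. The class `SampleModule`, the default `num_blocks` value, and the formula used for `data_cache_size` are assumptions of this sketch; the specification's definition of dataCacheSize is not reproduced in this section.

```python
class SampleModule:
    """Python stand-in for a module with one configuration-time parameter."""

    def __init__(self, block_size: int, num_blocks: int = 4):
        # constants section: values derived from the module parameter
        # (hypothetical formula; the source does not show this derivation)
        self.data_cache_size = num_blocks * block_size
        # state section: each instantiation receives its own copy
        self.data_cache = [0] * self.data_cache_size

    # Two "processes" sharing the same state variable.
    def store(self, index: int, value: int) -> None:
        self.data_cache[index] = value

    def load(self, index: int) -> int:
        return self.data_cache[index]
```

  • Two instances created with different `block_size` values hold independent, differently sized caches, which mirrors the statement that each instantiation receives its own copy of the state variables.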
  • the pipes section defines the run-time interface of a module by specifying the input and output pipes used to transmit data into and out of the module, and is utilized to configure the MIN 110 .
  • this pipes construct illustrates a 1:1 correspondence between the constructs of SilverC and the configuration of the ACE 100 .
  • All pipes are declared to be either an input pipe, using the inpipe keyword, or an output pipe, using the outpipe keyword.
  • Each pipe type takes its defining parameters enclosed in angle brackets, and these are described in further detail below.
  • Pipes are named, as with any other declaration.
  • a sample pipes section is illustrated as the following code (Example 7): module Sample <const int16 blockSize> { ... pipes: inpipe <...> dataIn; outpipe <...> dataOut; ... }
  • an inpipe has been named dataIn
  • an outpipe has been named dataOut.
  • This pipes section specifies that the module has one input data stream that is stored in the dataIn pipe and a single output data stream that is controlled by the dataOut pipe.
  • Input pipes buffer data that is streamed into a module. All input pipes can be thought of as single-dimensional arrays of a user-specified element type. Input pipes are uniquely named (inpipeName) and are parameterized using two values: the type of element that is being transferred (elementType), and the number of elements that should be buffered by the input pipe (bufferSize) (i.e., the amount of memory to be reserved for its incoming data).
  • An exemplary input pipe declaration is shown as the following code (Example 8):
  • An input pipe named dataIn of fract16 data type values is declared whose buffer size is specified via its module parameter (blockSize) and constant values (numBlocks) as follows (Example 9): module Sample<const int16 blockSize> { ... pipes: inpipe<fract16, numBlocks*blockSize> dataIn; ... } As illustrated, whenever this inpipe is instantiated via instantiation of its parent module, different parameter values may be utilized, and the inpipe buffer allocation will be correspondingly sized automatically, providing for significant code re-use.
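  • The sizing rule of Example 9 can be imitated in ordinary code. The following Python sketch is illustrative only; it is not SilverC and not part of the disclosed toolchain. The class name InPipe, the helper make_sample_inpipe, the value NUM_BLOCKS = 4, and the use of Python floats to stand in for fract16 are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List

NUM_BLOCKS = 4  # assumed value for the module constant numBlocks


@dataclass
class InPipe:
    """Hypothetical stand-in for an inpipe: a one-dimensional buffer."""
    name: str
    element_type: type
    buffer_size: int
    buffer: List = field(init=False)

    def __post_init__(self):
        # Reserve buffer_size elements up front, as the declaration implies.
        self.buffer = [self.element_type()] * self.buffer_size


def make_sample_inpipe(block_size: int) -> InPipe:
    # Mirrors inpipe<fract16, numBlocks*blockSize> dataIn;
    # float is only a placeholder for the fract16 element type.
    return InPipe("dataIn", float, NUM_BLOCKS * block_size)


small = make_sample_inpipe(8)    # one instantiation of the parent module
large = make_sample_inpipe(16)   # another instantiation, larger buffer
```

  • Under these assumptions, the same declaration yields a differently sized buffer for each instantiation, which is the code re-use property the text describes.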
  • Output pipes are the means for generating output from a module. Output pipes are similar to input pipes, except that they do not perform any buffering, requiring only a data type declaration (elementType) and a unique name (outpipeName). As discussed above, as soon as output data is produced, it is transmitted over the MIN 110 , and stored in the inpipe of another process or module. Output pipe declarations appear as follows in the preferred SilverC embodiment (Example 10):
  • the elementType indicates the type of element that is transferred through the output pipe.
  • An output pipe declaration that would complement the input pipe shown earlier would be declared as follows (Example 11):
  • Input and output pipes both support two main types of operations: readiness checks, for the HTM 810 to determine if the task is ready to run, and synchronization.
  • Output pipes also support assignments, which correspond to placing data on the network. Input pipes currently do not support direct access in the preferred SilverC embodiment, but must be accessed via SilverC pointers (to memory 845 ).
  • Data is written to an output pipe using a simple assignment.
  • the right-hand side expression of the assignment must be of the same type as the element type of the pipe, or of a type that can automatically be coerced into the output type of the pipe.
  • a synchronization message should be sent to the corresponding input pipe to let it know that new data has been written to its input buffer for a consuming task.
  • This downstream notification functionality is provided by using the notify ( ) routine of the preferred SilverC embodiment, as follows (Example 13):
  • Example 14 // code to inform linked input pipe that 3 values written to its buffer... notify(dataOut, 3);
  • the preferred SilverC embodiment does not prevent a user notification from providing incorrect information about how many values have actually been written to an input pipe buffer, although this usage is strongly discouraged.
  • the value passed to a notify call should be equal to the number of assignments made to the output pipe since the preceding call.
  • the synchronization used to implement the notify routine usually has a certain amount of overhead associated with it, which is why notifications are not assumed to be performed automatically by the runtime system for each assignment to an output pipe.
  • Example 16 // code to read three values from the dataIn buffer... release(dataIn, 3);
  • The synchronization functionality provided by the notify ( ) and release ( ) routines is mapped (through a compiler) directly to the functionality of the HTM 810 with its producer and consumer count tables, and correspondingly modifies the CCT and PCT registers of the HTM 810 for each corresponding input or output port.
  • the preferred SilverC embodiment supports a query and initialization functionality, ready ( ), which allows a process (program) to query whether input and output pipes are ready for data to be read from them or written to them.
  • these functionalities have the effect of initializing the CCT and PCT to their triggering values (firing or execution conditions), i.e., the values which will cause the HTM 810 to place the corresponding task in the ready-to-run queue for execution.
  • the exemplary query function is illustrated using the following code (Example 17):
  • pipeType is a placeholder to indicate that either an inpipe or outpipe can be used with this routine.
  • the pipeName argument is the name of the pipe to be checked, while numberOfElements indicates the number of elements to be checked for (as a necessary and/or sufficient condition for triggering the corresponding task).
  • this routine indicates whether at least numberOfElements data values are ready to be read from the pipe input buffer.
  • For an output pipe it indicates whether there are numberOfElements slots available for writing new values in the corresponding input pipe buffer.
  • the routine returns a first value (0) if the readiness condition of the pipe is not met, and a second value (non-zero) otherwise.
  • the readiness of a pipe does not correspond to the number of actual values written to or read from an input pipe buffer, but rather the number of elements that have been cumulatively specified by the notify ( ) and release ( ) synchronization routines. For example, if three values were written to an output pipe, but no notification was ever made that these three values had been written (and, as a consequence, the producer and consumer counts are unchanged), the following call would return 0 for the corresponding input pipe, even though the values may very well be stored in its buffer (Example 18):
  • If the input memory has sufficient space to accommodate the writing of three new values, then the data will be written to the corresponding output ports, and the consuming task will be correspondingly notified.
  • Such pipe readiness is typically checked or determined within the firing conditions of a process, as described below.
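  • The distinction between values actually written and values announced through notify ( ) and release ( ) can be illustrated with a small software model. The Python sketch below is an assumption-laden illustration, not the HTM 810 register encoding: the class LinkCounts, its method names, and the buffer size of 8 are hypothetical, and plain non-negative counts are used in place of the CCT and PCT values.

```python
class LinkCounts:
    """Hypothetical bookkeeping for one outpipe linked to one inpipe."""

    def __init__(self, buffer_size: int):
        self.buffer_size = buffer_size
        self.notified = 0   # cumulative elements announced via notify
        self.released = 0   # cumulative elements freed via release

    def notify(self, count: int) -> None:
        self.notified += count

    def release(self, count: int) -> None:
        self.released += count

    def ready_in(self, n: int) -> int:
        # Input side: at least n announced-but-unreleased elements.
        return int(self.notified - self.released >= n)

    def ready_out(self, n: int) -> int:
        # Output side: at least n free slots in the linked input buffer.
        in_use = self.notified - self.released
        return int(self.buffer_size - in_use >= n)


link = LinkCounts(buffer_size=8)
# Suppose three values have been assigned to the outpipe, with no notify yet.
before = link.ready_in(3)   # readiness ignores the unannounced values
link.notify(3)
after = link.ready_in(3)    # readiness now reflects the notification
```

  • In this model, readiness changes only when notify or release is called, which matches the statement that readiness tracks the cumulatively specified counts rather than the physical contents of the buffer.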
  • the processes section of a module contains the process (method or program) definitions that define a module.
  • a module may consist of one or more processes, which are cooperatively multitasked with each other, as well as with any other modules mapped to the same AEU 840 or other form of hardware computational element. Each such process corresponds to a task to be performed on a node 150 , 800 .
  • processes are where the bulk of the program behavior is defined and where most of the C-style code appears.
  • Process declarations vaguely resemble C-style functions, but due to their adaptive computing nature, they take no parameters and have no return type. Instead, they are defined with associated firing conditions that indicate when the process should run (typically in terms of the readiness of one or more input and/or output pipes).
  • processName when firingCondition { ... }
  • firingCondition indicates the condition that must be true in order for the process (corresponding task) to be executed. This is typically the logical AND of a number of pipe readiness conditions and, as indicated above, initializes the PCT and CCT values.
  • The following code declares a process named passThrough for a sample module. It is declared to fire whenever its input pipe has a block of values (of size blockSize) ready for reading and its output pipe has a block of locations (also of size blockSize in this example) free for writing (Example 21): module Sample<const int16 blockSize> { ... processes: process passThrough when (ready(dataIn, blockSize) && ready(dataOut, blockSize)) { ... } }
  • the body of a process is preferably made up of SilverC code as it has been described, namely, traditional C or C++ language program constructs augmented with SilverC constructs, definitions, extensions, pointers, and pipe operations.
  • the body of a process may alternately contain inline C or assembly code.
  • most processes begin by firing based on the readiness of their input and output pipes, perform some computations using the input data and module state, followed by assigning the results to their output pipes, and then performing notification and release calls on the pipes.
  • This process runs whenever a block of values (of size blockSize) is ready for reading from its input, and a block of locations (of size blockSize) is ready for writing on its output, as the firing conditions which initialize the CCT and PCT of the HTM 810 . It proceeds by running a SilverC pointer (dataInPtr++) incrementally, one element at a time, across that input block of values (in a buffer corresponding to dataIn), and writing them to its output pipe. This process then notifies the downstream pipe that it has sent a block of values to it, and releases the input values so that the upstream process may overwrite them, modifying the values held in the CCT and PCT. It should be noted that these synchronization calls notify ( ) and release ( ) could be performed in any order, with the choice of order depending on which message should be delivered first.
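  • The passThrough behavior described above can be imitated in software. The following Python sketch is a simplified model under stated assumptions, not SilverC and not the actual HTM 810 mechanism: the Channel class, the pass_through function, BLOCK_SIZE = 4, and the buffer size of 8 are all hypothetical, and an index loop stands in for the dataInPtr++ pointer walk.

```python
from collections import deque

BLOCK_SIZE = 4  # assumed value for the blockSize module parameter


class Channel:
    """Hypothetical link: an input buffer plus notify/release accounting."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.data = deque()   # values physically written to the buffer
        self.available = 0    # values announced and not yet released

    def write(self, value):   # models an assignment to an outpipe
        self.data.append(value)

    def notify(self, n):      # producer announces n new values
        self.available += n

    def release(self, n):     # consumer frees n values for overwriting
        for _ in range(n):
            self.data.popleft()
        self.available -= n

    def ready_to_read(self, n):
        return self.available >= n

    def ready_to_write(self, n):
        return self.buffer_size - self.available >= n


def pass_through(data_in, data_out):
    """Run once if the firing condition holds; return True if it fired."""
    if not (data_in.ready_to_read(BLOCK_SIZE)
            and data_out.ready_to_write(BLOCK_SIZE)):
        return False
    for i in range(BLOCK_SIZE):       # plays the role of dataInPtr++
        data_out.write(data_in.data[i])
    data_out.notify(BLOCK_SIZE)       # tell the downstream pipe
    data_in.release(BLOCK_SIZE)       # let the upstream task overwrite
    return True


upstream = Channel(buffer_size=8)
downstream = Channel(buffer_size=8)
fired_early = pass_through(upstream, downstream)  # nothing announced yet
for v in [10, 20, 30, 40]:
    upstream.write(v)
upstream.notify(4)
fired = pass_through(upstream, downstream)
```

  • In this sketch the process does nothing until a full block has been announced on its input and a full block of space is free downstream; once both hold, it copies the block, notifies, and releases.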
  • SilverC modules may be used as a new parameterized type in the language of the preferred SilverC embodiment. Declaring “variables” of these types corresponds to creating a new instantiation of the module that executes in parallel with all other module instantiations. For example, given a module definition as follows (Example 23): module Sample<const int16 blockSize> { ... } then an instantiation of the module with a blockSize parameter of “8” would appear as:
  • a module may be instantiated more than once.
  • the preferred SilverC embodiment provides for input and output pipes of a module to be linked to the output and input pipes of other modules.
  • This linking or connecting of pipes across modules may be performed statically or dynamically, and may be implemented repeatedly with different linking connections, such as linking “A” to “B” at one instant, followed by linking “A” to “C” at another instant.
  • the preferred SilverC embodiment utilizes a link( ) function, which may be specified as (Example 24):
  • pipes are referred to using the identifier of the module instantiation followed by a dot (.), followed by the name of the pipe as declared within the module definition.
  • This code declares an instance of each of the Producer and Consumer modules, as myProducer and myConsumer, respectively, similarly to the C++ declaration of an object as an instance of a class.
  • This Example 25 then links the output pipe of the instantiated producer, dataOut, to the input pipe of the instantiated consumer, dataIn.
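  • The idea of linking pipes by dotted names, and of replacing one link with another over time, can be sketched as a small routing table. The Python below is a hypothetical illustration only: the Router class, the argument order (source first, destination second), the rule that a later link on the same source replaces the earlier one, and the instance name otherConsumer are all assumptions that the text does not specify.

```python
class Router:
    """Hypothetical table of outpipe -> inpipe connections."""

    def __init__(self):
        self.routes = {}

    def link(self, source: str, destination: str) -> None:
        # Assumed rule: a later link on the same source replaces the route.
        self.routes[source] = destination

    def destination_of(self, source: str):
        return self.routes.get(source)


router = Router()

# Pipes are named as instance identifier, dot, pipe name.
router.link("myProducer.dataOut", "myConsumer.dataIn")
first = router.destination_of("myProducer.dataOut")

# Dynamic relinking at a later instant; otherConsumer is a made-up instance.
router.link("myProducer.dataOut", "otherConsumer.dataIn")
second = router.destination_of("myProducer.dataOut")
```

  • A table of this kind is one simple way to picture how a link could correspond to a selected configuration of the interconnect; the actual MIN 110 configuration format is not modeled here.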
  • the language constructs of the preferred SilverC embodiment directly support the static and dynamic reconfiguration capabilities of the ACE (ACM) 100 hardware.
  • the SilverC module, process and pipe constructs are an efficient means to specify the static and dynamic reconfiguration parameters of the ACE (ACM) 100 MIN 110 and node 800 Hardware Task Manager 810 .
  • the preferred SilverC embodiment provides the following direct mapping from the programming language domain to the ACE (ACM) 100 hardware domain:
  • The SilverC module construct provides a direct mapping from the programming language domain to the ACE (ACM) 100 node identifier domain.
  • the SilverC compiler assigns module instances to ACE (ACM) 100 nodes according to the node type specified in the module definition and any additional constraints applied to the module instance.
  • the SilverC process construct provides a direct mapping from the programming language domain to the ACE (ACM) 100 task identifier domain.
  • a unique task identifier is generated for each process of each module instance.
  • the SilverC pipe construct provides a direct mapping from the programming language domain to the ACE (ACM) 100 port identifier domain.
  • a unique unit port identifier is generated for each port of each module instance.
  • the SilverC link ( ) function provides the association between source node, task and port identifiers and destination node, task and port identifiers. It provides a direct mapping from the programming language domain to the MIN 110 connection domain of the ACE (ACM) 100 .
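  • The mapping from modules, processes and pipes to node, task and port identifiers can be pictured with a short sketch. The Python below is a hypothetical illustration, not the SilverC compiler: the function assign_identifiers, the numbering scheme (identifiers counted from zero per node), the node assignments, and the process names generate and consume are all assumptions.

```python
from itertools import count


def assign_identifiers(instances):
    """Give each process a task id and each pipe a port id, per node.

    instances maps an instance name to a dict with keys
    "node" (already chosen), "processes" and "pipes".
    """
    task_ids, port_ids = {}, {}
    next_task, next_port = {}, {}
    for inst, spec in instances.items():
        node = spec["node"]
        tasks = next_task.setdefault(node, count())
        ports = next_port.setdefault(node, count())
        for proc in spec["processes"]:
            task_ids[(inst, proc)] = (node, next(tasks))
        for pipe in spec["pipes"]:
            port_ids[(inst, pipe)] = (node, next(ports))
    return task_ids, port_ids


instances = {
    "myProducer": {"node": 0, "processes": ["generate"], "pipes": ["dataOut"]},
    "myConsumer": {"node": 1, "processes": ["consume"], "pipes": ["dataIn"]},
}
tasks, ports = assign_identifiers(instances)

# A link then pairs a source (node, port) with a destination (node, port).
connection = (ports[("myProducer", "dataOut")], ports[("myConsumer", "dataIn")])
```

  • The point of the sketch is only that every process of every module instance receives its own task identifier and every pipe its own port identifier, so that a link can be expressed entirely in terms of those identifiers.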
  • the SilverC programming language provides the following direct mapping from the programming language domain to the ACE (ACM) 100 hardware domain:
  • the SilverC process construct provides a direct mapping from the programming language domain to the ACE (ACM) 100 task identifier domain.
  • a unique task identifier is generated for each process of each module instance.
  • the SilverC ready ( ) function provides a direct mapping from the programming language domain to the HTM firing condition domain.
  • the HTM Consumer Count Table (CCT) and Producer Count Table (PCT) are populated using the counter values specified in the ready ( ) function.
  • the SilverC module construct plays an indirect role in this mapping, as it provides the association between processes and pipes.
  • the SilverC pipe construct also provides an indirect role as it provides the mapping to MIN 110 ports, as described above.
  • the SilverC pipe construct provides a direct mapping from the programming language domain to the HTM initial counter value domain.
  • The initial counter value for the corresponding input port is simply -bufferSize, where bufferSize is the size of the inpipe buffer as specified in its declaration.
  • The initial counter value for the corresponding output port is -(bufferSize - readyCount + 1), where bufferSize is the size of the buffer of the inpipe that is linked to this outpipe through a link ( ) expression, and readyCount is the firing condition associated with the output port through a ready ( ) expression.
  • the release ( ) and notify ( ) constructs may then be utilized to increment or decrement the counter values held in the corresponding CCT and PCT of the HTM 810 .
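  • The initial counter values can be worked through numerically. The Python sketch below reads the dash characters in the two expressions above as minus signs, so that the input-port value is -bufferSize and the output-port value is -(bufferSize - readyCount + 1). The function names and the sample values (a buffer of 16 elements and a ready count of 8) are assumptions chosen for illustration.

```python
def initial_input_count(buffer_size: int) -> int:
    # Input port: simply the negated inpipe buffer size.
    return -buffer_size


def initial_output_count(buffer_size: int, ready_count: int) -> int:
    # Output port: depends on the linked inpipe's buffer size and on the
    # count given in the ready ( ) expression for the outpipe.
    return -(buffer_size - ready_count + 1)


BUFFER_SIZE = 16   # assumed inpipe buffer size
READY_COUNT = 8    # assumed count, as in ready(dataOut, 8)

in_init = initial_input_count(BUFFER_SIZE)
out_init = initial_output_count(BUFFER_SIZE, READY_COUNT)
```

  • With these sample values, the input port starts at -16 and the output port at -(16 - 8 + 1) = -9.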
  • the system, methods and programs of the present invention may be embodied in any number of forms, such as within a computer, within a workstation, within a computer network, within an adaptive computing device such as an ACE 100 , or within any other form of computing or other system used to create or contain source code.
  • Such source code further may be compiled into some form of instructions or object code (including assembly language instructions or configuration information for adaptive computing).
  • the source code of the present invention may be embodied as any type of software, such as C++, C#, Java, or any other type of programming language which performs the functionality discussed above, including the preferred SilverC embodiment.
  • the source code of the present invention and any resulting bit file may be embodied within any tangible storage medium, such as within a memory or storage device for use by a computer, a workstation, any other machine-readable medium or form, or any other storage form or medium for use in a computing system.
  • Such storage medium, memory or other storage devices may be any type of memory device, memory integrated circuit (“IC”), or memory portion of an integrated circuit (such as the resident memory within a processor IC), including without limitation RAM, FLASH, DRAM, SRAM, MRAM, FeRAM, ROM, EPROM or E2PROM, or any other type of memory, storage medium, or data storage apparatus or circuit, depending upon the selected embodiment.
  • a tangible medium storing computer readable software, or other machine-readable medium may include a floppy disk, a CDROM, a CD-RW, a magnetic hard drive, an optical drive, a quantum computing storage medium or device, a transmitted electromagnetic signal (e.g., used in internet downloading), or any other type of data storage apparatus or medium.
  • the present invention provides a system, software, and method for programming an adaptive computing device which has a plurality of heterogeneous nodes coupled through a matrix interconnect network.
  • the method embodiment comprises, in any order: creating a first program construct having a correspondence to a selected node of the plurality of heterogeneous nodes; creating a second program construct having a correspondence to an executable task of the selected node; creating a third program construct having a correspondence to at least one input port coupling the selected node to the matrix interconnect network for input data to be consumed by the executable task; and creating a fourth program construct having a correspondence to at least one output port coupling the selected node to the matrix interconnect network for output data to be produced by the executable task.
  • the first program construct is a module declaration, optionally having a first unique identifier, a first reference to a node type corresponding to the selected node, and a second reference to one or more configuration-time parameters.
  • the preferred module declaration has a form comprising:
  • this first program construct generally includes, within the body of the construct, the second, third and fourth program constructs.
  • the function of the first program construct is merely to map or correspond to a node type.
  • the module declaration further has a constants section which declares at least one constant which is global to the module; a states section which declares shared state information between module processes (such as an array of values stored in a memory); a process section having one or more process declarations, as second program constructs; and a pipes section, the pipes section having the third program construct and the fourth program construct.
  • the third program construct is preferably an inpipe declaration having a first unique identifier and further having a first parameter specifying an element type of the input data and a second parameter specifying an amount of memory to be reserved for the input data; and the fourth program construct is preferably an outpipe declaration having a second unique identifier and further having a third parameter specifying an element type of the output data.
  • An assignment of output data to the outpipe declaration corresponds to writing output data to the output port connecting the node 800 to the MIN 110 .
  • the inpipe declaration preferably has a form comprising:
  • the second program construct is a process declaration having a unique identifier and having at least one firing condition, the firing condition capable of determining a commencement of the executable task of the selected node.
  • the process declaration preferably has a form comprising:
  • Synchronization of production of output data with consumption of input data is provided by creating a fifth program construct corresponding to a data producing task notifying a data consuming task of the creation of output data; and creating a sixth program construct corresponding to a data consuming task notifying a data producing task of the consumption of input data.
  • the data producing task is executable on a first node of the plurality of heterogeneous nodes and the data consuming task is executable on a second node of the plurality of heterogeneous nodes.
  • the fifth program construct is a notify routine and has a form comprising:
  • The present invention also provides for commencement of the executable task through a seventh program construct having a correspondence to a task manager of the selected node, which corresponds to an initialization of a producer count table of the task manager or an initialization of a consumer count table of the task manager.
  • the seventh program construct is a ready routine and has a form comprising:
  • An eighth program construct is used to link the fourth program construct to the third program construct, and corresponds to a selected configuration of the matrix interconnection network to provide a communication path from a selected output port to a selected input port.
  • the eighth program construct is a link routine and has a form comprising:
  • a ninth program construct may also be utilized to instantiate a program construct of a plurality of program constructs, such as the first program construct, the second program construct, the third program construct, the fourth program construct, and the eighth program construct.
  • The ninth program construct is a main function and has a form comprising: main( ) { ... } wherein the ellipsis “. . . ” is a placeholder for specification of a program construct to be instantiated.
  • the main() function can be utilized to instantiate a module, with all of its incorporated program constructs such as processes, pipes, and links.
  • different module and other program construct parameters will allow different instantiations of modules and their included constructs, as mentioned above, such that each instantiation corresponds to a parameter set contained within the program construct.
  • the invention facilitates static and dynamic configuration of an adaptive computing device such as the ACE 100 . While applicable to many hardware platforms and programming styles, it contains several constructs that directly support the static or dynamic reconfiguration of the MIN 110 and HTMs 810 of the ACE (ACM) 100 .

Abstract

The present invention provides a system, method and software for programming and configuring an adaptive computing architecture or device. The invention utilizes program constructs which correspond to and map directly to the adaptive hardware having a plurality of reconfigurable nodes coupled through a reconfigurable matrix interconnection network. A first program construct corresponds to a selected node. A second program construct corresponds to an executable task of the selected node and includes one or more firing conditions capable of determining the commencement of the executable task of the selected node. A third program construct corresponds to at least one input port coupling the selected node to the matrix interconnect network for input data to be consumed by the executable task. A fourth program construct corresponds to at least one output port coupling the selected node to the matrix interconnect network for output data to be produced by the executable task.

Description

    CROSS-REFERENCE TO A RELATED APPLICATION
  • This application is related to a Paul L. Master et al., U.S. Patent Application Ser. No. 10/384,486, entitled “Adaptive Integrated Circuitry With Heterogeneous And Reconfigurable Matrices Of Diverse And Adaptive Computational Units Having Fixed, Application Specific Computational Elements”, filed Mar. 7, 2003, commonly assigned to QuickSilver Technology, Inc., and incorporated by reference herein, with priority claimed for all commonly disclosed subject matter (the “related application”), which is a continuation-in-part of Paul L. Master et al., U.S. Patent Application Ser. No. 09/815,122, entitled “Adaptive Integrated Circuitry With Heterogeneous And Reconfigurable Matrices Of Diverse And Adaptive Computational Units Having Fixed, Application Specific Computational Elements”, filed Mar. 22, 2001, commonly assigned to QuickSilver Technology, Inc.
    FIELD OF THE INVENTION
  • The present invention relates, in general, to programming of integrated circuits and systems for particular applications, and more particularly, to a system, method and software for static and dynamic programming and configuration of an adaptive computing integrated circuit architecture.
    BACKGROUND OF THE INVENTION
  • The related application discloses a new form or type of integrated circuit, referred to as an adaptive computing engine (“ACE”) or adaptive computing machine (“ACM”), which is readily reconfigurable, in real time, and is capable of having corresponding, multiple modes of operation. The ACM is a new and innovative hardware platform suitable for digital signal processing, Telematics, and other applications where small hardware footprint, low power consumption and high performance characteristics are highly desirable.
  • The ACE architecture for adaptive or reconfigurable computing includes a plurality of different or heterogeneous computational elements coupled to an interconnection network. The plurality of heterogeneous computational elements include corresponding computational elements having fixed and differing architectures, such as fixed architectures for different functions such as memory, addition, multiplication, complex multiplication, subtraction, configuration, reconfiguration, control, input, output, and field programmability. In response to configuration information, the interconnection network is operative in real time to adapt (configure and reconfigure) the plurality of heterogeneous computational elements for a plurality of different functional modes, including linear algorithmic operations, non-linear algorithmic operations, finite state machine operations, memory operations, and bit-level manipulations.
  • As a consequence, the interconnection network and other ACE hardware need to be configured and generally also reconfigured, either statically or dynamically, to perform any given application or algorithm.
  • The ACE architecture also utilizes a data flow model for processing. More particularly, input operand data will be processed to produce output data (without other intervention such as interrupt signals, instruction fetching, etc.), whenever the input data is available and an output port (register or buffer) is available for any resulting output data. Controlling the data flow processing to implement an algorithm, however, presents unusual difficulties, including for controlling data flow in the communication and control algorithms used in a wide variety of applications, such as wideband CDMA (“WCDMA”) and cdma2000.
  • Given this new and unique adaptive computing integrated circuit architecture, a need remains for a method, system and software to program and configure the adaptive computing architecture (or device), either statically or dynamically, to perform one or more applications.
    SUMMARY OF THE INVENTION
  • The present invention provides a plurality of program constructs which enable the static or dynamic programming and configuration of an adaptive computing device, such as an ACE (ACM) having a plurality of heterogeneous nodes coupled through a matrix interconnect network.
  • The various system, method and software embodiments of the invention provide a plurality of program constructs:
  • a first program construct, such as a “module”, having a correspondence to a selected node of the plurality of heterogeneous nodes;
  • a second program construct, such as a “process”, having a correspondence to an executable task of the selected node, and having at least one firing condition capable of determining a commencement of the executable task of the selected node;
  • a third program construct, such as an “inpipe”, having a correspondence to at least one input port coupling the selected node to the matrix interconnect network for input data to be consumed by the executable task;
  • a fourth program construct, such as an “outpipe”, having a correspondence to at least one output port coupling the selected node to the matrix interconnect network for output data to be produced by the executable task;
  • a fifth program construct, such as a “notify” routine, having a correspondence to a notification of creation of output data, and a sixth program construct, such as a “release” routine, having a correspondence to a notification of consumption of input data, such that the fifth program construct and the sixth program construct provide for synchronization of production of output data with consumption of input data;
  • a seventh program construct, such as a “ready” routine, having a correspondence to a task manager of the selected node to provide for commencement of the executable task, which also provides initialization of a producer count table of the task manager or a consumer count table of the task manager within the selected node; and
  • an eighth program construct, such as a “link” routine, linking the fourth program construct to the third program construct, the eighth program construct corresponding to a selected configuration of the matrix interconnection network providing a communication path from a selected output port to a selected input port.
  • Numerous other advantages and features of the present invention will become readily apparent from the following detailed description of the invention and the embodiments thereof, from the claims and from the accompanying drawings.
    BRIEF DESCRIPTION OF THE DRAWINGS
  • The objects, features and advantages of the present invention will be more readily appreciated upon reference to the following disclosure when considered in conjunction with the accompanying drawings and examples which form a portion of the specification, in which:
  • FIG. 1 is a block diagram illustrating an exemplary first apparatus embodiment in accordance with the invention of the related application.
  • FIG. 2 is a schematic diagram illustrating an exemplary data flow graph.
  • FIG. 3 is a block diagram illustrating a reconfigurable matrix (or node), a plurality of computation units, and a plurality of computational elements.
  • FIG. 4 is a block diagram illustrating, in greater detail, a computational unit of a reconfigurable matrix.
  • FIGS. 5A through 5E are block diagrams illustrating, in detail, exemplary fixed and specific computational elements, forming computational units.
  • FIG. 6 is a block diagram illustrating, in detail, an exemplary multi-function adaptive computational unit having a plurality of different, fixed computational elements.
  • FIG. 7 is a block diagram illustrating, in detail, an adaptive logic processor computational unit having a plurality of fixed computational elements.
  • FIG. 8 is a block diagram illustrating, in greater detail, an exemplary core cell of an adaptive logic processor computational unit with a fixed computational element.
  • FIG. 9 is a block diagram illustrating, in greater detail, an exemplary fixed computational element of a core cell of an adaptive logic processor computational unit.
  • FIG. 10 is a block diagram illustrating a second exemplary apparatus embodiment in accordance with the invention of the related application.
  • FIG. 11 is a block diagram illustrating an exemplary first system embodiment in accordance with the invention of the related application.
  • FIG. 12 is a block diagram illustrating an exemplary node quadrant with routing elements.
  • FIG. 13 is a block diagram illustrating exemplary network interconnections.
  • FIG. 14 is a block diagram illustrating an exemplary data structure embodiment.
  • FIG. 15 is a block diagram illustrating an exemplary second system embodiment 1000 in accordance with the invention of the related application.
    DETAILED DESCRIPTION OF THE INVENTION
  • While the present invention is susceptible of embodiment in many different forms, there are shown in the drawings and will be described herein in detail specific examples and embodiments thereof, with the understanding that the present disclosure is to be considered as an exemplification of the principles of the invention and is not intended to limit the invention to the specific examples and embodiments illustrated.
  • As indicated above, the present invention provides a system, method and software for programming and configuring an adaptive computing device such as an ACE 100. The present invention provides such a programming methodology using a series of unique constructs which are capable of being mapped directly to the hardware features of the ACE 100 and which are also capable of configuring the matrix interconnect network of the ACE 100 for, among other things, the routing of output data and input data. The various program constructs of the present invention have additional features, such as providing synchronization among the various tasks which may be executed within the ACE 100.
  • In the following discussion, a background of an exemplary adaptive computing architecture is provided with reference to FIGS. 1 through 15. Following this background discussion, the present invention is discussed in detail with reference to Examples 1 through 25.
  • FIG. 1 is a block diagram illustrating a first apparatus 100 embodiment in accordance with the invention of the related application. The apparatus 100, referred to herein as an adaptive computing engine (“ACE”) 100, is preferably embodied as an integrated circuit, or as a portion of an integrated circuit having other, additional components. In the first apparatus embodiment, and as discussed in greater detail below, the ACE 100 includes one or more reconfigurable matrices (or nodes) 150, such as matrices 150A through 150N as illustrated, and a matrix interconnection network 110. Also in the first apparatus embodiment, and as discussed in detail below, one or more of the matrices (nodes) 150, such as matrices 150A and 150B, are configured for functionality as a controller 120, while other matrices, such as matrices 150C and 150D, are configured for functionality as a memory 140. The various matrices 150 and matrix interconnection network 110 may also be implemented together as fractal subunits, which may be scaled from a few nodes to thousands of nodes.
  • A significant departure from the prior art, the ACE 100 does not utilize traditional (and typically separate) data, direct memory access (DMA), random access, configuration and instruction busses for signaling and other transmission between and among the reconfigurable matrices 150, the controller 120, and the memory 140, or for other input/output (“I/O”) functionality. Rather, data, control and configuration information are transmitted between and among these matrix 150 elements, utilizing the matrix interconnection network 110, which may be configured and reconfigured, in real time, to provide any given connection between and among the reconfigurable matrices 150, including those matrices 150 configured as the controller 120 and the memory 140, as discussed in greater detail below.
  • The matrices 150 configured to function as memory 140 may be implemented in any desired or preferred way, utilizing computational elements (discussed below) of fixed memory elements, and may be included within the ACE 100 or incorporated within another IC or portion of an IC. In the first apparatus embodiment, the memory 140 is included within the ACE 100, and preferably is comprised of computational elements which are low power consumption random access memory (RAM), but also may be comprised of computational elements of any other form of memory, such as flash, DRAM, SRAM, SDRAM, FRAM, MRAM, ROM, EPROM or E2PROM. In the first apparatus embodiment, the memory 140 preferably includes DMA engines, not separately illustrated.
  • The controller 120 is preferably implemented, using matrices 150A and 150B configured as adaptive finite state machines, as a reduced instruction set (“RISC”) processor, controller or other device or IC capable of performing the two types of functionality discussed below. (Alternatively, these functions may be implemented utilizing a conventional RISC or other processor.) The first control functionality, referred to as “kernel” control, is illustrated as kernel controller (“KARC”) of matrix 150A, and the second control functionality, referred to as “matrix” control, is illustrated as matrix controller (“MARC”) of matrix 150B. The kernel and matrix control functions of the controller 120 are explained in greater detail below, with reference to the configurability and reconfigurability of the various matrices 150, and with reference to the exemplary form of combined data, configuration and control information referred to herein as a “silverware” module. The kernel controller is also referred to as a “K-node”, discussed in greater detail below with reference to FIGS. 10 and 11.
  • The matrix interconnection network (“MIN”) 110 of FIG. 1, and its subset interconnection networks separately illustrated in FIGS. 3 and 4 (Boolean interconnection network 210, data interconnection network 240, and interconnect 220), individually, collectively and generally referred to herein as “interconnect”, “interconnection(s)” or “interconnection network(s)”, may be implemented generally as known in the art, such as utilizing FPGA interconnection networks or switching fabrics, albeit in a considerably more varied fashion. In the first apparatus embodiment, the various interconnection networks are implemented as described, for example, in U.S. Pat. Nos. 5,218,240, 5,336,950, 5,245,227, and 5,144,166, and also as discussed below and as illustrated with reference to FIGS. 7, 8 and 9. These various interconnection networks provide selectable (or switchable) connections between and among the controller 120, the memory 140, the various matrices 150, and the computational units 200 and computational elements 250 discussed below, providing the physical basis for the configuration and reconfiguration referred to herein, in response to and under the control of configuration signaling generally referred to herein as “configuration information”. In addition, the various interconnection networks (110, 210, 240 and 220) provide selectable or switchable data, input, output, control and configuration paths, between and among the controller 120, the memory 140, the various matrices 150, and the computational units 200 and computational elements 250, in lieu of any form of traditional or separate input/output busses, data busses, DMA, RAM, configuration and instruction busses. In the second apparatus embodiment, the various interconnection networks are implemented as described below with reference to FIGS. 12 and 13, using various combinations of routing elements, such as token rings or arbiters, and multiplexers, at varying levels within the system and apparatus embodiments of the invention of the related application.
  • It should be pointed out, however, that while any given level of switching or selecting operation of or within the various interconnection networks (110, 210, 240 and 220) may be implemented as known in the art, the combinations of routing elements and multiplexing elements, the use of different routing elements and multiplexing elements at differing levels within the system, and the design and layout of the various interconnection networks (110, 210, 240 and 220), are new and novel, as discussed in greater detail below. For example, varying levels of interconnection are provided to correspond to the varying levels of the matrices 150, the computational units 200, and the computational elements 250, discussed below. At the matrix 150 level, in comparison with the prior art FPGA interconnect, the matrix interconnection network 110 is considerably more limited and less “rich”, with lesser connection capability in a given area, to reduce capacitance and increase speed of operation. Within a particular matrix 150 or computational unit 200, however, the interconnection network (210, 220 and 240) may be considerably more dense and rich, to provide greater adaptation and reconfiguration capability within a narrow or close locality of reference.
  • The various matrices or nodes 150 are reconfigurable and heterogeneous, namely, in general, and depending upon the desired configuration: reconfigurable matrix 150A is generally different from reconfigurable matrices 150B through 150N; reconfigurable matrix 150B is generally different from reconfigurable matrices 150A and 150C through 150N; reconfigurable matrix 150C is generally different from reconfigurable matrices 150A, 150B and 150D through 150N, and so on. The various reconfigurable matrices 150 each generally contain a different or varied mix of adaptive and reconfigurable computational (or computation) units (200); the computational units 200, in turn, generally contain a different or varied mix of fixed, application specific computational elements (250), discussed in greater detail below with reference to FIGS. 3 and 4, which may be adaptively connected, configured and reconfigured in various ways to perform varied functions, through the various interconnection networks. In addition to varied internal configurations and reconfigurations, the various matrices 150 may be connected, configured and reconfigured at a higher level, with respect to each of the other matrices 150, through the matrix interconnection network 110, also as discussed in greater detail below.
  • Several different, insightful and novel concepts are incorporated within the ACE 100 architecture of the invention of the related application, and provide a useful explanatory basis for the real time operation of the ACE 100 and its inherent advantages.
  • The first novel concepts concern the adaptive and reconfigurable use of application specific, dedicated or fixed hardware units (computational elements 250), and the selection of particular functions for acceleration, to be included within these application specific, dedicated or fixed hardware units (computational elements 250) within the computational units 200 (FIG. 3) of the matrices 150, such as pluralities of multipliers, complex multipliers, and adders, each of which are designed for optimal execution of corresponding multiplication, complex multiplication, and addition functions. Given that the ACE 100 is to be optimized, in the first apparatus embodiment, for low power consumption, the functions for acceleration are selected based upon power consumption. For example, for a given application such as mobile communication, corresponding C (C# or C++) or other code may be analyzed for power consumption. Such empirical analysis may reveal, for example, that a small portion of such code, such as 10%, actually consumes 90% of the operating power when executed. On the basis of such power utilization, this small portion of code is selected for acceleration within certain types of the reconfigurable matrices 150, with the remaining code, for example, adapted to run within matrices 150 configured as controller 120. Additional code may also be selected for acceleration, resulting in an optimization of power consumption by the ACE 100, up to any potential trade-off resulting from design or operational complexity. In addition, as discussed with respect to FIG. 3, other functionality, such as control code, may be accelerated within matrices 150 when configured as finite state machines.
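  • The power-based selection described above may be sketched as follows. This is a minimal illustration only: the function name `select_for_acceleration`, the region names and the power figures are hypothetical and are not taken from the disclosure. The sketch greedily picks the most power-hungry code regions until a target share of total power is covered.

```python
def select_for_acceleration(power_by_region, target_percent=90):
    """Return the most power-hungry regions that together reach target_percent
    of the total measured power, plus the share of power they cover."""
    total = sum(power_by_region.values())
    selected = []
    covered = 0
    # Visit regions from highest to lowest measured power (greedy choice).
    ranked = sorted(power_by_region.items(), key=lambda kv: kv[1], reverse=True)
    for name, power in ranked:
        if covered * 100 >= target_percent * total:
            break  # target share already reached
        selected.append(name)
        covered += power
    return selected, covered / total


# Hypothetical profile in milliwatts; the numbers are invented for illustration.
profile = {
    "correlator": 60,
    "fir_filter": 30,
    "ui_loop": 4,
    "protocol_stack": 3,
    "logging": 3,
}
selected, share = select_for_acceleration(profile)
```

In this invented profile, two of five regions account for the target share and would be candidates for acceleration, with the remainder left to run on the matrices configured as the controller.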
  • Next, the ACE 100 utilizes a data flow model for all processes and computations. Algorithms or other functions selected for acceleration may be converted into a form which may be represented as a “data flow graph” (“DFG”). A schematic diagram of an exemplary data flow graph is illustrated in FIG. 2. As illustrated in FIG. 2, an algorithm or function useful for CDMA voice coding (QCELP (Qualcomm code excited linear prediction)) is implemented utilizing four multipliers 190 followed by four adders 195. Through the varying levels of interconnect, the algorithms of this data flow graph are then implemented, at any given time, through the configuration and reconfiguration of fixed computational elements (250), namely, implemented within hardware which has been optimized and configured for efficiency, i.e., a “machine” is configured in real time which is optimized to perform the particular algorithm. Continuing with the exemplary DFG of FIG. 2, four fixed or dedicated multipliers, as computational elements 250, and four fixed or dedicated adders, also as different computational elements 250, are configured in real time through the interconnect to perform the functions or algorithms of the particular DFG. Using this data flow model, data which is produced, such as by the multipliers 190, is immediately consumed, such as by adders 195.
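  • A data flow graph of four multipliers followed by four adders may be sketched in software as below. The exact wiring of FIG. 2 is not reproduced here; the sketch assumes, purely for illustration, that each multiplier feeds one adder and that the adders form an accumulation chain. The names `multiplier`, `adder`, `run_dfg` and `carry_in` are hypothetical.

```python
def multiplier(a, b):
    """Stand-in for one fixed multiplier element."""
    return a * b


def adder(a, b):
    """Stand-in for one fixed adder element."""
    return a + b


def run_dfg(samples, coefficients, carry_in=0):
    """Evaluate an assumed 4-multiplier, 4-adder graph.

    Stage 1: four multipliers form the pairwise products.
    Stage 2: four adders consume those products as soon as they exist,
    chaining from carry_in to the final sum.
    """
    if len(samples) != 4 or len(coefficients) != 4:
        raise ValueError("this sketch models exactly four multipliers")
    products = [multiplier(s, c) for s, c in zip(samples, coefficients)]
    total = carry_in
    for product in products:
        total = adder(total, product)
    return total


result = run_dfg([1, 2, 3, 4], [5, 6, 7, 8])
```

No intermediate value is written to a named memory location between the two stages, which mirrors the produce-then-immediately-consume behavior of the data flow model.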
  • The third and perhaps most significant concept, and a marked departure from the concepts and precepts of the prior art, is the concept of reconfigurable “heterogeneity” utilized to implement the various selected algorithms mentioned above. As indicated above, prior art reconfigurability has relied exclusively on homogeneous FPGAs, in which identical blocks of logic gates are repeated as an array within a rich, programmable interconnect, with the interconnect subsequently configured to provide connections between and among the identical gates to implement a particular function, albeit inefficiently and often with routing and combinatorial problems. In stark contrast, within computation units 200, different computational elements (250) are implemented directly as correspondingly different fixed (or dedicated) application specific hardware, such as dedicated multipliers, complex multipliers, accumulators, arithmetic logic units (ALUs), registers, and adders. Utilizing interconnect (210 and 220), these differing, heterogeneous computational elements (250) may then be adaptively configured, in real time, to perform the selected algorithm, such as the performance of discrete cosine transformations often utilized in mobile communications. For the data flow graph example of FIG. 2, four multipliers and four adders will be configured, i.e., connected in real time, to perform the particular algorithm. As a consequence, different (“heterogeneous”) computational elements (250) are configured and reconfigured, at any given time, to optimally perform a given algorithm or other function. In addition, for repetitive functions, a given instantiation or configuration of computational elements may also remain in place over time, i.e., unchanged, throughout the course of such repetitive calculations.
  • The temporal nature of the ACE 100 architecture should also be noted. At any given instant of time, utilizing different levels of interconnect (110, 210, 240 and 220), a particular configuration may exist within the ACE 100 which has been optimized to perform a given function or implement a particular algorithm. At another instant in time, the configuration may be changed, to interconnect other computational elements (250) or connect the same computational elements 250 differently, for the performance of another function or algorithm. Two important features arise from this temporal reconfigurability. First, as algorithms may change over time to, for example, implement a new technology standard, the ACE 100 may co-evolve and be reconfigured to implement the new algorithm. For a simplified example, a fifth multiplier and a fifth adder may be incorporated into the DFG of FIG. 2 to execute a correspondingly new algorithm, with additional interconnect also potentially utilized to implement any additional bussing functionality. Second, because computational elements are interconnected at one instant in time, as an instantiation of a given algorithm, and then reconfigured at another instant in time for performance of another, different algorithm, gate (or transistor) utilization is maximized, providing significantly better performance than the most efficient ASICs relative to their activity factors.
  • This temporal reconfigurability of computational elements 250, for the performance of various different algorithms, also illustrates a conceptual distinction utilized herein between adaptation (configuration and reconfiguration), on the one hand, and programming or reprogrammability, on the other hand. Typical programmability utilizes a pre-existing group or set of functions, which may be called in various orders, over time, to implement a particular algorithm. In contrast, configurability and reconfigurability (or adaptation), as used herein, includes the additional capability of adding or creating new functions which were previously unavailable or non-existent.
  • Next, the present and related inventions also utilize a tight coupling (or interdigitation) of data and configuration (or other control) information, within one, effectively continuous stream of information. This coupling or commingling of data and configuration information, referred to as a “silverware” module, is the subject of a separate, related patent application. For purposes of the present invention, however, it is sufficient to note that this coupling of data and configuration information into one information (or bit) stream helps to enable real time reconfigurability of the ACE 100, without a need for the (often unused) multiple, overlaying networks of hardware interconnections of the prior art. For example, as an analogy, a particular, first configuration of computational elements at a particular, first period of time, as the hardware to execute a corresponding algorithm during or after that first period of time, may be viewed or conceptualized as a hardware analog of “calling” a subroutine in software which may perform the same algorithm. As a consequence, once the configuration of the computational elements 250 has occurred (i.e., is in place), as directed by the configuration information, the data for use in the algorithm is immediately available as part of the silverware module. The same computational elements may then be reconfigured for a second period of time, as directed by second configuration information, for execution of a second, different algorithm, also utilizing immediately available data. The immediacy of the data, for use in the configured computational elements 250, provides a one or two clock cycle hardware analog to the multiple and separate software steps of determining a memory address and fetching stored data from the addressed registers. 
This has the further result of additional efficiency, as the configured computational elements may execute, in comparatively few clock cycles, an algorithm which may require orders of magnitude more clock cycles for execution if called as a subroutine in a conventional microprocessor or DSP.
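  • The commingling of configuration and data within one stream may be pictured with the following sketch. The actual format of a silverware module is the subject of the related application and is not given here, so the tagged-tuple layout, the `FIXED_ELEMENTS` table and the function `execute_stream` are assumptions made only for illustration.

```python
import operator

# Hypothetical pool of fixed elements that a configuration word may select.
FIXED_ELEMENTS = {"add": operator.add, "mul": operator.mul}


def execute_stream(stream):
    """Consume one stream of interleaved ("config", name) and ("data", operands)
    words. Each data word is processed by whatever configuration is in place."""
    configured = None
    results = []
    for kind, payload in stream:
        if kind == "config":
            configured = FIXED_ELEMENTS[payload]  # reconfigure in place
        elif kind == "data":
            if configured is None:
                raise ValueError("data arrived before any configuration")
            results.append(configured(*payload))  # data is used immediately
        else:
            raise ValueError(f"unknown word kind: {kind!r}")
    return results


module = [
    ("config", "mul"),
    ("data", (3, 4)),
    ("data", (5, 6)),
    ("config", "add"),
    ("data", (3, 4)),
]
outputs = execute_stream(module)
```

Because the operands travel with the configuration that will use them, the consumer performs no separate address calculation or fetch step before computing.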
  • This use of silverware modules, as a commingling of data and configuration information, in conjunction with the real time reconfigurability of a plurality of heterogeneous and fixed computational elements 250 to form adaptive, different and heterogeneous computation units 200 and matrices 150, enables the ACE 100 architecture to have multiple and different modes of operation. For example, when included within a hand-held device, given a corresponding silverware module, the ACE 100 may have various and different operating modes as a cellular or other mobile telephone, a music player, a pager, a personal digital assistant, and other new or existing functionalities. In addition, these operating modes may change based upon the physical location of the device; for example, when configured as a CDMA mobile telephone for use in the United States, the ACE 100 may be reconfigured as a GSM mobile telephone for use in Europe.
  • Referring again to FIG. 1, the functions of the controller 120 (preferably matrix (KARC) 150A and matrix (MARC) 150B, configured as finite state machines) may be explained: (1) with reference to a silverware module, namely, the tight coupling of data and configuration information within a single stream of information; (2) with reference to multiple potential modes of operation; (3) with reference to the reconfigurable matrices 150; and (4) with reference to the reconfigurable computation units 200 and the computational elements 250 illustrated in FIG. 3. As indicated above, through a silverware module, the ACE 100 may be configured or reconfigured to perform a new or additional function, such as an upgrade to a new technology standard or the addition of an entirely new function, such as the addition of a music function to a mobile communication device. Such a silverware module may be stored in the matrices 150 of memory 140, or may be input from an external (wired or wireless) source through, for example, matrix interconnection network 110. In the first apparatus embodiment, one of the plurality of matrices 150 is configured to decrypt such a module and verify its validity, for security purposes. Next, prior to any configuration or reconfiguration of existing ACE 100 resources, the controller 120, through the matrix (KARC) 150A, checks and verifies that the configuration or reconfiguration may occur without adversely affecting any pre-existing functionality, such as whether the addition of music functionality would adversely affect pre-existing mobile communications functionality. In the first apparatus embodiment, the system requirements for such configuration or reconfiguration are included within the silverware module, for use by the matrix (KARC) 150A in performing this evaluative function. If the configuration or reconfiguration may occur without such adverse effects, the silverware module is allowed to load into the matrices 150 of memory 140, with the matrix (KARC) 150A setting up the DMA engines within the matrices 150C and 150D of the memory 140 (or other stand-alone DMA engines of a conventional memory). If the configuration or reconfiguration would or may have such adverse effects, the matrix (KARC) 150A does not allow the new module to be incorporated within the ACE 100. Additional functions of the kernel controller, as a K-node, are discussed in greater detail below.
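  • The evaluative function of the kernel controller may be sketched as a simple admission check. The disclosure does not specify how system requirements are encoded, so the resource names (`multipliers`, `memory_kb`), the capacities and the functions `can_admit` and `try_load` below are hypothetical. Decryption and validity checking of the module are omitted from the sketch.

```python
def can_admit(requirements, capacity, in_use):
    """True if every required resource still fits alongside what is in use."""
    return all(
        in_use.get(resource, 0) + needed <= capacity.get(resource, 0)
        for resource, needed in requirements.items()
    )


def try_load(name, requirements, capacity, loaded):
    """Admit a module only if pre-existing modules keep all their resources.

    `loaded` maps module name to its requirements and is updated on success.
    """
    in_use = {}
    for reqs in loaded.values():
        for resource, amount in reqs.items():
            in_use[resource] = in_use.get(resource, 0) + amount
    if not can_admit(requirements, capacity, in_use):
        return False  # would adversely affect existing functionality
    loaded[name] = dict(requirements)
    return True


# Invented figures for illustration only.
capacity = {"multipliers": 8, "memory_kb": 64}
loaded = {"cdma_phone": {"multipliers": 6, "memory_kb": 40}}
music_ok = try_load("music_player", {"multipliers": 4, "memory_kb": 16}, capacity, loaded)
pager_ok = try_load("pager", {"multipliers": 1, "memory_kb": 8}, capacity, loaded)
```

With these invented figures the music module is refused because it would take multipliers from the telephone function, while the smaller pager module is admitted.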
  • Continuing to refer to FIG. 1, the matrix (MARC) 150B manages the scheduling of matrix 150 resources and the timing of any corresponding data, to synchronize any configuration or reconfiguration of the various computational elements 250 and computation units 200 with any corresponding input data and output data. In the first apparatus embodiment, timing information is also included within a silverware module, to allow the matrix (MARC) 150B through the various interconnection networks to direct a reconfiguration of the various matrices 150 in time, and preferably just in time, for the reconfiguration to occur before corresponding data has appeared at any inputs of the various reconfigured computation units 200. In addition, the matrix (MARC) 150B may also perform any residual processing which has not been accelerated within any of the various matrices 150. As a consequence, the matrix (MARC) 150B may be viewed as a control unit which “calls” the configurations and reconfigurations of the matrices 150, computation units 200 and computational elements 250, in real time, in synchronization with any corresponding data to be utilized by these various reconfigurable hardware units, and which performs any residual or other control processing. Other matrices 150 may also include this control functionality, with any given matrix 150 capable of calling and controlling a configuration and reconfiguration of other matrices 150. This matrix control functionality may also be combined with kernel control, such as in the K-node, discussed below.
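  • The just-in-time timing described above can be illustrated with a small calculation: given when data is expected at a unit's inputs and how long its reconfiguration takes, the latest permissible start of reconfiguration follows directly. The function `schedule_reconfigurations`, the task names and the cycle counts are hypothetical, and the sketch ignores contention between overlapping reconfigurations.

```python
def schedule_reconfigurations(tasks):
    """tasks: iterable of (name, data_arrival_cycle, reconfig_cycles).

    Returns (start_cycle, name) pairs in start order, where each start is the
    latest cycle at which reconfiguration can begin and still finish by the
    time the corresponding data appears.
    """
    plan = []
    for name, arrival, latency in tasks:
        start = arrival - latency
        if start < 0:
            raise ValueError(f"{name}: data arrives before reconfiguration can finish")
        plan.append((start, name))
    return sorted(plan)


# Invented cycle counts for illustration only.
plan = schedule_reconfigurations([
    ("fir", 100, 8),
    ("viterbi", 40, 12),
    ("fft", 60, 5),
])
```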
  • FIG. 3 is a block diagram illustrating, in greater detail, a reconfigurable matrix (or node) 150 with a plurality of computation units 200 (illustrated as computation units 200A through 200N), and a plurality of computational elements 250 (illustrated as computational elements 250A through 250Z), and provides additional illustration of the exemplary types of computational elements 250 and a useful summary. As illustrated in FIG. 3, any matrix 150 generally includes a matrix controller 230, a plurality of computation (or computational) units 200, and as logical or conceptual subsets or portions of the matrix interconnect network 110, a data interconnect network 240 and a Boolean interconnect network 210. The matrix controller 230 may also be implemented as a hardware task manager, discussed below with reference to FIG. 10. As mentioned above, in the first apparatus embodiment, at increasing “depths” within the ACE 100 architecture, the interconnect networks become increasingly rich, for greater levels of adaptability and reconfiguration. The Boolean interconnect network 210, also as mentioned above, provides the reconfiguration and data interconnection capability between and among the various computation units 200, and is preferably small (i.e., only a few bits wide), while the data interconnect network 240 provides the reconfiguration and data interconnection capability for data input and output between and among the various computation units 200, and is preferably comparatively large (i.e., many bits wide). It should be noted, however, that while conceptually divided into reconfiguration and data capabilities, any given physical portion of the matrix interconnection network 110, at any given time, may be operating as either the Boolean interconnect network 210, the data interconnect network 240, the lowest level interconnect 220 (between and among the various computational elements 250), or other input, output, or connection functionality. 
It should also be noted that other, exemplary forms of interconnect are discussed in greater detail below with reference to FIGS. 11-13.
  • Continuing to refer to FIG. 3, included within a computation unit 200 are a plurality of computational elements 250, illustrated as computational elements 250A through 250Z (individually and collectively referred to as computational elements 250), and additional interconnect 220. The interconnect 220 provides the reconfigurable interconnection capability and input/output paths between and among the various computational elements 250. As indicated above, each of the various computational elements 250 consists of dedicated, application specific hardware designed to perform a given task or range of tasks, resulting in a plurality of different, fixed computational elements 250. Utilizing the interconnect 220, the fixed computational elements 250 may be reconfigurably connected together into adaptive and varied computational units 200, which also may be further reconfigured and interconnected, to execute an algorithm or other function, at any given time, such as the quadruple multiplications and additions of the DFG of FIG. 2, utilizing the interconnect 220, the Boolean network 210, and the matrix interconnection network 110. For example, using the multiplexing or routing capabilities discussed below, the inputs/outputs of a computational element 250 may be coupled to outputs/inputs of a first set of (other) computational elements 250, for performance of a first function or algorithm, and subsequently adapted or reconfigured, such that these inputs/outputs are coupled to outputs/inputs of a second set of (other) computational elements 250, for performance of a second function or algorithm.
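  • The reconfigurable coupling of fixed elements may be sketched as a routing table applied to an unchanging pool of operations. The element names, the tuple layout of a configuration and the function `run` are assumptions for illustration; they do not describe the multiplexer hardware itself.

```python
import operator

# Fixed elements: their functions never change, only their connections do.
ELEMENTS = {
    "mul0": operator.mul,
    "mul1": operator.mul,
    "add0": operator.add,
    "sub0": operator.sub,
}


def run(config, inputs):
    """Evaluate one configuration.

    config is a list of (element, source_a, source_b) in evaluation order.
    A source names either an external input or an earlier element's output.
    The output of the last listed element is returned.
    """
    signals = dict(inputs)
    for element, src_a, src_b in config:
        signals[element] = ELEMENTS[element](signals[src_a], signals[src_b])
    return signals[config[-1][0]]


# First configuration: a*b + c*d
SUM_OF_PRODUCTS = [("mul0", "a", "b"), ("mul1", "c", "d"), ("add0", "mul0", "mul1")]
# Second configuration reuses the same multipliers, wired differently: a*a - b*b
DIFFERENCE_OF_SQUARES = [("mul0", "a", "a"), ("mul1", "b", "b"), ("sub0", "mul0", "mul1")]

first = run(SUM_OF_PRODUCTS, {"a": 2, "b": 3, "c": 4, "d": 5})
second = run(DIFFERENCE_OF_SQUARES, {"a": 5, "b": 3})
```

The two results come from the same multipliers; only the routing table differs, which is the software analog of coupling an element's inputs and outputs to a first and then a second set of other elements.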
  • In the first apparatus embodiment, the various computational elements 250 are designed and grouped together, into the various adaptive and reconfigurable computation units 200 (as illustrated, for example, in FIGS. 5A through 9). In addition to computational elements 250 which are designed to execute a particular algorithm or function, such as multiplication or addition, other types of computational elements 250 are also utilized in the first apparatus embodiment. As illustrated in FIG. 3, computational elements 250A and 250B implement memory, to provide local memory elements for any given calculation or processing function (compared to the more “remote” memory 140). In addition, computational elements 250I, 250J, 250K and 250L are configured to implement finite state machines (using, for example, the computational elements illustrated in FIGS. 7, 8 and 9), to provide local processing capability (compared to the more “remote” matrix (MARC) 150B), especially suitable for complicated control processing, and which may be utilized within the hardware task manager, discussed below.
  • With the various types of different computational elements 250 which may be available, depending upon the desired functionality of the ACE 100, the computation units 200 may be loosely categorized. A first category of computation units 200 includes computational elements 250 performing linear operations, such as multiplication, addition, finite impulse response filtering, and so on (as illustrated below, for example, with reference to FIGS. 5A through 5E and FIG. 6). A second category of computation units 200 includes computational elements 250 performing non-linear operations, such as discrete cosine transformation, trigonometric calculations, and complex multiplications. A third type of computation unit 200 implements a finite state machine (such as computation unit 200C as illustrated in FIG. 3 and as illustrated in greater detail below with respect to FIGS. 7 through 9), particularly useful for complicated control sequences, dynamic scheduling, and input/output management, while a fourth type may implement memory and memory management, such as computation unit 200A as illustrated in FIG. 3. Lastly, a fifth type of computation unit 200 may be included to perform bit-level manipulation, such as for encryption, decryption, channel coding, Viterbi decoding, and packet and protocol processing (such as Internet Protocol processing).
  • In the first apparatus embodiment, in addition to control from other matrices or nodes 150, a matrix controller 230 may also be included within any given matrix 150, also to provide greater locality of reference and control of any reconfiguration processes and any corresponding data manipulations. For example, once a reconfiguration of computational elements 250 has occurred within any given computation unit 200, the matrix controller 230 may direct that that particular instantiation (or configuration) remain intact for a certain period of time to, for example, continue repetitive data processing for a given application.
  • As indicated above, the plurality of heterogeneous computational elements 250 may be configured and reconfigured, through the levels of the interconnect network (110, 210, 220, 240), for performance of a plurality of functional or operational modes, such as linear operations, non-linear operations, finite state machine operations, memory and memory management, and bit-level manipulation. This configuration and reconfiguration of the plurality of heterogeneous computational elements 250 through the levels of the interconnect network (110, 210, 220, 240), however, may be conceptualized on another, higher or more abstract level, namely, configuration and reconfiguration for the performance of a plurality of algorithmic elements.
  • At this more abstract level of the algorithmic element, the performance of any one of the algorithmic elements may be considered to require a simultaneous performance of a plurality of the lower-level functions or operations, such as move, input, output, add, subtract, multiply, complex multiply, divide, shift, multiply and accumulate, and so on, using a configuration (and reconfiguration) of computational elements having a plurality of fixed architectures such as memory, addition, multiplication, complex multiplication, subtraction, synchronization, queuing, over sampling, under sampling, adaptation, configuration, reconfiguration, control, input, output, and field programmability.
  • When such a plurality of fixed architectures are configured and reconfigured for performance of an entire algorithmic element, this performance may occur using comparatively few clock cycles, compared to the orders of magnitude more clock cycles typically required. The algorithmic elements may be selected from a plurality of algorithmic elements comprising, for example: a radix-2 Fast Fourier Transformation (FFT), a radix-4 Fast Fourier Transformation (FFT), a radix-2 inverse Fast Fourier Transformation (IFFT), a radix-4 IFFT, a one-dimensional Discrete Cosine Transformation (DCT), a multi-dimensional Discrete Cosine Transformation (DCT), finite impulse response (FIR) filtering, convolutional encoding, scrambling, puncturing, interleaving, modulation mapping, Golay correlation, OVSF code generation, Hadamard Transformation, Turbo Decoding, bit correlation, Griffiths LMS algorithm, variable length encoding, uplink scrambling code generation, downlink scrambling code generation, downlink despreading, uplink spreading, uplink concatenation, Viterbi encoding, Viterbi decoding, cyclic redundancy coding (CRC), complex multiplication, data compression, motion compensation, channel searching, channel acquisition, and multipath correlation. Numerous other algorithmic element examples are discussed in greater detail below with reference to FIG. 10.
  • In another embodiment of the ACE 100, one or more of the matrices (or nodes) 150 may be designed to be application specific, having a fixed architecture with a corresponding fixed function (or predetermined application), rather than being comprised of a plurality of heterogeneous computational elements which may be configured and reconfigured for performance of a plurality of operations, functions, or algorithmic elements. For example, an analog-to-digital (A/D) or digital-to-analog (D/A) converter may be implemented without adaptive capability. As discussed in greater detail below, common node (matrix) functions also may be implemented without adaptive capability, such as the node wrapper functions discussed below. Under various circumstances, however, the fixed function node may be capable of parameter adjustment for performance of the predetermined application. For example, the parameter adjustment may comprise changing one or more of the following parameters: a number of filter coefficients, a number of parallel input bits, a number of parallel output bits, a number of selected points for Fast Fourier Transformation, a number of bits of precision, a code rate, a number of bits of interpolation of a trigonometric function, and real or complex number valuation. This fixed function node (or matrix) 150, which may be parametizable, will typically be utilized in circumstances where an algorithmic element is used on a virtually continuous basis, such as in certain types of communications or computing applications.
  • For example, the fixed function node 150 may be a microprocessor (such as a RISC processor), a digital signal processor (DSP), a co-processor, a parallel processor, a controller, a microcontroller, a finite state machine, and so on (with the term “processor” utilized herein to individually or collectively refer, generally and inclusively, to any of the types of processors mentioned above and their equivalents), and may or may not have an embedded operating system. Such a controller or processor fixed function node 150 may be utilized for the various KARC 150A or MARC 150B applications mentioned above, such as providing configuration information to the interconnection network, directing and scheduling the configuration of the plurality of heterogeneous computational elements 250 of the other nodes 150 for performance of the various functional modes or algorithmic elements, or timing and scheduling the configuration and reconfiguration of the plurality of heterogeneous computational elements with corresponding data. In other applications, also for example, the fixed function node may be a cascaded integrated comb (CIC) filter or a parameterized, cascaded integrated comb (CIC) filter; a finite impulse response (FIR) filter or a finite impulse response (FIR) filter parameterized for variable filter length; or an A/D or D/A converter.
  • FIG. 4 is a block diagram illustrating, in greater detail, an exemplary or representative computation unit 200 of a reconfigurable matrix 150. As illustrated in FIG. 4, a computation unit 200 typically includes a plurality of diverse, heterogeneous and fixed computational elements 250, such as a plurality of memory computational elements 250A and 250B, and forming a computational unit (“CU”) core 260, a plurality of algorithmic or finite state machine computational elements 250C through 250K. As discussed above, each computational element 250, of the plurality of diverse computational elements 250, is a fixed or dedicated, application specific circuit, designed and having a corresponding logic gate layout to perform a specific function or algorithm, such as addition or multiplication. In addition, the various memory computational elements 250A and 250B may be implemented with various bit depths, such as RAM (having significant depth), or as a register, having a depth of 1 or 2 bits.
  • Forming the conceptual data and Boolean interconnect networks 240 and 210, respectively, the exemplary computation unit 200 also includes a plurality of input multiplexers 280, a plurality of input lines (or wires) 281, and for the output of the CU core 260 (illustrated as line or wire 270), a plurality of output demultiplexers 285 and 290, and a plurality of output lines (or wires) 291. Through the input multiplexers 280, an appropriate input line 281 may be selected for input use in data transformation and in the configuration and interconnection processes, and through the output demultiplexers 285 and 290, an output or multiple outputs may be placed on a selected output line 291, also for use in additional data transformation and in the configuration and interconnection processes.
  • In the first apparatus embodiment, the selection of various input and output lines 281 and 291, and the creation of various connections through the interconnect (210, 220 and 240), is under control of control bits 265 from a computational unit controller 255, as discussed below. Based upon these control bits 265, any of the various input enables 251, input selects 252, output selects 253, MUX selects 254, DEMUX enables 256, DEMUX selects 257, and DEMUX output selects 258, may be activated or deactivated.
  • The exemplary computation unit 200 includes the computation unit controller 255 which provides control, through control bits 265, over what each computational element 250, interconnect (210, 220 and 240), and other elements (above) does with every clock cycle. Not separately illustrated, through the interconnect (210, 220 and 240), the various control bits 265 are distributed, as may be needed, to the various portions of the computation unit 200, such as the various input enables 251, input selects 252, output selects 253, MUX selects 254, DEMUX enables 256, DEMUX selects 257, and DEMUX output selects 258. The CU controller 255 also includes one or more lines 295 for reception of control (or configuration) information and transmission of status information.
  • As mentioned above, the interconnect may include a conceptual division into a data interconnect network 240 and a Boolean interconnect network 210, of varying bit widths. In general, the (wider) data interconnection network 240 is utilized for creating configurable and reconfigurable connections, for corresponding routing of data and configuration information. The (narrower) Boolean interconnect network 210, while also utilized for creating configurable and reconfigurable connections, is utilized for control of logic (or Boolean) decisions of the various data flow graphs (DFGs), generating decision nodes in such DFGs, and may also be used for data routing within such DFGs.
  • FIGS. 5A through 5E are block diagrams illustrating, in detail, exemplary fixed and specific computational elements, forming computational units. As will be apparent from review of these Figures, many of the same fixed computational elements are utilized, with varying configurations, for the performance of different algorithms.
  • FIG. 5A is a block diagram illustrating a four-point asymmetric finite impulse response (FIR) filter computational unit 300. As illustrated, this exemplary computational unit 300 includes a particular, first configuration of a plurality of fixed computational elements, including coefficient memory 305, data memory 310, registers 315, 320 and 325, multiplier 330, adder 335, and accumulator registers 340, 345, 350 and 355, with multiplexers (MUXes) 360 and 365 forming a portion of the interconnection network (210, 220 and 240).
  • FIG. 5B is a block diagram illustrating a two-point symmetric finite impulse response (FIR) filter computational unit 370. As illustrated, this exemplary computational unit 370 includes a second configuration of a plurality of fixed computational elements, including coefficient memory 305, data memory 310, registers 315, 320 and 325, multiplier 330, adder 335, second adder 375, and accumulator registers 340 and 345, also with multiplexers (MUXes) 360 and 365 forming a portion of the interconnection network (210, 220 and 240).
  • FIG. 5C is a block diagram illustrating a subunit for a fast Fourier transform (FFT) computational unit 400. As illustrated, this exemplary computational unit 400 includes a third configuration of a plurality of fixed computational elements, including coefficient memory 305, data memory 310, registers 315, 320, 325 and 385, multiplier 330, adder 335, and adder/subtracter 380, with multiplexers (MUXes) 360, 365, 390, 395 and 405 forming a portion of the interconnection network (210, 220 and 240).
  • FIG. 5D is a block diagram illustrating a complex finite impulse response (FIR) filter computational unit 440. As illustrated, this exemplary computational unit 440 includes a fourth configuration of a plurality of fixed computational elements, including memory 410, registers 315 and 320, multiplier 330, adder/subtracter 380, and real and imaginary accumulator registers 415 and 420, also with multiplexers (MUXes) 360 and 365 forming a portion of the interconnection network (210, 220 and 240).
  • FIG. 5E is a block diagram illustrating a biquad infinite impulse response (IIR) filter computational unit 450, with a corresponding data flow graph 460. As illustrated, this exemplary computational unit 450 includes a fifth configuration of a plurality of fixed computational elements, including coefficient memory 305, input memory 490, registers 470, 475, 480 and 485, multiplier 330, and adder 335, with multiplexers (MUXes) 360, 365, 390 and 395 forming a portion of the interconnection network (210, 220 and 240).
  • FIG. 6 is a block diagram illustrating, in detail, an exemplary multi-function adaptive computational unit 500 having a plurality of different, fixed computational elements. When configured accordingly, the adaptive computation unit 500 performs each of the various functions previously illustrated with reference to FIGS. 5A through 5E, plus other functions such as discrete cosine transformation. As illustrated, this multi-function adaptive computational unit 500 includes capability for a plurality of configurations of a plurality of fixed computational elements, including input memory 520, data memory 525, registers 530 (illustrated as registers 530A through 530Q), multipliers 540 (illustrated as multipliers 540A through 540D), adder 545, first arithmetic logic unit (ALU) 550 (illustrated as ALU_1s 550A through 550D), second arithmetic logic unit (ALU) 555 (illustrated as ALU_2s 555A through 555D), and pipeline (length 1) register 560, with inputs 505, lines 515, outputs 570, and multiplexers (MUXes or MXes) 510 (illustrated as MUXes and MXes 510A through 510KK) forming an interconnection network (210, 220 and 240). The two different ALUs 550 and 555 are preferably utilized, for example, for parallel addition and subtraction operations, particularly useful for radix 2 operations in discrete cosine transformation.
  • FIG. 7 is a block diagram illustrating, in detail, an exemplary adaptive logic processor (ALP) computational unit 600 having a plurality of fixed computational elements. The ALP 600 is highly adaptable, and is preferably utilized for input/output configuration, finite state machine implementation, general field programmability, and bit manipulation. The fixed computational element of ALP 600 is a portion (650) of each of the plurality of adaptive core cells (CCs) 610 (FIG. 8), as separately illustrated in FIG. 9. An interconnection network (210, 220 and 240) is formed from various combinations and permutations of the pluralities of vertical inputs (VIs) 615, vertical repeaters (VRs) 620, vertical outputs (VOs) 625, horizontal repeaters (HRs) 630, horizontal terminators (HTs) 635, and horizontal controllers (HCs) 640.
  • FIG. 8 is a block diagram illustrating, in greater detail, an exemplary core cell 610 of an adaptive logic processor computational unit 600 with a fixed computational element 650. The fixed computational element is a 3 input-2 output function generator 650, separately illustrated in FIG. 9. The preferred core cell 610 also includes control logic 655, control inputs 665, control outputs 670 (providing output interconnect), output 675, and inputs (with interconnect muxes) 660 (providing input interconnect).
  • FIG. 9 is a block diagram illustrating, in greater detail, an exemplary fixed computational element 650 of a core cell 610 of an adaptive logic processor computational unit 600. The fixed computational element 650 is comprised of a fixed layout of pluralities of exclusive NOR (XNOR) gates 680, NOR gates 685, NAND gates 690, and exclusive OR (XOR) gates 695, with three inputs 720 and two outputs 710. Configuration and interconnection is provided through MUX 705 and interconnect inputs 730.
  • FIG. 10 is a block diagram illustrating a prototypical node or matrix 800 comprising the second apparatus embodiment of the invention of the related application. The node 800 is connected to other nodes 150 within the ACE 100 through the matrix interconnection network 110. The prototypical node 800 includes a fixed (and non-reconfigurable) “node wrapper”, an adaptive (reconfigurable) execution unit 840, and a memory 845 (which also may be variable). This fixed and non-reconfigurable “node wrapper” includes an input pipeline register 815, a data decoder and distributor 820, a hardware task manager 810, an address register 825 (optional), a DMA engine 830 (optional), a data aggregator and selector 850, and an output pipeline register 855. These components comprising the node wrapper are generally common to all nodes of the ACE 100, and are comprised of fixed architectures (i.e., application-specific or non-reconfigurable architectures). As a consequence, the node or matrix 800 is a unique blend of fixed, non-reconfigurable node wrapper components, memory, and the reconfigurable components of an adaptive execution unit 840 (which, in turn, are comprised of fixed computational elements and an interconnection network).
  • Various nodes 800, in general, will have a distinctive and variably-sized adaptive execution unit 840, tailored for one or more particular applications or algorithms, and a memory 845, also implemented in various sizes depending upon the requirements of the adaptive execution unit 840. An adaptive execution unit 840 for a given node 800 will generally be different than the adaptive execution units 840 of the other nodes 800. Each adaptive execution unit 840 is reconfigurable in response to configuration information, and is comprised of a plurality of computation units 200, which are in turn further comprised of a plurality of computational elements 250, and corresponding interconnect networks 210, 220 and 240. Particular adaptive execution units 840 utilized in exemplary embodiments, and the operation of the node 800 and node wrapper, are discussed in greater detail below.
  • FIG. 11 is a block diagram illustrating a first system embodiment 900 in accordance with the invention of the related application. This first system 900 may be included as part of a larger system or host environment, such as within a computer or communications device, for example. FIG. 11 illustrates a “root” level of such a system 900, where global resources have connectivity (or otherwise may be found). At this root level, the first system 900 includes one or more adaptive cores 950, external (off-IC or off-chip) memory 905 (such as SDRAM), host (system) input and output connections, and network (MIN 110) input and output connections (for additional adaptive cores 950). Each adaptive core 950 includes (on-IC or on-chip) memory 920, a “K-node” 925, and one or more sets of nodes (150, 800) referred to as a node quadrant 930. The K-node 925 (like the kernel controller 150A) provides an operating system for the adaptive core 950.
  • Generally, each node quadrant 930 consists of 16 nodes in a scalable by-four (×4) fractal arrangement. At this root level, each of these (seven) illustrated elements has total connectivity with all other (six) elements. As a consequence, the output of a root-level element is provided to (and may drive) all other root-level inputs, and each root-level input is provided with the outputs of all other root-level elements. Not separately illustrated, at this root-level of the first system 900, the MIN 110 includes a network with routing (or switching) elements (935), such as round-robin, token ring, cross point switches, or other arbiter elements, and a network (or path) for real time data transfer (or transmission) (such as a data network 240).
  • FIG. 12 is a block diagram illustrating an exemplary node quadrant 930 with routing elements 935. From the root-level, the node quadrant 930 has a tree topology and consists of 16 nodes (150 or 800), with every four nodes connected as a node “quad” 940 having a routing (or switching) element 935. The routing elements may be implemented variously, such as through round-robin, token ring, cross point switches, (four-way) switching, (1/4, 1/3 or 1/2) arbitration or other arbiter or arbitration elements, or depending upon the degree of control overhead which may be tolerable, through other routing or switching elements such as multiplexers and demultiplexers. This by-four fractal architecture provides for routing capability, scalability, and expansion, without logical limitation. The node quadrant 930 is coupled within the first system 900 at the root-level, as illustrated. This by-four fractal architecture also provides for significant and complete connectivity, with the worst-case distance between any two nodes being log4(k) hops, where “k” is the number of nodes (rather than a linear distance), and provides for avoiding the overhead and capacitance of, for example, busses or full crossbar switches.
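  • The logarithmic growth of the by-four tree can be illustrated with a minimal Python sketch. The function name `tree_levels` and the convention of counting one hop per routing level are assumptions made for illustration only; the text above states the log4 relationship but does not define how a hop is counted.

```python
def tree_levels(num_nodes: int, fanout: int = 4) -> int:
    """Return the number of routing levels needed so that
    fanout ** levels >= num_nodes (hypothetical helper)."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    levels = 0
    capacity = 1
    # Each added routing level multiplies the reachable node count by `fanout`.
    while capacity < num_nodes:
        capacity *= fanout
        levels += 1
    return levels


# A 16-node quadrant built from four quads needs two routing levels.
quadrant_levels = tree_levels(16)
```

  • Under this counting convention, quadrupling the number of nodes adds only one routing level, in contrast to a linear arrangement in which the distance grows in proportion to the number of nodes.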
  • The node quadrant 930 and node quad 940 structures exhibit a fractal self-similarity with regard to scalability, repeating structures, and expansion. The node quadrant 930 and node quad 940 structures also exhibit a fractal self-similarity with regard to a heterogeneity of the plurality of heterogeneous and reconfigurable nodes 800, heterogeneity of the plurality of heterogeneous computation units 200, and heterogeneity of the plurality of heterogeneous computational elements 250. With regard to the increasing heterogeneity, the adaptive computing integrated circuit 900 exhibits increasing heterogeneity from a first level of the plurality of heterogeneous and reconfigurable matrices, to a second level of the plurality of heterogeneous computation units, and further to a third level of the plurality of heterogeneous computational elements. The plurality of interconnection levels also exhibits a fractal self-similarity with regard to each interconnection level of the plurality of interconnection levels. At increasing depths within the ACE 100, from the matrix 150 level to the computation unit 200 level and further to the computational element 250 level, the interconnection network is increasingly rich, providing an increasing amount of bandwidth and an increasing number of connections or connectability for a correspondingly increased level of reconfigurability. As a consequence, the matrix-level interconnection network, the computation unit-level interconnection network, and the computational element-level interconnection network also constitute a fractal arrangement.
  • Referring to FIGS. 11 and 12, and as explained in greater detail below, the system embodiment 900 utilizes point-to-point service for streaming data and configuration information transfer, using a data packet (or data structure) discussed below. A packet-switched protocol is utilized for this communication, and in an exemplary embodiment the packet length is limited to a length of 51 bits, with a one word (32 bits) data payload, to obviate any need for data buffering. The routing information within the data packet provides for selecting the particular adaptive core 950, followed by selecting root-level (or not) of the selected adaptive core 950, followed by selecting a particular node (150 or 800) of the selected adaptive core 950. This selection path may be visualized by following the illustrated connections of FIGS. 11 and 12. Routing of data packets out of a particular node may be performed similarly, or may be provided more directly, such as by switching or arbitrating within a node 800 or quad 940, as discussed below.
  • FIG. 13 is a block diagram illustrating exemplary network interconnections into and out of nodes 800 and node quads 940. Referring to FIG. 13, MIN 110 connections into a node, via a routing element 935, include a common input 945 (provided to all four nodes 800 within a quad 940), and inputs from the other (three) “peer” nodes within the particular quad 940. For example, outputs from peer nodes 1, 2 and 3 are utilized for input into node 0, and so on. At this level, the routing element 935 may be implemented, for example, as a round-robin, token ring, arbiter, cross point switch, or other four-way switching element. The output from the routing element 935 is provided to a multiplexer 955 (or other switching element) for the corresponding node 800, along with a feedback input 960 from the corresponding node 800, and an input for real time data (from data network 240) (to provide a fast track for input of real time data into nodes 800). The multiplexer 955 (or other switching element) provides selection (switching or arbitration) of one of three inputs, namely, selection of input from the selected peer or common 945, selection of input from the same node as feedback, or selection of input of real time data, with the output of the multiplexer 955 provided as the network (MIN 110) input into the corresponding node 800 (via the node's pipeline register 815). While not separately illustrated in FIG. 13, it should be noted that the various inputs into the pipeline register 815 of a node 800 and outputs from the pipeline register 855 from a node 800 are each in the form of a bus, preferably a 32-bit parallel bus. Each separate line or input (output) of the (32-bit) bus is referred to herein as a “port”, and is assigned a 5-bit port number, referred to as a port identifier (or port ID), which maps to memory 845.
  • The node 800 output is provided to the data aggregator and selector (“DAS”) 850 within the node 800, which determines the routing of output information to the node itself (same node feedback), to the network (MIN 110) (for routing to another node or other system element), or to the data network 240 (for real time data output). As indicated above, this output is provided using a 32-bit output bus, with each output port of the bus also referred to using an (output) port identifier. When the output information is selected for routing to the MIN 110, the output from the DAS 850 is provided to the corresponding output routing element 935, which routes the output information to peer nodes within the quad 940 or to another, subsequent routing element 935 for routing out of the particular quad 940 through a common output 965 (such as for routing to another node quad 940, node quadrant 930, or adaptive core 950).
  • FIG. 14 is a block diagram illustrating an exemplary data structure embodiment. The system embodiment 900 utilizes point-to-point data and configuration information transfer, using a data packet (as an exemplary data structure) 970, and may be considered as an exemplary form of “silverware”, as previously described herein. The exemplary data packet 970 provides for 51 bits per packet, with 8 bits provided for a routing field (971), 1 bit for a security field (972), 4 bits for a service code field (973), 6 bits for an auxiliary field (974), and 32 bits (one word length) for data (as a data payload or data field) (975). As indicated above, the routing field 971 may be further divided into fields for adaptive core selection (976), root selection (977), and node selection (978). In this selected 51-bit embodiment, up to four adaptive cores may be selected, and up to 32 nodes per adaptive core. As the packet is being routed, the routing bits may be stripped from the packet as they are being used in the routing process. The service code field 973 provides for designations such as point-to-point inter-process communication, acknowledgements for data flow control, “peeks” and “pokes” (as coined terminology referring to reads and writes by the K-node into memory 845), DMA operations (for memory moves), and random addressing for reads and writes to memory 845. The auxiliary (AUX) field 974 supports up to 32 streams for any of up to 32 tasks for execution on the adaptive execution unit 840, as discussed below, and may be considered to be a configuration information payload. The one word length (32-bit) data payload is then provided in the data field 975. The exemplary data structure 970 (as a data packet) illustrates the interdigitation of data and configuration/control information, as discussed above.
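  • The 51-bit packet layout described above can be sketched in Python as follows. The field widths (8, 1, 4, 6 and 32 bits) are taken from the text. The bit ordering (routing field in the most significant bits), the names `Packet`, `pack`, `unpack` and `split_routing`, and the 2/1/5-bit split of the routing field (inferred from "up to four adaptive cores" and "up to 32 nodes") are assumptions made only for illustration.

```python
from typing import NamedTuple, Tuple

# Field names and widths in bits, in the order the text lists them.
FIELDS = (("routing", 8), ("security", 1), ("service", 4), ("aux", 6), ("data", 32))
PACKET_BITS = sum(width for _, width in FIELDS)  # 8 + 1 + 4 + 6 + 32 = 51


class Packet(NamedTuple):
    routing: int
    security: int
    service: int
    aux: int
    data: int


def pack(packet: Packet) -> int:
    """Pack the fields into one integer, routing field most significant (assumed order)."""
    word = 0
    for (name, width), value in zip(FIELDS, packet):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> Packet:
    """Inverse of pack(): peel fields off starting from the least significant bits."""
    if not 0 <= word < (1 << PACKET_BITS):
        raise ValueError("word does not fit in 51 bits")
    values = []
    for _, width in reversed(FIELDS):
        values.append(word & ((1 << width) - 1))
        word >>= width
    return Packet(*reversed(values))


def split_routing(routing: int) -> Tuple[int, int, int]:
    """Split the 8-bit routing field into (core, root, node); the 2/1/5 split is assumed."""
    core = (routing >> 6) & 0x3   # up to four adaptive cores
    root = (routing >> 5) & 0x1   # root-level selection (or not)
    node = routing & 0x1F         # up to 32 nodes per adaptive core
    return core, root, node
```

  • For example, `unpack(pack(Packet(0b10000011, 1, 0x5, 0x2A, 0xDEADBEEF)))` returns the original fields, and `split_routing(0b10000011)` selects core 2, non-root, node 3 under the assumed split.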
  • Referring to FIG. 10, in light of the first system 900 structure and data structure discussed above, the node 800 architecture of the second apparatus embodiment may be described in more detail. The input pipeline register 815 is utilized to receive data and configuration information from the network interconnect 110, through a plurality of input ports. Preferably, the input pipeline register 815 does not permit any data stalls. More particularly, in accordance with the data flow modeling, the input pipeline register 815 should accept new data from the interconnection network 110 every clock period; consequently, the data should also be consumed as it is produced. This imposes the requirement that any contention issues among the input pipeline register 815 and other resources within the node 800 be resolved in favor of the input pipeline register 815, i.e., input data in the input pipeline register has priority in the selection process implemented in various routing (or switching) elements 935, multiplexers 955, or other switching or arbitration elements which may be utilized.
  • The data decoder and distributor 820 interfaces the input pipeline register 815 to the various memories (e.g., 845) and registers (e.g., 825) within the node 800, the hardware task manager 810, and the DMA engine 830, based upon the values in the service and auxiliary fields of the 51-bit data structure. The data decoder 820 also decodes security, service, and auxiliary fields of the 51-bit network data structure (of the configuration information or of operand data) to direct the received word to its intended destination within the node 800.
  • Conversely, data from the node 800 to the network (MIN 110 or to other nodes) is transferred through a plurality of output ports via the output pipeline register 855, which holds data from one of the various memories (845) or registers (e.g., 825 or registers within the adaptive execution unit 840) of the node 800, the adaptive execution unit 840, the DMA engine 830, and/or the hardware task manager 810. Permission to load data into the output pipeline register 855 is granted by the data aggregator and selector (DAS) 850, which arbitrates or selects between and among any competing demands of the various (four) components of the node 800 (namely, requests from the hardware task manager 810, the adaptive execution unit 840, the memory 845, and the DMA engine 830). The data aggregator and selector 850 will issue one and only one grant whenever there are one or more requests and the output pipeline register 855 is available. In the selected embodiment, the priority for issuance of such a grant is, first, for K-node peek (read) data; second, for the adaptive execution unit 840 output data; third, for source DMA data; and fourth, for hardware task manager 810 message data. The output pipeline register 855 is available when it is empty or when its contents will be transferred to another register at the end of the current clock cycle.
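  • The fixed-priority grant rule of the data aggregator and selector 850 can be sketched in Python as follows. The priority order is taken from the selected embodiment described above; the function name `grant` and the string labels for the four request sources are hypothetical and chosen only for this illustration.

```python
from typing import Iterable, Optional

# Priority order from the selected embodiment, highest first.
# The labels themselves are hypothetical names for the four request sources.
PRIORITY = ("peek", "execution_unit", "dma", "task_manager")


def grant(requests: Iterable[str], register_available: bool) -> Optional[str]:
    """Return the single source granted the output pipeline register, or None."""
    pending = set(requests)
    unknown = pending - set(PRIORITY)
    if unknown:
        raise ValueError(f"unknown request sources: {sorted(unknown)}")
    if not register_available:
        return None  # no grant is issued while the register is unavailable
    for source in PRIORITY:
        if source in pending:
            return source  # exactly one grant, for the highest-priority requester
    return None  # no requests, so no grant
```

  • For example, when DMA data and a hardware task manager message compete for an available register, `grant(["dma", "task_manager"], True)` selects `"dma"`.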
  • The DMA engine 830 of the node 800 is an optional component. In general, the DMA engine 830 will follow a five register model, providing a starting address register, an address stride register, a transfer count register, a duty cycle register, and a control register. The control register within the DMA engine 830 utilizes a GO bit, a target node number and/or port number, and a DONE protocol. The K-node 925 writes the registers, sets the GO bit, and receives a DONE message when the data transfer is complete. The DMA engine 830 facilitates block moves from any of the memories of the node 800 to another memory, such as an on-chip bulk memory, external SDRAM memory, another node's memory, or a K-node memory for diagnostics and/or operational purposes. The DMA engine 830, in general, is controlled by the K-node 925.
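  • The five register model can be sketched in Python as follows. This is an illustrative model only: the class and function names are hypothetical, the assumption that the address advances by the stride once per transferred word is not stated in the text, and the duty cycle register is stored but deliberately not modeled because its semantics are not described here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DmaRegisters:
    start_address: int    # starting address register
    address_stride: int   # address stride register
    transfer_count: int   # transfer count register
    duty_cycle: int = 0   # duty cycle register (stored, not modeled in this sketch)
    go: bool = False      # GO bit of the control register
    done: bool = False    # stands in for the DONE message sent on completion


def run_block_move(regs: DmaRegisters, source: Sequence[int], destination: List[int]) -> None:
    """Copy transfer_count words, stepping the source address by the stride (assumed behavior)."""
    if not regs.go:
        return  # nothing happens until the GO bit has been set
    for i in range(regs.transfer_count):
        destination.append(source[regs.start_address + i * regs.address_stride])
    regs.go = False
    regs.done = True
```

  • In this model the K-node's role corresponds to constructing `DmaRegisters`, setting `go`, and then observing `done` once the transfer completes.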
  • The hardware task manager 810 is configured and controlled by the K-node 925 and interfaces to all node components except the DMA engine 830. The hardware task manager 810 executes on each node 800, processing a task list and producing a task ready-to-run queue implemented as a first-in, first-out (FIFO) memory. The hardware task manager 810 has a top level finite state machine that interfaces with a number of subordinate finite state machines that control the individual hardware task manager components. The hardware task manager 810 controls the configuration and reconfiguration of the computational elements 250 within the adaptive execution unit 840 for the execution of any given task by the adaptive execution unit 840.
  • The K-node 925 initializes the hardware task manager 810 and provides it with set up information for the tasks needed for a given operating mode, such as operating as a communication processor or an MP3 player. The K-node 925 provides configuration information as stored tasks (i.e., stored programs) within memory 845 and within local memory within the adaptive execution unit 840. The K-node 925 initializes the hardware task manager 810 (as a parameter table) with designations of input ports, output ports, routing information, the type of operations (tasks) to be executed (e.g., FFT, DCT), and memory pointers. The K-node 925 also initializes the DMA engine 830.
  • The hardware task manager 810 maintains a port translation table and generates addresses for point-to-point data delivery, mapping input port numbers to a current address of where incoming data should be stored in memory 845. The hardware task manager 810 provides data flow control services, tracking both production and consumption of data, using corresponding production and consumption counters, and thereby determines whether a data buffer is available for a given task. The hardware task manager 810 maintains a state table for tasks and, in the selected embodiment, for up to 32 tasks. The state table includes a GO bit (which is enabled or not enabled (suspended) by the K-node 925), a state bit for the task (idle, ready-to-run, run (running)), an input port count, and an output port count (for tracking input data and output data). In the selected embodiment, up to 32 tasks may be enabled at a given time. For a given enabled task, if its state is idle, and if sufficient input data (at the input ports) are available and sufficient output ports are available for output data, its state is changed to ready-to-run and queued for running (transferred into a ready-to-run FIFO or queue). Typically, the adaptive execution unit 840 is provided with configuration information (or code) and two data operands (x and y).
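  • The use of production and consumption counters for data flow control can be sketched in Python as follows. The names `BufferCounters` and `task_may_run`, the explicit buffer capacity, and the per-execution word counts are assumptions introduced for illustration; the text above states only that the counters are used to determine whether a data buffer is available for a given task.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BufferCounters:
    capacity: int       # buffer size in words (hypothetical parameter)
    produced: int = 0   # production counter
    consumed: int = 0   # consumption counter

    @property
    def occupancy(self) -> int:
        """Words currently held: everything produced that has not yet been consumed."""
        return self.produced - self.consumed

    def can_consume(self, words: int) -> bool:
        return self.occupancy >= words

    def can_produce(self, words: int) -> bool:
        return self.capacity - self.occupancy >= words


def task_may_run(inputs: Iterable[BufferCounters], outputs: Iterable[BufferCounters],
                 words_in: int, words_out: int) -> bool:
    """True when every input holds enough data and every output has enough free space."""
    return (all(buf.can_consume(words_in) for buf in inputs)
            and all(buf.can_produce(words_out) for buf in outputs))
```

  • A task that satisfies `task_may_run` while idle would be a candidate for the ready-to-run queue in the sense described above.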
  • From the ready-to-run queue, the task is transferred to an active task queue, the adaptive execution unit 840 is configured for the task (set up), the task is executed by the adaptive execution unit 840, and output data is provided to the data aggregator and selector 850. Following this execution, the adaptive execution unit 840 provides an acknowledgement message to the hardware task manager 810, requesting the next item. The hardware task manager 810 may then direct the adaptive execution unit 840 to continue to process data with the same configuration in place, or to tear down the current configuration, acknowledge completion of the tear down and request the next task from the ready-to-run queue. Once configured for execution of a selected algorithm, new configuration information is not needed from the hardware task manager 810, and the adaptive execution unit 840 functions effectively like an ASIC, with the limited additional overhead of acknowledgement messaging to the hardware task manager 810. These operations are described in additional detail below.
  • A module is a self-contained block of code (for execution by a processor) or a hardware-implemented function (embodied as configured computational elements 250), which is processed or performed by an execution unit 840. A task is an instance of a module, and has four states: suspend, idle, ready or run. A task is created by associating the task to a specific module (computational elements 250) on a specific node 800; by associating physical memories and logical input buffers, logical output buffers, logical input ports and logical output ports of the module; and by initializing configuration parameters for the task. A task is formed by the K-node writing the control registers in the node 800 where the task is being created (i.e., enabling the configuration of computational elements 250 to perform the task), and by the K-node writing to the control registers in other nodes, if any, that will be producing data for the task and/or consuming data from the task. These registers are memory mapped into the K-node's address space, and “peek and poke” network services are used to read and write these values. A newly created task starts in the “suspend” state.
  • Once a task is configured, the K-node can issue a “go” command, setting a bit in a control register in the hardware task manager 810. The action of this command is to move the task from the “suspend” state to the “idle” state. When the task is “idle” and all its input buffers and output buffers are available, the task is added to the “ready-to-run” queue which is implemented as a FIFO; and the task state is changed to “ready/run”. Buffers are available to the task when subsequent task execution will not consume more data than is present in its input buffers and will not produce more data than there is capacity in its output buffers.
  • When the adaptive execution unit 840 is not busy and the FIFO is not empty, the task number for the next task that is ready to execute is removed from the FIFO, and the state of this task is changed to “run”. In the “run” state, the task (executed by the configured adaptive execution unit 840) consumes data from its input buffers and produces data for its output buffers.
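  • The task lifecycle described above (suspend, idle, ready, run) can be sketched as a small software model. This is an illustrative Python sketch only, not the hardware task manager 810 itself: the class names, the boolean buffer flags, and the rule that a finished task returns to “idle” with its flags cleared are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    task_id: int
    state: str = "suspend"           # a newly created task starts suspended
    inputs_available: bool = False   # hypothetical flag: enough input data present
    outputs_available: bool = False  # hypothetical flag: enough output capacity


class TaskManagerModel:
    """Toy software model of the task lifecycle, not the hardware design."""

    def __init__(self):
        self.tasks = {}
        self.ready_fifo = deque()  # ready-to-run queue of task numbers
        self.running = None        # task number on the execution unit, if any

    def create(self, task_id):
        self.tasks[task_id] = Task(task_id)

    def go(self, task_id):
        # "go" command from the K-node: suspend -> idle
        task = self.tasks[task_id]
        if task.state == "suspend":
            task.state = "idle"

    def set_buffers(self, task_id, inputs_available, outputs_available):
        task = self.tasks[task_id]
        task.inputs_available = inputs_available
        task.outputs_available = outputs_available

    def schedule(self):
        # Idle tasks whose buffers are all available join the FIFO.
        for task in self.tasks.values():
            if task.state == "idle" and task.inputs_available and task.outputs_available:
                task.state = "ready"
                self.ready_fifo.append(task.task_id)
        # A free execution unit takes the next task number from the FIFO.
        if self.running is None and self.ready_fifo:
            self.running = self.ready_fifo.popleft()
            self.tasks[self.running].state = "run"

    def finish(self):
        # Assumption: a task that yields returns to idle with its buffer
        # flags cleared until new data or capacity is reported.
        task = self.tasks[self.running]
        task.state = "idle"
        task.inputs_available = task.outputs_available = False
        self.running = None


htm = TaskManagerModel()
for tid in (0, 1):
    htm.create(tid)
    htm.go(tid)
    htm.set_buffers(tid, True, True)
htm.schedule()
states = {tid: task.state for tid, task in htm.tasks.items()}
```

    In this sketch, both tasks become ready in the same scheduling pass, but only the first one taken from the FIFO enters the “run” state; the second waits until the execution unit is free.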
  • The adaptive execution units 840 will vary depending upon the type of node 800 implemented. Various adaptive execution units 840 may be specifically designed and implemented for use in heterogeneous nodes 800, for example, for a programmable RISC processing node; for a programmable DSP node; for an adaptive or reconfigurable node for a particular domain, such as an arithmetic node; and for an adaptive bit-manipulation unit (RBU). Various adaptive execution units 840 are discussed in greater detail below.
  • For example, a node 800, through its execution unit 840, will perform an entire algorithmic element in comparatively few clock cycles, such as one or two clock cycles, compared to performing a long sequence of separate operations, loads/stores, memory fetches, and so on, over many hundreds or thousands of clock cycles, to eventually achieve the same end result. Through its computational elements 250, the execution unit 840 may then be reconfigured to perform another, different algorithmic element. These algorithmic elements are selected from a plurality of algorithmic elements comprising, for example: a radix-2 Fast Fourier Transformation (FFT), a radix-4 Fast Fourier Transformation (FFT), a radix-2 Inverse Fast Fourier Transformation (IFFT), a radix-4 Inverse Fast Fourier Transformation (IFFT), a one-dimensional Discrete Cosine Transformation (DCT), a multi-dimensional Discrete Cosine Transformation (DCT), finite impulse response (FIR) filtering, convolutional encoding, scrambling, puncturing, interleaving, modulation mapping, Golay correlation, OVSF code generation, Hadamard Transformation, Turbo Decoding, bit correlation, Griffiths LMS algorithm, variable length encoding, uplink scrambling code generation, downlink scrambling code generation, downlink despreading, uplink spreading, uplink concatenation, Viterbi encoding, Viterbi decoding, cyclic redundancy coding (CRC), complex multiplication, data compression, motion compensation, channel searching, channel acquisition, and multipath correlation.
  • In an exemplary embodiment, a plurality of different nodes 800 are created, by varying the type and amount of computational elements 250 (forming computational units 200), and varying the type, amount and location of interconnect (with switching or routing elements) which form the execution unit 840 of each such node 800. In the exemplary embodiment, two different nodes 800 perform, generally, arithmetic or mathematical algorithms, and are referred to as adaptive (or reconfigurable) arithmetic nodes (AN), as AN1 and AN2. For example, the AN1 node, as a first node 800 of the plurality of heterogeneous and reconfigurable nodes, comprises a first selection of computational elements 250 from the plurality of heterogeneous computational elements to form a first reconfigurable arithmetic node for performance of Fast Fourier Transformation (FFT) and Discrete Cosine Transformation (DCT). Continuing with the example, the AN2 node, as a second node 800 of the plurality of heterogeneous and reconfigurable nodes, comprises a second selection of computational elements 250 from the plurality of heterogeneous computational elements to form a second reconfigurable arithmetic node, the second selection different than the first selection, for performance of at least two of the following algorithmic elements: multi-dimensional Discrete Cosine Transformation (DCT), finite impulse response (FIR) filtering, OVSF code generation, Hadamard Transformation, bit-wise WCDMA Turbo interleaving, WCDMA uplink concatenation, WCDMA uplink repeating, and WCDMA uplink real spreading and gain scaling.
  • Also in the exemplary embodiment, a plurality of other types of nodes 800 are defined, such as, for example:
  • A bit manipulation node, as a third node of the plurality of heterogeneous and reconfigurable nodes, comprising a third selection of computational elements 250 from the plurality of heterogeneous computational elements, the third selection different than the first selection, for performance of at least two of the following algorithmic elements: variable and multiple rate convolutional encoding, scrambling code generation, puncturing, interleaving, modulation mapping, complex multiplication, Viterbi algorithm, Turbo encoding, Turbo decoding, correlation, linear feedback shifting, downlink despreading, uplink spreading, CRC encoding, de-puncturing, and de-repeating.
  • A reconfigurable filter node, as a fourth node of the plurality of heterogeneous and reconfigurable nodes, comprising a fourth selection of computational elements 250 from the plurality of heterogeneous computational elements, the fourth selection different than the first selection, for performance of at least two of the following algorithmic elements: adaptive finite impulse response (FIR) filtering, Griffith's LMS algorithm, and RRC filtering.
  • A reconfigurable finite state machine node, as a fifth node of the plurality of heterogeneous and reconfigurable nodes, comprising a fifth selection of computational elements 250 from the plurality of heterogeneous computational elements, the fifth selection different than the first selection, for performance of at least two of the following processes: control processing; routing data and control information between and among the plurality of heterogeneous computational elements 250; directing and scheduling the configuration of the plurality of heterogeneous computational elements for performance of a first algorithmic element and the reconfiguration of the plurality of heterogeneous computational elements for performance of a second algorithmic element; timing and scheduling the configuration and reconfiguration of the plurality of heterogeneous computational elements with corresponding data; controlling power distribution to the plurality of heterogeneous computational elements and the interconnection network; and selecting the first configuration information and the second configuration information from a singular bit stream comprising data commingled with a plurality of configuration information.
  • A reconfigurable multimedia node, as a sixth node of the plurality of heterogeneous and reconfigurable nodes, comprising a sixth selection of computational elements 250 from the plurality of heterogeneous computational elements, the sixth selection different than the first selection, for performance of at least two of the following algorithmic elements: radix-4 Fast Fourier Transformation (FFT); multi-dimensional radix-2 Discrete Cosine Transformation (DCT); Golay correlation; adaptive finite impulse response (FIR) filtering; Griffith's LMS algorithm; and RRC filtering.
  • A reconfigurable hybrid node, as a seventh node of the plurality of heterogeneous and reconfigurable nodes, comprising a seventh selection of computational elements 250 from the plurality of heterogeneous computational elements, the seventh selection different than the first selection, for performance of arithmetic functions and bit manipulation functions.
  • A reconfigurable input and output (I/O) node, as an eighth node of the plurality of heterogeneous and reconfigurable nodes, comprising an eighth selection of computational elements 250 from the plurality of heterogeneous computational elements, the eighth selection different than the first selection, for adaptation of input and output functionality for a plurality of types of I/O standards, the plurality of types of I/O standards comprising standards for at least two of the following: PCI busses, Universal Serial Bus types one and two (USB1 and USB2), and small computer systems interface (SCSI).
  • A reconfigurable operating system node, as a ninth node of the plurality of heterogeneous and reconfigurable nodes, comprising a ninth selection of computational elements 250 from the plurality of heterogeneous computational elements, the ninth selection different than the first selection, for storing and executing a selected operating system of a plurality of operating systems.
  • FIG. 15 is a block diagram illustrating a second system embodiment 1000 in accordance with the invention of the related application. The second system embodiment 1000 is comprised of a plurality of variably-sized nodes (or matrices) 1010 (illustrated as nodes 1010 through 1010X), with the illustrated size of a given node 1010 also indicative of an amount of computational elements 250 within the node 1010 and an amount of memory included within the node 1010 itself. The nodes 1010 are coupled to an interconnect network 110, for configuration, reconfiguration, routing, and so on, as discussed above. The second system embodiment 1000 illustrates node 800 and system configurations which are different and more varied than the quadrant 930 and quad 940 configurations discussed above.
  • As illustrated, the second system embodiment 1000 is designed for use with other circuits within a larger system and, as a consequence, includes configurable input/output (I/O) circuits 1025, comprised of a plurality of heterogeneous computational elements configurable (through corresponding interconnect, not separately illustrated) for I/O functionality. The configurable input/output (I/O) circuits 1025 provide connectivity to and communication with a system bus (external), external SDRAM, and provide for real time inputs and outputs. A K-node (KARC) 1050 provides the K-node (KARC) functionality discussed above. The second system embodiment 1000 further includes memory 1030 (as on-chip RAM, with a memory controller), and a memory controller 1035 (for use with the external memory (SDRAM)). Also included in the apparatus 1000 are an aggregator/formatter 1040 and a de-formatter/distributor 1045, providing functions corresponding to the functions of the data aggregator and selector 850 and data distributor and decoder 820, respectively, but for the larger system 1000 (rather than within a node 800).
  • As indicated above, one of the novel aspects of the ACE architecture is its heterogeneous collection of nodes 150, 800, which communicate via the matrix interconnection network (MIN) 110. The MIN 110 architecture allows data to be transmitted between tasks running on pairs of nodes 150, 800 (or between pairs of tasks on the same node), with one task acting as the producer of the data, and the other as the consumer. The producing task will provide data through one or more output ports coupled to the MIN 110, via pipeline register 855 (for immediate consumption by a consuming task). The consuming task will receive data through one or more input ports coupled to the MIN 110, via pipeline register 815. These pairs of tasks can be configured either statically at the time of device initialization, or reconfigured dynamically. The minimal information required to statically or dynamically reconfigure a MIN 110 connection consists of the following:
    • 1. A source node identifier which uniquely identifies the node 150, 800 on which the task producing the data resides.
    • 2. A source task identifier which uniquely identifies which task on the source node is acting as the producer.
    • 3. A source port identifier which uniquely identifies which (output) port on the source node is being used to transmit information onto the MIN 110.
    • 4. A target node identifier which uniquely identifies the node 150, 800 on which the task consuming the data resides.
    • 5. A target task identifier which uniquely identifies which task on the target node is acting as the consumer.
    • 6. A target port identifier which uniquely identifies which (input) port on the target node is being used to gather information from the MIN 110.
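  • The six identifiers listed above can be grouped into a single connection record. The following Python sketch is illustrative only: the class name, the field names, the use of plain integers for identifiers, and the sample values are all assumptions, since the text does not specify field widths or encodings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinConnection:
    """Hypothetical record of one producer-to-consumer MIN connection."""
    source_node: int  # node on which the producing task resides
    source_task: int  # producing task on the source node
    source_port: int  # output port used to transmit onto the MIN
    target_node: int  # node on which the consuming task resides
    target_task: int  # consuming task on the target node
    target_port: int  # input port used to gather data from the MIN

    def same_node(self):
        # Producer and consumer may be two tasks on the same node.
        return self.source_node == self.target_node


# Sample values chosen only for illustration.
conn = MinConnection(source_node=3, source_task=1, source_port=0,
                     target_node=7, target_task=2, target_port=4)
local = MinConnection(source_node=3, source_task=1, source_port=1,
                      target_node=3, target_task=2, target_port=0)
```

    Making the record immutable reflects that a connection is replaced as a whole when it is reconfigured, rather than edited field by field; that is a design choice of the sketch, not a statement about the hardware.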
  • As mentioned above, the nodes of the ACE are heterogeneous in nature, meaning their internal architectures differ from one another, allowing each node to optimize its performance for differing computational types. A feature common to all nodes is the Hardware Task Manager (HTM) 810, a component of the node that is responsible for interacting with the MIN 110. The HTM 810 is also responsible for keeping track of the tasks running on each node, and controlling when each task executes.
  • The HTM 810 employs a technique known as co-operative multitasking to control task scheduling. In a co-operatively multitasked system, only one task is allowed to execute on a node 150, 800 at any given time. It is the running task's responsibility to yield the processor back to the Hardware Task Manager when it has completed its computation.
  • In order to efficiently schedule tasks, the HTM associates firing conditions with each task. These firing conditions are based on the availability of input data for a task to consume, and the availability of memory to store output data produced by a task. These firing conditions are represented as counters in a Consumer Count Table (CCT) and Producer Count Table (PCT).
  • The minimal information required to statically or dynamically configure a node's HTM 810 to specify task firing conditions consists of the following:
    • 1. A task identifier.
    • 2. The number of input ports utilized by the task.
    • 3. For each input port, the counter value required to trigger the task.
    • 4. For each input port, the initial counter value.
    • 5. The number of output ports utilized by the task.
    • 6. For each output port, the counter value required to trigger the task.
    • 7. For each output port, the initial counter value.
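  • The seven configuration items above can be sketched as a small data structure with a readiness check. This Python sketch is illustrative only. In particular, the rule that a port is satisfied once its counter is greater than or equal to its trigger value is an assumption made for the example; the text states that the firing conditions are held as counters in the CCT and PCT but does not fix the comparison used.

```python
from dataclasses import dataclass


@dataclass
class PortCounter:
    trigger: int  # counter value required to trigger the task
    count: int    # current counter value, starting at the initial value

    def satisfied(self):
        # Assumption: satisfied once the counter reaches the trigger value.
        return self.count >= self.trigger


@dataclass
class FiringConfig:
    task_id: int
    input_ports: list   # one PortCounter per input port
    output_ports: list  # one PortCounter per output port

    def ready(self):
        # The task may fire only when every input and output port is satisfied.
        return all(p.satisfied() for p in self.input_ports + self.output_ports)


# Hypothetical task 5 with one input port and one output port.
cfg = FiringConfig(
    task_id=5,
    input_ports=[PortCounter(trigger=16, count=0)],
    output_ports=[PortCounter(trigger=8, count=8)],
)
ready_before = cfg.ready()
cfg.input_ports[0].count += 16  # sixteen input elements arrive
ready_after = cfg.ready()
```

    The number of input and output ports (items 2 and 5) is implied here by the lengths of the two lists.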
  • In accordance with the present invention, a new general purpose programming language (referred to herein as “SilverC”) is provided to facilitate static and dynamic configuration of the ACE 100. While applicable to many hardware platforms and programming styles, it contains several constructs that directly support the static or dynamic reconfiguration of the MIN 110 and HTMs 810 of the ACE (ACM) 100. These constructs are modules, processes, and pipes.
  • A “construct” or “program construct”, as used herein, means and refers to use of any programming language, of any kind, with any syntax or signatures, which provide or can be interpreted to provide a mapping or correspondence from the language to the hardware, such as a first program construct which maps to a node 800, a second program construct which maps to a task to be executed on the node 800, and so on. While exemplary constructs are illustrated as examples, it should be understood that other constructs which are correspondingly mapped or can be interpreted to be mapped, such as through a compiler, are within the scope of the present invention. For example, while terminology such as “module”, “process”, “pipes”, etc., are utilized herein, other nomenclature such as “crates”, “methods”, “conduits”, etc. may be utilized, literally or equivalently, provided that a compiler will interpret this nomenclature to be mapped to the adaptive hardware.
  • A SilverC module acts as a container for program instructions and data that will be used to perform some computation on some hardware platform, such as a node within the ACE (ACM) 100. In the preferred SilverC embodiment, a module corresponds to or maps to a selected node 800. A SilverC module may contain zero or more processes and pipes. SilverC modules add a layer of encapsulation to the SilverC programming language. A module may be completely described by the input and output characteristics of its pipes. As such, developers incorporating a pre-existing module into their application may remain unaware of the details of its processes and how the actual computation is performed within the module.
  • A SilverC process is a collection of program instructions and data that is instantiated as an individual thread or task on some hardware platform, such as the ACE (ACM) 100. In the preferred SilverC embodiment, a process corresponds to or maps to a task to be performed by the adaptive execution unit (AEU) 840 under the control of the HTM 810 on a selected node 800. The process will only execute when its firing conditions are met, providing event-driven programming. A process maps as a software analog to the hardware task, with the firing conditions mapping to the HTM 810 which provides that a task is ready-to-run when the input data is available and there are a sufficient number of output ports for the output data, as discussed above in greater detail. Multiple processes may be aggregated within a single SilverC module and work cooperatively in order to perform the overall computation of that module.
  • A SilverC pipe represents communication between tasks, and acts as a conduit for data that is either produced or consumed by a process. An inpipe acts as a conduit for data that is consumed by a process. An outpipe acts as a conduit for data that is produced by a process.
  • While suitable as a general purpose programming language that is applicable to many hardware platforms, the language constructs of SilverC directly support the static and dynamic reconfiguration capabilities of the ACE (ACM) 100 hardware. In particular, the SilverC module, process and pipe constructs are an efficient means to specify the static and dynamic reconfiguration parameters of the MIN 110 and HTM 810.
  • The various modules, with their processes, pipes, and other SilverC constructs described below, may then be compiled to a bit file or other object code, by a compiler, for execution on the selected computing hardware, such as a bit file which provides configuration information (silverware) for execution on the ACE (ACM) 100. In the preferred SilverC embodiment, such compilation and resulting bit file may vary depending upon the particular node types available in the selected ACE 100 embodiment. As a consequence, any module, with its processes, pipes, and other SilverC constructs of the preferred SilverC embodiment, is considered capable of being mapped or otherwise has a direct (1:1) correspondence to a selected node 800 of an ACE 100 (and associated system) with its associated HTM 810, AEU 840, and MIN 110 connections (ports).
  • SilverC modules are code containers that are mapped (by a compiler) to a single “execution unit” having computational elements on some hardware platform, such as to a node 800 on the ACE (ACM) 100 having an AEU 840 and HTM 810. The computational elements of the AEU 840 may support multiple modules at a time, but a module should not be distributed across multiple AEUs 840 (i.e., a single module is executed by a single node 800). SilverC modules contain a configuration-time interface and a run-time interface. The configuration-time interface consists of values that are used to parameterize the definition of the module and which are specified at the point when the module is instantiated. For example, a filter may be defined to have a gain parameter of “T”, which may be instantiated to provide “T=2”, resulting in a filter having a gain of 2 in that instantiation, while at another time, may be instantiated to provide “T=3”, resulting in a filter having a gain of 3 in that instantiation. Such instantiation may occur at either compile-time or run-time. The run-time interface consists of input and output pipes that are used to dynamically transmit data to and from the module. These form the basis for the SilverC dataflow-style semantics.
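  • The configuration-time parameterization described above (a filter with a gain parameter “T” instantiated once with T=2 and once with T=3) can be mimicked in ordinary software. The following Python sketch is only an analogy: the class name GainFilter and the process method are hypothetical, and a Python constructor argument stands in for a SilverC module parameter.

```python
class GainFilter:
    """Analogy for a module whose gain T is fixed at instantiation."""

    def __init__(self, T):
        self.T = T  # configuration-time parameter, constant per instance

    def process(self, samples):
        # Run-time interface: data flows in and scaled data flows out.
        return [self.T * s for s in samples]


double = GainFilter(T=2)  # one instantiation with a gain of 2
triple = GainFilter(T=3)  # another instantiation with a gain of 3
doubled = double.process([1, 2, 3])
tripled = triple.process([1, 2, 3])
```

    The same definition yields two differently configured instances, which is the code re-use the module parameter list is intended to provide.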
  • SilverC modules are also composed of processes that define the computation performed by the module on its input data. The code used to specify these processes can be C-like in nature, with some additions to support dataflow-style programming and specific hardware features. Equivalently, other coding languages and styles may be utilized, also with the additions to support dataflow-style programming and specific hardware features of the ACE 100.
  • SilverC modules may contain constants that are global to the module, as well as some amount of state information shared between its processes, in the form of memory or registers. For example, memory may be shared across processes, and variables and constants may be declared and shared across processes.
  • An exemplary syntax for declaring a typical module is (Example 1):
    [nodeType] module moduleName[<parameterList>] {
    ...
    }
  • In this code fragment of Example 1, the nodeType specifies for which type of node (or AEU 840) the module is targeted, such as an arithmetic node or a bit-manipulation node. (In the examples which follow, a module's nodeType will generally be omitted, for ease of discussion). The moduleName is a placeholder for a unique identifier (or name) that identifies the module, while parameterList represents the list of configuration-time parameters for the module. The parameter list of a module is preferably a comma-separated list of const identifier declarations, resembling a parameter list of a C function. For example, an exemplary parameter list would be (Example 2):
    • const int16 blocksize, const fract16 epsilon
  • Modules that require no configuration-time parameters may be declared by omitting the parameter list, and optionally by omitting the angle brackets used to enclose it as well. For example, both of the following modules have no parameters (Example 3):
    module NoParametersHere<> {
    ...
    }
    module NorHere {
    ...
    }
  • The rest of the module definition is given in one or more module sections. The preferred SilverC embodiment currently supports four different module sections, each identified by a keyword followed by a colon: constants, state, pipes, and processes. The constants section is used to define constant values that are global to the module. The state section declares shared state information between the module processes. The pipes section defines the module run-time interface. The processes section defines the processes themselves (i.e., algorithms to be performed).
  • Module sections may appear in any order, though each may only be defined in terms of identifiers declared in sections that precede it. Each module section type may be omitted, may contain no declarations at all, or may be used multiple times within a module. Modules whose pipes and/or processes sections are omitted or empty are relatively useless in a real system.
  • Each of these module sections is described in further detail below. An exemplary module (named “Sample”, and omitting its nodeType) that has one instance of each type of module section is shown in the following code (Example 4):
    module Sample<const int16 blockSize> {
    constants:
    ...
    state:
    ...
    pipes:
    ...
    processes:
    ...
    }

    In Example 4, a parameter “blockSize” was declared as a constant value of a 16-bit integer data type. As illustrated below, it will be used to determine the size of pipes (number of ports) and the amount of data to be consumed or produced in this module, and will be instantiated by other parts of the code of the module illustrated in other examples below. While illustrating a single parameter, it should be understood that a list of multiple parameters may be utilized.
  • The constants section of a module is used to declare constants that are global to the module scope. It consists of traditional constant variable declarations as in C, the initializers of which may be composed of any expression formed of literals, global constants defined at the file scope, the parameters of the module, and any module constants declared previously within the module. Module constants are often used to define the sizes of the input pipe buffers, as well as state variables declared within the state section. A sample constants section is illustrated in the following code (Example 5):
    module Sample<const int16 blockSize> {
    constants:
    const int16 numBlocks = 2;
    const int16 dataCacheSize = numBlocks * blockSize;
    ...
    }
  • The state section of a module is used to declare shared state information between module processes. It supports the declaration of global variables within the module scope whose values can be accessed by any of the module processes. If a module is instantiated multiple times, each instantiation receives its own copy of the state variables; in this sense, state variables are similar to the static variables declared within a process except that they are accessible by multiple processes.
  • Because module processes are cooperatively multi-tasked, there is generally no need for locking or synchronization mechanisms to ensure coherent access to state variables. The variables declared within this section are often arrays of values stored in memory, whose sizes are specified by the module parameters and/or constant declarations, and which values may be shared between processes. The following code shows an exemplary state section for a module (the constants section was shown previously) (Example 6):
    module Sample<const int16 blockSize> {
    ...
    state:
    ram fract16 dataCache[dataCacheSize];
    ...
    }

    In this Example 6, the state section sets up random access memory (ram) (or another register) holding an array (dataCache) of a 16-bit fractional (fixed point) data type, having a size equal to the previously determined constant (dataCacheSize).
  • The pipes section defines the run-time interface of a module by specifying the input and output pipes used to transmit data into and out of the module, and is utilized to configure the MIN 110. For the preferred ACE 100 embodiment, this pipes construct illustrates a 1:1 correspondence between the constructs of SilverC and the configuration of the ACE 100.
  • All pipes are declared to be either an input pipe, using the inpipe keyword, or an output pipe, using the outpipe keyword. Each pipe type takes its defining parameters enclosed in angle brackets, and these are described in further detail below. Pipes are named, as with any other declaration. A sample pipes section is illustrated as the following code (Example 7):
    module Sample<const int16 blockSize> {
    ...
    pipes:
    inpipe<...> dataIn;
    outpipe<...> dataOut;
    ...
    }

    In this Example 7, an inpipe has been named dataIn, and an outpipe has been named dataOut. This pipes section specifies that the module has one input data stream that is stored in the dataIn pipe and a single output data stream that is controlled by the dataOut pipe.
  • Input pipes buffer data that is streamed into a module. All input pipes can be thought of as single-dimensional arrays of a user-specified element type. Input pipes are uniquely named (inpipeName) and are parameterized using two values: the type of element that is being transferred (elementType), and the number of elements that should be buffered by the input pipe (bufferSize) (i.e., the amount of memory to be reserved for its incoming data). An exemplary input pipe declaration is shown as the following code (Example 8):
    • inpipe<elementType, bufferSize> inpipeName;
  • In the exemplary module below, an input pipe named dataIn of fract16 data type values is declared whose buffer size is specified via its module parameter (blockSize) and constant values (numBlocks) as follows (Example 9):
    module Sample<const int16 blockSize> {
    ...
    pipes:
    inpipe<fract16, numBlocks*blockSize> dataIn;
    ...
    }

    As illustrated, whenever this inpipe is instantiated via instantiation of its parent module, different parameter values may be utilized, and the inpipe buffer allocation will be correspondingly sized automatically, providing for significant code re-use.
  • For an instantiation of this module with a blockSize parameter of “8”, this declaration would result in the allocation of logical buffer space corresponding to sixteen (2*8) fract16 elements. The memory allocated by an inpipe declaration can be thought of as being equivalent to the following C array declaration:
    • elementType inpipeName[bufferSize];
  • Output pipes are the means for generating output from a module. Output pipes are similar to input pipes, except that they do not perform any buffering, requiring only a data type declaration (elementType) and a unique name (outpipeName). As discussed above, as soon as output data is produced, it is transmitted over the MIN 110, and stored in the inpipe of another process or module. Output pipe declarations appear as follows in the preferred SilverC embodiment (Example 10):
    • outpipe<elementType> outpipeName;
  • As with the input pipe declaration, the elementType indicates the type of element that is transferred through the output pipe. An output pipe declaration that would complement the input pipe shown earlier would be declared as follows (Example 11):
    • outpipe<fract16> dataOut;
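  • The buffer sizing arithmetic above (numBlocks * blockSize elements) can be checked with a short sketch. This Python fragment is illustrative only: the function names are hypothetical, a Python list of floats stands in for a buffer of fract16 elements, and numBlocks is taken as 2 from the constants section shown in Example 5.

```python
NUM_BLOCKS = 2  # mirrors "const int16 numBlocks = 2;" from Example 5


def inpipe_buffer_elements(block_size, num_blocks=NUM_BLOCKS):
    # Number of elements reserved for the input pipe buffer.
    return num_blocks * block_size


def make_inpipe_buffer(block_size):
    # Stand-in for "elementType inpipeName[bufferSize];" with float elements.
    return [0.0] * inpipe_buffer_elements(block_size)


elements_for_8 = inpipe_buffer_elements(8)  # blockSize parameter of 8
buffer_for_4 = make_inpipe_buffer(4)        # a different instantiation
```

    Each instantiation with a different blockSize gets a correspondingly sized buffer without any change to the declaration itself.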
  • Input and output pipes both support two main types of operations: readiness checks, for the HTM 810 to determine if the task is ready to run, and synchronization. Output pipes also support assignments, which correspond to placing data on the network. Input pipes currently do not support direct access in the preferred SilverC embodiment, but must be accessed via SilverC pointers (to memory 845).
  • Data is written to an output pipe using a simple assignment. The right-hand side expression of the assignment must be of the same type as the element type of the pipe, or of a type that can automatically be coerced into the output type of the pipe. For example, the following code fragment would write the value 0.5 to the fract16 output pipe declared above, three times (Example 12):
    // code to write 0.5 three times to the output pipe declared above
    fract32 quarter = 0.25;
    fract16 half = 0.5;
    dataOut = 0.5;
    dataOut = half;
    dataOut = 2.0 * quarter;
    ...
  • Assuming that the downstream input pipe contains sufficient space, these assignments of Example 12 would cause the value 0.5 to be written into the next three available slots in the input buffer of the downstream input pipe, i.e., execution of this assignment statement would cause this data to be provided to the specified output port and onto the MIN 110, to be provided to the specified input port and corresponding memory 845 for the next consuming task. If three slots were not available, this program would overwrite old data, resulting in an incorrect program. To avoid such conditions, the readiness condition of the output pipe can be checked, as described in greater detail below.
  • Once data has been written to an output pipe, a synchronization message should be sent to the corresponding input pipe to let it know that new data has been written to its input buffer for a consuming task. This downstream notification functionality is provided by the notify ( ) routine of the preferred SilverC embodiment, as follows (Example 13):
    • void notify(outpipe outpipeName, int16 numberOfElementsWritten);
      In this Example 13, void indicates that there will be no return value from this routine call, outpipeName is the output pipe identifier, while numberOfElementsWritten indicates the number of new values that have been produced, and will be utilized in modifying the producer count held in the producer count table (PCT) of the producing node's HTM 810, and the consumer count held in the consumer count table (CCT) of the consuming node's HTM 810. For example, the consuming node's HTM 810 will check the CCT to determine that the consumer count has been increased to a predetermined value for a given task, and if so, will then trigger that consuming task by placing it in the ready-to-run queue.
  • Having written the three values shown in the above Example 12, the following call would tell its linked input pipe that three values had been written to its input buffer (Example 14):
    // code to inform linked input pipe that 3 values written to its buffer...
    notify(dataOut, 3);
  • The preferred SilverC embodiment does not prevent a user notification from providing incorrect information about how many values have actually been written to an input pipe buffer, although this usage is strongly discouraged. The value passed to a notify call should be equal to the number of assignments made to the output pipe since the preceding call. In addition, the synchronization used to implement the notify routine usually has a certain amount of overhead associated with it, which is why notifications are not assumed to be performed automatically by the runtime system for each assignment to an output pipe.
  • Correspondingly for data consumption, once a process associated with an input pipe has finished processing some portion of its buffered values, it must synchronize with the upstream output pipe to let it know that those slots are once again available for writing new values. The preferred SilverC embodiment utilizes a release ( ) routine to provide this upstream notification functionality, as illustrated in the following code (Example 15):
    • void release(inpipe inpipeName, int16 numberOfElementsRead);
      In this Example 15, void also indicates that there will be no return value from this routine call, inpipeName is the identifier of the input pipe while numberOfElementsRead indicates the number of elements in the input buffer that the consumer process wants to make available to the output pipe for subsequent writing by the producing process, and will be utilized in modifying the consumer count held in the consumer count table (CCT) of the consuming node's HTM, and the producer count held in the producer count table (PCT) of the producing node's HTM 810. For example, the producing node's HTM 810 will check the PCT to determine that the producer count has been decremented to or below a predetermined value for a given task and if so, will then trigger that producing task by placing it in the ready-to-run queue.
  • For example, if a process had read the three 0.5 values written in the output pipe of Example 12 above and would not be utilizing those data items again, it would indicate that it was done with them using the following call (Example 16):
    // code to read three values from the dataIn buffer...
    release(dataIn, 3);
  • As may be apparent from the discussion above, the synchronization functionality provided by the notify ( ) and release ( ) routines is mapped (through a compiler) directly to the functionality of the HTM 810 with its producer and consumer count tables, and correspondingly modifies the CCT and PCT registers of the HTM 810 for each corresponding input or output port.
  • The preferred SilverC embodiment supports a query and initialization functionality, ready ( ), which allows a process (program) to query whether input and output pipes are ready for data to be read from them or written to them. As discussed in greater detail below, in conjunction with specification of firing (execution) conditions as part of process definitions, these functionalities have the effect of initializing the CCT and PCT to their triggering values (firing or execution conditions), i.e., the values which will cause the HTM 810 to place the corresponding task in the ready-to-run queue for execution. The exemplary query function is illustrated using the following code (Example 17):
    • int16 ready(pipeType pipeName, int16 numberOfElements);
  • In this Example 17, pipeType is a placeholder to indicate that either an inpipe or outpipe can be used with this routine. The pipeName argument is the name of the pipe to be checked, while numberOfElements indicates the number of elements to be checked for (as a necessary and/or sufficient condition for triggering the corresponding task). For an input pipe, this routine indicates whether at least numberOfElements data values are ready to be read from the pipe input buffer. For an output pipe, it indicates whether there are numberOfElements slots available for writing new values in the corresponding input pipe buffer. The routine returns a first value (0) if the readiness condition of the pipe is not met, and a second value (non-zero) otherwise.
  • The readiness of a pipe does not correspond to the number of actual values written to or read from an input pipe buffer, but rather the number of elements that have been cumulatively specified by the notify ( ) and release ( ) synchronization routines. For example, if three values were written to an output pipe, but no notification was ever made that these three values had been written (and, as a consequence, the producer and consumer counts are unchanged), the following call would return 0 for the corresponding input pipe, even though the values may very well be stored in its buffer (Example 18):
    • ready(dataIn, 3) . . .
  • To be explicit, assuming that an output pipe O is connected to an input pipe I whose buffer size is b, that n elements in total have been notified for O and that r elements have been released from I during the execution of the program, and that k open buffer elements (slots) are required for writing to memory (output) and d elements are required for reading from memory (input), then the calls to ready() would be defined as follows in the preferred SilverC embodiment (Example 19):
    ready(O, k) : returns non-zero (true) if (b − n) + r ≧ k; otherwise returns 0 (false)
    ready(I, d) : returns non-zero (true) if n − r ≧ d; otherwise returns 0 (false)
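  • The two readiness conditions of Example 19 can be exercised with a small model. The following Python sketch is illustrative only and is not SilverC: the class PipeLink and its method names are hypothetical, and the counts are kept as plain cumulative totals rather than in the PCT and CCT registers of the HTM 810.

```python
from dataclasses import dataclass


@dataclass
class PipeLink:
    """Hypothetical model of one outpipe O linked to one inpipe I."""
    buffer_size: int   # b: number of slots in the inpipe buffer
    notified: int = 0  # n: total elements announced through notify()
    released: int = 0  # r: total elements freed through release()

    def notify(self, count: int) -> None:
        self.notified += count

    def release(self, count: int) -> None:
        self.released += count

    def ready_out(self, k: int) -> bool:
        # ready(O, k): at least k free slots, i.e. (b - n) + r >= k
        return (self.buffer_size - self.notified) + self.released >= k

    def ready_in(self, d: int) -> bool:
        # ready(I, d): at least d unread elements, i.e. n - r >= d
        return self.notified - self.released >= d


link = PipeLink(buffer_size=4)
link.notify(3)                     # producer announces three elements
can_read_three = link.ready_in(3)  # 3 - 0 >= 3
can_write_three = link.ready_out(3)  # (4 - 3) + 0 >= 3 does not hold
```

    In this sketch, readiness depends only on the notified and released totals, which mirrors the point made for Example 18: values that were written but never notified do not make the input pipe ready.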
  • Conditional statements may also be utilized in the preferred SilverC embodiment, for example, to ensure that the three writes to the output pipe of Example 12 do not overwrite data values that they should not, such as (Example 20):
    fract16 half = 0.5;
    fract32 quarter = 0.25;
    if (ready(dataOut, 3)) {
        dataOut = 0.5;
        dataOut = half;
        dataOut = 2.0 * quarter;
        notify(dataOut, 3);
    }

    Conceptually in this Example 20, if the input memory has sufficient space to accommodate the writing of three new values, then the data will be written to the corresponding output ports, and the consuming task will be correspondingly notified.
  • Such pipe readiness is typically checked or determined within the firing conditions of a process, as described below.
  • In the preferred SilverC embodiment, the processes section of a module contains the process (method or program) definitions that define a module. A module may consist of one or more processes, which are cooperatively multitasked with each other, as well as with any other modules mapped to the same AEU 840 or other form of hardware computational element. Each such process corresponds to a task to be performed on a node 150, 800.
  • In the preferred SilverC embodiment, processes are where the bulk of the program behavior is defined and where most of the C-style code appears. Process declarations vaguely resemble C-style functions, but due to their adaptive computing nature, they take no parameters and have no return type. Instead, they are defined with associated firing conditions that indicate when the process should run (typically in terms of the readiness of one or more input and/or output pipes).
  • The general pattern for defining a process is as follows (Example 20):
    process processName when firingCondition {
    ...
    }

    In this exemplary process definition, processName is a unique identifier for the process and firingCondition indicates the condition that must be true in order for the process (corresponding task) to be executed. This is typically the logical AND of a number of pipe readiness conditions and, as indicated above, initializes the PCT and CCT values.
  • As an example, the following code declares a process for a sample module named passThrough. It is declared to fire whenever its input pipe has a block of values (of size blockSize) ready for reading and its output pipe has a block of locations (also in this example of size blockSize) free for writing (Example 21):
    module Sample<const int16 blockSize> {
        ...
        processes:
        process passThrough when (ready(dataIn, blockSize) &&
                                  ready(dataOut, blockSize)) {
            ...
        }
    }
  • The body of a process is preferably made up of SilverC code as it has been described, namely, traditional C or C++ language program constructs augmented with SilverC constructs, definitions, extensions, pointers, and pipe operations. The body of a process may alternately contain inline C or assembly code. Preferably, most processes begin by firing based on the readiness of their input and output pipes, perform some computations using the input data and module state, followed by assigning the results to their output pipes, and then performing notification and release calls on the pipes.
  • For a comparatively simple example, a process is declared such that it effectively copies data values from its input pipe to its output pipe without changing them, as illustrated in the following exemplary code (Example 22):
    module Sample<const int16 blockSize> {
        ...
        processes:
        process passThrough when (ready(dataIn, blockSize) &&
                                  ready(dataOut, blockSize)) {
            static pointer<fract16, dataIn, 1> dataInPtr;
            int16 i;
            for (i=0; i<blockSize; i++) {
                dataOut = *(dataInPtr++);
            }
            notify(dataOut, blockSize);
            release(dataIn, blockSize);
        }
    }
  • This process runs whenever a block of values (of size blockSize) is ready for reading from its input, and a block of locations (of size blockSize) is ready for writing on its output; these are the firing conditions which initialize the CCT and PCT of the HTM 810. It proceeds by running a SilverC pointer (dataInPtr++) incrementally, one element at a time, across that input block of values (in a buffer corresponding to dataIn), and writing them to its output pipe. This process then notifies the downstream pipe that it has sent a block of values to it, and releases the input values so that the upstream process may overwrite them, modifying the values held in the CCT and PCT. It should be noted that these synchronization calls notify ( ) and release ( ) could be performed in any order, with the choice of order depending on which message should be delivered first.
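  • The firing behavior of the passThrough process can be approximated in software. The following Python sketch is a simplified model, not SilverC and not the HTM 810 itself: the Pipe class, its methods, and the pass_through function are hypothetical names, and a deque stands in for the circular input buffer that the SilverC pointer walks.

```python
from collections import deque


class Pipe:
    """Hypothetical stand-in for one linked outpipe/inpipe pair."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.slots = deque()  # values physically present in the inpipe buffer
        self.notified = 0     # cumulative notify() count
        self.released = 0     # cumulative release() count

    def write(self, value):
        # Models an assignment such as "dataOut = value;"
        self.slots.append(value)

    def notify(self, count):
        self.notified += count

    def release(self, count):
        for _ in range(count):
            self.slots.popleft()
        self.released += count

    def ready_to_read(self, count):
        return self.notified - self.released >= count

    def ready_to_write(self, count):
        return (self.buffer_size - self.notified) + self.released >= count


def pass_through(data_in, data_out, block_size):
    """Attempt one firing; return False when the firing condition is not met."""
    if not (data_in.ready_to_read(block_size)
            and data_out.ready_to_write(block_size)):
        return False
    for i in range(block_size):
        data_out.write(data_in.slots[i])  # dataOut = *(dataInPtr++);
    data_out.notify(block_size)
    data_in.release(block_size)
    return True


upstream = Pipe(buffer_size=8)
downstream = Pipe(buffer_size=8)
for value in (0.5, 0.25, 0.125, 0.0625):
    upstream.write(value)
fired_before_notify = pass_through(upstream, downstream, 4)
upstream.notify(4)
fired_after_notify = pass_through(upstream, downstream, 4)
```

    Under these assumptions the first attempt does not fire, because the four written values have not yet been notified, while the second attempt copies the block, notifies the downstream pipe, and releases the input slots.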
  • Once SilverC modules have been defined in accordance with the present invention, they may be used as a new parameterized type in the language of the preferred SilverC embodiment. Declaring “variables” of these types corresponds to creating a new instantiation of the module that executes in parallel with all other module instantiations. For example, given a module definition as follows (Example 23):
    module Sample<const int16 blockSize> {
    ...
    }

    then an instantiation of the module with a blockSize parameter of “8” would appear as:
    • Sample<8> mySampleModule;
  • It should be noted that the number and types of parameters specified during the module instantiation must match the parameters declared for the module. In addition, a module may be instantiated more than once.
  • In order for modules to function to produce desired results, the preferred SilverC embodiment provides for input and output pipes of a module to be linked to the output and input pipes of other modules. This linking or connecting of pipes across modules may be performed statically or dynamically, and may be implemented repeatedly with different linking connections, such as linking “A” to “B” at one instant, followed by linking “A” to “C” at another instant. The preferred SilverC embodiment utilizes a link( ) function, which may be specified as (Example 24):
    • void link(outpipe<elementType>, inpipe<elementType, bufferSize>);
      Also in the preferred SilverC embodiment, a main ( ) function is utilized to instantiate modules and their corresponding links to each other.
  • The element types of both pipes should match one another. In this context, pipes are referred to using the identifier of the module instantiation followed by a dot (.), followed by the name of the pipe as declared within the module definition.
  • As an example, the following exemplary code fragment illustrates module definition, pipe definition, module instantiation and pipe linking (Example 25):
    module Producer<const int16 outBlockSize> {
        pipes:
        outpipe<fract16> dataOut;
        ...
    }
    module Consumer<const int16 inBlockSize> {
        pipes:
        inpipe<fract16, 2*inBlockSize> dataIn;
        ...
    }
    void main( ) {
        const int16 bufferSize = 32;
        Producer<bufferSize> myProducer;
        Consumer<bufferSize> myConsumer;
        link(myProducer.dataOut, myConsumer.dataIn);
        ...
    }

    Instantiating modules using the main ( ) function, this code declares an instance of each of the Producer and Consumer modules, as myProducer and myConsumer, respectively, similarly to the C++ declaration of an object as an instance of a class. This Example 25 then links the output pipe of the instantiated producer, dataOut, to the input pipe of the instantiated consumer, dataIn.
  • The language constructs of the preferred SilverC embodiment directly support the static and dynamic reconfiguration capabilities of the ACE (ACM) 100 hardware. In particular, the SilverC module, process and pipe constructs are an efficient means to specify the static and dynamic reconfiguration parameters of the ACE (ACM) 100 MIN 110 and node 800 Hardware Task Manager 810.
  • With regard to the static or dynamic reconfiguration of the MIN 110 of the ACE (ACM) 100, as mentioned above, the following information is required for configuration: a source node identifier; a source task identifier; a source port identifier; a destination node identifier; a destination task identifier; and a destination port identifier. The preferred SilverC embodiment provides the following direct mapping from the programming language domain to the ACE (ACM) 100 hardware domain:
    • f(module, process, pipe)=(node id, task id, port id)
  • The SilverC module construct provides a direct mapping from the programming language domain to the ACE (ACM) 100 node identifier domain. The SilverC compiler assigns module instances to ACE (ACM) 100 nodes according to the node type specified in the module definition and any additional constraints applied to the module instance.
  • The SilverC process construct provides a direct mapping from the programming language domain to the ACE (ACM) 100 task identifier domain. A unique task identifier is generated for each process of each module instance.
  • The SilverC pipe construct provides a direct mapping from the programming language domain to the ACE (ACM) 100 port identifier domain. A unique unit port identifier is generated for each port of each module instance.
  • The SilverC link ( ) function provides the association between source node, task and port identifiers and destination node, task and port identifiers. It provides a direct mapping from the programming language domain to the MIN 110 connection domain of the ACE (ACM) 100.
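  • The mapping f(module, process, pipe) = (node id, task id, port id) and the association made by link ( ) can be pictured as simple table lookups. The following Python sketch is an assumption-laden illustration, not the SilverC compiler: the Mapper class, the sequential numbering of identifiers, and the process names "produce" and "consume" are hypothetical, and node-type constraints are ignored.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    node_id: int
    task_id: int
    port_id: int


class Connection(NamedTuple):
    source: Endpoint
    destination: Endpoint


class Mapper:
    """Hypothetical identifier allocator using first-seen sequential ids."""

    def __init__(self):
        self._ids = {"node": {}, "task": {}, "port": {}}

    def _lookup(self, kind, key):
        table = self._ids[kind]
        # Reuse an existing id, or assign the next free one.
        return table.setdefault(key, len(table))

    def endpoint(self, module, process, pipe):
        # f(module, process, pipe) = (node id, task id, port id)
        return Endpoint(
            self._lookup("node", module),
            self._lookup("task", (module, process)),
            self._lookup("port", (module, pipe)),
        )

    def link(self, source, destination):
        # Pair the source and destination identifiers for one MIN connection.
        return Connection(self.endpoint(*source), self.endpoint(*destination))


mapper = Mapper()
connection = mapper.link(
    ("myProducer", "produce", "dataOut"),
    ("myConsumer", "consume", "dataIn"),
)
```

    The resulting Connection carries the six values listed above as required for MIN 110 configuration: source node, task and port identifiers, and destination node, task and port identifiers.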
  • With regard to the static or dynamic reconfiguration of the HTM 810 of a node 800, as discussed above, the following information is required for configuration: a task identifier; the number of input and output ports utilized by a task; and a pair of counter values for each port (initial and triggering values). The SilverC programming language provides the following direct mapping from the programming language domain to the ACE (ACM) 100 hardware domain:
    • f(process)=(inputs, {input counters}, outputs, {output counters})
  • As described above, the SilverC process construct provides a direct mapping from the programming language domain to the ACE (ACM) 100 task identifier domain. A unique task identifier is generated for each process of each module instance.
  • The SilverC ready ( ) function provides a direct mapping from the programming language domain to the HTM firing condition domain. The HTM Consumer Count Table (CCT) and Producer Count Table (PCT) are populated using the counter values specified in the ready ( ) function. The SilverC module construct plays an indirect role in this mapping, as it provides the association between processes and pipes. The SilverC pipe construct also provides an indirect role as it provides the mapping to MIN 110 ports, as described above.
  • The SilverC pipe construct provides a direct mapping from the programming language domain to the HTM initial counter value domain. For an exemplary SilverC inpipe, the initial counter value for the corresponding input port is simply -bufferSize, where bufferSize is the size of the inpipe buffer as specified in its declaration. For an exemplary SilverC outpipe, the initial counter value for the corresponding output port is -(bufferSize - readyCount + 1), where bufferSize is the size of the buffer of the inpipe that is linked to this outpipe through a link ( ) expression, and readyCount is the firing condition associated with the output port through a ready ( ) expression. The release ( ) and notify ( ) constructs may then be utilized to increment or decrement the counter values held in the corresponding CCT and PCT of the HTM 810.
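  • As a worked illustration of the two initial counter formulas, the following Python sketch evaluates them for the inpipe of Example 25, whose buffer holds 2*inBlockSize = 64 elements when inBlockSize is 32. The function names are hypothetical, and the readyCount of 32 is an assumed firing condition that Example 25 does not itself specify.

```python
def inpipe_initial_count(buffer_size: int) -> int:
    # Initial counter value for an input port: -bufferSize
    return -buffer_size


def outpipe_initial_count(buffer_size: int, ready_count: int) -> int:
    # Initial counter value for an output port: -(bufferSize - readyCount + 1)
    return -(buffer_size - ready_count + 1)


in_block_size = 32                 # parameter value used in Example 25
buffer_size = 2 * in_block_size    # inpipe<fract16, 2*inBlockSize> dataIn
assumed_ready_count = 32           # assumption: ready(dataOut, 32)

input_initial = inpipe_initial_count(buffer_size)
output_initial = outpipe_initial_count(buffer_size, assumed_ready_count)
```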
  • The system, methods and programs of the present invention may be embodied in any number of forms, such as within a computer, within a workstation, within a computer network, within an adaptive computing device such as an ACE 100, or within any other form of computing or other system used to create or contain source code. Such source code further may be compiled into some form of instructions or object code (including assembly language instructions or configuration information for adaptive computing). The source code of the present invention may be embodied as any type of software, such as C++, C#, Java, or any other type of programming language which performs the functionality discussed above, including the preferred SilverC embodiment. The source code of the present invention and any resulting bit file (object code or configuration bit sequence) may be embodied within any tangible storage medium, such as within a memory or storage device for use by a computer, a workstation, any other machine-readable medium or form, or any other storage form or medium for use in a computing system. Such storage medium, memory or other storage devices may be any type of memory device, memory integrated circuit (“IC”), or memory portion of an integrated circuit (such as the resident memory within a processor IC), including without limitation RAM, FLASH, DRAM, SRAM, MRAM, FeRAM, ROM, EPROM or E2PROM, or any other type of memory, storage medium, or data storage apparatus or circuit, depending upon the selected embodiment. For example, without limitation, a tangible medium storing computer readable software, or other machine-readable medium, may include a floppy disk, a CDROM, a CD-RW, a magnetic hard drive, an optical drive, a quantum computing storage medium or device, a transmitted electromagnetic signal (e.g., used in internet downloading), or any other type of data storage apparatus or medium.
  • In summary, the present invention provides a system, software, and method for programming an adaptive computing device which has a plurality of heterogeneous nodes coupled through a matrix interconnect network. The method embodiment comprises, in any order: creating a first program construct having a correspondence to a selected node of the plurality of heterogeneous nodes; creating a second program construct having a correspondence to an executable task of the selected node; creating a third program construct having a correspondence to at least one input port coupling the selected node to the matrix interconnect network for input data to be consumed by the executable task; and creating a fourth program construct having a correspondence to at least one output port coupling the selected node to the matrix interconnect network for output data to be produced by the executable task.
  • In the preferred SilverC embodiment, the first program construct is a module declaration, optionally having a first unique identifier, a first reference to a node type corresponding to the selected node, and a second reference to one or more configuration-time parameters. The preferred module declaration has a form comprising:
    • [nodeType] module moduleName [<parameterList>],
      in which nodeType is a placeholder for the first reference to the node type corresponding to the selected node, moduleName is a placeholder for the first unique identifier, and parameterList is a placeholder for the second reference to one or more configuration-time parameters.
  • It should be noted that to be functional when compiled into configuration information, this first program construct generally includes, within the body of the construct, the second, third and fourth program constructs. The function of the first program construct, however, is merely to map or correspond to a node type.
  • In the preferred SilverC embodiment, as additional options, the module declaration further has a constants section which declares at least one constant which is global to the module; a states section which declares shared state information between module processes (such as an array of values stored in a memory); a process section having one or more process declarations, as second program constructs; and a pipes section, the pipes section having the third program construct and the fourth program construct.
  • The third program construct is preferably an inpipe declaration having a first unique identifier and further having a first parameter specifying an element type of the input data and a second parameter specifying an amount of memory to be reserved for the input data; and the fourth program construct is preferably an outpipe declaration having a second unique identifier and further having a third parameter specifying an element type of the output data. An assignment of output data to the outpipe declaration corresponds to writing output data to the output port connecting the node 800 to the MIN 110.
  • The inpipe declaration preferably has a form comprising:
    • inpipe<elementType1, bufferSize> inpipeName;
      in which elementType1 is a placeholder for the first parameter specifying the element type of the input data, bufferSize is a placeholder for the second parameter specifying the amount of memory to be reserved for the input data, and inpipeName is a placeholder for the first unique identifier. The outpipe declaration preferably has a form comprising:
    • outpipe<elementType2> outpipeName;
      in which elementType2 is a placeholder for the third parameter specifying the element type of the output data, and outpipeName is a placeholder for the second unique identifier.
  • In the preferred SilverC embodiment, the second program construct is a process declaration having a unique identifier and having at least one firing condition, the firing condition capable of determining a commencement of the executable task of the selected node. The process declaration preferably has a form comprising:
    • process processName when firingCondition {. . . }
      in which processName is a placeholder for the unique identifier, firingCondition is a placeholder for a condition to be fulfilled in order to commence performance of the executable task, and the ellipsis “. . . ” is a placeholder for specification of one or more functions or algorithmic elements comprising the executable task.
  • Synchronization of production of output data with consumption of input data is provided by creating a fifth program construct corresponding to a data producing task notifying a data consuming task of the creation of output data; and creating a sixth program construct corresponding to a data consuming task notifying a data producing task of the consumption of input data. In addition to potentially being on the same node, in some instances, the data producing task is executable on a first node of the plurality of heterogeneous nodes and the data consuming task is executable on a second node of the plurality of heterogeneous nodes.
  • In the preferred SilverC embodiment, the fifth program construct is a notify routine and has a form comprising:
    • notify(outpipeName, numberOfElementsWritten);
      wherein outpipeName is a placeholder for a first unique identifier of the fourth program construct and numberOfElementsWritten is a placeholder for an amount of output data produced. Also in the preferred SilverC embodiment, the sixth program construct is a release routine and has a form comprising:
    • release(inpipeName, numberOfElementsRead);
      wherein inpipeName is a placeholder for a second unique identifier of the third program construct and numberOfElementsRead is a placeholder for an amount of input data consumed.
  • The present invention also provides for commencement of the executable task through a seventh program construct having a correspondence to a task manager of the selected node, which may be used for, and corresponds to, an initialization of a producer count table of the task manager or an initialization of a consumer count table of the task manager. In the preferred SilverC embodiment, the seventh program construct is a ready routine and has a form comprising:
    • ready(pipeName, numberOfElements);
      wherein pipeName is a placeholder for a unique identifier of either the third program construct or the fourth program construct and numberOfElements is a placeholder for an amount of data which is sufficient for commencement of the executable task.
  • An eighth program construct is used to link the fourth program construct to the third program construct, and corresponds to a selected configuration of the matrix interconnection network to provide a communication path from a selected output port to a selected input port. In the preferred SilverC embodiment, the eighth program construct is a link routine and has a form comprising:
    • link(outpipe, inpipe);
      wherein outpipe is a placeholder for a first unique identifier of an instantiation of a first program construct and a fourth program construct, of a plurality of instantiations, and inpipe is a placeholder for a second unique identifier of an instantiation of a first program construct and a third program construct, of the plurality of instantiations.
  • A ninth program construct may also be utilized to instantiate a program construct of a plurality of program constructs, such as the first program construct, the second program construct, the third program construct, the fourth program construct, and the eighth program construct. In the preferred SilverC embodiment, the ninth program construct is a main function and has a form comprising:
    main( ) {
    ...
    }

    wherein the ellipsis “. . . ” is a placeholder for specification of a program construct to be instantiated. For example, the main() function can be utilized to instantiate a module, with all of its incorporated program constructs such as processes, pipes, and links. In addition, different module and other program construct parameters will allow different instantiations of modules and their included constructs, as mentioned above, such that each instantiation corresponds to a parameter set contained within the program construct.
  • Numerous advantages of the present invention may be readily apparent. The invention facilitates static and dynamic configuration of an adaptive computing device such as the ACE 100. While applicable to many hardware platforms and programming styles, it contains several constructs that directly support the static or dynamic reconfiguration of the MIN 110 and HTMs 810 of the ACE (ACM) 100.
  • From the foregoing, it will be observed that numerous variations and modifications may be effected without departing from the spirit and scope of the novel concept of the invention. It is to be understood that no limitation with respect to the specific methods and apparatus illustrated herein is intended or should be inferred. It is, of course, intended to cover by the appended claims all such modifications as fall within the scope of the claims.

Claims (17)

1-53. (canceled)
54. A computer based system for programmatically reconfiguring an adaptive computing engine (ACE) utilizing a programming language, the ACE having a plurality of heterogeneous nodes that are reconfigurably connected across a matrix interconnection network (MIN) connection domain, the system comprising:
a module construct for receiving a plurality of program instructions and data to be processed on the ACE, the module construct configured to provide a first direct mapping from the programming language to an ACE node identifier domain;
a process construct for instantiating the plurality of program instructions and data as a plurality of threads or tasks on the ACE, the process construct configured to provide a second direct mapping from the programming language to an ACE port identifier domain; and
a pipe construct for providing communication between the plurality of threads or tasks, the pipe construct configured to provide a third direct mapping from the programming language to the MIN connection domain.
55. The system of claim 54 wherein the module construct is configured to contain zero or more process constructs and zero or more pipe constructs.
56. The system of claim 55 wherein the process construct is configured to execute when a configured set of firing conditions are met.
57. The system of claim 56 wherein the configured set of firing conditions is selected from the group consisting of: a task identifier, a number of input ports utilized by the task, for an input port, a first counter value required to trigger the task, for an input port, an initial input counter value, a number of output ports utilized by the task, for an output port, a second counter value required to trigger the task, and for an output port, an initial output counter value.
58. The system of claim 55 wherein a plurality of process constructs are aggregated within the module construct and configured to function cooperatively to perform at least one computation.
59. The system of claim 54 wherein the pipe construct further comprises:
an inpipe for providing a conduit for data into the module construct; and
an outpipe for providing a conduit for data out of the module construct.
60. The system of claim 54 wherein the module construct is compiled into a single execution unit for executing on a hardware configuration of the ACE.
61. The system of claim 54 wherein the module construct is configurable to perform an algorithm selected from the group consisting of: a radix-2 Fast Fourier Transformation (FFT), a radix-4 Fast Fourier Transformation (FFT), a radix-2 Inverse Fast Fourier Transformation (IFFT), a radix-4 Inverse Fast Fourier Transformation (IFFT), a one-dimensional Discrete Cosine Transformation (DCT), a multi-dimensional Discrete Cosine Transformation (DCT), a finite impulse response (FIR) filtering, a convolutional encoding, a scrambling, a puncturing, an interleaving, a modulation mapping, a Golay correlation, an OVSF code generation, a Hadamard Transformation, a Turbo Decoding, a bit correlation, a Griffiths LMS algorithm, a variable length encoding, an uplink scrambling code generation, a downlink scrambling code generation, a downlink despreading, an uplink spreading, an uplink concatenation, a Viterbi encoding, a Viterbi decoding, a cyclic redundancy coding (CRC), a complex multiplication, a data compression, a motion compensation, a channel searching, a channel acquisition, and a multipath correlation.
62. A method for programmatically reconfiguring an adaptive computing engine (ACE) utilizing a scripting language, the ACE having a plurality of heterogeneous nodes that are reconfigurably connected across a matrix interconnection network (MIN) connection domain, the method comprising:
instantiating a module construct, the module construct having a correspondence to a selected one of the plurality of heterogeneous nodes, wherein the module construct is configured to provide a first direct mapping from the programming language to an ACE node identifier domain;
instantiating a process construct, the process construct having a correspondence to an executable task of the selected one, wherein the process construct is configured to provide a second direct mapping from the programming language to an ACE port identifier domain; and
instantiating a pipe construct, the pipe construct having a correspondence with at least one input port and at least one output port, the pipe construct configured to couple the selected one with the MIN connection domain, wherein the pipe construct is configured to provide a third direct mapping from the programming language to the MIN connection domain.
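The three constructs recited in claim 62 can be pictured as a small object model. The following is a minimal sketch only, written in Python rather than in the scripting language of the claims; every class name, field name, and identifier value in it (`Module`, `Process`, `Pipe`, `node_id`, `port_id`, `task_id`, `connections`) is a hypothetical illustration and is not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# All names below are hypothetical illustrations of the claimed constructs.


@dataclass
class Pipe:
    """Couples one port of a node to the MIN connection domain."""
    name: str
    direction: str  # "in" for an inpipe, "out" for an outpipe
    port_id: int    # assumed stand-in for an ACE port identifier


@dataclass
class Process:
    """Corresponds to one executable task of the selected node."""
    task_id: int
    input_ports: List[int] = field(default_factory=list)
    output_ports: List[int] = field(default_factory=list)


@dataclass
class Module:
    """Corresponds to one selected heterogeneous node."""
    node_id: int  # assumed stand-in for an ACE node identifier
    processes: List[Process] = field(default_factory=list)
    pipes: List[Pipe] = field(default_factory=list)

    def connections(self) -> List[Tuple[int, int, str]]:
        """Return one (node_id, port_id, direction) triple per pipe."""
        return [(self.node_id, p.port_id, p.direction) for p in self.pipes]


# A module holding one process and two pipes.
fir = Module(node_id=3)
fir.processes.append(Process(task_id=0, input_ports=[0], output_ports=[1]))
fir.pipes.append(Pipe("samples", "in", 0))
fir.pipes.append(Pipe("filtered", "out", 1))
print(fir.connections())
```

In this sketch a module may also be created with no processes and no pipes, in which case `connections()` returns an empty list.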
63. The method of claim 62 wherein the module construct is configured to contain zero or more process constructs and zero or more pipe constructs.
64. The method of claim 63 wherein the process construct is configured to execute when a configured set of firing conditions are met.
65. The method of claim 64 wherein the configured set of firing conditions is selected from the group consisting of: a task identifier, a number of input ports utilized by the task, for an input port, a first counter value required to trigger the task, for an input port, an initial input counter value, a number of output ports utilized by the task, for an output port, a second counter value required to trigger the task, and for an output port, an initial output counter value.
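The firing conditions listed in claim 65 can be illustrated with a short sketch. It assumes, purely for illustration, that a port is ready once its counter has reached the value required to trigger the task, and that a task fires only when every input port and every output port is ready; the claims do not state this comparison rule. The names `PortCounter`, `FiringConditions`, and `should_fire` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PortCounter:
    required: int  # counter value required to trigger the task
    value: int     # current counter value, starting at the initial value

    def ready(self) -> bool:
        # Assumption: the port is ready once the counter reaches `required`.
        return self.value >= self.required


@dataclass
class FiringConditions:
    task_id: int
    inputs: List[PortCounter]   # one entry per input port used by the task
    outputs: List[PortCounter]  # one entry per output port used by the task

    def should_fire(self) -> bool:
        # Assumption: the task fires only when all ports are ready.
        return all(p.ready() for p in self.inputs + self.outputs)


cond = FiringConditions(
    task_id=7,
    inputs=[PortCounter(required=4, value=0)],
    outputs=[PortCounter(required=1, value=1)],
)
before = cond.should_fire()   # input counter has not reached 4 yet
cond.inputs[0].value += 4     # four units arrive on the input port
after = cond.should_fire()
print(before, after)
```

The number of input and output ports is carried implicitly by the lengths of the two lists, so it does not need a separate field in this sketch.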
66. The method of claim 63 wherein a plurality of process constructs are aggregated within the module construct and configured to function cooperatively to perform at least one computation.
67. The method of claim 62 wherein the pipe construct further comprises:
an inpipe for providing a conduit for data into the module construct; and
an outpipe for providing a conduit for data out of the module construct.
68. The method of claim 62 wherein the module construct is compiled into a single execution unit for executing on a hardware configuration of the ACE.
69. The method of claim 62 wherein the module construct is configurable to perform an algorithm selected from the group consisting of: a radix-2 Fast Fourier Transformation (FFT), a radix-4 Fast Fourier Transformation (FFT), a radix-2 Inverse Fast Fourier Transformation (IFFT), a radix-4 Inverse Fast Fourier Transformation (IFFT), a one-dimensional Discrete Cosine Transformation (DCT), a multi-dimensional Discrete Cosine Transformation (DCT), a finite impulse response (FIR) filtering, a convolutional encoding, a scrambling, a puncturing, an interleaving, a modulation mapping, a Golay correlation, an OVSF code generation, a Hadamard Transformation, a Turbo Decoding, a bit correlation, a Griffiths LMS algorithm, a variable length encoding, an uplink scrambling code generation, a downlink scrambling code generation, a downlink despreading, an uplink spreading, an uplink concatenation, a Viterbi encoding, a Viterbi decoding, a cyclic redundancy coding (CRC), a complex multiplication, a data compression, a motion compensation, a channel searching, a channel acquisition, and a multipath correlation.
US11/707,301 2003-08-21 2007-02-15 System, method and software for static and dynamic programming and configuration of an adaptive computing architecture Abandoned US20070157166A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/707,301 US20070157166A1 (en) 2003-08-21 2007-02-15 System, method and software for static and dynamic programming and configuration of an adaptive computing architecture

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/645,269 US7200837B2 (en) 2003-08-21 2003-08-21 System, method and software for static and dynamic programming and configuration of an adaptive computing architecture
US11/707,301 US20070157166A1 (en) 2003-08-21 2007-02-15 System, method and software for static and dynamic programming and configuration of an adaptive computing architecture

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/645,269 Continuation US7200837B2 (en) 2003-08-21 2003-08-21 System, method and software for static and dynamic programming and configuration of an adaptive computing architecture

Publications (1)

Publication Number Publication Date
US20070157166A1 (en) 2007-07-05

Family

ID=34194293

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/645,269 Active 2025-10-14 US7200837B2 (en) 2003-08-21 2003-08-21 System, method and software for static and dynamic programming and configuration of an adaptive computing architecture
US11/707,301 Abandoned US20070157166A1 (en) 2003-08-21 2007-02-15 System, method and software for static and dynamic programming and configuration of an adaptive computing architecture

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US10/645,269 Active 2025-10-14 US7200837B2 (en) 2003-08-21 2003-08-21 System, method and software for static and dynamic programming and configuration of an adaptive computing architecture

Country Status (1)

Country Link
US (2) US7200837B2 (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070130622A1 (en) * 2005-11-21 2007-06-07 Docomo Communications Laboratories Usa, Inc. Method and apparatus for verifying and ensuring safe handling of notifications
US20070226433A1 (en) * 2002-11-22 2007-09-27 Quicksilver Technology, Inc. External memory controller node
US20090055596A1 (en) * 2007-08-20 2009-02-26 Convey Computer Multi-processor system having at least one processor that comprises a dynamically reconfigurable instruction set
US20090064095A1 (en) * 2007-08-29 2009-03-05 Convey Computer Compiler for generating an executable comprising instructions for a plurality of different instruction sets
US20090070553A1 (en) * 2007-09-12 2009-03-12 Convey Computer Dispatch mechanism for dispatching insturctions from a host processor to a co-processor
US20100036997A1 (en) * 2007-08-20 2010-02-11 Convey Computer Multiple data channel memory module architecture
US20100037024A1 (en) * 2008-08-05 2010-02-11 Convey Computer Memory interleave for heterogeneous computing
US20100115233A1 (en) * 2008-10-31 2010-05-06 Convey Computer Dynamically-selectable vector register partitioning
WO2010144692A1 (en) * 2009-06-10 2010-12-16 Google Inc. Productive distribution for result optimization within a hierarchical architecture
WO2011091323A1 (en) * 2010-01-21 2011-07-28 Qst Holdings, Llc A method and apparatus for a general-purpose, multiple-core system for implementing stream-based computations
US8205066B2 (en) 2008-10-31 2012-06-19 Convey Computer Dynamically configured coprocessor for different extended instruction set personality specific to application program with shared memory storing instructions invisibly dispatched from host processor
US20130086553A1 (en) * 2011-09-29 2013-04-04 Mark Grechanik Systems and methods for finding project-related information by clustering applications into related concept categories
US8423745B1 (en) 2009-11-16 2013-04-16 Convey Computer Systems and methods for mapping a neighborhood of data to general registers of a processing element
US20140157247A1 (en) * 2012-11-30 2014-06-05 Oracle International Corporation Enabling Symbol Resolution of Private Symbols in Legacy Programs and Optimizing Access to the Private Symbols
US8959495B2 (en) 2012-09-14 2015-02-17 Oracle International Corporation Unifying static and dynamic compiler optimizations in source-code bases
US20160299998A1 (en) * 2013-12-12 2016-10-13 Tokyo Institute Of Technology Logic circuit generation device and method
CN106059528A (en) * 2016-06-12 2016-10-26 西安电子工程研究所 Length-variable single-rate FIR digital filter design method
US9710384B2 (en) 2008-01-04 2017-07-18 Micron Technology, Inc. Microprocessor architecture having alternative memory access paths
US10430190B2 (en) 2012-06-07 2019-10-01 Micron Technology, Inc. Systems and methods for selectively controlling multithreaded execution of executable code segments
CN110389991A (en) * 2018-04-12 2019-10-29 腾讯大地通途(北京)科技有限公司 The processing method that reports an error, device and the storage medium in map section

Families Citing this family (85)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7139743B2 (en) 2000-04-07 2006-11-21 Washington University Associative database scanning and information retrieval using FPGA devices
US6836839B2 (en) * 2001-03-22 2004-12-28 Quicksilver Technology, Inc. Adaptive integrated circuitry with heterogeneous and reconfigurable matrices of diverse and adaptive computational units having fixed, application specific computational elements
US9110692B2 (en) 2001-03-22 2015-08-18 Frederick Master Method and apparatus for a compiler and related components for stream-based computations for a general-purpose, multiple-core system
US7400668B2 (en) * 2001-03-22 2008-07-15 Qst Holdings, Llc Method and system for implementing a system acquisition function for use with a communication device
US7962716B2 (en) 2001-03-22 2011-06-14 Qst Holdings, Inc. Adaptive integrated circuitry with heterogeneous and reconfigurable matrices of diverse and adaptive computational units having fixed, application specific computational elements
US7752419B1 (en) 2001-03-22 2010-07-06 Qst Holdings, Llc Method and system for managing hardware resources to implement system functions using an adaptive computing architecture
US6577678B2 (en) 2001-05-08 2003-06-10 Quicksilver Technology Method and system for reconfigurable channel coding
US20090161568A1 (en) * 2007-12-21 2009-06-25 Charles Kastner TCP data reassembly
US20090006659A1 (en) * 2001-10-19 2009-01-01 Collins Jack M Advanced mezzanine card for digital network data inspection
US7716330B2 (en) * 2001-10-19 2010-05-11 Global Velocity, Inc. System and method for controlling transmission of data packets over an information network
US8412915B2 (en) * 2001-11-30 2013-04-02 Altera Corporation Apparatus, system and method for configuration of adaptive integrated circuitry having heterogeneous computational elements
US6986021B2 (en) * 2001-11-30 2006-01-10 Quick Silver Technology, Inc. Apparatus, method, system and executable module for configuration and operation of adaptive integrated circuitry having fixed, application specific computational elements
US7602740B2 (en) * 2001-12-10 2009-10-13 Qst Holdings, Inc. System for adapting device standards after manufacture
US20030108012A1 (en) * 2001-12-12 2003-06-12 Quicksilver Technology, Inc. Method and system for detecting and identifying scrambling codes
US7088825B2 (en) * 2001-12-12 2006-08-08 Quicksilver Technology, Inc. Low I/O bandwidth method and system for implementing detection and identification of scrambling codes
US7215701B2 (en) 2001-12-12 2007-05-08 Sharad Sambhwani Low I/O bandwidth method and system for implementing detection and identification of scrambling codes
US7231508B2 (en) * 2001-12-13 2007-06-12 Quicksilver Technologies Configurable finite state machine for operation of microinstruction providing execution enable control value
US7403981B2 (en) * 2002-01-04 2008-07-22 Quicksilver Technology, Inc. Apparatus and method for adaptive multimedia reception and transmission in communication environments
US7660984B1 (en) 2003-05-13 2010-02-09 Quicksilver Technology Method and system for achieving individualized protected space in an operating system
US7328414B1 (en) * 2003-05-13 2008-02-05 Qst Holdings, Llc Method and system for creating and programming an adaptive computing engine
US8108656B2 (en) 2002-08-29 2012-01-31 Qst Holdings, Llc Task definition for specifying resource requirements
US7937591B1 (en) 2002-10-25 2011-05-03 Qst Holdings, Llc Method and system for providing a device which can be adapted on an ongoing basis
US8276135B2 (en) * 2002-11-07 2012-09-25 Qst Holdings Llc Profiling of software and circuit designs utilizing data operation analyses
JP2006526227A (en) 2003-05-23 2006-11-16 ワシントン ユニヴァーシティー Intelligent data storage and processing using FPGA devices
US10572824B2 (en) 2003-05-23 2020-02-25 Ip Reservoir, Llc System and method for low latency multi-functional pipeline with correlation logic and selectively activated/deactivated pipelined data processing engines
US7609297B2 (en) * 2003-06-25 2009-10-27 Qst Holdings, Inc. Configurable hardware based digital imaging apparatus
US8296764B2 (en) * 2003-08-14 2012-10-23 Nvidia Corporation Internal synchronization control for adaptive integrated circuitry
US7353516B2 (en) * 2003-08-14 2008-04-01 Nvidia Corporation Data flow control for adaptive integrated circuitry
US20050108727A1 (en) * 2003-09-11 2005-05-19 Finisar Corporation Application binding in a network environment
US7602785B2 (en) * 2004-02-09 2009-10-13 Washington University Method and system for performing longest prefix matching for network address lookup using bloom filters
US7424698B2 (en) * 2004-02-27 2008-09-09 Intel Corporation Allocation of combined or separate data and control planes
US20050223110A1 (en) * 2004-03-30 2005-10-06 Intel Corporation Heterogeneous building block scalability
DE102004018976A1 (en) * 2004-04-20 2005-11-17 Gude, Michael, Dr. Improved gate array or FPGA
TWI238638B (en) * 2004-04-22 2005-08-21 Benq Corp Method and device for multimedia processing
US7233532B2 (en) * 2004-04-30 2007-06-19 Xilinx, Inc. Reconfiguration port for dynamic reconfiguration-system monitor interface
US7102555B2 (en) * 2004-04-30 2006-09-05 Xilinx, Inc. Boundary-scan circuit used for analog and digital testing of an integrated circuit
US7138820B2 (en) * 2004-04-30 2006-11-21 Xilinx, Inc. System monitor in a programmable logic device
US7109750B2 (en) * 2004-04-30 2006-09-19 Xilinx, Inc. Reconfiguration port for dynamic reconfiguration-controller
US7218137B2 (en) * 2004-04-30 2007-05-15 Xilinx, Inc. Reconfiguration port for dynamic reconfiguration
US7599299B2 (en) * 2004-04-30 2009-10-06 Xilinx, Inc. Dynamic reconfiguration of a system monitor (DRPORT)
US7126372B2 (en) * 2004-04-30 2006-10-24 Xilinx, Inc. Reconfiguration port for dynamic reconfiguration—sub-frame access for reconfiguration
US20060004902A1 (en) * 2004-06-30 2006-01-05 Siva Simanapalli Reconfigurable circuit with programmable split adder
EP1784719A4 (en) * 2004-08-24 2011-04-13 Univ Washington Methods and systems for content detection in a reconfigurable hardware
KR100663488B1 (en) * 2004-10-29 2007-01-02 삼성전자주식회사 Communication system with reconfigurable hardware structure and reconfiguration method therefor
EP1859378A2 (en) * 2005-03-03 2007-11-28 Washington University Method and apparatus for performing biosequence similarity searching
US20070106720A1 (en) * 2005-11-10 2007-05-10 Samsung Electronics Co., Ltd. Reconfigurable signal processor architecture using multiple complex multiply-accumulate units
US7954114B2 (en) * 2006-01-26 2011-05-31 Exegy Incorporated Firmware socket module for FPGA-based pipeline processing
US7735061B2 (en) * 2006-05-03 2010-06-08 Epic Games, Inc. Efficient encoding and access of mathematically precise variable precision numeric types
US7921046B2 (en) * 2006-06-19 2011-04-05 Exegy Incorporated High speed processing of financial information using FPGA devices
US8228328B1 (en) 2006-11-03 2012-07-24 Nvidia Corporation Early Z testing for multiple render targets
US7660793B2 (en) 2006-11-13 2010-02-09 Exegy Incorporated Method and system for high performance integration, processing and searching of structured and unstructured data using coprocessors
US8326819B2 (en) 2006-11-13 2012-12-04 Exegy Incorporated Method and system for high performance data metatagging and data indexing using coprocessors
US8965136B2 (en) * 2006-11-14 2015-02-24 West Virginia University Pattern detection based on fractal analysis
WO2008067125A2 (en) * 2006-11-14 2008-06-05 West Virginia University Global quantitative characterization of patterns using fractal analysis
US9008452B2 (en) * 2006-11-14 2015-04-14 West Virginia University Global quantitative characterization of patterns using fractal analysis
US8983208B2 (en) * 2006-11-14 2015-03-17 West Virginia University Pattern matching based on global quantitative characterization of patterns
US20080182021A1 (en) * 2007-01-31 2008-07-31 Simka Harsono S Continuous ultra-thin copper film formed using a low thermal budget
US9081901B2 (en) * 2007-10-31 2015-07-14 Raytheon Company Means of control for reconfigurable computers
US20090119441A1 (en) * 2007-11-06 2009-05-07 Hewlett-Packard Development Company, L.P. Heterogeneous Parallel Bus Switch
US7823092B1 (en) * 2007-11-23 2010-10-26 Altera Corporation Method and apparatus for implementing a parameterizable filter block with an electronic design automation tool
WO2009140707A1 (en) * 2008-05-21 2009-11-26 Technische Universität Wien Cross-domain soc architecture for dependable embedded applications
US8136063B2 (en) * 2008-11-14 2012-03-13 Synopsys, Inc. Unfolding algorithm in multirate system folding
US20120095893A1 (en) 2008-12-15 2012-04-19 Exegy Incorporated Method and apparatus for high-speed processing of financial market depth data
US9444757B2 (en) 2009-04-27 2016-09-13 Intel Corporation Dynamic configuration of processing modules in a network communications processor architecture
US8407707B2 (en) * 2009-05-18 2013-03-26 Lsi Corporation Task queuing in a network communications processor architecture
US9461930B2 (en) 2009-04-27 2016-10-04 Intel Corporation Modifying data streams without reordering in a multi-thread, multi-flow network processor
US8321870B2 (en) * 2009-08-14 2012-11-27 General Electric Company Method and system for distributed computation having sub-task processing and sub-solution redistribution
US8656496B2 (en) * 2010-11-22 2014-02-18 International Business Machines Corporations Global variable security analysis
US10037568B2 (en) 2010-12-09 2018-07-31 Ip Reservoir, Llc Method and apparatus for managing orders in financial markets
US8789065B2 (en) 2012-06-08 2014-07-22 Throughputer, Inc. System and method for input data load adaptive parallel processing
US9448847B2 (en) 2011-07-15 2016-09-20 Throughputer, Inc. Concurrent program execution optimization
US9047243B2 (en) 2011-12-14 2015-06-02 Ip Reservoir, Llc Method and apparatus for low latency data distribution
US9990393B2 (en) 2012-03-27 2018-06-05 Ip Reservoir, Llc Intelligent feed switch
US11436672B2 (en) 2012-03-27 2022-09-06 Exegy Incorporated Intelligent switch for processing financial market data
US10121196B2 (en) 2012-03-27 2018-11-06 Ip Reservoir, Llc Offload processing of data packets containing financial market data
US10650452B2 (en) 2012-03-27 2020-05-12 Ip Reservoir, Llc Offload processing of data packets
CN104252391B (en) * 2013-06-28 2017-09-12 国际商业机器公司 Method and apparatus for managing multiple operations in distributed computing system
TWI507989B (en) * 2013-08-08 2015-11-11 Nat Univ Tsing Hua Method of resource-oriented power analysis for embedded system
CN104615488B (en) * 2015-01-16 2018-01-19 华为技术有限公司 The method and apparatus of task scheduling in heterogeneous multi-core reconfigurable calculating platform
US10082593B2 (en) * 2016-03-01 2018-09-25 Gowell International, Llc Method and apparatus for synthetic magnetic sensor aperture using eddy current time transient measurement for downhole applications
EP3560135A4 (en) 2016-12-22 2020-08-05 IP Reservoir, LLC Pipelines for hardware-accelerated machine learning
US11886377B2 (en) 2019-09-10 2024-01-30 Cornami, Inc. Reconfigurable arithmetic engine circuit
US20220083888A1 (en) * 2020-09-11 2022-03-17 International Business Machines Corporation Mapping conditional execution logic to quantum computing resources
CN112180788B (en) * 2020-09-28 2022-03-08 西安微电子技术研究所 Control platform architecture design method, storage medium and device of dynamic association context
CN113468099B (en) * 2021-05-31 2022-02-08 深圳致星科技有限公司 Reconfigurable computing device, processor and method

Citations (99)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3938639A (en) * 1973-11-28 1976-02-17 The Cornelius Company Portable dispenser for mixed beverages
US4076145A (en) * 1976-08-09 1978-02-28 The Cornelius Company Method and apparatus for dispensing a beverage
US4143793A (en) * 1977-06-13 1979-03-13 The Cornelius Company Apparatus and method for dispensing a carbonated beverage
US4181242A (en) * 1978-05-30 1980-01-01 The Cornelius Company Method and apparatus for dispensing a beverage
US4252253A (en) * 1978-02-21 1981-02-24 Mcneil Corporation Drink dispenser having central control of plural dispensing stations
US4377246A (en) * 1977-06-13 1983-03-22 The Cornelius Company Apparatus for dispensing a carbonated beverage
US4578799A (en) * 1983-10-05 1986-03-25 Codenoll Technology Corporation Method and apparatus for recovering data and clock information from a self-clocking data stream
US4577782A (en) * 1983-05-02 1986-03-25 The Cornelius Company Beverage dispensing station
US4719056A (en) * 1984-06-25 1988-01-12 Isoworth Limited Fluid treatment
US4726494A (en) * 1986-02-10 1988-02-23 Isoworth Limited Beverage dipensing apparatus
US4800492A (en) * 1987-05-13 1989-01-24 The Coca-Cola Company Data logger for a post-mix beverage dispensing system
US4811214A (en) * 1986-11-14 1989-03-07 Princeton University Multinode reconfigurable pipeline computer
US4901887A (en) * 1988-08-08 1990-02-20 Burton John W Beverage dispensing system
US4982876A (en) * 1986-02-10 1991-01-08 Isoworth Limited Carbonation apparatus
US4993604A (en) * 1985-09-13 1991-02-19 The Coca-Cola Company Low-cost post-mix beverage dispenser and syrup supply system therefor
US5090015A (en) * 1989-02-06 1992-02-18 Motorola, Inc. Programmable array logic self-checking system
US5190189A (en) * 1990-10-30 1993-03-02 Imi Cornelius Inc. Low height beverage dispensing apparatus
US5190083A (en) * 1990-02-27 1993-03-02 The Coca-Cola Company Multiple fluid space dispenser and monitor
US5193151A (en) * 1989-08-30 1993-03-09 Digital Equipment Corporation Delay-based congestion avoidance in computer networks
US5193718A (en) * 1991-06-25 1993-03-16 Imi Cornelius Inc. Quick electronic disconnect for a beverage dispensing valve
US5280711A (en) * 1993-02-25 1994-01-25 Imi Cornelius Inc. Low cost beverage dispensing apparatus
US5297400A (en) * 1993-02-17 1994-03-29 Maytag Corporation Liquid dispensing assembly for a refrigerator
US5379343A (en) * 1993-02-26 1995-01-03 Motorola, Inc. Detection of unauthorized use of software applications in communication units
US5381546A (en) * 1987-04-13 1995-01-10 Gte Laboratories Incorporated Control process for allocating services in communications systems
US5381550A (en) * 1991-12-13 1995-01-10 Thinking Machines Corporation System and method for compiling a source code supporting data parallel variables
US5388212A (en) * 1993-02-26 1995-02-07 Motorola Inc. Detecting unauthorized modification of communication unit based on comparison between stored hardware identification code and hardware identification code generated from operational platform identification code
US5392960A (en) * 1992-11-13 1995-02-28 Wilshire Partners Postmix beverage dispenser and a method for making a beverage dispenser
US5490165A (en) * 1993-10-28 1996-02-06 Qualcomm Incorporated Demodulation element assignment in a system capable of receiving multiple signals
US5491823A (en) * 1994-01-25 1996-02-13 Silicon Graphics, Inc. Loop scheduler
US5594657A (en) * 1993-09-27 1997-01-14 Lucent Technologies Inc. System for synthesizing field programmable gate array implementations from high level circuit descriptions
US5600810A (en) * 1994-12-09 1997-02-04 Mitsubishi Electric Information Technology Center America, Inc. Scaleable very long instruction word processor with parallelism matching
US5600844A (en) * 1991-09-20 1997-02-04 Shaw; Venson M. Single chip integrated circuit system architecture for document installation set computing
US5602833A (en) * 1994-12-19 1997-02-11 Qualcomm Incorporated Method and apparatus for using Walsh shift keying in a spread spectrum communication system
US5603043A (en) * 1992-11-05 1997-02-11 Giga Operations Corporation System for compiling algorithmic language source code for implementation in programmable hardware
US5607083A (en) * 1992-05-22 1997-03-04 Imi Cornelius Inc. Beverage dispensing valve
US5608643A (en) * 1994-09-01 1997-03-04 General Programming Holdings, Inc. System for managing multiple dispensing units and method of operation
US5611867A (en) * 1995-04-12 1997-03-18 Maytag Corporation Method of selecting a wash cycle for an appliance
US5706191A (en) * 1995-01-19 1998-01-06 Gas Research Institute Appliance interface apparatus and automated residence management system
US5706976A (en) * 1995-12-21 1998-01-13 Purkey; Jay Floyd Vending machine inventory control device
US5712996A (en) * 1993-03-15 1998-01-27 Siemens Aktiengesellschaft Process for dividing instructions of a computer program into instruction groups for parallel processing
US5720002A (en) * 1993-06-14 1998-02-17 Motorola Inc. Neural network and method of using same
US5721693A (en) * 1995-01-07 1998-02-24 Lg Electronics Inc. Electric home appliance real use state information collection and analysis apparatus
US5721854A (en) * 1993-11-02 1998-02-24 International Business Machines Corporation Method and apparatus for dynamic conversion of computer instructions
US5734808A (en) * 1993-09-28 1998-03-31 Namco Ltd. Pipeline processing device, clipping processing device, three-dimensional simulator device and pipeline processing method
US5732563A (en) * 1993-09-22 1998-03-31 Imi Cornelius Inc. Electronically controlled beverage dispenser
US5860021A (en) * 1997-04-24 1999-01-12 Klingman; Edwin E. Single chip microcontroller having down-loadable memory organization supporting "shadow" personality, optimized for bi-directional data transfers over a communication channel
US5862961A (en) * 1993-10-26 1999-01-26 Imi Cornelius Inc. Connection device for dispensing fluid from a bottle
US5870427A (en) * 1993-04-14 1999-02-09 Qualcomm Incorporated Method for multi-mode handoff using preliminary time alignment of a mobile station operating in analog mode
US5873045A (en) * 1997-10-29 1999-02-16 International Business Machines Corporation Mobile client computer with radio frequency transceiver
US5881106A (en) * 1994-09-05 1999-03-09 Sgs-Thomson Microelectronics S.A. Signal processing circuit to implement a Viterbi algorithm
US5884284A (en) * 1995-03-09 1999-03-16 Continental Cablevision, Inc. Telecommunication user account management system and method
US5886537A (en) * 1997-05-05 1999-03-23 Macias; Nicholas J. Self-reconfigurable parallel processor made from regularly-connected self-dual code/data processing cells
US6016395A (en) * 1996-10-18 2000-01-18 Samsung Electronics Co., Ltd. Programming a vector processor and parallel programming of an asymmetric dual multiprocessor comprised of a vector processor and a risc processor
US6021492A (en) * 1996-10-09 2000-02-01 Hewlett-Packard Company Software metering management of remote computing devices
US6021186A (en) * 1995-04-17 2000-02-01 Ricoh Company Ltd. Automatic capture and processing of facsimile transmissions
US6023755A (en) * 1992-07-29 2000-02-08 Virtual Computer Corporation Computer with programmable arrays which are reconfigurable in response to instructions to be executed
US6023742A (en) * 1996-07-18 2000-02-08 University Of Washington Reconfigurable computing architecture for providing pipelined data paths
US6028610A (en) * 1995-08-04 2000-02-22 Sun Microsystems, Inc. Geometry instructions for decompression of three-dimensional graphics data
US6175854B1 (en) * 1996-06-11 2001-01-16 Ameritech Services, Inc. Computer system architecture and method for multi-user, real-time applications
US6175892B1 (en) * 1998-06-19 2001-01-16 Hitachi America. Ltd. Registers and methods for accessing registers for use in a single instruction multiple data system
US6181981B1 (en) * 1996-05-15 2001-01-30 Marconi Communications Limited Apparatus and method for improved vending machine inventory maintenance
US6185418B1 (en) * 1997-11-07 2001-02-06 Lucent Technologies Inc. Adaptive digital radio communication system
US6192040B1 (en) * 1999-04-16 2001-02-20 Motorola, Inc. Method and apparatus for producing channel estimate of a communication channel in a CDMA communication system
US6192388B1 (en) * 1996-06-20 2001-02-20 Avid Technology, Inc. Detecting available computers to participate in computationally complex distributed processing problem
US6192255B1 (en) * 1992-12-15 2001-02-20 Texas Instruments Incorporated Communication system and methods for enhanced information transfer
US6195788B1 (en) * 1997-10-17 2001-02-27 Altera Corporation Mapping heterogeneous logic elements in a programmable logic device
US20020010848A1 (en) * 2000-05-29 2002-01-24 Shoichi Kamano Data processing system
US20020013937A1 (en) * 1999-02-17 2002-01-31 Ostanevich Alexander Y. Register economy heuristic for a cycle driven multiple issue instruction scheduler
US20020013799A1 (en) * 2000-05-11 2002-01-31 Blaker David M. Accelerated montgomery multiplication using plural multipliers
US20020015435A1 (en) * 2000-07-31 2002-02-07 Keith Rieken Apparatus and method for configurable multi-dwell search engine for spread spectrum applications

Patent Citations (100)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3938639A (en) * 1973-11-28 1976-02-17 The Cornelius Company Portable dispenser for mixed beverages
US4076145A (en) * 1976-08-09 1978-02-28 The Cornelius Company Method and apparatus for dispensing a beverage
US4143793A (en) * 1977-06-13 1979-03-13 The Cornelius Company Apparatus and method for dispensing a carbonated beverage
US4377246A (en) * 1977-06-13 1983-03-22 The Cornelius Company Apparatus for dispensing a carbonated beverage
US4252253A (en) * 1978-02-21 1981-02-24 Mcneil Corporation Drink dispenser having central control of plural dispensing stations
US4181242A (en) * 1978-05-30 1980-01-01 The Cornelius Company Method and apparatus for dispensing a beverage
US4577782A (en) * 1983-05-02 1986-03-25 The Cornelius Company Beverage dispensing station
US4578799A (en) * 1983-10-05 1986-03-25 Codenoll Technology Corporation Method and apparatus for recovering data and clock information from a self-clocking data stream
US4719056A (en) * 1984-06-25 1988-01-12 Isoworth Limited Fluid treatment
US4993604A (en) * 1985-09-13 1991-02-19 The Coca-Cola Company Low-cost post-mix beverage dispenser and syrup supply system therefor
US4726494A (en) * 1986-02-10 1988-02-23 Isoworth Limited Beverage dispensing apparatus
US4982876A (en) * 1986-02-10 1991-01-08 Isoworth Limited Carbonation apparatus
US4811214A (en) * 1986-11-14 1989-03-07 Princeton University Multinode reconfigurable pipeline computer
US5381546A (en) * 1987-04-13 1995-01-10 Gte Laboratories Incorporated Control process for allocating services in communications systems
US4800492A (en) * 1987-05-13 1989-01-24 The Coca-Cola Company Data logger for a post-mix beverage dispensing system
US4901887A (en) * 1988-08-08 1990-02-20 Burton John W Beverage dispensing system
US5090015A (en) * 1989-02-06 1992-02-18 Motorola, Inc. Programmable array logic self-checking system
US5193151A (en) * 1989-08-30 1993-03-09 Digital Equipment Corporation Delay-based congestion avoidance in computer networks
US5190083A (en) * 1990-02-27 1993-03-02 The Coca-Cola Company Multiple fluid space dispenser and monitor
US5190189A (en) * 1990-10-30 1993-03-02 Imi Cornelius Inc. Low height beverage dispensing apparatus
US5193718A (en) * 1991-06-25 1993-03-16 Imi Cornelius Inc. Quick electronic disconnect for a beverage dispensing valve
US5600844A (en) * 1991-09-20 1997-02-04 Shaw; Venson M. Single chip integrated circuit system architecture for document installation set computing
US5381550A (en) * 1991-12-13 1995-01-10 Thinking Machines Corporation System and method for compiling a source code supporting data parallel variables
US5607083A (en) * 1992-05-22 1997-03-04 Imi Cornelius Inc. Beverage dispensing valve
US6023755A (en) * 1992-07-29 2000-02-08 Virtual Computer Corporation Computer with programmable arrays which are reconfigurable in response to instructions to be executed
US5603043A (en) * 1992-11-05 1997-02-11 Giga Operations Corporation System for compiling algorithmic language source code for implementation in programmable hardware
US5392960A (en) * 1992-11-13 1995-02-28 Wilshire Partners Postmix beverage dispenser and a method for making a beverage dispenser
US6192255B1 (en) * 1992-12-15 2001-02-20 Texas Instruments Incorporated Communication system and methods for enhanced information transfer
US5297400A (en) * 1993-02-17 1994-03-29 Maytag Corporation Liquid dispensing assembly for a refrigerator
US5280711A (en) * 1993-02-25 1994-01-25 Imi Cornelius Inc. Low cost beverage dispensing apparatus
US5379343A (en) * 1993-02-26 1995-01-03 Motorola, Inc. Detection of unauthorized use of software applications in communication units
US5388212A (en) * 1993-02-26 1995-02-07 Motorola Inc. Detecting unauthorized modification of communication unit based on comparison between stored hardware identification code and hardware identification code generated from operational platform identification code
US5712996A (en) * 1993-03-15 1998-01-27 Siemens Aktiengesellschaft Process for dividing instructions of a computer program into instruction groups for parallel processing
US5870427A (en) * 1993-04-14 1999-02-09 Qualcomm Incorporated Method for multi-mode handoff using preliminary time alignment of a mobile station operating in analog mode
US5720002A (en) * 1993-06-14 1998-02-17 Motorola Inc. Neural network and method of using same
US5732563A (en) * 1993-09-22 1998-03-31 Imi Cornelius Inc. Electronically controlled beverage dispenser
US5594657A (en) * 1993-09-27 1997-01-14 Lucent Technologies Inc. System for synthesizing field programmable gate array implementations from high level circuit descriptions
US5734808A (en) * 1993-09-28 1998-03-31 Namco Ltd. Pipeline processing device, clipping processing device, three-dimensional simulator device and pipeline processing method
US5862961A (en) * 1993-10-26 1999-01-26 Imi Cornelius Inc. Connection device for dispensing fluid from a bottle
US5490165A (en) * 1993-10-28 1996-02-06 Qualcomm Incorporated Demodulation element assignment in a system capable of receiving multiple signals
US5721854A (en) * 1993-11-02 1998-02-24 International Business Machines Corporation Method and apparatus for dynamic conversion of computer instructions
US5491823A (en) * 1994-01-25 1996-02-13 Silicon Graphics, Inc. Loop scheduler
US5608643A (en) * 1994-09-01 1997-03-04 General Programming Holdings, Inc. System for managing multiple dispensing units and method of operation
US5881106A (en) * 1994-09-05 1999-03-09 Sgs-Thomson Microelectronics S.A. Signal processing circuit to implement a Viterbi algorithm
US5600810A (en) * 1994-12-09 1997-02-04 Mitsubishi Electric Information Technology Center America, Inc. Scaleable very long instruction word processor with parallelism matching
US5602833A (en) * 1994-12-19 1997-02-11 Qualcomm Incorporated Method and apparatus for using Walsh shift keying in a spread spectrum communication system
US5721693A (en) * 1995-01-07 1998-02-24 Lg Electronics Inc. Electric home appliance real use state information collection and analysis apparatus
US5706191A (en) * 1995-01-19 1998-01-06 Gas Research Institute Appliance interface apparatus and automated residence management system
US5884284A (en) * 1995-03-09 1999-03-16 Continental Cablevision, Inc. Telecommunication user account management system and method
US5611867A (en) * 1995-04-12 1997-03-18 Maytag Corporation Method of selecting a wash cycle for an appliance
US6021186A (en) * 1995-04-17 2000-02-01 Ricoh Company Ltd. Automatic capture and processing of facsimile transmissions
US6028610A (en) * 1995-08-04 2000-02-22 Sun Microsystems, Inc. Geometry instructions for decompression of three-dimensional graphics data
US5706976A (en) * 1995-12-21 1998-01-13 Purkey; Jay Floyd Vending machine inventory control device
US6510510B1 (en) * 1996-01-25 2003-01-21 Analog Devices, Inc. Digital signal processor having distributed register file
US6346824B1 (en) * 1996-04-09 2002-02-12 Xilinx, Inc. Dedicated function fabric for use in field programmable gate arrays
US20020015439A1 (en) * 1996-04-25 2002-02-07 Sanjai Kohli GPS system for navigating a vehicle
US6181981B1 (en) * 1996-05-15 2001-01-30 Marconi Communications Limited Apparatus and method for improved vending machine inventory maintenance
US6175854B1 (en) * 1996-06-11 2001-01-16 Ameritech Services, Inc. Computer system architecture and method for multi-user, real-time applications
US6192388B1 (en) * 1996-06-20 2001-02-20 Avid Technology, Inc. Detecting available computers to participate in computationally complex distributed processing problem
US6023742A (en) * 1996-07-18 2000-02-08 University Of Washington Reconfigurable computing architecture for providing pipelined data paths
US6021492A (en) * 1996-10-09 2000-02-01 Hewlett-Packard Company Software metering management of remote computing devices
US6016395A (en) * 1996-10-18 2000-01-18 Samsung Electronics Co., Ltd. Programming a vector processor and parallel programming of an asymmetric dual multiprocessor comprised of a vector processor and a risc processor
US5860021A (en) * 1997-04-24 1999-01-12 Klingman; Edwin E. Single chip microcontroller having down-loadable memory organization supporting "shadow" personality, optimized for bi-directional data transfers over a communication channel
US5886537A (en) * 1997-05-05 1999-03-23 Macias; Nicholas J. Self-reconfigurable parallel processor made from regularly-connected self-dual code/data processing cells
US20030026242A1 (en) * 1997-06-18 2003-02-06 Harri Jokinen Method for identifying base stations of a time division cellular network in a mobile station and mobile station
US6195788B1 (en) * 1997-10-17 2001-02-27 Altera Corporation Mapping heterogeneous logic elements in a programmable logic device
US5873045A (en) * 1997-10-29 1999-02-16 International Business Machines Corporation Mobile client computer with radio frequency transceiver
US6185418B1 (en) * 1997-11-07 2001-02-06 Lucent Technologies Inc. Adaptive digital radio communication system
US6691148B1 (en) * 1998-03-13 2004-02-10 Verizon Corporate Services Group Inc. Framework for providing quality of service requirements in a distributed object-oriented computer system
US6175892B1 (en) * 1998-06-19 2001-01-16 Hitachi America, Ltd. Registers and methods for accessing registers for use in a single instruction multiple data system
US20020013937A1 (en) * 1999-02-17 2002-01-31 Ostanevich Alexander Y. Register economy heuristic for a cycle driven multiple issue instruction scheduler
US6510138B1 (en) * 1999-02-25 2003-01-21 Fairchild Semiconductor Corporation Network switch with head of line input buffer queue clearing
US6349394B1 (en) * 1999-03-31 2002-02-19 International Business Machines Corporation Performance monitoring in a NUMA computer
US6192040B1 (en) * 1999-04-16 2001-02-20 Motorola, Inc. Method and apparatus for producing channel estimate of a communication channel in a CDMA communication system
US6347346B1 (en) * 1999-06-30 2002-02-12 Chameleon Systems, Inc. Local memory unit system with global access for use on reconfigurable chips
US6507947B1 (en) * 1999-08-20 2003-01-14 Hewlett-Packard Company Programmatic synthesis of processor element arrays
US20020024993A1 (en) * 1999-12-30 2002-02-28 Ravi Subramanian Method and apparatus to support multi standard, multi service base-stations for wireless voice and data networks
US20020023210A1 (en) * 2000-04-12 2002-02-21 Mark Tuomenoksa Method and system for managing and configuring virtual private networks
US20020013799A1 (en) * 2000-05-11 2002-01-31 Blaker David M. Accelerated montgomery multiplication using plural multipliers
US20020010848A1 (en) * 2000-05-29 2002-01-24 Shoichi Kamano Data processing system
US6675265B2 (en) * 2000-06-10 2004-01-06 Hewlett-Packard Development Company, L.P. Multiprocessor cache coherence system and method in which processor nodes and input/output nodes are equal participants
US20020015435A1 (en) * 2000-07-31 2002-02-07 Keith Rieken Apparatus and method for configurable multi-dwell search engine for spread spectrum applications
US20040006584A1 (en) * 2000-08-08 2004-01-08 Ivo Vandeweerd Array of parallel programmable processing engines and deterministic method of operating the same
US20020024942A1 (en) * 2000-08-30 2002-02-28 Nec Corporation Cell search method and circuit in W-CDMA system
US20030012270A1 (en) * 2000-10-06 2003-01-16 Changming Zhou Receiver
US6985517B2 (en) * 2000-11-09 2006-01-10 Matsushita Electric Industrial Co., Ltd. Matched filter and correlation detection method
US20030030004A1 (en) * 2001-01-31 2003-02-13 General Electric Company Shared memory control between detector framing node and processor
US20030007606A1 (en) * 2001-02-01 2003-01-09 Estech Systems, Inc. Service observing in a voice over IP telephone system
US7325123B2 (en) * 2001-03-22 2008-01-29 Qst Holdings, Llc Hierarchical interconnect for configuring separate interconnects for each group of fixed and diverse computational elements
US6836839B2 (en) * 2001-03-22 2004-12-28 Quicksilver Technology, Inc. Adaptive integrated circuitry with heterogeneous and reconfigurable matrices of diverse and adaptive computational units having fixed, application specific computational elements
US20030018700A1 (en) * 2001-03-26 2003-01-23 Giroti Sudhir K. Unified XML voice and data media converging switch and application delivery system
US20030018446A1 (en) * 2001-06-29 2003-01-23 National Instruments Corporation Graphical program node for generating a measurement program
US20030023830A1 (en) * 2001-07-25 2003-01-30 Hogenauer Eugene B. Method and system for encoding instructions for a VLIW that reduces instruction memory requirements
US6986021B2 (en) * 2001-11-30 2006-01-10 Quick Silver Technology, Inc. Apparatus, method, system and executable module for configuration and operation of adaptive integrated circuitry having fixed, application specific computational elements
US20060031660A1 (en) * 2001-11-30 2006-02-09 Master Paul L Apparatus, method, system and executable module for configuration and operation of adaptive integrated circuitry having fixed, application specific computational elements
US20040015970A1 (en) * 2002-03-06 2004-01-22 Scheuermann W. James Method and system for data flow control of execution nodes of an adaptive computing engine (ACE)
US6988139B1 (en) * 2002-04-26 2006-01-17 Microsoft Corporation Distributed computing of a job corresponding to a plurality of predefined tasks
US20040025159A1 (en) * 2002-06-25 2004-02-05 Quicksilver Technology, Inc. Hardware task manager
US20040010645A1 (en) * 2002-06-25 2004-01-15 Quicksilver Technology, Inc. Uniform interface for a functional node in an adaptive computing engine
US7174432B2 (en) * 2003-08-19 2007-02-06 Nvidia Corporation Asynchronous, independent and multiple process shared memory system in an adaptive computing architecture

Cited By (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070226433A1 (en) * 2002-11-22 2007-09-27 Quicksilver Technology, Inc. External memory controller node
US20080244197A1 (en) * 2002-11-22 2008-10-02 Qst Holdings, Llc External memory controller node
US7451280B2 (en) * 2002-11-22 2008-11-11 Qst Holdings, Llc External memory controller node
US7743220B2 (en) * 2002-11-22 2010-06-22 Qst Holdings, Llc External memory controller node
US20070130622A1 (en) * 2005-11-21 2007-06-07 Docomo Communications Laboratories Usa, Inc. Method and apparatus for verifying and ensuring safe handling of notifications
US20100036997A1 (en) * 2007-08-20 2010-02-11 Convey Computer Multiple data channel memory module architecture
WO2009026196A1 (en) * 2007-08-20 2009-02-26 Convey Computer Multi-processor system having at least one processor that comprises a dynamically reconfigurable instruction set
US9824010B2 (en) 2007-08-20 2017-11-21 Micron Technology, Inc. Multiple data channel memory module architecture
US20090055596A1 (en) * 2007-08-20 2009-02-26 Convey Computer Multi-processor system having at least one processor that comprises a dynamically reconfigurable instruction set
US9449659B2 (en) 2007-08-20 2016-09-20 Micron Technology, Inc. Multiple data channel memory module architecture
US9015399B2 (en) 2007-08-20 2015-04-21 Convey Computer Multiple data channel memory module architecture
US8156307B2 (en) 2007-08-20 2012-04-10 Convey Computer Multi-processor system having at least one processor that comprises a dynamically reconfigurable instruction set
US20090064095A1 (en) * 2007-08-29 2009-03-05 Convey Computer Compiler for generating an executable comprising instructions for a plurality of different instruction sets
US8561037B2 (en) 2007-08-29 2013-10-15 Convey Computer Compiler for generating an executable comprising instructions for a plurality of different instruction sets
US20090070553A1 (en) * 2007-09-12 2009-03-12 Convey Computer Dispatch mechanism for dispatching instructions from a host processor to a co-processor
US8122229B2 (en) 2007-09-12 2012-02-21 Convey Computer Dispatch mechanism for dispatching instructions from a host processor to a co-processor
US11106592B2 (en) 2008-01-04 2021-08-31 Micron Technology, Inc. Microprocessor architecture having alternative memory access paths
US9710384B2 (en) 2008-01-04 2017-07-18 Micron Technology, Inc. Microprocessor architecture having alternative memory access paths
US8443147B2 (en) 2008-08-05 2013-05-14 Convey Computer Memory interleave for heterogeneous computing
US10949347B2 (en) 2008-08-05 2021-03-16 Micron Technology, Inc. Multiple data channel memory module architecture
US10061699B2 (en) 2008-08-05 2018-08-28 Micron Technology, Inc. Multiple data channel memory module architecture
US8095735B2 (en) 2008-08-05 2012-01-10 Convey Computer Memory interleave for heterogeneous computing
US20100037024A1 (en) * 2008-08-05 2010-02-11 Convey Computer Memory interleave for heterogeneous computing
US11550719B2 (en) 2008-08-05 2023-01-10 Micron Technology, Inc. Multiple data channel memory module architecture
US20100115233A1 (en) * 2008-10-31 2010-05-06 Convey Computer Dynamically-selectable vector register partitioning
US8205066B2 (en) 2008-10-31 2012-06-19 Convey Computer Dynamically configured coprocessor for different extended instruction set personality specific to application program with shared memory storing instructions invisibly dispatched from host processor
US20100318516A1 (en) * 2009-06-10 2010-12-16 Google Inc. Productive distribution for result optimization within a hierarchical architecture
CN102597979A (en) * 2009-06-10 2012-07-18 谷歌公司 Productive distribution for result optimization within a hierarchical architecture
WO2010144692A1 (en) * 2009-06-10 2010-12-16 Google Inc. Productive distribution for result optimization within a hierarchical architecture
US8423745B1 (en) 2009-11-16 2013-04-16 Convey Computer Systems and methods for mapping a neighborhood of data to general registers of a processing element
EP2526494A4 (en) * 2010-01-21 2017-02-01 SVIRAL, Inc. A method and apparatus for a general-purpose, multiple-core system for implementing stream-based computations
WO2011091323A1 (en) * 2010-01-21 2011-07-28 Qst Holdings, Llc A method and apparatus for a general-purpose, multiple-core system for implementing stream-based computations
US11055103B2 (en) 2010-01-21 2021-07-06 Cornami, Inc. Method and apparatus for a multi-core system for implementing stream-based computations having inputs from multiple streams
KR101814221B1 (en) 2010-01-21 2018-01-02 스비랄 인크 A method and apparatus for a general-purpose, multiple-core system for implementing stream-based computations
US20130086553A1 (en) * 2011-09-29 2013-04-04 Mark Grechanik Systems and methods for finding project-related information by clustering applications into related concept categories
US9804838B2 (en) 2011-09-29 2017-10-31 Accenture Global Services Limited Systems and methods for finding project-related information by clustering applications into related concept categories
US8832655B2 (en) * 2011-09-29 2014-09-09 Accenture Global Services Limited Systems and methods for finding project-related information by clustering applications into related concept categories
US9256422B2 (en) 2011-09-29 2016-02-09 Accenture Global Services Limited Systems and methods for finding project-related information by clustering applications into related concept categories
US10430190B2 (en) 2012-06-07 2019-10-01 Micron Technology, Inc. Systems and methods for selectively controlling multithreaded execution of executable code segments
US8959495B2 (en) 2012-09-14 2015-02-17 Oracle International Corporation Unifying static and dynamic compiler optimizations in source-code bases
US9417857B2 (en) 2012-09-14 2016-08-16 Oracle International Corporation Unifying static and dynamic compiler optimizations in source-code bases
US20140157247A1 (en) * 2012-11-30 2014-06-05 Oracle International Corporation Enabling Symbol Resolution of Private Symbols in Legacy Programs and Optimizing Access to the Private Symbols
US8881123B2 (en) * 2012-11-30 2014-11-04 Oracle International Corporation Enabling symbol resolution of private symbols in legacy programs and optimizing access to the private symbols
US10089426B2 (en) * 2013-12-12 2018-10-02 Tokyo Institute Of Technology Logic circuit generation device and method
US20160299998A1 (en) * 2013-12-12 2016-10-13 Tokyo Institute Of Technology Logic circuit generation device and method
CN106059528A (en) * 2016-06-12 2016-10-26 西安电子工程研究所 Length-variable single-rate FIR digital filter design method
CN110389991A (en) * 2018-04-12 2019-10-29 腾讯大地通途(北京)科技有限公司 The processing method that reports an error, device and the storage medium in map section

Also Published As

Publication number Publication date
US7200837B2 (en) 2007-04-03
US20050044344A1 (en) 2005-02-24

Similar Documents

Publication Publication Date Title
US7200837B2 (en) System, method and software for static and dynamic programming and configuration of an adaptive computing architecture
US7174432B2 (en) Asynchronous, independent and multiple process shared memory system in an adaptive computing architecture
US8010593B2 (en) Adaptive integrated circuitry with heterogeneous and reconfigurable matrices of diverse and adaptive computational units having fixed, application specific computational elements
US7353516B2 (en) Data flow control for adaptive integrated circuitry
US8296764B2 (en) Internal synchronization control for adaptive integrated circuitry
US11055103B2 (en) Method and apparatus for a multi-core system for implementing stream-based computations having inputs from multiple streams
US6732354B2 (en) Method, system and software for programming reconfigurable hardware
US7249242B2 (en) Input pipeline registers for a node in an adaptive computing engine
WO2006115635A2 (en) Automatic configuration of streaming processor architectures
US8949576B2 (en) Arithmetic node including general digital signal processing functions for an adaptive computing machine
CN101630274A (en) Method for dividing cycle task by means of software and hardware and device thereof
US11782760B2 (en) Time-multiplexed use of reconfigurable hardware
Manev Resource Elastic Dynamic Stream Processing on FPGAs Exemplified on Database Acceleration
Zhang et al. Design of coarse-grained dynamically reconfigurable architecture for DSP applications
Rettkowski et al. Application-specific processing using high-level synthesis for networks-on-chip
Theocharides et al. Hardware-enabled dynamic resource allocation for manycore systems using bidding-based system feedback
Ginosar The plural many-core architecture-high performance at low power
Wijtvliet et al. Concept of the Blocks Architecture
Gordon A stream-aware compiler for communication exposed-architectures
Lyons et al. Shrink-fit: A framework for flexible accelerator sizing
Keung et al. A placer for composable FPGA with 2D mesh network
Prashank et al. Enhancements for variable N-point streaming FFT/IFFT on REDEFINE, a runtime reconfigurable architecture
Szajek et al. Implementation of an adaptive reconfigurable group organized (ARGO) parallel architecture
CN116670644A (en) Interleaving processing method on general purpose computing core
CN117573607A (en) Reconfigurable coprocessor, chip, multi-core signal processing system and computing method

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION