What Is A Neural Network?

Work on artificial neural networks, commonly referred to as "neural networks," has been motivated right from its inception by the recognition that the human brain computes in an entirely different way from the conventional digital computer. The brain is a highly complex, nonlinear, and parallel computer (information-processing system). It has the capability to organize its structural constituents, known as neurons, so as to perform certain computations (e.g., pattern recognition, perception, and motor control) many times faster than the fastest digital computer in existence today. Consider, for example, human vision, which is an information-processing task (Marr, 1982; Levine, 1985; Churchland and Sejnowski, 1992). It is the function of the visual system to provide a representation of the environment around us and, more important, to supply the information we need to interact with the environment. To be specific, the brain routinely accomplishes perceptual recognition tasks (e.g., recognizing a familiar face embedded in an unfamiliar scene) in approximately 100-200 ms, whereas tasks of much lesser complexity may take days on a conventional computer.

For another example, consider the sonar of a bat. Sonar is an active echo-location system. In addition to providing information about how far away a target (e.g., a flying insect) is, bat sonar conveys information about the relative velocity of the target, the size of the target, the size of various features of the target, and the azimuth and elevation of the target (Suga, 1990a, b). The complex neural computations needed to extract all this information from the target echo occur within a brain the size of a plum. Indeed, an echo-locating bat can pursue and capture its target with a facility and success rate that would be the envy of a radar or sonar engineer.

How, then, does a human brain or the brain of a bat do it? At birth, a brain has great structure and the ability to build up its own rules through what we usually refer to as "experience." Indeed, experience is built up over time, with the most dramatic development (i.e., hard-wiring) of the human brain taking place during the first two years from birth; but the development continues well beyond that stage.

A "developing" neuron is synonymous with a plastic brain: Plasticity permits the developing nervous system to adapt to its surrounding environment. Just as plasticity appears to be essential to the functioning of neurons as information-processing units in the human brain, so it is with neural networks made up of artificial neurons. In its most general form, a neural network is a machine that is designed to model the way in which the brain performs a particular task or function of interest; the network is usually implemented by using electronic components or is simulated in software on a digital computer. Our interest in this book is confined largely to an important class of neural networks that perform useful computations through a process of learning. To achieve good performance, neural networks employ a massive interconnection of simple computing cells referred to as "neurons" or "processing units." We may thus offer the following definition of a neural network viewed as an adaptive machine:

A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:

  1. Knowledge is acquired by the network from its environment through a learning process.
  2. Interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.

The procedure used to perform the learning process is called a learning algorithm, the function of which is to modify the synaptic weights of the network in an orderly fashion to attain a desired design objective.

The modification of synaptic weights provides the traditional method for the design of neural networks. Such an approach is the closest to linear adaptive filter theory, which is already well established and successfully applied in many diverse fields (Widrow and Stearns, 1985; Haykin, 1996). However, it is also possible for a neural network to modify its own topology, which is motivated by the fact that neurons in the human brain can die and that new synaptic connections can grow.
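
To make the idea of weight modification concrete, here is a minimal sketch, in Python, of an error-correction update in the spirit of the LMS algorithm from linear adaptive filter theory; the function name, the learning rate eta, and the variable names are illustrative assumptions, not notation from the text:

    import numpy as np

    def lms_update(w, x, d, eta=0.01):
        """One error-correction step: nudge the synaptic weights w so the
        linear neuron's output y = w.x moves toward the desired response d."""
        y = np.dot(w, x)        # actual response of the linear neuron
        e = d - y               # error signal
        return w + eta * e * x  # LMS-style adjustment of the weights

Repeated over a stream of input/desired-response pairs, such updates modify the weights in the orderly fashion the definition above calls for.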

Neural networks are also referred to in the literature as neurocomputers, connectionist networks, parallel distributed processors, etc. Throughout the book we use the term "neural networks"; occasionally the term "neurocomputer" or "connectionist network" is used.

Benefits of Neural Networks

It is apparent that a neural network derives its computing power through, first, its massively parallel distributed structure and, second, its ability to learn and therefore generalize. Generalization refers to the neural network producing reasonable outputs for inputs not encountered during training (learning). These two information-processing capabilities make it possible for neural networks to solve complex (large-scale) problems that are currently intractable. In practice, however, neural networks cannot provide the solution by working individually. Rather, they need to be integrated into a consistent system engineering approach. Specifically, a complex problem of interest is decomposed into a number of relatively simple tasks, and neural networks are assigned a subset of the tasks that match their inherent capabilities. It is important to recognize, however, that we have a long way to go (if ever) before we can build a computer architecture that mimics a human brain.

The use of neural networks offers the following useful properties and capabilities:

1. Nonlinearity

An artificial neuron can be linear or nonlinear. A neural network, made up of an interconnection of nonlinear neurons, is itself nonlinear. Moreover, the nonlinearity is of a special kind in the sense that it is distributed throughout the network. Nonlinearity is a highly important property, particularly if the underlying physical mechanism responsible for generation of the input signal (e.g., speech signal) is inherently nonlinear.
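
As a small illustration of distributed nonlinearity, consider a neuron that passes its weighted input sum through a saturating activation function; a network built from such units is nonlinear as a whole. The sketch below is a toy example with made-up weights, and tanh is one common choice of nonlinearity, not one mandated by the text:

    import numpy as np

    def nonlinear_neuron(w, x, b=0.0):
        # Weighted sum of the inputs passed through a nonlinear (tanh) activation.
        return np.tanh(np.dot(w, x) + b)

    # Composing such neurons distributes the nonlinearity through the network:
    x = np.array([0.5, -1.2])
    h1 = nonlinear_neuron(np.array([1.0, 0.3]), x)
    h2 = nonlinear_neuron(np.array([-0.7, 2.0]), x)
    y = nonlinear_neuron(np.array([0.9, -1.1]), np.array([h1, h2]))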

2. Input-Output Mapping

A popular paradigm of learning called learning with a teacher or supervised learning involves modification of the synaptic weights of a neural network by applying a set of labeled training samples or task examples. Each example consists of a unique input signal and a corresponding desired response. The network is presented with an example picked at random from the set, and the synaptic weights (free parameters) of the network are modified to minimize the difference between the desired response and the actual response of the network produced by the input signal in accordance with an appropriate statistical criterion. The training of the network is repeated for many examples in the set until the network reaches a steady state where there are no further significant changes in the synaptic weights. The previously applied training examples may be reapplied during the training session, but in a different order. Thus the network learns from the examples by constructing an input-output mapping for the problem at hand. Such an approach brings to mind the study of nonparametric statistical inference, which is a branch of statistics dealing with model-free estimation or, from a biological viewpoint, tabula rasa learning (Geman et al., 1992); the term "nonparametric" is used here to signify the fact that no prior assumptions are made on a statistical model for the input data. Consider, for example, a pattern classification task, where the requirement is to assign an input signal representing a physical object or event to one of several prespecified categories (classes). In a nonparametric approach to this problem, the requirement is to "estimate" arbitrary decision boundaries in the input signal space for the pattern-classification task using a set of examples, and to do so without invoking a probabilistic distribution model. A similar point of view is implicit in the supervised learning paradigm, which suggests a close analogy between the input-output mapping performed by a neural network and nonparametric statistical inference.
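
The procedure just described can be summarized in a short sketch, assuming for simplicity a linear network trained with an error-correction rule; the names (train_supervised, eta, tol) and the particular stopping test are illustrative choices, not prescriptions from the text:

    import numpy as np

    def train_supervised(examples, n_inputs, eta=0.05, tol=1e-4, max_epochs=1000):
        """Learning with a teacher: present labeled examples in random order
        and adjust the weights until they show no further significant change."""
        rng = np.random.default_rng(0)
        w = np.zeros(n_inputs)
        for _ in range(max_epochs):
            w_before = w.copy()
            for i in rng.permutation(len(examples)):   # reapply examples in a different order
                x, d = examples[i]                     # input signal, desired response
                y = np.dot(w, x)                       # actual response
                w += eta * (d - y) * x                 # reduce desired-minus-actual error
            if np.linalg.norm(w - w_before) < tol:     # steady state: weights have settled
                break
        return w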

3. Adaptivity

Neural networks have a built-in capability to adapt their synaptic weights to changes in the surrounding environment. In particular, a neural network trained to operate in a specific environment can be easily retrained to deal with minor changes in the operating environmental conditions. Moreover, when it is operating in a nonstationary environment (i.e., one where statistics change with time), a neural network can be designed to change its synaptic weights in real time. The natural architecture of a neural network for pattern classification, signal processing, and control applications, coupled with the adaptive capability of the network, makes it a useful tool in adaptive pattern classification, adaptive signal processing, and adaptive control. As a general rule, it may be said that the more adaptive we make a system, all the time ensuring that the system remains stable, the more robust its performance will likely be when the system is required to operate in a nonstationary environment. It should be emphasized, however, that adaptivity does not always lead to robustness; indeed, it may do the very opposite. For example, an adaptive system with short time constants may change rapidly and therefore tend to respond to spurious disturbances, causing a drastic degradation in system performance. To realize the full benefits of adaptivity, the principal time constants of the system should be long enough for the system to ignore spurious disturbances and yet short enough to respond to meaningful changes in the environment; the problem described here is referred to as the stability-plasticity dilemma (Grossberg, 1988b).
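
The trade-off between short and long time constants can be seen in a toy sketch: a single adaptive state tracks a signal whose mean shifts, with the learning rate eta acting as an inverse time constant. The setup (the drifting Gaussian signal, the single spurious outlier, and the two eta values) is an illustrative assumption:

    import numpy as np

    def adapt(signal, eta):
        """Track a nonstationary signal; eta plays the role of an inverse
        time constant (large eta = short time constant = highly plastic)."""
        est, trace = 0.0, []
        for s in signal:
            est += eta * (s - est)   # exponential, error-correction update
            trace.append(est)
        return np.array(trace)

    rng = np.random.default_rng(1)
    signal = np.concatenate([rng.normal(0.0, 0.1, 100),   # environment A
                             rng.normal(1.0, 0.1, 100)])  # environment B (mean shift)
    signal[50] = 10.0                                     # spurious disturbance
    fast = adapt(signal, 0.5)    # responds quickly to the shift, but also to the outlier
    slow = adapt(signal, 0.02)   # ignores the outlier, but lags the meaningful change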

4. Evidential Response

In the context of pattern classification, a neural network can be designed to provide information not only about which particular pattern to select, but also about the confidence in the decision made. This latter information may be used to reject ambiguous patterns, should they arise, and thereby improve the classification performance of the network.
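
One common (though not the only) way to realize such an evidential response is to read the network's outputs as class probabilities and withhold a decision when the winning probability is too low; the softmax conversion and the threshold value below are illustrative assumptions:

    import numpy as np

    def classify_with_rejection(scores, threshold=0.9):
        """Return (class, confidence), or (None, confidence) when the
        pattern is too ambiguous to classify."""
        z = np.exp(scores - scores.max())   # numerically stable softmax
        p = z / z.sum()
        k = int(np.argmax(p))
        return (k, p[k]) if p[k] >= threshold else (None, p[k])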

5. Contextual Information

Knowledge is represented by the very structure and activation state of a neural network. Every neuron in the network is potentially affected by the global activity of all other neurons in the network. Consequently, contextual information is dealt with naturally by a neural network.

6. Fault Tolerance

A neural network, implemented in hardware form, has the potential to be inherently fault tolerant, or capable of robust computation, in the sense that its performance degrades gracefully under adverse operating conditions. For example, if a neuron or its connecting links are damaged, recall of a stored pattern is impaired in quality. However, due to the distributed nature of information stored in the network, the damage has to be extensive before the overall response of the network is degraded seriously. Thus, in principle, a neural network exhibits a graceful degradation in performance rather than catastrophic failure. There is some empirical evidence for robust computation, but usually it is uncontrolled. In order to be assured that the neural network is in fact fault tolerant, it may be necessary to take corrective measures in designing the algorithm used to train the network (Kerlirzin and Vallet, 1993).
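
Graceful degradation can be probed with a simple numerical experiment: delete a growing fraction of a layer's synaptic weights and watch how far the output deviates. The single tanh layer, the random weights, and the damage fractions below are illustrative assumptions, not a controlled study:

    import numpy as np

    rng = np.random.default_rng(2)

    def damaged_output(W, x, fraction):
        # Zero out a random fraction of the synaptic weights (simulated damage).
        mask = rng.random(W.shape) >= fraction
        return np.tanh((W * mask) @ x)

    W = rng.normal(size=(4, 8))     # one layer; information is spread across all weights
    x = rng.normal(size=8)
    clean = np.tanh(W @ x)
    for f in (0.0, 0.1, 0.3, 0.6):
        deviation = np.linalg.norm(damaged_output(W, x, f) - clean)
        print(f"fraction damaged {f:.1f}: output deviation {deviation:.3f}")

On average, the deviation grows smoothly with the damaged fraction rather than jumping to failure at the first lost connection.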

7. VLSI Implementability

The massively parallel nature of a neural network makes it potentially fast for the computation of certain tasks. This same feature makes a neural network well suited for implementation using very-large-scale-integrated (VLSI) technology. One particular beneficial virtue of VLSI is that it provides a means of capturing truly complex behavior in a highly hierarchical fashion (Mead, 1989).

8. Uniformity of Analysis and Design 

Basically, neural networks enjoy universality as information processors. We say this in the sense that the same notation is used in all domains involving the application of neural networks. This feature manifests itself in different ways:

  • Neurons, in one form or another, represent an ingredient common to all neural networks.
  • This commonality makes it possible to share theories and learning algorithms in different applications of neural networks.
  • Modular networks can be built through a seamless integration of modules.

9. Neurobiological Analogy

The design of a neural network is motivated by analogy with the brain, which is living proof that fault-tolerant parallel processing is not only physically possible but also fast and powerful. Neurobiologists look to (artificial) neural networks as a research tool for the interpretation of neurobiological phenomena. On the other hand, engineers look to neurobiology for new ideas to solve problems more complex than those based on conventional hard-wired design techniques. These two viewpoints are illustrated by the following two respective examples:

  • In Anastasio (1993), linear system models of the vestibulo-ocular reflex are compared to neural network models based on recurrent networks that are described in Section 1.6 and discussed in detail in Chapter 15. The vestibulo-ocular reflex (VOR) is part of the oculomotor system. The function of the VOR is to maintain visual (i.e., retinal) image stability by making eye rotations that are opposite to head rotations. The VOR is mediated by premotor neurons in the vestibular nuclei that receive and process head rotation signals from vestibular sensory neurons and send the results to the eye muscle motor neurons. The VOR is well suited for modeling because its input (head rotation) and its output (eye rotation) can be precisely specified. It is also a relatively simple reflex, and the neurophysiological properties of its constituent neurons have been well described. Among the three neural types, the premotor neurons (reflex interneurons) in the vestibular nuclei are the most complex and therefore the most interesting. The VOR has previously been modeled using lumped, linear system descriptors and control theory. These models were useful in explaining some of the overall properties of the VOR, but gave little insight into the properties of its constituent neurons. This situation has been greatly improved through neural network modeling. Recurrent network models of the VOR (programmed using an algorithm called real-time recurrent learning that is described in Chapter 15) can reproduce and help explain many of the static, dynamic, nonlinear, and distributed aspects of signal processing by the neurons that mediate the VOR, especially the vestibular nuclei neurons (Anastasio, 1993).
  • The retina, more than any other part of the brain, is where we begin to put together the relationships between the outside world represented by a visual sense, its physical image projected onto an array of receptors, and the first neural images. The retina is a thin sheet of neural tissue that lines the posterior hemisphere of the eyeball. The retina's task is to convert an optical image into a neural image for transmission down the optic nerve to a multitude of centers for further analysis. This is a complex task, as evidenced by the synaptic organization of the retina. In all vertebrate retinas the transformation from optical to neural image involves three stages (Sterling, 1990):

     1. Phototransduction by a layer of receptor neurons.
    2. Transmission of the resulting signals (produced in response to light) by chemical synapses to a layer of bipolar cells.
     3. Transmission of these signals, also by chemical synapses, to output neurons that are called ganglion cells.

At both synaptic stages (i.e., from receptor to bipolar cells, and from bipolar to ganglion cells), there are specialized laterally connected neurons called horizontal cells and amacrine cells, respectively. The task of these neurons is to modify the transmission across the synaptic layers. There are also centrifugal elements called interplexiform cells; their task is to convey signals from the inner synaptic layer back to the outer one. A few researchers have built electronic chips that mimic the structure of the retina (Mahowald and Mead, 1989; Boahen and Andreou, 1992; Boahen, 1996). These electronic chips are called neuromorphic integrated circuits, a term coined by Mead (1989). A neuromorphic imaging sensor consists of an array of photoreceptors combined with analog circuitry at each picture element (pixel). It emulates the retina in that it can adapt locally to changes in brightness, detect edges, and detect motion. The neurobiological analogy, exemplified by neuromorphic integrated circuits, is useful in another important way: It provides a hope and belief, and to a certain extent an existence proof, that physical understanding of neurobiological structures could have a productive influence on the art of electronics and VLSI technology.

With inspiration from neurobiology in mind, it seems appropriate that we take a brief look at the human brain and its structural levels of organization.