Subject Index

Fundamentals of Artificial Neural Networks
Mohamad H. Hassoun
MIT Press


α-LMS rule, 67
μ-LMS rule, 67-69, 72
φ-general position, 20-22, 25
φ-mapping, 20
φ-separable, 20
φ-space, 20-21, 25
φ-surface, 20, 24-25

A
analog-to-digital (A/D) converter, 398, 413-414
activation function, 219
activation slope, 220, 332
activity pattern, 433
ADALINE, 66, 85
adaptive linear combiner element, 66
adaptive resonance theory (ART) networks, 323-328
adjoint net, 270
admissible pattern, 7
AHK see Ho-Kashyap learning rules
AI, 51
algorithmic complexity, 51
ALVINN, 244-246
AND, 3
ambiguity, 24, 182
ambiguous response, 26
analog optical implementations, 53
analog VLSI technology, 53
annealing
see also simulated annealing
anti-Hebbian learning, 100
approximation
architecture
ART, 323-328
association
associative memory, 271, 345
associative reward-penalty, 89
asymptotic stability, 268, 358
asymptotically stable points, 148
attractor state, 265, 270, 354, 366, 375
autoassociative net, 248
autoassociative clustering net, 328
autocorrelation matrix, 71, 91, 97-98, 150
automatic scaling, 325
average entropic error, 186
average generalization ability, 183
average learning equation, 147, 176
average prediction error, 183

B

backprop, 199-202, 234, 271, 455


backprop net, 318
backpropagation, 199-202
basin of attraction, 328
basis function, 287
batch mode/update, 63, 81, 90, 167, 172, 290
Bayes decision theory, 112
Bayes classifier, 310
Bayes' rule, 434
bell-shaped function, 49
Bernstein polynomials, 49
bias, 58, 197, 308, 354, 376
bidirectional associative memory (BAM), 393
binary representation, 398
binomial
bit-wise complementation, 441
border aberration effects, 116
Boltzmann machine, 431-432
Boltzmann constant, 422, 426
Boltzmann-Gibbs distribution, 426, 431-432
Boltzmann learning, 431
Boolean functions, 3-4, 35, 42, 188
bottleneck, 250
brain-state-in-a-box (BSB) model, 331, 375-381
building block hypothesis, 446
building blocks, 446
Butz's rule, 63

C

calculus of variations, 272
calibration, 107

capacity, 17, 29, 380
cascade-correlation net (CCN), 318-322
CCN, 318-322
center of mass, 105
central limit theorem, 364, 436
cerebellar model articulation controller (CMAC), 301-304
cerebellum, 301
chaos hypothesis, 416
Chebyshev polynomial, 49
classifier, 50, 306, 311
classifier system, 461
classifiers, 461
cluster membership matrix, 167
cluster granularity, 331, 334
clusters, 107, 125, 168, 288
clustering, 106, 171, 328
CMAC, 301-304, 306
codebook, 110
combinatorial complexity, 395
combinatorial optimization, 120, 429
compact representation, 311
competitive learning, 103, 167, 290
complexity, 51
computational complexity, 269
computational energy, 52, 357
concept forming cognitive model, 328
conditional probability density, 85
conjugate gradient method, 217-218
connections
constraint satisfaction term, 396
controller, 264
convergence in the mean, 69
convergence phase, 116
convergence-inducing process, 424
convex, 70
cooling schedule, 427-428
correlation learning rule, 76, 148
correlation matrix, 91, 102, 346
correlation memory, 346
cost function, 143, 395, 397
see also criterion function
cost term, 396
Coulomb potential, 312
covariance, 320
critic, 88
criterion function, 58, 63, 68, 127-133, 143, 145, 155, 195, 230-234
criterion functional, 271
critical features, 306-308
critical overlap, 384, 386
cross-correlations, 91
cross-correlation vector, 71
crossing site, 441
crossover
see genetic operators
cross-validation, 187, 226-230, 290
curve fitting, 221
Cybenko's theorem, 47
training cycle, 59
training pass, 59

D

DAM, 353-374

data compression, 109
dead-zone, 62
DEC-talk, 236
decision
deep net, 318
degrees of freedom, 21, 27, 29, 332, 425
delay lines, tapped, 254
delta learning rule, 88, 199, 289, 291, 455, 459
density-preserving feature, 173
desired associations, 346
desired response, 268
deterministic annealing, 369
deterministic unit, 87
device physics, 52
diffusion process, 421
dichotomy, 15, 25, 43
dimensionality
direct Ho-Kashyap (DHK) algorithm, 80
discrete-time states, 274
distributed representation, 250, 308-309, 323, 333
distribution
don't care states, 12
dynamic associative memory
see DAM
dynamic mutation rate, 443
dynamic slope, 332

E

eigenvalue, 92, 361, 376
eigenvector extraction, 331
elastic net, 120
electromyogram (EMG) signal, 334
EMG, 334-337
encoder, 247
energy function, 357, 362, 376, 393, 420, 426, 432
entropic loss function, 186
entropy, 433
environment
ergodic, 147
error-backpropagation network
see backpropagation
error function, 70, 143, 199, 268
see also criterion function
error function, erf(x), 364
error rate, 348
error suppressor function, 232
estimated target, 201
Euclidean norm, 288
exclusive NOR
see XNOR
exclusive OR, 5
see XOR
expected value, 69
exploration process, 424
exponential decay laws, 173
extrapolation, 295
extreme inequalities, 189
extreme points, 27-29
F

false-positive classification error, 294-295
feature map(s), 171, 241
Feigenbaum time-series, 278
Fermat's stationarity principle, 418
filter
filtering
finite difference approximation, 272
fitness, 440
fitness function, 439, 457
fixed point method, 360
fixed points
flat spot, 77, 86, 211, 219, 220, 231
forgetting
free parameters, 58, 288, 295, 311, 354
function approximation, 46, 294, 299, 319
function counting theorem, 17
function decomposition, 296
fundamental theorem of genetic algorithms, 443

G

GA see genetic algorithm
GA-assisted supervised learning, 454
GA-based learning methods, 452
GA-deceptive problems, 446
gain, 357
Gamba perceptron, 308
Gaussian-bar unit
see units
Gaussian distribution
see distribution
Gaussian unit
see unit
general learning equation, 145
general position, 16-17, 41
generalization, 24, 145, 180, 186, 221, 226, 243, 295
generalized inverse, 70, 291
see also pseudo-inverse
genetic algorithm (GA), 439-447
genetic operators, 440, 442
global
Glove-Talk, 236-240
gradient descent search, 64, 200, 202, 288, 300, 418-419
gradient-descent/ascent strategy, 420
gradient net, 394-399
gradient system, 148, 358, 376, 396-397
Gram-Schmidt orthogonalization, 217
Greville's theorem, 351
GSH model, 307, 310
guidance process, 424

H

Hamming distance, 371
handwritten digits, 240
hard competition, 296-297
hardware annealing, 396, 438
Hassoun's rule, 161
Hebb rule, 91
Hermitian matrix, 91
Hessian matrix, 64, 70, 98, 148-149, 212, 215, 418
hexagonal array, 124
hidden layer, 197-198, 286
hidden targets, 455
hidden-target space, 455-456, 458
hidden units, 198, 286, 431
higher-order statistics, 101, 333
higher-order unit, 101, 103
Ho-Kashyap algorithm, 79, 317
Ho-Kashyap learning rules, 78-82
Hopfield model/net, 354, 396-397, 429
hybrid GA/gradient search, 453-454
hybrid learning algorithms, 218, 453
hyperbolic tangent activation function
see activation function
hyperboloid, 316
hypercube, 302, 357, 376, 396
hyperellipsoid, 316
hyperplane, 5, 16-17, 25, 59, 311
hyperspheres, 315
hyperspherical classifiers, 311
hypersurfaces, 311
hysteresis, 386-387
I

ill-conditioned, 292
image compression, 247-252
implicit parallelism, 447
incremental gradient descent, 64, 202
incremental update, 66, 88, 202
input sequence, 261
input stimuli, 174
instantaneous error function, 268
instantaneous SSE criterion function, 77, 199
interconnection
interpolation, 223, 290
Ising model, 429
isolated-word recognition, 125

J

Jacobian matrix, 149
joint distribution, 85

K

Karhunen-Loève transform, 98
see also principal component analysis
Karnaugh map (K-map), 6, 36
K-map technique, 37
k-means clustering, 167, 290
k-nearest neighbor classifier, 204, 310, 318
kernel, 287
key input pattern, 347-348
kinks, 172-173
Kirchhoff's current law, 354
Kohonen's feature map, 120, 171
Kolmogorov's theorem, 46-47
Kronecker delta, 268

L

Lagrange multiplier, 145, 272
LAM
see associative memory
Langevin-type learning, 424
Laplace-like distribution, 85
lateral connections, 101
lateral inhibition, 103
lateral weights, 173
leading eigenvector, 97
learning
learning curve, 183-184, 186
learning rate/coefficient, 59, 71, 173
learning rule, 57
learning vector quantization, 111
least-mean-square (LMS) solution, 72
Levenberg-Marquardt optimization, 218
Liapunov
limit cycle, 362, 367, 371, 378
linear array, 120
linear associative memory, 346
linear matched filter, 157
linear programming, 301, 316-317
linear separability, 5, 61, 80
linear threshold gate (LTG), 2, 304, 429
linear unit, 50, 68, 99, 113, 286, 346
linearly dependent, 350
linearly separable mappings, 188, 455
Linsker's rule, 95-97
LMS rule, 67-69, 150, 304, 330
local
local fit, 295
local maximum, 64, 420
local minimum, 109, 203, 417
local property, 157
locality property, 289
locally tuned
log-likelihood function, 85
logic cell array, 304
logistic function, 77, 288
also see activation function
lossless amplifiers, 397
lower bounds, 41, 43
LTG, 2, 304
LTG-realizable, 5
LVQ, 111

M

Mackey-Glass time-series, 281, 294, 322
Manhattan norm, 84
margin, 62, 81
matrix differential calculus, 156
MAX operation, 442
max selector, 407
maximum likelihood, 84
Mays's rule, 66
McCulloch-Pitts unit
see linear threshold gate
medical diagnosis expert net, 246
mean-field
mean transitions, 436
mean-valued approximation, 436
memorization, 222
memory
see associative memory
memory vectors, 347
Metropolis algorithm, 426-427
minimal disturbance principle, 66
minimal PTG realization, 21-24
minimum Euclidean norm solution, 350
minimum energy configuration, 425
minimum MSE solution, 72
minimum SSE solution, 71, 76, 144, 150, 291
Minkowski-r criterion function, 83, 230
see also criterion function
Minkowski-r weight update rule, 231-232
minterm, 3
misclassified, 61, 65
momentum, 213-214
motor unit
moving-target problem, 318
multilayer feedforward networks
see architecture
multiple point crossover, 451, 456
multiple recording passes, 352
multipoint search strategy, 439
multivariate function, 420
MUP, 337
mutation
see genetic operators

N

NAND, 6
natural selection, 439
neighborhood function, 113, 171, 173, 177-180
NETtalk, 234-236
neural field, 173
neural network architecture
see architecture
neural net emulator, 264
Newton's method, 64, 215-216
nonlinear activation function, 102
nonlinear dynamical system, 264
nonlinear PCA, 101, 332
nonlinear representations, 252
nonlinear separability, 5, 63, 80
nonlinearly separable
nonstationary input distribution, 326
nonstationary process, 154
nonthreshold function, 5, 12
nonuniversality, 307
NOR, 6
normal distribution, 84
NP-complete
Nyquist's sampling criterion, 292

O

objective function, 63, 143, 395, 417, 419
see also criterion function
off-line training, 272
Oja's rule
Oja's unit, 157
OLAM
see associative memory
on-center off-surround, 173-174
on-line classifier, 311
on-line implementations, 322
on-line training, 295
optical interconnections, 53
optimal learning step, 212, 279
optimization, 417, 439
OR, 3
ordering phase, 116
orthonormal set, 15
orthonormal vectors, 347
outlier data (points), 168, 231, 317
overfitting, 145, 294, 311, 322
overlap, 371, 382
overtraining, 230

P

parity function, 35-37, 308, 458
partial recurrence, 264
partial reverse dynamics, 383
partial reverse method, 385
partition of unity, 296
pattern completion, 435
pattern ratio, 349, 367, 370
pattern recognition, 51
PCA net, 99, 163, 252
penalty term, 225
perceptron criterion function, 63, 65, 86
perceptron learning rule, 58-60, 304
perfect recall, 347, 350
phase diagram, 367
phonemes, 235
phonotopic map, 124
piece-wise linear operator, 376
plant identification, 256, 264
Polak-Ribière rule, 217
polynomial, 15, 144, 224
polynomial complexity, 318, 395
polynomial threshold gate (PTG), 8, 287, 306
polynomial-time classifier (PTC), 316-318
population, 440
positive definite, 70, 149, 361, 418
positive semidefinite, 150
postprocessing, 301
potential
power dissipation, 52
power method, 92, 212, 330
prediction error, 85
prediction set, 230
premature convergence, 428
premature saturation, 211
preprocessing, 11
principal component(s), 98, 101, 252
principal directions, 98
principal eigenvector, 92, 164
principal manifolds, 252
probability of ambiguous response, 26-27
prototype extraction, 328
prototype unit, 322, 324
pruning, 225, 301
pseudo-inverse, 70, 79, 353
pseudo-orthogonal, 347
PTC net, 316-318
PTG
see polynomial threshold gate

Q

QTG
see quadratic threshold gate
quadratic form, 155, 361
quadratic function, 357
quadratic threshold gate (QTG), 7, 316
quadratic unit, 102
quantization, 250
quickprop, 214

R

radial basis functions, 285-287
radial basis function (RBF) network, 286-294
radially symmetric function, 287
radius of attraction, 390
random motion, 422
random problems, 51
Rayleigh quotient, 155
RBF net, 285, 294
RCE, 312-315
RCE classifier, 315
real-time learning, 244
real-time recurrent learning (RTRL) method, 274-275
recall region, 367
receptive field, 287, 292
recombination mechanism, 440
reconstruction vector, 109
recording recipe, 346
recurrent backpropagation, 265-271
recurrent net, 271, 274
see also architecture
region of influence, 311, 313, 316
regression function, 73
regularization
reinforcement learning, 57, 87
reinforcement signal, 88, 90, 165
relative entropy error measure, 85, 230, 433
see also criterion function
relaxation method, 359
repeller, 207
replica method, 366
representation layer, 250
reproduction
see genetic operators
resonance, 325
see also adaptive resonance theory
resting potential, 174
restricted Coulomb energy (RCE) network, 312-315
retinotopic map, 113
retrieval
Riccati differential equation, 179
RMS error, 202
robust decision surface, 78
robust regression, 85
robustness, 62
robustness preserving feature, 307, 309
root-mean-square (RMS) error, 202
Rosenblatt's perceptron, 304
roulette wheel method, 440, 442
row diagonally dominant matrix, 379

S

saddle point, 358
Sanger's PCA net, 100
Sanger's rule, 99, 163
search space, 452
scaling
see computational complexity
schema (schemata), 443, 447
search
second-order search method, 215
second-order statistics, 101
self-connections, 388
self-coupling terms, 371
self-organization, 112
self-organizing
self-scaling, 325
self-stabilization property, 323
semilocal activation, 299
sensitized units, 106
sensor units, 302
sensory mapping
separating capacity, 17
separating hyperplane, 18
separating surface, 21
sequence
shortcut connections, 321
sigmoid function, 48, 144
see also activation functions
sign function
see activation functions
signal-to-noise ratio, 375, 402
similarity
simulated annealing, 425-426
single-unit training, 58
Smoluchowski-Kramers equation, 422
smoothness regularization, 296
SOFM, 125
see also self-organizing feature map
soft competition, 296, 298
soft weight sharing, 233
solution vector, 60
somatosensory map, 113, 120
space-filling curve, 120
sparse binary vectors, 409
spatial associations, 265
specialized associations, 352
speech processing, 124
speech recognition, 125, 255
spin glass, 367
spin glass region, 367
spurious cycle, 393
spurious memory
see associative memory
SSE
see criterion function
stability-plasticity dilemma, 323
stable categorization, 323
state space, 375
state variable, 263
static mapping, 265
statistical mechanics, 425, 429
steepest gradient descent method, 64
Stirling's approximation, 42
stochastic
Stone-Weierstrass theorem, 48
storage capacity
see capacity
storage recipes, 345
strict interpolation, 291
string, 439
strongly mixing process, 152, 176, 179
sum-of-products, 3
sum of squared error, 68
superimposed patterns, 328
supervised learning, 57, 88, 455
also see learning rules
survival of the fittest, 440
switching
symmetry-breaking mechanism, 211
synapse, 52
synaptic efficacies, 90
synaptic signal
synchronous dynamics
see dynamics

T

Takens' theorem, 256
tapped delay lines, 254-259
teacher forcing, 275
temperature, 422, 425
templates, 109
temporal association, 254, 259, 262-265, 391
temporal associative memory, 391
temporal learning, 253-275
terminal repeller, 207
thermal energy, 426
thermal equilibrium, 425, 434, 437
threshold function, 5
threshold gates, 2
threshold logic, 1, 37
tie-breaker factor, 326
time-delay neural network, 254-259
time-series prediction, 255, 273
topographic map, 112
topological ordering, 116
trace, 351
training error, 185, 227
training with rubbish, 295
training set, 59, 144, 230
transition probabilities, 426
travelling salesman problem (TSP), 120
truck backer-upper problem, 263
truth table, 3
TSP
see travelling salesman problem
tunneling, 206, 208, 421
Turing machine, 275
twists, 172-173
two-spiral problem, 321

U

unfolding, 261
unit
unit allocating net, 310, 318
unit elimination, 221
unity function, 296
universal approximation, 198, 221, 290, 304, 454
universal approximator, 15, 48, 287, 308
universal classifier, 50
universal logic, 36
unsupervised learning, 57, 90
update
asynchronous, 360
upper bound, 43

V

validation error, 226
validation set, 227, 230
Vandermonde's determinant, 32
VC dimension, 185
vector energy, 60, 352
vector quantization, 103, 109-110
Veitch diagram, 6
vigilance parameter, 325-328, 333
visible units, 431
VLSI, 304
Voronoi
vowel recognition, 298

W

Weierstrass's approximation theorem, 15
weight, 2
weight-sharing interconnections, 240
weighted sum, 2-3, 288, 382, 428
Widrow-Hoff rule, 66, 330
Wiener weight vector, 72
winner-take-all, 103
winner unit, 113, 324

X

XNOR, 14
XOR, 5

Y

Yuille et al. rule, 92, 158

Z

ZIP code recognition, 240-243
