9.1 Introduction
Motivation
- Benchmark: Human brains are exceptionally efficient and powerful for learning and classification tasks.
- Goal: Design machine learning tools that simulate or mimic the functionality of human brains.
- Supporting Evidence: Encouraged by research findings in cognitive science and biology regarding the human brain.
Human Brain Structure & Function
- Scale: Composed of tens of billions of neurons.
- Interconnection: Neurons form a highly sophisticated, interconnected network.
- Organization: Neurons are organized into functional units or regions specialized for tasks like:
- Visual processing
- Auditory processing
- Motion control
- Reasoning
- Speech, etc.
Individual Biological Neuron Function (Reference: Fig. 9.1)
- Inputs: Received through dendrites.
- Processing: Occurs in the cell body.
- Output Transmission: Signal sent out to other neurons via the axon.
- Input Types: Inputs can be:
- Excitatory: Tend to activate the neuron.
- Inhibitory: Tend to prevent the neuron from activating.
- Activation Condition:
- If (Total Excitatory Input > Total Inhibitory Input): The neuron is activated, and a signal is transmitted through the axon.
- Otherwise: No signal is generated.
- Figure 9.1 Description: Shows a biological neuron with a central cell body, branching dendrites receiving inputs, a long axon transmitting the output, and terminals at the end of the axon connecting to other neurons.
Layered Organization in the Brain
- Structure: Neurons in many brain regions are organized into layers, forming a layered network.
- Connectivity: Neurons typically receive inputs from neurons in an adjacent layer.
- Directionality: Connections between layers are often unidirectional, flowing from:
- Low-level sensory layers (e.g., eyes, ears)
- To higher-level coordination and reasoning layers.
From Biology to Artificial Networks
- These understandings of human brain structure and neuron function provide the foundation for designing:
- Artificial Neurons: Simplified computational models of biological neurons.
- Artificial Neural Networks (ANNs): Networks composed of interconnected artificial neurons.
9.2 Artificial Neurons
Modeling the Neuron
- The design of an ANN starts with modeling the artificial neuron.
- A neuron is fundamentally a unit that receives inputs and generates an output.
Mapping Biological to Electronic Components
- Biological Neuron Components:
- Inputs (Dendrites)
- Activation/Processing Unit (Cell Body)
- Output (Axon)
- Electronic Representation:
- Inputs: A set of input values $x_1, x_2, \dots, x_n$.
- Activation/Processing: A weighted sum of the inputs ($\sum_i w_i x_i$) and a threshold ($\theta$).
- Output: An output signal based on the threshold comparison.
- Figure 9.2 Description: Illustrates the alignment. Dendrites map to inputs ($x_i$) with associated weights ($w_i$). The Cell Body maps to the summation unit ($\Sigma$) and an activation/threshold mechanism (represented by a step function symbol). The Axon maps to the output line, leading to Terminals.
Simplified Artificial Neuron Model (Reference: Fig. 9.3)
- The weighted sum ($\Sigma$) and the thresholding operation ($\theta$) are typically merged into a single activation unit.
- The axon is conceptually replaced by the output signal ($y$).
- Figure 9.3 Description: Shows inputs $x_1, \dots, x_n$ with weights $w_1, \dots, w_n$ feeding into an “Activation” block. This block performs the weighted sum and compares it to the threshold, indicated by the label "$\theta$". The output is a single signal $y$.
Functionality: Binary Linear Classifier
- An artificial neuron, in this basic form, acts as a binary linear classifier.
- Process:
- Given an input vector $\mathbf{x} = (x_1, x_2, \dots, x_n)$.
- Calculate the weighted sum $s = \sum_{i=1}^{n} w_i x_i$.
- Compare $s$ with the threshold $\theta$.
Mathematical Formulation
- Weighted Sum Calculation (Eq. 9.1): $s = \sum_{i=1}^{n} w_i x_i$; the condition for activation is $s > \theta$.
- Activation Rule / Output Generation (Eq. 9.2): $y = 1$ if $s > \theta$, $y = 0$ otherwise. (Note: The behavior at $s = \theta$ is often undefined or assigned to one class.)
- Combining Sum and Threshold: It’s convenient to combine $s$ and $\theta$ into a single decision function.
- Decision Function (Eq. 9.3): Rearranging: $d(\mathbf{x}) = \sum_{i=1}^{n} w_i x_i - \theta$.
- Bias Term Notation: Introduce a constant input $x_0 = 1$ and a corresponding weight $w_0 = -\theta$; $w_0$ is often called the bias.
- Decision Function with Bias (Eq. 9.4): $d(\mathbf{x}) = \sum_{i=0}^{n} w_i x_i$.
- Revised Activation Rule (Eq. 9.5): The output $y$ is determined by the sign of $d(\mathbf{x})$: $y = 1$ if $d(\mathbf{x}) > 0$, $y = 0$ otherwise.
Applications
- This simple artificial neuron model can perform basic linear classifications, including simulating logic gates like AND, OR, NAND, NOR.
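A minimal Python sketch (not from the text) of this basic neuron, folding the threshold into a bias weight $w_0$ as in Eq. 9.4 and 9.5; the function name `neuron_output` is illustrative:

```python
def neuron_output(weights, inputs):
    """Binary linear classifier (Eq. 9.4/9.5): returns 1 if the biased
    weighted sum d(x) = sum_i w_i x_i is positive, else 0."""
    x = [1.0] + list(inputs)                      # prepend constant input x0 = 1
    d = sum(w * xi for w, xi in zip(weights, x))  # weights[0] plays the role of the bias w0
    return 1 if d > 0 else 0
```

The logic-gate examples in Sections 9.2.1 and 9.2.2 below reuse this helper with hand-set weights.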
9.2.1 An AND Neuron
AND Function
- The logical AND function outputs $1$ only if all inputs are $1$.
- Truth Table (Table 9.1):
| Input $x_1$ | Input $x_2$ | Weighted Sum ($s = x_1 + x_2$) | Output |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 2 | 1 |
Implementing AND with a Neuron
- Weights: Let $w_1 = 1$ and $w_2 = 1$. The weighted sum is $s = x_1 + x_2$.
- Threshold Selection: Choose a threshold $\theta$ such that only the input $(1, 1)$ (where $s = 2$) activates the neuron. A value like $\theta = 1.5$ works (any $1 < \theta < 2$ works).
- Decision Boundary: Using the $d(\mathbf{x})$ formulation (Eq 9.4, 9.5) with $w_1 = 1$, $w_2 = 1$, $w_0 = -1.5$: With $x_0 = 1$, the condition for output $1$ is: $x_1 + x_2 - 1.5 > 0$.
- Hyperplane / Separating Line (Eq. 9.7): The boundary between the two output classes is where $x_1 + x_2 - 1.5 = 0$. (The text’s Eq 9.6, $d(\mathbf{x}) = x_1 + x_2 - 1.5$, likely refers to the value before thresholding or the equation of the separating line itself, not the final binary output.)
Visual Representation
- Figure 9.4 Description: Shows a 2D plot with axes $x_1$ and $x_2$. The four input points $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$ are plotted. Points $(0,0)$, $(0,1)$, $(1,0)$ are labeled with output $0$. Point $(1,1)$ is labeled with output $1$. The line $x_1 + x_2 = 1.5$ is drawn, correctly separating $(1,1)$ from the other points.
- Figure 9.5 Description: Depicts the AND neuron structure. Inputs $x_1$ and $x_2$ connect with weights $w_1 = 1$ and $w_2 = 1$. A bias input $1$ (implicitly understood or sometimes drawn) connects with weight $w_0 = -1.5$. These feed into the neuron body (circle), which contains the threshold logic (represented by $\theta = 1.5$ inside or associated with it). The output is $y$.
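A quick check of this AND neuron, reusing the `neuron_output` sketch above (bias as the first weight):

```python
# AND neuron: w1 = w2 = 1, bias w0 = -1.5 (i.e., threshold 1.5)
and_weights = [-1.5, 1.0, 1.0]
for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", neuron_output(and_weights, (x1, x2)))
# Only (1, 1) should produce 1; the other three input pairs produce 0.
```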
9.2.2 An OR Neuron
OR Function
- The logical OR function outputs $1$ if at least one input is $1$.
- Truth Table (Implied):
| Input $x_1$ | Input $x_2$ | Weighted Sum ($s = x_1 + x_2$) | Output |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 2 | 1 |
Implementing OR with a Neuron
- Weights: Again, let $w_1 = 1$ and $w_2 = 1$. The weighted sum is $s = x_1 + x_2$.
- Threshold Selection: Choose $\theta$ such that inputs $(0,1)$, $(1,0)$, $(1,1)$ (where $s \geq 1$) activate the neuron, but $(0,0)$ (where $s = 0$) does not. A value like $\theta = 0.5$ works (any $0 < \theta < 1$ works).
- Decision Boundary: Using $w_1 = 1$, $w_2 = 1$, $w_0 = -0.5$: The condition for output $1$ is: $x_1 + x_2 - 0.5 > 0$.
- Hyperplane / Separating Line: $x_1 + x_2 - 0.5 = 0$.
Visual Representation
- Figure 9.6 Description: Shows a 2D plot similar to Fig 9.4, but for the OR function. Point $(0,0)$ is labeled with output $0$. Points $(0,1)$, $(1,0)$, $(1,1)$ are labeled with output $1$. The line $x_1 + x_2 = 0.5$ is drawn, correctly separating $(0,0)$ from the other points.
- Figure 9.7 Description: Depicts the OR neuron structure. Inputs $x_1$, $x_2$ connect with weights $w_1 = w_2 = 1$. The bias weight is $w_0 = -0.5$. The threshold logic inside the neuron circle is represented by $\theta = 0.5$. The output is $y$.
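The same check for the OR neuron (again reusing `neuron_output`):

```python
# OR neuron: w1 = w2 = 1, bias w0 = -0.5 (i.e., threshold 0.5)
or_weights = [-0.5, 1.0, 1.0]
for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", neuron_output(or_weights, (x1, x2)))
# Only (0, 0) should produce 0; the other three input pairs produce 1.
```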
Terminology
- An artificial neuron is also called a node in an ANN and is typically represented by a circle.
9.3 Perceptron
Limitations of Fixed Neurons
- The AND/OR neurons discussed above perform binary linear classification, but their weights ($w_i$) and threshold ($\theta$, or equivalently the bias $w_0$) were predetermined by a human designer.
- Biological neurons can learn from experience and memorize.
Goal: Learnable Artificial Neurons
- Desirable for artificial neurons to also learn and memorize.
- Mechanism: Feed the neuron a set of known (pre-labeled) data (training set) and use an algorithm to learn the weights ($w_i$) automatically.
- Learning Criterion: Typically involves minimizing the total error between the neuron’s output and the known correct output.
The Perceptron Model
- The Perceptron is an early model of a learnable artificial neuron.
- Decision Function Recap (Eq. 9.8): The output before thresholding (often denoted $y$ in learning contexts) is the weighted sum including the bias: $y = \sum_{i=0}^{n} w_i x_i$ (assuming $x_0 = 1$; $w_0$ is the bias).
Training the Perceptron
1. Training Data
- Collect a training set $\{(\mathbf{x}_k, d_k)\}_{k=1}^{N}$.
- $\mathbf{x}_k$ is the feature vector of the $k$-th training sample.
- $d_k$ is the desired (correct/target) output for sample $k$. (Often binary: $0$ or $1$, or sometimes $-1$ or $+1$.)
2. Error Function
- The goal is to minimize the error between the predicted output ($y_k$) and the desired output ($d_k$).
- Common error measure: Mean Squared Error (MSE) or Sum of Squared Errors (SSE).
- MSE Formula (Eq. 9.9): $E = \frac{1}{2N} \sum_{k=1}^{N} (d_k - y_k)^2$, where $y_k$ is the neuron’s output before thresholding for sample $k$. (The factor $\frac{1}{2}$ simplifies the derivative.)
3. Minimization Algorithm: Steepest Descent (Gradient Descent)
- Adjust weightsto minimize the error.
- Follow the direction opposite to the gradient of the error function.
- Gradient Vector (Eq. 9.10): The direction of steepest ascent is given by: $\nabla E = \left( \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n} \right)$.
- Weight Update Rule (General): To decrease the error, move in the direction opposite to the gradient: $\mathbf{w} \leftarrow \mathbf{w} - \eta \nabla E$, where $\eta$ is a small positive constant called the learning rate.
4. Calculating the Gradient
- The partial derivative of the error $E$ with respect to a weight $w_i$ is needed.
- Using the chain rule on Eq. 9.9: $\frac{\partial E}{\partial w_i} = -\frac{1}{N} \sum_{k=1}^{N} (d_k - y_k) \frac{\partial y_k}{\partial w_i}$.
- Since $y_k = \sum_{j=0}^{n} w_j x_{k,j}$, the derivative is $\frac{\partial y_k}{\partial w_i} = x_{k,i}$ (assuming $x_{k,0} = 1$ for the bias term).
- So, $\frac{\partial E}{\partial w_i} = -\frac{1}{N} \sum_{k=1}^{N} (d_k - y_k)\, x_{k,i}$. This is for the total error over all samples (Batch Gradient Descent).
5. Online/Stochastic Gradient Descent
- Often, weights are updated after processing each training sample. The error for a single sample is $E_k = \frac{1}{2} (d_k - y_k)^2$.
- Partial Derivative for Single Sample (Eq. 9.11): $\frac{\partial E_k}{\partial w_i} = -(d_k - y_k)\, x_{k,i}$.
- Weight Update Rule (Online): $w_i \leftarrow w_i + \eta\, (d_k - y_k)\, x_{k,i}$.
Perceptron Learning Algorithm (MSE version, Online)
- Initialization: Choose an initial weight vector $\mathbf{w}$ (e.g., small random values) and a positive learning rate $\eta$.
- Iteration: For each training sample $(\mathbf{x}_k, d_k)$: a. Compute the predicted output (before thresholding): $y_k = \sum_{i=0}^{n} w_i x_{k,i}$ (where $x_{k,0} = 1$). b. Update each weight: $w_i \leftarrow w_i + \eta\, (d_k - y_k)\, x_{k,i}$ for $i = 0, 1, \dots, n$.
- Repeat: Repeat Step 2 (cycling through all training samples, potentially multiple times - called epochs) until the weights $w_i$ stop changing significantly or another stopping criterion is met.
(Note: The original Perceptron algorithm by Rosenblatt used a slightly different update rule based directly on the thresholded output, only updating weights if misclassification occurred. The version presented here uses gradient descent on the squared error of the unthresholded output.)
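A minimal sketch of this online, MSE-style training loop (function and variable names are illustrative, not from the text; targets are coded as $-1$/$+1$, one of the options mentioned above):

```python
import random

def train_perceptron(samples, eta=0.1, epochs=200):
    """Online gradient descent on the squared error of the unthresholded
    output, as in Section 9.3. `samples` is a list of (features, target) pairs."""
    n = len(samples[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]  # w[0] is the bias
    for _ in range(epochs):
        for features, d in samples:
            x = [1.0] + list(features)                     # constant input x0 = 1
            y = sum(wi * xi for wi, xi in zip(w, x))       # output before thresholding
            for i in range(n + 1):
                w[i] += eta * (d - y) * x[i]               # w_i <- w_i + eta (d - y) x_i
    return w

# Example: learning the AND function with targets -1/+1.
and_samples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), +1)]
print(train_perceptron(and_samples))  # should settle near the hand-set weights (-1.5, 1, 1)
```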
9.4 Nonlinear Neural Network
Limitations of Single Perceptrons
- A perceptron (essentially a two-layer network: input layer, output layer) can only learn linear decision boundaries (lines in 2D, planes in 3D, hyperplanes in higher dimensions).
- It can only classify data that is linearly separable.
- Cannot Classify Non-linear Data: Problems like XOR or datasets with convex/complex shapes cannot be solved by a single perceptron.
- Figure 9.8 Description (XOR): Shows the XOR problem in 2D (axes $x_1$, $x_2$). Points $(0,0)$ and $(1,1)$ have output $0$. Points $(0,1)$ and $(1,0)$ have output $1$. No single straight line can separate the $1$s from the $0$s.
- Figure 9.9 Description (Convex Region): Illustrates a scenario where one class (‘o’) forms a cluster in the center, surrounded by another class (‘x’). A single straight line cannot isolate the central cluster.
Overcoming Limitations: Multi-Layer Networks
Representing Complex Regions
- Convex Regions: The boundary of a convex region can be approximated by the intersection (ANDing) of multiple half-planes. Each half-plane is defined by a linear inequality (a line/hyperplane).
- Non-Convex Regions: A non-convex region can be approximated by the union (ORing) of multiple convex regions.
Example: Solving XOR with Multiple Neurons
- The XOR function is non-linear. It can be solved by combining the outputs of multiple linear units.
- Strategy: Use two lines to separate the space.
- Line 1 (OR-like): Separates $(0,0)$ from the rest. Inequality: $x_1 + x_2 - 0.5 > 0$ (activates if at least one input is $1$).
- Line 2 (NAND-like): Separates $(1,1)$ from the rest. Inequality: $-x_1 - x_2 + 1.5 > 0$ (equivalent to $x_1 + x_2 < 1.5$; activates unless both inputs are $1$).
- Combining Lines (Eq. 9.13 based): The XOR output is $1$ if Line 1 is active AND Line 2 is active. This occurs only for inputs $(0,1)$ and $(1,0)$.
- Neuron 1 implements: $h_1 = 1$ if $x_1 + x_2 - 0.5 > 0$.
- Neuron 2 implements: $h_2 = 1$ if $-x_1 - x_2 + 1.5 > 0$.
- Final Output Neuron implements: $y = h_1$ AND $h_2$ (e.g., $y = 1$ if $h_1 + h_2 - 1.5 > 0$).
Three-Layer Network for XOR (Reference: Fig. 9.10)
- Structure:
- Input Layer: Nodes for $x_1$ and $x_2$ (and an implicit bias node).
- Hidden Layer: Contains neurons that create the separating lines (Neuron H1, Neuron H2). Includes a bias node.
- Output Layer: Contains a neuron that combines the hidden layer outputs (AND function).
- Figure 9.10 Description: Shows the network.
- Inputs $x_1$, $x_2$. Bias input $1$.
- Hidden Neuron H1 (Circle): Receives inputs $x_1$, $x_2$, and the bias with weights $1$, $1$, $-0.5$. Output $h_1$. (Implements $x_1 + x_2 - 0.5 > 0$.)
- Hidden Neuron H2 (Circle): Receives inputs $x_1$, $x_2$, and the bias with weights $-1$, $-1$, $1.5$. Output $h_2$. (Implements $-x_1 - x_2 + 1.5 > 0$.)
- Output Neuron (Circle labeled AND): Receives inputs $h_1$, $h_2$, and the bias with weights $1$, $1$, $-1.5$ (implementing AND with threshold $1.5$). Output $y$.
- This network correctly computes XOR.
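A small sketch of this hand-wired XOR network, using a step activation and the weights given above (names are illustrative):

```python
def step(s):
    """Binary threshold: 1 if the net input is positive, else 0."""
    return 1 if s > 0 else 0

def xor_network(x1, x2):
    """Three-layer network of Fig. 9.10 with the hand-set weights from the text."""
    h1 = step(x1 + x2 - 0.5)      # hidden neuron H1: OR-like line
    h2 = step(-x1 - x2 + 1.5)     # hidden neuron H2: NAND-like line
    return step(h1 + h2 - 1.5)    # output neuron: h1 AND h2

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", xor_network(x1, x2))   # 1 only for (0,1) and (1,0)
```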
Classifying General Convex Data (Reference: Fig. 9.11)
- Use a three-layer network.
- Hidden Layer: Contains multiple neurons. Each neuron learns a linear boundary (half-plane). The combination of these boundaries approximates the convex region. More hidden neurons allow for smoother/more complex convex boundaries.
- Output Layer: Typically a single neuron performing an AND-like operation on the hidden layer outputs.
- Figure 9.11 Description: Generalizes Fig 9.10. Input layer ($x_1, \dots, x_n$, bias $1$). Hidden layer with $m$ neurons ($h_1, \dots, h_m$, bias $1$). Output layer (1 neuron labeled AND). Connections have generic weights ($w_{ij}$, etc.).
Classifying Non-Convex Data (Reference: Fig. 9.12, 9.13)
- Requires combining multiple convex regions using OR logic.
- Use a four-layer network.
- Structure:
- Layer 1: Input Layer.
- Layer 2 (Hidden Layer 1): Neurons create the linear boundaries (half-planes) needed for all the convex regions.
- Layer 3 (Hidden Layer 2): Neurons act as AND gates, combining outputs from Layer 2 to represent individual convex regions.
- Layer 4: Output Layer: A single neuron acts as an OR gate, combining the outputs from Layer 3 (the convex region detectors).
- Figure 9.12 Description: Shows a non-convex region (black dots) that can be covered by the union of three simpler (convex) regions.
- Figure 9.13 Description: Generalizes the four-layer structure. Input layer ($x_1, \dots, x_n$, bias $1$). Hidden Layer 1 ($h_1, \dots, h_m$, bias $1$). Hidden Layer 2 ($c_1, \dots, c_p$, representing convex regions). Output Layer (1 neuron labeled OR). Connections have generic weights. (A small sketch of this layered logic follows below.)
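A minimal sketch of the four-layer idea (not from the text), reusing the `step()` helper from the XOR sketch above: layer 2 evaluates half-planes, layer 3 ANDs selected half-planes into convex regions, and layer 4 ORs the regions together. The half-plane triples and region index lists below are illustrative:

```python
def four_layer_classifier(x, half_planes, regions):
    """`half_planes` is a list of (w1, w2, w0) triples; `regions` lists, for each
    convex region, the indices of the half-planes that must all be active."""
    h = [step(w1 * x[0] + w2 * x[1] + w0) for (w1, w2, w0) in half_planes]    # layer 2
    r = [step(sum(h[i] for i in idx) - (len(idx) - 0.5)) for idx in regions]  # layer 3: AND
    return step(sum(r) - 0.5)                                                 # layer 4: OR

# Example: the union of two unit squares, [0,1]x[0,1] and [2,3]x[2,3] (a non-convex set).
planes = [(1, 0, 0.0), (-1, 0, 1.0), (0, 1, 0.0), (0, -1, 1.0),    # square 1
          (1, 0, -2.0), (-1, 0, 3.0), (0, 1, -2.0), (0, -1, 3.0)]  # square 2
regions = [[0, 1, 2, 3], [4, 5, 6, 7]]
print(four_layer_classifier((0.5, 0.5), planes, regions))   # 1 (inside square 1)
print(four_layer_classifier((1.5, 1.5), planes, regions))   # 0 (in neither square)
```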
Conclusion on Network Depth
- This structure indicates that any non-linear classification problem can theoretically be solved by a four-layer neural network. (In practice, networks with fewer layers might suffice, especially with more complex activation functions).
9.5 Activation and Inhibition
Importance of Activation Functions
- The activation (or threshold) function is crucial.
- A network without a non-linear activation function (i.e., only using weighted sums) is equivalent to a simple two-layer linear network, regardless of how many layers it has. It cannot separate non-linear data.
Problems with Binary Threshold
- Discontinuity: The binary step function (output $0$ or $1$) is not continuous.
- Learning Issues: This discontinuity poses problems for gradient-based learning (like the MSE method in Section 9.3):
- The gradient is zero almost everywhere, except at the threshold point where it’s undefined.
- Small changes in weights might not be enough to change the neuron’s output (flip it across the threshold).
- This can lead to very slow convergence or failure to converge.
9.5.1 Sigmoid Activation
Need for Continuous Activation
- Desirable to have a continuous activation function that approximates the $0$-to-$1$ jump smoothly.
The Sigmoid Function
- A widely used continuous activation function. It has an S-shape.
- Formula (Eq. 9.14): $f(s) = \frac{1}{1 + e^{-s}}$, where $s$ is the net input to the neuron (the weighted sum including the bias: $s = \sum_{i=0}^{n} w_i x_i$).
- Figure 9.14 Description: Plots the function $f(s)$ versus $s$. The curve smoothly transitions from $0$ (for large negative $s$) to $1$ (for large positive $s$), passing through $f(0) = 0.5$.
Properties of Sigmoid Function
- Limit at $-\infty$: $\lim_{s \to -\infty} f(s) = 0$.
- Limit at $+\infty$: $\lim_{s \to +\infty} f(s) = 1$.
- Value at Origin: $f(0) = 0.5$ (or $\frac{1}{2}$).
- Derivative (Eq. 9.15): The derivative is conveniently expressed in terms of the function itself: $f'(s) = f(s)\,(1 - f(s))$.
- Derivative Behavior: The derivative $f'(s)$ is maximal at $s = 0$ (where $f'(0) = 0.25$) and approaches $0$ as $|s| \to \infty$ (where $f(s)$ approaches $0$ or $1$). This property is important for backpropagation (covered later), as it means learning slows down when the neuron’s output is saturated (very close to $0$ or $1$).
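A short sketch of the sigmoid and its derivative (Eq. 9.14 and 9.15):

```python
import math

def sigmoid(s):
    """Eq. 9.14: f(s) = 1 / (1 + e^(-s)), a smooth S-shaped activation."""
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_derivative(s):
    """Eq. 9.15: f'(s) = f(s) * (1 - f(s)); maximal at s = 0, near 0 when saturated."""
    f = sigmoid(s)
    return f * (1.0 - f)

print(sigmoid(0.0), sigmoid_derivative(0.0))    # 0.5, 0.25
print(sigmoid(10.0), sigmoid_derivative(10.0))  # ~1.0, ~0 (saturated: learning slows)
```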
9.5.2 Shunting Inhibition
Biological Background
- Biological neurons can be excitatory or inhibitory.
- An inhibitory signal prevents or reduces the likelihood of the receiving neuron firing.
Mathematical Modeling: Shunting Inhibition (SI)
- Represents the inhibitory effect by division, effectively reducing the excitatory potential.
- Mechanism: The neuron learns two sets of weights:
- Excitatory weights ()
- Inhibitory weights ()
- The final output involves dividing the result from the excitatory pathway by a term derived from the inhibitory pathway.
SI Neuron Structure (Reference: Fig. 9.15)
- Dual Pathways: Input signals $x_1, \dots, x_n$ feed into two parallel processing pathways.
- Excitatory Pathway (Top):
- Compute weighted sum: $\sum_{j=1}^{n} w_j x_j + w_0$ (where $w_0$ is the excitatory bias).
- Apply activation function: $f\left(\sum_{j=1}^{n} w_j x_j + w_0\right)$.
- Inhibitory Pathway (Bottom):
- Compute weighted sum: $\sum_{j=1}^{n} c_j x_j + c_0$ (where $c_0$ is the inhibitory bias).
- Apply activation function: $g\left(\sum_{j=1}^{n} c_j x_j + c_0\right)$.
- Add a passive decay rate: $a$.
- Final Output (Division): The output of the excitatory pathway is divided by the result from the inhibitory pathway (including decay).
- Figure 9.15 Description: Shows inputs $x_1, \dots, x_n$. The top path uses weights $w_j$, sums them ($\Sigma$), and passes through activation $f$. The bottom path uses weights $c_j$, sums them ($\Sigma$), passes through activation $g$, then adds the decay $a$ (using a ’+’ node). A final division node ($\div$) takes the output of $f$ (numerator) and the output of ($a + g$) (denominator) to produce the final output.
Mathematical Formula for SI Neuron Output (Eq. 9.16)
- $y = \frac{f\left(\sum_{j=1}^{n} w_j x_j + w_0\right)}{a + g\left(\sum_{j=1}^{n} c_j x_j + c_0\right)}$
- Where:
- $y$: Output of the shunting inhibitory neuron.
- $x_j$: The $j$-th input ($j = 1, \dots, n$).
- $w_j$: Excitatory connection weight for the $j$-th input.
- $c_j$: Inhibitory connection weight for the $j$-th input.
- $w_0$: Excitatory bias (assuming $x_0 = 1$).
- $c_0$: Inhibitory bias (assuming $x_0 = 1$).
- $a$: Passive decay rate (a positive constant).
- $f, g$: Activation functions (can be different).
- $n$: Number of inputs (excluding the bias).
- Condition: The denominator must be positive: $a + g\left(\sum_{j=1}^{n} c_j x_j + c_0\right) > 0$.
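A minimal sketch of a single SI neuron following Eq. 9.16 as reconstructed above (parameter names are illustrative; $\tanh$ and $\exp$ are the example activations mentioned below):

```python
import math

def si_neuron(x, w, c, w0, c0, a, f=math.tanh, g=math.exp):
    """Shunting inhibitory neuron (Eq. 9.16): excitatory pathway divided by
    (passive decay + inhibitory pathway)."""
    excitatory = f(sum(wj * xj for wj, xj in zip(w, x)) + w0)
    inhibitory = g(sum(cj * xj for cj, xj in zip(c, x)) + c0)
    denominator = a + inhibitory
    assert denominator > 0, "denominator must be positive"
    return excitatory / denominator

# Example call with arbitrary illustrative values.
print(si_neuron(x=[0.5, -0.2], w=[1.0, 0.3], c=[0.4, 0.1], w0=0.1, c0=0.0, a=1.0))
```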
Shunting Inhibitory ANN (SIANN)
- An ANN constructed using SI neurons.
- Key Feature: Often uses different activation functions $f$ and $g$ within a layer. This allows the network dynamics to selectively activate only the neurons receiving the strongest relative excitatory input compared to inhibitory input.
- Example: Using $f = \tanh$ (the hyperbolic tangent) and $g = \exp$ (the exponential function) has shown good convergence properties in experiments.