9.1 Introduction
Motivation
- Benchmark: Human brains are exceptionally efficient and powerful for learning and classification tasks.
- Goal: Design machine learning tools that simulate or mimic the functionality of human brains.
- Supporting Evidence: Encouraged by research findings in cognitive science and biology regarding the human brain.
Human Brain Structure & Function
- Scale: Composed of tens of billions of neurons.
- Interconnection: Neurons form a highly sophisticated, interconnected network.
- Organization: Neurons are organized into functional units or regions specialized for tasks like:
- Visual processing
- Auditory processing
- Motion control
- Reasoning
- Speech, etc.
Individual Biological Neuron Function (Reference: Fig. 9.1)
- Inputs: Received through dendrites.
- Processing: Occurs in the cell body.
- Output Transmission: Signal sent out to other neurons via the axon.
- Input Types: Inputs can be:
- Excitatory: Tend to activate the neuron.
- Inhibitory: Tend to prevent the neuron from activating.
- Activation Condition:
- If (Total Excitatory Input > Total Inhibitory Input): The neuron is activated, and a signal is transmitted through the axon.
- Otherwise: No signal is generated.
- Figure 9.1 Description: Shows a biological neuron with a central cell body, branching dendrites receiving inputs, a long axon transmitting the output, and terminals at the end of the axon connecting to other neurons.
Layered Organization in the Brain
- Structure: Neurons in many brain regions are organized into layers, forming a layered network.
- Connectivity: Neurons typically receive inputs from neurons in an adjacent layer.
- Directionality: Connections between layers are often unidirectional, flowing from:
- Low-level sensory layers (e.g., eyes, ears)
- To higher-level coordination and reasoning layers.
From Biology to Artificial Networks
- These understandings of human brain structure and neuron function provide the foundation for designing:
- Artificial Neurons: Simplified computational models of biological neurons.
- Artificial Neural Networks (ANNs): Networks composed of interconnected artificial neurons.
9.2 Artificial Neurons
Modeling the Neuron
- The design of an ANN starts with modeling the artificial neuron.
- A neuron is fundamentally a unit that receives inputs and generates an output.
Mapping Biological to Electronic Components
- Biological Neuron Components:
- Inputs (Dendrites)
- Activation/Processing Unit (Cell Body)
- Output (Axon)
- Electronic Representation:
- Inputs: A set of input values $x_1, x_2, \dots, x_n$.
- Activation/Processing: A weighted sum of the inputs ($\sum_i w_i x_i$) and a threshold ($\theta$).
- Output: An output signal based on the threshold comparison.
- Figure 9.2 Description: Illustrates the alignment. Dendrites map to inputs ($x_i$) with associated weights ($w_i$). The Cell Body maps to the summation unit ($\Sigma$) and an activation/threshold mechanism (represented by a step function symbol). The Axon maps to the output line, leading to Terminals.
Simplified Artificial Neuron Model (Reference: Fig. 9.3)
- The weighted sum ($\Sigma$) and the thresholding operation ($\theta$) are typically merged into a single activation unit.
- The axon is conceptually replaced by the output signal ($y$).
- Figure 9.3 Description: Shows inputs $x_1, \dots, x_n$ with weights $w_1, \dots, w_n$ feeding into an “Activation” block. This block performs the weighted sum and compares it to the threshold, indicated by the label "$\theta$". The output is a single signal $y$.
Functionality: Binary Linear Classifier
- An artificial neuron, in this basic form, acts as a binary linear classifier.
- Process:
- Given an input vector $\mathbf{x} = (x_1, x_2, \dots, x_n)$.
- Calculate the weighted sum $s = \sum_{i=1}^{n} w_i x_i$.
- Compare $s$ with the threshold $\theta$.
Mathematical Formulation
- Weighted Sum Calculation (Eq. 9.1): $s = \sum_{i=1}^{n} w_i x_i$; the condition for activation is $s > \theta$.
- Activation Rule / Output Generation (Eq. 9.2): $y = 1$ if $s > \theta$, $y = 0$ otherwise. (Note: The behavior at $s = \theta$ is often undefined or assigned to one class.)
- Combining Sum and Threshold: It’s convenient to combine $s$ and $\theta$ into a single decision function.
- Decision Function (Eq. 9.3): Rearranging: $d(\mathbf{x}) = \sum_{i=1}^{n} w_i x_i - \theta$.
- Bias Term Notation: Introduce a constant input $x_0 = 1$ and a corresponding weight $w_0 = -\theta$; $w_0$ is often called the bias.
- Decision Function with Bias (Eq. 9.4): $d(\mathbf{x}) = \sum_{i=0}^{n} w_i x_i$.
- Revised Activation Rule (Eq. 9.5): The output $y$ is determined by the sign of $d(\mathbf{x})$: $y = 1$ if $d(\mathbf{x}) > 0$, $y = 0$ otherwise.
Applications
- This simple artificial neuron model can perform basic linear classifications, including simulating logic gates like AND, OR, NAND, NOR.
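A minimal Python sketch (not from the text) of this basic neuron, folding the threshold into a bias weight $w_0$ as in Eq. 9.4 and 9.5; the function name `neuron_output` is illustrative:

```python
def neuron_output(weights, inputs):
    """Binary linear classifier (Eq. 9.4/9.5): returns 1 if the biased
    weighted sum d(x) = sum_i w_i x_i is positive, else 0."""
    x = [1.0] + list(inputs)                      # prepend constant input x0 = 1
    d = sum(w * xi for w, xi in zip(weights, x))  # weights[0] plays the role of the bias w0
    return 1 if d > 0 else 0
```

The logic-gate examples in Sections 9.2.1 and 9.2.2 below reuse this helper with hand-set weights.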
9.2.1 An AND Neuron
AND Function
- The logical AND function outputs $1$ only if all inputs are $1$.
- Truth Table (Table 9.1):
| Input $x_1$ | Input $x_2$ | Weighted Sum ($s = x_1 + x_2$) | Output |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 2 | 1 |
Implementing AND with a Neuron
- Weights: Let $w_1 = 1$ and $w_2 = 1$. The weighted sum is $s = x_1 + x_2$.
- Threshold Selection: Choose a threshold $\theta$ such that only the input $(1, 1)$ (where $s = 2$) activates the neuron. A value like $\theta = 1.5$ works (any $1 < \theta < 2$ works).
- Decision Boundary: Using the $d(\mathbf{x})$ formulation (Eq 9.4, 9.5) with $w_1 = 1$, $w_2 = 1$, $w_0 = -1.5$: With $x_0 = 1$, the condition for output $1$ is: $x_1 + x_2 - 1.5 > 0$.
- Hyperplane / Separating Line (Eq. 9.7): The boundary between the two output classes is where $x_1 + x_2 - 1.5 = 0$. (The text’s Eq 9.6, $d(\mathbf{x}) = x_1 + x_2 - 1.5$, likely refers to the value before thresholding or the equation of the separating line itself, not the final binary output.)
Visual Representation
- Figure 9.4 Description: Shows a 2D plot with axes $x_1$ and $x_2$. The four input points $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$ are plotted. Points $(0,0)$, $(0,1)$, $(1,0)$ are labeled with output $0$. Point $(1,1)$ is labeled with output $1$. The line $x_1 + x_2 = 1.5$ is drawn, correctly separating $(1,1)$ from the other points.
- Figure 9.5 Description: Depicts the AND neuron structure. Inputs $x_1$ and $x_2$ connect with weights $w_1 = 1$ and $w_2 = 1$. A bias input $1$ (implicitly understood or sometimes drawn) connects with weight $w_0 = -1.5$. These feed into the neuron body (circle), which contains the threshold logic (represented by $\theta = 1.5$ inside or associated with it). The output is $y$.
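A quick check of this AND neuron, reusing the `neuron_output` sketch above (bias as the first weight):

```python
# AND neuron: w1 = w2 = 1, bias w0 = -1.5 (i.e., threshold 1.5)
and_weights = [-1.5, 1.0, 1.0]
for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", neuron_output(and_weights, (x1, x2)))
# Only (1, 1) should produce 1; the other three input pairs produce 0.
```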
9.2.2 An OR Neuron
OR Function
- The logical OR function outputs $1$ if at least one input is $1$.
- Truth Table (Implied):
| Input $x_1$ | Input $x_2$ | Weighted Sum ($s = x_1 + x_2$) | Output |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 2 | 1 |
Implementing OR with a Neuron
- Weights: Again, let $w_1 = 1$ and $w_2 = 1$. The weighted sum is $s = x_1 + x_2$.
- Threshold Selection: Choose $\theta$ such that inputs $(0,1)$, $(1,0)$, $(1,1)$ (where $s \geq 1$) activate the neuron, but $(0,0)$ (where $s = 0$) does not. A value like $\theta = 0.5$ works (any $0 < \theta < 1$ works).
- Decision Boundary: Using $w_1 = 1$, $w_2 = 1$, $w_0 = -0.5$: The condition for output $1$ is: $x_1 + x_2 - 0.5 > 0$.
- Hyperplane / Separating Line: $x_1 + x_2 - 0.5 = 0$.
Visual Representation
- Figure 9.6 Description: Shows a 2D plot similar to Fig 9.4, but for the OR function. Point $(0,0)$ is labeled with output $0$. Points $(0,1)$, $(1,0)$, $(1,1)$ are labeled with output $1$. The line $x_1 + x_2 = 0.5$ is drawn, correctly separating $(0,0)$ from the other points.
- Figure 9.7 Description: Depicts the OR neuron structure. Inputs $x_1$, $x_2$ connect with weights $w_1 = w_2 = 1$. The bias weight is $w_0 = -0.5$. The threshold logic inside the neuron circle is represented by $\theta = 0.5$. The output is $y$.
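The same check for the OR neuron (again reusing `neuron_output`):

```python
# OR neuron: w1 = w2 = 1, bias w0 = -0.5 (i.e., threshold 0.5)
or_weights = [-0.5, 1.0, 1.0]
for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", neuron_output(or_weights, (x1, x2)))
# Only (0, 0) should produce 0; the other three input pairs produce 1.
```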
Terminology
- An artificial neuron is also called a node in an ANN and is typically represented by a circle.
9.3 Perceptron
Limitations of Fixed Neurons
- The AND/OR neurons discussed above perform binary linear classification, but their weights ($w_i$) and threshold ($\theta$, or equivalently the bias $w_0$) were predetermined by a human designer.
- Biological neurons can learn from experience and memorize.
Goal: Learnable Artificial Neurons
- Desirable for artificial neurons to also learn and memorize.
- Mechanism: Feed the neuron a set of known (pre-labeled) data (training set) and use an algorithm to learn the weights ($w_i$) automatically.
- Learning Criterion: Typically involves minimizing the total error between the neuron’s output and the known correct output.
The Perceptron Model
- The Perceptron is an early model of a learnable artificial neuron.
- Decision Function Recap (Eq. 9.8): The output before thresholding (often denoted $y$ in learning contexts) is the weighted sum including the bias: $y = \sum_{i=0}^{n} w_i x_i$ (assuming $x_0 = 1$; $w_0$ is the bias).
Training the Perceptron
1. Training Data
- Collect a training set $\{(\mathbf{x}_k, d_k)\}_{k=1}^{N}$.
- $\mathbf{x}_k$ is the feature vector of the $k$-th training sample.
- $d_k$ is the desired (correct/target) output for sample $k$. (Often binary: $0$ or $1$, or sometimes $-1$ or $+1$.)
2. Error Function
- The goal is to minimize the error between the predicted output ($y_k$) and the desired output ($d_k$).
- Common error measure: Mean Squared Error (MSE) or Sum of Squared Errors (SSE).
- MSE Formula (Eq. 9.9): $E = \frac{1}{2N} \sum_{k=1}^{N} (d_k - y_k)^2$, where $y_k$ is the neuron’s output before thresholding for sample $k$. (The factor $\frac{1}{2}$ simplifies the derivative.)
3. Minimization Algorithm: Steepest Descent (Gradient Descent)
- Adjust weightsto minimize the error.
- Follow the direction opposite to the gradient of the error function.
- Gradient Vector (Eq. 9.10): The direction of steepest ascent is given by: $\nabla E = \left( \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n} \right)$.
- Weight Update Rule (General): To decrease the error, move in the direction opposite to the gradient: $\mathbf{w} \leftarrow \mathbf{w} - \eta \nabla E$, where $\eta$ is a small positive constant called the learning rate.
4. Calculating the Gradient
- The partial derivative of the error $E$ with respect to a weight $w_i$ is needed.
- Using the chain rule on Eq. 9.9: $\frac{\partial E}{\partial w_i} = -\frac{1}{N} \sum_{k=1}^{N} (d_k - y_k) \frac{\partial y_k}{\partial w_i}$.
- Since $y_k = \sum_{j=0}^{n} w_j x_{k,j}$, the derivative is $\frac{\partial y_k}{\partial w_i} = x_{k,i}$ (assuming $x_{k,0} = 1$ for the bias term).
- So, $\frac{\partial E}{\partial w_i} = -\frac{1}{N} \sum_{k=1}^{N} (d_k - y_k)\, x_{k,i}$. This is for the total error over all samples (Batch Gradient Descent).
5. Online/Stochastic Gradient Descent
- Often, weights are updated after processing each training sample. The error for a single sample is $E_k = \frac{1}{2} (d_k - y_k)^2$.
- Partial Derivative for Single Sample (Eq. 9.11): $\frac{\partial E_k}{\partial w_i} = -(d_k - y_k)\, x_{k,i}$.
- Weight Update Rule (Online): $w_i \leftarrow w_i + \eta\, (d_k - y_k)\, x_{k,i}$.
Perceptron Learning Algorithm (MSE version, Online)
- Initialization: Choose an initial weight vector $\mathbf{w}$ (e.g., small random values) and a positive learning rate $\eta$.
- Iteration: For each training sample $(\mathbf{x}_k, d_k)$: a. Compute the predicted output (before thresholding): $y_k = \sum_{i=0}^{n} w_i x_{k,i}$ (where $x_{k,0} = 1$). b. Update each weight: $w_i \leftarrow w_i + \eta\, (d_k - y_k)\, x_{k,i}$ for $i = 0, 1, \dots, n$.
- Repeat: Repeat Step 2 (cycling through all training samples, potentially multiple times - called epochs) until the weights $w_i$ stop changing significantly or another stopping criterion is met.
(Note: The original Perceptron algorithm by Rosenblatt used a slightly different update rule based directly on the thresholded output, only updating weights if misclassification occurred. The version presented here uses gradient descent on the squared error of the unthresholded output.)
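A minimal sketch of this online, MSE-style training loop (function and variable names are illustrative, not from the text; targets are coded as $-1$/$+1$, one of the options mentioned above):

```python
import random

def train_perceptron(samples, eta=0.1, epochs=200):
    """Online gradient descent on the squared error of the unthresholded
    output, as in Section 9.3. `samples` is a list of (features, target) pairs."""
    n = len(samples[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]  # w[0] is the bias
    for _ in range(epochs):
        for features, d in samples:
            x = [1.0] + list(features)                     # constant input x0 = 1
            y = sum(wi * xi for wi, xi in zip(w, x))       # output before thresholding
            for i in range(n + 1):
                w[i] += eta * (d - y) * x[i]               # w_i <- w_i + eta (d - y) x_i
    return w

# Example: learning the AND function with targets -1/+1.
and_samples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), +1)]
print(train_perceptron(and_samples))  # should settle near the hand-set weights (-1.5, 1, 1)
```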
9.4 Nonlinear Neural Network
Limitations of Single Perceptrons
- A perceptron (essentially a two-layer network: input layer, output layer) can only learn linear decision boundaries (lines in 2D, planes in 3D, hyperplanes in higher dimensions).
- It can only classify data that is linearly separable.
- Cannot Classify Non-linear Data: Problems like XOR or datasets with convex/complex shapes cannot be solved by a single perceptron.
- Figure 9.8 Description (XOR): Shows the XOR problem in 2D (axes $x_1$, $x_2$). Points $(0,0)$ and $(1,1)$ have output $0$. Points $(0,1)$ and $(1,0)$ have output $1$. No single straight line can separate the $1$s from the $0$s.
- Figure 9.9 Description (Convex Region): Illustrates a scenario where one class (‘o’) forms a cluster in the center, surrounded by another class (‘x’). A single straight line cannot isolate the central cluster.
Overcoming Limitations: Multi-Layer Networks
Representing Complex Regions
- Convex Regions: The boundary of a convex region can be approximated by the intersection (ANDing) of multiple half-planes. Each half-plane is defined by a linear inequality (a line/hyperplane).
- Non-Convex Regions: A non-convex region can be approximated by the union (ORing) of multiple convex regions.
Example: Solving XOR with Multiple Neurons
- The XOR function is non-linear. It can be solved by combining the outputs of multiple linear units.
- Strategy: Use two lines to separate the space.
- Line 1 (OR-like): Separates $(0,0)$ from the rest. Inequality: $x_1 + x_2 - 0.5 > 0$ (activates if at least one input is $1$).
- Line 2 (NAND-like): Separates $(1,1)$ from the rest. Inequality: $-x_1 - x_2 + 1.5 > 0$ (equivalent to $x_1 + x_2 < 1.5$; activates unless both inputs are $1$).
- Combining Lines (Eq. 9.13 based): The XOR output is $1$ if Line 1 is active AND Line 2 is active. This occurs only for inputs $(0,1)$ and $(1,0)$.
- Neuron 1 implements: $h_1 = 1$ if $x_1 + x_2 - 0.5 > 0$.
- Neuron 2 implements: $h_2 = 1$ if $-x_1 - x_2 + 1.5 > 0$.
- Final Output Neuron implements: $y = h_1$ AND $h_2$ (e.g., $y = 1$ if $h_1 + h_2 - 1.5 > 0$).
Three-Layer Network for XOR (Reference: Fig. 9.10)
- Structure:
- Input Layer: Nodes for $x_1$ and $x_2$ (and an implicit bias node).
- Hidden Layer: Contains neurons that create the separating lines (Neuron H1, Neuron H2). Includes a bias node.
- Output Layer: Contains a neuron that combines the hidden layer outputs (AND function).
- Figure 9.10 Description: Shows the network.
- Inputs $x_1$, $x_2$. Bias input $1$.
- Hidden Neuron H1 (Circle): Receives inputs $x_1$, $x_2$, and the bias with weights $1$, $1$, $-0.5$. Output $h_1$. (Implements $x_1 + x_2 - 0.5 > 0$.)
- Hidden Neuron H2 (Circle): Receives inputs $x_1$, $x_2$, and the bias with weights $-1$, $-1$, $1.5$. Output $h_2$. (Implements $-x_1 - x_2 + 1.5 > 0$.)
- Output Neuron (Circle labeled AND): Receives inputs $h_1$, $h_2$, and the bias with weights $1$, $1$, $-1.5$ (implementing AND with threshold $1.5$). Output $y$.
- This network correctly computes XOR.
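A small sketch of this hand-wired XOR network, using a step activation and the weights given above (names are illustrative):

```python
def step(s):
    """Binary threshold: 1 if the net input is positive, else 0."""
    return 1 if s > 0 else 0

def xor_network(x1, x2):
    """Three-layer network of Fig. 9.10 with the hand-set weights from the text."""
    h1 = step(x1 + x2 - 0.5)      # hidden neuron H1: OR-like line
    h2 = step(-x1 - x2 + 1.5)     # hidden neuron H2: NAND-like line
    return step(h1 + h2 - 1.5)    # output neuron: h1 AND h2

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", xor_network(x1, x2))   # 1 only for (0,1) and (1,0)
```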
Classifying General Convex Data (Reference: Fig. 9.11)
- Use a three-layer network.
- Hidden Layer: Contains multiple neurons. Each neuron learns a linear boundary (half-plane). The combination of these boundaries approximates the convex region. More hidden neurons allow for smoother/more complex convex boundaries.
- Output Layer: Typically a single neuron performing an AND-like operation on the hidden layer outputs.
- Figure 9.11 Description: Generalizes Fig 9.10. Input layer ($x_1, \dots, x_n$, bias $1$). Hidden layer with $m$ neurons ($h_1, \dots, h_m$, bias $1$). Output layer (1 neuron labeled AND). Connections have generic weights ($w_{ij}$, etc.).
Classifying Non-Convex Data (Reference: Fig. 9.12, 9.13)
- Requires combining multiple convex regions using OR logic.
- Use a four-layer network.
- Structure:
- Layer 1: Input Layer.
- Layer 2 (Hidden Layer 1): Neurons create the linear boundaries (half-planes) needed for all the convex regions.
- Layer 3 (Hidden Layer 2): Neurons act as AND gates, combining outputs from Layer 2 to represent individual convex regions.
- Layer 4: Output Layer: A single neuron acts as an OR gate, combining the outputs from Layer 3 (the convex region detectors).
- Figure 9.12 Description: Shows a non-convex region (black dots) that can be covered by the union of three simpler (convex) regions.
- Figure 9.13 Description: Generalizes the four-layer structure. Input layer ($x_1, \dots, x_n$, bias $1$). Hidden Layer 1 ($h_1, \dots, h_m$, bias $1$). Hidden Layer 2 ($c_1, \dots, c_p$, representing convex regions). Output Layer (1 neuron labeled OR). Connections have generic weights. (A small sketch of this layered logic follows below.)
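A minimal sketch of the four-layer idea (not from the text), reusing the `step()` helper from the XOR sketch above: layer 2 evaluates half-planes, layer 3 ANDs selected half-planes into convex regions, and layer 4 ORs the regions together. The half-plane triples and region index lists below are illustrative:

```python
def four_layer_classifier(x, half_planes, regions):
    """`half_planes` is a list of (w1, w2, w0) triples; `regions` lists, for each
    convex region, the indices of the half-planes that must all be active."""
    h = [step(w1 * x[0] + w2 * x[1] + w0) for (w1, w2, w0) in half_planes]    # layer 2
    r = [step(sum(h[i] for i in idx) - (len(idx) - 0.5)) for idx in regions]  # layer 3: AND
    return step(sum(r) - 0.5)                                                 # layer 4: OR

# Example: the union of two unit squares, [0,1]x[0,1] and [2,3]x[2,3] (a non-convex set).
planes = [(1, 0, 0.0), (-1, 0, 1.0), (0, 1, 0.0), (0, -1, 1.0),    # square 1
          (1, 0, -2.0), (-1, 0, 3.0), (0, 1, -2.0), (0, -1, 3.0)]  # square 2
regions = [[0, 1, 2, 3], [4, 5, 6, 7]]
print(four_layer_classifier((0.5, 0.5), planes, regions))   # 1 (inside square 1)
print(four_layer_classifier((1.5, 1.5), planes, regions))   # 0 (in neither square)
```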
Conclusion on Network Depth
- This structure indicates that any non-linear classification problem can theoretically be solved by a four-layer neural network. (In practice, networks with fewer layers might suffice, especially with more complex activation functions).
9.5 Activation and Inhibition
Importance of Activation Functions
- The activation (or threshold) function is crucial.
- A network without a non-linear activation function (i.e., only using weighted sums) is equivalent to a simple two-layer linear network, regardless of how many layers it has. It cannot separate non-linear data.
Problems with Binary Threshold
- Discontinuity: The binary step function (output $0$ or $1$) is not continuous.
- Learning Issues: This discontinuity poses problems for gradient-based learning (like the MSE method in Section 9.3):
- The gradient is zero almost everywhere, except at the threshold point where it’s undefined.
- Small changes in weights might not be enough to change the neuron’s output (flip it across the threshold).
- This can lead to very slow convergence or failure to converge.
9.5.1 Sigmoid Activation
Need for Continuous Activation
- Desirable to have a continuous activation function that approximates the $0$-to-$1$ jump smoothly.
The Sigmoid Function
- A widely used continuous activation function. It has an S-shape.
- Formula (Eq. 9.14): $f(s) = \frac{1}{1 + e^{-s}}$, where $s$ is the net input to the neuron (the weighted sum including the bias: $s = \sum_{i=0}^{n} w_i x_i$).
- Figure 9.14 Description: Plots the function $f(s)$ versus $s$. The curve smoothly transitions from $0$ (for large negative $s$) to $1$ (for large positive $s$), passing through $f(0) = 0.5$.
Properties of Sigmoid Function
- Limit at $-\infty$: $\lim_{s \to -\infty} f(s) = 0$.
- Limit at $+\infty$: $\lim_{s \to +\infty} f(s) = 1$.
- Value at Origin: $f(0) = 0.5$ (or $\frac{1}{2}$).
- Derivative (Eq. 9.15): The derivative is conveniently expressed in terms of the function itself: $f'(s) = f(s)\,(1 - f(s))$.
- Derivative Behavior: The derivative $f'(s)$ is maximal at $s = 0$ (where $f'(0) = 0.25$) and approaches $0$ as $|s| \to \infty$ (where $f(s)$ approaches $0$ or $1$). This property is important for backpropagation (covered later), as it means learning slows down when the neuron’s output is saturated (very close to $0$ or $1$).
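A short sketch of the sigmoid and its derivative (Eq. 9.14 and 9.15):

```python
import math

def sigmoid(s):
    """Eq. 9.14: f(s) = 1 / (1 + e^(-s)), a smooth S-shaped activation."""
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_derivative(s):
    """Eq. 9.15: f'(s) = f(s) * (1 - f(s)); maximal at s = 0, near 0 when saturated."""
    f = sigmoid(s)
    return f * (1.0 - f)

print(sigmoid(0.0), sigmoid_derivative(0.0))    # 0.5, 0.25
print(sigmoid(10.0), sigmoid_derivative(10.0))  # ~1.0, ~0 (saturated: learning slows)
```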
9.5.2 Shunting Inhibition
Biological Background
- Biological neurons can be excitatory or inhibitory.
- An inhibitory signal prevents or reduces the likelihood of the receiving neuron firing.
Mathematical Modeling: Shunting Inhibition (SI)
- Represents the inhibitory effect by division, effectively reducing the excitatory potential.
- Mechanism: The neuron learns two sets of weights:
- Excitatory weights ()
- Inhibitory weights ()
- The final output involves dividing the result from the excitatory pathway by a term derived from the inhibitory pathway.
SI Neuron Structure (Reference: Fig. 9.15)
- Dual Pathways: Input signals $x_1, \dots, x_n$ feed into two parallel processing pathways.
- Excitatory Pathway (Top):
- Compute weighted sum: $\sum_{j=1}^{n} w_j x_j + w_0$ (where $w_0$ is the excitatory bias).
- Apply activation function: $f\left(\sum_{j=1}^{n} w_j x_j + w_0\right)$.
- Inhibitory Pathway (Bottom):
- Compute weighted sum: $\sum_{j=1}^{n} c_j x_j + c_0$ (where $c_0$ is the inhibitory bias).
- Apply activation function: $g\left(\sum_{j=1}^{n} c_j x_j + c_0\right)$.
- Add a passive decay rate: $a$.
- Final Output (Division): The output of the excitatory pathway is divided by the result from the inhibitory pathway (including decay).
- Figure 9.15 Description: Shows inputs $x_1, \dots, x_n$. The top path uses weights $w_j$, sums them ($\Sigma$), and passes through activation $f$. The bottom path uses weights $c_j$, sums them ($\Sigma$), passes through activation $g$, then adds the decay $a$ (using a ’+’ node). A final division node ($\div$) takes the output of $f$ (numerator) and the output of ($a + g$) (denominator) to produce the final output.
Mathematical Formula for SI Neuron Output (Eq. 9.16)
- $y = \frac{f\left(\sum_{j=1}^{n} w_j x_j + w_0\right)}{a + g\left(\sum_{j=1}^{n} c_j x_j + c_0\right)}$
- Where:
- $y$: Output of the shunting inhibitory neuron.
- $x_j$: The $j$-th input ($j = 1, \dots, n$).
- $w_j$: Excitatory connection weight for the $j$-th input.
- $c_j$: Inhibitory connection weight for the $j$-th input.
- $w_0$: Excitatory bias (assuming $x_0 = 1$).
- $c_0$: Inhibitory bias (assuming $x_0 = 1$).
- $a$: Passive decay rate (a positive constant).
- $f, g$: Activation functions (can be different).
- $n$: Number of inputs (excluding the bias).
- Condition: The denominator must be positive: $a + g\left(\sum_{j=1}^{n} c_j x_j + c_0\right) > 0$.
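A minimal sketch of a single SI neuron following Eq. 9.16 as reconstructed above (parameter names are illustrative; $\tanh$ and $\exp$ are the example activations mentioned below):

```python
import math

def si_neuron(x, w, c, w0, c0, a, f=math.tanh, g=math.exp):
    """Shunting inhibitory neuron (Eq. 9.16): excitatory pathway divided by
    (passive decay + inhibitory pathway)."""
    excitatory = f(sum(wj * xj for wj, xj in zip(w, x)) + w0)
    inhibitory = g(sum(cj * xj for cj, xj in zip(c, x)) + c0)
    denominator = a + inhibitory
    assert denominator > 0, "denominator must be positive"
    return excitatory / denominator

# Example call with arbitrary illustrative values.
print(si_neuron(x=[0.5, -0.2], w=[1.0, 0.3], c=[0.4, 0.1], w0=0.1, c0=0.0, a=1.0))
```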
Shunting Inhibitory ANN (SIANN)
- An ANN constructed using SI neurons.
- Key Feature: Often uses different activation functions $f$ and $g$ within a layer. This allows the network dynamics to selectively activate only the neurons receiving the strongest relative excitatory input compared to inhibitory input.
- Example: Using $f = \tanh$ (the hyperbolic tangent) and $g = \exp$ (the exponential function) has shown good convergence properties in experiments.