Advanced Modeling and Control
Artificial Intelligence (AI): systems capable of performing tasks that typically require human intelligence
Typical tasks: learning from experience, understanding natural language, recognizing patterns, solving problems, and making decisions
Encompasses techniques like machine learning, neural networks, natural language processing, and robotics
Machine Learning (ML): A subset of artificial intelligence that involves training algorithms to make predictions or decisions based on data
Deep Learning (DL): A specialized subset of ML focused on neural networks with many layers (“deep”)
Artificial Neural Networks (ANN): A specific model within ML, inspired by the human brain’s structure
ANNs are a class of machine learning models inspired by the human brain’s structure and function. They consist of interconnected units called neurons, organized in layers
Brains and computers operate fundamentally differently from each other
The brain is composed of neurons
ANNs are constructed and implemented to loosely model the human brain
Many neuron types exist.
Interactions can be electrical or chemical.
Different types of neuron connections (e.g., axodendritic, dendrodendritic).
Key Characteristics of Neuronal Processing:
Neurons receive multiple inputs.
Inputs are modified by weights.
Neurons sum weighted inputs.
Neurons transmit output signals.
Outputs connect to other neurons.
Local Processing: Information is processed locally within the neuron.
Distributed Memory: Short-term = signals; Long-term = weights.
Learning: Weights adjust through experience.
Flexibility: Neurons can generalize and are fault-tolerant.
Universal Approximation Theorem
Any continuous function h(x) can be approximated to arbitrary accuracy by an ANN with one hidden layer, given an appropriate non-linear activation and sufficiently many neurons; ANNs can thus model any complex, continuous function.
ANNs consist of layers of neurons (i.e., computational units), including one or more hidden layers
A single neuron maps a set of inputs to a single output number, i.e., f: \mathbb{R}^K \rightarrow \mathbb{R}
z = a_1 w_1 + a_2 w_2 + \ldots + a_K w_K + b; \qquad a = \sigma(z)
Hidden layer
h = \sigma(w_1 x + b_1)
Output layer
y = \sigma(w_2 h + b_2)
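A minimal numpy sketch of this forward pass (the 3-4-2 layer sizes and random values are illustrative assumptions):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Illustrative sizes: 3 inputs, 4 hidden neurons, 2 outputs
    rng = np.random.default_rng(0)
    x = rng.standard_normal(3)                         # input vector
    W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
    W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)

    h = sigmoid(W1 @ x + b1)                           # hidden layer: h = sigma(W1 x + b1)
    y = sigmoid(W2 @ h + b2)                           # output layer: y = sigma(W2 h + b2)
    print(y)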
Parameter count example (the counts are consistent with, e.g., a 3-input, 4-hidden-neuron, 2-output network):
Neurons: 6
Weights: 20
Biases: 6
Total: 26 learnable parameters
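A short sketch showing how such counts arise for any fully connected layout (the count_parameters helper is hypothetical; [3, 4, 2] is one layout consistent with the numbers above):

    # Count weights and biases of a fully connected network.
    # layer_sizes = [inputs, hidden..., outputs]
    def count_parameters(layer_sizes):
        weights = sum(m * n for m, n in zip(layer_sizes, layer_sizes[1:]))
        biases = sum(layer_sizes[1:])   # one bias per non-input neuron
        return weights, biases

    w, b = count_parameters([3, 4, 2])
    print(w, b, w + b)                  # 20 6 26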
Activation function: a mathematical function that determines the output of a neuron.
Introduces non-linearity, enabling the network to model complex patterns.
Without it, neural networks would only perform linear transformations.
Common Activation Functions
Sigmoid: outputs values between 0 and 1; used in binary classification.
\sigma(x) = \frac{1}{1 + e^{-x}}
ReLU (Rectified Linear Unit): outputs the input if positive, otherwise 0; common in deep networks.
f(x) = \max(0, x)
Tanh (Hyperbolic Tangent): outputs values between -1 and 1; centers data around zero.
\tanh(x) = \frac{2}{1 + e^{-2x}} - 1
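A small numpy sketch of the three functions above (illustrative, matching the formulas given):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))                # output in (0, 1)

    def relu(x):
        return np.maximum(0.0, x)                      # input if positive, else 0

    def tanh(x):
        return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0    # output in (-1, 1); equals np.tanh(x)

    x = np.array([-2.0, 0.0, 2.0])
    print(sigmoid(x), relu(x), tanh(x))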
Weights
Parameters that transform input data in the network; each connection has an associated weight.
Weights determine the influence of each input on the output; higher weights mean stronger influence.
Bias
An additional parameter that shifts the activation function, helping the model fit the data better.
Similar to the y-intercept in a linear equation, it allows the activation to shift left or right.
Threshold
The value at which a neuron’s activation function decides whether to fire (produce an output).
In binary threshold functions, if the weighted sum exceeds the threshold, the neuron activates (outputs 1); otherwise, it doesn’t (outputs 0).
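A minimal sketch of such a binary threshold neuron (the weights, bias, and threshold values are illustrative assumptions):

    import numpy as np

    def threshold_neuron(inputs, weights, bias, threshold=0.0):
        z = np.dot(inputs, weights) + bias   # weighted sum of inputs plus bias
        return 1 if z > threshold else 0     # fire only above the threshold

    w = np.array([0.6, 0.6])
    print(threshold_neuron(np.array([1.0, 0.0]), w, bias=-0.5))  # 0.1 > 0   -> 1
    print(threshold_neuron(np.array([0.0, 0.0]), w, bias=-0.5))  # -0.5 <= 0 -> 0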
Learning Rate
Step size used when adjusting weights and biases; too large a value can overshoot minima, too small a value slows convergence.
Momentum Factor
Fraction of the previous weight update added to the current one, smoothing oscillations and accelerating gradient descent.
Loss function
Measure of the discrepancy between the network's predictions and the target values; training seeks to minimize it.
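A standard textbook form of the momentum update (an assumption, not stated in the original; \eta is the learning rate, \beta the momentum factor, J the loss):
v_t = \beta v_{t-1} - \eta \nabla_{\theta} J(\theta_t); \qquad \theta_{t+1} = \theta_t + v_t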
Training
Forward Propagation
Passing input data through the network to produce an output, utilizing weights, biases, and activation functions.
Backpropagation
Propagating the error backwards through the network to compute gradients of the loss with respect to each weight and bias, which are then adjusted so the model learns and improves over iterations.
Adjusting weights and biases
Move in the negative direction of the error (cost) function’s slope until a minimum error value is reached
\Rightarrow Gradient descent
Optimization algorithm used to minimize the cost function in machine learning
Define a cost function (also called loss function or objective function) that measures the prediction error.
Calculate the gradient (partial derivatives) of the cost function with respect to each parameter
Adjust the parameters in the opposite direction of the gradient to decrease the cost function. Iterate until the cost function reaches a minimum
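A minimal sketch of these steps for a one-parameter quadratic cost (the cost function and learning rate are illustrative assumptions):

    # Gradient descent on J(w) = (w - 3)^2, whose minimum is at w = 3
    def grad(w):
        return 2.0 * (w - 3.0)       # dJ/dw

    w, lr = 0.0, 0.1                 # initial parameter and learning rate
    for _ in range(100):
        w -= lr * grad(w)            # step opposite the gradient
    print(w)                         # approaches 3.0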
Variants: batch, stochastic (SGD), and mini-batch gradient descent; adaptive methods such as Adam and RMSprop.
Overfitting occurs when a machine learning model learns not only the underlying patterns in the training data but also the noise and random fluctuations.
The model fits the training data too closely, capturing even the minor details and noise.
The model performs very well on the training data but poorly on new, unseen data
Causes
Overly Complex Model: Using a model with too many parameters or features relative to the amount of training data.
Insufficient Training Data: When there isn't enough data to represent the true underlying patterns, the model may learn noise as if it were a pattern.
Training Too Long: Running too many iterations, allowing the model to adjust to even the smallest noise in the data.
Prevention
Simplifying the Model: Reducing the number of features or parameters in the model.
Cross-Validation: Evaluating on held-out data to ensure the model generalizes well to unseen data.
k-fold: Split the data into k subsets; train on k − 1 folds and validate on the remaining fold, rotating through all k folds.
Regularization: Adding a penalty to the cost function for large coefficients.
Early Stopping: Halting the training process when performance on a validation set starts to degrade, even if the training performance is still improving.
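A minimal sketch of early stopping (train_step and val_loss are hypothetical placeholders for one epoch of training and validation-set evaluation):

    # Stop when validation loss has not improved for `patience` epochs.
    def train_with_early_stopping(model, train_step, val_loss, patience=5, max_epochs=200):
        best, wait = float("inf"), 0
        for epoch in range(max_epochs):
            train_step(model)            # hypothetical: one epoch of training
            loss = val_loss(model)       # hypothetical: loss on the validation set
            if loss < best:
                best, wait = loss, 0     # improvement: reset the patience counter
            else:
                wait += 1
                if wait >= patience:     # degradation persisted: stop training
                    break
        return model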
Perceptron & Feed-Forward Networks (FFN)
Basic prediction tasks, such as property estimation, reaction rate predictions, and process optimization (a minimal sketch follows this list).
Recurrent Neural Networks (RNN) & Long Short-Term Memory (LSTM)
Modeling dynamic systems, time-series data, and sequential processes, such as reactor dynamics and control systems.
Convolutional Neural Networks (CNN)
Primarily in image analysis related to process monitoring (e.g., analyzing images from reactors, identifying patterns in images of materials).
Autoencoders & Variational Autoencoders (VAE)
Dimensionality reduction, feature extraction, fault detection, and process monitoring.
Generative Adversarial Networks (GAN)
Generating synthetic data for simulations, improving the robustness of models, and process optimization.
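A minimal PyTorch sketch of a feed-forward network for a basic prediction task, as in the first entry of the list above (the framework choice, layer sizes, and PropertyFFN name are illustrative assumptions):

    import torch
    import torch.nn as nn

    # Hypothetical property-estimation model: 8 process variables in, 1 property out
    class PropertyFFN(nn.Module):
        def __init__(self, n_inputs=8, n_hidden=16, n_outputs=1):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_inputs, n_hidden),
                nn.ReLU(),                       # non-linear activation
                nn.Linear(n_hidden, n_outputs),
            )

        def forward(self, x):
            return self.net(x)

    model = PropertyFFN()
    x = torch.randn(4, 8)        # batch of 4 illustrative samples
    print(model(x).shape)        # torch.Size([4, 1])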
ANNs are powerful tools for modeling complex, non-linear relationships in data.
ANNs are applied to a wide range of chemical engineering problems, from static predictions to dynamic process control.
Key Concepts:
Overfitting can be prevented by simplifying the model, using cross-validation, applying regularization, and stopping training early before the network fits noise
Several network architectures exist, each suited to a different class of problems (static prediction, sequential data, images, generative tasks)
\Rightarrow Successfully designing, training, and deploying ANNs often requires deep domain knowledge and experience in machine learning.