Advanced Modeling and Control
Artificial Intelligence (AI): systems capable of performing tasks that typically require human intelligence
Typical tasks: learning from experience, understanding natural language, recognizing patterns, solving problems, and making decisions
Encompasses techniques like machine learning, neural networks, natural language processing, and robotics
Machine Learning (ML): A subset of artificial intelligence that involves training algorithms to make predictions or decisions based on data
Deep Learning (DL): A specialized subset of ML focused on neural networks with many layers (“deep”)
Artificial Neural Networks (ANN): A specific model within ML, inspired by the human brain’s structure
ANNs are a class of machine learning models inspired by the human brain’s structure and function. They consist of interconnected units called neurons, organized in layers
Brains and computers operate fundamentally differently from each other
The brain is composed of neurons
ANNs are constructed and implemented to loosely model the human brain
Many neuron types exist.
Interactions can be electrical or chemical.
Different types of neuron connections (e.g., axodendritic, dendrodendritic).
Key Characteristics of Neuronal Processing:
Neurons receive multiple inputs.
Inputs are modified by weights.
Neurons sum weighted inputs.
Neurons transmit output signals.
Outputs connect to other neurons.
Local Processing: Information is processed locally within the neuron.
Distributed Memory: Short-term = signals; Long-term = weights.
Learning: Weights adjust through experience.
Flexibility: Neurons can generalize and are fault-tolerant.
Universal Approximation Theorem
Any continuous function h(x) can be approximated to arbitrary accuracy by an ANN with one hidden layer, given an appropriate non-linear activation and sufficiently many neurons; ANNs can thus model any complex, continuous function.
ANNs consist of layers of neurons (i.e., computational units), including one or more hidden layers
A single neuron maps a set of inputs to a single output number, i.e., f: \mathbb{R}^K \rightarrow \mathbb{R}
z = a_1 w_1 + a_2 w_2 + \ldots + a_K w_K + b; \qquad a = \sigma(z)
Hidden layer
h = \sigma(w_1 x + b_1)
Output layer
y = \sigma(w_2 h + b_2)
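A minimal numpy sketch of this forward pass (the 3-4-2 layer sizes and random values are illustrative assumptions):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Illustrative sizes: 3 inputs, 4 hidden neurons, 2 outputs
    rng = np.random.default_rng(0)
    x = rng.standard_normal(3)                         # input vector
    W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
    W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)

    h = sigmoid(W1 @ x + b1)                           # hidden layer: h = sigma(W1 x + b1)
    y = sigmoid(W2 @ h + b2)                           # output layer: y = sigma(W2 h + b2)
    print(y)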
Parameter count example (the counts are consistent with, e.g., a 3-input, 4-hidden-neuron, 2-output network):
Neurons: 6
Weights: 20
Biases: 6
Total: 26 learnable parameters
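A short sketch showing how such counts arise for any fully connected layout (the count_parameters helper is hypothetical; [3, 4, 2] is one layout consistent with the numbers above):

    # Count weights and biases of a fully connected network.
    # layer_sizes = [inputs, hidden..., outputs]
    def count_parameters(layer_sizes):
        weights = sum(m * n for m, n in zip(layer_sizes, layer_sizes[1:]))
        biases = sum(layer_sizes[1:])   # one bias per non-input neuron
        return weights, biases

    w, b = count_parameters([3, 4, 2])
    print(w, b, w + b)                  # 20 6 26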
Activation function: a mathematical function that determines the output of a neuron.
Introduces non-linearity, enabling the network to model complex patterns.
Without it, neural networks would only perform linear transformations.
Common Activation Functions
Sigmoid: outputs values between 0 and 1; used in binary classification.
\sigma(x) = \frac{1}{1 + e^{-x}}
ReLU (Rectified Linear Unit): outputs the input if positive, otherwise 0; common in deep networks.
f(x) = \max(0, x)
Tanh (Hyperbolic Tangent): outputs values between -1 and 1; centers data around zero.
\tanh(x) = \frac{2}{1 + e^{-2x}} - 1
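A small numpy sketch of the three functions above (illustrative, matching the formulas given):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))                # output in (0, 1)

    def relu(x):
        return np.maximum(0.0, x)                      # input if positive, else 0

    def tanh(x):
        return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0    # output in (-1, 1); equals np.tanh(x)

    x = np.array([-2.0, 0.0, 2.0])
    print(sigmoid(x), relu(x), tanh(x))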
Weights
Parameters that transform input data in the network; each connection has an associated weight.
Weights determine the influence of each input on the output; higher weights mean stronger influence.
Bias
An additional parameter that shifts the activation function, helping the model fit the data better.
Similar to the y-intercept in a linear equation, it allows the activation to shift left or right.
Threshold
The value at which a neuron’s activation function decides whether to fire (produce an output).
In binary threshold functions, if the weighted sum exceeds the threshold, the neuron activates (outputs 1); otherwise, it doesn’t (outputs 0).
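A minimal sketch of such a binary threshold neuron (the weights, bias, and threshold values are illustrative assumptions):

    import numpy as np

    def threshold_neuron(inputs, weights, bias, threshold=0.0):
        z = np.dot(inputs, weights) + bias   # weighted sum of inputs plus bias
        return 1 if z > threshold else 0     # fire only above the threshold

    w = np.array([0.6, 0.6])
    print(threshold_neuron(np.array([1.0, 0.0]), w, bias=-0.5))  # 0.1 > 0   -> 1
    print(threshold_neuron(np.array([0.0, 0.0]), w, bias=-0.5))  # -0.5 <= 0 -> 0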
Learning Rate
Step size used when adjusting weights and biases; too large a value can overshoot minima, too small a value slows convergence.
Momentum Factor
Fraction of the previous weight update added to the current one, smoothing oscillations and accelerating gradient descent.
Loss function
Measure of the discrepancy between the network's predictions and the target values; training seeks to minimize it.
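A standard textbook form of the momentum update (an assumption, not stated in the original; \eta is the learning rate, \beta the momentum factor, J the loss):
v_t = \beta v_{t-1} - \eta \nabla_{\theta} J(\theta_t); \qquad \theta_{t+1} = \theta_t + v_t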
Training
Forward Propagation
Passing input data through the network to produce an output, utilizing weights, biases, and activation functions.
Backpropagation
Propagating the error backwards through the network to compute gradients of the loss with respect to each weight and bias, which are then adjusted so the model learns and improves over iterations.
Adjusting weights and biases
Move in the negative direction of the error (cost) function’s slope until a minimum error value is reached
\Rightarrow Gradient descent
Optimization algorithm used to minimize the cost function in machine learning
Define a cost function (also called loss function or objective function) that measures the prediction error.
Calculate the gradient (partial derivatives) of the cost function with respect to each parameter
Adjust the parameters in the opposite direction of the gradient to decrease the cost function. Iterate until the cost function reaches a minimum
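A minimal sketch of these steps for a one-parameter quadratic cost (the cost function and learning rate are illustrative assumptions):

    # Gradient descent on J(w) = (w - 3)^2, whose minimum is at w = 3
    def grad(w):
        return 2.0 * (w - 3.0)       # dJ/dw

    w, lr = 0.0, 0.1                 # initial parameter and learning rate
    for _ in range(100):
        w -= lr * grad(w)            # step opposite the gradient
    print(w)                         # approaches 3.0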
Variants: batch, stochastic (SGD), and mini-batch gradient descent; adaptive methods such as Adam and RMSprop.
Overfitting occurs when a machine learning model learns not only the underlying patterns in the training data but also the noise and random fluctuations.
The model fits the training data too closely, capturing even the minor details and noise.
The model performs very well on the training data but poorly on new, unseen data
Causes
Overly Complex Model: Using a model with too many parameters or features relative to the amount of training data.
Insufficient Training Data: When there isn't enough data to represent the true underlying patterns, the model may learn noise as if it were a pattern.
Training Too Long: Running too many iterations, allowing the model to adjust to even the smallest noise in the data.
Prevention
Simplifying the Model: Reducing the number of features or parameters in the model.
Cross-Validation: Evaluating on held-out data to ensure the model generalizes well to unseen data.
k-fold: Split the data into k subsets; train on k − 1 folds and validate on the remaining fold, rotating through all k folds.
Regularization: Adding a penalty to the cost function for large coefficients.
Early Stopping: Halting the training process when performance on a validation set starts to degrade, even if the training performance is still improving.
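A minimal sketch of early stopping (train_step and val_loss are hypothetical placeholders for one epoch of training and validation-set evaluation):

    # Stop when validation loss has not improved for `patience` epochs.
    def train_with_early_stopping(model, train_step, val_loss, patience=5, max_epochs=200):
        best, wait = float("inf"), 0
        for epoch in range(max_epochs):
            train_step(model)            # hypothetical: one epoch of training
            loss = val_loss(model)       # hypothetical: loss on the validation set
            if loss < best:
                best, wait = loss, 0     # improvement: reset the patience counter
            else:
                wait += 1
                if wait >= patience:     # degradation persisted: stop training
                    break
        return model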
Perceptron & Feed-Forward Networks (FFN)
Basic prediction tasks, such as property estimation, reaction rate predictions, and process optimization (a minimal sketch follows this list).
Recurrent Neural Networks (RNN) & Long Short-Term Memory (LSTM)
Modeling dynamic systems, time-series data, and sequential processes, such as reactor dynamics and control systems.
Convolutional Neural Networks (CNN)
Primarily in image analysis related to process monitoring (e.g., analyzing images from reactors, identifying patterns in images of materials).
Autoencoders & Variational Autoencoders (VAE)
Dimensionality reduction, feature extraction, fault detection, and process monitoring.
Generative Adversarial Networks (GAN)
Generating synthetic data for simulations, improving the robustness of models, and process optimization.
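A minimal PyTorch sketch of a feed-forward network for a basic prediction task, as in the first entry of the list above (the framework choice, layer sizes, and PropertyFFN name are illustrative assumptions):

    import torch
    import torch.nn as nn

    # Hypothetical property-estimation model: 8 process variables in, 1 property out
    class PropertyFFN(nn.Module):
        def __init__(self, n_inputs=8, n_hidden=16, n_outputs=1):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_inputs, n_hidden),
                nn.ReLU(),                       # non-linear activation
                nn.Linear(n_hidden, n_outputs),
            )

        def forward(self, x):
            return self.net(x)

    model = PropertyFFN()
    x = torch.randn(4, 8)        # batch of 4 illustrative samples
    print(model(x).shape)        # torch.Size([4, 1])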
ANNs are powerful tools for modeling complex, non-linear relationships in data.
ANNs are applied to a wide range of chemical engineering problems, from static predictions to dynamic process control.
Key Concepts:
Overfitting can be prevented by simplifying the model, using cross-validation, applying regularization, and stopping training early before the network fits noise
Several network architectures exist, each suited to a different class of problems (static prediction, sequential data, images, generative tasks)
\Rightarrow Successfully designing, training, and deploying ANNs often requires deep domain knowledge and experience in machine learning.