Objectives
- Demonstrate the use of the System Identification Toolbox in MATLAB to generate and evaluate time series models.
- Develop and compare different time series models (AR, ARX, ARMAX) for forecasting process variables.
- Quantify the forecasting accuracy of the models under step and random inputs.
Problem Statement
A distillation column produces three products: bottom stream, side stream, and distillate. The bottom product impurity must be kept below 5%. If the impurity exceeds this limit, the product is rejected.
The major disturbance affecting the bottom product quality is feed composition fluctuations. Based on a plant step test, the transfer function relating feed composition to bottom impurity is:
Accurate short-term forecasting of impurity is essential to anticipate when the 5% specification may be violated. In this lab, time-series models such as AR, ARX, and ARMAX will be developed and compared for their ability to predict impurity trajectories from process data.
Methodology
- Data Generation
  - Simulate the system response using MATLAB/Simulink with a sampling period of […] unit, over a horizon of […]–80 units.
  - Apply the following step changes in feed composition:
    - Up by 1 unit at […]
    - Down by 2 units at […]
    - Up by 1 unit at […]
    - Up by 1 unit at […]
  - Use data up to […] as the training dataset for model identification.
  - Use data beyond […] as the validation dataset.
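The data-generation steps can also be sketched in plain MATLAB instead of Simulink. This is a minimal sketch under stated assumptions: the sampling period, the step times `tStep`, the split time `tSplit`, and the first-order-plus-dead-time process (`K`, `tau`) are hypothetical placeholders, because the handout's actual values and transfer function are not reproduced here. Only the step sizes (+1, -2, +1, +1), the 80-unit horizon, and the 2-unit dead time come from the text.

```matlab
% ASSUMED placeholder values; replace with the values given in the lab.
Ts     = 1;                  % sampling period (assumed)
t      = (0:Ts:80)';         % time grid ending at 80 units
tStep  = [10 25 40 60];      % ASSUMED step times
dStep  = [1 -2 1 1];         % step sizes from the handout
tSplit = 50;                 % ASSUMED training/validation split time

% Build the piecewise-constant feed-composition input.
u = zeros(size(t));
for i = 1:numel(tStep)
    u(t >= tStep(i)) = u(t >= tStep(i)) + dStep(i);
end

% ASSUMED stand-in process: first-order lag with a 2-unit dead time,
% discretized with a zero-order hold.
K = 1;  tau = 5;  theta = 2;
a  = exp(-Ts/tau);
nk = round(theta/Ts);        % dead time in samples
y  = zeros(size(t));
for k = 2:numel(t)
    kd = k - 1 - nk;         % index of the delayed input sample
    ud = 0;
    if kd >= 1, ud = u(kd); end
    y(k) = a*y(k-1) + K*(1 - a)*ud;
end

% Split into training and validation sets.
idxTrain = t <= tSplit;
uTrain = u(idxTrain);   yTrain = y(idxTrain);
uVal   = u(~idxTrain);  yVal   = y(~idxTrain);
```

With a zero-order hold, a dead time of 2 sampling periods shows up as an input delay of 3 samples in the difference equation, which is worth keeping in mind when choosing the delay order for ARX and ARMAX models.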
- Model Development
  - Using the System Identification Toolbox, develop three model structures:
    - Autoregressive (AR)
    - Autoregressive with Exogenous Input (ARX)
    - Autoregressive Moving Average with Exogenous Input (ARMAX)
  - Test different model orders and delays, guided by the known dead time of 2 units.
  - Compare models using appropriate criteria.
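The model-development steps might look like the following sketch, which requires the System Identification Toolbox. The training data come from an assumed stand-in process (`y(k) = 0.8*y(k-1) + 0.2*u(k-3)` plus small noise) with hypothetical step times, and the model orders are illustrative starting points rather than recommended values.

```matlab
% Stand-in training data (ASSUMED process and step times).
rng(0);
Ts = 1;  N = 51;
t  = (0:N-1)' * Ts;
u  = double(t >= 10) - 2*(t >= 25) + double(t >= 40);
y  = filter([0 0 0 0.2], [1 -0.8], u) + 0.01*randn(N, 1);

data   = iddata(y, u, Ts);    % input-output data for ARX and ARMAX
tsData = iddata(y, [], Ts);   % output-only data for the AR model

% Illustrative orders: [na], [na nb nk], [na nb nc nk].
mAR    = ar(tsData, 2);
mARX   = arx(data, [1 1 3]);
mARMAX = armax(data, [1 1 1 3]);

% Scan the ARX input delay around the known dead time and compare by AIC.
delays = 1:5;
aicARX = zeros(size(delays));
for i = 1:numel(delays)
    aicARX(i) = aic(arx(data, [1 1 delays(i)]));
end
[~, iBest] = min(aicARX);
fprintf('Lowest AIC for ARX at nk = %d\n', delays(iBest));
```

The AR model sees only the output history, so it cannot respond to a new step in feed composition until the effect appears in the measured impurity. ARX and ARMAX use the input signal as well.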
- Forecasting (Step Input Case)
  - Forecast impurity values at […].
  - Compute prediction errors for each model.
  - Identify the best model for forecasting accuracy.
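A possible way to carry out the step-input forecast is the toolbox function `forecast`, which predicts the output `K` steps beyond the end of the past data. The sketch below assumes a stand-in process, hypothetical step times, a hypothetical split after 51 samples, and a hypothetical horizon `K = 10`; none of these values come from the handout.

```matlab
% Stand-in data (ASSUMED process, step times, and split point).
rng(0);
Ts = 1;  N = 81;
t  = (0:N-1)' * Ts;
u  = double(t >= 10) - 2*(t >= 25) + double(t >= 40) + double(t >= 60);
y  = filter([0 0 0 0.2], [1 -0.8], u) + 0.01*randn(N, 1);

nTrain = 51;                        % ASSUMED number of training samples
K      = 10;                        % ASSUMED forecast horizon in samples
past   = iddata(y(1:nTrain), u(1:nTrain), Ts);
pastTs = iddata(y(1:nTrain), [], Ts);
uFut   = u(nTrain+1:nTrain+K);      % future inputs (known in this simulation)
yTrue  = y(nTrain+1:nTrain+K);      % measured values to compare against

mAR    = ar(pastTs, 2);
mARX   = arx(past, [1 1 3]);
mARMAX = armax(past, [1 1 1 3]);

% Forecast K steps ahead; models with an input need the future input values.
yfAR    = forecast(mAR, pastTs, K);
yfARX   = forecast(mARX, past, K, uFut);
yfARMAX = forecast(mARMAX, past, K, uFut);

% Root-mean-square forecast error for each model.
rmse = @(yf) sqrt(mean((yTrue - yf.OutputData).^2));
fprintf('RMSE  AR: %.4f  ARX: %.4f  ARMAX: %.4f\n', ...
        rmse(yfAR), rmse(yfARX), rmse(yfARMAX));
```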
- Forecasting (Random Input Case)
  - Generate a random excitation signal (PRBS or band-limited white noise) as the disturbance input.
  - Use the previously identified models to forecast impurity at […].
  - Compare model predictions and explain differences.
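For the random-input case, `idinput` can generate a PRBS and `compare` can report the K-step-ahead fit of models identified earlier. This is a sketch only: the `process` function handle is an assumed stand-in for the Simulink plant, and the step times and horizon `K` are hypothetical. The AR model is left out because it has no input channel and cannot be compared directly against input-output data.

```matlab
rng(0);
Ts = 1;
% ASSUMED stand-in plant: first-order lag, 3-sample input delay, small noise.
process = @(u) filter([0 0 0 0.2], [1 -0.8], u) + 0.01*randn(size(u));

% Identify models from step-test data (ASSUMED step times).
tS = (0:80)';
uS = double(tS >= 10) - 2*(tS >= 25) + double(tS >= 40) + double(tS >= 60);
trainData = iddata(process(uS), uS, Ts);
mARX   = arx(trainData, [1 1 3]);
mARMAX = armax(trainData, [1 1 1 3]);

% Random excitation: PRBS switching between -1 and +1 (one full period).
uR = idinput(255, 'prbs', [0 1], [-1 1]);
randData = iddata(process(uR), uR, Ts);

% K-step-ahead prediction fit (in percent) on the random-input data.
K = 5;                               % ASSUMED prediction horizon
[~, fitARX]   = compare(randData, mARX, K);
[~, fitARMAX] = compare(randData, mARMAX, K);
fprintf('Fit (%%)  ARX: %.1f  ARMAX: %.1f\n', fitARX, fitARMAX);
```

A model trained only on a few step changes has seen little high-frequency excitation, so its fit on PRBS data can differ noticeably from its fit on step data.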
- Model Improvement
  - Suggest at least two ways to improve accuracy, e.g., using longer training data, optimizing model order, or applying noise modeling.
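One of the suggested improvements, optimizing model order, can be sketched with the toolbox functions `struc`, `arxstruc`, and `selstruc`. These fit every ARX order combination in a grid and pick the one with the best fit to separate validation data. The data below come from an assumed stand-in process driven by a random binary signal, and the order ranges are illustrative.

```matlab
rng(0);
Ts = 1;
% ASSUMED stand-in data: random binary input through a first-order lag.
uAll = idinput(400, 'rbs');
yAll = filter([0 0 0 0.2], [1 -0.8], uAll) + 0.01*randn(400, 1);

ze = iddata(yAll(1:200),   uAll(1:200),   Ts);   % estimation data
zv = iddata(yAll(201:400), uAll(201:400), Ts);   % validation data

% Grid of candidate orders: na = 1..3, nb = 1..3, nk = 1..5.
NN = struc(1:3, 1:3, 1:5);
V  = arxstruc(ze, zv, NN);     % loss of each structure on validation data
best = selstruc(V, 0);         % 0 selects the lowest validation loss
mBest = arx(ze, best);
fprintf('Selected ARX orders [na nb nk] = [%d %d %d]\n', best);
```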
Model Evaluation Metrics
When developing time-series models, several criteria are used to assess their quality. These can be grouped into three main categories:
1. Error-Based Measures (Prediction Accuracy)
These directly compare the predicted output $\hat{y}$ with the measured output $y$.
Mean Squared Error (MSE):
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$$
where $N$ is the number of data points. This gives the average squared error between measured and predicted outputs. A smaller MSE indicates higher prediction accuracy, but because the errors are squared, large errors are penalized heavily and the value can be dominated by outliers.
goodnessOfFit(yhat, y, 'MSE')
Normalized Mean Squared Error (NMSE):
$$\mathrm{NMSE} = \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2}$$
where $\bar{y}$ is the mean of the measured output. This measure normalizes the squared error with respect to the variance of the measured data. An NMSE of zero means perfect prediction, while a value of one means the model performs no better than simply using the mean. Values greater than one imply the model is worse than the mean.
goodnessOfFit(yhat, y, 'NMSE')
Normalized Root Mean Squared Error (NRMSE):
$$\mathrm{NRMSE} = \frac{\lVert y - \hat{y} \rVert}{\lVert y - \bar{y} \rVert}$$
This expresses the prediction error relative to the variation of the measured signal about its mean, making it easy to compare across variables with different magnitudes. Lower NRMSE values indicate better predictive performance.
goodnessOfFit(yhat, y, 'NRMSE')
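Fit Percentage:
$$\mathrm{Fit} = 100 \left( 1 - \frac{\lVert y - \hat{y} \rVert}{\lVert y - \bar{y} \rVert} \right)$$
This is the default metric reported by MATLAB when using the compare function. A value of 100% is a perfect fit, 0% is equivalent to predicting the mean, and negative values mean the model is worse than the mean.
compare(data, model)   % reports the fit percentage directly
fit = 100 * (1 - goodnessOfFit(yhat, y, 'NRMSE'));   % equivalent in R2020a and later, where goodnessOfFit returns the error norm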
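The four error measures can be checked by hand on a small example. The sketch below uses made-up measured and predicted values and only base MATLAB, so it also runs without the toolbox. The NRMSE follows the norm-based definition used by `goodnessOfFit`.

```matlab
% Made-up measured and predicted outputs for illustration.
y    = [1 2 3 4 5]';
yhat = [1.1 1.9 3.2 3.8 5.0]';

e        = y - yhat;                          % prediction errors
mseVal   = mean(e.^2);                        % mean squared error
nmseVal  = sum(e.^2) / sum((y - mean(y)).^2); % error relative to variance
nrmseVal = norm(e) / norm(y - mean(y));       % square root of NMSE
fitPct   = 100 * (1 - nrmseVal);              % fit percentage

fprintf('MSE = %.4f, NMSE = %.4f, NRMSE = %.4f, Fit = %.1f%%\n', ...
        mseVal, nmseVal, nrmseVal, fitPct);
% prints: MSE = 0.0200, NMSE = 0.0100, NRMSE = 0.1000, Fit = 90.0%
```

Here the squared errors sum to 0.10 and the squared deviations from the mean sum to 10, so NMSE is 0.01, NRMSE is 0.1, and the fit is 90%.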
2. Information Criteria (Model Structure Selection)
Even if two models fit the data well, they may differ in complexity. To avoid overfitting, information criteria are used:
Akaike Information Criterion (AIC):
$$\mathrm{AIC} = 2k - 2\ln(L)$$
where $k$ is the number of estimated parameters and $L$ is the likelihood of the model given the data. AIC provides a trade-off between model accuracy and complexity, with lower values indicating models that fit well without unnecessary parameters.
aicValue = aic(model);
Final Prediction Error (FPE):
$$\mathrm{FPE} = \sigma^2 \, \frac{1 + k/N}{1 - k/N}$$
where $\sigma^2$ is the variance of residuals, $N$ is the number of samples, and $k$ is the number of parameters. A lower FPE means the model is likely to predict future data more accurately.
fpeValue = fpe(model);
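To see how these criteria penalize complexity, the sketch below compares two hypothetical models by hand. The residual variances and parameter counts are made-up numbers, and the AIC expression assumes Gaussian residuals, for which $-2\ln(L) = N\ln(2\pi\sigma^2) + N$ at the maximum-likelihood estimate. The helper names `fpeFormula` and `aicFormula` are introduced here for illustration and are not toolbox functions.

```matlab
% Hand formulas (hypothetical helpers, not toolbox functions).
fpeFormula = @(sigma2, k, N) sigma2 .* (1 + k./N) ./ (1 - k./N);
aicFormula = @(sigma2, k, N) N*log(2*pi*sigma2) + N + 2*k;  % Gaussian residuals

N = 100;                       % number of samples
% Model A: 4 parameters; Model B: 8 parameters, slightly smaller residuals.
sigma2A = 0.0200;  kA = 4;
sigma2B = 0.0198;  kB = 8;

fpeA = fpeFormula(sigma2A, kA, N);   aicA = aicFormula(sigma2A, kA, N);
fpeB = fpeFormula(sigma2B, kB, N);   aicB = aicFormula(sigma2B, kB, N);

fprintf('Model A: FPE = %.5f, AIC = %.2f\n', fpeA, aicA);
fprintf('Model B: FPE = %.5f, AIC = %.2f\n', fpeB, aicB);
```

Model B fits the data marginally better, but both criteria come out lower for Model A: the 1% reduction in residual variance does not justify four extra parameters.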
3. Residual Analysis (Adequacy Check)
A good model should capture all the dynamics of the system, leaving only random noise in the residuals. The residuals should therefore be:
- Zero-mean
- Uncorrelated with each other (no autocorrelation)
- Uncorrelated with past inputs (no leftover dynamics)
Residual whiteness tests, such as autocorrelation plots or the Ljung–Box test, are used to check this. If residuals are “white” (statistically independent noise), the model is considered adequate. If not, it suggests that some system dynamics remain unmodeled.
resid(data, model);
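The idea behind the whiteness check can be illustrated without the toolbox by computing the sample autocorrelation of a residual sequence and counting how many lags fall outside an approximate 99% confidence band of $\pm 2.58/\sqrt{N}$. The two residual sequences below are synthetic: one is white noise, the other is the same noise passed through a filter to mimic unmodeled dynamics. The helper `sampleAcf` is a hypothetical name introduced for this sketch.

```matlab
rng(1);
N = 500;  maxLag = 20;
eWhite   = randn(N, 1);                      % residuals of an adequate model
eColored = filter(1, [1 -0.9], eWhite);      % residuals with leftover dynamics

% Sample autocorrelation at lags 0..L (hypothetical helper).
sampleAcf = @(e, L) arrayfun(@(k) ...
    sum((e(1:end-k) - mean(e)) .* (e(1+k:end) - mean(e))) / ...
    sum((e - mean(e)).^2), 0:L);

bound = 2.58 / sqrt(N);                      % approximate 99% band
rhoW = sampleAcf(eWhite, maxLag);
rhoC = sampleAcf(eColored, maxLag);

% Count lags 1..maxLag outside the band (lag 0 is always 1).
nOutW = sum(abs(rhoW(2:end)) > bound);
nOutC = sum(abs(rhoC(2:end)) > bound);
fprintf('Lags outside band: white = %d, colored = %d (of %d)\n', ...
        nOutW, nOutC, maxLag);
```

For white residuals only an occasional lag should exceed the band, whereas the filtered sequence shows strong correlation at low lags, which is the pattern that signals unmodeled dynamics.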
Report Format
Your report (5 pages maximum) should include the following:
- Submission Details
  - Include a brief table at the beginning of the report with the following information:

Lab Title: Lab 04 - Time Series Analysis | Student Name | ID |
---|---|---|
Unit: CHEN4011 | Student 1 | 12345678 |
Date: 12 August 2025 | Student 2 | 87654321 |

- Objective & Problem Statement
  - Explain the importance of time series modeling in process control.
  - Summarize the distillation problem and its disturbances.
- Methodology & Implementation
  - Show the Simulink model used to generate the dataset.
  - Describe steps for building AR, ARX, and ARMAX models.
  - Explain the selection of model order and toolbox functions.
- Results
  - Present plots of:
    - Generated step input and output data.
    - Predictions from AR, ARX, and ARMAX models.
  - Show forecasting at […] (step case) and […] (random case).
  - Include tables of prediction errors and accuracy measures.
- Analysis and Discussion
  - Compare performance of AR, ARX, and ARMAX models.
  - Discuss forecasting accuracy for both step and random inputs.
  - Explain how model order and structure affect accuracy.
  - Comment on the best-performing model for each case.
  - Suggest improvements for accuracy and reliability.
- Conclusion
  - Summarize key learnings about time series modeling.
  - State which model type is most suitable for forecasting impurity in this system.
  - Discuss broader applications of system identification in chemical process control.
Assessment Rubric (20 Marks Total)
No | Section | Marks | Evaluation basis |
---|---|---|---|
1. | Objectives & Problem | 2 | Clarity of problem definition; articulation of objectives |
2. | Methodology and Implementation | 4 | Correctness and clarity of Simulink model; explanation of AR/ARX/ARMAX modeling |
3. | Results | 4 | Quality, relevance, and labeling of plots; completeness of forecasting data |
4. | Analysis and Discussion | 6 | Insightful interpretation; comparison of models; comments on robustness |
5. | Conclusion and Presentation | 4 | Coherent summary; quality of writing, formatting, and visual presentation |
Citation
@online{utikar2023,
author = {Utikar, Ranjeet},
title = {Lab 04: {Time} {Series} {Analysis}},
date = {2023-08-13},
url = {https://amc.smilelab.dev/content/labs/lab-04/},
langid = {en}
}