Performance of Machine Learning Advanced Techniques in Statistical Arbitrage
Abstract
This thesis examines machine learning-based algorithmic trading in currency markets. It applies the financial machine learning optimisation recommendations of Marcos López de Prado (2018), focusing on two main areas of improvement: feature selection and meta-labelling. By applying these techniques to a statistical arbitrage strategy, the study aims to identify and mitigate the overfitting biases that commonly lead to algorithmic trading failures.
The methodology introduces a novel approach to currency pair selection that combines dimensionality reduction, clustering techniques, and cointegration testing. Using data on 82 currency pairs across the G7, Major Cross, and Minor Cross categories from January 2019 to December 2023, the research implements Clustered Feature Importance (CFI) to optimise feature selection. The primary machine learning models (Logistic Regression, Random Forest, and Gradient Boosting) are then enhanced through meta-labelling to improve trading signal performance.
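As an illustration of the meta-labelling step only, the following is a minimal sketch and not the implementation used in the thesis. It assumes scikit-learn, a binary trade direction label, and out-of-fold primary predictions as the basis for the meta-labels; the function names `fit_meta_labelling` and `filtered_signals`, the 0.5 threshold, and the choice of Logistic Regression as primary and Random Forest as secondary model are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict


def fit_meta_labelling(X_train, y_train, seed=0):
    """Fit a primary side model and a secondary (meta) model.

    y_train holds the realised direction (1 = up, 0 = down).
    Assumes the primary model is sometimes right and sometimes wrong,
    so that both meta-label classes occur in the training data.
    """
    # Out-of-fold predictions avoid building meta-labels from in-sample fits.
    oof_side = cross_val_predict(
        LogisticRegression(max_iter=1000), X_train, y_train, cv=5
    )
    # Meta-label: 1 where the primary model's call was correct, else 0.
    meta_y = (oof_side == y_train).astype(int)
    # The meta model sees the features plus the primary model's call.
    meta_X = np.column_stack([X_train, oof_side])
    meta = RandomForestClassifier(n_estimators=100, random_state=seed)
    meta.fit(meta_X, meta_y)
    # The primary model used for live signals is refit on all training data.
    primary = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return primary, meta


def filtered_signals(primary, meta, X, threshold=0.5):
    """Return the primary side and a boolean mask of signals the meta model keeps."""
    side = primary.predict(X)
    meta_X = np.column_stack([X, side])
    p_correct = meta.predict_proba(meta_X)[:, 1]
    return side, p_correct >= threshold


# Synthetic data, for demonstration only (not the thesis data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=600) > 0).astype(int)

primary, meta = fit_meta_labelling(X[:400], y[:400])
side, keep = filtered_signals(primary, meta, X[400:])
print(f"signals kept: {keep.sum()} of {keep.size}")
```

In this sketch the primary model decides the side of the trade and the secondary model decides only whether to act on it, which is the division of labour that meta-labelling relies on. Whether the filter improves precision depends on the data and is not guaranteed by the construction.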
Empirical results demonstrate significant performance improvements across the five selected currency pairs (EURNOK/DKKZAR, EURPLN/DKKPLN, SEKNOK/SEKZAR, EURSGD/DKKSGD, and NZDCHF/USDZAR), with meta-labelled models showing improved risk-adjusted returns (Sharpe ratios increasing to 1.86 for the EURNOK/DKKZAR pair), substantial volatility reduction, and enhanced precision in trading signals (40.27% improvement). The framework proves particularly effective for Nordic and European currency pairs while maintaining stability across various market conditions.
The findings validate López de Prado's recommendations when applied to statistical arbitrage in currency markets, offering theoretical contributions to financial machine learning and practical implications for quantitative trading strategies. This research provides insights for portfolio managers and algorithmic traders seeking to improve performance through advanced machine learning techniques while addressing the challenges of overfitting and false discovery in trading model development.
Key Words: Meta-Labelling, Feature Selection, Machine Learning, Statistical Arbitrage, Currency, Mean-Reverting, Portfolio Construction