20 Days of Exploration: What I’ve Been Up To

Stepping into the world of algorithmic trading
Author: Tris

Published: September 26, 2024

What Have I Been Up To?

Since my first post, I have to admit—things didn’t go exactly as planned. Initially, I intended to steadily work through tutorials, books, and courses, documenting every step of the way here on this blog. Well, I kept up with the learning, but the documenting… not so much!

Reinforcement Learning

My primary goal was to finish the first three chapters of Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto. For those who aren’t familiar, it’s a foundational book in the field—a go-to resource for understanding reinforcement learning (RL) concepts. So, I started there, and it has been eye-opening.

Chapter 1: An Overview of Reinforcement Learning

The first chapter sets the foundation by introducing what RL is and where it fits in the larger context of machine learning. It covers key terminology and ideas, such as:

  • The Concept of Agents: At its core, RL is about agents taking actions in an environment to maximise some notion of cumulative reward.
  • Value and Reward Functions: These are crucial concepts in RL. Value functions estimate how much cumulative reward an agent can expect starting from a state (or state-action pair), while the reward signal is the immediate feedback the environment provides after each action.
  • History and Context: The chapter also touches upon the history of RL, tying it back to the roots of AI, psychology, and neuroscience, showing how RL is not just a computational concept but inspired by how living beings learn to interact with the world.

Chapter 2: Multi-Armed Bandits

This chapter dives into one of the simplest forms of reinforcement learning problems—the multi-armed bandit. If you’ve never heard of a multi-armed bandit, imagine walking into a casino with slot machines (or “bandits”), each with different probabilities of paying out. Your goal is to figure out which machine to play to maximise returns, balancing exploration (trying out different machines) with exploitation (sticking to the best-performing machine). Techniques covered include:

  • Epsilon-Greedy Strategy: A straightforward approach that allows for both exploitation of known rewards and exploration of potentially better options. With a small probability (epsilon), you pick a random option; otherwise, you go with the current best. A minimal sketch of this appears after the list.
  • Upper Confidence Bound (UCB): A more sophisticated method that considers uncertainty in estimates and prioritises options that could yield higher rewards while balancing exploration and exploitation.
  • Gradient Bandit Algorithms: Instead of estimating action values, this approach focuses on learning preferences over actions and updating them in a way that encourages the agent to try higher-reward options more frequently.
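To make the exploration-exploitation trade-off concrete, here is a minimal epsilon-greedy sketch on a simulated three-armed bandit. The payout rates and the epsilon of 0.1 are made up for illustration; the update rule is the incremental sample-average form from the book.

```python
import numpy as np

rng = np.random.default_rng(42)
true_means = np.array([0.2, 0.5, 0.7])  # hidden payout rate of each arm
q_estimates = np.zeros(3)               # running estimate of each arm's value
pull_counts = np.zeros(3)
epsilon = 0.1

for step in range(10_000):
    if rng.random() < epsilon:
        arm = int(rng.integers(3))          # explore: pick a random arm
    else:
        arm = int(np.argmax(q_estimates))   # exploit: pick the current best arm
    reward = float(rng.random() < true_means[arm])  # Bernoulli payout
    pull_counts[arm] += 1
    # Incremental sample-average update: Q(a) <- Q(a) + (R - Q(a)) / N(a)
    q_estimates[arm] += (reward - q_estimates[arm]) / pull_counts[arm]

print(q_estimates)  # should approach [0.2, 0.5, 0.7]
```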

Chapter 3: Markov Decision Processes (MDPs)

Chapter 3 builds on foundational concepts to introduce Markov Decision Processes (MDPs), which form the backbone of most RL problems. An MDP provides a mathematical framework for modelling decision-making in situations where outcomes are partly random and partly under the agent’s control. Key ideas include:

  • The Agent-Environment Interface: The MDP framework views learning as a loop where an agent takes actions that influence its state within an environment, and the environment provides feedback via rewards; a skeleton of this loop follows the list.
  • Goals and Rewards: The chapter emphasises the importance of designing appropriate reward signals. A reward isn’t just about getting the agent to learn to achieve a goal; it’s also about ensuring it learns to achieve it in the “right” way. Proper reward design plays a significant role in guiding the agent’s learning and behaviour.
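In code, the agent-environment loop is just a few lines. This skeleton is generic rather than any particular algorithm: the toy env_step dynamics and the random action choice are placeholders for a real environment and a real policy.

```python
import numpy as np

rng = np.random.default_rng(0)

def env_step(state, action):
    """Toy stand-in for an MDP's dynamics: returns (next_state, reward).
    A real environment would encode the actual problem being solved."""
    next_state = (state + action) % 5
    reward = 1.0 if next_state == 0 else 0.0
    return next_state, reward

state = 0
total_reward = 0.0
for t in range(100):
    action = int(rng.integers(0, 2))     # a real agent would consult its policy here
    state, reward = env_step(state, action)
    total_reward += reward               # the return the agent tries to maximise
```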

Diving into Algorithmic Trading

After reading through the first three chapters above, I found myself at a crossroads—deciding what to study next. Reinforcement learning was (and still is) fascinating, but I wanted to apply some of the concepts more practically. One of my main goals is to build an algorithmic trading system, and I realised that focusing purely on theory before diving in wasn’t the most effective approach. There’s a saying that you learn best by doing, and I wanted to embrace that.

Rather than waiting to learn everything I “needed” before getting started, I decided to just jump in, get my hands dirty, and start building. The theoretical learning will always be there, but there’s nothing like a hands-on project to test, apply, and deepen your understanding.

Building My Own Algorithmic Trading System: A Journey So Far

I had previously built a basic framework for a trading system, so I decided to start improving on that. It wasn't the most polished code, but it was a good place to start. The system essentially did the following (outlined in code after the list):

  • Loaded 1-Minute Stock Price Data into Polars: Polars is a data manipulation library similar to Pandas but written in Rust, which makes it extremely fast for handling large datasets.
  • Transformed the Data: Within Polars, I manipulated stock data to create signals for a trading strategy—adding indicators, defining conditions for entry and exit, and so on.
  • Exported Data to Numpy Arrays: To efficiently simulate trades, I exported data to Numpy arrays and looped through each row, as if the trades were happening live.
  • Output and Visualisation: Finally, I collected trades and outcomes in a DataFrame for analysis and visualisation.
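In outline, the pipeline looked something like the following. The file name, column names, and the EMA signal are placeholders for illustration, not my actual strategy.

```python
import polars as pl

# 1. Load 1-minute bars (hypothetical file and columns).
df = pl.read_parquet("ndx_1min.parquet")  # datetime, open, high, low, close, volume

# 2. Transform: add an indicator and a signal column.
df = df.with_columns(
    pl.col("close").ewm_mean(span=30).alias("ema_30")
).with_columns(
    (pl.col("close") > pl.col("ema_30")).alias("entry_signal")
)

# 3. Export just the columns the simulation needs as Numpy arrays.
close = df["close"].to_numpy()
entry = df["entry_signal"].to_numpy()

# 4. ...loop through the arrays to simulate trades, then collect the
#    results back into a DataFrame for analysis and visualisation.
```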

Why Polars?

The choice to use Polars was strategic. One of its standout features is lazy evaluation, which allows it to optimise processing. For example, say you load 20 years of stock price data, manipulate it, and only need the last year for analysis. Polars will evaluate these operations at the last possible moment, potentially avoiding the need to load the first 19 years into memory. It also optimises operations across multiple threads, offering significant performance boosts when working with large datasets—essential for backtesting trading strategies over extensive periods.
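As a rough illustration with a hypothetical file: scan_parquet only records a query plan, and nothing executes until collect(), at which point the optimiser can push the date filter down into the scan.

```python
import polars as pl
from datetime import datetime

# Nothing is read from disk here; Polars just records the plan.
lf = pl.scan_parquet("ndx_1min.parquet")  # hypothetical 20-year file

last_year = (
    lf.filter(pl.col("datetime") >= datetime(2023, 9, 26))
    .with_columns(pl.col("close").pct_change().alias("ret"))
    .collect()  # only now does the engine run, filter pushed into the scan
)
```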

The Learning Curve: Insights and Challenges

Refactoring for Modularity

The first step was to clean up the code for better organisation and modularity. Initially, much of the backtesting logic was intertwined with strategy-specific details, so I aimed to decouple these. Now, each strategy is a contained function, which can be passed as an argument to a generalised backtesting function (a pattern sketched in code after this list). This separation allowed me to:

  • Isolate Strategy Implementation: Define the indicators, entry and exit rules, take profit (TP), and stop loss (SL) within each strategy.
  • Generalise Backtesting Logic: Simulate trades based on any strategy provided, improving code reusability and ensuring consistency for both backtesting and future live trading.
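In shape, the decoupling looks roughly like this. The EMA strategy, the file name, and the 5% TP/SL levels are placeholders, and the backtest body is elided down to the part that matters here: any strategy with this signature plugs in.

```python
import polars as pl

def ema_cross_strategy(df: pl.DataFrame, span: int = 30) -> pl.DataFrame:
    """A strategy is just a function that adds indicator, signal, and TP/SL columns."""
    return df.with_columns(
        pl.col("close").ewm_mean(span=span).alias("ema")
    ).with_columns(
        (pl.col("close") > pl.col("ema")).alias("entry_signal"),
        (pl.col("close") < pl.col("ema")).alias("exit_signal"),
        (pl.col("close") * 1.05).alias("take_profit"),
        (pl.col("close") * 0.95).alias("stop_loss"),
    )

def backtest(df: pl.DataFrame, strategy, **params) -> pl.DataFrame:
    """Generic harness: simulate trades for whatever strategy it is given."""
    signals = strategy(df, **params)
    # ...export to Numpy and run the trade loop (sketched further down)...
    return signals

results = backtest(pl.read_parquet("ndx_1min.parquet"), ema_cross_strategy, span=30)
```

The payoff is that adding a new strategy means writing one new function, not touching the simulation code.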

Backtesting with 1-Minute Data

After restructuring the code, I backtested a simple strategy on the last 10 years of 1-minute data from the NDX. Initially, each backtest took around 30 seconds to complete—a significant delay when running multiple tests. My goal was to reduce this to as close to 1 second as possible.

The backtesting process was:

  • Load the Data: Import stock price data at 1-minute intervals.
  • Run Strategy through Backtest: Feed the strategy function into the backtesting function, along with the data and any relevant parameters.
  • Strategy Logic: Calculate indicators, such as a 30-period exponential moving average (EMA) on the 5-minute timeframe, and define entry/exit rules (e.g., enter if the close price is above the EMA, exit if it drops below); one way to express the higher-timeframe calculation is sketched after this list.
  • TP/SL Columns: Define take profit and stop loss levels (e.g., TP at 5% above close price, SL at 5% below), and output a DataFrame with the original data and added columns for indicators, signals, and TP/SL levels.
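The fiddliest step is computing an indicator on a higher timeframe and aligning it back onto the 1-minute bars. One way to express that in Polars, assuming a sorted datetime column and hypothetical file and column names, is group_by_dynamic for the resample plus an as-of join to carry the 5-minute EMA back down:

```python
import polars as pl

bars = pl.scan_parquet("ndx_1min.parquet").sort("datetime")

# Resample to 5-minute bars and compute a 30-period EMA on their closes.
ema_5m = (
    bars.group_by_dynamic("datetime", every="5m")
    .agg(pl.col("close").last().alias("close_5m"))
    .with_columns(pl.col("close_5m").ewm_mean(span=30).alias("ema_30_5m"))
)

# Align each 1-minute bar with the most recent 5-minute EMA value,
# then add signal and TP/SL columns.
signals = (
    bars.join_asof(ema_5m, on="datetime")
    .with_columns(
        (pl.col("close") > pl.col("ema_30_5m")).alias("entry_signal"),
        (pl.col("close") < pl.col("ema_30_5m")).alias("exit_signal"),
        (pl.col("close") * 1.05).alias("take_profit"),
        (pl.col("close") * 0.95).alias("stop_loss"),
    )
    .collect()
)
```

(A production version would also need care to avoid look-ahead bias, since a 5-minute bar's EMA value is only fully known once that bar closes.)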

Balancing Vectorisation and Looping

Initially, I tried a vectorised approach to find the next entry, exit, or TP/SL hit, isolating each trade's range. If a TP/SL was hit before the exit signal, the trade would close; if not, it would close at the exit. This worked partially, but it struggled with dependent trade sequences: whether an entry signal at 9:05am should open a new trade depends on whether the trade entered at 8am has already closed (for example, by hitting its stop loss at 9am), and that kind of sequential dependency is hard to express in purely vectorised operations.

Eventually, I adopted a hybrid approach: exporting all the columns from Polars to Numpy arrays and iterating through each row to simulate trades sequentially. Using Numba, a JIT compiler that translates numerical Python loops into fast machine code, I could speed up this process dramatically, reducing the backtesting time to just under 1 second.
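The loop itself is plain Python over Numpy arrays; the only Numba-specific piece is the @njit decorator, which compiles it on first call. Below is a simplified, single-position version of the idea, with column meanings carried over from the earlier sketches:

```python
import numpy as np
from numba import njit

@njit
def simulate(close, entry, exit_, tp, sl):
    """Walk the bars once, opening at most one trade at a time.
    Returns index arrays for each completed trade's entry and exit."""
    n = close.shape[0]
    entry_idx = np.empty(n, dtype=np.int64)
    exit_idx = np.empty(n, dtype=np.int64)
    n_trades = 0
    in_trade = False
    tp_level = 0.0
    sl_level = 0.0
    for i in range(n):
        if not in_trade:
            if entry[i]:                 # open a new position
                in_trade = True
                entry_idx[n_trades] = i
                tp_level = tp[i]
                sl_level = sl[i]
        elif close[i] >= tp_level or close[i] <= sl_level or exit_[i]:
            in_trade = False             # TP, SL, or exit signal hit
            exit_idx[n_trades] = i
            n_trades += 1
    return entry_idx[:n_trades], exit_idx[:n_trades]
```

With that structure in place, the timing broke down roughly as: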

  • Strategy Manipulation & Numpy Export: About 0.6 seconds.
  • Looping through Data in Numba: A mere 0.002 seconds per backtest.

By carefully optimising data processing and the timing of Polars evaluations, I managed to achieve my goal, making it possible to iterate quickly on strategy ideas.

Looking Ahead: Diving Deeper into Quantitative Trading

While I was able to backtest a single strategy in under a second, I started wondering: what if I want to test a strategy on thousands of stocks at once? Simply repeating this process for 4,000 stocks could take over an hour to complete.

I began exploring whether I could load all the data into one massive DataFrame or break it into chunks for efficient processing. The first step was creating 4,000 stock data files. I took a single file with 20 years of 1-minute AAPL stock data (around 60MB as a Parquet file), randomly altered values, and wrote new files. However, I quickly realised I didn’t have enough local storage. Writing these files to an external USB3 SSD was significantly slower, raising questions about the feasibility of reading such large volumes of data quickly.
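The generation step itself was simple: roughly the following, with made-up paths and a crude multiplicative noise model standing in for whatever perturbation you prefer.

```python
import numpy as np
import polars as pl

base = pl.read_parquet("aapl_1min.parquet")  # hypothetical 20-year AAPL file
rng = np.random.default_rng(0)

for i in range(4000):
    # Perturb prices by ~1% so each synthetic "stock" differs from AAPL.
    noise = pl.Series("noise", rng.normal(1.0, 0.01, size=base.height))
    synthetic = (
        base.with_columns(noise)
        .with_columns(
            (pl.col(c) * pl.col("noise")).alias(c)
            for c in ("open", "high", "low", "close")
        )
        .drop("noise")
    )
    synthetic.write_parquet(f"synthetic/stock_{i:04d}.parquet")
```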

The scale—over 250GB—also brought into question how I’d handle the memory limitations of my machine (a modest 8GB of RAM). Around this time, I stumbled upon Python for Algorithmic Trading Cookbook, which discusses using ArcticDB, SQLite, and HDF5 for storing and accessing large datasets. This made me realise that my current approach of using Parquet files and Polars might not be scalable.

Exploring New Storage Approaches

My current focus is on researching these alternative storage solutions and their implications for my system. Before investing too heavily in one method, I want to explore various options to find the best fit for efficient backtesting and live trading. There's a lot to learn, and I'm excited to dive in and share what I discover.

Embracing the Learning Process

The past 20 days have been a whirlwind of learning, coding, and adapting to new challenges. Though my journey has diverged from my initial plan, the hands-on experience has deepened my understanding of algorithmic trading, data manipulation, and backtesting optimisation. From exploring the inner workings of Polars to speeding up backtesting with Numba, I’ve discovered the importance of diving into practical challenges headfirst.

The next phase is to dive deeper into scalable data storage, refine backtesting across larger datasets, and ensure my approach is robust enough for live trading. I’m excited to share this evolving journey with you, and if you have any thoughts, questions, or tips based on your own experience, please reach out—I’d love to learn together!

Thank you for reading! You can find me on X, and stay tuned for more updates.