Financial markets are complex systems, often exhibiting behavior that can seem random. Beneath these fluctuations, however, asset prices often appear to revert toward an underlying “fair value.” Identifying this fair value and capitalizing on deviations from it is the essence of mean reversion trading. The Kalman Filter, a powerful mathematical tool, offers an elegant way to estimate this unobserved fair value and its dynamics.
This article explores a Python-based strategy that employs a Kalman Filter to model the fair value and slope (trend) of a financial asset, specifically the EUR/USD exchange rate. It then generates trading signals when the observed price deviates significantly from this estimated fair value, anticipating a reversion.
The Kalman Filter is a recursive algorithm that estimates the internal state of a dynamic system from a series of noisy measurements. In our context, the hidden state is the vector [trend, slope]: the asset’s fair value and its rate of change. The state evolves as:

trend_t = trend_{t-1} + slope_{t-1}
slope_t = slope_{t-1}
The filter works in a predict-update cycle. It predicts the next state based on the current estimate and then updates this prediction using the new measurement. Key to its operation are the process noise covariance (Q), which represents the uncertainty in our state model (how much the trend and slope can change on their own), and the measurement noise variance (R), which represents the uncertainty in our price observations.
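To make the predict-update cycle concrete, here is a minimal, self-contained sketch of a single cycle for the [trend, slope] state using plain NumPy. The numbers are illustrative assumptions, not the fitted parameters used later; in the full script, pykalman performs this cycle for every observation.

```python
import numpy as np

# One predict-update cycle for the [trend, slope] state (illustrative values).
F = np.array([[1.0, 1.0],   # trend_t = trend_{t-1} + slope_{t-1}
              [0.0, 1.0]])  # slope_t = slope_{t-1}
H = np.array([[1.0, 0.0]])  # we only observe the price (the trend component)
Q = np.diag([1e-7, 1e-9])   # process noise: how much trend/slope may drift on their own
R = np.array([[1e-5]])      # measurement noise: how noisy the observed price is

x = np.array([1.10, 0.0])   # current state estimate: [fair value, slope]
P = np.eye(2) * 1e-4        # current state covariance (our uncertainty)

z = 1.105                   # new observed price

# Predict step: project the state and its uncertainty forward one day
x_pred = F @ x
P_pred = F @ P @ F.T + Q

# Update step: correct the prediction with the new measurement
y = z - (H @ x_pred)[0]              # innovation (observed minus predicted price)
S = H @ P_pred @ H.T + R             # innovation covariance
K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
x = x_pred + K.flatten() * y         # corrected state estimate
P = (np.eye(2) - K @ H) @ P_pred     # corrected state covariance

print("Updated fair value:", x[0], "slope:", x[1])
```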
Before diving into the filter, we need to import the necessary libraries, define our parameters, and fetch the historical price data. The script uses yfinance to download data and pykalman for the Kalman Filter implementation. We’ll focus on the EUR/USD exchange rate (EURUSD=X) from the beginning of 2019 to the end of 2024. Critical parameters include the factors that set the process noise (q_trend_factor, q_slope_factor) relative to the estimated measurement noise, and the multipliers that set entry and exit thresholds based on the filter’s fair-value uncertainty (Fair_Value_Error_Std).
import yfinance as yf
import pandas as pd
import numpy as np
from pykalman import KalmanFilter
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import warnings
"ignore", category=UserWarning) # PyKalman can throw these
warnings.filterwarnings(
# --- Parameters ---
= "EURUSD=X"
ticker = "2019-01-01"
start_date = "2024-12-31"
end_date
# Kalman Filter Parameters
= 1e-5 # How much the trend can deviate (variance relative to R)
q_trend_factor = 1e-7 # How much the slope can change (variance relative to R)
q_slope_factor
# Trading Parameters
= 2.0 # k: Number of standard deviations for entry
entry_std_dev_multiplier = 0.5 # Revert closer to mean for exit
exit_std_dev_multiplier
# User preferences for yfinance
= False # As per user preference
yf_auto_adjust
print(f"--- Strategy: Kalman Filter Fair-Value Reversion ---")
print(f"Asset: {ticker}")
print(f"Period: {start_date} to {end_date}")
print("-----------------------------------------------------\n")
# --- 1. Download Data ---
print("--- 1. Downloading Data ---")
# Using user preferences for yfinance download
df = yf.download(ticker, start=start_date, end=end_date, auto_adjust=yf_auto_adjust)
if isinstance(df.columns, pd.MultiIndex):  # Check if columns are MultiIndex
    df = df.droplevel(1, axis=1)  # Drop the lower level of the column index if it exists

if 'Close' not in df.columns:
    raise ValueError("Close column not found.")

df_analysis = df.copy()
print(f"Data downloaded. Shape: {df_analysis.shape}")
print(df_analysis.head(3))
print("-----------------------------------------------------\n")
This snippet sets up our environment and downloads the necessary Close prices for EUR/USD. Note the auto_adjust=False setting and the subsequent droplevel call for the yfinance download, aligning with specific data handling preferences.
With the data in hand, we initialize and run the Kalman Filter. The measurement noise variance R is estimated from the variance of daily price differences. The process noise covariance Q is then set relative to this R, using our predefined factors. The transition_matrix_F defines how the state (fair value and slope) evolves, and the observation_matrix_H links the state to the observed price.
# --- 2. Initialize and Run Kalman Filter ---
print("--- 2. Initializing and Running Kalman Filter ---")
observed_prices = df_analysis['Close'].values

# Estimate Measurement Noise Variance (R)
measurement_noise_R_variance = np.var(np.diff(observed_prices))
print(f"Estimated Measurement Noise Variance (R): {measurement_noise_R_variance:.4e}")

# State Transition Matrix (F)
transition_matrix_F = [[1, 1], [0, 1]]
# Observation Matrix (H)
observation_matrix_H = [[1, 0]]

# Process Noise Covariance (Q)
process_noise_Q_trend_var = measurement_noise_R_variance * q_trend_factor
process_noise_Q_slope_var = measurement_noise_R_variance * q_slope_factor
transition_covariance_Q = np.diag([process_noise_Q_trend_var, process_noise_Q_slope_var])

# Initial state
initial_state_mean = [observed_prices[0], 0]
initial_state_covariance = [[measurement_noise_R_variance, 0], [0, measurement_noise_R_variance * 1e-2]]

kf = KalmanFilter(
    transition_matrices=transition_matrix_F,
    observation_matrices=observation_matrix_H,
    transition_covariance=transition_covariance_Q,
    observation_covariance=measurement_noise_R_variance,
    initial_state_mean=initial_state_mean,
    initial_state_covariance=initial_state_covariance
)

print("Running Kalman Filter...")
(filtered_state_means, filtered_state_covariances) = kf.filter(observed_prices)

df_analysis['Estimated_Fair_Value'] = filtered_state_means[:, 0]
df_analysis['Residual'] = df_analysis['Close'] - df_analysis['Estimated_Fair_Value']
# Std dev of the error in the trend estimate
df_analysis['Fair_Value_Error_Std'] = np.sqrt(filtered_state_covariances[:, 0, 0])
print("\nKalman Filter estimates generated (Tail):")
print(df_analysis[['Close', 'Estimated_Fair_Value', 'Residual', 'Fair_Value_Error_Std']].tail())
print("-----------------------------------------------------\n")
After running the filter, df_analysis will contain the Estimated_Fair_Value and the Residual (the difference between the closing price and this fair value). Fair_Value_Error_Std gives us an idea of the filter’s uncertainty about its fair value estimate, which is crucial for setting our trading bands.
Trading signals are generated based on how far the observed price (via the residual) deviates from the estimated fair value. We use the Fair_Value_Error_Std from the Kalman Filter’s output to define dynamic entry and exit thresholds. If the price is significantly below the fair value (residual below -entry_std_dev_multiplier * Fair_Value_Error_Std), we go long (buy), expecting it to rise. If it’s significantly above (residual is very positive), we go short (sell), expecting it to fall. A long position is exited once the residual reverts back up toward the mean (above -exit_std_dev_multiplier * Fair_Value_Error_Std); similarly, we exit a short position if the price reverts sufficiently downwards. The script uses lagged residuals and thresholds to ensure decisions are made on data available at the end of the previous day.
# --- 3. Generate Trading Signals ---
print("--- 3. Generating Trading Signals ---")
df_analysis['Position'] = 0  # -1 for Short, 0 for Cash, 1 for Long

df_analysis['Upper_Threshold'] = entry_std_dev_multiplier * df_analysis['Fair_Value_Error_Std']
df_analysis['Lower_Threshold'] = -entry_std_dev_multiplier * df_analysis['Fair_Value_Error_Std']
df_analysis['Exit_Upper_Threshold'] = exit_std_dev_multiplier * df_analysis['Fair_Value_Error_Std']
df_analysis['Exit_Lower_Threshold'] = -exit_std_dev_multiplier * df_analysis['Fair_Value_Error_Std']

# Use lagged residuals and thresholds for signal on current day
df_analysis['Lagged_Residual'] = df_analysis['Residual'].shift(1)
df_analysis['Lagged_Upper_Threshold'] = df_analysis['Upper_Threshold'].shift(1)
df_analysis['Lagged_Lower_Threshold'] = df_analysis['Lower_Threshold'].shift(1)
df_analysis['Lagged_Exit_Upper_Threshold'] = df_analysis['Exit_Upper_Threshold'].shift(1)
df_analysis['Lagged_Exit_Lower_Threshold'] = df_analysis['Exit_Lower_Threshold'].shift(1)

for i in range(1, len(df_analysis)):
    current_idx = df_analysis.index[i]
    prev_idx = df_analysis.index[i-1]
    current_position = df_analysis.loc[prev_idx, 'Position']

    residual = df_analysis.loc[current_idx, 'Lagged_Residual']
    upper_entry = df_analysis.loc[current_idx, 'Lagged_Upper_Threshold']
    lower_entry = df_analysis.loc[current_idx, 'Lagged_Lower_Threshold']
    upper_exit = df_analysis.loc[current_idx, 'Lagged_Exit_Upper_Threshold']
    lower_exit = df_analysis.loc[current_idx, 'Lagged_Exit_Lower_Threshold']

    df_analysis.loc[current_idx, 'Position'] = current_position  # Hold by default

    if pd.notna(residual) and pd.notna(upper_entry):  # Ensure data is available
        if current_position == 0:  # If flat, check for entry
            if residual < lower_entry:
                df_analysis.loc[current_idx, 'Position'] = 1  # Enter Long
            elif residual > upper_entry:
                df_analysis.loc[current_idx, 'Position'] = -1  # Enter Short
        elif current_position == 1:  # If long, check for exit
            if residual >= lower_exit:
                df_analysis.loc[current_idx, 'Position'] = 0
        elif current_position == -1:  # If short, check for exit
            if residual <= upper_exit:
                df_analysis.loc[current_idx, 'Position'] = 0

df_analysis['Signal'] = df_analysis['Position']
print("Trading Signals generated (Tail):")
print(df_analysis[['Close', 'Estimated_Fair_Value', 'Residual', 'Signal']].tail(10))
print("-----------------------------------------------------\n")
This logic iterates through the data, updating the Position column based on the mean reversion rules.
After generating signals, the script calculates daily strategy returns by multiplying the signal (position: +1 for long, -1 for short, 0 for cash) by the asset’s daily percentage change. These are then compounded to get cumulative returns. Performance metrics like annualized return, volatility, and Sharpe ratio are calculated for both the strategy and a simple buy-and-hold approach.
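The original script’s code for this step is not shown here, so below is a minimal sketch of it. It assumes the Signal column from above is used directly against same-day percentage changes (the look-ahead protection is already handled by the lagged residuals) and annualizes over 252 trading days; the exact metric definitions in the original script may differ.

```python
# --- 4. Calculate Performance (sketch) ---
df_analysis['Daily_Return'] = df_analysis['Close'].pct_change()
df_analysis['Strategy_Return'] = df_analysis['Signal'] * df_analysis['Daily_Return']

# Compound daily returns into cumulative equity curves
df_analysis['Cum_BuyHold'] = (1 + df_analysis['Daily_Return']).cumprod()
df_analysis['Cum_Strategy'] = (1 + df_analysis['Strategy_Return'].fillna(0)).cumprod()

def annualized_stats(returns, periods_per_year=252):
    """Annualized return, volatility and Sharpe ratio (risk-free rate assumed ~0)."""
    ann_ret = (1 + returns).prod() ** (periods_per_year / len(returns)) - 1
    ann_vol = returns.std() * np.sqrt(periods_per_year)
    sharpe = ann_ret / ann_vol if ann_vol > 0 else np.nan
    return ann_ret, ann_vol, sharpe

strat_ret, strat_vol, strat_sharpe = annualized_stats(df_analysis['Strategy_Return'].dropna())
bh_ret, bh_vol, bh_sharpe = annualized_stats(df_analysis['Daily_Return'].dropna())
print(f"Strategy:   Return {strat_ret:.2%}  Vol {strat_vol:.2%}  Sharpe {strat_sharpe:.2f}")
print(f"Buy & Hold: Return {bh_ret:.2%}  Vol {bh_vol:.2%}  Sharpe {bh_sharpe:.2f}")
```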
Finally, a series of plots helps visualize the results: the closing price against the estimated fair value, the residual against its entry and exit thresholds, and the strategy’s cumulative returns versus buy-and-hold. These visualizations are crucial for understanding how the strategy behaves and whether it offers an advantage over simply holding the asset.
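A plotting sketch along these lines is shown below. The exact charts in the original script may differ, and the cumulative-return panel assumes the Cum_Strategy and Cum_BuyHold columns from the performance sketch above.

```python
# --- 5. Visualize (sketch) ---
fig, axes = plt.subplots(3, 1, figsize=(12, 10), sharex=True)

# Price vs. estimated fair value
axes[0].plot(df_analysis.index, df_analysis['Close'], label='Close', alpha=0.7)
axes[0].plot(df_analysis.index, df_analysis['Estimated_Fair_Value'], label='Kalman Fair Value')
axes[0].set_title(f'{ticker}: Price vs. Estimated Fair Value')
axes[0].legend()

# Residual with entry bands
axes[1].plot(df_analysis.index, df_analysis['Residual'], label='Residual', color='grey')
axes[1].plot(df_analysis.index, df_analysis['Upper_Threshold'], 'r--', label='Entry Bands')
axes[1].plot(df_analysis.index, df_analysis['Lower_Threshold'], 'r--')
axes[1].set_title('Residual vs. Entry Thresholds')
axes[1].legend()

# Cumulative returns (columns from the performance sketch above)
axes[2].plot(df_analysis.index, df_analysis['Cum_Strategy'], label='Strategy')
axes[2].plot(df_analysis.index, df_analysis['Cum_BuyHold'], label='Buy & Hold')
axes[2].set_title('Cumulative Returns')
axes[2].legend()

axes[2].xaxis.set_major_locator(mdates.YearLocator())
fig.tight_layout()
plt.show()
```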
The Kalman Filter provides a sophisticated framework for estimating an asset’s underlying fair value in the face of noisy market data. By modeling this fair value and its trend, a mean reversion strategy can be built to capitalize on perceived mispricings. The provided Python script demonstrates a complete workflow, from data acquisition and filter application to signal generation and performance evaluation.
However, it’s important to remember that the success of such a strategy heavily depends on the correct parameterization of the Kalman Filter (especially the Q and R noise covariances) and the trading thresholds. These often require careful tuning and backtesting across various market conditions and assets. This approach is a powerful tool in the quantitative trader’s arsenal, but like all models it is an approximation of reality and should be used with a thorough understanding of its assumptions and limitations.
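As a starting point for such tuning, the sketch below is one possible approach, not part of the original script: a coarse grid search over q_trend_factor, q_slope_factor, and the entry multiplier, ranked by annualized Sharpe ratio. The quick_sharpe helper is a hypothetical, simplified proxy for the full strategy (it trades whenever the lagged residual is outside the entry band and ignores the separate exit band), so its numbers only indicate which parameter regions deserve a full backtest.

```python
import itertools
import numpy as np
from pykalman import KalmanFilter

def quick_sharpe(prices, q_trend, q_slope, entry_k):
    """Simplified proxy backtest: hold a position only while the residual is
    outside the entry band, and return the annualized Sharpe ratio
    (252 trading days assumed)."""
    r_var = np.var(np.diff(prices))
    kf = KalmanFilter(
        transition_matrices=[[1, 1], [0, 1]],
        observation_matrices=[[1, 0]],
        transition_covariance=np.diag([r_var * q_trend, r_var * q_slope]),
        observation_covariance=r_var,
        initial_state_mean=[prices[0], 0],
        initial_state_covariance=[[r_var, 0], [0, r_var * 1e-2]],
    )
    means, covs = kf.filter(prices)
    residual = prices - means[:, 0]
    band = entry_k * np.sqrt(covs[:, 0, 0])
    # Position decided at today's close is applied to tomorrow's return (no look-ahead)
    position = np.where(residual > band, -1, np.where(residual < -band, 1, 0))
    daily_ret = np.diff(prices) / prices[:-1]
    strat_ret = position[:-1] * daily_ret
    if strat_ret.std() == 0:
        return 0.0
    return np.sqrt(252) * strat_ret.mean() / strat_ret.std()

# Coarse grid over the key knobs; widen or refine as needed
grid = itertools.product([1e-4, 1e-5, 1e-6], [1e-6, 1e-7, 1e-8], [1.5, 2.0, 2.5])
results = sorted(
    ((quick_sharpe(observed_prices, qt, qs, k), qt, qs, k) for qt, qs, k in grid),
    reverse=True,
)
for sharpe, qt, qs, k in results[:5]:
    print(f"Sharpe {sharpe:5.2f}  q_trend={qt:.0e}  q_slope={qs:.0e}  entry_k={k}")
```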