In quantitative finance, combining statistical filtering techniques with machine learning can provide robust insights into market dynamics. In this article, we explore two powerful tools—Neural Networks and the Kalman Filter—and show how they can be used together to predict the direction of asset price movements. We then outline a trading strategy that uses these predictions, backtests its performance, and compares it to a simple buy-and-hold approach.
Neural networks are a class of machine learning models inspired by biological neural structures. They consist of layers of interconnected nodes (neurons) that transform inputs into outputs through a series of linear and nonlinear operations.
A basic multilayer perceptron (MLP) can be mathematically described as follows:
Input Layer:
The network receives an input vector: \[\mathbf{x} = [x_1, x_2, \dots, x_n]^T\]
Hidden Layers:
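Each hidden layer performs a linear transformation followed by a nonlinear activation function (e.g., ReLU or sigmoid). For layer \(l\): \[\mathbf{h}^{(l)} = \sigma \left( \mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)} \right)\]
where \(\mathbf{W}^{(l)}\) is the weight matrix of layer \(l\), \(\mathbf{b}^{(l)}\) is its bias vector, \(\sigma\) is the activation function, and \(\mathbf{h}^{(0)} = \mathbf{x}\) is the input.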
Output Layer:
The final layer computes the output: \[\hat{\mathbf{y}} = \text{softmax} \left( \mathbf{W}^{(L)} \mathbf{h}^{(L-1)} + \mathbf{b}^{(L)} \right)\]
The softmax function is often used for classification tasks to convert raw scores into probabilities.
Training via Backpropagation:
The network parameters \(\{\mathbf{W}^{(l)}, \mathbf{b}^{(l)}\}\) are optimized by minimizing a loss function \(L(\hat{\mathbf{y}}, \mathbf{y})\) (e.g., cross-entropy for classification) using gradient descent: \[\theta \leftarrow \theta - \eta \frac{\partial L}{\partial \theta}\]
where \(\theta\) represents all the parameters and \(\eta\) is the learning rate.
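The equations above can be illustrated with a minimal NumPy sketch of a network with one ReLU hidden layer, a softmax output, and plain gradient descent on the cross-entropy loss. This is only an illustration, not the implementation used by scikit-learn: the data is synthetic, the function names are made up for this example, and samples are stored as rows, so the linear step is written `X @ W + b` rather than \(\mathbf{W}\mathbf{h} + \mathbf{b}\).

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise maximum for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(X, params):
    """Forward pass: one ReLU hidden layer followed by a softmax output layer."""
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, X @ W1 + b1)   # hidden layer with sigma = ReLU
    y_hat = softmax(h @ W2 + b2)       # class probabilities
    return h, y_hat

def cross_entropy(y_hat, y):
    # Mean negative log-probability assigned to the true class
    return -np.mean(np.log(y_hat[np.arange(len(y)), y] + 1e-12))

def gradient_step(X, y, params, lr=0.1):
    """One update theta <- theta - lr * dL/dtheta via backpropagation."""
    W1, b1, W2, b2 = params
    h, y_hat = forward(X, params)
    n = len(y)
    d_out = y_hat.copy()
    d_out[np.arange(n), y] -= 1.0      # gradient of the loss w.r.t. the output scores
    d_out /= n
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (h > 0)     # backpropagate through the ReLU
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)
    return (W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2)

# Synthetic two-class problem (illustrative data only)
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
params = (rng.normal(scale=0.5, size=(2, 8)), np.zeros(8),
          rng.normal(scale=0.5, size=(8, 2)), np.zeros(2))

loss_before = cross_entropy(forward(X, params)[1], y)
for _ in range(200):
    params = gradient_step(X, y, params)
loss_after = cross_entropy(forward(X, params)[1], y)
print(loss_before, loss_after)
```

Each row of the softmax output sums to one, which is what allows it to be read as class probabilities.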
In our code, we use Python’s MLPClassifier from
scikit-learn, which implements a multilayer perceptron with hidden
layers (in our case, with sizes 32 and 16 neurons) to predict the
direction of asset price movements.
The Kalman filter is a recursive algorithm used for estimating the state of a dynamic system from noisy observations. It is especially useful in financial applications where price signals are noisy.
The filter works in two main steps: prediction and update.
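Prediction Step:
In standard notation, the state estimate \(\hat{\mathbf{x}}\) and its error covariance \(\mathbf{P}\) are propagated one step forward:
\[\hat{\mathbf{x}}_{k|k-1} = \mathbf{F} \hat{\mathbf{x}}_{k-1|k-1}\]
\[\mathbf{P}_{k|k-1} = \mathbf{F} \mathbf{P}_{k-1|k-1} \mathbf{F}^T + \mathbf{Q}\]
Here, \(\mathbf{F}\) is the state transition model, and \(\mathbf{Q}\) is the process noise covariance.
Update Step:
The prediction is then corrected with the new measurement, weighted by the Kalman gain \(\mathbf{K}_k\):
\[\mathbf{K}_k = \mathbf{P}_{k|k-1} \mathbf{H}^T \left( \mathbf{H} \mathbf{P}_{k|k-1} \mathbf{H}^T + \mathbf{R} \right)^{-1}\]
\[\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k \left( \mathbf{z}_k - \mathbf{H} \hat{\mathbf{x}}_{k|k-1} \right)\]
\[\mathbf{P}_{k|k} = \left( \mathbf{I} - \mathbf{K}_k \mathbf{H} \right) \mathbf{P}_{k|k-1}\]
In these equations, \(\mathbf{H}\) is the observation model, \(\mathbf{R}\) is the measurement noise covariance, and \(\mathbf{z}_k\) is the measurement at time \(k\).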
In our code, we use a custom KalmanFilter class to
smooth the price series. The filter produces two outputs: a
smoothed price and an estimated rate of
change, which serve as features for the neural network.
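The custom class itself is not listed in this article. As a rough idea of what such a filter could look like, here is a minimal sketch of a constant-velocity Kalman filter whose state is [price, rate of change]. The class name `SimpleKalmanFilter`, the choice of a diagonal process noise, and the NaN handling are assumptions made for this example; only the constructor arguments and the `filter` method name mirror the call used in the code walkthrough.

```python
import numpy as np

class SimpleKalmanFilter:
    """Hypothetical constant-velocity Kalman filter: state = [price, rate]."""

    def __init__(self, delta_t=1.0, process_var=1e-7, measurement_var=1e-1):
        self.F = np.array([[1.0, delta_t], [0.0, 1.0]])  # state transition model
        self.H = np.array([[1.0, 0.0]])                  # only the price is observed
        self.Q = process_var * np.eye(2)                 # process noise (assumed diagonal)
        self.R = np.array([[measurement_var]])           # measurement noise covariance

    def filter(self, observations):
        """Return an (n, 2) array with columns [smoothed price, estimated rate]."""
        z = np.asarray(observations, dtype=float)
        out = np.full((len(z), 2), np.nan)
        x, P = None, np.eye(2)
        for k, z_k in enumerate(z):
            if x is None:
                # Wait for the first valid price and start the state there
                if not np.isnan(z_k):
                    x = np.array([z_k, 0.0])
                    out[k] = x
                continue
            # Prediction step
            x = self.F @ x
            P = self.F @ P @ self.F.T + self.Q
            # Update step (skipped when the observation is missing)
            if not np.isnan(z_k):
                S = self.H @ P @ self.H.T + self.R
                K = P @ self.H.T @ np.linalg.inv(S)
                x = x + (K @ (z_k - self.H @ x)).ravel()
                P = (np.eye(2) - K @ self.H) @ P
            out[k] = x
        return out

# Illustrative data: a linear trend of 0.5 per step plus unit-variance noise
rng = np.random.default_rng(1)
t = np.arange(200)
prices = 100.0 + 0.5 * t + rng.normal(scale=1.0, size=200)
smoothed = SimpleKalmanFilter(measurement_var=1.0).filter(prices)
print(smoothed[-1])  # last smoothed price and estimated rate
```

Returning a plain two-column array keeps the result easy to assign to two DataFrame columns at once. With a small process variance the filter trusts its own trend estimate and reacts slowly to new prices; a larger value makes it follow the raw series more closely.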
The core idea is to predict the price direction for the next week (7 days) using the neural network. The target is defined as:
\[\text{direction} = \begin{cases} 1 & \text{if } \text{close}_{t+7} > \text{close}_t \\ -1 & \text{otherwise} \end{cases}\]
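A consequence of this definition is that the label for day \(t\) only becomes observable at day \(t+7\). The sketch below shows one way to build the label column with pandas and, as a suggestion that is not part of the original code, how to pick a training window that contains only labels already known on the prediction day. The helper names `make_direction` and `known_label_window` are hypothetical.

```python
import numpy as np
import pandas as pd

HORIZON = 7  # prediction horizon in days, as in the text

def make_direction(close, horizon=HORIZON):
    """+1 if the close `horizon` rows ahead is higher, -1 otherwise, NaN if unknown."""
    future = close.shift(-horizon)
    direction = pd.Series(np.where(future > close, 1.0, -1.0), index=close.index)
    direction[future.isna()] = np.nan  # the last `horizon` rows have no outcome yet
    return direction

def known_label_window(i, window, horizon=HORIZON):
    """Positions [start, end) whose labels are observable when predicting row i.

    The label of row t needs close[t + horizon], so t + horizon <= i must hold.
    """
    end = i - horizon + 1
    start = max(0, end - window)
    return start, end

close = pd.Series([10, 11, 12, 11, 10, 9, 10, 12, 13, 9, 8, 14], dtype=float)
direction = make_direction(close)
print(direction.tolist())
print(known_label_window(40, 30))
```

Training on `iloc[start:end]` from `known_label_window` instead of a window that ends at the prediction day avoids fitting on labels that depend on prices from the future.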
Long Position (+1):
If the model predicts a 1, the strategy goes long by buying at the
current close and selling 7 days later.
Short Position (-1):
If the model predicts -1, the strategy goes short by selling (or taking
a short position) at the current close and buying back 7 days
later.
No Trade (0):
If the model’s confidence is below a threshold (e.g., 80%), the signal
is set to 0, meaning no position is taken.
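The three rules above amount to a small mapping from class probabilities to a signal. A minimal sketch, assuming the probabilities come in the same order as the class labels (as with the `classes_` attribute of a fitted scikit-learn classifier); the function name `to_signal` is made up for this example.

```python
import numpy as np

def to_signal(proba, classes, threshold=0.8):
    """Map class probabilities to a trading signal in {-1, 0, +1}."""
    proba = np.asarray(proba, dtype=float)
    best = int(np.argmax(proba))
    if proba[best] < threshold:
        return 0                 # not confident enough: no trade
    return int(classes[best])    # -1 (short) or +1 (long)

classes = np.array([-1, 1])
signals = [to_signal(p, classes) for p in ([0.9, 0.1], [0.35, 0.65], [0.15, 0.85])]
print(signals)
```

A higher threshold produces fewer trades, each backed by a more confident prediction; a lower one trades more often on weaker signals.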
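For backtesting, each trade is entered at the current close and exited at the close 7 days later, trades do not overlap, and the return of a trade is the 7-day price change multiplied by the signal (+1, -1, or 0).
The backtest aggregates these returns, computes cumulative performance (equity curve), and then compares the strategy to a buy-and-hold approach.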
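The compounding of non-overlapping trades can be sketched on a toy price series. This is a simplified stand-alone version with made-up prices and signals, not the article's backtest; the function name `backtest_non_overlapping` is hypothetical, and transaction costs are ignored.

```python
import numpy as np

def backtest_non_overlapping(close, signals, horizon=7):
    """Compound the returns of trades entered every `horizon + 1` rows."""
    close = np.asarray(close, dtype=float)
    signals = np.asarray(signals)
    equity = [1.0]
    i = 0
    while i + horizon < len(close):
        # A long (+1) earns the price change, a short (-1) earns its negative
        trade_return = signals[i] * (close[i + horizon] / close[i] - 1.0)
        equity.append(equity[-1] * (1.0 + trade_return))
        i += horizon + 1  # the next entry comes after the previous exit
    return np.array(equity)

# Toy data: 24 days, mostly flat at 100
close = np.full(24, 100.0)
close[7], close[8], close[15] = 110.0, 110.0, 99.0
signals = np.zeros(24, dtype=int)
signals[0] = 1    # long from 100 to 110: +10%
signals[8] = -1   # short from 110 to 99: +10%
equity = backtest_non_overlapping(close, signals)
print(equity)
```

The third trade slot has a signal of 0, so the equity stays unchanged there.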
Below we break down key sections of the code, explaining how each component contributes to the overall strategy.
from binance.client import Client
import pandas as pd
import numpy as np
import ta
# Download price data from Binance
client = Client()
pair = 'ETHUSDC'
data = pd.DataFrame(client.get_historical_klines(pair, '1d', '1 year ago'))
data.columns = ['timestamp', 'open', 'high', 'low', 'close', 'volume',
'close_time', 'quote_asset_volume', 'trades',
'taker_buy_base', 'taker_buy_quote', 'ignore']
data['timestamp'] = pd.to_datetime(data['timestamp'], unit='ms')
ohlcv_columns = ['open', 'high', 'low', 'close', 'volume']
data[ohlcv_columns] = data[ohlcv_columns].astype(float)
data.set_index('timestamp', inplace=True)
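# Shift data to avoid lookahead bias in indicator calculations
data = data.shift()

Explanation:
The .shift() function moves every row forward by one day, so that calculations for a given date use only data from earlier days.

from KalmanFilter import KalmanFilter
kf = KalmanFilter(delta_t=1, process_var=1e-7, measurement_var=1e-1)
data[['kalman_price', 'kalman_rate']] = kf.filter(data['close'])

Explanation:
The custom KalmanFilter class smooths the closing prices and returns two series, the smoothed price (kalman_price) and the estimated rate of change (kalman_rate), which serve as features for the neural network.
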
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from tqdm import tqdm
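# Target column as defined earlier: +1 if the close 7 days ahead is higher, else -1
data['direction'] = np.where(data['close'].shift(-7) > data['close'], 1, -1)
# Features used by the neural network (in this case, the Kalman outputs)
features = ['kalman_price', 'kalman_rate']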
rolling_window = 30
nn_predictions = []
nn_probabilities = []
actuals = []
prediction_dates = []
# Set up scaler and MLP neural network
scaler = StandardScaler()
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=42)
# Rolling window loop: Train and predict
for i in tqdm(range(rolling_window, len(data) - 1)):
    # Training data for past window
    X_train = data[features].iloc[i - rolling_window:i]
    y_train = data['direction'].iloc[i - rolling_window:i]
    X_train_scaled = scaler.fit_transform(X_train)
    # Train the network on the rolling window
    mlp.fit(X_train_scaled, y_train)
    # Predict for the next interval (double brackets keep a one-row DataFrame)
    X_next = data[features].iloc[[i]]
    X_next_scaled = scaler.transform(X_next)
    nn_pred = mlp.predict(X_next_scaled)[0]
    nn_prob = mlp.predict_proba(X_next_scaled)[0]
    nn_predictions.append(nn_pred)
    nn_probabilities.append(nn_prob)
    # Save the actual direction and prediction time
    actuals.append(data['direction'].iloc[i])
    prediction_dates.append(data.index[i])

Explanation:
For each day, the network is retrained on the previous 30 rows and then predicts the direction for the current row. The features are standardized with StandardScaler to ensure that they are on the same scale.

# Use probabilities to adjust predictions
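for i in range(len(nn_predictions)):
    prob = nn_probabilities[i]
    # Column 0 is the probability of -1 and column 1 of +1,
    # assuming both classes occur in the training window
    nn_predictions[i] = -1 if prob[0] > prob[1] else 1
# Only accept predictions with high confidence (>= 80%)
for i in range(len(nn_predictions)):
    pred = nn_predictions[i]
    confidence = nn_probabilities[i][0] if pred == -1 else nn_probabilities[i][1]
    if confidence < 0.8:
        nn_predictions[i] = 0
data['nn_predictions'] = 0
data.loc[prediction_dates, 'nn_predictions'] = nn_predictions

Explanation:
The predicted class is taken from the larger of the two class probabilities, and any prediction whose probability is below 0.8 is replaced by 0, meaning no trade. The resulting signals are written back to the DataFrame on their prediction dates.
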
# Calculate 7-day returns for the base asset
data['7d_return'] = (data['close'].shift(-7) / data['close']) - 1
# Calculate strategy returns based on NN predictions
data['nn_7d_return'] = data['nn_predictions'] * data['7d_return']
# Filter rows with valid predictions and returns
predicted_data = data[(data['nn_predictions'] != 0) & (data['7d_return'].notna())]
# Print success rate of NN predictions (only rows with a signal and a known outcome)
success_rate = (predicted_data['nn_predictions'] == predicted_data['direction']).mean() * 100
print("Neural Network Success Rate: {:.2f}%".format(success_rate))

Explanation:
The strategy return for each day is the 7-day return of the asset multiplied by the signal, so a correct short earns a positive return. The success rate is the share of non-zero signals that match the realized direction.

# Initialize an equity column and starting capital
data['nn_equity'] = np.nan
equity = 1.0 # Starting capital
i = 0
# Simulate non-overlapping trades (each entry is 8 rows after the previous one)
while i < len(data) - 7:
    idx_entry = data.index[i]
    data.at[idx_entry, 'nn_equity'] = equity
    signal = data['nn_predictions'].iloc[i]
    entry_price = data['close'].iloc[i]
    exit_price = data['close'].iloc[i + 7]
    if signal == 1:
        trade_return = (exit_price - entry_price) / entry_price
    elif signal == -1:
        trade_return = (entry_price - exit_price) / entry_price
    else:
        trade_return = 0.0
    equity *= (1 + trade_return)
    idx_exit = data.index[i + 7]
    data.at[idx_exit, 'nn_equity'] = equity
    i += 8
# Fill missing equity values (assign back instead of using inplace on a column)
data['nn_equity'] = data['nn_equity'].ffill().bfill()
# Convert equity to percent profit
data['nn_equity_pct'] = (data['nn_equity'] - 1.0) * 100

Explanation:
Starting from an equity of 1.0, the loop evaluates one trade slot every 8 rows, holds any position for 7 days, and compounds the trade return into the equity. Days without a recorded value are filled from the nearest known equity, and the curve is then expressed as percent profit.

# Buy and hold strategy: calculate daily returns and cumulative product
data['bh_return'] = data['close'].pct_change()
data['bh_equity'] = (1 + data['bh_return']).cumprod()
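data['bh_equity_pct'] = (data['bh_equity'] - 1.0) * 100

Explanation:
The buy-and-hold equity is the cumulative product of the daily returns, expressed as percent profit so that it can be compared directly with the strategy.
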
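As a small sanity check on the buy-and-hold calculation, compounding daily returns gives the same result as dividing each close by the first close. The prices below are made up for illustration.

```python
import numpy as np
import pandas as pd

close = pd.Series([100.0, 102.0, 99.0, 105.0])

bh_return = close.pct_change()            # first value is NaN (no previous day)
bh_equity = (1 + bh_return).cumprod()     # compounded growth of one unit of capital
direct = close / close.iloc[0]            # same quantity computed directly

bh_equity_pct = (bh_equity - 1.0) * 100   # percent profit
print(bh_equity_pct.tolist())
```

The first row stays NaN because no return exists for it, which is why the plotted buy-and-hold curve starts one day later than the price series.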
import matplotlib.pyplot as plt
plt.figure(figsize=(12,6))
plt.plot(data.index, data['bh_equity_pct'], label='Buy & Hold')
plt.plot(data.index, data['nn_equity_pct'], label='NN Strategy')
plt.title('Buy & Hold vs. NN Strategy (Percent Profit)')
plt.xlabel('Date')
plt.ylabel('Percent Profit (%)')
plt.legend()
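plt.show()

Explanation:
The plot shows the percent profit of the buy-and-hold benchmark and of the neural network strategy over the same dates.
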
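In this article, we explored how neural networks and the Kalman filter can be integrated into a trading strategy: the Kalman filter smooths the price series and supplies a smoothed price and an estimated rate of change as features, an MLP classifier trained on a rolling window predicts the 7-day price direction, and only high-confidence predictions are traded in a backtest that is compared against buy-and-hold.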
This framework is a starting point for further research and refinement. Future enhancements might include improved feature engineering, more sophisticated risk management, and alternative model architectures. As always, caution is advised when applying these techniques to live trading due to the challenges of market dynamics and overfitting.