RandomForest-Enhanced Parabolic SAR Strategy: A Backtrader Implementation


The Parabolic SAR (Stop and Reverse) indicator is a popular tool for identifying trend direction and potential reversal points. It is plotted as a series of dots above or below the price bars, and those dots double as trailing stop-loss levels and entry signals. When the dots flip from one side of price to the other, they signal a potential change in trend. While simple and effective in strong trends, Parabolic SAR can generate whipsaws in choppy or ranging markets.
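To make the flip mechanics concrete, here is a minimal pure-Python sketch of the classic Wilder recursion: the SAR moves toward the trend's extreme point (EP) by the acceleration factor (AF), the AF grows by one step each time a new extreme is set (up to a cap), and the trend reverses when price pierces the SAR. It is a simplified illustration with a hypothetical function name, parabolic_sar, not Backtrader's ParabolicSAR source; the trend-seeding rule and edge-case handling are assumptions. The af_step and af_max defaults mirror the af and afmax parameters used later in this article.

```python
def parabolic_sar(highs, lows, af_step=0.02, af_max=0.1):
    """Simplified Wilder Parabolic SAR (sketch).

    af_step is used both as the starting acceleration factor and as its
    increment; af_max caps it. Returns one SAR value per bar (None for bar 0).
    """
    n = len(highs)
    sars = [None] * n
    if n < 2:
        return sars
    # Seed the trend from the first two highs (an assumption; implementations differ)
    uptrend = highs[1] >= highs[0]
    sar = lows[0] if uptrend else highs[0]
    ep = highs[0] if uptrend else lows[0]  # extreme point of the current trend
    af = af_step
    for i in range(1, n):
        sar = sar + af * (ep - sar)  # move the SAR toward the extreme point
        if uptrend:
            # The SAR may not rise above the previous two lows
            sar = min(sar, *lows[max(i - 2, 0):i])
            if lows[i] < sar:  # price pierced the SAR: reverse to a downtrend
                uptrend, sar, ep, af = False, ep, lows[i], af_step
            elif highs[i] > ep:  # new high: extend the EP and accelerate
                ep, af = highs[i], min(af + af_step, af_max)
        else:
            # The SAR may not fall below the previous two highs
            sar = max(sar, *highs[max(i - 2, 0):i])
            if highs[i] > sar:  # price pierced the SAR: reverse to an uptrend
                uptrend, sar, ep, af = True, ep, highs[i], af_step
            elif lows[i] < ep:  # new low: extend the EP and accelerate
                ep, af = lows[i], min(af + af_step, af_max)
        sars[i] = sar
    return sars


print(parabolic_sar([10, 11, 12, 13, 14, 15, 9], [9, 10, 11, 12, 13, 14, 8]))
```

On the steadily rising part of this series the SAR trails below the lows and speeds up as new highs appear; the last bar's low breaks it, and the SAR flips to the prior extreme high of 15.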

To mitigate false signals and improve the strategy’s robustness, we can integrate a Machine Learning (ML) model, specifically a RandomForestClassifier. This article will explore a RandomForest-Enhanced Parabolic SAR trading strategy, optimized for short-term (3-month) data, implemented using the Backtrader framework. The RandomForest model will act as an intelligent filter, validating SAR signals based on a broader context of market features.

1. Setting Up the Environment: Libraries and Configuration

First, we need to import the necessary Python libraries for data handling, backtesting, machine learning, and plotting.

import backtrader as bt
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings("ignore")

plt.rcParams['figure.figsize'] = (12, 8)

2. The RandomForestEnhancedParabolicSARStrategy Class

This class defines our complete trading strategy, encompassing indicator calculations, machine learning integration, and trade execution logic within the Backtrader ecosystem.

class RandomForestEnhancedParabolicSARStrategy(bt.Strategy):
    """
    RandomForest-Enhanced Parabolic SAR Strategy optimized for 3-month data
    """
    params = (
        # EXACT original parameters
        ('af', 0.02),            # Acceleration factor
        ('afmax', 0.1),          # Maximum acceleration factor
        ('rsi_period', 14),      # RSI period for momentum
        ('rsi_overbought', 70),  # RSI overbought level
        ('rsi_oversold', 30),    # RSI oversold level
        ('stop_loss_pct', 0.02), # stop loss
        # RandomForest parameters only
        ('rf_threshold', 0.65),  # RF confidence threshold for 3-month data
        ('retrain_frequency', 25), # Retrain every 25 bars for 3-month data
    )

2.1. Strategy Parameters (params)

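The params tuple makes the strategy configurable:
  • af and afmax: the Parabolic SAR acceleration factor and its maximum. An afmax of 0.1, half of Wilder's standard 0.2, keeps the SAR trailing further from price and tends to produce fewer reversals.
  • rsi_period, rsi_overbought, rsi_oversold: the RSI lookback and the levels used to reject signals that arrive when momentum is already stretched.
  • stop_loss_pct: the fixed stop-loss distance from the entry fill price (2%).
  • rf_threshold: the minimum RandomForest probability (0.65) an RSI-confirmed signal needs once the model is trained.
  • retrain_frequency: the minimum number of bars (25) between successful retrains.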


2.2. Initialization (__init__)

The __init__ method sets up all the indicators and internal variables needed for the strategy.

    def __init__(self):
        # EXACT original Parabolic SAR
        self.psar = bt.indicators.ParabolicSAR(
            af=self.params.af, 
            afmax=self.params.afmax
        )
        
        # EXACT original momentum confirmation with RSI
        self.rsi = bt.indicators.RSI(period=self.params.rsi_period)
        
        # EXACT original SAR position signals
        self.sar_long = self.data.close > self.psar  # Price above SAR = bullish
        self.sar_short = self.data.close < self.psar # Price below SAR = bearish
        
        # EXACT original SAR direction changes
        self.sar_signal = bt.indicators.CrossOver(self.sar_long, 0.5)
        
        # EXACT original order tracking
        self.order = None
        self.stop_order = None
        
        # Minimal additional indicators for RF features (optimized for 3-month data)
        self.sma = bt.indicators.SMA(period=10)      # Fast SMA
        self.ema = bt.indicators.EMA(period=12)      # Fast EMA
        self.atr = bt.indicators.ATR(period=7)       # Fast ATR
        self.volume_ma = bt.indicators.SMA(self.data.volume, period=10)
        self.momentum = bt.indicators.Momentum(period=5)
        self.bb = bt.indicators.BollingerBands(period=15) # Fast BB
        
        # RandomForest components (optimized for 3-month data)
        self.rf_model = None
        self.scaler = StandardScaler()
        self.feature_buffer = []
        self.label_buffer = []
        self.last_retrain = 0
        self.rf_ready = False
        
        # Performance tracking
        self.total_signals = 0
        self.rf_filtered_signals = 0
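The sar_signal line is built by running CrossOver over the 0/1 sar_long series against a constant 0.5, so it is +1 only on the bar where price moves from below to above the SAR, -1 on the bar it moves back below, and 0 otherwise. The pure-Python sketch below reproduces that logic for illustration; sar_flip_signals is a hypothetical helper, not part of Backtrader.

```python
def sar_flip_signals(closes, sars):
    """+1 when close moves from <= SAR to > SAR, -1 for the reverse, else 0."""
    above = [c > s for c, s in zip(closes, sars)]  # the sar_long series
    signals = [0]  # no crossover can be detected on the first bar
    for prev, cur in zip(above, above[1:]):
        if cur and not prev:
            signals.append(1)   # 0 -> 1 crosses 0.5 upward
        elif prev and not cur:
            signals.append(-1)  # 1 -> 0 crosses 0.5 downward
        else:
            signals.append(0)   # still on the same side of the SAR
    return signals


print(sar_flip_signals([10, 9, 11, 12, 8], [9.5, 9.5, 10, 10, 10]))  # [0, -1, 1, 0, -1]
```

Because the signal fires only on the flip bar, a trade that is skipped (by RSI, the RandomForest filter, or a pending order) is not retried on later bars of the same trend.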

2.3. Machine Learning Integration

This section defines how we prepare data for the RandomForest model, train it, and use its predictions to filter trade signals.

calculate_features(): Data for the RandomForest Model

    def calculate_features(self):
        """Calculate features optimized for Parabolic SAR + RandomForest + 3-month data"""
        if (len(self.psar) < 10 or
            len(self.rsi) < self.params.rsi_period):
            return None
        
        try:
            features = []
            
            # Core Parabolic SAR features
            sar_distance = (self.data.close[0] - self.psar[0]) / self.data.close[0] if self.data.close[0] > 0 else 0
            sar_distance_normalized = sar_distance / (self.atr[0] / self.data.close[0]) if self.atr[0] > 0 else 0
            price_above_sar = 1 if self.data.close[0] > self.psar[0] else 0
            price_below_sar = 1 if self.data.close[0] < self.psar[0] else 0
            features.extend([sar_distance, sar_distance_normalized, price_above_sar, price_below_sar])
            
            # SAR signal and momentum
            sar_signal_strength = self.sar_signal[0] if not np.isnan(self.sar_signal[0]) else 0
            sar_trend_consistency = 1 if (
                len(self.sar_long) > 3 and 
                ((self.sar_long[0] and self.sar_long[-1] and self.sar_long[-2]) or
                 (not self.sar_long[0] and not self.sar_long[-1] and not self.sar_long[-2]))
            ) else 0
            features.extend([sar_signal_strength, sar_trend_consistency])
            
            # RSI momentum features
            rsi_norm = self.rsi[0] / 100 if not np.isnan(self.rsi[0]) else 0.5
            rsi_overbought = 1 if self.rsi[0] > self.params.rsi_overbought else 0
            rsi_oversold = 1 if self.rsi[0] < self.params.rsi_oversold else 0
            rsi_momentum = (self.rsi[0] - self.rsi[-3]) / 100 if len(self.rsi) > 3 else 0
            rsi_in_range = 1 if self.params.rsi_oversold < self.rsi[0] < self.params.rsi_overbought else 0
            features.extend([rsi_norm, rsi_overbought, rsi_oversold, rsi_momentum, rsi_in_range])
            
            # SAR-RSI signal combinations (original strategy logic)
            sar_bullish_confirmed = 1 if (self.sar_signal[0] > 0 and self.rsi[0] < self.params.rsi_overbought) else 0
            sar_bearish_confirmed = 1 if (self.sar_signal[0] < 0 and self.rsi[0] > self.params.rsi_oversold) else 0
            features.extend([sar_bullish_confirmed, sar_bearish_confirmed])
            
            # Price momentum and trend
            price_change_1 = (self.data.close[0] - self.data.close[-1]) / self.data.close[-1] if len(self.data) > 1 else 0
            price_change_3 = (self.data.close[0] - self.data.close[-3]) / self.data.close[-3] if len(self.data) > 3 else 0
            momentum_norm = self.momentum[0] / self.data.close[0] if self.data.close[0] > 0 else 0
            features.extend([price_change_1, price_change_3, momentum_norm])
            
            # Trend confirmation indicators
            sma_distance = (self.data.close[0] - self.sma[0]) / self.sma[0] if self.sma[0] > 0 else 0
            ema_distance = (self.data.close[0] - self.ema[0]) / self.ema[0] if self.ema[0] > 0 else 0
            sma_ema_alignment = 1 if ((self.data.close[0] > self.sma[0] and self.data.close[0] > self.ema[0]) or
                                      (self.data.close[0] < self.sma[0] and self.data.close[0] < self.ema[0])) else 0
            features.extend([sma_distance, ema_distance, sma_ema_alignment])
            
            # Volatility context
            atr_norm = self.atr[0] / self.data.close[0] if self.data.close[0] > 0 else 0
            bb_position = (self.data.close[0] - self.bb.mid[0]) / (self.bb.top[0] - self.bb.bot[0]) if (self.bb.top[0] - self.bb.bot[0]) > 0 else 0.5
            bb_width = (self.bb.top[0] - self.bb.bot[0]) / self.bb.mid[0] if self.bb.mid[0] > 0 else 0
            features.extend([atr_norm, bb_position, bb_width])
            
            # Volume confirmation
            volume_ratio = self.data.volume[0] / self.volume_ma[0] if self.volume_ma[0] > 0 else 1.0
            volume_change = (self.data.volume[0] - self.data.volume[-1]) / self.data.volume[-1] if len(self.data) > 1 and self.data.volume[-1] > 0 else 0
            features.extend([volume_ratio, volume_change])
            
            # SAR effectiveness context
            sar_effectiveness = abs(sar_distance) * (1 if sar_trend_consistency else 0.5)
            features.append(sar_effectiveness)
            
            # Market condition features
            trending_market = 1 if (abs(sma_distance) > 0.01 and sma_ema_alignment) else 0
            features.append(trending_market)
            
            # Signal quality indicators
            signal_quality = sar_signal_strength * rsi_in_range * trending_market
            features.append(signal_quality)
            
            # Price action features
            high_low_ratio = (self.data.high[0] - self.data.low[0]) / self.data.close[0] if self.data.close[0] > 0 else 0
            close_position = (self.data.close[0] - self.data.low[0]) / (self.data.high[0] - self.data.low[0]) if (self.data.high[0] - self.data.low[0]) > 0 else 0.5
            features.extend([high_low_ratio, close_position])
            
            # Clean features
            features = [0 if np.isnan(x) or np.isinf(x) else x for x in features]
            return np.array(features)
            
        except Exception:
            return None

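This method builds a vector of 29 numerical features from the current bar's prices and indicator values, designed to capture the aspects of market behavior most relevant to SAR signals and trend confirmation. Any NaN or infinite value is replaced with 0, and the method returns None if the indicators are not yet warmed up or a calculation fails. The features fall into these groups:
  • Parabolic SAR: the distance between price and the SAR, both as a fraction of price and in ATR units, plus flags for which side of the SAR price is on.
  • SAR signal: the current crossover value (+1, -1 or 0) and whether price has stayed on the same side of the SAR for the last three bars.
  • RSI: the normalized level, overbought and oversold flags, the 3-bar RSI change, and whether RSI sits between the two levels.
  • SAR/RSI combinations: the base strategy's RSI-confirmed bullish and bearish signals.
  • Price momentum: 1-bar and 3-bar returns and the 5-bar Momentum indicator relative to price.
  • Trend confirmation: distance from the 10-bar SMA and the 12-bar EMA, and whether price is on the same side of both.
  • Volatility: the 7-bar ATR relative to price, the close's position within the 15-bar Bollinger Bands, and the band width.
  • Volume: volume relative to its 10-bar average and the 1-bar volume change.
  • Composite context: SAR effectiveness (SAR distance weighted by trend consistency), a trending-market flag, and a signal-quality product of the SAR signal, RSI-range, and trend flags.
  • Price action: the bar's high-low range relative to the close and where the close sits within that range.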
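A few of these features are easiest to read as plain arithmetic. The sketch below recomputes four of them for a single bar from raw numbers; bar_features and its arguments are hypothetical names for illustration, and the formulas follow calculate_features() (with an extra close > 0 guard). Note that dividing sar_distance by atr/close reduces to (close - psar) / atr, i.e. the SAR gap measured in ATRs.

```python
def bar_features(close, high, low, psar, atr, bb_top, bb_mid, bb_bot):
    """Recompute four calculate_features() inputs for one bar (illustrative)."""
    sar_distance = (close - psar) / close if close > 0 else 0.0
    # Equivalent to (close - psar) / atr: the SAR gap expressed in ATRs
    sar_distance_normalized = sar_distance / (atr / close) if atr > 0 and close > 0 else 0.0
    band_width = bb_top - bb_bot
    # 0 at the middle band, +0.5 at the upper band, -0.5 at the lower band
    bb_position = (close - bb_mid) / band_width if band_width > 0 else 0.5
    bar_range = high - low
    # 0 when the bar closes at its low, 1 when it closes at its high
    close_position = (close - low) / bar_range if bar_range > 0 else 0.5
    return {
        "sar_distance": sar_distance,
        "sar_distance_normalized": sar_distance_normalized,
        "bb_position": bb_position,
        "close_position": close_position,
    }


print(bar_features(close=100, high=102, low=98, psar=96, atr=2,
                   bb_top=104, bb_mid=100, bb_bot=96))
```

Here price sits 4% (two ATRs) above the SAR, exactly on the middle Bollinger Band, and closes in the middle of its range.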

calculate_target_label(): What the RandomForest Predicts

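    def calculate_target_label(self):
        """3-bar return ending at the current bar (the label for the features recorded 3 bars ago).

        Backtrader's negative indices look backward: close[-3] is the close three
        bars ago. No future bar is visible inside next(), so the forward return of
        a sample can only be measured three bars after the sample was recorded.
        """
        if len(self.data) < 5:
            return 0
        
        try:
            past_close = self.data.close[-3]
            return (self.data.close[0] - past_close) / past_close
        except (IndexError, ZeroDivisionError):
            return 0

The target label for the RandomForest model is the percentage change in the closing price over three bars, which lets the model learn which conditions precede significant price movements (up or down). Because Backtrader indexes backward (close[-3] is three bars in the past), that change is only known three bars after the features are recorded. The function therefore returns the move completed over the last three bars, and it must be paired with the feature vector from three bars earlier. Pairing it with the current bar's features would teach the model to recognize a move that its own inputs, such as price_change_3, already contain.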
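The pairing rule can be illustrated without Backtrader. In the sketch below, ForwardLabeler is a hypothetical helper: each bar's features wait in a queue until three more bars have closed, and are then labeled with the return realized since they were recorded. A sample is only emitted once its outcome is known, so training never sees a label from the future.

```python
from collections import deque


class ForwardLabeler:
    """Pair the features seen at bar t with the return from close[t] to close[t + horizon]."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.pending = deque()  # (features, close when recorded), oldest first
        self.samples = []       # completed (features, forward_return) pairs

    def update(self, features, close):
        self.pending.append((features, close))
        # Once `horizon` newer bars have arrived, the oldest entry's outcome is known
        if len(self.pending) > self.horizon:
            old_features, old_close = self.pending.popleft()
            self.samples.append((old_features, (close - old_close) / old_close))


labeler = ForwardLabeler(horizon=3)
for t, close in enumerate([100, 101, 102, 110, 99]):
    labeler.update(f"features@{t}", close)
# features@0 is labeled +10% (100 -> 110), features@1 about -2% (101 -> 99)
print(labeler.samples)
```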

train_random_forest(): Training the RandomForest Model

    def train_random_forest(self):
        """Train RandomForest model for 3-month data"""
        if len(self.feature_buffer) < 25:  # Minimum samples for 3-month data
            return False
        
        try:
            X = np.array(self.feature_buffer)
            y = np.array(self.label_buffer)
            
            # Remove invalid data
            valid_mask = np.isfinite(X).all(axis=1) & np.isfinite(y)
            X, y = X[valid_mask], y[valid_mask]
            
            if len(X) < 20:
                return False
            
            # Binary classification: top 50% returns are good for 3-month data
            threshold = np.percentile(np.abs(y), 50)
            y_binary = (np.abs(y) > threshold).astype(int)
            
            if len(np.unique(y_binary)) < 2:
                return False
            
            # Small datasets: no holdout, so the reported accuracy is in-sample (optimistic)
            if len(X) < 35:
                X_train, X_test, y_train, y_test = X, X, y_binary, y_binary
            else:
                X_train, X_test, y_train, y_test = train_test_split(
                    X, y_binary, test_size=0.3, random_state=42
                )
            
            # Scale features
            X_train_scaled = self.scaler.fit_transform(X_train)
            if len(X_test) > 0:
                X_test_scaled = self.scaler.transform(X_test)
            
            # RandomForest optimized for Parabolic SAR and 3-month data
            self.rf_model = RandomForestClassifier(
                n_estimators=40,            # Moderate trees for SAR signals
                max_depth=5,                # Moderate depth for small datasets
                min_samples_split=5,        # Prevent overfitting
                min_samples_leaf=2,         # Ensure leaf nodes have enough samples
                max_features='sqrt',        # Square root of features
                random_state=42,
                class_weight='balanced',    # Handle class imbalance
                bootstrap=True,             # Bootstrap sampling
                oob_score=True              # Out-of-bag score for validation
            )
            
            self.rf_model.fit(X_train_scaled, y_train)
            
            # Evaluate if we have test data
            if len(X_test) > 0:
                accuracy = accuracy_score(y_test, self.rf_model.predict(X_test_scaled))
            else:
                accuracy = accuracy_score(y_train, self.rf_model.predict(X_train_scaled))
            
            self.rf_ready = True
            print(f"RandomForest trained - Accuracy: {accuracy:.3f}, OOB Score: {self.rf_model.oob_score_:.3f}, Samples: {len(X)}")
            return True
            
        except Exception as e:
            print(f"RandomForest training failed: {e}")
            return False

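This method is responsible for training the RandomForestClassifier:
  • Data checks: it needs at least 25 buffered samples, drops rows with non-finite features or labels, and requires at least 20 valid rows.
  • Labels: it converts the continuous 3-bar returns into classes. A sample is class 1 when its absolute return is above the median absolute return in the buffer, so the model predicts whether a large move (in either direction) follows, not which way it goes. Training is skipped if only one class is present.
  • Split: with fewer than 35 samples it trains and evaluates on the same rows, so the printed accuracy is in-sample; otherwise it holds out a randomly shuffled 30% for testing.
  • Scaling: a StandardScaler is fit on the training rows and reused for the test rows and for live predictions.
  • Model: a deliberately small forest (40 trees, max_depth=5, min_samples_split=5, min_samples_leaf=2, max_features='sqrt') with balanced class weights and bootstrap sampling, which also yields an out-of-bag (OOB) score.
  • Reporting: it prints the accuracy and OOB score and sets rf_ready, after which next() starts filtering signals.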
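The labeling step inside train_random_forest() is worth seeing on numbers, because it uses absolute returns: class 1 means a large move in either direction, not a profitable long. The sketch below mirrors that rule with the standard library; binarize_by_median_abs is a hypothetical name, and statistics.median stands in for np.percentile(..., 50), which gives the same value under NumPy's default linear interpolation.

```python
import statistics


def binarize_by_median_abs(returns):
    """Class 1 when |return| is strictly above the median |return|, else 0."""
    magnitudes = [abs(r) for r in returns]
    threshold = statistics.median(magnitudes)
    labels = [1 if m > threshold else 0 for m in magnitudes]
    return labels, threshold


labels, threshold = binarize_by_median_abs([0.01, -0.03, 0.02, -0.005, 0.04, -0.02])
print(labels, threshold)  # [0, 1, 0, 0, 1, 0] 0.02
```

Values tied with the median fall into class 0, so the classes are only approximately balanced; class_weight='balanced' can help compensate. If every magnitude is equal, only class 0 remains and training is skipped.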

get_rf_confidence(): Getting a Prediction from RandomForest

    def get_rf_confidence(self, features):
        """Get RandomForest prediction confidence"""
        if not self.rf_ready or self.rf_model is None or features is None:
            return 0.5  # Neutral when RF not ready
        
        try:
            features_scaled = self.scaler.transform(features.reshape(1, -1))
            proba = self.rf_model.predict_proba(features_scaled)[0]
            return proba[1] if len(proba) > 1 else 0.5
        except Exception:
            return 0.5

This method queries the trained RandomForest model for its prediction confidence. It takes the current bar’s features, scales them using the fitted scaler, and then calls predict_proba. This returns the probability of the current market state belonging to class 1 (a “good” signal with a significant future move). This probability serves as our RandomForest confidence score.
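For a RandomForestClassifier, scikit-learn computes predict_proba as the mean of the class probabilities predicted by the individual trees. The sketch below shows that averaging and the threshold test applied in next(), using made-up per-tree probabilities; rf_gate is a hypothetical helper, and the neutral 0.5 before training mirrors get_rf_confidence().

```python
def rf_gate(tree_probas, threshold=0.65, rf_ready=True):
    """Return (signal_passes, confidence) from per-tree [P(class 0), P(class 1)] pairs."""
    if not rf_ready or not tree_probas:
        # Before the first successful fit: neutral confidence, signal not filtered
        return True, 0.5
    # Forest probability = average of the trees' class-1 probabilities
    confidence = sum(p[1] for p in tree_probas) / len(tree_probas)
    # next() requires confidence strictly above the threshold
    return confidence > threshold, confidence


passes, conf = rf_gate([[0.2, 0.8], [0.3, 0.7], [0.5, 0.5], [0.1, 0.9]])
print(passes, round(conf, 3))  # True 0.725
```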


2.4. Order Management (notify_order)

    def notify_order(self, order):
        """Original notify_order logic, plus clearing self.order on margin rejections"""
        if order.status in [order.Completed]:
            if order.isbuy() and self.position.size > 0:
                stop_price = order.executed.price * (1 - self.params.stop_loss_pct)
                self.stop_order = self.sell(exectype=bt.Order.Stop, price=stop_price)
            elif order.issell() and self.position.size < 0:
                stop_price = order.executed.price * (1 + self.params.stop_loss_pct)
                self.stop_order = self.buy(exectype=bt.Order.Stop, price=stop_price)
        
        # Margin (not enough cash) is also final; without it self.order stays set and next() stops trading
        if order.status in [order.Completed, order.Canceled, order.Margin, order.Rejected]:
            self.order = None
            if order == self.stop_order:
                self.stop_order = None

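This Backtrader callback is triggered whenever an order's status changes. It is responsible for:
  • Placing the initial stop-loss: when a buy order fills and leaves the strategy long, it submits a sell Stop order stop_loss_pct below the fill price; when a sell order fills and leaves it short, it submits a buy Stop order stop_loss_pct above the fill price. Orders that merely close a position leave it flat, so no stop is placed for them.
  • Resetting order state: once an order reaches a final state (completed, canceled or rejected), self.order is cleared so next() can trade again, and self.stop_order is cleared when the finished order is the stop itself.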
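The stop placement in notify_order() is simple percentage arithmetic on the fill price. The sketch below (initial_stop is a hypothetical helper) shows it for both directions with the default stop_loss_pct of 0.02.

```python
def initial_stop(fill_price, is_long, stop_loss_pct=0.02):
    """Stop level notify_order() would submit after an entry fills."""
    if is_long:
        return fill_price * (1 - stop_loss_pct)  # sell stop below a long entry
    return fill_price * (1 + stop_loss_pct)      # buy stop above a short entry


print(round(initial_stop(3000.0, is_long=True), 2))   # 2940.0
print(round(initial_stop(3000.0, is_long=False), 2))  # 3060.0
```

With the 95% PercentSizer used in the main block, a stop-out at this level costs roughly 0.95 * 2% = 1.9% of equity, before commission and any gap through the stop price.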


2.5. The next() Method: The Strategy’s Brain

The next() method is the core logic loop, executed for each new bar of data. This is where the strategy evaluates indicators, performs ML filtering, and makes trading decisions.

    def next(self):
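        # Collect features for RF training (minimal overhead)
        features = self.calculate_features()
        if not hasattr(self, 'pending_features'):
            self.pending_features = []  # (bar number, features) awaiting their 3-bar outcome
        if features is not None and len(self.data) > 20:
            self.pending_features.append((len(self.data), features))
        
        # calculate_target_label() measures the move over the last 3 bars, so it is
        # the label for the features recorded 3 bars ago, not for today's features
        if self.pending_features and len(self.data) - self.pending_features[0][0] == 3:
            _, past_features = self.pending_features.pop(0)
            self.feature_buffer.append(past_features)
            self.label_buffer.append(self.calculate_target_label())
            
            # Keep buffer small for 3-month data
            if len(self.feature_buffer) > 70:
                self.feature_buffer = self.feature_buffer[-50:]
                self.label_buffer = self.label_buffer[-50:]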
        
        # Retrain RF frequently for 3-month data
        if len(self.data) - self.last_retrain >= self.params.retrain_frequency:
            if self.train_random_forest():
                self.last_retrain = len(self.data)
        
        # EXACT original logic with RF enhancement
        if self.order is not None:
            return
        
        # Get RF confidence
        rf_confidence = self.get_rf_confidence(features)
        
        # EXACT original SAR breakout signals with RF enhancement
        if self.sar_signal > 0:  # SAR turns bullish (price crosses above SAR)
            self.total_signals += 1
            # EXACT original RSI confirmation with RF filter
            if self.rsi < self.params.rsi_overbought:
                # RF ENHANCEMENT: Add RF filter
                if not self.rf_ready or rf_confidence > self.params.rf_threshold:
                    if self.position.size < 0:  # Close short
                        if self.stop_order is not None:
                            self.cancel(self.stop_order)
                        self.order = self.close()
                        print(f"SAR BULLISH: Closing short at {self.data.close[0]:.2f} (RF: {rf_confidence:.3f})")
                    elif not self.position:  # Go long
                        self.order = self.buy()
                        print(f"SAR BULLISH: Going long at {self.data.close[0]:.2f} (RF: {rf_confidence:.3f})")
                else:
                    self.rf_filtered_signals += 1
                    print(f"SAR BULLISH signal filtered - RF confidence {rf_confidence:.3f} < {self.params.rf_threshold}")
            else:
                print(f"SAR BULLISH signal rejected - RSI overbought: {self.rsi[0]:.1f}")
                    
        elif self.sar_signal < 0:  # SAR turns bearish (price crosses below SAR)
            self.total_signals += 1
            # EXACT original RSI confirmation with RF filter
            if self.rsi > self.params.rsi_oversold:
                # RF ENHANCEMENT: Add RF filter
                if not self.rf_ready or rf_confidence > self.params.rf_threshold:
                    if self.position.size > 0:  # Close long
                        if self.stop_order is not None:
                            self.cancel(self.stop_order)
                        self.order = self.close()
                        print(f"SAR BEARISH: Closing long at {self.data.close[0]:.2f} (RF: {rf_confidence:.3f})")
                    elif not self.position:  # Go short
                        self.order = self.sell()
                        print(f"SAR BEARISH: Going short at {self.data.close[0]:.2f} (RF: {rf_confidence:.3f})")
                else:
                    self.rf_filtered_signals += 1
                    print(f"SAR BEARISH signal filtered - RF confidence {rf_confidence:.3f} < {self.params.rf_threshold}")
            else:
                print(f"SAR BEARISH signal rejected - RSI oversold: {self.rsi[0]:.1f}")

The next() method orchestrates the strategy’s real-time decision-making process:

  1. ML Data Management: It continuously calls calculate_features() and calculate_target_label() to populate the feature_buffer and label_buffer. It then checks if it’s time to retrain the RandomForest model based on retrain_frequency and calls train_random_forest(). The buffers are kept relatively small to manage memory and ensure the model adapts quickly to recent market dynamics, which is crucial for 3-month data.
  2. Order Check: It ensures no orders are currently pending execution to avoid placing duplicate or conflicting orders.
  3. RandomForest Confidence: It obtains the rf_confidence score from the RandomForest model for the current bar’s features.
  4. SAR Signal Detection:
    • Bullish Signal: sar_signal is +1 on the bar where price crosses above the SAR, marking a bullish reversal. total_signals is incremented.
    • Bearish Signal: sar_signal is -1 on the bar where price crosses below the SAR, marking a bearish reversal. total_signals is incremented.
  5. RSI Confirmation: Both bullish and bearish SAR signals are further filtered by RSI:
    • For bullish SAR, RSI must be below rsi_overbought (not already overextended).
    • For bearish SAR, RSI must be above rsi_oversold (not already oversold).
  6. RandomForest Filtering (The Enhancement): This is the core ML integration. If the RSI confirmation is met:
    • If the RandomForest model is ready (rf_ready) and its rf_confidence is at or below rf_threshold, the signal is filtered out and rf_filtered_signals is incremented. The strategy thus skips trades the model considers unlikely to be followed by a significant move. Until the first successful training, rf_ready is False and every RSI-confirmed signal passes through unfiltered.
    • Otherwise, the strategy acts: it closes an existing opposing position or, if flat, opens a new position (buy for bullish, sell for bearish). The fixed initial stop-loss is then placed in notify_order once the entry fills.
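The branching in next() can be condensed into a pure function, which is handy for unit-testing the rules separately from Backtrader. sar_action and its return strings are hypothetical; the comparisons mirror the strategy's (RSI strictly inside its limit, RF confidence strictly above the threshold), and the pending-order check is left out.

```python
def sar_action(sar_signal, rsi, position, rf_ready, rf_confidence,
               rsi_overbought=70, rsi_oversold=30, rf_threshold=0.65):
    """Action next() would take on one bar (simplified: ignores pending orders)."""
    if sar_signal == 0:
        return "hold"  # no SAR flip on this bar
    bullish = sar_signal > 0
    # RSI confirmation: bullish needs RSI < overbought, bearish needs RSI > oversold
    if (bullish and rsi >= rsi_overbought) or (not bullish and rsi <= rsi_oversold):
        return "rejected_rsi"
    # The RF filter applies only once a model has been trained
    if rf_ready and rf_confidence <= rf_threshold:
        return "filtered_rf"
    if bullish:
        return "close_short" if position < 0 else ("buy" if position == 0 else "hold")
    return "close_long" if position > 0 else ("sell" if position == 0 else "hold")


print(sar_action(1, 55, 0, True, 0.72))  # buy
```

One consequence worth noting: an opposite flip only closes the position. Because flips alternate direction, the next flip seen while flat is normally in the original direction, so the side changes only when the strategy is flat at an opposite flip, for example after a stop-out or after an entry signal was filtered.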

2.6. stop() Method: Post-Backtest Summary

    def stop(self):
        """Enhanced stop function with RF statistics"""
        filter_rate = (self.rf_filtered_signals / self.total_signals * 100) if self.total_signals > 0 else 0
        
        print(f'\n=== RANDOM FOREST ENHANCED PARABOLIC SAR RESULTS ===')
        print(f'Total SAR Signals: {self.total_signals}')
        print(f'RF Filtered Signals: {self.rf_filtered_signals} ({filter_rate:.1f}%)')

The stop() method is automatically called at the very end of the backtest. It prints a concise summary: the total number of raw Parabolic SAR signals and, crucially, how many of these signals were filtered out by the RandomForest model, along with the filtering rate. This provides a direct measure of the ML model’s impact on trade selection.


3. Running the Backtest: Main Execution Block

This if __name__ == '__main__': block sets up the Backtrader environment, loads the data, configures the broker, runs the backtest, and displays the results.

if __name__=='__main__':
    # Test with 3-month data
    data = yf.download('ETH-USD', '2024-01-01', '2024-04-01', auto_adjust=False).droplevel(axis=1, level=1) # 3 months
    data_feed = bt.feeds.PandasData(dataname=data)
    
    cerebro = bt.Cerebro()
    cerebro.addstrategy(RandomForestEnhancedParabolicSARStrategy)
    cerebro.adddata(data_feed)
    cerebro.addsizer(bt.sizers.PercentSizer, percents=95)
    cerebro.broker.setcash(100000)
    cerebro.broker.setcommission(commission=0.001)
    
    print(f'Start: ${cerebro.broker.getvalue():,.2f}')
    results = cerebro.run()
    final_value = cerebro.broker.getvalue()
    total_return = ((final_value / 100000) - 1) * 100
    print(f'End: ${final_value:,.2f}')
    print(f'Return: {total_return:.2f}%')
    
    

Results

The rolling backtests show a significant improvement in performance over the base strategy. I tested both versions on rolling 3-month windows of Bitcoin data from 2020 to 2025:

Base Strategy:

[Figure: Base strategy, rolling 3-month backtest results]

ML-Enhanced:

[Figure: ML-enhanced strategy, rolling 3-month backtest results]
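The rolling harness behind these results is not shown. As a hypothetical sketch, consecutive non-overlapping 3-month windows for 2020 to 2025 could be generated as below; each (start, end) pair would then be passed to yf.download and run through a fresh Cerebro instance, as in the main block. Whether the original windows overlapped is not stated, so non-overlapping windows are an assumption, as is the helper name three_month_windows.

```python
from datetime import date


def three_month_windows(start, end):
    """Consecutive, non-overlapping 3-month (start, end) pairs; start must be a month's first day."""
    windows = []
    current = start
    while True:
        month_index = current.month - 1 + 3  # zero-based month three months later
        nxt = date(current.year + month_index // 12, month_index % 12 + 1, 1)
        if nxt > end:
            break
        windows.append((current, nxt))
        current = nxt
    return windows


windows = three_month_windows(date(2020, 1, 1), date(2025, 1, 1))
print(len(windows))  # 20
```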

Conclusion 💡

This RandomForest-Enhanced Parabolic SAR strategy demonstrates a powerful synergy between a classic trend-following indicator and a robust machine learning classifier. By leveraging the RandomForest model to filter traditional SAR signals, the strategy aims to reduce the impact of false signals and improve overall trading performance, especially in volatile markets or during periods where the SAR might typically generate whipsaws.

The design, featuring frequent retraining of the RandomForest model and a comprehensive set of input features, is tailored to adapt to the dynamics of short-term (3-month) data. While this implementation provides a strong foundation, further advancements could include:

Integrating machine learning into traditional indicator-based strategies represents a significant step towards building more intelligent, adaptive, and potentially more profitable algorithmic trading systems.