RandomForest-Enhanced Parabolic SAR Strategy: A Backtrader Implementation

The Parabolic SAR (Stop and Reverse) indicator is a popular tool for identifying trend direction and potential reversal points. It provides clear stop-loss and entry signals, often appearing as a series of dots above or below the price bars. When the dots flip from one side to the other, it signals a potential change in trend. While simple and effective in strong trends, Parabolic SAR can generate whipsaws in choppy or ranging markets.

To mitigate false signals and improve the strategy’s robustness, we can integrate a Machine Learning (ML) model, specifically a RandomForestClassifier. This article will explore a RandomForest-Enhanced Parabolic SAR trading strategy, optimized for short-term (3-month) data, implemented using the Backtrader framework. The RandomForest model will act as an intelligent filter, validating SAR signals based on a broader context of market features.

1. Setting Up the Environment: Libraries and Configuration

First, we need to import the necessary Python libraries for data handling, backtesting, machine learning, and plotting.

import backtrader as bt
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings("ignore")

plt.rcParams['figure.figsize'] = (12, 8)

backtrader: The primary backtesting framework for creating and testing trading strategies.
yfinance: Used to download historical financial data from Yahoo Finance.
pandas & numpy: Essential for data manipulation and numerical operations, especially with time series data.
matplotlib.pyplot: For generating plots to visualize strategy performance.
sklearn.ensemble.RandomForestClassifier: Our chosen machine learning model, an ensemble method known for its robustness and ability to handle various feature types.
sklearn.preprocessing.StandardScaler: A preprocessor that standardizes features to zero mean and unit variance. Tree-based models such as RandomForest do not require this, but it keeps the pipeline ready for models that do.
sklearn.model_selection.train_test_split: For splitting datasets into training and testing subsets. When the buffer holds fewer than 35 samples, the strategy skips the split and trains on the whole buffer.
sklearn.metrics.accuracy_score: Used to evaluate the classification accuracy of the RandomForest model during training.
warnings: Used to suppress all warnings during execution. This keeps the output clean but can also hide useful deprecation notices.

2. The `RandomForestEnhancedParabolicSARStrategy` Class

This class defines our complete trading strategy, encompassing indicator calculations, machine learning integration, and trade execution logic within the Backtrader ecosystem.

class RandomForestEnhancedParabolicSARStrategy(bt.Strategy):
    """
    RandomForest-Enhanced Parabolic SAR Strategy optimized for 3-month data
    """
    params = (
        # EXACT original parameters
        ('af', 0.02),            # Acceleration factor
        ('afmax', 0.1),          # Maximum acceleration factor
        ('rsi_period', 14),      # RSI period for momentum
        ('rsi_overbought', 70),  # RSI overbought level
        ('rsi_oversold', 30),    # RSI oversold level
        ('stop_loss_pct', 0.02), # stop loss
        # RandomForest parameters only
        ('rf_threshold', 0.65),  # RF confidence threshold for 3-month data
        ('retrain_frequency', 25), # Retrain every 25 bars for 3-month data
    )

2.1. Strategy Parameters (`params`)

The params tuple makes the strategy highly configurable:

af (Acceleration Factor): The initial acceleration factor for Parabolic SAR. This value increases with each new high/low, making the SAR trail price more closely.
afmax (Maximum Acceleration Factor): The ceiling for the acceleration factor.
rsi_period: The lookback period for the Relative Strength Index (RSI), used as a momentum filter.
rsi_overbought / rsi_oversold: Levels indicating overbought or oversold conditions based on RSI.
stop_loss_pct: A fixed percentage for an initial stop-loss order placed upon entry.
rf_threshold: The minimum confidence (probability) score required from the RandomForest model to confirm a trade signal.
retrain_frequency: How often (in bars) the RandomForest model is retrained. For a 3-month dataset, frequent retraining (e.g., every 25 bars, roughly a month of daily data) is vital for the model to adapt to recent market dynamics.
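
To make the interaction between af and afmax concrete, here is a minimal sketch of how the acceleration factor evolves within a single trend. It assumes the common convention in which af serves as both the starting value and the step added on each new extreme; the function name acceleration_factors is hypothetical and is not part of Backtrader.

```python
def acceleration_factors(new_extremes, af=0.02, afmax=0.1):
    """Return the acceleration factor after each bar of a single trend.

    new_extremes: iterable of booleans, True when the bar set a new extreme
    (a higher high in an uptrend, a lower low in a downtrend).
    Assumption: af is both the initial value and the per-extreme increment.
    """
    current = af
    history = []
    for is_new_extreme in new_extremes:
        if is_new_extreme:
            # Step up by the increment, but never beyond the ceiling.
            current = min(current + af, afmax)
        # Round only for display, to hide floating-point noise.
        history.append(round(current, 10))
    return history


# Six consecutive new highs: the factor climbs and then saturates at afmax.
print(acceleration_factors([True] * 6))  # [0.04, 0.06, 0.08, 0.1, 0.1, 0.1]
```

With afmax set to 0.1, the SAR stops accelerating after four new extremes, so it trails price less tightly than it would with a higher ceiling.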

2.2. Initialization (`__init__`)

The __init__ method sets up all the indicators and internal variables needed for the strategy.

    def __init__(self):
        # EXACT original Parabolic SAR
        self.psar = bt.indicators.ParabolicSAR(
            af=self.params.af, 
            afmax=self.params.afmax
        )
        
        # EXACT original momentum confirmation with RSI
        self.rsi = bt.indicators.RSI(period=self.params.rsi_period)
        
        # EXACT original SAR position signals
        self.sar_long = self.data.close > self.psar  # Price above SAR = bullish
        self.sar_short = self.data.close < self.psar # Price below SAR = bearish
        
        # EXACT original SAR direction changes
        self.sar_signal = bt.indicators.CrossOver(self.sar_long, 0.5)
        
        # EXACT original order tracking
        self.order = None
        self.stop_order = None
        
        # Minimal additional indicators for RF features (optimized for 3-month data)
        self.sma = bt.indicators.SMA(period=10)      # Fast SMA
        self.ema = bt.indicators.EMA(period=12)      # Fast EMA
        self.atr = bt.indicators.ATR(period=7)       # Fast ATR
        self.volume_ma = bt.indicators.SMA(self.data.volume, period=10)
        self.momentum = bt.indicators.Momentum(period=5)
        self.bb = bt.indicators.BollingerBands(period=15) # Fast BB
        
        # RandomForest components (optimized for 3-month data)
        self.rf_model = None
        self.scaler = StandardScaler()
        self.feature_buffer = []
        self.label_buffer = []
        self.last_retrain = 0
        self.rf_ready = False
        
        # Performance tracking
        self.total_signals = 0
        self.rf_filtered_signals = 0

psar: The Parabolic SAR indicator instance. Its dots track price and flip direction at trend reversals.
rsi: An RSI indicator for momentum confirmation, helping filter out weak SAR signals.
sar_long / sar_short: Boolean indicators; sar_long is true when the close price is above the SAR, indicating an uptrend. sar_short is true when the close is below SAR, indicating a downtrend.
sar_signal: A CrossOver indicator that detects when the SAR flips direction (i.e., sar_long crosses above 0.5 for a bullish flip, or below for a bearish flip).
Order Tracking: self.order and self.stop_order are used to manage pending orders and prevent multiple orders for the same trade.
ML-Specific Indicators: For the RandomForest model, additional fast-period indicators are included: SMA, EMA, ATR, volume_ma, Momentum, and BollingerBands. These provide a richer feature set for the ML model to learn from.
RandomForest Components:
- self.rf_model: Will store the trained RandomForestClassifier.
- self.scaler: An instance of StandardScaler to normalize features for the RF model.
- self.feature_buffer, self.label_buffer: Lists to collect historical features and their corresponding target labels, used for training the RF model.
- self.last_retrain: Tracks the bar index when the model was last trained.
- self.rf_ready: A flag indicating if the RandomForest model is trained and ready for predictions.
Performance Tracking: Counters for total_signals (raw SAR signals) and rf_filtered_signals (signals rejected by RandomForest).
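
The flip detection described above (a boolean "price above SAR" series crossing the 0.5 level) can be illustrated on plain lists. This is only a sketch of the idea, not Backtrader's CrossOver implementation; the function name flip_signals and the sample prices are made up for illustration.

```python
def flip_signals(closes, sars):
    """Return +1 on a bullish flip, -1 on a bearish flip, 0 otherwise."""
    # 1.0 when the close is above the SAR dot, 0.0 otherwise.
    above = [1.0 if c > s else 0.0 for c, s in zip(closes, sars)]
    signals = [0]  # No previous bar to compare against on the first bar.
    for prev, curr in zip(above, above[1:]):
        if prev < 0.5 < curr:
            signals.append(1)    # Crossed up through 0.5: SAR turned bullish.
        elif prev > 0.5 > curr:
            signals.append(-1)   # Crossed down through 0.5: SAR turned bearish.
        else:
            signals.append(0)
    return signals


closes = [100, 101, 99, 98, 102]
sars = [98, 99, 100, 101, 100]
print(flip_signals(closes, sars))  # [0, 0, -1, 0, 1]
```

Only the bar on which the side changes produces a non-zero value, which is why the strategy can treat a non-zero sar_signal as a one-off event rather than a persistent state.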

2.3. Machine Learning Integration

This section defines how we prepare data for the RandomForest model, train it, and use its predictions to filter trade signals.

`calculate_features()`: Data for the RandomForest Model

    def calculate_features(self):
        """Calculate features optimized for Parabolic SAR + RandomForest + 3-month data"""
        if (len(self.psar) < 10 or
            len(self.rsi) < self.params.rsi_period):
            return None
        
        try:
            features = []
            
            # Core Parabolic SAR features
            sar_distance = (self.data.close[0] - self.psar[0]) / self.data.close[0] if self.data.close[0] > 0 else 0
            sar_distance_normalized = sar_distance / (self.atr[0] / self.data.close[0]) if self.atr[0] > 0 else 0
            price_above_sar = 1 if self.data.close[0] > self.psar[0] else 0
            price_below_sar = 1 if self.data.close[0] < self.psar[0] else 0
            features.extend([sar_distance, sar_distance_normalized, price_above_sar, price_below_sar])
            
            # SAR signal and momentum
            sar_signal_strength = self.sar_signal[0] if not np.isnan(self.sar_signal[0]) else 0
            sar_trend_consistency = 1 if (
                len(self.sar_long) > 3 and 
                ((self.sar_long[0] and self.sar_long[-1] and self.sar_long[-2]) or
                 (not self.sar_long[0] and not self.sar_long[-1] and not self.sar_long[-2]))
            ) else 0
            features.extend([sar_signal_strength, sar_trend_consistency])
            
            # RSI momentum features
            rsi_norm = self.rsi[0] / 100 if not np.isnan(self.rsi[0]) else 0.5
            rsi_overbought = 1 if self.rsi[0] > self.params.rsi_overbought else 0
            rsi_oversold = 1 if self.rsi[0] < self.params.rsi_oversold else 0
            rsi_momentum = (self.rsi[0] - self.rsi[-3]) / 100 if len(self.rsi) > 3 else 0
            rsi_in_range = 1 if self.params.rsi_oversold < self.rsi[0] < self.params.rsi_overbought else 0
            features.extend([rsi_norm, rsi_overbought, rsi_oversold, rsi_momentum, rsi_in_range])
            
            # SAR-RSI signal combinations (original strategy logic)
            sar_bullish_confirmed = 1 if (self.sar_signal[0] > 0 and self.rsi[0] < self.params.rsi_overbought) else 0
            sar_bearish_confirmed = 1 if (self.sar_signal[0] < 0 and self.rsi[0] > self.params.rsi_oversold) else 0
            features.extend([sar_bullish_confirmed, sar_bearish_confirmed])
            
            # Price momentum and trend
            price_change_1 = (self.data.close[0] - self.data.close[-1]) / self.data.close[-1] if len(self.data) > 1 else 0
            price_change_3 = (self.data.close[0] - self.data.close[-3]) / self.data.close[-3] if len(self.data) > 3 else 0
            momentum_norm = self.momentum[0] / self.data.close[0] if self.data.close[0] > 0 else 0
            features.extend([price_change_1, price_change_3, momentum_norm])
            
            # Trend confirmation indicators
            sma_distance = (self.data.close[0] - self.sma[0]) / self.sma[0] if self.sma[0] > 0 else 0
            ema_distance = (self.data.close[0] - self.ema[0]) / self.ema[0] if self.ema[0] > 0 else 0
            sma_ema_alignment = 1 if ((self.data.close[0] > self.sma[0] and self.data.close[0] > self.ema[0]) or
                                      (self.data.close[0] < self.sma[0] and self.data.close[0] < self.ema[0])) else 0
            features.extend([sma_distance, ema_distance, sma_ema_alignment])
            
            # Volatility context
            atr_norm = self.atr[0] / self.data.close[0] if self.data.close[0] > 0 else 0
            bb_position = (self.data.close[0] - self.bb.mid[0]) / (self.bb.top[0] - self.bb.bot[0]) if (self.bb.top[0] - self.bb.bot[0]) > 0 else 0.5
            bb_width = (self.bb.top[0] - self.bb.bot[0]) / self.bb.mid[0] if self.bb.mid[0] > 0 else 0
            features.extend([atr_norm, bb_position, bb_width])
            
            # Volume confirmation
            volume_ratio = self.data.volume[0] / self.volume_ma[0] if self.volume_ma[0] > 0 else 1.0
            volume_change = (self.data.volume[0] - self.data.volume[-1]) / self.data.volume[-1] if len(self.data) > 1 and self.data.volume[-1] > 0 else 0
            features.extend([volume_ratio, volume_change])
            
            # SAR effectiveness context
            sar_effectiveness = abs(sar_distance) * (1 if sar_trend_consistency else 0.5)
            features.append(sar_effectiveness)
            
            # Market condition features
            trending_market = 1 if (abs(sma_distance) > 0.01 and sma_ema_alignment) else 0
            features.append(trending_market)
            
            # Signal quality indicators
            signal_quality = sar_signal_strength * rsi_in_range * trending_market
            features.append(signal_quality)
            
            # Price action features
            high_low_ratio = (self.data.high[0] - self.data.low[0]) / self.data.close[0] if self.data.close[0] > 0 else 0
            close_position = (self.data.close[0] - self.data.low[0]) / (self.data.high[0] - self.data.low[0]) if (self.data.high[0] - self.data.low[0]) > 0 else 0.5
            features.extend([high_low_ratio, close_position])
            
            # Clean features
            features = [0 if np.isnan(x) or np.isinf(x) else x for x in features]
            return np.array(features)
            
        except Exception:
            # Any indexing or arithmetic error means "no features for this bar"
            return None

This comprehensive method generates a rich set of numerical features for the RandomForest model based on the current bar’s data and indicator values. These features are designed to capture various aspects of market behavior relevant to SAR signals and trend confirmation. They include:

Parabolic SAR specifics: Distance from SAR, and binary indicators for price above/below SAR.
SAR Signal Context: sar_signal_strength (from CrossOver), and sar_trend_consistency (if SAR has maintained its direction for a few bars).
RSI Momentum: Normalized RSI value, overbought/oversold flags, and RSI momentum.
Combined Signals: sar_bullish_confirmed and sar_bearish_confirmed blend SAR and RSI logic.
Price Dynamics: Short-term price changes, general momentum.
Trend Confirmation: Price distance from the SMA and the EMA, and a flag for whether price sits on the same side of both averages.
Volatility: Normalized ATR, the price’s position relative to the Bollinger mid band (scaled by band width), and band width.
Volume: Current volume relative to its average and volume change.
Composite Indicators: sar_effectiveness, trending_market, signal_quality combine multiple indicators to provide holistic market insights.
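Price Action: High-low ratio and closing position within the bar’s range.

All features are expressed as ratios or binary flags so they are comparable across price levels. Any NaN or inf value is replaced with 0 before the array is returned, and if an exception occurs the method returns None so that the bar is skipped.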
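
As a small illustration of the guarded ratios and the final clean-up step, the sketch below reimplements two of the features and the NaN/inf replacement on plain floats. The helper names (close_position, bb_position, clean) mirror the variable names in the listing but are standalone functions written for this example.

```python
import math


def close_position(high, low, close):
    """Where the close sits inside the bar's range: 0 = at the low, 1 = at the high."""
    bar_range = high - low
    # A zero-range bar has no meaningful position, so fall back to the midpoint.
    return (close - low) / bar_range if bar_range > 0 else 0.5


def bb_position(close, mid, top, bot):
    """Signed distance of the close from the mid band, in units of band width."""
    width = top - bot
    return (close - mid) / width if width > 0 else 0.5


def clean(values):
    """Replace NaN and infinite entries with 0, as the feature method does."""
    return [0.0 if (math.isnan(v) or math.isinf(v)) else v for v in values]


print(close_position(high=110.0, low=100.0, close=107.5))   # 0.75
print(bb_position(close=105.0, mid=100.0, top=110.0, bot=90.0))  # 0.25
print(clean([1.0, float("nan"), float("inf"), -2.0]))       # [1.0, 0.0, 0.0, -2.0]
```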

`calculate_target_label()`: What the RandomForest Predicts

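    def calculate_target_label(self):
        """Calculate target for RF training - 3-bar return for 3-month data"""
        if len(self.data) < 5:
            return 0
        
        try:
            # In Backtrader, close[-3] is the close three bars AGO; future bars
            # are not available inside next(). This is therefore the return
            # realized over the last 3 bars, ending at the current bar.
            past_close = self.data.close[-3]
            realized_return = (self.data.close[0] - past_close) / past_close if past_close > 0 else 0
            return realized_return
        except Exception:
            return 0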

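The target label is intended to be the percentage return of the closing price over a three-bar horizon. Note that Backtrader uses negative indices for past bars, so self.data.close[-3] is the close from three bars ago: the value computed inside next() is the price change across the most recent three bars, not a future one. A genuinely forward-looking label requires pairing each feature vector with the return realized three bars after it was recorded. In either case, train_random_forest() uses only the magnitude of this value, so the model learns to recognize conditions associated with significant price movements (up or down).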
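
A forward-looking label can only be assigned once the future bars have arrived. The following sketch shows that alignment on plain lists: features recorded at bar t are paired with the return from t to t + horizon, and the last few bars are left unlabeled. The function name forward_return_pairs and the sample data are hypothetical; inside a Backtrader strategy the same effect would require holding each feature vector for three bars before adding it to the training buffer.

```python
def forward_return_pairs(closes, features, horizon=3):
    """Pair the features at bar t with the return from bar t to bar t + horizon."""
    pairs = []
    # The final `horizon` bars have no known future close yet, so they are skipped.
    for t in range(len(closes) - horizon):
        future_return = (closes[t + horizon] - closes[t]) / closes[t]
        pairs.append((features[t], future_return))
    return pairs


closes = [100.0, 102.0, 101.0, 110.0, 99.0]
features = ["f0", "f1", "f2", "f3", "f4"]  # Stand-ins for real feature vectors.

for feature, label in forward_return_pairs(closes, features):
    print(feature, round(label, 4))
```

With five bars and a three-bar horizon, only the first two bars receive labels; the remaining three are still waiting for their outcome.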

`train_random_forest()`: Training the RandomForest Model

    def train_random_forest(self):
        """Train RandomForest model for 3-month data"""
        if len(self.feature_buffer) < 25:  # Minimum samples for 3-month data
            return False
        
        try:
            X = np.array(self.feature_buffer)
            y = np.array(self.label_buffer)
            
            # Remove invalid data
            valid_mask = np.isfinite(X).all(axis=1) & np.isfinite(y)
            X, y = X[valid_mask], y[valid_mask]
            
            if len(X) < 20:
                return False
            
            # Binary classification: 1 if the absolute return exceeds the median (a large move in either direction)
            threshold = np.percentile(np.abs(y), 50)
            y_binary = (np.abs(y) > threshold).astype(int)
            
            if len(np.unique(y_binary)) < 2:
                return False
            
            # Too few samples to hold out a test set: train and evaluate on the same data (accuracy is then in-sample)
            if len(X) < 35:
                X_train, X_test, y_train, y_test = X, X, y_binary, y_binary
            else:
                X_train, X_test, y_train, y_test = train_test_split(
                    X, y_binary, test_size=0.3, random_state=42
                )
            
            # Scale features
            X_train_scaled = self.scaler.fit_transform(X_train)
            if len(X_test) > 0:
                X_test_scaled = self.scaler.transform(X_test)
            
            # RandomForest optimized for Parabolic SAR and 3-month data
            self.rf_model = RandomForestClassifier(
                n_estimators=40,            # Moderate trees for SAR signals
                max_depth=5,                # Moderate depth for small datasets
                min_samples_split=5,        # Prevent overfitting
                min_samples_leaf=2,         # Ensure leaf nodes have enough samples
                max_features='sqrt',        # Square root of features
                random_state=42,
                class_weight='balanced',    # Handle class imbalance
                bootstrap=True,             # Bootstrap sampling
                oob_score=True              # Out-of-bag score for validation
            )
            
            self.rf_model.fit(X_train_scaled, y_train)
            
            # Evaluate if we have test data
            if len(X_test) > 0:
                accuracy = accuracy_score(y_test, self.rf_model.predict(X_test_scaled))
            else:
                accuracy = accuracy_score(y_train, self.rf_model.predict(X_train_scaled))
            
            self.rf_ready = True
            print(f"RandomForest trained - Accuracy: {accuracy:.3f}, OOB Score: {self.rf_model.oob_score_:.3f}, Samples: {len(X)}")
            return True
            
        except Exception as e:
            print(f"RandomForest training failed: {e}")
            return False

This method is responsible for training the RandomForestClassifier:

It collects features (X) and labels (y) from the buffers, handling invalid data points.
Binary Classification: The continuous future_return is converted into a binary label: if the absolute return is above the 50th percentile, it’s labeled 1 (a “good” signal, indicating a significant move); otherwise, it’s 0. This transforms the prediction task into a classification problem.
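Data Splitting & Scaling: With 35 or more samples, the data is split 70/30 into training and test sets; with fewer, the model is trained and evaluated on the same samples, so the reported accuracy is in-sample and optimistic. Features are scaled using StandardScaler. Tree-based models such as RandomForest are insensitive to feature scaling, so this step is harmless rather than essential; it would matter if the classifier were swapped for a distance-based or gradient-based model.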
RandomForest Configuration: An n_estimators of 40 (number of trees) and max_depth of 5 (tree depth) are chosen to balance model complexity with the limited data available for frequent retraining. Parameters like min_samples_split, min_samples_leaf, max_features, class_weight='balanced' (to handle potential class imbalance), and oob_score are set for robustness and validation.
Upon successful training, self.rf_ready is set to True, and the model’s accuracy and out-of-bag (OOB) score are printed.
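
The labeling step can be reproduced without scikit-learn to see exactly which samples end up in class 1. The sketch below uses statistics.median as a stand-in for np.percentile(..., 50), which gives the same value. It also includes a chronological split, which is an alternative to the random train_test_split used in the listing (not what the strategy does) and is sometimes preferred for time series because it keeps later bars out of the training set. Both function names are hypothetical.

```python
from statistics import median


def binarize_by_move_size(returns):
    """Label 1 when the absolute return exceeds the median absolute return."""
    threshold = median(abs(r) for r in returns)
    labels = [1 if abs(r) > threshold else 0 for r in returns]
    return threshold, labels


def chronological_split(samples, test_fraction=0.3):
    """Alternative to a random split: train on the oldest samples, test on the newest."""
    n_test = round(len(samples) * test_fraction)
    split = len(samples) - n_test
    return samples[:split], samples[split:]


returns = [0.01, -0.04, 0.002, 0.03, -0.015, 0.05]
threshold, labels = binarize_by_move_size(returns)
print(labels)  # [0, 1, 0, 1, 0, 1]

train, test = chronological_split(list(range(10)))
print(train, test)  # [0, 1, 2, 3, 4, 5, 6] [7, 8, 9]
```

Because the threshold is the median, roughly half of the samples fall in each class by construction, and a large drop is labeled the same way as a large rally.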

`get_rf_confidence()`: Getting a Prediction from RandomForest

    def get_rf_confidence(self, features):
        """Get RandomForest prediction confidence"""
        if not self.rf_ready or self.rf_model is None or features is None:
            return 0.5  # Neutral when RF not ready
        
        try:
            features_scaled = self.scaler.transform(features.reshape(1, -1))
            proba = self.rf_model.predict_proba(features_scaled)[0]
            return proba[1] if len(proba) > 1 else 0.5
        except:
            return 0.5

This method queries the trained RandomForest model for its prediction confidence. It takes the current bar’s features, scales them using the fitted scaler, and then calls predict_proba. This returns the probability of the current market state belonging to class 1 (a “good” signal with a significant future move). This probability serves as our RandomForest confidence score.

2.4. Order Management (`notify_order`)

    def notify_order(self, order):
        """EXACT original notify_order function"""
        if order.status in [order.Completed]:
            if order.isbuy() and self.position.size > 0:
                stop_price = order.executed.price * (1 - self.params.stop_loss_pct)
                self.stop_order = self.sell(exectype=bt.Order.Stop, price=stop_price)
            elif order.issell() and self.position.size < 0:
                stop_price = order.executed.price * (1 + self.params.stop_loss_pct)
                self.stop_order = self.buy(exectype=bt.Order.Stop, price=stop_price)
        
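        # Clear the pending-order reference on every terminal status, including Margin and Expired
        if order.status in [order.Completed, order.Canceled, order.Margin, order.Rejected, order.Expired]: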
            self.order = None
            if order == self.stop_order:
                self.stop_order = None

This backtrader callback is triggered when an order’s status changes. It’s crucial for:

Setting Initial Stop-Loss: Once a buy or sell order completes, it immediately places a fixed percentage stop-loss order (stop_loss_pct) to manage risk.
Order Tracking: It updates self.order and self.stop_order to None once an order is completed, canceled, or rejected, preventing redundant actions.
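
The stop-loss arithmetic is simple enough to check by hand. The sketch below restates it as a standalone function; stop_price is a hypothetical helper and the entry price is an arbitrary example value.

```python
def stop_price(entry_price, is_long, stop_loss_pct=0.02):
    """Fixed-percentage protective stop, measured from the fill price."""
    if is_long:
        # A long position is protected by a sell stop below the entry.
        return entry_price * (1 - stop_loss_pct)
    # A short position is protected by a buy stop above the entry.
    return entry_price * (1 + stop_loss_pct)


entry = 2500.0
print(f"long stop:  {stop_price(entry, is_long=True):.2f}")   # 2% below the fill
print(f"short stop: {stop_price(entry, is_long=False):.2f}")  # 2% above the fill
```

The stop is anchored to the executed price rather than the signal bar's close, so slippage on entry shifts the stop by the same proportion.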

2.5. The `next()` Method: The Strategy’s Brain

The next() method is the core logic loop, executed for each new bar of data. This is where the strategy evaluates indicators, performs ML filtering, and makes trading decisions.

    def next(self):
        # Collect features for RF training (minimal overhead)
        features = self.calculate_features()
        if features is not None and len(self.data) > 20:
            target = self.calculate_target_label()
            self.feature_buffer.append(features)
            self.label_buffer.append(target)
            
            # Keep buffer small for 3-month data
            if len(self.feature_buffer) > 70:
                self.feature_buffer = self.feature_buffer[-50:]
                self.label_buffer = self.label_buffer[-50:]
        
        # Retrain RF frequently for 3-month data
        if len(self.data) - self.last_retrain >= self.params.retrain_frequency:
            if self.train_random_forest():
                self.last_retrain = len(self.data)
        
        # EXACT original logic with RF enhancement
        if self.order is not None:
            return
        
        # Get RF confidence
        rf_confidence = self.get_rf_confidence(features)
        
        # EXACT original SAR breakout signals with RF enhancement
        if self.sar_signal > 0:  # SAR turns bullish (price crosses above SAR)
            self.total_signals += 1
            # EXACT original RSI confirmation with RF filter
            if self.rsi < self.params.rsi_overbought:
                # RF ENHANCEMENT: Add RF filter
                if not self.rf_ready or rf_confidence > self.params.rf_threshold:
                    if self.position.size < 0:  # Close short
                        if self.stop_order is not None:
                            self.cancel(self.stop_order)
                        self.order = self.close()
                        print(f"SAR BULLISH: Closing short at {self.data.close[0]:.2f} (RF: {rf_confidence:.3f})")
                    elif not self.position:  # Go long
                        self.order = self.buy()
                        print(f"SAR BULLISH: Going long at {self.data.close[0]:.2f} (RF: {rf_confidence:.3f})")
                else:
                    self.rf_filtered_signals += 1
                    print(f"SAR BULLISH signal filtered - RF confidence {rf_confidence:.3f} < {self.params.rf_threshold}")
            else:
                print(f"SAR BULLISH signal rejected - RSI overbought: {self.rsi[0]:.1f}")
                    
        elif self.sar_signal < 0:  # SAR turns bearish (price crosses below SAR)
            self.total_signals += 1
            # EXACT original RSI confirmation with RF filter
            if self.rsi > self.params.rsi_oversold:
                # RF ENHANCEMENT: Add RF filter
                if not self.rf_ready or rf_confidence > self.params.rf_threshold:
                    if self.position.size > 0:  # Close long
                        if self.stop_order is not None:
                            self.cancel(self.stop_order)
                        self.order = self.close()
                        print(f"SAR BEARISH: Closing long at {self.data.close[0]:.2f} (RF: {rf_confidence:.3f})")
                    elif not self.position:  # Go short
                        self.order = self.sell()
                        print(f"SAR BEARISH: Going short at {self.data.close[0]:.2f} (RF: {rf_confidence:.3f})")
                else:
                    self.rf_filtered_signals += 1
                    print(f"SAR BEARISH signal filtered - RF confidence {rf_confidence:.3f} < {self.params.rf_threshold}")
            else:
                print(f"SAR BEARISH signal rejected - RSI oversold: {self.rsi[0]:.1f}")

The next() method orchestrates the strategy’s real-time decision-making process:

ML Data Management: On every bar after the first 20, it calls calculate_features() and calculate_target_label() and appends the results to feature_buffer and label_buffer. Once the buffers exceed 70 samples they are trimmed to the most recent 50, which bounds memory use and keeps the model focused on recent market dynamics, which is crucial for 3-month data. It then checks whether at least retrain_frequency bars have passed since the last successful training and, if so, calls train_random_forest().
Order Check: It ensures no orders are currently pending execution to avoid placing duplicate or conflicting orders.
RandomForest Confidence: It obtains the rf_confidence score from the RandomForest model for the current bar’s features.
SAR Signal Detection:
- Bullish Signal: sar_signal is positive when price crosses above the SAR, indicating a potential bullish reversal. total_signals is incremented.
- Bearish Signal: sar_signal is negative when price crosses below the SAR, indicating a potential bearish reversal. total_signals is incremented.
RSI Confirmation: Both bullish and bearish SAR signals are further filtered by RSI:
- For bullish SAR, RSI must be below rsi_overbought (not already overextended).
- For bearish SAR, RSI must be above rsi_oversold (not already oversold).
RandomForest Filtering (The Enhancement): This is the core ML integration. If the RSI confirmation is met:
- If the RandomForest model is ready (rf_ready) and rf_confidence does not exceed rf_threshold, the signal is filtered out and rf_filtered_signals is incremented. Because the model is trained to flag large moves in either direction, the strategy abstains when it expects the move following the flip to be small.
- If rf_confidence exceeds the threshold, or the model has not been trained yet (in which case signals pass through unfiltered), the strategy acts: it either closes an existing opposing position or opens a new one (buy for bullish, sell for bearish). A flip against an open position only closes it; the opposite position is not opened on the same bar. An initial fixed stop-loss is placed via notify_order.
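
The filter condition from next() can be isolated into a small predicate to make its edge cases explicit. This is a sketch that mirrors the expression in the listing; rf_allows_trade and count_filtered are hypothetical helper names, and the sample confidences are invented.

```python
def rf_allows_trade(rf_ready, rf_confidence, rf_threshold=0.65):
    """Mirror of the filter in next(): pass everything until the model is trained."""
    return (not rf_ready) or rf_confidence > rf_threshold


def count_filtered(signals, rf_threshold=0.65):
    """signals: (rf_ready, rf_confidence) tuples, one per RSI-confirmed SAR flip."""
    return sum(
        1 for ready, confidence in signals
        if not rf_allows_trade(ready, confidence, rf_threshold)
    )


signals = [
    (False, 0.50),  # Model not trained yet: passes regardless of confidence.
    (True, 0.50),   # Trained, below threshold: filtered.
    (True, 0.70),   # Trained, above threshold: passes.
    (True, 0.65),   # Exactly at the threshold: filtered, since the test is strict.
]
print(count_filtered(signals))  # 2
```

The first case matters in a 3-month backtest: until the buffer is large enough for the first training run, the strategy behaves exactly like the unfiltered SAR and RSI rules.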

2.6. `stop()` Method: Post-Backtest Summary

    def stop(self):
        """Enhanced stop function with RF statistics"""
        filter_rate = (self.rf_filtered_signals / self.total_signals * 100) if self.total_signals > 0 else 0
        
        print(f'\n=== RANDOM FOREST ENHANCED PARABOLIC SAR RESULTS ===')
        print(f'Total SAR Signals: {self.total_signals}')
        print(f'RF Filtered Signals: {self.rf_filtered_signals} ({filter_rate:.1f}%)')

The stop() method is automatically called at the very end of the backtest. It prints a concise summary: the total number of raw Parabolic SAR signals and, crucially, how many of these signals were filtered out by the RandomForest model, along with the filtering rate. This provides a direct measure of the ML model’s impact on trade selection.

3. Running the Backtest: Main Execution Block

This if __name__ == '__main__': block sets up the Backtrader environment, loads the data, configures the broker, runs the backtest, and displays the results.

if __name__=='__main__':
    # Test with 3-month data
    data = yf.download('ETH-USD', '2024-01-01', '2024-04-01', auto_adjust=False).droplevel(axis=1, level=1) # 3 months
    data_feed = bt.feeds.PandasData(dataname=data)
    
    cerebro = bt.Cerebro()
    cerebro.addstrategy(RandomForestEnhancedParabolicSARStrategy)
    cerebro.adddata(data_feed)
    cerebro.addsizer(bt.sizers.PercentSizer, percents=95)
    cerebro.broker.setcash(100000)
    cerebro.broker.setcommission(commission=0.001)
    
    print(f'Start: ${cerebro.broker.getvalue():,.2f}')
    results = cerebro.run()
    final_value = cerebro.broker.getvalue()
    total_return = ((final_value / 100000) - 1) * 100
    print(f'End: ${final_value:,.2f}')
    print(f'Return: {total_return:.2f}%')

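Data Acquisition: yf.download('ETH-USD', '2024-01-01', '2024-04-01', auto_adjust=False).droplevel(axis=1, level=1) downloads 3 months of daily Ethereum (ETH-USD) data. auto_adjust=False keeps the raw OHLC prices, and droplevel flattens the two-level (field, ticker) column index that recent yfinance versions return, leaving the plain Open/High/Low/Close/Volume columns that bt.feeds.PandasData expects.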
Cerebro Setup:
- bt.Cerebro(): Initializes the main backtesting engine.
- cerebro.addstrategy(RandomForestEnhancedParabolicSARStrategy): Adds our custom strategy to Cerebro.
- cerebro.adddata(data_feed): Feeds the downloaded data to the strategy.
- cerebro.addsizer(bt.sizers.PercentSizer, percents=95): Configures position sizing to use 95% of the available cash.
- cerebro.broker.setcash(100000): Sets the initial trading capital to $100,000.
- cerebro.broker.setcommission(commission=0.001): Sets a commission of 0.1% per trade.
Execution and Results: cerebro.run() executes the backtest. The script then prints the starting and ending portfolio values, along with the total return percentage.
Plotting: The script does not plot anything. Calling cerebro.plot() after cerebro.run() would visualize the price chart with trade entries and exits.
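
To get a feel for the numbers behind the sizer, commission, and return settings, the arithmetic can be worked through by hand. The sketch below is an approximation of what those settings imply, not Backtrader's internal code; the helper names and the example price of 2,500 are assumptions made for illustration.

```python
def order_size(cash, price, percents=95):
    """Units bought when committing a fixed percentage of available cash."""
    return cash * (percents / 100) / price


def commission_paid(size, price, commission=0.001):
    """Commission charged as a fraction of the traded value."""
    return abs(size) * price * commission


def total_return_pct(final_value, initial_value=100000):
    """Same formula as the script: percentage change in portfolio value."""
    return (final_value / initial_value - 1) * 100


size = order_size(cash=100000, price=2500.0)
fee = commission_paid(size, price=2500.0)
print(f"size: {size:.2f} units, commission: ${fee:.2f}")
print(f"return: {total_return_pct(105000):.2f}%")
```

Each round trip pays the commission twice, once on entry and once on exit, so a strategy that trades on most SAR flips needs its average winning move to clear roughly 0.2% before it contributes to the return.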

Results

Rolling backtests show a significant improvement in performance over the base strategy. I ran both versions over 3-month windows from 2020 to 2025 on Bitcoin:

[Figure: Base Strategy]

[Figure: ML-Enhanced]
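
A rolling evaluation of this kind could be set up by generating the window boundaries first and then running one backtest per window. The sketch below assumes non-overlapping calendar quarters; the article does not state how the windows were actually chosen, and quarterly_windows is a hypothetical helper.

```python
from datetime import date


def quarterly_windows(start_year, end_year):
    """Return (start, end) date pairs for consecutive 3-month windows.

    Assumption: windows are non-overlapping calendar quarters, with the end
    date exclusive (it is the first day of the following quarter).
    """
    windows = []
    for year in range(start_year, end_year + 1):
        for month in (1, 4, 7, 10):
            start = date(year, month, 1)
            if month == 10:
                end = date(year + 1, 1, 1)
            else:
                end = date(year, month + 3, 1)
            windows.append((start, end))
    return windows


windows = quarterly_windows(2020, 2024)
print(len(windows))                                       # 20
print(windows[0][0].isoformat(), windows[0][1].isoformat())   # 2020-01-01 2020-04-01
```

Each pair can be converted with isoformat() and passed as the start and end arguments of the download call used in the main block, with the base and ML-enhanced strategies run on the same window so that their returns are directly comparable.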

Conclusion 💡

This RandomForest-Enhanced Parabolic SAR strategy demonstrates a powerful synergy between a classic trend-following indicator and a robust machine learning classifier. By leveraging the RandomForest model to filter traditional SAR signals, the strategy aims to reduce the impact of false signals and improve overall trading performance, especially in volatile markets or during periods where the SAR might typically generate whipsaws.

The design, featuring frequent retraining of the RandomForest model and a comprehensive set of input features, is tailored to adapt to the dynamics of short-term (3-month) data. While this implementation provides a strong foundation, further advancements could include:

Hyperparameter Optimization: Systematically tuning the parameters of both the Parabolic SAR/RSI and the RandomForest model.
Alternative ML Models: Exploring other classifiers or even regression models for predicting future price movements.
Ensemble Filtering: Combining multiple ML models to create an even more resilient signal filter.
Dynamic Stop-Loss/Take-Profit: Implementing more advanced risk management that adapts to market volatility, beyond a fixed percentage.

Integrating machine learning into traditional indicator-based strategies represents a significant step towards building more intelligent, adaptive, and potentially more profitable algorithmic trading systems.
