
Out of sample performance: is the model's F1 score weighting In Sample results too heavily over OOS results?

one year ago
#39
I've run several tests with different time frames and indicators, and I seem to be getting mostly decent performance on the In Sample data (almost too good compared with what I've been able to create manually), but performance falls off considerably in the Out of Sample portion. Using 3 years of data, the last 6 months' profits are mostly flat or slightly down in almost every case where I achieved consistent results in the In Sample period. I can manually build an algorithm with similar Profit Targets and Stop Losses that performs much better when tested Out Of Sample.

Is it possible the software should weight Out of Sample results more aggressively when training the model? That is what matters most for producing an algorithm that actually works on the live market. The Out of Sample and forward-testing results need to be broadly consistent with the overall results. Otherwise, you are relying on an auto-optimization process that curve-fits the model until it is useless on the live market/OOS data.

Is there some way of testing and creating models to reduce curve fitting with this software that I have missed?
one year ago
#40
Hi Ryan,
Thanks for posting! If you're seeing overfitting on your data set, there are a few things you can try. Please note that, in general, out of sample tests are not going to be as good as in sample, but there are things we can try to help.

The Pre-Signal Window parameter determines how many bars of indicator data are used to create the machine learning model. The larger this number, the more data is used to determine whether we get a long or short signal, and the more data is required to fit that particular model. Reducing this number lessens that restriction but may make the model more generic. You'll need to experiment a bit to find the sweet spot for this parameter for your particular model.
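
To make that trade-off concrete, here is a minimal sketch (plain Python/NumPy, not the product's actual code) of how a pre-signal window of bars could be turned into one training sample; the function and argument names are hypothetical:

```python
import numpy as np

def build_training_samples(indicators, signal_bars, pre_signal_window):
    """Stack the last `pre_signal_window` bars of every indicator into one
    flat feature vector per labeled signal bar (hypothetical illustration).

    indicators: array of shape (n_bars, n_indicators)
    signal_bars: bar indices where a long/short label was assigned
    """
    samples = []
    for bar in signal_bars:
        start = bar - pre_signal_window + 1
        if start < 0:
            continue  # not enough history before this bar
        window = indicators[start:bar + 1]   # (pre_signal_window, n_indicators)
        samples.append(window.ravel())       # one flat feature vector
    return np.array(samples)

# A larger window means more features per sample, so more labeled signals are
# needed to fit the model reliably; a smaller window fits more easily but
# captures less context around each signal.
```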

The other possibility is creating models using more recent data and then backtesting on older out of sample data; in other words, create the model with a smaller amount of data and then backtest it on a larger out of sample data set.
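
A rough sketch of that reversed split, assuming the bars are ordered oldest to newest (a hypothetical helper, not part of the software):

```python
import pandas as pd

def reversed_split(bars: pd.DataFrame, train_fraction: float = 0.25):
    """Fit the model on the most recent slice of bars and keep the older bulk
    for out-of-sample backtesting (hypothetical helper; bars assumed to be
    ordered oldest to newest)."""
    cut = int(len(bars) * (1 - train_fraction))
    out_of_sample = bars.iloc[:cut]   # older data, used only for backtesting
    training = bars.iloc[cut:]        # recent data, used to build the model
    return training, out_of_sample
```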

We can also limit the data sets used in the creation of the model. In the Advanced Training Params section, there are four parameters that can be used to limit the data.

  Max Long Data Sets - This will set the maximum number of long data sets that reached a profit target to use when creating a model
  Max Short Data Sets - This will set the maximum number of short data sets that reached a profit target
  Max Failed to Reach Long Data Sets - This will set the maximum number of long data sets that failed to reach a profit target
  Max Failed to Reach Short Data Sets - This will set the maximum number of short data sets that failed to reach a profit target

One thing to note about setting the max data sets: when creating the model, only the most recent x sets of each type are used. For example, suppose we are creating a new machine learning model from 1-1-2010 through 1-1-2022 and Max Long Data Sets is set to 1000. Even though we may have actually found 100000 Long Data Sets, we will only use the last 1000. Hypothetically, these could be the data sets from 1-1-2021 through 1-1-2022.
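
A rough sketch of that "most recent x sets" behavior (hypothetical names and data structures; the real implementation is internal to the software):

```python
def cap_data_sets(data_sets, max_per_type):
    """Keep only the most recent `max_per_type[t]` sets of each type
    ('long', 'short', 'failed_long', 'failed_short').

    Assumes each data set is a dict with 'type' and 'timestamp' keys;
    names are hypothetical."""
    capped = []
    for set_type, limit in max_per_type.items():
        of_type = sorted((d for d in data_sets if d["type"] == set_type),
                         key=lambda d: d["timestamp"])
        capped.extend(of_type[-limit:])   # most recent `limit` sets only
    return capped

# With {"long": 1000, ...} and 100000 long sets found over 2010-2022, only the
# last 1000 are kept, which could all come from 2021-2022.
```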

You can also set ratios for data sets that reached a profit target versus those that failed to reach one. You'll need to experiment a little with this as well, but a large proportion of failed-to-reach data sets relative to reached data sets may skew your model. Using the parameters above will give you more control over your data sets.
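
As an illustration, a ratio could be enforced by downsampling the failed-to-reach sets along these lines (a hypothetical sketch, not the software's actual logic):

```python
import random

def enforce_ratio(reached, failed, failed_per_reached=0.5, seed=42):
    """Downsample the failed-to-reach sets so there are at most
    `failed_per_reached` of them per reached-profit-target set
    (0.5 corresponds to a 2:1 reached-to-failed ratio; hypothetical sketch)."""
    limit = int(len(reached) * failed_per_reached)
    if len(failed) > limit:
        random.seed(seed)
        failed = random.sample(failed, limit)
    return reached, failed
```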

Please let us know how it goes!
one year ago
#41
I reduced the Percentage of Data for Training parameter to 50% and I am getting better results on the Out Of Sample data.
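
For reference, a chronological split at a given training percentage is roughly this (hypothetical helper; only the parameter name comes from the UI):

```python
def split_by_training_percentage(bars, training_pct=0.5):
    """Use the first `training_pct` of bars to train the model and the
    remainder as out-of-sample data (sketch of the 50% setting)."""
    cut = int(len(bars) * training_pct)
    return bars[:cut], bars[cut:]
```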

I also had to change some other parameters. Once the training weighting is more equalized, I've noticed it may be better to have more positive training data sets than negative ones. Otherwise, negative recall and precision get better trained and end up with better values. You want negative precision to be high, but you need positive precision and recall to be better trained, since those are the trades your model actually enters and can potentially lose money on.

Negative precision should be good to avoid bad trades, but negative recall doesn't have to be as high. This also seems to produce higher F1 scores for the model than low recall on positive training sets. I'm not sure if that is specific to trading, where false negatives shouldn't necessarily be weighted as heavily as false positives, since false positives are the ones that lose money.
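
To make the precision/recall trade-off concrete, here is a small sketch computing the per-class metrics; the counts are made up purely for illustration:

```python
def precision_recall_f1(tp, fp, fn):
    """Standard per-class precision, recall, and F1."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Positive class = "take the trade" (signals the model actually enters on).
# A positive false positive is a losing trade, so positive precision directly
# costs money; a positive false negative is only a missed opportunity.
print("positive P/R/F1:", precision_recall_f1(tp=60, fp=20, fn=40))
print("negative P/R/F1:", precision_recall_f1(tp=180, fp=40, fn=20))
```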
one year ago
#42
That's great! Yes, for some models we've found a 2:1 ratio of positive to negative data sets works well. In some cases, we've found that if the F1 score is too high, the model is not as tradeable. You'll need to experiment with your own data set, though.

Once you've found an acceptable good/bad signal ratio, the other part to look at is risk management for the strategy. We haven't included any type of risk management in the sample files other than basic profit targets and stops, since we assumed users would have their own ideas on how to optimize entering and exiting trades.
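
As a starting point, a basic bracket-style exit check might look like this (a generic sketch under assumed names, not the sample files' code):

```python
def check_bracket_exit(entry_price, current_price, is_long,
                       profit_target_pct=0.02, stop_loss_pct=0.01):
    """Return 'target', 'stop', or None for an open position (generic
    bracket logic that a user's own risk rules would extend)."""
    move = (current_price - entry_price) / entry_price
    if not is_long:
        move = -move
    if move >= profit_target_pct:
        return "target"
    if move <= -stop_loss_pct:
        return "stop"
    return None
```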