Welcome to the Balanced Data Sets section. This section goes over how to balance data sets when creating a new model.  

When creating a new machine learning model, one important aspect is the data that gets integrated into the model. The ratio of data sets that reached a profit target to data sets that failed to reach a profit target will vary widely with the Bars to Target setting and the number of ticks up or down to a target.

There are a few different methods in the Deep Signal Library that can be used to control sizes of the data sets and which data will get added to the models.

Bar by Bar Model Creation

In the Advanced Training Params section, the Allow Bar by Bar Model Creation checkbox lets the user choose how the library searches for profit targets when creating a model: either bar by bar, or (by default) skipping ahead to the end of the Bars to Target window before looking for the next profit target. As an example, if Delete Old Files is turned off, the PriceMoveOnly.txt file shows the data sets that are added to the machine learning model during training.

If Allow Bar by Bar Model Creation is checked, we may find that multiple data sets from around the same time period will get added to the model.


In the case above, data sets for the time periods starting at 06:33, 06:34 and 06:35 will be used in creating a model. All three sets have valid short profit targets. In each one, the price of ES went down at least 12 ticks, measured from the Short Signal Bar against the low price of each bar through the end of the data set.

For the next set of data, we turned off Allow Bar by Bar Model Creation, so the library skips ahead to the end of the first data set before it starts looking for new profit targets. In this set, the first data set is at 06:33 and the next data set is at 06:48.

The user can choose whether to use the data set from the first profit target and then skip ahead, or to use all of the data sets that reached a profit target.
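
The difference between the two modes can be sketched in Python. This is only an illustration under stated assumptions, not the Deep Signal Library's code: the function name `find_short_signal_bars`, the bar layout (dicts with `close` and `low`), the rule that a short target is reached when any low in the window is at least `target_points` below the signal bar's close, and the exact number of bars skipped are all hypothetical.

```python
def find_short_signal_bars(bars, bars_to_target, target_points, bar_by_bar):
    """Return indices of bars that start a data set reaching a short target.

    Hypothetical sketch: `bars` is a list of dicts with 'close' and 'low'.
    """
    signals = []
    i = 0
    # Only consider signal bars that still have a full window after them.
    while i + bars_to_target < len(bars):
        entry = bars[i]["close"]
        window = bars[i + 1 : i + 1 + bars_to_target]
        lowest = min(b["low"] for b in window)
        if entry - lowest >= target_points:
            signals.append(i)
            # Bar by bar: test the very next bar as a new signal bar.
            # Otherwise (assumed default): jump to the end of this window.
            i += 1 if bar_by_bar else bars_to_target
        else:
            i += 1
    return signals
```

On a steadily falling market, the bar by bar mode would return one data set per bar, while the skip-ahead mode would return data sets spaced one window apart, which resembles the 06:33 and 06:48 spacing described above.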

Setting Maximum Data Sets

In the Advanced Training Params section, there are four changeable parameters that set the data sizes for long, short, failed to reach long and failed to reach short profit targets. In the example below, all are currently set to 0.

Setting these values will automatically set the maximum number of data sets for the particular parameter. As an example, if during model creation the library found 40,000 long profit target reached data sets but the Max Long Data Sets was set to 1000, the library would only use the most recent 1000 data sets. If the user set the Max Long Data Sets to 1000 but the library only found 300, then only 300 data sets would be used.
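
A minimal Python sketch of this capping rule follows. The function name `apply_max_data_sets` is hypothetical, and the sketch assumes that the data sets are ordered oldest to newest and that a value of 0 means no limit; neither detail is confirmed by the library.

```python
def apply_max_data_sets(data_sets, max_data_sets):
    """Keep at most `max_data_sets` of the most recent data sets.

    Assumes `data_sets` is ordered oldest to newest and 0 means "no limit".
    """
    if max_data_sets <= 0 or len(data_sets) <= max_data_sets:
        return list(data_sets)
    # Slice from the end so the newest entries survive.
    return list(data_sets[-max_data_sets:])


found = list(range(40000))               # stand-in for 40,000 data sets
kept = apply_max_data_sets(found, 1000)  # keeps the last 1000
```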


Automatically Setting Data Set Ratios


The Deep Signal Library has the ability to set the ratio of either long or short Profit Target Reached (positive) versus Failed to Reach Profit Target (negative). This can help balance data sets so that the models are not skewed towards either Profit Target Reached or Failed to Reach Profit Target data sets.


In the Advanced Training Params section, the Set Positive Negative Data Set Ratio checkbox controls whether the library automatically sets a positive/negative ratio. If checked, the library uses as many positive or negative data sets as it can and caps the other side so that the ratio is met.


As an example, if the library found 1000 long profit target reached data sets and 100 failed to reach profit target data sets when creating a new model, and the ratio was set to 2.0, then the maximum number of long profit target reached data sets would be set to 200 (2.0 times the 100 failed data sets).


For another example, if the library found 100 long profit target reached data sets and 1000 failed to reach profit target data sets and the ratio was still set to 2.0, then the maximum number of failed to reach long profit target data sets would be set to 50 (the 100 reached data sets divided by 2.0).
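
The arithmetic in the two examples above can be reproduced with a short Python sketch. The function name `balance_by_ratio` and the rounding down to whole data sets are assumptions made for illustration; the library may compute its limits differently.

```python
def balance_by_ratio(positive_count, negative_count, ratio):
    """Return (max_positive, max_negative) so positive/negative is about `ratio`."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if positive_count >= negative_count * ratio:
        # Plenty of positives: keep every negative, cap the positives.
        return int(negative_count * ratio), negative_count
    # Positives are the limiting side: keep them all, cap the negatives.
    return positive_count, int(positive_count / ratio)


first = balance_by_ratio(1000, 100, 2.0)   # positives capped
second = balance_by_ratio(100, 1000, 2.0)  # negatives capped
```

In both cases the side with fewer usable data sets is kept whole, and the other side is capped to match the ratio.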


If the user checks the Set Positive Negative Data Set Ratio checkbox, then the Deep Signal Library will ignore any maximum values set in the Setting Maximum Data Sets section.


Data Set Ordering

The data sets can be sorted either by Date (default) or by Potential Profit Target for long and short trades. The Potential Profit Target is the maximum profit potential found by starting at the Signal Bar and looking at every bar of data through the end of the Bars To Target window, which is also the end of the data set.

As an example, if we look at the following data set, the library has found a Short Signal at 6:58am in which the price of ES has gone down by at least 12 ticks, or 3 points. The last 4 columns of the data are the prices for Open, High, Low and Close. Starting from the bar marked * Short Signal Bar *, we can see the Low of ES at 6:59am is 3853.50. That meets our minimum of 3 points, so the data set is included in the data sets used for training our machine learning model.

In addition, on the next bar at 7:00am, the Low is 3853.25. This gives the biggest difference from the Close of the Signal Bar: 3858.50 minus 3853.25 is 5.25 points.
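
The calculation above can be written as a small Python sketch, using only the signal bar Close and the two Lows quoted in the text. The function name `short_potential_profit` is hypothetical; the 0.25-point tick size follows from the 12 ticks to 3 points conversion used on this page.

```python
def short_potential_profit(signal_close, later_lows):
    """Largest drop, in points, from the signal bar's close to any later low."""
    return max(signal_close - low for low in later_lows)


TICK_SIZE = 0.25  # 12 ticks = 3 points for ES

# Close of the 6:58am signal bar, then the Lows at 6:59am and 7:00am.
profit_points = short_potential_profit(3858.50, [3853.50, 3853.25])
profit_ticks = profit_points / TICK_SIZE
print(profit_points, profit_ticks)  # 5.25 21.0
```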


If we are sorting by Potential Profit Target, we can have the highest potential profit targets either at the beginning of the list of data or at the end. When Max Long Data Sets or Max Short Data Sets is also set to a number, the chosen order determines whether the lowest or the highest potential profit targets are truncated.

Why would we want to do this? If a data set comes from a black swan type of event in which the price of the instrument fluctuated wildly, we may not want to include it when training our machine learning model.
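
How sort order and a maximum interact can be sketched in Python as follows. This is a hedged illustration only: the function name `sort_and_truncate`, the `potential_profit` key, and the assumption that the limit keeps the end of the list (mirroring the "most recent" behaviour described for the maximum data set parameters) are not taken from the library.

```python
def sort_and_truncate(data_sets, max_data_sets, highest_last=True):
    """Sort by potential profit, then keep the last `max_data_sets` entries.

    Assumption: the limit keeps the END of the list, so the sort direction
    decides which extreme is dropped.
    """
    ordered = sorted(
        data_sets,
        key=lambda d: d["potential_profit"],
        reverse=not highest_last,
    )
    if max_data_sets <= 0:
        return ordered
    return ordered[-max_data_sets:]


sets = [{"potential_profit": p} for p in [3.0, 40.0, 5.25, 4.0]]
drop_lowest = sort_and_truncate(sets, 3, highest_last=True)
drop_highest = sort_and_truncate(sets, 3, highest_last=False)
```

Under these assumptions, placing the highest values first drops the unusually large 40.0 entry, which is the kind of black swan data set described above.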


Removing Outlier Data Sets

Using the Remove Outlier Data Sets feature, the library can also remove outlier data based on standard deviation. The library goes through the potential profit target amounts of all data sets and looks for those that are above the average plus, or below the average minus, the selected number of standard deviations. Any such outlier data sets are removed. Using the Data Outlier Filter, the user can choose One, Two, Three or Four standard deviations from the average.
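
A standard-deviation filter of this kind can be sketched in Python as shown below. The function name `remove_outliers` is hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption; the library may use the sample standard deviation or a different calculation.

```python
from statistics import mean, pstdev


def remove_outliers(profits, num_std_devs):
    """Drop values more than `num_std_devs` standard deviations from the mean."""
    if len(profits) < 2:
        return list(profits)
    avg = mean(profits)
    # Assumption: population standard deviation over all data sets found.
    spread = pstdev(profits) * num_std_devs
    return [p for p in profits if avg - spread <= p <= avg + spread]


profits = [3.0, 3.5, 4.0, 4.5, 5.0, 40.0]  # potential profit targets in points
filtered = remove_outliers(profits, 1)      # the 40.0 entry is removed
```

A wider filter is more permissive: with these sample values, a setting of three standard deviations would keep the 40.0 entry.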




Futures, foreign currency and options trading contains substantial risk and is not for every investor. An investor could potentially lose all or more than the initial investment. Risk capital is money that can be lost without jeopardizing one's financial security or lifestyle. Only risk capital should be used for trading and only those with sufficient risk capital should consider trading. Past performance is not necessarily indicative of future results.