This section describes the advanced trainer options that can be set before creating a new machine learning model.

The available options vary by trainer. If a trainer's Use Default option is checked, the default values are used; if it is unchecked, the model is created with the custom values entered for that trainer.

Sdca Trainer Options

    • Use Default Sdca Trainer Options - If checked, the default values are used; otherwise, the custom parameter values below are used when creating a new model.
    • L1 Regularization - L1 regularization is more robust in dealing with outliers. It encourages sparsity, driving the weights of less important features to 0. Valid range is (0.03125, 32768), default is 1.0.
    • L2 Regularization - L2 regularization adds a penalty as model complexity increases. The penalty applies to all parameters except the intercept, so the model generalizes to the data rather than overfitting it. Valid range is (0.03125, 32768), default is 0.1.
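
Sdca is a linear trainer based on stochastic dual coordinate ascent. As a rough sketch of how the two regularization weights interact, the Python snippet below maps them onto scikit-learn's elastic-net parameterization (alpha and l1_ratio); this is an open-source analogue for illustration, not this product's actual training code, and the dataset is synthetic.

    # Illustrative sketch only: scikit-learn's SGDClassifier with an
    # elastic-net penalty stands in for an SDCA-style linear trainer.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    l1_weight, l2_weight = 1.0, 0.1  # the dialog's default values
    model = SGDClassifier(
        loss="log_loss",              # logistic loss (linear classifier)
        penalty="elasticnet",         # combined L1 + L2 penalty
        alpha=l1_weight + l2_weight,  # overall regularization strength
        l1_ratio=l1_weight / (l1_weight + l2_weight),  # share given to L1
        random_state=0,
    )
    model.fit(X, y)
    # A meaningful L1 share drives some coefficients exactly to zero.
    print((model.coef_ == 0).sum(), "of", model.coef_.size, "coefficients are zero")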


Lbfgs Trainer Options

    • Use Default Lbfgs Trainer Options - If checked, the default values are used; otherwise, the custom parameter values below are used when creating a new model.
    • L1 Regularization - L1 regularization is more robust in dealing with outliers. It encourages sparsity, driving the weights of less important features to 0. Valid range is (0.03125, 32768), default is 1.0.
    • L2 Regularization - L2 regularization adds a penalty as model complexity increases. The penalty applies to all parameters except the intercept, so the model generalizes to the data rather than overfitting it. Valid range is (0.03125, 32768), default is 0.1.
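
For a sense of what an L-BFGS-based linear trainer looks like in practice, here is a minimal Python sketch using scikit-learn's LogisticRegression with its lbfgs solver. Two assumptions to note: scikit-learn's lbfgs solver supports only the L2 penalty, and it expresses strength as the inverse parameter C rather than as a direct weight, so the mapping is approximate.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    l2_weight = 0.1  # the dialog's default L2 value
    model = LogisticRegression(
        solver="lbfgs",     # quasi-Newton optimizer, as in this trainer's name
        penalty="l2",       # the lbfgs solver supports L2 only
        C=1.0 / l2_weight,  # scikit-learn uses inverse regularization strength
        max_iter=1000,
    )
    model.fit(X, y)
    print("training accuracy:", model.score(X, y))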


Fast Forest Trainer Options

    • Use Default Fast Forest Trainer Options - If checked, the default values are used; otherwise, the custom parameter values below are used when creating a new model.
    • Number of Leaves - The maximum number of leaves in each tree. Valid range is (4, 32768), default is 4.
    • Number of Trees - The maximum number of trees. Valid range is (4, 32768), default is 4.
    • Feature Fraction - The fraction of features (chosen randomly) to use on each iteration; for example, set it to 0.9 to use 90% of the features. Lower values help reduce overfitting. Valid range is (2E-10, 1.0), default is 1.0.
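
Fast Forest is a random-forest-style trainer. The sketch below shows how these three options map onto scikit-learn's RandomForestClassifier, an open-source analogue used purely for illustration; note that scikit-learn applies the feature fraction at each split rather than per iteration, so the correspondence is approximate.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    model = RandomForestClassifier(
        n_estimators=4,    # Number of Trees (dialog default)
        max_leaf_nodes=4,  # Number of Leaves (dialog default)
        max_features=1.0,  # Feature Fraction: 1.0 considers every feature
        random_state=0,
    )
    model.fit(X, y)
    print("training accuracy:", model.score(X, y))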


Fast Tree Trainer Options

    • Use Default Fast Tree Trainer Options - If checked, the default values are used; otherwise, the custom parameter values below are used when creating a new model.
    • Number of Leaves - The maximum number of leaves in each tree. Valid range is (4, 32768), default is 4.
    • Number of Trees - The maximum number of trees. Valid range is (4, 32768), default is 4.
    • Minimum Example Count Per Leaf - The minimum number of data points required to form a new tree leaf. Valid range is (2, 128), default is 20.
    • Maximum Bin Count Per Feature - The maximum number of distinct values (bins) per feature. Valid range is (8, 1024), default is 256.
    • Feature Fraction - The fraction of features (chosen randomly) to use on each iteration; for example, set it to 0.9 to use 90% of the features. Lower values help reduce overfitting. Valid range is (2E-10, 1.0), default is 1.0.
    • Learning Rate - Determines the step size taken at each iteration while moving toward the minimum of the loss function. A larger value can reduce training time but may cause numerical instability and overfitting. Valid range is (2E-10, 1.0), default is 0.1.
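
Fast Tree is a boosted decision tree trainer. As an illustration, the sketch below sets the analogous knobs on scikit-learn's HistGradientBoostingClassifier; the mapping is approximate (for example, scikit-learn caps its bin count at 255, below this dialog's 1024 maximum), and the model is a stand-in rather than this product's own trainer.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import HistGradientBoostingClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    model = HistGradientBoostingClassifier(
        max_iter=4,           # Number of Trees (dialog default)
        max_leaf_nodes=4,     # Number of Leaves (dialog default)
        min_samples_leaf=20,  # Minimum Example Count Per Leaf (dialog default)
        max_bins=255,         # Maximum Bin Count Per Feature (sklearn's ceiling)
        learning_rate=0.1,    # Learning Rate (dialog default)
        random_state=0,
    )
    model.fit(X, y)
    print("training accuracy:", model.score(X, y))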


Lgbm Trainer Options

    • Use Default Lgbm Trainer Options - If checked, the default values are used; otherwise, the custom parameter values below are used when creating a new model.
    • Number of Leaves - The maximum number of leaves in each tree. Valid range is (4, 32768), default is 4.
    • Number of Trees - The maximum number of trees. Valid range is (4, 32768), default is 4.
    • Minimum Example Count Per Leaf - The minimum number of data points required to form a new tree leaf. Valid range is (2, 128), default is 20.
    • Maximum Bin Count Per Feature - The maximum number of distinct values (bins) per feature. Valid range is (8, 1024), default is 256.
    • Feature Fraction - The fraction of features (chosen randomly) to use on each iteration; for example, set it to 0.9 to use 90% of the features. Lower values help reduce overfitting. Valid range is (2E-10, 1.0), default is 1.0.
    • Learning Rate - Determines the step size taken at each iteration while moving toward the minimum of the loss function. A larger value can reduce training time but may cause numerical instability and overfitting. Valid range is (2E-10, 1.0), default is 0.1.
    • L1 Regularization - L1 regularization is more robust in dealing with outliers. It encourages sparsity, driving the weights of less important features to 0. Valid range is (0.03125, 32768), default is 1.0.
    • L2 Regularization - L2 regularization adds a penalty as model complexity increases. The penalty applies to all parameters except the intercept, so the model generalizes to the data rather than overfitting it. Valid range is (0.03125, 32768), default is 0.1.
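
Lgbm corresponds to the LightGBM gradient-boosting library, whose Python package exposes direct counterparts to every option above. The sketch below sets the dialog's default values on lightgbm's scikit-learn-style LGBMClassifier; the parameter names are the lightgbm package's, and the dataset is synthetic.

    from lightgbm import LGBMClassifier
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    model = LGBMClassifier(
        num_leaves=4,          # Number of Leaves (dialog default)
        n_estimators=4,        # Number of Trees (dialog default)
        min_child_samples=20,  # Minimum Example Count Per Leaf (dialog default)
        max_bin=255,           # Maximum Bin Count Per Feature (lightgbm default)
        colsample_bytree=1.0,  # Feature Fraction (dialog default)
        learning_rate=0.1,     # Learning Rate (dialog default)
        reg_alpha=1.0,         # L1 Regularization (dialog default)
        reg_lambda=0.1,        # L2 Regularization (dialog default)
        random_state=0,
    )
    model.fit(X, y)
    print("training accuracy:", model.score(X, y))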


Regularization

For L1 regularization weight and L2 regularization weight, type in the values to use for the L1 and L2 regularization parameters. A non-zero value is recommended for both. Regularization is a method for preventing overfitting by penalizing models with extreme coefficient values. It works by adding a penalty associated with the coefficient values to the error of the hypothesis: an accurate model with extreme coefficient values is penalized heavily, while a less accurate model with more conservative values is penalized less.
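
A small worked example can make this trade-off concrete. The numbers below are entirely made up for illustration: two hypothetical models, one accurate with extreme coefficients and one slightly less accurate with conservative coefficients, scored with the penalty scheme just described.

    # Illustrative only: made-up raw errors and coefficients.
    l1_weight, l2_weight = 1.0, 0.1

    def penalized_error(raw_error, coefs):
        l1_term = l1_weight * sum(abs(w) for w in coefs)
        l2_term = l2_weight * sum(w * w for w in coefs)
        return raw_error + l1_term + l2_term

    extreme = [40.0, -35.0, 25.0]    # accurate fit, extreme coefficients
    conservative = [1.2, -0.8, 0.5]  # slightly worse fit, modest coefficients

    print(penalized_error(0.05, extreme))       # 445.05: penalty dominates
    print(penalized_error(0.10, conservative))  # 2.833: far lower total

Even though the first model fits the data better, its penalized total is far worse, which is exactly the pressure regularization applies toward conservative coefficients.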

L1 and L2 regularization have different effects and uses.

    • L1 regularization produces sparse models, which is useful when working with high-dimensional data.
    • In contrast, L2 regularization is preferable for data that is not sparse, since it shrinks coefficients without zeroing them, as the sketch following this list illustrates.
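
A quick way to see this difference is to fit the same data with an L1-penalized and an L2-penalized linear model and count coefficients that are exactly zero. The sketch below uses scikit-learn's Lasso and Ridge as stand-ins; the models and data are illustrative choices, not part of this product.

    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, Ridge

    X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)

    lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty
    ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty

    # L1 zeroes out unimportant features; L2 only shrinks them toward zero.
    print("L1 zero coefficients:", (lasso.coef_ == 0).sum())
    print("L2 zero coefficients:", (ridge.coef_ == 0).sum())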


The trainers that accept both weights apply a linear combination of the two penalties: with L1 weight a and L2 weight b, the term added to the training error is a * (sum of the absolute coefficient values) + b * (sum of the squared coefficient values), up to constant factors that vary by trainer.


Want to learn more about L1 and L2 regularization? The following Microsoft article provides a discussion of how L1 and L2 regularization are different and how they affect model fitting, with code samples for logistic regression and neural network models: L1 and L2 Regularization for Machine Learning


Additional background on regularization is available on Wikipedia.


