How to choose the best machine learning technique: comparison table

While the comparison table in this article applies to a specific FinTech problem, the conclusions are consistent with findings in other contexts. No single method outperforms all others, for an obvious reason: being the overall winner would mean winning on every possible dataset. The vast majority of possible datasets are pure noise, for which performance is meaningless and the winner changes randomly from one dataset to the next. In real-life applications, however, each dataset carries significant patterns or signals, and some methods do better on average. The purpose of this article is to explore what works best, keeping in mind that the most successful methods largely depend on the type of problem being studied. It is easier to rule out methods that consistently underperform.

The conclusions here are based on the book “Big Data and AI Strategies: Machine Learning and Alternative Data,” published by JP Morgan Chase and available (free) here. In particular, Table 1 is a rearranged version of the one shown on page 117 of that book.


The 22 methods listed in Table 1 have been used in day trading of energy stocks (more precisely, indices that combine many stocks) for a long time. The methods are ranked by Sharpe ratio. This metric measures the performance of an investment portfolio relative to a risk-free asset, after adjusting for its risk. It is defined as the difference between the investment returns and the risk-free return, divided by the standard deviation of the investment returns.
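As a minimal sketch of the definition above, the following Python function computes a Sharpe ratio from a series of per-period returns. The function name, the sample returns, the constant risk-free rate, and the 252-trading-day annualization factor are illustrative assumptions; none of them comes from the book.

```python
import math
import statistics

def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns.

    risk_free_rate is the per-period risk-free return, assumed constant.
    periods_per_year=252 assumes daily returns on trading days.
    """
    # Average return in excess of the risk-free asset
    mean_excess = statistics.mean(r - risk_free_rate for r in returns)
    # Sample standard deviation of the investment returns
    volatility = statistics.stdev(returns)
    if volatility == 0:
        raise ValueError("zero volatility: the Sharpe ratio is undefined")
    return (mean_excess / volatility) * math.sqrt(periods_per_year)

# Hypothetical daily returns, for illustration only
daily_returns = [0.002, -0.001, 0.003, 0.0005, -0.002, 0.001]
print(sharpe_ratio(daily_returns))
```

Setting periods_per_year=1 returns the raw, non-annualized ratio.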

I have rearranged the original table slightly, so that methods with identical Sharpe ratios are sorted by annualized return. It would be helpful to add a robustness metric and a comparison to a base model, in addition to the Sharpe ratio. In FinTech, great performance can come from spectacular results concentrated in a few short periods of time, followed by poor performance, with an overall positive outcome because the gains outweigh the losses. Or it can come from more modest but steady earnings. The latter is preferable due to its presumed sustainability, even if the overall performance over a 10-year period is lower.
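To make the contrast between concentrated and steady gains concrete, here is a small sketch comparing two hypothetical 10-year return series. The two robustness measures used (maximum drawdown and share of positive periods) are my own illustrative choices, not metrics taken from the book, and the return figures are made up for the example.

```python
def cumulative_growth(returns):
    """Value of 1 unit invested after compounding all returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth

def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded value, as a fraction."""
    value, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def positive_share(returns):
    """Fraction of periods with a strictly positive return."""
    return sum(1 for r in returns if r > 0) / len(returns)

# Hypothetical yearly returns: two spectacular years versus steady gains
concentrated = [0.60, -0.05, -0.04, -0.06, 0.55, -0.05, -0.03, -0.04, -0.05, -0.02]
steady = [0.06, 0.05, 0.07, 0.04, 0.06, 0.05, 0.06, 0.04, 0.05, 0.06]

for name, series in [("concentrated", concentrated), ("steady", steady)]:
    print(name, cumulative_growth(series), max_drawdown(series), positive_share(series))
```

In this made-up example the concentrated series ends with a higher cumulative value, yet it is positive in only two years out of ten and suffers a drawdown, while the steady series never declines.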

Table 1: Performance of various methods for day trading of energy stocks


For those unfamiliar with all the terms in the table, I provide a brief summary. The two types of methods are:

  • Regressor refers to regression: logistic, linear (including Lasso and ridge), or based on decision trees or support vector machines. Elastic net is a regression method with constraints on the regression coefficients to avoid overfitting. It offers the best of Lasso and ridge regression.
  • Classifier refers to supervised classification. Linear discriminant analysis is a rudimentary method that tries to separate clusters using hyperplanes. Quadratic discriminant analysis is a generalization using quadratic forms (second-degree polynomials). Naïve Bayes is also a basic algorithm, but surprisingly effective for problems like spam or fraud detection.

L2 regularization is based on the classic sum-of-squares criterion. By contrast, L1 uses the sum of absolute values, which is less sensitive to outliers. XGBoost is an ensemble method that merges multiple decision trees, combining the benefits of each decision tree taken separately. Note that logistic regression can be used for both supervised classification and regression.
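To illustrate why a sum-of-absolute-values criterion is less sensitive to outliers than a sum-of-squares criterion, consider the simplest possible case: fitting a single constant to the data. The constant that minimizes the sum of squares is the mean, while the one that minimizes the sum of absolute values is the median. The sample values below are hypothetical.

```python
import statistics

def l2_cost(data, c):
    """Sum of squared deviations from the constant c."""
    return sum((x - c) ** 2 for x in data)

def l1_cost(data, c):
    """Sum of absolute deviations from the constant c."""
    return sum(abs(x - c) for x in data)

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 100.0]  # last value replaced by an outlier

for name, data in [("clean", clean), ("with outlier", with_outlier)]:
    l2_fit = statistics.mean(data)    # minimizer of l2_cost
    l1_fit = statistics.median(data)  # minimizer of l1_cost
    print(name, "L2 fit:", l2_fit, "L1 fit:", l1_fit)
```

A single outlier pulls the sum-of-squares fit far from the bulk of the data, while the sum-of-absolute-values fit does not move.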

Finally, “default” means using the default hyperparameter values. Cross-validation determines optimal values by testing outside the training set, but could result in overfitting.
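The difference between a default hyperparameter and a cross-validated one can be sketched with a deliberately tiny model: one-feature ridge regression without intercept, which has a closed-form solution. Everything here is an assumption made for illustration: the synthetic data, the candidate penalty values, the fold scheme, and the value 1.0 standing in for a library default. It is not the setup used in the book.

```python
import random

def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge slope for a one-feature model without intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

def mse(xs, ys, slope):
    """Mean squared prediction error of the fitted slope."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cross_validated_alpha(xs, ys, alphas, k=5):
    """Return (alpha, score) with the lowest average held-out error over k folds."""
    n = len(xs)
    best_alpha, best_score = None, float("inf")
    for alpha in alphas:
        fold_errors = []
        for fold in range(k):
            test_idx = set(range(fold, n, k))  # every k-th point is held out
            train = [i for i in range(n) if i not in test_idx]
            test = sorted(test_idx)
            slope = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], alpha)
            fold_errors.append(mse([xs[i] for i in test], [ys[i] for i in test], slope))
        score = sum(fold_errors) / k
        if score < best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score

# Synthetic data: y = 2x + noise (hypothetical, for illustration only)
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(50)]
ys = [2.0 * x + random.gauss(0, 0.5) for x in xs]

default_alpha = 1.0  # stand-in for a library's default value
candidates = [0.01, 0.1, 1.0, 10.0, 100.0]
best_alpha, best_score = cross_validated_alpha(xs, ys, candidates)
_, default_score = cross_validated_alpha(xs, ys, [default_alpha])
print("cross-validated alpha:", best_alpha, "score:", best_score)
print("default alpha:", default_alpha, "score:", default_score)
```

Because the winning value is selected using the held-out scores themselves, its reported score is optimistic, which is one way that cross-validation can still lead to overfitting.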


Elastic net is a type of constrained regression that combines the benefits of ridge and Lasso regression. Unsurprisingly, it performs better than both ridge and Lasso when using cross-validation. However, without cross-validation, it is the worst performer. As expected, XGBoost is the best.

Logistic regression works well as a classifier, but not as well as a regression technique. It is no accident that the L1 version does better than L2. It is surprising to see the support vector classifier underperform. However, support vector regression (SVM) works well, although it shows higher volatility. The fact that linear discriminant analysis (a rudimentary classifier) is in the top 5 and above SVM is surprising and suspicious, although this is partly due to its lower volatility and thus better Sharpe ratio. It is even more surprising that quadratic discriminant analysis (a generalization) fares much worse than the linear version. Naïve Bayes is rudimentary, but surpasses many more sophisticated techniques. This does not surprise me. Quite possibly, few of the 22 methods outperform the basic buy-and-hold strategy, as the annualized return quickly drops below 4%.

For those interested, I’ve developed an alternative to XGBoost, which is simpler and blended with generic logistic regression, rather than relying on decision trees alone. Generic logistic regression (based on arbitrary rather than logistic distributions) and the simplified XGBoost algorithm are described in my article “Machine Learning Cloud Regression: The Swiss Army Knife of Optimization”, available here.

About the author


Vincent Granville is a pioneering data scientist and machine learning expert, founder of and co-founder of Data Science Central (acquired by TechTarget in 2020), former VC funded executive, author and patent owner. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, CNET, InfoSpace. Vincent is also a former post-doc at the University of Cambridge and the National Institute of Statistical Sciences (NISS).

Vincent published in Journal of Number Theory, Journal of the Royal Statistical Society (Series B), and IEEE Transactions on Pattern Analysis and Machine Intelligence. He is also the author of “Intuitive Machine Learning and Explainable AI”, available here. He lives in Washington state and loves to research stochastic processes, dynamical systems, experimental mathematics and probabilistic number theory.
