ML METHODS DESCRIPTION
K-Neighbors Classifier (KNC) The K-Neighbors Classifier is a neighbors-based classification method in which k is an integer value specified by the user. It is an instance-based, or non-generalizing, learning method: it does not attempt to construct a general internal model but simply stores instances of the training data. Classification is computed from a simple majority vote of the nearest neighbors of each point: a query point is assigned the class that has the most representatives within its nearest neighbors.
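The majority-vote behavior can be sketched with scikit-learn's KNeighborsClassifier; the toy data below is invented purely for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy data (illustrative only): two well-separated 2-D clusters.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

# k = 3: each query point is assigned the majority class
# among its 3 nearest training points.
knc = KNeighborsClassifier(n_neighbors=3)
knc.fit(X, y)

pred = knc.predict([[0.5, 0.5], [5.5, 5.5]])
print(pred)  # one label per query point, by majority vote
```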
SVM Support vector machines (SVM) are a set of supervised learning methods used for classification. SVC and NuSVC are two such algorithms, both capable of performing multi-class classification on datasets. They are similar methods, but accept slightly different sets of parameters and have different mathematical formulations; both are based on the libsvm library. In SVC, the fit time scales at least quadratically with the number of samples and may be impractical beyond tens of thousands of samples. NuSVC is similar but uses a parameter, ν, to control the number of support vectors.
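A minimal sketch of both formulations with scikit-learn, on made-up linearly separable data:

```python
from sklearn.svm import SVC, NuSVC

# Toy data (illustrative only): two linearly separable clusters.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

# SVC uses the C-parameterized formulation; in NuSVC, nu bounds the
# fraction of margin errors from above and of support vectors from below.
svc = SVC(kernel="linear", C=1.0).fit(X, y)
nusvc = NuSVC(kernel="linear", nu=0.5).fit(X, y)

svc_pred = svc.predict([[1, 1], [6, 6]])
nusvc_pred = nusvc.predict([[1, 1], [6, 6]])
print(svc_pred, nusvc_pred)
```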
Decision Tree Classifier (DTC) The Decision Tree Classifier is a non-parametric supervised learning method, capable of performing multi-class classification on datasets. The goal is to create models that predict the value of a target variable by learning simple decision rules inferred from the data features. For example, a classical decision tree learns from the data to approximate a sine curve with a set of if-then-else decision rules. The deeper the tree, the more complex the decision rules and the fitter the model, although very deep trees risk overfitting the training data.
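The learned if-then-else rules can be inspected directly; a sketch with scikit-learn's DecisionTreeClassifier on invented threshold-separable data:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data (illustrative only): the label depends on a simple threshold.
X = [[1], [2], [3], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

dtc = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Print the fitted if-then-else decision rules as text.
print(export_text(dtc))

dtc_pred = dtc.predict([[2.5], [10.5]])
print(dtc_pred)
```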
Random Forest Classifier (RFC) The Random Forest Classifier is an ensemble learning method for classification that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes of the individual trees. Random decision forests correct for decision trees’ habit of overfitting to their training set. A random forest is a meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
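A minimal sketch with scikit-learn's RandomForestClassifier (toy data invented for illustration):

```python
from sklearn.ensemble import RandomForestClassifier

# Toy data (illustrative only): two separable 2-D clusters.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

# 100 trees, each fit on a bootstrap sub-sample of the data;
# the prediction averages the individual trees' class votes.
rfc = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

rfc_pred = rfc.predict([[1, 1], [6, 6]])
print(rfc_pred)
```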
AdaBoost Classifier (ABC) An AdaBoost (51) classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.
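The re-weighting scheme can be sketched with scikit-learn's AdaBoostClassifier (toy data invented for illustration):

```python
from sklearn.ensemble import AdaBoostClassifier

# Toy data (illustrative only): two separable 2-D clusters.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

# Each boosting round re-weights misclassified samples so that later
# weak learners (shallow decision trees by default) focus on them.
abc = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

abc_pred = abc.predict([[1, 1], [6, 6]])
print(abc_pred)
```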
Gradient Boosting Classifier (GBC) The Gradient Boosting Classifier builds an additive model in a forward stage-wise fashion. It allows for the optimization of arbitrary differentiable loss functions. In each stage, one regression tree per class is fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case in which only a single regression tree is induced per stage.
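A minimal binary-classification sketch with scikit-learn's GradientBoostingClassifier (toy data invented for illustration):

```python
from sklearn.ensemble import GradientBoostingClassifier

# Toy data (illustrative only): two separable 2-D clusters.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

# Binary case: a single regression tree is induced per boosting stage,
# fit on the negative gradient of the deviance loss.
gbc = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

gbc_pred = gbc.predict([[1, 1], [6, 6]])
print(gbc_pred)
```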
Gaussian Naive Bayes (GNB) In Gaussian Naive Bayes, the likelihood of the features is assumed to be Gaussian. The model can perform online updates to its parameters via partial fitting.
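The online-update behavior can be sketched with scikit-learn's GaussianNB and its partial_fit method (the batches below are invented for illustration):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB()

# First batch: all classes must be declared up front for online learning.
X1 = np.array([[0.0], [1.0], [10.0]])
gnb.partial_fit(X1, [0, 0, 1], classes=[0, 1])

# Later batch: the Gaussian parameters are updated incrementally,
# without refitting on the earlier data.
X2 = np.array([[0.5], [11.0]])
gnb.partial_fit(X2, [0, 1])

gnb_pred = gnb.predict([[0.2], [10.5]])
print(gnb_pred)
```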
Linear Discriminant Analysis (LDA) Linear Discriminant Analysis is a classifier with a linear decision boundary generated by fitting class conditional densities to the data using Bayes’ rule. The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix. The fitted model can be used to reduce the dimensionality of the input, projecting it to the most discriminative directions.
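Both uses of the fitted model, classification and dimensionality reduction, can be sketched with scikit-learn's LinearDiscriminantAnalysis (toy data invented for illustration):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy data (illustrative only): two 2-D clusters.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

lda = LinearDiscriminantAnalysis(n_components=1)

# fit() learns a Gaussian per class with a shared covariance matrix;
# transform() projects onto the most discriminative direction(s).
X_proj = lda.fit(X, y).transform(X)
print(X_proj.shape)  # dimensionality reduced from 2 to 1

lda_pred = lda.predict([[1, 1], [6, 6]])
print(lda_pred)
```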
Quadratic Discriminant Analysis (QDA) Quadratic Discriminant Analysis is a classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule. The model fits a Gaussian density to each class; unlike LDA, each class is allowed its own covariance matrix.
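A minimal sketch with scikit-learn's QuadraticDiscriminantAnalysis (toy data invented for illustration):

```python
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Toy data (illustrative only): two 2-D clusters.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

# A separate Gaussian (with its own covariance matrix) is fit per class,
# which yields a quadratic rather than linear decision boundary.
qda = QuadraticDiscriminantAnalysis().fit(X, y)

qda_pred = qda.predict([[1, 1], [6, 6]])
print(qda_pred)
```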