Our team initially tested nine models, and the four described below were the most promising. All four had accuracy in the high 90s, and their confusion matrices were dominated by true positives and true negatives. We also tested the Gradient Boosting Classifier, Bernoulli Naive Bayes, Gaussian Naive Bayes, Support Vector Machines, and Logistic Regression, but their metrics did not meet our standards.
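To make that screening process concrete, here is a minimal sketch of how such a comparison could be run with scikit-learn and the third-party xgboost package. The data is a synthetic stand-in generated with make_classification, and the default hyperparameters are illustrative assumptions, not our actual setup:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# Hypothetical stand-in for the project's actual features and labels.
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

candidates = {
    "Decision Tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(),
    "XGBoost": XGBClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(),
    "Bernoulli NB": BernoulliNB(),
    "Gaussian NB": GaussianNB(),
    "SVM": SVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each candidate and compare held-out accuracy.
for name, model in candidates.items():
    model.fit(X_train, y_train)
    print(f"{name}: {accuracy_score(y_test, model.predict(X_test)):.3f}")
```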
Decision trees are flowchart-like tree structures in which the model asks a series of yes/no questions about the features to arrive at a classification. We ended up using this model for our final product.
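To illustrate that question-asking structure, this minimal scikit-learn sketch fits a shallow tree on synthetic placeholder data (the feature names are hypothetical) and prints the learned questions with export_text:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=4, random_state=0)  # hypothetical stand-in data
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each internal node is a yes/no question about one feature;
# the leaves carry the final class assignments.
print(export_text(tree, feature_names=[f"feature_{i}" for i in range(4)]))
```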
KNN (K-Nearest Neighbors) is a non-parametric classification method: it selects the specified number of training examples (the K neighbors) closest to the query point, and assigns whichever label is most frequent among those neighbors.
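A minimal sketch of that neighbor-voting step in scikit-learn, again on synthetic placeholder data (K = 5 is an arbitrary choice here, not necessarily a tuned value):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, random_state=0)  # hypothetical stand-in data
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)  # K = 5

# For a query point, find the 5 closest training examples...
distances, indices = knn.kneighbors(X[:1])
# ...and predict the most frequent label among those neighbors.
print(y[indices[0]], "->", knn.predict(X[:1]))
```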
The Random Forest is a supervised ensemble model that trains many decision trees and combines their individual predictions by majority vote.
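A minimal sketch of that voting scheme, assuming scikit-learn's RandomForestClassifier and synthetic placeholder data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)  # hypothetical stand-in data
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Each of the 100 trees casts a vote; predict() returns the majority class
# and predict_proba() exposes the vote shares behind it.
print(forest.predict(X[:3]))
print(forest.predict_proba(X[:3]))
```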
XGBoost stands for eXtreme Gradient Boosting. It merges many weak tree models into a single stronger ensemble, with each new tree trained to correct the errors of the trees before it.
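A minimal sketch using the xgboost package's scikit-learn-style API on synthetic placeholder data (the hyperparameters shown are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, random_state=0)  # hypothetical stand-in data

# Each boosting round adds a shallow "weak" tree that corrects the
# residual errors of the ensemble built so far.
xgb = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
xgb.fit(X, y)
print(xgb.score(X, y))  # accuracy of the combined ensemble
```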
[Figure: confusion matrices for Decision Trees, KNN, Random Forest Classifier, and XGBoost]
Confusion matrices are one of the methods we use to confirm whether a model is sufficient. The goal is a high count in the True Positive (top-left) and True Negative (bottom-right) cells, because those values were predicted correctly; the other two cells count the incorrect predictions (false positives and false negatives).
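For reference, a confusion matrix can be computed with scikit-learn. This minimal sketch uses hypothetical labels, and passes labels=[1, 0] so that true positives land top-left and true negatives bottom-right, matching the layout described above:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical true labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # hypothetical model predictions

# labels=[1, 0] orders the rows/columns so true positives land top-left
# and true negatives land bottom-right.
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)
# [[3 1]    3 true positives, 1 false negative
#  [1 3]]   1 false positive,  3 true negatives
```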