Our team initially tested nine models, and the four described below were the most promising. All four had accuracy in the high 90s, and their confusion matrices were dominated by true positives and true negatives. We also tested the Gradient Boosting Classifier, Bernoulli Naive Bayes, Gaussian Naive Bayes, Support Vector Machines, and Logistic Regression, but their metrics did not meet our standards.
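To make that screening process concrete, here is a minimal sketch of how such a comparison could be run with scikit-learn and the third-party xgboost package. The data is a synthetic stand-in generated with make_classification, and the default hyperparameters are illustrative assumptions, not our actual setup:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# Hypothetical stand-in for the project's actual features and labels.
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

candidates = {
    "Decision Tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(),
    "XGBoost": XGBClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(),
    "Bernoulli NB": BernoulliNB(),
    "Gaussian NB": GaussianNB(),
    "SVM": SVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each candidate and compare held-out accuracy.
for name, model in candidates.items():
    model.fit(X_train, y_train)
    print(f"{name}: {accuracy_score(y_test, model.predict(X_test)):.3f}")
```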
Decision trees are flowchart-like tree structures in which the model asks a series of yes/no questions about the features to arrive at a classification. We ended up using this model for our final product.
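To illustrate that question-asking structure, this minimal scikit-learn sketch fits a shallow tree on synthetic placeholder data (the feature names are hypothetical) and prints the learned questions with export_text:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=4, random_state=0)  # hypothetical stand-in data
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each internal node is a yes/no question about one feature;
# the leaves carry the final class assignments.
print(export_text(tree, feature_names=[f"feature_{i}" for i in range(4)]))
```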
KNN (K-Nearest Neighbors) is a non-parametric classification method: it selects the specified number of training examples (the K neighbors) closest to the query point, and assigns whichever label is most frequent among those neighbors.
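A minimal sketch of that neighbor-voting step in scikit-learn, again on synthetic placeholder data (K = 5 is an arbitrary choice here, not necessarily a tuned value):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, random_state=0)  # hypothetical stand-in data
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)  # K = 5

# For a query point, find the 5 closest training examples...
distances, indices = knn.kneighbors(X[:1])
# ...and predict the most frequent label among those neighbors.
print(y[indices[0]], "->", knn.predict(X[:1]))
```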
The Random Forest is a supervised ensemble model that trains many decision trees and combines their individual predictions by majority vote.
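A minimal sketch of that voting scheme, assuming scikit-learn's RandomForestClassifier and synthetic placeholder data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)  # hypothetical stand-in data
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Each of the 100 trees casts a vote; predict() returns the majority class
# and predict_proba() exposes the vote shares behind it.
print(forest.predict(X[:3]))
print(forest.predict_proba(X[:3]))
```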
XGBoost stands for eXtreme Gradient Boosting. It merges many weak tree models into a single stronger ensemble, with each new tree trained to correct the errors of the trees before it.
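A minimal sketch using the xgboost package's scikit-learn-style API on synthetic placeholder data (the hyperparameters shown are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, random_state=0)  # hypothetical stand-in data

# Each boosting round adds a shallow "weak" tree that corrects the
# residual errors of the ensemble built so far.
xgb = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
xgb.fit(X, y)
print(xgb.score(X, y))  # accuracy of the combined ensemble
```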
[Figure: confusion matrices for Decision Trees, KNN, Random Forest Classifier, and XGBoost]
Confusion matrices are one of the methods we use to confirm whether a model is sufficient. The goal is a high count in the True Positive (top-left) and True Negative (bottom-right) cells, because those values were predicted correctly; the other two cells count the incorrect predictions (false positives and false negatives).
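For reference, a confusion matrix can be computed with scikit-learn. This minimal sketch uses hypothetical labels, and passes labels=[1, 0] so that true positives land top-left and true negatives bottom-right, matching the layout described above:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical true labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # hypothetical model predictions

# labels=[1, 0] orders the rows/columns so true positives land top-left
# and true negatives land bottom-right.
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)
# [[3 1]    3 true positives, 1 false negative
#  [1 3]]   1 false positive,  3 true negatives
```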