Added new table
@@ -125,6 +125,19 @@ This is a very interesting result and maybe not so weird as it first seems. Ther
DT&0.8483&0.8449&0.8483&0.8462&6.7357
\end{tabular}}
\end{table}
\begin{table}[!htbp]
\centering
\caption{The performance metrics of the models on the test data.}
\label{perfmetric}
\resizebox{\columnwidth}{!}{
\begin{tabular}{c|c|c|c|c|c}
Model&Accuracy&Precision&Recall&F1 Score&Total Time (s)\\
\hline
RF &0.8589&0.8535&0.8589&0.8534&150.8154\\
\hline
DT&0.8483&0.8449&0.8483&0.8462&6.7357
\end{tabular}}
\end{table}
Looking at the values, we see that the difference between our models is not large. The Random Forest model is on average about one percentage point better than the Decision Tree, and all metrics sit at about 0.85. This means that our models are not very accurate and that the differences between them are small. Which model is better depends largely on the priorities. While it is clear that the Random Forest has the better performance, if only by a little, it is also significantly slower. So for this dataset, was it really worth roughly 22 times the computational time (150.8~s versus 6.7~s) to get a slightly better result? We are not really sure. The extra computational time is a definite negative, but at the size of this dataset we are only talking about a couple of minutes, which is not too bad. For another dataset the results may be different, and it might be clearer which model is really preferred.
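The accuracy/time trade-off discussed above can be reproduced in spirit with a short sketch, assuming scikit-learn classifiers and weighted averaging for the multi-class scores (assumptions; the report does not name its tooling or averaging mode). Synthetic data stands in for the income dataset.

```python
# Hedged sketch: how the table's metric columns could be computed.
# make_classification is a stand-in for the real income data.
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for name, model in [("RF", RandomForestClassifier(random_state=0)),
                    ("DT", DecisionTreeClassifier(random_state=0))]:
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    elapsed = time.perf_counter() - start  # the "Total Time" column
    results[name] = {
        "accuracy": accuracy_score(y_te, pred),
        "precision": precision_score(y_te, pred, average="weighted"),
        "recall": recall_score(y_te, pred, average="weighted"),
        "f1": f1_score(y_te, pred, average="weighted"),
        "time": elapsed,
    }
```

The ensemble of trees is what makes the Random Forest slower to fit than a single Decision Tree, which is the trade-off weighed above.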
At first glance at both the confusion matrices and the performance metrics, the models do not look very good. But the data we are analyzing has to be considered. We are looking at possible indicators of whether a person earns more than a certain amount of money. This is real-world data, and in the real world there are many different ways of earning money. While there certainly are some indicators that clearly signal that somebody earns a lot of money, other factors are not as telling. This means that some features are less important than others, which can be seen in the feature importance graphs for our models in Figures \ref{fig:featureImportanceDT} and \ref{fig:featureImportanceRF}. It also means that there will be plenty of outliers in the data. No matter how good the model is, it cannot possibly catch all of these outliers; if it did, it would be overfitted. We simply cannot expect a model to have very high accuracy on this type of dataset.
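The per-feature weights behind such feature-importance graphs can be read directly off a fitted tree model, assuming scikit-learn was used (an assumption; its `feature_importances_` attribute is shown here on illustrative synthetic data, not the real dataset).

```python
# Hedged sketch: extracting and ranking feature importances from a
# fitted scikit-learn forest. Synthetic data is only illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X, y)

importances = rf.feature_importances_  # one weight per feature, sums to 1
# Sort features from most to least important, keeping their indices.
ranked = sorted(enumerate(importances), key=lambda p: p[1], reverse=True)
```

Features near the bottom of such a ranking are the "less telling" indicators discussed above.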