@@ -41,7 +41,7 @@
\Authors{Petrus Einarsson\textsuperscript{1}*, Jakob Nyström\textsuperscript{1}*} % Authors
\affiliation{\textsuperscript{1}\textit{Department of Physics, Umeå University, Umeå, Sweden}} % Author affiliation
\affiliation{*\textbf{Corresponding authors}: peei0011@student.umu.se, jany0047@student.umu.se } % Corresponding author
\affiliation{*\textbf{Supervisor}: shahab.fatemi@umu.se}
\Keywords{} % Keywords - if you don't want any simply remove all the text between the curly brackets
\newcommand{\keywordname}{Keywords} % Defines the keywords heading name
@@ -197,7 +197,7 @@ Tables (\ref{dt_metrics}) and (\ref{rf_metrics}) show the class-wise metrics of
At first glance at both the confusion matrices and the performance metrics, the models do not look particularly good. What has to be considered, however, is the data we are analyzing. We are looking for possible indicators that a person earns more than a certain amount of money. This is real-world data, and in the real world there are many different ways of earning money. While some indicators clearly suggest that somebody earns a lot, other factors are far less telling. This means that some features are less important than others, which can be seen in the feature importance graphs in Figures (\ref{fig:featureImportanceDT}) and (\ref{fig:featureImportanceRF}). It also means that there will be plenty of outliers in the data. No matter how good the model is, it cannot possibly catch all of these outliers; if it did, it would be overfitted. We simply cannot expect a model to achieve very high accuracy on this type of data set.
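As a rough sketch of how such feature importances can be read out of a fitted model (assuming scikit-learn was used for the models; \texttt{rf\_model} and \texttt{feature\_names} below are placeholder names for the fitted random forest and the list of column names, not variables from our code):
\begin{verbatim}
# Sketch: rank features by the impurity-based importances that a fitted
# scikit-learn tree ensemble exposes (rf_model and feature_names are
# assumed to come from the training step).
importances = rf_model.feature_importances_
ranking = sorted(zip(feature_names, importances),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
\end{verbatim}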
An important point to touch on is our models' poor fit on the higher-earning individuals. Both models produce a precision of 77\% on the higher-earning individuals, which is quite poor compared to the precisions of 87\% and 89\% on the lower-earning individuals. This means that out of all individuals predicted as higher-earning, only 77\% are correctly predicted. Even more notably, there is a very large discrepancy in recall between the two classes. Recalls of 56\% and 63\% for the higher-earning class, compared to 95\% and 94\% for the lower-earning class, show that the models are not good at correctly detecting higher-earning individuals. The F1-scores of the two classes demonstrate the same discrepancy in overall performance: the harmonic mean of precision and recall is significantly lower for the higher-earning individuals than for the lower-earning ones. As discussed above, there may be many reasons for this poor fit. Of note is that we have optimized the models for the best accuracy over all data points. We therefore strive to classify as many total data points correctly as possible, rather than to get the best average over the classes separately. Since there are more lower-earning people in our dataset, it is very reasonable for the models to have prioritized that class, since it gives the best overall accuracy. As previously stated, the scoring metric used for training the models should be adapted to the problem at hand. If the problem requires similar metrics across the classes, one should instead consider a scoring metric such as the balanced accuracy score, which is designed to produce such results.
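A minimal sketch of how the class-wise metrics and the balanced accuracy could be computed, and how the scoring metric could be swapped in the hyperparameter search (again assuming scikit-learn; \texttt{X\_train}, \texttt{y\_train}, \texttt{X\_test}, \texttt{y\_test} and \texttt{rf\_model} are placeholders for the report's own data splits and fitted model, and the parameter grid is illustrative only):
\begin{verbatim}
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV

# Class-wise precision, recall and F1 on the held-out test set.
y_pred = rf_model.predict(X_test)
print(classification_report(y_test, y_pred))

# Balanced accuracy is the unweighted mean of the per-class recalls, so a
# model that neglects the minority (higher-earning) class scores lower.
print(balanced_accuracy_score(y_test, y_pred))

# To optimise for balanced performance directly, change the scoring
# argument of the grid search.
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid={"max_depth": [5, 10, None]},
                    scoring="balanced_accuracy", cv=5)
grid.fit(X_train, y_train)
\end{verbatim}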
\subsection{Overfitting and Underfitting}
We spent some time tuning the hyperparameters to ensure that we did not overfit. If we compare the validation results with the test results, we see that the performance metrics barely change at all. This is what we want to see, as it means that we have avoided overfitting the model, and that our models could be applied to other, similar datasets and hopefully give similar performance. We also do not want our models to be underfit. This is somewhat harder to verify, as we want the errors to be as small as possible for both training and testing, and, as stated before, we believe this is a difficult dataset to fit well. We therefore believe that we have found models with a decent balance between bias and variance.
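A minimal sketch of such a comparison, under the same scikit-learn assumption and placeholder names as above, is to put the mean cross-validated accuracy on the training split next to the accuracy on the untouched test split; a large gap between the two would signal overfitting:
\begin{verbatim}
from sklearn.model_selection import cross_val_score

# Mean validation accuracy from 5-fold cross-validation on the training split.
val_acc = cross_val_score(rf_model, X_train, y_train, cv=5).mean()

# Accuracy on the untouched test split; a value far below val_acc would
# indicate that the model does not generalise beyond the training data.
test_acc = rf_model.fit(X_train, y_train).score(X_test, y_test)
print(f"validation: {val_acc:.3f}  test: {test_acc:.3f}")
\end{verbatim}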