%This template (2020-03-06) is a modified version by Magnus Andersson and Jesper Erixon of the Stylish Article LaTeX Template Version 2.1 (1/10/15)
% Original author:
% Mathias Legrand (legrand.mathias@gmail.com)
%
% License:
% CC BY-NC-SA 3.0

%----------------------------
\documentclass[fleqn,10pt]{SelfArx} % Document font size and equations flushed left
\usepackage[english]{babel} % Specify a different language here - English by default
\usepackage{lipsum} % Required to insert dummy text. To be removed otherwise
\usepackage{float} % Allow you to set the figure at a specific position, use mainly H
% Additional packages
\usepackage{subcaption} % Allow you to create subfigures with individual captions
%----------------------------
% COLUMNS
%----------------------------
\setlength{\columnsep}{0.55cm} % Distance between the two columns of text
\setlength{\fboxrule}{0.75pt} % Width of the border around the abstract
%----------------------------
% COLORS
%----------------------------
\definecolor{color1}{RGB}{0,0,90} % Color of the article title and sections
\definecolor{color2}{RGB}{0,20,20} % Color of the boxes behind the abstract and headings
%----------------------------
% HYPERLINKS
%----------------------------
\usepackage{hyperref} % Required for hyperlinks
\hypersetup{hidelinks,colorlinks,breaklinks=true,urlcolor=color2,citecolor=color1,linkcolor=color1,bookmarksopen=false,pdftitle={Title},pdfauthor={Author}}
\urlstyle{same} % Sets url font
\usepackage{cleveref} % Added: Use cleveref to be able to reference subfigures e.g. Fig 1(a) etc.
\captionsetup[subfigure]{subrefformat=simple,labelformat=simple} % Added: Setup subfigure label
\renewcommand\thesubfigure{(\alph{subfigure})}
%----------------------------
% ARTICLE INFORMATION
%----------------------------
\JournalInfo{Department of Physics, Umeå University}
\Archive{\today}

\PaperTitle{Write the title of your report here} % Article title

\Authors{John Smith\textsuperscript{1}*, Jennie Smith\textsuperscript{1}} % Authors
\affiliation{\textsuperscript{1}\textit{Department of Physics, Umeå University, Umeå, Sweden}} % Author affiliation
\affiliation{*\textbf{Corresponding author}: john@smith.com} % Corresponding author
\affiliation{*\textbf{Supervisor}: joe@doe.com}
\Keywords{Optics --- Interference --- Diffraction} % Keywords - if you don't want any simply remove all the text between the curly brackets
\newcommand{\keywordname}{Keywords} % Defines the keywords heading name
%----------------------------
% ABSTRACT
%----------------------------
\Abstract{}

%----------------------------
\begin{document}

\flushbottom % Makes all text pages the same height

\maketitle % Print the title and abstract box

\tableofcontents % Print the contents section

\thispagestyle{empty} % Removes page numbering from the first page

%----------------------------
% ARTICLE CONTENTS
%----------------------------

%----------------------------
\section{Introduction}

\section{Data analysis}

\subsection{Dataset}
%https://www.kaggle.com/datasets/mosapabdelghany/adult-income-prediction-dataset
The dataset we chose to study is a labeled income prediction dataset. It contains 14 features describing the individuals in the study, together with a label stating whether their annual income is above \$50\,000 or at most \$50\,000. We are therefore dealing with a binary classification problem. Many of the features are categorical, meaning that they can only take one of a fixed set of values; examples include marital status, education and work class. The dataset contains around 32\,500 data points.
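
As an illustration of the binary target, the following minimal Python sketch (using pandas) converts the income label into 0/1 values. It reads a few made-up rows instead of the real file, and the column names and the label strings \texttt{<=50K} and \texttt{>50K} are assumptions based on the UCI Adult data; they may differ in the Kaggle version of the dataset.

\begin{footnotesize}
\begin{verbatim}
import io

import pandas as pd

# A few made-up rows standing in for the real CSV
# (e.g. pd.read_csv("adult.csv"), hypothetical name).
# Column names and the label strings "<=50K"/">50K"
# are assumptions based on the UCI Adult data.
csv = io.StringIO(
    "age,workclass,marital.status,income\n"
    "39,State-gov,Never-married,<=50K\n"
    "50,Self-emp-not-inc,Married-civ-spouse,>50K\n"
    "38,Private,Divorced,<=50K\n"
)
df = pd.read_csv(csv)

# Binary target: 1 if income > $50,000, else 0.
# strip() guards against leading spaces in labels.
y = (df["income"].str.strip() == ">50K").astype(int)

print(df.shape)          # (rows, columns)
print(y.value_counts())  # class balance
\end{verbatim}
\end{footnotesize}
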

\subsection{Data cleaning and feature engineering}
A few changes to the dataset were needed before it could be used in our machine learning (ML) application. First, some of the features are redundant or not relevant to our project. We remove the feature \texttt{education}, since the dataset already contains a numerically encoded feature with the same information. We also remove the feature \texttt{fnlwgt}, a precomputed weight that the Census Bureau uses to estimate population statistics. Since we want our estimates to be based on the other features and not on this precomputed weight, it is excluded. Second, the dataset contains a mix of numerical and non-numerical features. Since the models we use require numerical input, the non-numerical features have to be encoded as numbers. We do this with the \texttt{LabelEncoder} class in scikit-learn, applied to all non-numerical features.
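
The cleaning steps above can be sketched in Python with pandas and scikit-learn as follows. This is a minimal illustration on made-up rows, not our exact code: the column names (in particular \texttt{education.num} for the numerically encoded education feature) are assumptions, and the loop applies \texttt{LabelEncoder} to every non-numerical column, including the income label.

\begin{footnotesize}
\begin{verbatim}
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows; the column names are assumptions
# ("education.num" may be spelled differently in
# other versions of the dataset).
df = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["State-gov", "Self-emp-not-inc",
                  "Private"],
    "fnlwgt": [77516, 83311, 215646],
    "education": ["Bachelors", "Bachelors",
                  "HS-grad"],
    "education.num": [13, 13, 9],
    "income": ["<=50K", ">50K", "<=50K"],
})

# Drop the redundant column and the census weight.
df = df.drop(columns=["education", "fnlwgt"])

# Label-encode every non-numerical column and keep
# the fitted encoders to map codes back later.
encoders = {}
for col in df.columns:
    if not pd.api.types.is_numeric_dtype(df[col]):
        enc = LabelEncoder()
        df[col] = enc.fit_transform(df[col])
        encoders[col] = enc

print(df)
\end{verbatim}
\end{footnotesize}

Note that \texttt{LabelEncoder} assigns integer codes in sorted (alphabetical) order of the category names, so the resulting numbers carry no meaningful ordering. The scikit-learn documentation describes \texttt{LabelEncoder} as intended for target labels, with \texttt{OrdinalEncoder} as the corresponding encoder for input features.
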
\subsection{Handling missing values}
Using the \texttt{info} method in pandas on the numerically encoded dataset, we found that around 2500 values were missing (NaN). We reasoned that filling these values with, for example, the mean of the corresponding feature makes little sense for our application. Many of the features are categorical, so the mean of their values does not correspond to any meaningful category, especially since the label encoding assigned arbitrary numbers to the categories.
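
The short Python sketch below, on a made-up label-encoded column, shows how the missing values can be counted with pandas and why the mean is a poor fill value for a categorical feature. The column name and the codes are hypothetical.

\begin{footnotesize}
\begin{verbatim}
import numpy as np
import pandas as pd

# Made-up label-encoded column: 0 and 2 are arbitrary
# codes for two categories; NaN marks a missing value.
df = pd.DataFrame({"workclass": [0, 2, np.nan, 2, 0]})

# info() lists the non-null count of each column;
# isna().sum() gives the number of NaNs directly.
df.info()
print(df.isna().sum())

# Mean imputation would fill the gap with 1.0, a
# code that no observed row has, i.e. an arbitrary
# or non-existent category.
print(df["workclass"].mean())
\end{verbatim}
\end{footnotesize}
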

\section{Model selection}

\section{Model training and hyperparameter tuning}

\section{Model evaluation}
\end{document}