validation - Dealing with Missing Values in a dataset


Up to what extent should we fill in missing values before the feature in the dataset becomes redundant?

I have a dataset with a maximum of 42000 observations. Three of its features have around 20000, 35000, and 7000 values missing. Should I still use them by filling in these missing values, or should I dump these 3 features?

How do I decide the threshold for keeping or dumping a feature, given its number of missing values?

Generally, you can interpolate missing values from the nearest samples in the dataset. The pandas missing-data manual, http://pandas.pydata.org/pandas-docs/stable/missing_data.html, lists many possible techniques for interpolating missing values from the known part of the dataset.

But in your case, I think it's better to remove the first 2 features, because I doubt any interpolation can recover the missing values when you have such a large share of them, close to or more than half of all values.

But you may try to fix the missing values of the third feature.
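One way to turn this advice into a rule is to drop any column whose missing fraction exceeds a chosen threshold and interpolate the rest. A sketch, assuming the counts from the question and a hypothetical 40% cutoff (the column names and the threshold are my inventions, not part of the question):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 42000

# Synthetic stand-in for the questioner's data
df = pd.DataFrame({
    "f1": rng.normal(size=n),
    "f2": rng.normal(size=n),
    "f3": rng.normal(size=n),
})

# Inject roughly the missing counts described in the question
df.loc[df.sample(20000, random_state=1).index, "f1"] = np.nan
df.loc[df.sample(35000, random_state=2).index, "f2"] = np.nan
df.loc[df.sample(7000, random_state=3).index, "f3"] = np.nan

MAX_MISSING_FRAC = 0.4  # hypothetical threshold; tune for your problem

missing_frac = df.isna().mean()               # fraction missing per column
keep = missing_frac[missing_frac <= MAX_MISSING_FRAC].index
df_kept = df[keep]                            # f1 (~48%) and f2 (~83%) are dropped

# Interpolate what remains; bfill covers any leading NaNs
df_kept = df_kept.interpolate().bfill()
```

With these counts, only `f3` (about 17% missing) survives the cutoff, which matches the answer's recommendation. There is no universal threshold; it depends on how informative the feature is and how well its values can be predicted from the rest.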

