validation - Dealing with Missing Values in a Dataset


To what extent should I fill in missing values for a feature before that feature becomes redundant?

I have a dataset with at most 42,000 observations. Three of its features have around 20,000, 35,000, and 7,000 missing values respectively. Should I still use these features after filling in the missing values, or should I drop all three?

How do I decide the threshold for keeping or dropping a feature, given its number of missing values?

Generally, you can interpolate missing values from the nearest samples in the dataset. The pandas manual on missing data, http://pandas.pydata.org/pandas-docs/stable/missing_data.html, lists many techniques for interpolating missing values from the known part of the dataset.
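For illustration, here is a minimal sketch of a few of those techniques, assuming a DataFrame `df` with a numeric column named `feature` (both names are placeholders, not from your dataset):

```python
import numpy as np
import pandas as pd

# Toy data standing in for one feature with missing values (hypothetical).
df = pd.DataFrame({"feature": [1.0, np.nan, 3.0, np.nan, 5.0]})

# A few of the techniques listed in the pandas missing-data manual:
filled_mean = df["feature"].fillna(df["feature"].mean())    # replace with the column mean
filled_ffill = df["feature"].ffill()                        # carry the last known value forward
filled_interp = df["feature"].interpolate(method="linear")  # linear interpolation between known values

print(filled_interp.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
```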

But in your case, I think it is better to remove the first two features, because I doubt any interpolation can recover the missing values when such a large share of them, around half or more of the values, is missing.

You may, however, try to fix the missing values of the third feature, for example along the lines of the sketch below.
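A sketch of the whole suggestion, assuming `df` is your DataFrame and the three features are named `f1`, `f2`, and `f3` (all hypothetical names; adapt them to your data):

```python
import pandas as pd

# Hypothetical column names standing in for the three features in the question.
heavily_missing = ["f1", "f2"]   # ~20,000 and ~35,000 missing out of 42,000
recoverable = "f3"               # ~7,000 missing out of 42,000

# Inspect the missing fraction per feature to inform the keep/drop decision.
print(df[heavily_missing + [recoverable]].isna().mean())

# Drop the two features that are mostly missing.
df = df.drop(columns=heavily_missing)

# Impute the third feature, e.g. with the column median (interpolate() is another option).
df[recoverable] = df[recoverable].fillna(df[recoverable].median())
```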

