validation - Dealing with missing values in a dataset
Up to what extent should I fill in the missing values of a feature before that feature becomes redundant in the dataset?
I have a dataset with a maximum of 42,000 observations. Three of its features have around 20,000, 35,000 and 7,000 values missing, respectively. Should I still use them by filling in these missing values, or should I dump these 3 features?
How do I decide on a threshold for keeping or dumping a feature, given the number of missing values it has?
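One common way to make the question concrete is to compute each column's fraction of missing values and drop the columns above some cutoff. The sketch below assumes pandas; the helper name `drop_sparse_columns` and the default cutoff of 0.5 are hypothetical choices for illustration, not an established rule.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Keep only columns whose fraction of missing values is <= max_missing.

    max_missing is a hypothetical cutoff; the right value depends on the data.
    """
    missing_fraction = df.isna().mean()  # per-column share of NaN values
    keep = missing_fraction[missing_fraction <= max_missing].index
    return df[keep]


df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, np.nan],  # 75% missing
    "b": [1.0, 2.0, np.nan, 4.0],        # 25% missing
    "c": [1.0, 2.0, 3.0, 4.0],           # 0% missing
})
print(df.isna().mean().to_dict())                          # {'a': 0.75, 'b': 0.25, 'c': 0.0}
print(list(drop_sparse_columns(df, max_missing=0.5).columns))  # ['b', 'c']
```

The cutoff is a tuning decision: one way to choose it is to compare model performance with and without the sparse feature (imputed) on held-out data.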
Generally, you can interpolate missing values from the nearest samples in the dataset. The pandas manual on missing data, http://pandas.pydata.org/pandas-docs/stable/missing_data.html, lists many possible techniques for interpolating missing values from the known part of the dataset.
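A few of the techniques from that manual, sketched on a toy series: linear interpolation from the neighbouring known values, forward fill, and filling with a single statistic such as the median. This assumes the row order is meaningful (for example, a time series); interpolation between "neighbours" makes little sense for rows in arbitrary order.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])

# Linear interpolation between the nearest known neighbours (the default method).
linear = s.interpolate()
# Carry the last known value forward.
ffilled = s.ffill()
# Replace every gap with one statistic of the known values (median of 1, 3, 6).
median_filled = s.fillna(s.median())

print(linear.tolist())         # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(ffilled.tolist())        # [1.0, 1.0, 3.0, 3.0, 3.0, 6.0]
print(median_filled.tolist())  # [1.0, 3.0, 3.0, 3.0, 3.0, 6.0]
```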
But in your case, I think it's better to remove the first 2 features, because I doubt that interpolation can recover meaningful values when so many are missing: roughly half or more of each column (about 48% and 83% of the 42,000 observations).
You may, however, try to fill in the missing values of the third feature (about 17% missing).
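Applied to the numbers in the question, the recommendation might look like the sketch below. The data is synthetic: the column names `f1`, `f2`, `f3`, the normal-distributed values and the random positions of the gaps are hypothetical stand-ins for the real dataset. Because the missing rows are scattered in no particular order, the third feature is filled with its median rather than interpolated.

```python
import numpy as np
import pandas as pd

n_rows = 42_000
missing_counts = {"f1": 20_000, "f2": 35_000, "f3": 7_000}  # hypothetical column names

rng = np.random.default_rng(0)
df = pd.DataFrame({name: rng.normal(size=n_rows) for name in missing_counts})
for name, n_missing in missing_counts.items():
    # Blank out n_missing randomly chosen rows of this column.
    rows = rng.choice(n_rows, size=n_missing, replace=False)
    df.loc[rows, name] = np.nan

missing_fraction = df.isna().mean()
print(missing_fraction.round(3).to_dict())  # {'f1': 0.476, 'f2': 0.833, 'f3': 0.167}

# Drop the two mostly-empty features and impute the remaining one.
cleaned = df.drop(columns=["f1", "f2"])
cleaned["f3"] = cleaned["f3"].fillna(cleaned["f3"].median())
print(int(cleaned.isna().sum().sum()))  # 0
```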