validation - Dealing with Missing Values in a Dataset


To what extent should I fill in missing values for a feature before that feature becomes redundant?

I have a dataset with at most 42,000 observations. Three of its features have around 20,000, 35,000, and 7,000 missing values respectively. Should I still use these features after filling in the missing values, or should I drop all three?

How do I decide the threshold for keeping or dropping a feature, given its number of missing values?

Generally, you can interpolate missing values from the nearest samples in the dataset. The pandas manual on missing data, http://pandas.pydata.org/pandas-docs/stable/missing_data.html, lists many techniques for interpolating missing values from the known part of the dataset.
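For illustration, here is a minimal sketch of a few of those techniques, assuming a DataFrame `df` with a numeric column named `feature` (both names are placeholders, not from your dataset):

```python
import numpy as np
import pandas as pd

# Toy data standing in for one feature with missing values (hypothetical).
df = pd.DataFrame({"feature": [1.0, np.nan, 3.0, np.nan, 5.0]})

# A few of the techniques listed in the pandas missing-data manual:
filled_mean = df["feature"].fillna(df["feature"].mean())    # replace with the column mean
filled_ffill = df["feature"].ffill()                        # carry the last known value forward
filled_interp = df["feature"].interpolate(method="linear")  # linear interpolation between known values

print(filled_interp.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
```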

But in your case, I think it is better to remove the first two features, because I doubt any interpolation can recover the missing values when such a large share of them, around half or more of the values, is missing.

You may, however, try to fix the missing values of the third feature, for example along the lines of the sketch below.
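A sketch of the whole suggestion, assuming `df` is your DataFrame and the three features are named `f1`, `f2`, and `f3` (all hypothetical names; adapt them to your data):

```python
import pandas as pd

# Hypothetical column names standing in for the three features in the question.
heavily_missing = ["f1", "f2"]   # ~20,000 and ~35,000 missing out of 42,000
recoverable = "f3"               # ~7,000 missing out of 42,000

# Inspect the missing fraction per feature to inform the keep/drop decision.
print(df[heavily_missing + [recoverable]].isna().mean())

# Drop the two features that are mostly missing.
df = df.drop(columns=heavily_missing)

# Impute the third feature, e.g. with the column median (interpolate() is another option).
df[recoverable] = df[recoverable].fillna(df[recoverable].median())
```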

