r - How are the results of complete.cases() and data[is.na(data)] <- 0 different?

August 15, 2012

I have a data frame, data. After several computations on it, the final data frame, df.final, has missing values in it.

Before going ahead with further calculations on df.final, am I better off turning the missing values into zeros with

data[is.na(data)] <- 0

as suggested in the linked question on replacing NA values with zeros in R, or would doing

df.final <- df.final[complete.cases(df.final), ]  # keep only the rows without any NA

be more beneficial?

How are the two different?
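A minimal sketch of what the two lines do, assuming a small made-up df.final whose NA values sit in different columns (the column names and values are illustrative only):

```r
# Hypothetical toy data: the NAs sit in different rows and columns
df.final <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(10, 20, NA, 40)
)

# Approach 1: replace every NA with 0 (all rows are kept)
zero_filled <- df.final
zero_filled[is.na(zero_filled)] <- 0

# Approach 2: keep only rows that have no NA in any column
complete_only <- df.final[complete.cases(df.final), ]

nrow(zero_filled)    # 4
nrow(complete_only)  # 2
```

Note that complete.cases drops a whole row even if only one of its columns is missing, so the other, valid values in that row are discarded too.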

If you set NA to zero, the effect on your calculations is as if you had measured and got zero. If you're measuring temperatures in July, you'll get results as if you had a few frosty days in there, so your average temperature will be lower.
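To make the July example concrete, here is a sketch with made-up daily readings (the values are hypothetical), where two unmeasured days are zero-filled:

```r
# Hypothetical daily July temperatures (degrees C); NA = day not measured
july <- c(24, 26, NA, 25, NA, 21)

# Treat the missing days as if 0 degrees had actually been measured
july_zero <- july
july_zero[is.na(july_zero)] <- 0

mean(july_zero)  # 16: pulled down by the two fake "frosty" days
```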

If you set na.rm = TRUE or use complete.cases, the effect is as if that measurement never happened (which is really the case). Your average temperature in July is then the average over the days you actually measured.
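A sketch using the same hypothetical July readings, this time stored as a one-column data frame, shows that both ways of ignoring the missing days give the mean of the measured days only:

```r
# Same hypothetical July readings as a one-column data frame
july <- data.frame(temp = c(24, 26, NA, 25, NA, 21))

# Option A: ignore missing values inside the summary function
mean(july$temp, na.rm = TRUE)  # 24

# Option B: drop incomplete rows first, then summarise.
# drop = FALSE keeps the result a data frame even with one column.
measured <- july[complete.cases(july), , drop = FALSE]
mean(measured$temp)            # 24
```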

If you have only a few isolated NA values (check with sum(is.na(df.final))), you might want to set them to 0, or to some other sensible value; in the July example, the average temperature for July might be a good choice.
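A sketch of counting the missing values and filling them with the mean of the observed days instead of 0 (simple mean imputation, one possible "sensible value"; the readings are again hypothetical):

```r
july <- c(24, 26, NA, 25, NA, 21)

# How many values are missing?
n_missing <- sum(is.na(july))  # 2

# Replace the NAs with the mean of the measured days instead of 0
july_imputed <- july
july_imputed[is.na(july_imputed)] <- mean(july, na.rm = TRUE)

mean(july_imputed)  # 24, same as the mean of the measured days
```

Mean imputation leaves the average unchanged, but it understates the spread of the data, so it is best kept for the "few isolated NAs" case described above.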

I would only set them to 0 if there are vanishingly few (so I don't care that it skews the measurements) or if 0 is a sensible value (for example, if you want work experience in months, NA might mean "no experience").
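When 0 is only meaningful for some columns, you can zero-fill just those. A sketch with a hypothetical staff table, assuming NA in experience_months means "no experience" while NA in salary is genuinely unknown:

```r
# Hypothetical data for illustration only
staff <- data.frame(
  name = c("Ann", "Bob", "Cy"),
  experience_months = c(24, NA, 6),
  salary = c(50000, 42000, NA)
)

# Zero-fill only the column where 0 is a meaningful value;
# the unknown salary is left as NA
staff$experience_months[is.na(staff$experience_months)] <- 0

staff
```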

If your dataset is small enough, you can try both and observe how each affects your results.
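One way to try the options side by side is a small helper. This is a sketch: compare_na_strategies is a hypothetical name, and it assumes every column of the data frame is numeric (colMeans requires that):

```r
# Hypothetical helper: column means under three NA strategies
compare_na_strategies <- function(df) {
  # Strategy 1: NA -> 0, all rows kept
  zero_filled <- df
  zero_filled[is.na(zero_filled)] <- 0

  # Strategy 2: drop every row that has an NA in any column
  complete_only <- df[complete.cases(df), , drop = FALSE]

  # One row per strategy, one column per variable
  rbind(
    zero_filled   = colMeans(zero_filled),
    complete_rows = colMeans(complete_only),
    na_rm         = colMeans(df, na.rm = TRUE)  # drops NAs per column
  )
}

df.final <- data.frame(a = c(1, NA, 3, 4), b = c(10, 20, NA, 40))
compare_na_strategies(df.final)
```

On this toy frame, column a averages 2 with zero-filling, 2.5 with complete rows only, and about 2.67 with na.rm = TRUE. The last two differ because complete.cases removes whole rows, while na.rm drops missing values column by column.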
