r - How are results of complete.cases() and data[is.na(data)] <- 0 different?
I have a data frame data
; after several computations on it, the final data frame df.final
has missing values in it.
Before going ahead with further calculations on df.final
, would I be better off setting the missing values to zero with
data[is.na(data)] <- 0
as mentioned in "How do I replace NA values with zeros in R?", or would doing
df.final <- df.final[complete.cases(df.final), ] # keep only the rows without NA
be more beneficial?
How are the two different?
If you set NA
to zero, the effect on your calculations is as if you had measured and got zero. If you're measuring temperatures in July, you'll get results as if you had a few frosty days in there: the average temperature will be lower.
If you set na.rm=TRUE
or use complete.cases
, the effect is as if that measurement never happened (which is the case, really). Our average temperature in July is then the average of the days we did measure.
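To make the July example concrete, here is a minimal sketch. The temperatures are made up for illustration, and the names july, july_zero and df are assumptions, not anything from the question.

```r
# Hypothetical daily July temperatures in degrees Celsius;
# two days were never measured.
july <- c(24, 26, NA, 25, 27, NA, 23)

# Approach 1: pretend a 0 was measured on the missing days.
july_zero <- july
july_zero[is.na(july_zero)] <- 0
mean_zero <- mean(july_zero)            # 125 / 7, roughly 17.9

# Approach 2: ignore the missing days.
mean_narm <- mean(july, na.rm = TRUE)   # 125 / 5 = 25

# complete.cases() gives the same answer once the values sit in a data frame.
df <- data.frame(day = seq_along(july), temp = july)
df_complete <- df[complete.cases(df), ]
mean_complete <- mean(df_complete$temp) # also 25
```

The two "frosty days" introduced by the zeros pull the average down by about seven degrees, while na.rm = TRUE and complete.cases() both report the average of the five measured days.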
If you have a few isolated NA values (count them with sum(is.na(data))
), you might want to set them to 0 (or another sensible value; in our example, the average temperature in July might be a good one).
I would set them to 0 only if there are vanishingly few (so I don't care that it's skewing the measurements) or if 0 is a sensible value (for example, if you want work experience in months, NA
might mean "no experience").
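A rough sketch of counting the NA values and then filling them with either 0 or the mean of the observed values. The survey data frame and its column names are hypothetical, chosen only to mirror the work-experience example.

```r
# Hypothetical survey: months of work experience, NA where nothing was reported.
survey <- data.frame(
  id = 1:6,
  experience_months = c(12, NA, 48, 0, NA, 30)
)

# How many values are missing, in total and per column?
n_missing <- sum(is.na(survey))
missing_per_column <- colSums(is.na(survey))

# Option 1: 0 is meaningful here, so NA becomes "no experience".
as_zero <- survey
as_zero$experience_months[is.na(as_zero$experience_months)] <- 0

# Option 2: fill the gaps with the mean of the observed values instead.
observed_mean <- mean(survey$experience_months, na.rm = TRUE)
as_mean <- survey
as_mean$experience_months[is.na(as_mean$experience_months)] <- observed_mean
```

Filling with the observed mean leaves the column mean unchanged, whereas filling with 0 lowers it; which one is appropriate depends on what a missing value actually means in your data.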
Software is soft: if your dataset is small enough, you can try both and observe how each affects your data.
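One way to try both side by side, as a sketch only: the contents of df.final below are invented, and the names zeroed, dropped and comparison are assumptions.

```r
# Hypothetical stand-in for df.final with missing values in two columns.
df.final <- data.frame(
  x = c(1, 2, NA, 4),
  y = c(10, NA, 30, 40)
)

# Approach 1: replace every NA with 0 (all rows are kept).
zeroed <- df.final
zeroed[is.na(zeroed)] <- 0

# Approach 2: keep only the rows that have no NA in any column.
dropped <- df.final[complete.cases(df.final), ]

# Put the row counts and column means next to each other.
comparison <- data.frame(
  approach = c("NA set to 0", "complete cases only"),
  rows     = c(nrow(zeroed), nrow(dropped)),
  mean_x   = c(mean(zeroed$x), mean(dropped$x)),
  mean_y   = c(mean(zeroed$y), mean(dropped$y))
)
print(comparison)
```

Note that complete.cases() drops a whole row as soon as one column is NA, so valid values in the other columns of that row are discarded too.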