r - How are the results of complete.cases() and data[is.na(data)] <- 0 different?


I have a data frame, data. After several computations on it, the final data frame, df.final, has missing values in it.

Before going ahead with further calculations on df.final, would I be better off setting the missing values to zero with

data[is.na(data)] <- 0

as mentioned in "How do I replace NA values with zeros in R?", or would doing

df.final <- df.final[complete.cases(df.final), ] # keep only the rows without NA

be more beneficial?

How are the two different?

If you set NA to zero, the effect on your calculations is as if you had measured and got zero. If you're measuring temperatures in July, you'll get results as if you had a few frosty days in there, so the average temperature will come out lower.

If you set na.rm = TRUE or use complete.cases(), the effect is as if that measurement never happened (which is the case, really). Our average temperature in July is then the average of the days we did measure.
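To make the difference concrete, here is a minimal sketch with made-up July temperatures (the july vector and its values are my own assumption, not data from the question). It compares the mean after zero-filling with the mean after ignoring the missing days.

```r
# Hypothetical July temperatures in degrees Celsius; NA marks days with no reading.
july <- c(24, 26, NA, 25, NA, 25)

# Option 1: replace NA with 0, as if those two days had measured exactly 0 degrees.
zero.filled <- july
zero.filled[is.na(zero.filled)] <- 0
mean.zero <- mean(zero.filled)            # (24 + 26 + 0 + 25 + 0 + 25) / 6

# Option 2: ignore the missing days and average only the days that were measured.
mean.dropped <- mean(july, na.rm = TRUE)  # (24 + 26 + 25 + 25) / 4

print(c(zero.filled = mean.zero, dropped = mean.dropped))
```

With these numbers the zero-filled mean is about 16.7, while the mean of the measured days is 25, so the two "frosty days" pull the average down by a third.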

If you have only a few isolated NA values (check with sum(is.na(data))), you might want to set them to 0, or to some other sensible value; in this example, the average temperature in July might be a good one.

I would set them to 0 only if there are vanishingly few (so you don't care that it's skewing your measurements) or if 0 is a sensible value (for example, if you want work experience in months, NA might mean "no experience").
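As a rough illustration of both cases, the sketch below fills missing temperatures with the mean of the measured days, and fills missing work experience with 0. The vectors july and experience.months are hypothetical names and values I made up for the example.

```r
# Hypothetical July temperatures; NA marks days with no reading.
july <- c(24, 26, NA, 25, NA, 25)

# Count the missing values first to judge whether filling them in is harmless.
n.missing <- sum(is.na(july))

# Fill the gaps with the mean of the measured days instead of 0.
july.mean <- mean(july, na.rm = TRUE)
july.imputed <- july
july.imputed[is.na(july.imputed)] <- july.mean

# Hypothetical work experience in months, where NA is assumed to mean "no experience".
experience.months <- c(12, NA, 30, NA)
experience.months[is.na(experience.months)] <- 0

print(n.missing)
print(july.imputed)
print(experience.months)
```

Filling with the mean leaves the overall average unchanged, which is why it is less distorting than 0 here, but it does understate the spread of the data.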

Software is soft: if your dataset is small enough, you can try both and observe how each affects your data.
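Trying both on a data frame could look something like this. It is only a sketch: the temp and humidity columns and their values are invented for illustration. Note that complete.cases() drops a whole row if any column in it is NA, so you also lose the values that were present in that row.

```r
# Hypothetical stand-in for df.final, with NA values in different rows.
df.final <- data.frame(
  temp     = c(24, NA, 25, 26),
  humidity = c(60, 55, NA, 70)
)

# Approach A: replace every NA with 0; all rows are kept.
df.zero <- df.final
df.zero[is.na(df.zero)] <- 0

# Approach B: keep only the rows that have no NA in any column.
df.complete <- df.final[complete.cases(df.final), ]

# Compare how many rows survive and what the column means look like.
print(c(zero = nrow(df.zero), complete = nrow(df.complete)))
print(colMeans(df.zero))
print(colMeans(df.complete))
```

Here approach A keeps all four rows but drags both column means down, while approach B keeps only two rows and averages just those.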

