NA values and R aggregate function


Here's a simple data frame with a missing value:

m = data.frame(name = c('name','name'), col1 = c(NA,1), col2 = c(1,1))

When I apply aggregate to m this way:

aggregate(.~name, m, FUN=sum, na.rm=TRUE)

The result is:

RowName col1 col2
name       1    1

So the entire first row is ignored. But if I do

aggregate(m[,2:3], by=list(m$name), FUN=sum, na.rm=TRUE)

The result is:

Group.1 col1 col2
name       1    2

So only the (1,1) entry is ignored.

This caused a major debugging headache in one of my scripts, since I thought these two calls were equivalent. Is there a good reason why the "formula" entry method is treated differently?

Thanks.

Good question, but in my opinion this shouldn't have caused a major debugging headache, because it is documented quite clearly in multiple places in the manual page for aggregate.

First, in the usage section:

## S3 method for class 'formula'
aggregate(formula, data, FUN, ...,
          subset, na.action = na.omit)

Later, in the description:

na.action: a function which indicates what should happen when the data contain NA values. The default is to ignore missing values in the given variables.
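To make the effect of that default concrete, here is a minimal sketch using the data frame from the question. It assumes the formula method behaves as if na.omit were applied to the selected columns before FUN is called, which is consistent with the output shown in the question. The names cleaned and formula_result are only illustrative.

```r
# Data frame from the question: the first row has an NA in col1
m <- data.frame(name = c("name", "name"), col1 = c(NA, 1), col2 = c(1, 1))

# na.omit() drops every row that contains at least one NA,
# so only the second row survives
cleaned <- na.omit(m)
print(cleaned)

# The formula method uses na.action = na.omit by default, so FUN never
# sees the first row; na.rm = TRUE has nothing left to remove
formula_result <- aggregate(. ~ name, m, FUN = sum, na.rm = TRUE)
print(formula_result)
```

Because the whole row is dropped up front, the 1 in col2 of the first row is lost as well, which is why col2 sums to 1 rather than 2.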


I can't answer why the formula mode was written differently (that's something the function authors would have to answer), but using the above information, you can probably use the following:

aggregate(.~name, m, FUN=sum, na.rm=TRUE, na.action=NULL)
#   name col1 col2
# 1 name    1    2
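As a possible alternative, passing na.action = na.pass should have the same effect here, since na.pass returns the data unchanged and leaves the NA handling to na.rm = TRUE inside sum. The sketch below compares it with the data frame method from the question. The names res_formula and res_df are only illustrative.

```r
# Data frame from the question
m <- data.frame(name = c("name", "name"), col1 = c(NA, 1), col2 = c(1, 1))

# Formula method, with NA rows passed through to FUN untouched
res_formula <- aggregate(. ~ name, m, FUN = sum, na.rm = TRUE,
                         na.action = na.pass)

# Data frame method, which never drops rows before calling FUN
res_df <- aggregate(m[, 2:3], by = list(name = m$name), FUN = sum,
                    na.rm = TRUE)

print(res_formula)
print(res_df)
```

With either call, only the single NA entry is skipped, so col1 sums to 1 and col2 sums to 2.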
