NA values and R aggregate function

May 15, 2015

Here's a simple data frame with a missing value:

m = data.frame(name = c('name', 'name'), col1 = c(NA, 1), col2 = c(1, 1))

When I apply aggregate to m this way:

aggregate(. ~ name, m, FUN = sum, na.rm = TRUE)

the result is:

rowname col1 col2
   name    1    1

So the entire first row is ignored. But if I do

aggregate(m[, 2:3], by = list(m$name), FUN = sum, na.rm = TRUE)

the result is

Group.1 col1 col2
   name    1    2

So only the (1,1) entry is ignored.

This caused a major debugging headache in one of my scripts, since I thought these two calls were equivalent. Is there a reason why the "formula" entry method is treated differently?

Thanks.

Good question, but in my opinion this shouldn't have caused a major debugging headache, because it is documented quite clearly in multiple places in the manual page for aggregate.

First, in the usage section:

## S3 method for class 'formula'
aggregate(formula, data, FUN, ...,
          subset, na.action = na.omit)

Later, in the description:

na.action: a function which indicates what should happen when the data contain NA values. The default is to ignore missing values in the given variables.
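To make the effect of that default concrete, here is a minimal sketch using the data frame from the question. The variable names `kept`, `formula_result`, and `manual_result` are only illustrative. It assumes the formula method behaves as documented, filtering rows with `na.omit` before `FUN` is ever called.

```r
# Same data frame as in the question
m <- data.frame(name = c("name", "name"), col1 = c(NA, 1), col2 = c(1, 1))

# na.omit drops every row that contains at least one NA,
# so the whole first row disappears, including its valid col2 value
kept <- na.omit(m)

# Formula method with the default na.action = na.omit
formula_result <- aggregate(. ~ name, m, FUN = sum, na.rm = TRUE)

# Aggregating the pre-filtered rows by hand should give the same sums
manual_result <- aggregate(kept[, 2:3], by = list(name = kept$name),
                           FUN = sum, na.rm = TRUE)

print(kept)
print(formula_result)
print(manual_result)
```

By the time `sum` runs, the row with the NA is already gone, so `na.rm = TRUE` has nothing left to remove.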

I can't answer why the formula method was written differently (that's something the function authors would have to answer), but using the above information, you can use the following:

aggregate(. ~ name, m, FUN = sum, na.rm = TRUE, na.action = NULL)
#   name col1 col2
# 1 name    1    2
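As an alternative sketch, not taken from the manual excerpt above, passing `na.action = na.pass` should have the same effect. `na.pass` is a standard function in the stats package that returns its input unchanged, so the NA rows stay in and `na.rm = TRUE` is left to handle them inside `sum`. The name `res` is only illustrative.

```r
# Same data frame as in the question
m <- data.frame(name = c("name", "name"), col1 = c(NA, 1), col2 = c(1, 1))

# na.pass keeps rows with NA in the model frame; na.rm = TRUE is then
# forwarded to sum(), which skips the NA within each column separately
res <- aggregate(. ~ name, m, FUN = sum, na.rm = TRUE, na.action = na.pass)
print(res)
```

Spelling out `na.pass` states the intent more explicitly than `NULL`: keep all rows and let `FUN` decide what to do with missing values.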
