R - Tried streamlining with .SDcols - got "longer object length is not a multiple of shorter object length"
I have tried searching Stack Overflow and Google for answers to this question, but couldn't find anything that applied closely enough for me to be able to apply it. However, I'm new to R, so it may be that I just need a little walking through it.
If I use the following code, it works fine.
    > dput(b)
    structure(list(dump_end_shift_date = structure(c(1420070400, 1420070400,
    1420156800, 1420156800, 1420243200, 1420243200, 1420329600, 1420329600,
    1420416000, 1420416000, 1420502400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
    quantity_reporting = c(235, 219, 232, 219, 219, 219, 219, 219, 219, 219, 235),
    wtrecv = c(32.71, 32.71, 20.19, 33.42, 21.61, 21.61, 21.61, 20.19, 21.61, 20.19, 24.2),
    lc12 = c(0, 0, 0, 94, 100, 100, 100, 0, 100, 0, 100),
    lc34 = c(0, 100, 0, 6, 0, 0, 0, 0, 0, 0, 0),
    lc5 = c(0, 0, 5, 0, 0, 0, 0, 5, 0, 5, 0),
    his = c(25, 0, 60, 0, 0, 0, 0, 60, 0, 60, 0),
    uc = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
    ibc = c(75, 0, 35, 0, 0, 0, 0, 35, 0, 35, 0)),
    .Names = c("dump_end_shift_date", "quantity_reporting", "wtrecv", "lc12",
    "lc34", "lc5", "his", "uc", "ibc"), class = c("data.table", "data.frame"),
    row.names = c(NA, -11L), .internal.selfref = <pointer: 0x0000000005860788>)

    library(data.table)

    # Daily total tonnage
    b_daily <- b[, .(d_tons = sum(quantity_reporting)), by = dump_end_shift_date]

    # One tonnage-weighted daily average per column, added one at a time
    b_daily[, "d_wtrecv" := b[, .(d_wtrecv = sum(quantity_reporting * wtrecv)),
                              by = dump_end_shift_date][
                              , .(round(d_wtrecv / d_tons, digits = 2))]]
    b_daily[, "d_lc12" := b[, .(d_lc12 = sum(quantity_reporting * lc12)),
                            by = dump_end_shift_date][
                            , .(round(d_lc12 / d_tons, digits = 2))]]
    b_daily[, "d_lc34" := b[, .(d_lc34 = sum(quantity_reporting * lc34)),
                            by = dump_end_shift_date][
                            , .(round(d_lc34 / d_tons, digits = 2))]]
    b_daily[, "d_lc5" := b[, .(d_lc5 = sum(quantity_reporting * lc5)),
                           by = dump_end_shift_date][
                           , .(round(d_lc5 / d_tons, digits = 2))]]
    b_daily[, "d_his" := b[, .(d_his = sum(quantity_reporting * his)),
                           by = dump_end_shift_date][
                           , .(round(d_his / d_tons, digits = 2))]]
    b_daily[, "d_uc" := b[, .(d_uc = sum(quantity_reporting * uc)),
                          by = dump_end_shift_date][
                          , .(round(d_uc / d_tons, digits = 2))]]
    b_daily[, "d_ibc" := b[, .(d_ibc = sum(quantity_reporting * ibc)),
                           by = dump_end_shift_date][
                           , .(round(d_ibc / d_tons, digits = 2))]]
However, this seems inelegant - I think I should be able to do this using .SD and .SDcols. I tried the following, as a test case:
    b_daily2 <- b[, lapply(.SD, function(x)
                      sum(x * b[, quantity_reporting]) / sum(b[, quantity_reporting])),
                  by = dump_end_shift_date, .SDcols = c("wtrecv")][
                  , .(dump_end_shift_date, d_wtrecv = round(wtrecv, digits = 2))]
The resulting numbers are a little off, and I get the following warning:
"In x * mqd[, quantity_reporting] : longer object length is not a multiple of shorter object length"
I understand this indicates recycling due to objects being of different lengths... but I don't understand why or what. Any help is appreciated. I apologize in advance if this is an elementary question. Thank you.
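The length mismatch behind that warning can be seen on a small table. This is only a sketch: `toy` and its columns `g`, `w` and `v` are hypothetical names, not the asker's data. Inside `j` with `by=`, a column taken from the group (or from `.SD`) holds only the current group's rows, whereas a nested query such as `toy[, w]` goes back to the whole table and returns every row.

```r
library(data.table)

# Hypothetical toy table (not the asker's data): group "a" has 3 rows, "b" has 2.
toy <- data.table(g = c("a", "a", "a", "b", "b"),
                  w = c(1, 2, 3, 4, 5),
                  v = c(10, 20, 30, 40, 50))

# n_group: length of the column as seen inside the current group.
# n_full:  length of toy[, w], which ignores the grouping and returns all rows.
lens <- toy[, .(n_group = length(v), n_full = length(toy[, w])), by = g]
print(lens)
```

Multiplying a length-3 or length-2 vector by a length-5 one forces R to recycle the shorter vector, which is where both the warning and the slightly wrong numbers would come from. Writing the weight column bare (here `w`) keeps both operands at the group's length.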
This is arguably inelegant, but at least it fits in a single operation:
    b_daily <- b[, {
      d_tons   = sum(quantity_reporting)
      d_wtrecv = round(sum(quantity_reporting * wtrecv) / d_tons, digits = 2)
      list(d_tons = d_tons, d_wtrecv = d_wtrecv)
    }, by = dump_end_shift_date]
If there are many columns like d_wtrecv, and their names are stored in cols = c("wtrecv", ...), then...
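    cols <- c("wtrecv", "lc12", "lc34", "lc5", "his", "uc", "ibc")
    b_daily2 <- b[, {
      d_tons = sum(quantity_reporting)
      # mget(cols) fetches the current group's values of each named column
      res = lapply(mget(cols), function(x)
        round(sum(quantity_reporting * x) / d_tons, digits = 2))
      # setNames (base R) relabels the list elements as d_wtrecv, d_lc12, ...
      c(list(d_tons = d_tons), setNames(res, paste0("d_", cols)))
    }, by = dump_end_shift_date]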
A similar approach using .SDcols will be possible when a bug related to it is fixed.
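For illustration, here is a rough sketch of what the .SDcols variant might look like, assuming a data.table version in which using a column outside .SDcols alongside .SD works as expected (I have not checked which release that is). The table `toy` and its columns `day`, `qty`, `p1` and `p2` are hypothetical stand-ins, not the asker's data.

```r
library(data.table)

# Hypothetical toy data, not the asker's table.
toy <- data.table(day = c(1, 1, 2, 2, 2),
                  qty = c(1, 3, 2, 2, 4),
                  p1  = c(10, 20, 0, 50, 100),
                  p2  = c(0, 4, 8, 8, 8))
cols <- c("p1", "p2")

# qty is written bare, so it refers to the current group's weights and
# always has the same length as each .SD column x.
res <- toy[, c(list(d_tons = sum(qty)),
               lapply(.SD, function(x) sum(qty * x) / sum(qty))),
           by = day, .SDcols = cols]

# Rename the averaged columns by reference to d_p1, d_p2.
setnames(res, cols, paste0("d_", cols))
print(res)
```

The only real difference from the question's attempt is that the weights are referenced as a bare column name rather than through a nested query on the whole table.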
Aside. I think there is a feature request to allow the first column to be used in computing the second, like
    # non-working code:
    b_daily <- b[, .(
      d_tons   = sum(quantity_reporting),
      d_wtrecv = round(sum(quantity_reporting * wtrecv) / d_tons, digits = 2)
    ), by = dump_end_shift_date]
This is how mutate in the dplyr package works. However, for the multicolumn case, dplyr is more of a hassle than a help, as far as I can figure.
By the way, you may want to wait on rounding. Usually, it's only a good idea for printing purposes, and it unnecessarily worsens later calculations.
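A tiny base R illustration of the rounding point, using made-up values chosen only to make the effect visible:

```r
# Hypothetical values, chosen only to make the effect visible.
x <- c(0.004, 0.004, 0.004)

early <- sum(round(x, digits = 2))  # round each value first, then add
late  <- round(sum(x), digits = 2)  # add at full precision, round once

print(c(early = early, late = late))
```

Here the early-rounded total is 0 while the late-rounded one is 0.01, so rounding the stored daily figures can quietly distort anything computed from them afterwards.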