r - Nested sorting of levels by the sum of another column -
i looking better way of performing sorting of levels within groups sum of column.
within each key (e.g. k1 in sample data), levels (e.g. x, y in sample data) should sorted in descending order sum of numeric column (size in sample data). more generally, might have n keys (i.e. k_1 k_n).
i cant think of other way perform aggregate first lookup rank each key sort whole table using concatenated key.
is there better way of doing using combination of base + data.table syntax i.e. faster or shorter way of solving problem?
sample data:
library(data.table) dat <- data.table(k1=c("x","x","y","y","y"), k2=c("k","r","r","g","g"), k3=1:5, size=c(1, 2.5, 4, 3.5, 1), key=c("k1","k2","k3")) # k1 k2 k3 size #1: x k 1 1.0 #2: x r 2 2.5 #3: y g 4 3.5 #4: y g 5 1.0 #5: y r 3 4.0 desired output:
k1 k2 k3 size y g 4 3.5 y g 5 1.0 y r 3 4.0 x r 2 2.5 x k 1 1.0 example walkthrough using sample data:
for k1, sum of size y 8.5 greater sum of size x, 3.5 hence y should above x in rank.
within k1=y,the sum of size g 4.5 greater sum of size r, 4.0, hence g should above r in rank within k1=y.
within k1=y & k2=g, size k3=4 3.5 greater size k3=5 of 1.0, hence k3=4 should come above k3=5.
what have tried far (which seems convoluted):
dat[, rank:=""] (n in seq_along(key(dat))) { x <- key(dat)[seq_len(n)] #name of dummy column col <- last(paste0(x, "_rank")) #aggregate desired key rankdt <- dat[, list(size=sum(as.numeric(size), na.rm=true)), by=x] #rank size column in descending order left pad rank equal number of digits rankdt[,(col) := formatc( frank(-size, ties.method="first"), width=ceiling(.n/10), format="d", flag="0")] #concatenate rank existing list of rank dat[rankdt, rank := paste0(rank, get(col)), on=x] } dat[order(rank)] test case 2
dat <- data.table(k1=c("x","x","y","y","y"), k2=c("k","r","r","g","g"), k3=1:5, size=c(1, 16, 4, 3.5, 10), key=c("k1","k2","k3")) # k1 k2 k3 size #1: x k 1 1.0 #2: x r 2 16.0 #3: y g 4 3.5 #4: y g 5 10.0 #5: y r 3 4.0 desired output test case 2:
k1 k2 k3 size rank y g 5 10.0 122 y g 4 3.5 124 y r 3 4.0 133 x r 2 16.0 211 x k 1 1.0 245 tried searching "recursive sort", "nested sort", "group + aggregate + rank". these either sort alphabetically and/or sort particular column rather sort based on column.
thanks!
edit: shorter soln inspired @jaap
#factor shift previous sum when summing latter sums, previous ordering not corrupted m <- dat[,max(abs(size))] total <- reduce(f=function(init, keys) { m * init + dat[, s:=sum(size), by=keys]$s }, x=reduce(c, key(dat), accumulate=true), init=dat[, rep(0, .n)]) dat[order(-total)]
with:
cols <- setdiff(key(dat), 'k3') dat[, ksum := sum(size), = cols][] setorderv(dat, c('ksum', cols), order = -1) you get:
k1 k2 k3 size ksum 1: y g 4 3.5 4.5 2: y g 5 1.0 4.5 3: y r 3 4.0 4.0 4: x r 2 2.5 2.5 5: x k 1 1.0 1.0
what does:
cols <- key(dat)creates vector of column-names.dat[, ksum := sum(size), = cols]sums values grouping columns , adds new variableksumtot data.table.setorderv(dat, c('ksum', cols), order = -1)reorders data.table reference set of columns (including sum-column:c('ksum', cols)) in descending order (order = -1).
if don't want keep ksum-column, can remove dat[, ksum := null]:
> dat[, ksum := null][] k1 k2 k3 size 1: y g 4 3.5 2: y g 5 1.0 3: y r 3 4.0 4: x r 2 2.5 5: x k 1 1.0
Comments
Post a Comment