r - Nested sorting of levels by the sum of another column -


i looking better way of performing sorting of levels within groups sum of column.

within each key (e.g. k1 in sample data), levels (e.g. x, y in sample data) should sorted in descending order sum of numeric column (size in sample data). more generally, might have n keys (i.e. k_1 k_n).

i cant think of other way perform aggregate first lookup rank each key sort whole table using concatenated key.

is there better way of doing using combination of base + data.table syntax i.e. faster or shorter way of solving problem?

sample data:

library(data.table)  dat <- data.table(k1=c("x","x","y","y","y"),      k2=c("k","r","r","g","g"),      k3=1:5,     size=c(1, 2.5, 4, 3.5, 1),     key=c("k1","k2","k3"))  #   k1 k2 k3 size #1:  x  k  1  1.0 #2:  x  r  2  2.5 #3:  y  g  4  3.5 #4:  y  g  5  1.0 #5:  y  r  3  4.0 

desired output:

   k1 k2 k3 size     y  g  4  3.5     y  g  5  1.0     y  r  3  4.0     x  r  2  2.5     x  k  1  1.0 

example walkthrough using sample data:

for k1, sum of size y 8.5 greater sum of size x, 3.5 hence y should above x in rank.

within k1=y,the sum of size g 4.5 greater sum of size r, 4.0, hence g should above r in rank within k1=y.

within k1=y & k2=g, size k3=4 3.5 greater size k3=5 of 1.0, hence k3=4 should come above k3=5.

what have tried far (which seems convoluted):

dat[, rank:=""] (n in seq_along(key(dat))) {     x <- key(dat)[seq_len(n)]      #name of dummy column     col <- last(paste0(x, "_rank"))      #aggregate desired key     rankdt <- dat[, list(size=sum(as.numeric(size), na.rm=true)), by=x]      #rank size column in descending order left pad rank equal number of digits     rankdt[,(col) := formatc(         frank(-size, ties.method="first"),          width=ceiling(.n/10),          format="d",          flag="0")]      #concatenate rank existing list of rank     dat[rankdt,         rank := paste0(rank, get(col)),         on=x] } dat[order(rank)] 

test case 2

dat <- data.table(k1=c("x","x","y","y","y"),      k2=c("k","r","r","g","g"),      k3=1:5,     size=c(1, 16, 4, 3.5, 10),     key=c("k1","k2","k3"))  #   k1 k2 k3 size #1:  x  k  1  1.0 #2:  x  r  2 16.0 #3:  y  g  4  3.5 #4:  y  g  5 10.0 #5:  y  r  3  4.0 

desired output test case 2:

   k1 k2 k3 size rank     y  g  5 10.0  122     y  g  4  3.5  124     y  r  3  4.0  133     x  r  2 16.0  211     x  k  1  1.0  245 

tried searching "recursive sort", "nested sort", "group + aggregate + rank". these either sort alphabetically and/or sort particular column rather sort based on column.

thanks!


edit: shorter soln inspired @jaap

#factor shift previous sum when summing latter sums, previous ordering not corrupted m <- dat[,max(abs(size))]  total <- reduce(f=function(init, keys) {         m * init + dat[, s:=sum(size), by=keys]$s     },      x=reduce(c, key(dat), accumulate=true),      init=dat[, rep(0, .n)])  dat[order(-total)] 

with:

cols <- setdiff(key(dat), 'k3')  dat[, ksum := sum(size), = cols][] setorderv(dat, c('ksum', cols), order = -1) 

you get:

   k1 k2 k3 size ksum 1:  y  g  4  3.5  4.5 2:  y  g  5  1.0  4.5 3:  y  r  3  4.0  4.0 4:  x  r  2  2.5  2.5 5:  x  k  1  1.0  1.0 

what does:

  • cols <- key(dat) creates vector of column-names.
  • dat[, ksum := sum(size), = cols] sums values grouping columns , adds new variable ksum tot data.table.
  • setorderv(dat, c('ksum', cols), order = -1) reorders data.table reference set of columns (including sum-column: c('ksum', cols)) in descending order (order = -1).

if don't want keep ksum-column, can remove dat[, ksum := null]:

> dat[, ksum := null][]    k1 k2 k3 size 1:  y  g  4  3.5 2:  y  g  5  1.0 3:  y  r  3  4.0 4:  x  r  2  2.5 5:  x  k  1  1.0 

Comments

Popular posts from this blog

Sort a complex associative array in PHP -

vb.net - How to ignore if a cell is empty nothing -

recursion - Can every recursive algorithm be improved with dynamic programming? -