r - merge with condition on two dataframes -


similar unanswered question here: merge 2 dataframes condition on timestamp question more general , not time-series specific

i have 2 dataframes want merge proximity condition on 2 columns both dataframes. in sql have done like

select * (select * a) t1 inner join (select * b) t2 on t1.user = t2.user , t1.label - t2.label < 2

what i'm looking canincal way above in r, like-

merge(x,y,by='user', condition = x$label - y$label <=2 ) 

so following should without rows 3,7,11,12,13,15 etc....

set.seed(1212) <- data.frame(user=rep(paste("u",1:3,sep=''),4),label=sample.int(10,12,t)) b <- data.frame(user=rep(paste("u",1:3,sep=''),4),label=sample.int(10,12,t))  merge(a,b,by='user')     user label.x label.y 1    u1       3       1 2    u1       3       3 3    u1       3      10 4    u1       3       5 5    u1       3       1 6    u1       3       3 7    u1       3      10 8    u1       3       5 9    u1       1       1 10   u1       1       3 11   u1       1      10 12   u1       1       5 13   u1       4       1 14   u1       4       3 15   u1       4      10 16   u1       4       5 17   u2       7       1 18   u2       7       7 19   u2       7       4 20   u2       7       2 21   u2       2       1 22   u2       2       7 23   u2       2       4 24   u2       2       2 25   u2       6       1 26   u2       6       7 27   u2       6       4 28   u2       6       2 29   u2       1       1 30   u2       1       7 31   u2       1       4 32   u2       1       2 33   u3       8       7 34   u3       8       1 35   u3       8       7 36   u3       8       4 37   u3       1       7 38   u3       1       1 39   u3       1       7 40   u3       1       4 41   u3      10       7 42   u3      10       1 43   u3      10       7 44   u3      10       4 45   u3       9       7 46   u3       9       1 47   u3       9       7 48   u3       9       4 

with base r, try:

subset(merge(a, b, = 'user'), label.x - label.y <= 2) 

if data huge, can try non-equi join data.table:

library(data.table) setdt(a) setdt(b)  a[, label.a := label - 2] b[, label.b := label] y <- a[b, on = .(user, label.a <= label.b), allow.cartesian=true][, label.a := null] 
> head(y)    user label i.label 1:   u1     1       1 2:   u1     3       1 3:   u1     3       1 4:   u2     1       7 5:   u2     2       7 6:   u2     6       7 

here, label in y label a, , label.i label b.

or more explictly, use j expression:

library(data.table) setdt(a) setdt(b)  a[, label2 := label - 2] y <- a[     b,     .(user, label.a = label, label.b = i.label),     on = .(user, label2 <= label),     allow.cartesian=true] 

Comments

Popular posts from this blog

resizing Telegram inline keyboard -

command line - How can a Python program background itself? -

php - "cURL error 28: Resolving timed out" on Wordpress on Azure App Service on Linux -