r - How to get all possible orders of comma separated strings -

August 15, 2012

I tried searching for this R problem but couldn't find anything useful.

I have a data frame like this:

    post_id  new_mentions_1   new_mentions_2
    1        model
    2        telephone        louis vuitton
    3        uber             employee
    4        united states
    5        onion            pepper, rice, garlic

and the expected result is the data frame expanded with all possible orders of new_mentions_2:

    post_id  new_mentions_1   new_mentions_2
    1        model
    2        telephone        louis vuitton
    3        uber             employee
    4        united states
    5        onion            pepper,rice,garlic
    5        onion            rice,garlic,pepper
    5        onion            garlic,pepper,rice
    5        onion            pepper,garlic,rice
    5        onion            garlic,rice,pepper
    5        onion            rice,pepper,garlic

Please help me program this. I have a few rows with 5 keywords separated by commas.

There should be an easier way to deal with this, but I can't seem to find it.

To make sure we're talking about the same data frame, I repost the data:

    df <- structure(list(
      new_mentions_1 = c("model", "telephone", "uber", "united_states", "onion"),
      new_mentions_2 = c(NA, "louis_vuitton", "employee", NA, "pepper,rice,garlic")
    ), .names = c("new_mentions_1", "new_mentions_2"),
       class = "data.frame", row.names = c(NA, -5L))

First, check which rows in df have multiple values in the new_mentions_2 column, using grep. This function returns the indices of the rows whose second column contains a comma. Then split the data frame into the part that doesn't need fixing (i.e. has no comma-separated values in the second column), which we call newdf, and the part that needs fixing, which is called subdf.

We'll mess around a bit with subdf (details below the code) to get all possible orderings of the values, and append the results to the newdf data frame:
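To see what that first step returns, here is a small sketch on a made-up vector (x is purely illustrative, not the question's data). It also shows one thing to watch for: when nothing matches, grep returns an empty vector, and negative indexing with an empty vector selects nothing rather than everything.

```r
# Toy vector, made up for illustration
x <- c(NA, "louis_vuitton", "pepper,rice,garlic")

hits  <- grep(",", x)    # integer positions of the elements containing a comma
flags <- grepl(",", x)   # logical vector, same length as x; NA counts as no match

# Pitfall: no element contains ";", so grep() returns integer(0),
# and x[-integer(0)] is an empty vector, not all of x
none_left <- x[-grep(";", x)]
```

Indexing with the logical vector (x[!flags]) avoids that edge case, as does computing the complement explicitly with setdiff.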

    library(gtools)

    # Which rows in df have multiple values in the second column?
    inds <- grep(pattern = ",", df$new_mentions_2)

    subdf <- df[inds, ]                               # rows that need expanding
    newdf <- df[setdiff(seq_len(nrow(df)), inds), ]   # rows that are fine as they are

    # In case there are multiple 'problematic' rows, loop through all of them
    for (i in seq_len(nrow(subdf))) {
      # Split on a comma plus any whitespace that follows it
      splitted <- strsplit(subdf$new_mentions_2[i], ",\\s*")[[1]]
      n        <- length(splitted)
      shuffled <- permutations(n, n)   # one ordering of 1..n per row
      for (j in seq_len(nrow(shuffled))) {
        val_2 <- paste(splitted[shuffled[j, ]], collapse = ", ")
        val_1 <- subdf$new_mentions_1[i]
        newdf <- rbind(newdf, c(val_1, val_2))
      }
    }

The 'messing around' part is performed in the outer loop. First, a value such as "pepper, rice, garlic" is split at the commas, so that splitted contains c("pepper", "rice", "garlic"). We then get all possible orderings using the permutations function from the gtools package: permutations(n, n) returns a matrix with one ordering of the indices 1 to n per row. In the first line of the inner loop, the shuffled strings are put back into a single string (paste() with the collapse = ", " argument), so that they fit in one column of the data frame again.
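If installing gtools is not an option, the same split, permute, and paste steps can be sketched in base R. The helpers all_orders and expand_orders below are hypothetical names introduced for this illustration, not part of the answer above or of any package, and the sketch assumes a non-NA input string.

```r
# Hypothetical helper: all orderings of a vector, built recursively in base R
all_orders <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (k in seq_along(v)) {
    # Fix v[k] as the first element, then permute the remaining elements
    for (rest in all_orders(v[-k])) {
      out[[length(out) + 1]] <- c(v[k], rest)
    }
  }
  out
}

# Hypothetical helper: expand one comma-separated string into all its orderings
expand_orders <- function(s) {
  parts <- strsplit(s, ",\\s*")[[1]]   # split on comma plus optional whitespace
  vapply(all_orders(parts), paste, character(1), collapse = ", ")
}

orders <- expand_orders("pepper, rice, garlic")
```

For three keywords this gives six strings, and for five keywords 120, so the expanded data frame grows quickly with the number of keywords per row.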
