R - How to get all possible orders of comma-separated strings
I tried searching for this R problem but couldn't find anything useful.
I have a dataframe like this:
    post_id  new_mentions_1  new_mentions_2
    1        model
    2        telephone       louis vuitton
    3        uber            employee
    4        united states
    5        onion           pepper, rice, garlic

and the expected result is the dataframe expanded with all possible orders of new_mentions_2:
    post_id  new_mentions_1  new_mentions_2
    1        model
    2        telephone       louis vuitton
    3        uber            employee
    4        united states
    5        onion           pepper,rice,garlic
    5        onion           rice,garlic,pepper
    5        onion           garlic,pepper,rice
    5        onion           pepper,garlic,rice
    5        onion           garlic,rice,pepper
    5        onion           rice,pepper,garlic

Please help me program this. I have a few rows with 5 keywords separated by commas.
There should be an easier way to deal with this, but I can't seem to find it.
To make sure we're talking about the same dataframe, I'll repost the data:
    df <- structure(list(
      new_mentions_1 = c("model", "telephone", "uber", "united_states", "onion"),
      new_mentions_2 = c(NA, "louis_vuitton", "employee", NA, "pepper,rice,garlic")),
      .Names = c("new_mentions_1", "new_mentions_2"),
      class = "data.frame", row.names = c(NA, -5L))

First I check which rows in df have multiple values in the new_mentions_2 column, using grep. This function returns the indices of the rows whose second column contains a comma-separated value. I then split the data frame into two parts. The part that doesn't need fixing (i.e. has no comma in the second column) is called newdf. The part that needs fixing is called subdf.
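The split described above can be sketched in base R as follows. This is only an illustration on the five example rows; it builds df with data.frame() instead of structure() for readability, which is my choice and not part of the original answer.

```r
# Example data, rebuilt with data.frame() (an assumption for readability)
df <- data.frame(
  new_mentions_1 = c("model", "telephone", "uber", "united_states", "onion"),
  new_mentions_2 = c(NA, "louis_vuitton", "employee", NA, "pepper,rice,garlic"),
  stringsAsFactors = FALSE
)

# grep() returns the positions of the matching elements; NA never matches
inds <- grep(",", df$new_mentions_2)
print(inds)  # 5

subdf <- df[inds, ]   # rows that need expanding
newdf <- df[-inds, ]  # rows that can stay as they are
```

One caveat: df[-inds, ] returns zero rows when inds is empty, so this sketch assumes at least one row contains a comma.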
We'll mess around a bit with subdf (details below the code) to get all possible combinations of the values, and append the results to the newdf data frame:
    library(gtools)

    # Which rows in df have multiple values in the second column?
    inds <- grep(pattern = ",", df$new_mentions_2)
    subdf <- df[inds, ]
    newdf <- df[-inds, ]

    # In case we have multiple 'problematic' rows, we'll loop through all of them
    for (i in seq_len(nrow(subdf))) {
      # Split on a comma plus any whitespace after it
      splitted <- strsplit(subdf$new_mentions_2[i], ",\\s*")[[1]]
      n <- length(splitted)
      # Each row of shuffled is one ordering of the indices 1..n
      shuffled <- permutations(n, n)
      for (j in seq_len(nrow(shuffled))) {
        val_2 <- paste(splitted[shuffled[j, ]], collapse = ", ")
        val_1 <- subdf$new_mentions_1[i]
        newdf <- rbind(newdf, c(val_1, val_2))
      }
    }

The 'messing around' part is performed in the outer loop. First, a value such as "pepper, rice, garlic" is split at every comma, along with any whitespace that follows it. splitted then contains c("pepper", "rice", "garlic"), and we get all possible orderings of its elements using the permutations function from the gtools package. In the first line of the inner loop, the shuffled strings are put back into a single string (paste() with the collapse = ", " argument), so that we can fit them in one column of the data frame again.
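For comparison, here is a minimal base R sketch of the same idea that avoids both the gtools dependency and the repeated rbind inside the loop. The helper names all_orders and expand_orders are hypothetical, as are the col and sep parameters; the post_id column is taken from the question's table. Treat it as one possible alternative, not as the method used above.

```r
# Hypothetical helper: every ordering of a vector, as a list of vectors
all_orders <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    # Put x[i] first, then append every ordering of the remaining elements
    for (rest in all_orders(x[-i])) {
      out[[length(out) + 1]] <- c(x[i], rest)
    }
  }
  out
}

# Hypothetical helper: repeat each comma-separated row once per ordering
expand_orders <- function(df, col = "new_mentions_2", sep = ",") {
  pieces <- lapply(seq_len(nrow(df)), function(i) {
    value <- df[[col]][i]
    # Rows with NA or a single value are kept unchanged
    if (is.na(value) || !grepl(sep, value, fixed = TRUE)) {
      return(df[i, , drop = FALSE])
    }
    parts <- trimws(strsplit(value, sep, fixed = TRUE)[[1]])
    orders <- vapply(all_orders(parts), paste, character(1), collapse = sep)
    rows <- df[rep(i, length(orders)), , drop = FALSE]
    rows[[col]] <- orders
    rows
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

df <- data.frame(
  post_id = 1:5,
  new_mentions_1 = c("model", "telephone", "uber", "united_states", "onion"),
  new_mentions_2 = c(NA, "louis_vuitton", "employee", NA, "pepper,rice,garlic"),
  stringsAsFactors = FALSE
)

result <- expand_orders(df)
print(result)
```

Unlike the loop above, this keeps every other column (such as post_id) and leaves rows in their original position. Note that the number of orderings grows factorially: a row with 5 keywords expands into 120 rows.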