r - How to separate a column in dplyr based on regex
I have the following data frame:
df <- structure(list(x2 = c("bb_137.hvmsc", "bb_138.combined.hvmsc",
                            "bb_139.combined.hvmsc", "bb_140.combined.hvmsc",
                            "bb_141.hvmsc", "bb_142.combined.hmsc-bm")),
                .names = "x2", row.names = c(NA, -6L),
                class = c("tbl_df", "tbl", "data.frame"))
which looks like this:
> df
# A tibble: 6 x 1
  x2
  <chr>
1 bb_137.hvmsc
2 bb_138.combined.hvmsc
3 bb_139.combined.hvmsc
4 bb_140.combined.hvmsc
5 bb_141.hvmsc
6 bb_142.combined.hmsc-bm
What I want to do is separate it into 2 columns (with . as the separator), keeping the last field as the second column:
col1             col2
bb_137           hvmsc
bb_138.combined  hvmsc
bb_139.combined  hvmsc
bb_140.combined  hvmsc
bb_141           hvmsc
bb_142.combined  hmsc-bm
What's the right way to do it?
My attempt is this:
> df %>% separate(x2, into = c("sid", "status", "tiss"), sep = "[.]")
# A tibble: 6 x 3
  sid    status   tiss
* <chr>  <chr>    <chr>
1 bb_137 hvmsc    <NA>
2 bb_138 combined hvmsc
3 bb_139 combined hvmsc
4 bb_140 combined hvmsc
5 bb_141 hvmsc    <NA>
6 bb_142 combined hmsc-bm
Warning message:
Too few values at 2 locations: 1, 5
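We can use a regex with a negative lookahead as the separator in the separate function. The pattern matches a dot that is not followed by any other dot, that is, only the last one.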
library(tidyr)
separate(data = df, col = x2, into = c("col1", "col2"), sep = "(\\.)(?!.*\\.)")
#  col1            col2
#  <chr>           <chr>
#1 bb_137          hvmsc
#2 bb_138.combined hvmsc
#3 bb_139.combined hvmsc
#4 bb_140.combined hvmsc
#5 bb_141          hvmsc
#6 bb_142.combined hmsc-bm
The regex is taken from this answer.
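To see why the pattern above picks out only the last dot, here is a minimal base R sketch that applies the same lookahead by hand. It is an illustration, not what separate does internally; the names x, last_dot, pos, and out are made up for this example, and only three of the six values are used.

```r
# Sample values taken from the question's x2 column
x <- c("bb_137.hvmsc", "bb_138.combined.hvmsc", "bb_142.combined.hmsc-bm")

# A dot that is NOT followed (anywhere later) by another dot,
# i.e. the last dot in the string. Lookaheads need perl = TRUE in base R.
last_dot <- "\\.(?!.*\\.)"

# Position of the match in each string
pos <- as.integer(regexpr(last_dot, x, perl = TRUE))

# Everything before the last dot, and everything after it
out <- data.frame(
  col1 = substr(x, 1, pos - 1),
  col2 = substr(x, pos + 1, nchar(x)),
  stringsAsFactors = FALSE
)
print(out)
```

For "bb_138.combined.hvmsc" the first dot is rejected because another dot follows it, so the split happens at the second dot and "combined" stays in col1.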