text processing - R: Cleaning and reordering names/serial numbers in a data frame


Let's say I have a data frame as follows in R:

    data <- data.frame("serialnum" = character(), "year" = integer(), "name" = character(), stringsAsFactors = FALSE)
    data[1,] <- c("983\n837\n424\n ", 2015, "michael\nlewis\npaul\n ")
    data[2,] <- c("123\n456\n789\n136", 2014, "elaine\njerry\ngeorge\nkramer")
    data[3,] <- c("987\n654\n321\n975\n ", 2010, "john\npaul\ngeorge\nringo\nna")
    data[4,] <- c("424\n983\n837", 2015, "paul\nmichael\nlewis")
    data[5,] <- c("456\n789\n123\n136", 2014, "jerry\ngeorge\nelaine\nkramer")

What I want to do is the following:

  1. Split each string of names and each string of serial numbers into their own vectors (or a list of string vectors).
  2. Eliminate the string "na" from either set of vectors, as well as the blank entries produced by a trailing "...\n ".
  3. Reorder each list of names alphabetically, and reorder the corresponding serial numbers according to the same permutation.
  4. Concatenate each vector back together in the same fashion (i.e. paste(., collapse = "\n")).

My issue is how to do this without using a loop. Is there an object-oriented way to do this? My first attempt in this direction made a list with the command list <- strsplit(data$name, split = "\n"), and from here I need a loop in order to find the permutations of the names, which seems like a process that won't scale to my actual data. Additionally, once I have made the list, I'm not sure how to go about removing the "na" entries or the blank spaces. Any help is appreciated!

You can use lapply to take each row of the data frame and turn it into a new data frame with one name per row. This creates a list of 5 data frames, one for each row of the original data frame.

    seinfeld = lapply(1:nrow(data), function(i) {

      # Turn the strings into a data frame with one name per row
      dat = data.frame(serialnum=unlist(strsplit(data[i,"serialnum"], split="\n")),
                       year=data[i,"year"],
                       name=unlist(strsplit(data[i,"name"], split="\n")))

      # Get rid of empty strings and "na" values
      dat = dat[!(dat$name %in% c(""," ","na")), ]

      # Order alphabetically by name
      dat = dat[order(dat$name), ]
    })
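To see what happens inside the function for one row, here is a minimal sketch that walks the four steps from the question by hand. The input strings are copied from row 3 of the example data; the variable names (`serial_str`, `name_str`, `keep`, `perm`) are only illustrative, and `trimws` is my own choice for catching the blank `" "` entry, not something from the code above.

```r
# One row's worth of input, copied from row 3 of the example data
serial_str <- "987\n654\n321\n975\n "
name_str   <- "john\npaul\ngeorge\nringo\nna"

# Step 1: split on newlines; strsplit returns a list, so take element 1
serials <- strsplit(serial_str, split = "\n", fixed = TRUE)[[1]]
nms     <- strsplit(name_str,   split = "\n", fixed = TRUE)[[1]]

# Step 2: drop positions whose name is blank or the literal string "na"
keep    <- !(trimws(nms) %in% c("", "na"))
serials <- serials[keep]
nms     <- nms[keep]

# Step 3: compute one permutation from the names, to be applied to both vectors
perm <- order(nms)

# Step 4: paste the reordered vectors back together
sorted_names   <- paste(nms[perm],     collapse = "\n")
sorted_serials <- paste(serials[perm], collapse = "\n")
```

This assumes the two strings split into the same number of pieces, which holds for every row of the example data; if they did not, the filter computed from the names would not line up with the serial numbers.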

Update: Based on your comment, let me know if this is the result you're trying to achieve:

    seinfeld = lapply(1:nrow(data), function(i) {

      # Turn the strings into a data frame with one name per row
      dat = data.frame(serialnum=unlist(strsplit(data[i,"serialnum"], split="\n")),
                       name=unlist(strsplit(data[i,"name"], split="\n")))

      # Get rid of empty strings and "na" values
      dat = dat[!(dat$name %in% c(""," ","na")), ]

      # Order alphabetically by name
      dat = dat[order(dat$name), ]

      # Collapse back into a single row with the new sort order
      dat = data.frame(serialnum=paste(dat[, "serialnum"], collapse="\n"),
                       year=data[i, "year"],
                       name=paste(dat[, "name"], collapse="\n"))
    })

    do.call(rbind, seinfeld)

               serialnum year                          name
    1      837\n983\n424 2015          lewis\nmichael\npaul
    2 123\n789\n456\n136 2014 elaine\ngeorge\njerry\nkramer
    3 321\n987\n654\n975 2010     george\njohn\npaul\nringo
    4      837\n983\n424 2015          lewis\nmichael\npaul
    5 123\n789\n456\n136 2014 elaine\ngeorge\njerry\nkramer
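If you would rather not build an intermediate data frame per row, one possible alternative is to put the per-row logic in a small helper and let mapply walk the two columns in parallel. This is only a sketch: `reorder_pair` and its `drop` argument are hypothetical names of my own, the data below is a two-row stand-in for the example, and it assumes each serial number string splits into as many pieces as its name string.

```r
# Hypothetical helper: clean and reorder one pair of newline-separated strings
reorder_pair <- function(serial_str, name_str, drop = c("", "na")) {
  serials <- strsplit(serial_str, "\n", fixed = TRUE)[[1]]
  nms     <- strsplit(name_str,   "\n", fixed = TRUE)[[1]]
  keep    <- !(trimws(nms) %in% drop)   # drop blanks and the string "na"
  perm    <- order(nms[keep])           # permutation that sorts the names
  c(serialnum = paste(serials[keep][perm], collapse = "\n"),
    name      = paste(nms[keep][perm],     collapse = "\n"))
}

# Small stand-in for the example data (rows 1 and 3)
data <- data.frame(
  serialnum = c("983\n837\n424\n ", "987\n654\n321\n975\n "),
  year      = c(2015L, 2010L),
  name      = c("michael\nlewis\npaul\n ", "john\npaul\ngeorge\nringo\nna"),
  stringsAsFactors = FALSE
)

# mapply calls the helper once per row and simplifies the results to a
# character matrix with rows "serialnum" and "name", one column per data row
res <- mapply(reorder_pair, data$serialnum, data$name, USE.NAMES = FALSE)
data$serialnum <- res["serialnum", ]
data$name      <- res["name", ]
```

mapply still iterates internally, so this is not faster in principle than the lapply version; it mainly avoids constructing and rbind-ing many small data frames, and it leaves the year column untouched.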
