regex - Remove a portion of a randomized string over an entire dataframe column in R

September 15, 2015

I need to remove random text that appears in a string before the address (the data set has ~5000 observations). The data frame column test2$address reads as follows:

addresses <- c(
  "140 national plz oxon hill, md 20745",
  "6324 windsor mill rd gwynn oak, md 21207",
  "23030 indian creek dr sterling, va 20166",
  "located in reston town center 18882 explorer st reston, va 20190"
)

I want it to spit out the addresses in a common format:

[885] "23030 indian creek dr sterling, va 20166"
[886] "18882 explorer st reston, va 20190"

I'm not sure how to go about doing this, since there is no specific pattern to the text that comes before the address number.

If you know that the address portion you want starts with digits, and the part you want to remove is text, you can use this:

sub(".*?(\\d+)", "\\1", x)

Output:

[1] "140 national plz oxon hill, md 20745"
[2] "6324 windsor mill rd gwynn oak, md 21207"
[3] "23030 indian creek dr sterling, va 20166"
[4] "18882 explorer st reston, va 20190"

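What this does is remove everything (.*) before the first (?) series of digits (\\d+). The ? makes .* lazy, so it stops at the earliest run of digits; the parentheses capture those digits, and \\1 puts them back in the result.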
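To see why the ? matters, it can help to compare the lazy pattern with a greedy one. This is only a small sketch on one of the sample addresses: the variable names addr, lazy and greedy are made up for illustration, and perl = TRUE is my addition so that the quantifiers follow PCRE rules.

```r
# One sample address with unwanted text in front of the street number
addr <- "located in reston town center 18882 explorer st reston, va 20190"

# Lazy: .*? stops at the first run of digits, which \\1 puts back
lazy <- sub(".*?(\\d+)", "\\1", addr, perl = TRUE)

# Greedy: .* runs as far as it can, leaving only the final digit for \\d+
greedy <- sub(".*(\\d+)", "\\1", addr, perl = TRUE)

print(lazy)    # "18882 explorer st reston, va 20190"
print(greedy)  # "0"
```

The greedy version swallows almost the whole string, which is why the lazy quantifier is needed here.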

Sample data:

x <- c("140 national plz oxon hill, md 20745",
       "6324 windsor mill rd gwynn oak, md 21207",
       "23030 indian creek dr sterling, va 20166",
       "located in reston town center 18882 explorer st reston, va 20190")
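Since sub() is vectorized, the same call can probably be applied straight to the data frame column from the question. The sketch below builds a two-row stand-in for test2 (the real data frame is not shown in the post, so its contents here are an assumption) and overwrites the address column in place.

```r
# Hypothetical stand-in for the real test2 data frame from the question
test2 <- data.frame(
  address = c("6324 windsor mill rd gwynn oak, md 21207",
              "located in reston town center 18882 explorer st reston, va 20190"),
  stringsAsFactors = FALSE
)

# One vectorized call cleans every row; no loop over the ~5000 rows is needed
test2$address <- sub(".*?(\\d+)", "\\1", test2$address)

print(test2$address)
```

Rows that already start with the street number are left unchanged. Note that this relies on the unwanted prefix containing no digits of its own; a prefix such as "suite 5 ..." would be cut at the wrong place. Under that same assumption, sub("^\\D+", "", test2$address) should be an equivalent way to strip the leading non-digit text.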
