regex - Remove a portion of a randomized string over an entire dataframe column in R
I need to remove the random text that appears before the address in each string (the data set has ~5000 observations). The data frame column test2$address reads as follows:
addresses <- c( "140 national plz oxon hill, md 20745", "6324 windsor mill rd gwynn oak, md 21207", "23030 indian creek dr sterling, va 20166", "located in reston town center 18882 explorer st reston, va 20190" )
I want to output the addresses in a common format:
[885] "23030 indian creek dr sterling, va 20166"
[886] "18882 explorer st reston, va 20190"
I'm not sure how to go about doing this, since there is no specific pattern to the text that comes before the address number.
If you know that the address portion you want starts with digits, and that the part you want to remove is text, you can use this:
sub(".*?(\\d+)", "\\1", x)
Output:
[1] "140 national plz oxon hill, md 20745"
[2] "6324 windsor mill rd gwynn oak, md 21207"
[3] "23030 indian creek dr sterling, va 20166"
[4] "18882 explorer st reston, va 20190"
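What this does is remove everything (.*) before the first series of digits (\\d+). The ? makes the .* lazy, so it stops at the first digits rather than the last, and the digits captured by the parentheses are put back with \\1.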
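Because sub() is vectorized, the same call can be applied to the whole column in one step. The following is a minimal sketch, assuming a small stand-in for the asker's test2 data frame; the address_clean column and the alt variable are hypothetical names used only for illustration. It also shows an equivalent pattern that deletes the leading run of non-digits.

```r
# Hypothetical stand-in for the asker's data frame; only the column name
# `address` comes from the question.
test2 <- data.frame(
  address = c(
    "140 national plz oxon hill, md 20745",
    "located in reston town center 18882 explorer st reston, va 20190"
  ),
  stringsAsFactors = FALSE
)

# sub() works element-wise on a character vector, so one call cleans
# every row of the column.
test2$address_clean <- sub(".*?(\\d+)", "\\1", test2$address)

# Alternative sketch: drop the leading run of non-digit characters.
alt <- sub("^\\D+", "", test2$address)

print(test2$address_clean)
```

Both patterns cut at the first digit, so a prefix that itself contains a number would be only partly removed. They also differ on strings with no digits at all: the first pattern leaves such a string unchanged, while the second returns an empty string.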
sample data:
x <- c("140 national plz oxon hill, md 20745", "6324 windsor mill rd gwynn oak, md 21207", "23030 indian creek dr sterling, va 20166", "located in reston town center 18882 explorer st reston, va 20190")