regex - Python: match a word to a word list after removing repeating characters
I have a list of words with positive or negative sentiment, e.g. ['happy', 'sad'].
When processing tweets, I remove repeating characters (allowing at most 2 repetitions):
happpppyyy -> happyy
saaad -> saad
The check whether e.g. saad is part of the word list should then return True, because it is similar to sad.
How can I implement this behaviour?
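For context, the squeezing step described in the question could look roughly like the sketch below. The question does not show its own code, so the function name `squeeze_repeats` and the `max_repeats` parameter are assumptions made for illustration.

```python
import re

def squeeze_repeats(text, max_repeats=2):
    """Collapse any run of one character to at most max_repeats copies.

    Hypothetical helper: the name and signature are not from the question.
    """
    # (.)\1{2,} matches a character followed by two or more copies of itself,
    # i.e. a run of three or more when max_repeats is 2.
    pattern = r"(.)\1{%d,}" % max_repeats
    return re.sub(pattern, lambda m: m.group(1) * max_repeats, text)

print(squeeze_repeats("happpppyyy"))  # happyy
print(squeeze_repeats("saaad"))       # saad
```

After this step a token such as saad no longer equals any entry in the word list, which is why a plain membership test fails.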
I would build the regular expressions dynamically, turning a word such as
happy
into
h+a+p+p+y+
and then pass in the list of "happy" words like this:
import re
# one compiled pattern per word; every character may repeat one or more times
re_list = [re.compile("".join(["{}+".format(c) for c in x])) for x in ['happy', 'glad']]
Then test each word (using any to return True if at least one "happy" regex matches):
for w in ["haaappy", "saad", "glaad"]:
    print(w, any(re.match(x, w) for x in re_list))
Result:
haaappy True
saad False
glaad True
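Two caveats about the approach above. saad comes out False only because the example list holds 'happy' and 'glad' but not 'sad'. Also, re.match anchors only at the start of the string, so a longer token such as happyness would still match h+a+p+p+y+. A stricter variant, offered as a sketch rather than the answer's original code, uses fullmatch and escapes each character in case a word contains regex metacharacters. The helper names `build_patterns` and `in_word_list` are made up for this example.

```python
import re

def build_patterns(words):
    """Turn each word into a pattern where every character may repeat,
    e.g. sad -> s+a+d+ (hypothetical helper)."""
    return [
        re.compile("".join(re.escape(c) + "+" for c in word))
        for word in words
    ]

def in_word_list(token, patterns):
    """True if the whole token matches one of the stretched-word patterns."""
    return any(p.fullmatch(token) for p in patterns)

patterns = build_patterns(["happy", "sad"])
for w in ["happyy", "saad", "happyness", "hapy"]:
    print(w, in_word_list(w, patterns))
# happyy True
# saad True
# happyness False
# hapy False
```

Note that hapy is rejected: the pattern allows extra repetitions of a letter but never fewer than the dictionary word contains.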