regex - Python: match a word against a word list after removing repeating characters


I have a list of words with positive and negative sentiment, e.g. ['happy', 'sad'].

Now, when processing tweets, I'm removing repeating characters (allowing 2 repetitions):

happpppyyy -> happyy
saaad -> saad
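
The squeezing step in the examples above could be sketched roughly as follows. The function name squeeze_repeats and its max_run parameter are illustrative assumptions, not taken from the original post; this is one possible way to cap runs of a repeated character at two.

```python
import re

def squeeze_repeats(text, max_run=2):
    # Hypothetical helper: find any run of one character longer than max_run
    # and replace it with exactly max_run copies of that character.
    pattern = re.compile(r"(.)\1{%d,}" % max_run)
    return pattern.sub(lambda m: m.group(1) * max_run, text)

print(squeeze_repeats("happpppyyy"))  # happyy
print(squeeze_repeats("saaad"))       # saad
```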

The check whether e.g. saad is part of the word list should then return True, because it is similar to sad.

How can I implement this behaviour?

I would build the regular expressions dynamically, turning a word such as:

happy 

into

h+a+p+p+y+ 

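Build one such pattern for each word in the list of "happy" words like this:

import re

# one pattern per word, with "+" after every character, e.g. "h+a+p+p+y+"
re_list = [re.compile("".join(["{}+".format(c) for c in x])) for x in ['happy', 'glad']]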

Then test each word, using any() to return True if at least one of the "happy" regexes matches:

for w in ["haaappy", "saad", "glaad"]:
    print(w, any(re.match(x, w) for x in re_list))

Result:

haaappy True
saad False
glaad True
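
One caveat: re.match only anchors at the start of the string, so a pattern like h+a+p+p+y+ also accepts a longer word such as "happyness". A stricter variant, sketched below under the assumption that the whole token should match, uses fullmatch and escapes each character with re.escape. The helper names build_pattern and matches_any are made up for illustration, and 'sad' is added to the word list so that saad matches as the question asks.

```python
import re

def build_pattern(word):
    # Each character may repeat one or more times; re.escape guards
    # against characters that have a special meaning in a regex.
    return re.compile("".join(re.escape(c) + "+" for c in word))

def matches_any(token, patterns):
    # fullmatch requires the entire token to match, not just a prefix.
    return any(p.fullmatch(token) for p in patterns)

patterns = [build_pattern(w) for w in ["happy", "sad", "glad"]]

for w in ["haaappy", "saad", "glaad", "happyness"]:
    print(w, matches_any(w, patterns))
```

With this word list the first three tokens are accepted and "happyness" is rejected, because the trailing "ness" is not covered by any pattern.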
