Python - How to Split a String Using Delimiters
I have a file like so:
test.txt
dog;cat;mouse;bird;turtle;# animals
dog;cat;mouse;bird;turtle;horse cow # animals
I need to break up the second line so that it looks like the first line:
dog;cat;mouse;bird;turtle;horse;cow;# animals
The hard part is that there is no set limit on how many animals can be inserted between the 5th element and the '#' symbol. There could be 2, as in my example, or 10.
I'm able to break the file down into a two-dimensional array, but I'm not sure how to split the second string.
with open(file) as f:
    lines = list(f)
temp = [line.strip().split(';') for line in lines]
Output:
for i in temp:
    print(i)
['dog', 'cat', 'mouse', 'bird', 'turtle', '# animals']
['dog', 'cat', 'mouse', 'bird', 'turtle', 'horse cow # animals']
Desired output:
['dog', 'cat', 'mouse', 'bird', 'turtle', '# animals']
['dog', 'cat', 'mouse', 'bird', 'turtle', 'horse', 'cow', '# animals']
Any help is appreciated.
-updated-
My actual data contains the following pattern:
10-2-2015;10:02;location;xxx.xxx.xxx.xxx;xxx.xxx.xxx.xxx;somename1 # more alphanumeric text caps , lower case
10-2-2015;10:02;location;xxx.xxx.xxx.xxx;xxx.xxx.xxx.xxx;somename1; somename2 somename3 # more,alphanumeric,text,with,caps,and,lower,case
The x's represent IPs and subnets. Commas after the '#' should be left untouched.
You might try a regular expression:
>>> import re
>>> my_expression = r'[a-z]+|#.+'
>>> f = 'dog;cat;mouse;bird;turtle;# animals'
>>> s = 'dog;cat;mouse;bird;turtle;horse cow # animals'
>>> re.findall(my_expression, f)
['dog', 'cat', 'mouse', 'bird', 'turtle', '# animals']
>>> re.findall(my_expression, s)
['dog', 'cat', 'mouse', 'bird', 'turtle', 'horse', 'cow', '# animals']
The above will find every instance of either a group of 1 or more lowercase letters ([a-z]+) or (|) a hash/pound sign followed by 1 or more characters (#.+).
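To tie this back to the list comprehension in the question, here is a minimal sketch that applies the same pattern to every line. The `lines` list is a hypothetical in-memory stand-in for the contents of test.txt, and the name `TOKEN` is only illustrative. Note that `[a-z]+` matches lowercase letters only, so this works for the animal sample but not for data containing digits, dots, or capitals.

```python
import re

# Pattern from the answer above: a run of lowercase letters,
# or a '#' followed by the rest of the line.
TOKEN = re.compile(r'[a-z]+|#.+')

# Hypothetical stand-in for the lines read from test.txt.
lines = [
    'dog;cat;mouse;bird;turtle;# animals\n',
    'dog;cat;mouse;bird;turtle;horse cow # animals\n',
]

# One list of tokens per line, replacing line.strip().split(';').
temp = [TOKEN.findall(line) for line in lines]

for row in temp:
    print(row)
```

With this pattern the trailing newline is simply skipped, since neither `[a-z]` nor `.` matches it.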
For your updated sample data:
>>> my_expression = r'#.+|[^ ;]+'
>>> f = '10-2-2015;10:02;location;xxx.xxx.xxx.xxx;xxx.xxx.xxx.xxx;somename1 # more alphanumeric text caps , lower case'
>>> s = '10-2-2015;10:02;location;xxx.xxx.xxx.xxx;xxx.xxx.xxx.xxx;somename1; somename2 somename3 # more,alphanumeric,text,with,caps,and,lower,case'
>>> re.findall(my_expression, f)
['10-2-2015', '10:02', 'location', 'xxx.xxx.xxx.xxx', 'xxx.xxx.xxx.xxx', 'somename1', '# more alphanumeric text caps , lower case']
>>> re.findall(my_expression, s)
['10-2-2015', '10:02', 'location', 'xxx.xxx.xxx.xxx', 'xxx.xxx.xxx.xxx', 'somename1', 'somename2', 'somename3', '# more,alphanumeric,text,with,caps,and,lower,case']
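This expression looks for either a hash/pound sign followed by 1 or more characters (#.+) or (|) a group of 1 or more characters that are neither spaces nor semicolons ([^ ;]+). The order of the alternatives matters: because #.+ is tried first, the comment is consumed whole before [^ ;]+ gets a chance to split it on spaces.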
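One thing to watch when the lines come straight from a file: each one usually ends with a newline, which `.` does not match but `[^ ;]+` does, so it shows up as a stray `'\n'` token at the end of the result. A minimal sketch of one way around that, assuming it is acceptable to strip the line ending first (`FIELD`, `split_record`, and `raw` are illustrative names, not part of the original answer):

```python
import re

# Pattern from the answer above; '#.+' comes first so the comment stays whole.
FIELD = re.compile(r'#.+|[^ ;]+')

def split_record(line):
    """Split one raw line into fields, keeping the '#' comment in one piece."""
    # Strip the line ending so '[^ ;]+' cannot match it as a separate token.
    return FIELD.findall(line.rstrip('\r\n'))

# Sample record from the question, with the newline a file read would leave on it.
raw = ('10-2-2015;10:02;location;xxx.xxx.xxx.xxx;xxx.xxx.xxx.xxx;'
       'somename1; somename2 somename3 '
       '# more,alphanumeric,text,with,caps,and,lower,case\n')

print(FIELD.findall(raw)[-1:])   # last token of the unstripped line
print(split_record(raw)[-1:])    # last token after stripping the line ending
```

The same effect could be had by adding `\n` to the excluded characters in the pattern; stripping first just keeps the expression identical to the one shown above.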