python - How to Split String Using Delimiters -

February 15, 2011

I have a file like so:

test.txt

dog;cat;mouse;bird;turtle;# animals
dog;cat;mouse;bird;turtle;horse cow # animals

I need help breaking up the second line the same way as the first line, so that it becomes:

dog;cat;mouse;bird;turtle;horse;cow;# animals

The hard part is that there is no set limit on how many animals can be inserted between the 5th element and the '#' symbol. It could be the 2 I'm showing in this example, or 10.

I'm able to break the file down into a two-dimensional array, but I'm not sure how to split the second string.

with open(file) as f:
    lines = list(f)
    temp = [line.strip().split(';') for line in lines]

output:

for i in temp:
    print(i)

['dog', 'cat', 'mouse', 'bird', 'turtle', '# animals']
['dog', 'cat', 'mouse', 'bird', 'turtle', 'horse cow # animals']

desired output:

['dog', 'cat', 'mouse', 'bird', 'turtle', '# animals']
['dog', 'cat', 'mouse', 'bird', 'turtle', 'horse', 'cow', '# animals']

Any help is appreciated.

-updated-

My actual data contains the following pattern:

10-2-2015;10:02;location;xxx.xxx.xxx.xxx;xxx.xxx.xxx.xxx;somename1 # more alphanumeric text caps , lower case
10-2-2015;10:02;location;xxx.xxx.xxx.xxx;xxx.xxx.xxx.xxx;somename1; somename2 somename3 # more,alphanumeric,text,with,caps,and,lower,case

The x's represent IPs and subnets. Commas after the '#' should be left untouched.

You might try a regular expression:

>>> import re
>>> my_expression = r'[a-z]+|#.+'
>>> f = 'dog;cat;mouse;bird;turtle;# animals'
>>> s = 'dog;cat;mouse;bird;turtle;horse cow # animals'
>>> re.findall(my_expression, f)
['dog', 'cat', 'mouse', 'bird', 'turtle', '# animals']
>>> re.findall(my_expression, s)
['dog', 'cat', 'mouse', 'bird', 'turtle', 'horse', 'cow', '# animals']

The above will find every instance of either a group of 1 or more lowercase letters ([a-z]+) or (|) a hash/pound sign followed by 1 or more characters (#.+).
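As a minimal sketch of how that pattern might be applied to every line of the file from the question: the helper name split_animals and the in-memory sample lines are assumptions made for illustration, not part of the original answer.

```python
import re

# Pattern from the answer: a run of lowercase letters,
# or a '#' followed by the rest of the line.
ANIMAL_PATTERN = re.compile(r'[a-z]+|#.+')


def split_animals(lines):
    """Return one list of tokens per input line (hypothetical helper)."""
    return [ANIMAL_PATTERN.findall(line) for line in lines]


# Sample lines standing in for the contents of test.txt,
# including the trailing newlines a file would provide.
sample = [
    'dog;cat;mouse;bird;turtle;# animals\n',
    'dog;cat;mouse;bird;turtle;horse cow # animals\n',
]

for row in split_animals(sample):
    print(row)
```

Because '.' does not match a newline and [a-z]+ only matches letters, the trailing newline on each line is ignored here without any explicit stripping.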

For your updated sample data:

>>> my_expression = r'#.+|[^ ;]+'
>>> f = '10-2-2015;10:02;location;xxx.xxx.xxx.xxx;xxx.xxx.xxx.xxx;somename1 # more alphanumeric text caps , lower case'
>>> s = '10-2-2015;10:02;location;xxx.xxx.xxx.xxx;xxx.xxx.xxx.xxx;somename1; somename2 somename3 # more,alphanumeric,text,with,caps,and,lower,case'
>>> re.findall(my_expression, f)
['10-2-2015', '10:02', 'location', 'xxx.xxx.xxx.xxx', 'xxx.xxx.xxx.xxx', 'somename1', '# more alphanumeric text caps , lower case']
>>> re.findall(my_expression, s)
['10-2-2015', '10:02', 'location', 'xxx.xxx.xxx.xxx', 'xxx.xxx.xxx.xxx', 'somename1', 'somename2', 'somename3', '# more,alphanumeric,text,with,caps,and,lower,case']

This expression looks for either a hash/pound sign followed by 1 or more characters (#.+) or (|) a group of 1 or more characters that are neither spaces nor semicolons ([^ ;]+).
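One thing to watch when the lines come from a file: each line usually still ends with a newline, and since '.' stops before it, [^ ;]+ would pick up the newline as an extra '\n' token. The sketch below strips the line ending before applying the same expression. It also shows a regex-free alternative built on str.partition, which is a swapped-in technique rather than the answer's method. The names parse_line and parse_line_partition are hypothetical, and the two helpers are only shown to agree on the sample line, not on every possible input.

```python
import re

# Pattern from the answer: '#' plus the rest of the line,
# or a run of characters that are neither spaces nor semicolons.
FIELD_PATTERN = re.compile(r'#.+|[^ ;]+')


def parse_line(line):
    """Strip the line ending, then apply the answer's expression."""
    return FIELD_PATTERN.findall(line.rstrip('\r\n'))


def parse_line_partition(line):
    """Regex-free variant: cut at the first '#', then split the head."""
    head, sep, comment = line.rstrip('\r\n').partition('#')
    # Treat ';' and spaces alike; str.split() drops empty fields.
    tokens = head.replace(';', ' ').split()
    if sep:
        # Re-attach the '#' so the comment stays one untouched token.
        tokens.append(sep + comment)
    return tokens


line = ('10-2-2015;10:02;location;xxx.xxx.xxx.xxx;xxx.xxx.xxx.xxx;'
        'somename1; somename2 somename3 '
        '# more,alphanumeric,text,with,caps,and,lower,case\n')

print(parse_line(line))
print(parse_line_partition(line))
```

Both calls leave the commas after the '#' untouched, because everything from the '#' onward is kept as a single token.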
