python - Using `itertools.groupby()` to get lists of runs of strings that start with `a`?


The (abstracted) problem is this: I have a log file

a: 1
a: 2
a: 3
b: 4
b: 5
a: 6
c: 7
d: 8
a: 9
a: 10
a: 11

and I want to end up with a list of lists like this:

[["1", "2", "3"], ["6"], ["9", "10", "11"]] 

where the file has been broken up into "runs" of strings starting with `a`. I know I can use `itertools.groupby` to solve this, and right now I have this solution (where `f` is a list of the lines in the file).

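import re
from itertools import groupby
starts_with_a = lambda x: x.startswith("a")
# groupby yields lazy iterators, so materialize each group
coalesced = [list(g) for _, g in groupby(f, key=starts_with_a)]
# second pass: keep only the "a" runs and strip the prefix
runs = [[re.sub(r'a: ', '', s).strip() for s in g]
        for g in coalesced if starts_with_a(g[0])]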

So I use `groupby`, but then I have to filter out the stuff that doesn't start with `a`. This is okay, and pretty terse, but is there a more elegant way to do it? I'd love a way that:

  • doesn't require 2 passes
  • is terser (and/or) more readable

Help me harness the might of itertools!

Yes, filter out the lines that don't start with `a`, but use the key produced by `groupby()` for each group returned. It's the return value of the key function, so it'll be `True` for the runs of lines that do start with `a`. I'd use `str.partition()` here instead of a regular expression:

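# keep only the groups whose key is True (lines starting with "a")
coalesce = (g for key, g in groupby(f, key=lambda x: x[:1] == "a") if key)
# take the text after the first colon on each line
runs = [[res.partition(':')[-1].strip() for res in group] for group in coalesce]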
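To make the filtering step a bit more concrete, here is a small sketch of the `(key, group)` pairs that `groupby()` hands back, and of what `str.partition()` returns for a single line. The `lines` list is made-up sample data, not the real log file.

```python
from itertools import groupby

# Hypothetical sample data standing in for the lines of the log file
lines = ["a: 1", "a: 2", "b: 4", "a: 6"]

# Each item from groupby() is (key, group); key is the key function's return value
pairs = [(key, list(group))
         for key, group in groupby(lines, key=lambda x: x[:1] == "a")]
print(pairs)
# [(True, ['a: 1', 'a: 2']), (False, ['b: 4']), (True, ['a: 6'])]

# str.partition() splits on the first separator into (head, separator, tail)
parts = "a: 1".partition(":")
print(parts)  # ('a', ':', ' 1')
```

Because the key is already `True` or `False` per run, testing `if key` is all the filtering that is needed; no second look at the lines themselves is required.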

Since your `str.startswith()` argument is a fixed-width string literal, you may as well use slicing; `x[:1]` slices off the first character, and comparing that to `'a'` gives you the same test as `x.startswith('a')`.
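As a quick check of that equivalence, the sketch below runs both tests over a few made-up strings, including an empty one. Note that `x[:1]` is safe on an empty string (it just gives `''`), whereas `x[0]` would raise an `IndexError`.

```python
# Hypothetical sample strings, including the empty string
samples = ["a: 1", "b: 4", "", "apple"]

# Slicing never raises, even when the string is empty
slice_test = [x[:1] == "a" for x in samples]
startswith_test = [x.startswith("a") for x in samples]

print(slice_test)                     # [True, False, False, True]
print(slice_test == startswith_test)  # True
```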

I used a generator expression to group the `groupby()` filtering; you could just inline this into one list comprehension:

runs = [[res.partition(':')[-1].strip() for res in group]
        for key, group in groupby(f, key=lambda x: x[:1] == "a") if key]

Demo:

>>> from itertools import groupby
>>> f = '''\
... a: 1
... a: 2
... a: 3
... b: 4
... b: 5
... a: 6
... c: 7
... d: 8
... a: 9
... a: 10
... a: 11
... '''.splitlines(True)
>>> coalesce = (g for key, g in groupby(f, key=lambda x: x[:1] == "a") if key)
>>> [[res.partition(':')[-1].strip() for res in group] for group in coalesce]
[['1', '2', '3'], ['6'], ['9', '10', '11']]
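If you'd rather not build the list of lines up front, the same idea can be sketched as a generator that consumes any iterable of lines in one pass. The name `runs_with_prefix` and its `prefix` parameter are my own for illustration, and `io.StringIO` just stands in for an open log file.

```python
import io
from itertools import groupby


def runs_with_prefix(lines, prefix="a"):
    """Yield one list of values per consecutive run of lines starting with prefix."""
    for key, group in groupby(lines, key=lambda x: x.startswith(prefix)):
        if key:  # key is True only for runs that start with the prefix
            yield [line.partition(":")[-1].strip() for line in group]


# io.StringIO stands in for an open log file here (an assumption for the demo)
log = io.StringIO(
    "a: 1\na: 2\na: 3\nb: 4\nb: 5\na: 6\nc: 7\nd: 8\na: 9\na: 10\na: 11\n"
)
result = list(runs_with_prefix(log))
print(result)  # [['1', '2', '3'], ['6'], ['9', '10', '11']]
```

Each run is turned into a list before the generator moves on, which matters because `groupby()` invalidates a group's iterator once you advance to the next group.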
