Python Performance: remove item from list -


this question has answer here:

i have list lenght of: 370000. in list have items like: "a", "y", "y", "q", "q", "p", "p",, meaning list of words time time single characters.

i want remove characters list, pretty new in python first thing came mind like:

for word in words:     if word== 'm' or  word== 'y' or word== 'y' or word== 'p' or word== 'q' or word== 'q' or word== 'a' or word== 'uh':         words.remove(word) 

in list 370.000 items method taking loooot of time. seriously, lot.

does have awesome idea on how better performance?

thanks in advance.

tried bogo-benchmarks in ipython.

import random # don't know how generate words, use integers substitute. words = [random.randint(0, 25) in xrange(370000)] badlist = range(7) badtuple = tuple(badlist) badset = set(badlist) # list comprehension %timeit [w w in words if w not in badlist] 10 loops, best of 3: 59.2 ms per loop %timeit [w w in words if w not in badtuple] 10 loops, best of 3: 64.7 ms per loop %timeit [w w in words if w not in badset] 10 loops, best of 3: 30.3 ms per loop # filter %timeit filter(lambda w: w not in badlist, words) 10 loops, best of 3: 85.6 ms per loop %timeit filter(lambda w: w not in badtuple, words) 10 loops, best of 3: 92.1 ms per loop %timeit filter(lambda w: w not in badset, words) 10 loops, best of 3: 50.8 ms per loop 

conclusion: probably, using list comprehension not in <set> filter condition best.

but i've said, benchmark bogus, , need repeat benchmarks on actual kind of data you'll encounter see better.


some idea on why list comprehension + "not in set" optimal.

  1. filter vs list comprehension: filter calls input callable, , callable-calling in python has own overhead (creating stack frame, etc.) also filter tries smart , return correct type, adds overhead. (this infinitesimally small) instead, list comprehension's condition check (the if ... clause) has less overhead call. it's expression evaluation without full bells , whistles of python call stack.
  2. test set membership o(1) in average case , o(n) in worst case, list/tuple membership o(n).

Comments

Popular posts from this blog

resizing Telegram inline keyboard -

command line - How can a Python program background itself? -

php - "cURL error 28: Resolving timed out" on Wordpress on Azure App Service on Linux -