Python Performance: remove item from list -
this question has answer here:
- fast way remove few items list/queue 5 answers
i have list lenght of: 370000
. in list have items like: "a", "y", "y", "q", "q", "p", "p",
, meaning list of words time time single characters.
i want remove characters list, pretty new in python first thing came mind like:
for word in words: if word== 'm' or word== 'y' or word== 'y' or word== 'p' or word== 'q' or word== 'q' or word== 'a' or word== 'uh': words.remove(word)
in list 370.000 items method taking loooot of time. seriously, lot.
does have awesome idea on how better performance?
thanks in advance.
tried bogo-benchmarks in ipython.
import random # don't know how generate words, use integers substitute. words = [random.randint(0, 25) in xrange(370000)] badlist = range(7) badtuple = tuple(badlist) badset = set(badlist) # list comprehension %timeit [w w in words if w not in badlist] 10 loops, best of 3: 59.2 ms per loop %timeit [w w in words if w not in badtuple] 10 loops, best of 3: 64.7 ms per loop %timeit [w w in words if w not in badset] 10 loops, best of 3: 30.3 ms per loop # filter %timeit filter(lambda w: w not in badlist, words) 10 loops, best of 3: 85.6 ms per loop %timeit filter(lambda w: w not in badtuple, words) 10 loops, best of 3: 92.1 ms per loop %timeit filter(lambda w: w not in badset, words) 10 loops, best of 3: 50.8 ms per loop
conclusion: probably, using list comprehension not in <set>
filter condition best.
but i've said, benchmark bogus, , need repeat benchmarks on actual kind of data you'll encounter see better.
some idea on why list comprehension + "not in set" optimal.
filter
vs list comprehension:filter
calls input callable, , callable-calling in python has own overhead (creating stack frame, etc.)also(this infinitesimally small) instead, list comprehension's condition check (thefilter
tries smart , return correct type, adds overhead.if ...
clause) has less overhead call. it's expression evaluation without full bells , whistles of python call stack.- test set membership o(1) in average case , o(n) in worst case, list/tuple membership o(n).
Comments
Post a Comment