python - Best way to iterate over an array that uses its own output
Firstly, I'd like to apologize for the badly worded title - I can't think of a better way to phrase it. Basically, I'm wondering if there is a faster way to implement an array operation in Python where each operation depends on previous outputs in an iterative fashion (e.g. forward differencing operations, filtering, etc.) - basically operations of a form like:
    for n in range(1, len(x)):
        y[n] = x[n] + x[n - 1] + y[n-1]
where x is an array of values and y is the output. In this case, y[0] is assumed known or calculated separately before the above loop. My question is: is there NumPy functionality to speed up this sort of self-referential loop? It is the major bottleneck in the scripts I have. I know that NumPy routines benefit from being executed as C routines, so I was curious if anyone knew of NumPy routines that would help here. Failing that, are there better ways to program this loop (in Python) to speed up its execution for large array sizes (>500,000 data points)?
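For concreteness, a minimal runnable version of the pattern I mean might look like this (the random data, the array size and the choice of y[0] are just placeholders):

    import numpy as np

    x = np.random.random(500_000)   # placeholder input data
    y = np.empty_like(x)
    y[0] = x[0]                     # y[0] assumed known / computed separately

    # the self-referential loop that is the bottleneck
    for n in range(1, len(x)):
        y[n] = x[n] + x[n - 1] + y[n - 1]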
Accessing single NumPy array elements or iterating over a NumPy array elementwise is slow (really slow). If you ever want to do manual iteration over a NumPy array: don't do it!
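To get a feeling for how slow "slow" is, here is a tiny illustration (just a sketch using timeit; the exact numbers depend on your machine and the array size):

    import timeit
    import numpy as np

    arr = np.random.random(100_000)

    # Element-by-element access goes through the full NumPy indexing machinery
    # and creates a Python object (a NumPy scalar) for every single element.
    slow = timeit.timeit(lambda: sum(arr[i] for i in range(len(arr))), number=10)
    fast = timeit.timeit(lambda: arr.sum(), number=10)
    print(slow, fast)   # typically orders of magnitude apart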
But you do have options. The easiest is to convert the array to a Python list and iterate over the list (it sounds silly, but stay with me - I'll present some benchmarks at the end of the answer [1]):
    x = x.tolist()
    y = y.tolist()
    for n in range(1, len(x)):
        y[n] = x[n] + x[n - 1] + y[n-1]
If you also use direct iteration over the lists, it could be even faster:
    x = x.tolist()
    y = y.tolist()
    for idx, (y_n_m1, x_n, x_n_m1) in enumerate(zip(y, x[1:], x), 1):
        y[idx] = x_n + x_n_m1 + y_n_m1
Then there are more sophisticated options that require additional packages, most notably Cython and Numba. These are designed to work on the array elements directly and to avoid Python overhead wherever possible. For example, with Numba you can use your approach inside a jitted (just-in-time compiled) function:
    import numba as nb

    @nb.njit
    def func(x, y):
        for n in range(1, len(x)):
            y[n] = x[n] + x[n - 1] + y[n-1]
There x and y can be NumPy arrays, because Numba works on the buffer directly, out-speeding the other approaches (possibly by orders of magnitude).
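For example, it could be used like this (just a sketch with made-up data; the first call includes the compilation overhead):

    import numpy as np

    x = np.random.random(500_000)
    y = np.zeros_like(x)
    y[0] = np.random.random()   # y[0] assumed known beforehand

    func(x, y)   # first call triggers the JIT compilation
    func(x, y)   # subsequent calls run the compiled machine code directly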
numba "heavier" dependency cython, can faster , easier use. without conda
it's hard install numba... ymmv
However, here's also a Cython version of the code (compiled using IPython magic, so it's a bit different if you're not using IPython):
    In [1]: %load_ext cython

    In [2]: %%cython
       ...:
       ...: cimport cython
       ...:
       ...: @cython.boundscheck(False)
       ...: @cython.wraparound(False)
       ...: cpdef cython_indexing(double[:] x, double[:] y):
       ...:     cdef Py_ssize_t n
       ...:     for n in range(1, len(x)):
       ...:         y[n] = x[n] + x[n - 1] + y[n-1]
       ...:     return y
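The compiled function can then be called like any other Python function (again just a sketch with made-up data; since y is passed as a typed memoryview, the array you pass in is filled in place):

    import numpy as np

    x = np.random.random(500_000)
    y = np.zeros_like(x)
    y[0] = np.random.random()

    result = np.asarray(cython_indexing(x, y))   # the memoryview wraps the same buffer as y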
Just to give an example (based on the timing framework from the answer to another question), regarding the timings:
    import numpy as np
    import numba as nb
    import scipy.signal

    def numpy_indexing(x, y):
        for n in range(1, len(x)):
            y[n] = x[n] + x[n - 1] + y[n-1]
        return y

    def list_indexing(x, y):
        x = x.tolist()
        y = y.tolist()
        for n in range(1, len(x)):
            y[n] = x[n] + x[n - 1] + y[n-1]
        return y

    def list_direct(x, y):
        x = x.tolist()
        y = y.tolist()
        for idx, (y_n_m1, x_n, x_n_m1) in enumerate(zip(y, x[1:], x), 1):
            y[idx] = x_n + x_n_m1 + y_n_m1
        return y

    @nb.njit
    def numba_indexing(x, y):
        for n in range(1, len(x)):
            y[n] = x[n] + x[n - 1] + y[n-1]
        return y

    def numpy_cumsum(x, y):
        y[1:] = x[1:] + x[:-1]
        np.cumsum(y, out=y)
        return y

    def scipy_lfilter(x, y):
        a = [1, -1]
        b = [1, 1]
        return y[0] - x[0] + scipy.signal.lfilter(b, a, x)

    # Make sure the approaches give the same result
    x = np.random.random(10000)
    y = np.zeros(10000)
    y[0] = np.random.random()

    np.testing.assert_array_equal(numba_indexing(x, y), numpy_indexing(x, y))
    np.testing.assert_array_equal(numba_indexing(x, y), numpy_cumsum(x, y))
    np.testing.assert_almost_equal(numba_indexing(x, y), scipy_lfilter(x, y))
    np.testing.assert_array_equal(numba_indexing(x, y), cython_indexing(x, y))

    # Timing setup
    timings = {numpy_indexing: [], list_indexing: [], list_direct: [],
               numba_indexing: [], numpy_cumsum: [], scipy_lfilter: [],
               cython_indexing: []}
    sizes = [2**i for i in range(1, 20, 2)]

    # Timing
    for size in sizes:
        x = np.random.random(size=size)
        y = np.zeros(size)
        y[0] = np.random.random()
        for func in timings:
            res = %timeit -o func(x, y)
            timings[func].append(res)

    # Plotting absolute times
    %matplotlib notebook
    import matplotlib.pyplot as plt

    fig = plt.figure(1)
    ax = plt.subplot(111)

    for func in timings:
        ax.plot(sizes,
                [time.best for time in timings[func]],
                label=str(func.__name__))
    ax.set_xscale('log')
    ax.set_yscale('log')
    ax.set_xlabel('size')
    ax.set_ylabel('time [seconds]')
    ax.grid(which='both')
    ax.legend()
    plt.tight_layout()

    # Plotting relative times
    fig = plt.figure(1)
    ax = plt.subplot(111)

    baseline = numba_indexing  # choose one function as baseline
    for func in timings:
        ax.plot(sizes,
                [time.best / ref.best for time, ref in zip(timings[func], timings[baseline])],
                label=str(func.__name__))
    ax.set_yscale('log')
    ax.set_xscale('log')
    ax.set_xlabel('size')
    ax.set_ylabel('time relative to "{}"'.format(baseline.__name__))
    ax.grid(which='both')
    ax.legend()
    plt.tight_layout()
With the following results:
[Plot: absolute runtimes]
[Plot: relative runtimes (compared to the Numba function)]
So, just converting the arrays to lists is roughly 3 times faster! Iterating directly over these lists gives another (yet smaller) speedup, only 20% in this benchmark, but now we're roughly 4 times faster compared to the original solution. Numba can speed it up by a factor of more than 100 compared to the list operations! And Cython is only a bit slower than Numba (~40-50%), probably because I haven't squeezed out every possible optimization (usually it's no more than 10-20% slower) with Cython. However, for large arrays the difference gets smaller.
[1] I did go into more detail in another answer. That Q+A was about converting to a set, but because a set uses (hidden) "manual iteration", it applies here as well.
I also included timings for the NumPy cumsum and SciPy lfilter approaches. These were roughly 20 times slower for small arrays and 4 times slower for large arrays compared to the Numba function. However, if I interpret the question correctly, you were looking for general ways, not only ones that apply to the specific example: not every self-referencing loop can be implemented using cum* functions from NumPy or SciPy's filters. And even where they can, it seems they can't compete with Cython and/or Numba.
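For completeness, the reason the cumsum trick works for this particular recurrence is that it can be rewritten as y[n] - y[n-1] = x[n] + x[n-1], so y is just y[0] plus the cumulative sum of the pairwise sums of x. A sketch of that reasoning (equivalent to the numpy_cumsum function above, with made-up data):

    import numpy as np

    x = np.random.random(10_000)
    y = np.zeros_like(x)
    y[0] = np.random.random()

    # y[n] = y[n-1] + (x[n] + x[n-1])  =>  y[n] = y[0] + sum of the pairwise sums up to n
    y[1:] = x[1:] + x[:-1]
    np.cumsum(y, out=y)   # y[0] is included automatically because it already sits in y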