python - Pandas: Multiple rolling periods
I want to compute rolling means and standard deviations over multiple rolling periods for several columns simultaneously.
This is the code I am using with rolling(5):
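    import numpy as np

    def add_mean_std_cols(df):
        # Rolling mean and std for every column, with a window of 5
        res = df.rolling(5).agg(['mean', 'std'])
        # Flatten the (column, statistic) pairs into names like 'a_mean'
        res.columns = res.columns.map('_'.join)
        # Interleave each original column with its mean and std columns
        cols = np.concatenate(list(zip(df.columns, res.columns[0::2], res.columns[1::2])))
        final = res.join(df).loc[:, cols]
        return final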
I would like to get rolling (5), (15), (30) and (45) periods in the same operation.
I thought about iterating over the periods, but I do not know how to avoid getting the rolling mean/std of the rolling mean/std...
I would suggest creating a DataFrame with a MultiIndex as its columns. There's no way around using a loop here to iterate over the windows. The resulting form is easy to index and easy to read back in with pd.read_csv. Initialize an empty DataFrame with np.empty of the appropriate shape and use .loc to assign its values.
    import numpy as np
    import pandas as pd

    np.random.seed(123)
    df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
    windows = [5, 15, 30, 45]
    stats = ['mean', 'std']
    # Three column levels: window size, original column name, statistic
    cols = pd.MultiIndex.from_product([windows, df.columns, stats],
                                      names=['window', 'feature', 'metric'])

    df2 = pd.DataFrame(np.empty((df.shape[0], len(cols))), columns=cols,
                       index=df.index)
    # Every window rolls over the original df, so no statistic is
    # ever computed on top of another rolling statistic
    for window in windows:
        df2.loc[:, window] = df.rolling(window=window).agg(stats).values
Now you have the result df2, which has the same index as the original object. It has 3 column levels: the first is the window, the second is the columns from the original frame, and the third is the statistic.
    print(df2.shape)
    (100, 24)
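The remark that this layout is easy to read back with pd.read_csv can be sketched as a round trip through CSV. This is a minimal sketch under a few assumptions: the frame is built with pd.concat and keys for brevity (it has the same column layout as the loop with .loc), the index is given a hypothetical name 'row' so the file has an explicit index-name line below the three header rows, and an in-memory buffer stands in for a file on disk. Header labels read from CSV may come back as text, so the window level is converted back to integers.

```python
import io

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Same (window, feature, metric) column layout, built with concat for brevity
df2 = pd.concat([df.rolling(window=w).agg(stats) for w in windows],
                axis=1, keys=windows)
df2.columns.names = ['window', 'feature', 'metric']
df2.index.name = 'row'  # hypothetical index name, an assumption of this sketch

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df2.to_csv(buffer)
buffer.seek(0)

# Three header rows rebuild the three column levels
restored = pd.read_csv(buffer, header=[0, 1, 2], index_col=0)

# Convert the window labels back to integers so restored[5] works
restored.columns = pd.MultiIndex.from_tuples(
    [(int(w), f, m) for w, f, m in restored.columns],
    names=restored.columns.names,
)
```

After the conversion, restored[5]['col0'] selects the same block as it does on the original df2.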
This makes it easy to check the values for a specific rolling window:
    print(df2[5])  # Rolling window = 5
    feature     col0              col1              col2
    metric      mean      std     mean      std     mean      std
    0            NaN      NaN      NaN      NaN      NaN      NaN
    1            NaN      NaN      NaN      NaN      NaN      NaN
    2            NaN      NaN      NaN      NaN      NaN      NaN
    3            NaN      NaN      NaN      NaN      NaN      NaN
    4       -0.87879  1.45348 -0.26559  0.71236  0.53233  0.89430
    ..           ...      ...      ...      ...      ...      ...
    95      -0.44231  1.02552 -1.22138  0.45140 -0.36440  0.95324
    96      -0.58638  1.10246 -0.90165  0.79723 -0.44543  1.00166
    97      -0.70564  0.85711 -0.42644  1.07174 -0.44766  1.00284
    98      -0.95702  1.01302 -0.03705  1.05066  0.16437  1.32341
    99      -0.57026  1.10978  0.08730  1.02438  0.39930  1.31240

    print(df2[5]['col0'])  # Rolling window = 5, stats of col0
    metric      mean      std
    0            NaN      NaN
    1            NaN      NaN
    2            NaN      NaN
    3            NaN      NaN
    4       -0.87879  1.45348
    ..           ...      ...
    95      -0.44231  1.02552
    96      -0.58638  1.10246
    97      -0.70564  0.85711
    98      -0.95702  1.01302
    99      -0.57026  1.10978

    print(df2.loc[:, (5, slice(None), 'mean')])  # Rolling window = 5, means of each column
    window         5
    feature     col0     col1     col2
    metric      mean     mean     mean
    0            NaN      NaN      NaN
    1            NaN      NaN      NaN
    2            NaN      NaN      NaN
    3            NaN      NaN      NaN
    4       -0.87879 -0.26559  0.53233
    ..           ...      ...      ...
    95      -0.44231 -1.22138 -0.36440
    96      -0.58638 -0.90165 -0.44543
    97      -0.70564 -0.42644 -0.44766
    98      -0.95702 -0.03705  0.16437
    99      -0.57026  0.08730  0.39930
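Selecting by an inner level (for example every mean, regardless of window) can also be done with DataFrame.xs, which takes the level name directly. The following is only a sketch: it rebuilds the frame with pd.concat and keys for brevity, and the variable names all_means, col1_stats and col1_std_15 are made up for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Same (window, feature, metric) column layout, built with concat for brevity
df2 = pd.concat([df.rolling(window=w).agg(stats) for w in windows],
                axis=1, keys=windows)
df2.columns.names = ['window', 'feature', 'metric']

# Every rolling mean across all windows and features;
# the 'metric' level is dropped from the result
all_means = df2.xs('mean', axis=1, level='metric')

# Every statistic of one feature across all windows
col1_stats = df2.xs('col1', axis=1, level='feature')

# A fully specified (window, feature, metric) key returns a single Series
col1_std_15 = df2[(15, 'col1', 'std')]
```

Here all_means keeps the two remaining levels (window, feature) as its columns, so all_means[30] gives the 30-period means of every original column.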
And lastly, to make a single-indexed DataFrame, here's some kludgy use of itertools.
    import itertools

    df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
    # The name order must match the concatenated arrays below:
    # window first, then original column, then statistic
    names = ['_'.join([col, stat, str(win)])
             for win, col, stat in itertools.product(windows, df.columns, stats)]
    frames = [df.rolling(window=window).agg(stats).values for window in windows]
    df2 = pd.DataFrame(np.concatenate(frames, axis=1), columns=names, index=df.index)
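As an alternative sketch for the single-level version, the flat names can be derived from the MultiIndex columns themselves, so each label stays attached to the data it describes. This assumes names of the form col0_mean_5 (feature, statistic, window); the variables wide and flat are hypothetical, and the frame is built with pd.concat and keys rather than np.concatenate.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Columns are (window, feature, metric) tuples
wide = pd.concat([df.rolling(window=w).agg(stats) for w in windows],
                 axis=1, keys=windows)

flat = wide.copy()
# Turn each (window, feature, metric) tuple into a 'feature_metric_window' string
flat.columns = [f'{feature}_{metric}_{window}'
                for window, feature, metric in wide.columns]
```

Because the names are generated from the existing column tuples, there is no separate ordering to keep in sync with the data.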