python - iterate over pandas.DataFrame with MultiIndex by index names
I would like to iterate over a DataFrame with a MultiIndex while using the index names to access a specific value of an index column. For example, given the following:

    import pandas as pd

    index = pd.MultiIndex.from_product([range(2), range(3)], names=['index_a', 'index_b'])
    table = pd.DataFrame({'my_column': range(len(index))}, index=index)

I would like to iterate over the rows of the table using code like:

    for row in named_index_iterator(table):
        print(row.my_column, row.index_a, row.index_b)

or

    for row in named_index_iterator(table):
        print(row.my_column, row.index.index_a, row.index.index_b)

To implement named_index_iterator I cannot use itertuples or iterrows on the DataFrame, because they give the index as a plain tuple, not a named tuple. I also cannot use something like:

    for data_row, index_row in itertools.zip_longest(table.itertuples(), table.index):

because iterating over table.index again gives plain tuples, not named tuples.

As a workaround I use

    for row in table.reset_index().itertuples():

but this copies the table.
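To make the problem and the workaround concrete, here is a minimal sketch using the same example table. The variable names `first`, `plain_index`, `flat` and `rows` are only illustrative. It shows that the `Index` field produced by `itertuples()` is a plain tuple, while `reset_index()` returns a new DataFrame whose former index levels are ordinary columns and so become named fields.

```python
import pandas as pd

index = pd.MultiIndex.from_product([range(2), range(3)], names=['index_a', 'index_b'])
table = pd.DataFrame({'my_column': range(len(index))}, index=index)

# On the original table, each row's Index field is a plain tuple:
# the level names index_a / index_b are not available on it.
first = next(table.itertuples())
plain_index = first.Index

# Workaround: reset_index() builds a new DataFrame in which the index
# levels are regular columns, so itertuples() exposes them by name.
flat = table.reset_index()
rows = list(flat.itertuples(index=False))  # index=False drops the new integer index

for row in rows:
    print(row.my_column, row.index_a, row.index_b)
```

The cost is the one already mentioned: `reset_index()` creates a second DataFrame, which may matter for large tables.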
Answering my own question for reference.

I created the following utility to iterate over the index with names:

    import collections

    def df_iter_with_index_names(table):
        # Named tuple type whose fields are the index level names
        IndexNames = collections.namedtuple('IndexNames', table.index.names)
        for row in table.itertuples():
            # row.Index is the plain index tuple produced by itertuples()
            yield (IndexNames(*row.Index), row)

with usage like:

    import collections
    import pandas as pd

    index = pd.MultiIndex.from_product([range(2), range(3)], names=['index_a', 'index_b'])
    table = pd.DataFrame({'my_column': range(len(index))}, index=index)
    print(table)

    def df_iter_with_index_names(table):
        IndexNames = collections.namedtuple('IndexNames', table.index.names)
        for row in table.itertuples():
            yield (IndexNames(*row.Index), row)

    for index, row in df_iter_with_index_names(table):
        print(index.index_a, row.my_column)

It could be improved to remove the Index field from the row tuple, a leftover of DataFrame.itertuples(), but I can live with that.
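One possible way to drop the leftover Index field is sketched below. It is not part of the original answer. It assumes the table has a MultiIndex whose level names are all valid Python identifiers, since `namedtuple` rejects other names. The function name `df_iter_without_row_index` and the type name `Row` are hypothetical; `named_index_iterator` is the name used in the question. The first function keeps the `(index, row)` pair shape of the utility above. The second merges index levels and columns into one named tuple, which matches the first usage asked for in the question.

```python
import collections
import pandas as pd

def df_iter_without_row_index(table):
    """Yield (index, row) pairs where row holds only the column values."""
    IndexNames = collections.namedtuple('IndexNames', table.index.names)
    # index=False keeps the Index field out of the row tuples
    rows = table.itertuples(index=False)
    for index_values, row in zip(table.index, rows):
        yield IndexNames(*index_values), row

def named_index_iterator(table):
    """Yield one named tuple per row with index levels first, then columns."""
    fields = list(table.index.names) + list(table.columns)
    Row = collections.namedtuple('Row', fields)
    # name=None makes itertuples() return plain tuples, which are then re-wrapped
    values = table.itertuples(index=False, name=None)
    for index_values, column_values in zip(table.index, values):
        yield Row(*index_values, *column_values)

index = pd.MultiIndex.from_product([range(2), range(3)], names=['index_a', 'index_b'])
table = pd.DataFrame({'my_column': range(len(index))}, index=index)

pairs = list(df_iter_without_row_index(table))
merged = list(named_index_iterator(table))

for row in merged:
    print(row.my_column, row.index_a, row.index_b)
```

Both functions iterate over table.index and table.itertuples() in parallel, so they avoid the copy made by reset_index(). Neither handles a single-level index, where iterating over the index yields scalars instead of tuples.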