python - Find the difference between successive timestamps separately for each day using pandas
I have a DataFrame with a datetime index:
    >>> d.head()
    Out[29]:
                               value
    time
    2017-04-02 21:11:00.221  1114.73
    2017-04-03 00:01:00.221  1114.73
    2017-04-03 00:01:01.345  1114.73
    2017-04-03 00:01:02.701  1114.10
I want the successive differences between the index times, computed separately for each day. What I'm doing now is incomplete:
    d['datetime'] = d.index
    d['datetime_diff'] = d['datetime'].diff()
This gives me the difference between successive index timestamps, but it doesn't start afresh each day. I could separate the date from the datetime, group by the date, and calculate the time differences within each date. There is no fixed first or last time for each day.
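The date-splitting idea described above can be sketched as follows. This is a minimal illustration, assuming the four sample rows from d.head(); `times` is a hypothetical helper name. Grouping by `d.index.date` (an array of calendar dates) makes `.diff()` restart at every new day, so the first timestamp of each day gets NaT.

```python
import pandas as pd

# Sample rows copied from the d.head() output in the question.
d = pd.DataFrame(
    {"value": [1114.73, 1114.73, 1114.73, 1114.10]},
    index=pd.DatetimeIndex(
        ["2017-04-02 21:11:00.221", "2017-04-03 00:01:00.221",
         "2017-04-03 00:01:01.345", "2017-04-03 00:01:02.701"],
        name="time",
    ),
)

# Turn the index into a Series so it can be differenced like a column.
times = d.index.to_series()

# Group by calendar date and diff within each group; the first timestamp
# of every day has no predecessor in its group, so it becomes NaT.
d["datetime_diff"] = times.groupby(d.index.date).diff()
print(d)
```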
After getting these time differences, I intend to compute statistics such as the mean, median, count, etc.
Is there a better way to do this? I guess it reduces to a different problem: marking the first value of each day. I can get the first value of each day using groupby, but that doesn't solve the issue, because instead of retrieving the first value I need an easy way to label it.
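One way to label (rather than retrieve) the first row of each day is `groupby().cumcount()`, which numbers the rows within each group starting at 0. This is an alternative sketch, not the method used in the answer; `first_of_day`, `diffs` and the `gap` column are hypothetical names. The label is then used to blank out the cross-day gaps produced by a plain `.diff()`.

```python
import pandas as pd

# Sample rows copied from the d.head() output in the question.
d = pd.DataFrame(
    {"value": [1114.73, 1114.73, 1114.73, 1114.10]},
    index=pd.DatetimeIndex(
        ["2017-04-02 21:11:00.221", "2017-04-03 00:01:00.221",
         "2017-04-03 00:01:01.345", "2017-04-03 00:01:02.701"],
        name="time",
    ),
)

# cumcount() numbers rows 0, 1, 2, ... within each calendar date, so a 0
# flags the first observation of each day, whatever time it happens at.
d["first_of_day"] = d.groupby(d.index.date).cumcount() == 0

# A plain diff runs across day boundaries; masking the flagged rows turns
# those cross-day gaps into NaT, so every day effectively starts afresh.
diffs = d.index.to_series().diff().mask(d["first_of_day"])
print(d.assign(gap=diffs))
```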
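Use pd.TimeGrouper and group with a frequency of 1d. Note that pd.TimeGrouper was deprecated and later removed from pandas; pd.Grouper(freq='1D') is its replacement and does the same thing here:

    diff = df.groupby(pd.Grouper(freq='1D')).diff()
    diff
                             value
    time
    2017-04-02 21:11:00.221    NaN
    2017-04-03 00:01:00.221    NaN
    2017-04-03 00:01:01.345   0.00
    2017-04-03 00:01:02.701  -0.63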
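A self-contained version of this day-wise grouping, written with pd.Grouper (the current replacement for the removed pd.TimeGrouper), is sketched below. It assumes the same four sample rows; with only `freq=` given, pd.Grouper bins on the DatetimeIndex, and `.diff()` then runs within each one-day bin while keeping the original index.

```python
import pandas as pd

# Sample rows copied from the question's d.head() output.
df = pd.DataFrame(
    {"value": [1114.73, 1114.73, 1114.73, 1114.10]},
    index=pd.DatetimeIndex(
        ["2017-04-02 21:11:00.221", "2017-04-03 00:01:00.221",
         "2017-04-03 00:01:01.345", "2017-04-03 00:01:02.701"],
        name="time",
    ),
)

# pd.Grouper with only freq= groups on the DatetimeIndex, one bin per day;
# diff() is computed within each bin, so each day's first row is NaN.
diff = df.groupby(pd.Grouper(freq="1D")).diff()
print(diff)
```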
If df.index (the time values) is not of datetime type, you'll need to convert it first:

    df.index = pd.to_datetime(df.index)
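To illustrate the conversion, the sketch below assumes a hypothetical input where the index holds the timestamps as plain strings (as can happen when a CSV is read without parse_dates). After `pd.to_datetime`, the index is a DatetimeIndex and the day-wise grouping works.

```python
import pandas as pd

# Hypothetical input: the sample rows, but with a string index.
df = pd.DataFrame(
    {"value": [1114.73, 1114.73, 1114.73, 1114.10]},
    index=pd.Index(
        ["2017-04-02 21:11:00.221", "2017-04-03 00:01:00.221",
         "2017-04-03 00:01:01.345", "2017-04-03 00:01:02.701"],
        name="time",
    ),
)

# Convert the string index into a DatetimeIndex; the index name is kept.
df.index = pd.to_datetime(df.index)

# With a DatetimeIndex in place, grouping by day now works.
diff = df.groupby(pd.Grouper(freq="1D")).diff()
print(df.index.dtype)
print(diff)
```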
To take the difference of the index only, there's a simpler way: first call reset_index, then groupby, and call .diff on the time column. You can use pd.Grouper with key='time' for this:
    diff = df.reset_index().groupby(pd.Grouper(key='time', freq='1D')).time.diff()
    diff
    0               NaT
    1               NaT
    2   00:00:01.124000
    3   00:00:01.356000
    Name: time, dtype: timedelta64[ns]
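A runnable sketch of the reset_index approach, assuming the same sample rows: reset_index() moves the DatetimeIndex into an ordinary "time" column, pd.Grouper(key="time", freq="1D") bins that column by day, and the selected time column is differenced within each bin. The result is a timedelta Series on the new 0..n-1 integer index.

```python
import pandas as pd

# Sample rows copied from the question's d.head() output.
df = pd.DataFrame(
    {"value": [1114.73, 1114.73, 1114.73, 1114.10]},
    index=pd.DatetimeIndex(
        ["2017-04-02 21:11:00.221", "2017-04-03 00:01:00.221",
         "2017-04-03 00:01:01.345", "2017-04-03 00:01:02.701"],
        name="time",
    ),
)

# The index becomes a "time" column, which the Grouper bins by day;
# diff() on that column then restarts at each new day.
diff = (
    df.reset_index()
    .groupby(pd.Grouper(key="time", freq="1D"))["time"]
    .diff()
)
print(diff)
```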
As an aside, if you are interested in day-wise stats, you can groupby and call .describe:
    g = df.groupby(pd.Grouper(level=0, freq='1D'))
    g.describe()
                 value                                                        \
                 count     mean       std      min       25%      50%      75%
    time
    2017-04-02     1.0  1114.73       NaN  1114.73  1114.730  1114.73  1114.73
    2017-04-03     3.0  1114.52  0.363731  1114.10  1114.415  1114.73  1114.73

                    max
    time
    2017-04-02  1114.73
    2017-04-03  1114.73
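The same idea can be applied to the time differences themselves, which is what the question ultimately wants (mean, median, count per day). The sketch below is an assumption-based illustration, not part of the original answer; `gaps` and `stats` are hypothetical names. Gaps are converted to seconds, and since count() skips NaN, it counts gaps rather than rows.

```python
import pandas as pd

# Sample rows copied from the question's d.head() output.
df = pd.DataFrame(
    {"value": [1114.73, 1114.73, 1114.73, 1114.10]},
    index=pd.DatetimeIndex(
        ["2017-04-02 21:11:00.221", "2017-04-03 00:01:00.221",
         "2017-04-03 00:01:01.345", "2017-04-03 00:01:02.701"],
        name="time",
    ),
)

# Per-day successive gaps in seconds; the first row of each day is NaN.
gaps = (
    df.index.to_series()
    .groupby(pd.Grouper(freq="1D"))
    .diff()
    .dt.total_seconds()
)

# Aggregate the gaps per day; count() ignores the NaN first-of-day rows.
stats = gaps.groupby(pd.Grouper(freq="1D")).agg(["mean", "median", "count"])
print(stats)
```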