To apply aggregate functions over datetime intervals in a pandas DataFrame, we can use the resample() method, which groups rows into fixed-frequency time bins in much the same way that groupby groups rows by key.

For example, let's say we have a DataFrame with a datetime column and a value column:

import pandas as pd

data = {
    'datetime': [
        '2020-01-01 00:00:00', '2020-01-01 01:00:00', '2020-01-01 02:00:00', 
        '2020-01-02 00:00:00', '2020-01-02 01:00:00', '2020-01-02 02:00:00',
        '2020-01-03 00:00:00', '2020-01-03 01:00:00', '2020-01-03 02:00:00',
    ],
    'value': [10, 20, 30, 40, 50, 60, 70, 80, 90]
}

df = pd.DataFrame(data)
df['datetime'] = pd.to_datetime(df['datetime'])

We can use the resample() method to group the data by a particular frequency, such as daily or hourly. The on='datetime' argument tells resample() to bin on that column instead of the index. For example, to group the data by day and calculate the sum of the values for each day, we can do:

df.resample('D', on='datetime').sum()

This will return a new DataFrame with the aggregated values:

            value
datetime        
2020-01-01     60
2020-01-02    150
2020-01-03    240
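
The same daily grouping can carry several aggregates at once. The following is a minimal sketch, assuming the sample data above; the name daily_stats is only illustrative:

```python
import pandas as pd

# Same sample data as above: three hourly readings on each of three days.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2020-01-01 00:00:00", "2020-01-01 01:00:00", "2020-01-01 02:00:00",
        "2020-01-02 00:00:00", "2020-01-02 01:00:00", "2020-01-02 02:00:00",
        "2020-01-03 00:00:00", "2020-01-03 01:00:00", "2020-01-03 02:00:00",
    ]),
    "value": [10, 20, 30, 40, 50, 60, 70, 80, 90],
})

# Select the column to aggregate, then pass a list of function names to agg().
# Each name becomes one column in the result.
daily_stats = df.resample("D", on="datetime")["value"].agg(["sum", "mean", "max"])
print(daily_stats)
```

Selecting the 'value' column before calling agg() keeps the result to one column per aggregate function.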

Similarly, to group the data by hour and calculate the mean of the values for each hour, we can do the following (pandas 2.2 and later deprecate the 'H' alias in favor of lowercase 'h'):

df.resample('H', on='datetime').mean()

This will return:

                         value
datetime                     
2020-01-01 00:00:00  10.000000
2020-01-01 01:00:00  20.000000
2020-01-01 02:00:00  30.000000
2020-01-01 03:00:00        NaN
2020-01-01 04:00:00        NaN
...                        ...
2020-01-02 22:00:00        NaN
2020-01-02 23:00:00        NaN
2020-01-03 00:00:00  70.000000
2020-01-03 01:00:00  80.000000
2020-01-03 02:00:00  90.000000

[51 rows x 1 columns]

Note that resample() creates a row for every interval between the first and last timestamp, even when the original DataFrame has no data in that interval. With mean(), those empty intervals appear as NaN in the new DataFrame; with sum(), they appear as 0 by default.
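
If the empty intervals are not wanted, there are a few ways to handle them. The following is a minimal sketch, assuming the same sample data; the variable names are illustrative, and which option fits depends on whether the gaps carry meaning in your data:

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2020-01-01 00:00:00", "2020-01-01 01:00:00", "2020-01-01 02:00:00",
        "2020-01-02 00:00:00", "2020-01-02 01:00:00", "2020-01-02 02:00:00",
        "2020-01-03 00:00:00", "2020-01-03 01:00:00", "2020-01-03 02:00:00",
    ]),
    "value": [10, 20, 30, 40, 50, 60, 70, 80, 90],
})

# Hourly mean: one row per hour between the first and last timestamp.
hourly = df.resample("h", on="datetime")["value"].mean()

# Option 1: drop the empty bins after resampling.
observed = hourly.dropna()

# Option 2: group on the timestamp floored to the hour, so only hours
# that actually occur in the data form a group.
by_hour = df.groupby(df["datetime"].dt.floor("h"))["value"].mean()

# Option 3: keep every bin, but have sum() report NaN instead of 0
# for bins that contain no rows.
hourly_sum = df.resample("h", on="datetime")["value"].sum(min_count=1)

print(len(hourly), len(observed), len(by_hour))
```

Options 1 and 2 give the same nine rows here; option 2 avoids building the empty bins in the first place, which may matter for long, sparse time ranges.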