Tech TLDR;

Group Log Data by Timestamp in Python with Pandas

December 23, 2018 by admin

I often find myself with logs in the following format, exported in CSV.

timestamp,request_time_in_millis 1545509231863,101 1545509231863,38 1545541592109,202 1545595617945,53 1545595617945,1024

First column is timestamp in milliseconds, second column is timing for some request, also in milliseconds.

Often I’d like to see the min, max and average request time, group by day and hour. Here is how to get this insight using Pandas library in Python.

Let’s load sample data first.

import pandas as pd # Generate sample data (normally would read this from a log file) data = { 'millis': [101, 38, 202, 53, 1024], 'time': [1545509231863, 1545509231863, 1545541592109, 1545592017945, 1545592017945] } df = pd.DataFrame.from_dict(data)
Code language: PHP (php)

At this point the data frame will look as follows:

millis time 0 101 1545564489895 1 38 1545564489895 2 202 1545590975516 3 53 1545592017945 4 1024 1545592017945

Now we can convert the timestamp to date and group by day and hour.

# Convert timestamp to datetime, note "unit" is set to "ms", # would be "s" timestamp in seconds df['date'] = df['time'].apply(lambda x: pd.to_datetime(x, unit='ms')) # Only keep columns we care about df = df[['date', 'millis']] # Prevent statistic view from splitting across terminal pd.set_option('display.expand_frame_repr', False) # See grouped statistics df.groupby([df['date'].dt.day, df['date'].dt.hour]).describe()
Code language: PHP (php)

The final output, will be grouped by day of the month and hour of the day, as follows:

millis count mean std min 25% 50% 75% max date date 22 20 2.0 69.5 44.547727 38.0 53.75 69.5 85.25 101.0 23 5 1.0 202.0 NaN 202.0 202.00 202.0 202.00 202.0 19 2.0 538.5 686.600685 53.0 295.75 538.5 781.25 1024.0
Code language: CSS (css)

Filed Under: data, Python

Copyright © 2023 · eleven40 Pro Theme on Genesis Framework · WordPress · Log in