I often find myself with logs in the following format, exported in CSV.
timestamp,request_time_in_millis
1545509231863,101
1545509231863,38
1545541592109,202
1545595617945,53
1545595617945,1024
First column is timestamp in milliseconds, second column is timing for some request, also in milliseconds.
Often I’d like to see the min, max and average request time, group by day and hour. Here is how to get this insight using Pandas library in Python.
Let’s load sample data first.
import pandas as pd
# Generate sample data (normally would read this from a log file)
data = {
'millis': [101, 38, 202, 53, 1024],
'time': [1545509231863, 1545509231863, 1545541592109, 1545592017945, 1545592017945]
}
df = pd.DataFrame.from_dict(data)
Code language: PHP (php)
At this point the data frame will look as follows:
millis time
0 101 1545564489895
1 38 1545564489895
2 202 1545590975516
3 53 1545592017945
4 1024 1545592017945
Now we can convert the timestamp to date and group by day and hour.
# Convert timestamp to datetime, note "unit" is set to "ms",
# would be "s" timestamp in seconds
df['date'] = df['time'].apply(lambda x: pd.to_datetime(x, unit='ms'))
# Only keep columns we care about
df = df[['date', 'millis']]
# Prevent statistic view from splitting across terminal
pd.set_option('display.expand_frame_repr', False)
# See grouped statistics
df.groupby([df['date'].dt.day, df['date'].dt.hour]).describe()
Code language: PHP (php)
The final output, will be grouped by day of the month and hour of the day, as follows:
millis
count mean std min 25% 50% 75% max
date date
22 20 2.0 69.5 44.547727 38.0 53.75 69.5 85.25 101.0
23 5 1.0 202.0 NaN 202.0 202.00 202.0 202.00 202.0
19 2.0 538.5 686.600685 53.0 295.75 538.5 781.25 1024.0
Code language: CSS (css)