Tech TLDR;

Crunching Honeypot IP Data with Pandas and Python

April 7, 2018 by admin

Table of Contents

  • Honeypots Used
  • Data Analysis
    • Load Data
      • Sample output
    • Load data into a pandas DataFrame
      • Sample Output
    • Show top 10 IPs and attack count
      • Sample output
    • Show attacks for most common IPs
      • Sample Output
    • Show All attacks that took place
      • Sample Output
    • Show all IPs for elastichoney attack
      • Sample Output
    • Types of attacks for each elastichoney IP found
      • Sample Output

I am taking a cyber security class. This week’s assignment had us work with honeypots. A honeypot is a server that pretends to have a vulnerability of some sort (open ports, old software, etc.) and collects data on the people who try to hack it.

At the end of the experiment I had data from four honeypots. The data and my write-up can be found here. In this post I want to focus on how I used Pandas and Python to get some insight into the data I collected.

Pandas is a great library for Python that makes it easy to explore various kinds of data (JSON, CSV, etc.). It’s available via pip install pandas. I recommend using ipython (available via pip install ipython) for all simple Python tasks. If you are not used to the command line, I recommend Jupyter Notebook instead, which is like ipython in a browser.

Honeypots Used

  1. Dionaea with HTTP – Network Scanners – https://github.com/DinoTools/dionaea
  2. p0f – Network Scanners – https://github.com/threatstream/mhn/wiki/p0f-Sensor
  3. Suricata Sensor – IDS, IPS and NSM engine – https://github.com/threatstream/mhn/wiki/Suricata-Sensor
  4. ElasticHoney Sensor – Remote Code Execution in ES before 1.3.8 – https://github.com/threatstream/mhn/wiki/ElasticHoney-Sensor

Data Analysis

Load Data

import pandas as pd
import json

# Load the data; it can be found at https://github.com/akras14/codepath9/blob/master/session.json
with open("session.json") as f:
    # The file holds one JSON document per line; skip blank lines such as the trailing one
    data = [json.loads(line) for line in f if line.strip()]

Sample output

data
[{'_id': {'$oid': '5ac00385616a1e781bfa54b3'},
  'destination_port': 80,
  'honeypot': 'dionaea',
  'hpfeed_id': {'$oid': '5ac00383616a1e781bfa54b2'},
  'identifier': 'e8351d14-352d-11e8-a320-42010a800002',
  'protocol': 'httpd',
  'source_ip': '199.201.64.145',
  'source_port': 38877,
  'timestamp': {'$date': '2018-03-31T21:54:11.887+0000'}}, # etc ...

Load data into a pandas DataFrame

df = pd.DataFrame(data)  # One row per record; keys missing from a record become NaN
df.iloc[0]  # Show the first row of the DataFrame

Sample Output

_id                      {'$oid': '5ac00385616a1e781bfa54b3'}
destination_ip                                            NaN
destination_port                                           80
honeypot                                              dionaea
hpfeed_id                {'$oid': '5ac00383616a1e781bfa54b2'}
identifier               e8351d14-352d-11e8-a320-42010a800002
protocol                                                httpd
sensor                                                    NaN
source_ip                                      199.201.64.145
source_port                                             38877
suricata                                                  NaN
timestamp           {'$date': '2018-03-31T21:54:11.887+0000'}
Name: 0, dtype: object
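The _id and timestamp columns above still hold nested dictionaries, which makes them awkward to sort or filter on. Here is a minimal sketch of one way to flatten them, assuming pandas 1.0 or newer for pd.json_normalize. The flatten_sessions helper and the second record are made up for illustration; only the first record comes from the sample output.

```python
import pandas as pd

# First record is abbreviated from the sample output above;
# the second one is a made-up placeholder so there is more than one row.
records = [
    {
        "_id": {"$oid": "5ac00385616a1e781bfa54b3"},
        "honeypot": "dionaea",
        "source_ip": "199.201.64.145",
        "timestamp": {"$date": "2018-03-31T21:54:11.887+0000"},
    },
    {
        "_id": {"$oid": "000000000000000000000000"},  # placeholder id
        "honeypot": "p0f",
        "source_ip": "5.188.9.25",
        "timestamp": {"$date": "2018-03-31T22:00:00.000+0000"},  # made-up time
    },
]


def flatten_sessions(records):
    """Hypothetical helper: flatten nested fields and parse the timestamp."""
    # json_normalize turns {"_id": {"$oid": ...}} into a column named "_id.$oid"
    flat = pd.json_normalize(records)
    flat = flat.rename(columns={"_id.$oid": "id", "timestamp.$date": "timestamp"})
    # Parse the ISO 8601 strings into timezone-aware datetimes
    flat["timestamp"] = pd.to_datetime(flat["timestamp"], utc=True)
    return flat


flat = flatten_sessions(records)
print(flat.dtypes)
```

With a real datetime column, questions like "which hour saw the most hits" become a one-liner with flat["timestamp"].dt.hour.value_counts().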

OK, that’s cool. Let’s see which IPs hit me the most.

Show top 10 IPs and attack count

most_common = df['source_ip'].value_counts()[:10]  # value_counts() sorts by count, descending
# The same counts as a dictionary
most_common.to_dict()

Sample output

10.128.0.8        4457
199.201.64.145    1965
5.188.11.145      1295
199.201.64.139     992
191.101.167.7      764
5.62.39.237        658
5.62.43.21         657
77.72.85.25        512
5.188.9.25         441
5.188.11.63        410
Name: source_ip, dtype: int64

# As dictionary
{'10.128.0.8': 4457,
 '191.101.167.7': 764,
 '199.201.64.139': 992,
 '199.201.64.145': 1965,
 '5.188.11.145': 1295,
 '5.188.11.63': 410,
 '5.188.9.25': 441,
 '5.62.39.237': 658,
 '5.62.43.21': 657,
 '77.72.85.25': 512}

Note: The dictionary is displayed out of order (it is shown sorted by IP string rather than by count). You could also loop over most_common.items() directly, but I wanted to demo the to_dict() conversion.
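One thing worth checking in a list like this is whether every address is actually external. 10.128.0.8 sits in the private 10.0.0.0/8 range, so it is most likely traffic from inside the network the sensors ran on rather than an outside attacker (that part is my guess, the data does not say). Here is a small sketch using Python's standard ipaddress module. The is_private helper is just a name I picked, and the Series only holds the first three counts from the output above.

```python
import ipaddress

import pandas as pd

# First three entries of the top-10 output above
most_common = pd.Series(
    {"10.128.0.8": 4457, "199.201.64.145": 1965, "5.188.11.145": 1295},
    name="source_ip",
)


def is_private(ip):
    """Hypothetical helper: True for private ranges such as 10.0.0.0/8."""
    return ipaddress.ip_address(ip).is_private


# Flag each IP, then keep only the publicly routable ones
private_flags = {ip: is_private(ip) for ip in most_common.index}
external_only = most_common[[not private_flags[ip] for ip in most_common.index]]

print(private_flags)
print(external_only)
```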

I wonder which honeypots those IPs attacked.

Show attacks for most common IPs

for ip in most_common.to_dict():
    # List the distinct honeypots that recorded traffic from this IP
    print(ip, df[df['source_ip'] == ip]['honeypot'].unique())

Sample Output

10.128.0.8 ['suricata']
199.201.64.145 ['dionaea']
5.188.11.145 ['dionaea']
199.201.64.139 ['dionaea']
191.101.167.7 ['dionaea']
5.62.39.237 ['dionaea']
5.62.43.21 ['dionaea']
77.72.85.25 ['dionaea']
5.188.9.25 ['dionaea' 'p0f' 'suricata']
5.188.11.63 ['dionaea' 'p0f' 'suricata']
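The loop works, but the same question can be answered with a single table. Below is a rough sketch using pd.crosstab, which counts rows for every IP and honeypot pair. The tiny DataFrame is a stand-in I made up so the example runs on its own; with the real df you would skip that part.

```python
import pandas as pd

# Made-up stand-in for the real df: a handful of rows with the same two columns
df = pd.DataFrame({
    "source_ip": ["10.128.0.8", "10.128.0.8", "5.188.9.25",
                  "5.188.9.25", "5.188.9.25", "199.201.64.145"],
    "honeypot": ["suricata", "suricata", "dionaea",
                 "p0f", "suricata", "dionaea"],
})

# Same top-10 selection as before, then keep only rows from those IPs
top_ips = df["source_ip"].value_counts().head(10).index
subset = df[df["source_ip"].isin(top_ips)]

# Rows are IPs, columns are honeypots, cells are hit counts
table = pd.crosstab(subset["source_ip"], subset["honeypot"])

# Number of distinct honeypots each IP touched
honeypots_hit = (table > 0).sum(axis=1)

print(table)
print(honeypots_hit)
```

Compared with unique(), the table also shows how many times each IP hit each honeypot, not just which ones it touched.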

Cool, it looks like two IPs were able to hit three of the four honeypots. Next, let’s count the attacks recorded by each honeypot.

Show All attacks that took place

df['honeypot'].value_counts()

Sample Output

dionaea         21657
suricata         5454
p0f              2403
elastichoney        6
Name: honeypot, dtype: int64

Interesting. There were only 6 elastichoney attacks. It’s something that the most common IPs check would have overlooked. Let’s see which IPs they came from.

Show all IPs for elastichoney attack

df[df['honeypot'] == 'elastichoney']['source_ip'].unique()

Sample Output

array(['125.212.217.215', '221.229.204.122', '216.218.206.68',
       '211.23.154.138'], dtype=object)

And what kind of attacks did those IPs perform?

Types of attacks for each elastichoney IP found

for ip in df[df['honeypot'] == 'elastichoney']['source_ip'].unique():
    print(ip)
    print(df[df['source_ip'] == ip]['honeypot'].value_counts())
    print("\n")

Sample Output

125.212.217.215
dionaea         9
p0f             5
elastichoney    3
Name: honeypot, dtype: int64


221.229.204.122
dionaea         32
p0f              1
elastichoney     1
Name: honeypot, dtype: int64


216.218.206.68
dionaea         2
elastichoney    1
Name: honeypot, dtype: int64


211.23.154.138
dionaea         9
p0f             4
elastichoney    1
Name: honeypot, dtype: int64
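The same breakdown can also be built as one table instead of a loop, which makes the IPs easier to compare side by side. This is only a sketch: the DataFrame is rebuilt from the counts printed above, plus a few rows for 199.201.64.145 (far fewer than its real 1965) so the filter has something to exclude.

```python
import pandas as pd

# Counts copied from the sample output above; the last entry is a
# truncated stand-in for an IP that never hit elastichoney.
counts = {
    "125.212.217.215": {"dionaea": 9, "p0f": 5, "elastichoney": 3},
    "221.229.204.122": {"dionaea": 32, "p0f": 1, "elastichoney": 1},
    "216.218.206.68": {"dionaea": 2, "elastichoney": 1},
    "211.23.154.138": {"dionaea": 9, "p0f": 4, "elastichoney": 1},
    "199.201.64.145": {"dionaea": 5},
}

# Expand the counts back into one row per recorded hit
rows = [
    {"source_ip": ip, "honeypot": honeypot}
    for ip, per_honeypot in counts.items()
    for honeypot, n in per_honeypot.items()
    for _ in range(n)
]
df = pd.DataFrame(rows)

# IPs that hit elastichoney at least once
eh_ips = df.loc[df["honeypot"] == "elastichoney", "source_ip"].unique()

# One row per IP, one column per honeypot, 0 where there were no hits
summary = (
    df[df["source_ip"].isin(eh_ips)]
    .groupby(["source_ip", "honeypot"])
    .size()
    .unstack(fill_value=0)
)

print(summary)
```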

So it looks like every IP that hit the elastichoney pot also hit other honeypots, but only a few times each. That was probably to avoid being detected by the kind of most-common-IP check that I ran first.
