Tech TLDR;

Crunching Honeypot IP Data with Pandas and Python

April 7, 2018 by admin

Table of Contents

  • Honeypots Used
  • Data Analysis
    • Load Data
      • Sample output
    • Load data into a pandas DataFrame
      • Sample Output
    • Show top 10 IPs and attack count
      • Sample output
    • Show attacks for most common IPs
      • Sample Output
    • Show All attacks that took place
      • Sample Output
    • Show all IPs for elastichoney attack
      • Sample Output
    • Types of attacks for each elastichoney IP found
      • Sample Output

I am taking a cybersecurity class, and this week’s assignment had us work with honeypots. A honeypot is a server that pretends to have some kind of vulnerability (open ports, outdated software, etc.) and instead collects data on the people who try to hack it.

At the end of the experiment I had data from four honeypots. The data and my write-up can be found here. In this post I want to focus on how I used pandas and Python to gain some insight into the data I collected.

Pandas is a great Python library that makes it easy to explore many kinds of data (JSON, CSV, etc.). It’s available via pip install pandas. I recommend IPython (available via pip install ipython) for all simple Python tasks. If you are not comfortable with the command line, I recommend Jupyter Notebook instead, which is like IPython in a browser.

Honeypots Used

  1. Dionaea with HTTP – Network Scanners – https://github.com/DinoTools/dionaea
  2. p0f – Network Scanners – https://github.com/threatstream/mhn/wiki/p0f-Sensor
  3. Suricata Sensor – IDS, IPS and NSM engine – https://github.com/threatstream/mhn/wiki/Suricata-Sensor
  4. ElasticHoney Sensor – Remote Code Execution in Elasticsearch before 1.3.8 – https://github.com/threatstream/mhn/wiki/ElasticHoney-Sensor

Data Analysis

Load Data

import json

import pandas as pd

# Load data, can be found at https://github.com/akras14/codepath9/blob/master/session.json
# The file holds one JSON record per line
with open("session.json") as f:
    # Parse every non-empty line (this also skips the trailing empty line)
    data = [json.loads(line) for line in f if line.strip()]

Sample output

data
[{'_id': {'$oid': '5ac00385616a1e781bfa54b3'},
  'destination_port': 80,
  'honeypot': 'dionaea',
  'hpfeed_id': {'$oid': '5ac00383616a1e781bfa54b2'},
  'identifier': 'e8351d14-352d-11e8-a320-42010a800002',
  'protocol': 'httpd',
  'source_ip': '199.201.64.145',
  'source_port': 38877,
  'timestamp': {'$date': '2018-03-31T21:54:11.887+0000'}},
 # etc ...
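pandas can also read a file with one JSON record per line directly, using read_json(lines=True), which replaces the manual json.loads step. Below is a minimal sketch: the two records are made-up placeholders in the same shape as session.json, and I pass dtype=False and convert_dates=False (my own choice, not required) so pandas leaves the values, including the nested $oid/$date dicts, exactly as parsed.

```python
from io import StringIO

import pandas as pd

# Two made-up records in the same one-JSON-object-per-line shape as session.json
SAMPLE = (
    '{"honeypot": "dionaea", "source_ip": "192.0.2.1", "destination_port": 80, '
    '"timestamp": {"$date": "2018-03-31T21:54:11.887+0000"}}\n'
    '{"honeypot": "p0f", "source_ip": "192.0.2.2", '
    '"timestamp": {"$date": "2018-03-31T21:55:02.001+0000"}}\n'
)

# lines=True treats each line as a separate JSON record (blank lines are ignored);
# dtype=False and convert_dates=False keep the parsed values untouched
df = pd.read_json(StringIO(SAMPLE), lines=True, dtype=False, convert_dates=False)
print(df[["honeypot", "source_ip"]])
```

For the real file, the equivalent call would be pd.read_json("session.json", lines=True, dtype=False, convert_dates=False).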

Load data into a pandas DataFrame

df = pd.DataFrame.from_dict(data)
df.iloc[0]  # Show first item in dataframe

Sample Output

_id                 {'$oid': '5ac00385616a1e781bfa54b3'}
destination_ip      NaN
destination_port    80
honeypot            dionaea
hpfeed_id           {'$oid': '5ac00383616a1e781bfa54b2'}
identifier          e8351d14-352d-11e8-a320-42010a800002
protocol            httpd
sensor              NaN
source_ip           199.201.64.145
source_port         38877
suricata            NaN
timestamp           {'$date': '2018-03-31T21:54:11.887+0000'}
Name: 0, dtype: object
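The _id, hpfeed_id and timestamp columns above hold nested dicts, which are awkward to filter or sort. One option, sketched here on two made-up records, is pd.json_normalize (a top-level function in pandas 1.0 and later), which flattens nested keys into dotted column names such as timestamp.$date; pd.to_datetime can then turn those strings into real UTC timestamps. Renaming _id.$oid to id is just an illustrative choice.

```python
import pandas as pd

# Made-up records mirroring the nested Mongo-style fields shown above
records = [
    {"_id": {"$oid": "5ac00385616a1e781bfa54b3"}, "honeypot": "dionaea",
     "source_ip": "192.0.2.1", "timestamp": {"$date": "2018-03-31T21:54:11.887+0000"}},
    {"_id": {"$oid": "5ac00385616a1e781bfa54b4"}, "honeypot": "p0f",
     "source_ip": "192.0.2.2", "timestamp": {"$date": "2018-03-31T22:10:05.000+0000"}},
]

# Nested dicts become dotted columns, e.g. "_id.$oid" and "timestamp.$date"
flat = pd.json_normalize(records)

# Parse the ISO-8601 strings (with a +0000 offset) into timezone-aware UTC datetimes
flat["timestamp"] = pd.to_datetime(flat["timestamp.$date"], utc=True)

# Drop the raw string column and give the id column a simpler name
flat = flat.drop(columns=["timestamp.$date"]).rename(columns={"_id.$oid": "id"})
print(flat)
```

With real datetimes in place, sorting by time or resampling by hour works like any other pandas column.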

OK, that’s cool. Let’s see which IP hit me the most.

Show top 10 IPs and attack count

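# value_counts() sorts by count, highest first; keep the top 10
most_common = df['source_ip'].value_counts()[:10]
most_common

# and the same data as a dictionary
most_common.to_dict()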

Sample output

10.128.0.8      4457
199.201.64.145  1965
5.188.11.145    1295
199.201.64.139   992
191.101.167.7    764
5.62.39.237      658
5.62.43.21       657
77.72.85.25      512
5.188.9.25       441
5.188.11.63      410
Name: source_ip, dtype: int64

# As dictionary
{'10.128.0.8': 4457,
 '191.101.167.7': 764,
 '199.201.64.139': 992,
 '199.201.64.145': 1965,
 '5.188.11.145': 1295,
 '5.188.11.63': 410,
 '5.188.9.25': 441,
 '5.62.39.237': 658,
 '5.62.43.21': 657,
 '77.72.85.25': 512}

Note: the dictionary output above is listed in key order rather than by attack count. You can also loop over the most_common Series itself, which keeps the count order, but I wanted to demo the to_dict() conversion.
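value_counts(normalize=True) returns each IP’s fraction of all events instead of a raw count, which helps judge how dominant the top IPs really are. A minimal sketch on a hypothetical toy frame that only has a source_ip column:

```python
import pandas as pd

# Hypothetical toy frame standing in for the honeypot DataFrame
df = pd.DataFrame({"source_ip": ["192.0.2.1"] * 5 + ["192.0.2.2"] * 3 + ["192.0.2.3"] * 2})

counts = df["source_ip"].value_counts()               # absolute counts, highest first
share = df["source_ip"].value_counts(normalize=True)  # fraction of all events

# Keep the top 10 and attach each IP's share (assignment aligns on the IP index)
top = counts.head(10).to_frame("attacks")
top["share"] = share.round(3)
print(top)
```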

I wonder what attacks those IPs run on my honeypots?

Show attacks for most common IPs

for ip in most_common.to_dict():
    # Unique honeypot types this IP triggered
    print(ip, df[df['source_ip'] == ip]['honeypot'].unique())

Sample Output

10.128.0.8 ['suricata']
199.201.64.145 ['dionaea']
5.188.11.145 ['dionaea']
199.201.64.139 ['dionaea']
191.101.167.7 ['dionaea']
5.62.39.237 ['dionaea']
5.62.43.21 ['dionaea']
77.72.85.25 ['dionaea']
5.188.9.25 ['dionaea' 'p0f' 'suricata']
5.188.11.63 ['dionaea' 'p0f' 'suricata']
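The loop above filters the whole DataFrame once per IP. A single groupby over just the top IPs produces the same unique-honeypot lists in one pass. This sketch uses a made-up toy frame with the same source_ip and honeypot columns, and reindex puts the rows back in most-common-first order.

```python
import pandas as pd

# Hypothetical toy frame with the two columns the loop uses
df = pd.DataFrame({
    "source_ip": ["192.0.2.1", "192.0.2.1", "192.0.2.1", "192.0.2.2", "192.0.2.2", "192.0.2.3"],
    "honeypot":  ["dionaea", "p0f", "dionaea", "dionaea", "dionaea", "suricata"],
})
most_common = df["source_ip"].value_counts().head(2)

# Keep only rows from the top IPs, then collect unique honeypots per IP
hits = (
    df[df["source_ip"].isin(most_common.index)]
    .groupby("source_ip")["honeypot"]
    .unique()                     # order of first appearance within each IP
    .reindex(most_common.index)   # restore most-common-first order
)
print(hits)
```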

Cool, it looks like two IPs were able to hit 3 of the 4 honeypots. Next, let’s check all of the attacks that took place.

Show All attacks that took place

df['honeypot'].value_counts()

Sample Output

dionaea         21657
suricata         5454
p0f              2403
elastichoney        6
Name: honeypot, dtype: int64
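With totals this lopsided, the quiet sensors are easy to miss. A small filter on the value_counts() result can flag them automatically; this sketch reuses the four totals above, and the 1% cutoff is an arbitrary assumption.

```python
import pandas as pd

# The per-honeypot totals from the output above, as a Series
counts = pd.Series({"dionaea": 21657, "suricata": 5454, "p0f": 2403, "elastichoney": 6})

# Fraction of all events per honeypot
share = counts / counts.sum()

# Flag honeypots below an (arbitrary) 1% share of all events
rare = counts[share < 0.01]
print(rare)
```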

Interesting: there were only 6 elastichoney attacks, and the most-common-IPs check would have overlooked them. Let’s see which IPs they came from.

Show all IPs for elastichoney attack

df[df['honeypot'] == 'elastichoney']['source_ip'].unique()

Sample Output

array(['125.212.217.215', '221.229.204.122', '216.218.206.68', '211.23.154.138'], dtype=object)
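Selecting the rows and the column with a single .loc call avoids chained indexing, and calling value_counts() on the same selection shows how many elastichoney events each IP produced. A sketch on a made-up toy frame:

```python
import pandas as pd

# Hypothetical toy frame; only source_ip and honeypot are needed
df = pd.DataFrame({
    "source_ip": ["198.51.100.7", "198.51.100.7", "203.0.113.9", "198.51.100.7", "192.0.2.1"],
    "honeypot":  ["elastichoney", "elastichoney", "elastichoney", "dionaea", "dionaea"],
})

# .loc applies the row mask and picks the column in one indexing step
elastic_ips = df.loc[df["honeypot"] == "elastichoney", "source_ip"]

print(elastic_ips.unique())        # distinct IPs, in order of first appearance
print(elastic_ips.value_counts())  # elastichoney events per IP
```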

And what kind of attacks did those IPs perform?

Types of attacks for each elastichoney IP found

for ip in df[df['honeypot'] == 'elastichoney']['source_ip'].unique():
    print(ip)
    # Number of events per honeypot for this IP
    print(df[df['source_ip'] == ip]['honeypot'].value_counts())
    print("\n")

Sample Output

125.212.217.215
dionaea         9
p0f             5
elastichoney    3
Name: honeypot, dtype: int64

221.229.204.122
dionaea         32
p0f              1
elastichoney     1
Name: honeypot, dtype: int64

216.218.206.68
dionaea         2
elastichoney    1
Name: honeypot, dtype: int64

211.23.154.138
dionaea         9
p0f             4
elastichoney    1
Name: honeypot, dtype: int64

So it looks like every IP that hit the elastichoney pot also hit other honeypots, but only a handful of times compared with the top IPs, probably to avoid being detected by the kind of most-common-IP check I ran first.
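The loop prints one Series per IP. pd.crosstab puts the same per-honeypot counts into a single IP-by-honeypot table, with 0 wherever an IP never touched a sensor. A minimal sketch on made-up data with the same two columns:

```python
import pandas as pd

# Hypothetical toy frame with the same source_ip and honeypot columns
df = pd.DataFrame({
    "source_ip": ["198.51.100.7"] * 4 + ["203.0.113.9"] * 3 + ["192.0.2.1"] * 2,
    "honeypot": ["elastichoney", "dionaea", "dionaea", "p0f",
                 "elastichoney", "dionaea", "dionaea",
                 "dionaea", "suricata"],
})

# IPs that touched elastichoney at least once
elastic_ips = df.loc[df["honeypot"] == "elastichoney", "source_ip"].unique()

# All events from those IPs, on every honeypot
subset = df[df["source_ip"].isin(elastic_ips)]

# Rows = IPs, columns = honeypot types, cells = event counts (0 if none)
table = pd.crosstab(subset["source_ip"], subset["honeypot"])
print(table)
```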

Filed Under: data, Python
