Table of Contents
- Honeypots Used
- Data Analysis
I am taking a cybersecurity class. This week’s assignment had us work with honeypots. A honeypot is a server that pretends to have a vulnerability of some sort (open ports, old software, etc.) and collects data on the people who try to hack it.
At the end of the experiment I ended up with some data for four honeypots. The data and my write-up can be found here. In this post I want to focus on how I used Pandas and Python to gain some insight into the data that I collected.
Pandas is a great library for Python that makes it really easy to explore various kinds of data (JSON, CSV, etc.). It’s available via pip install pandas. I recommend using ipython (available via pip install ipython) for all simple Python tasks. If you are not used to the command line, I recommend using Jupyter Notebook instead, which is like ipython in a browser.
Honeypots Used
- Dionaea with HTTP – Network Scanners – https://github.com/DinoTools/dionaea
- p0f – Network Scanners – https://github.com/threatstream/mhn/wiki/p0f-Sensor
- Suricata Sensor – IDS, IPS and NSM engine – https://github.com/threatstream/mhn/wiki/Suricata-Sensor
- ElasticHoney Sensor – Remote Code Execution in ES before 1.3.8 – https://github.com/threatstream/mhn/wiki/ElasticHoney-Sensor
Data Analysis
Load Data
import pandas as pd
import json
# Load data, can be found at https://github.com/akras14/codepath9/blob/master/session.json
with open("session.json") as f:
    data = f.read()
data = data.split("\n")  # One JSON document per line
data.pop()  # Drop last empty element
data = [json.loads(d) for d in data]  # Parse each line into a dict
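The split-and-pop approach above assumes the file ends with exactly one trailing newline. Here is a minimal sketch of a slightly more defensive version that skips any blank lines. The load_json_lines helper name and the two-line sample string are my own illustrations, not part of the original data or of any library.

```python
import json

def load_json_lines(text):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Tiny made-up sample standing in for the contents of session.json
sample = (
    '{"honeypot": "dionaea", "source_ip": "192.0.2.1"}\n'
    '\n'
    '{"honeypot": "p0f", "source_ip": "192.0.2.2"}\n'
)

records = load_json_lines(sample)
print(len(records), "records loaded")
```

With the real file you would pass in the result of f.read() instead of the sample string.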
Sample output
data
[{'_id': {'$oid': '5ac00385616a1e781bfa54b3'},
'destination_port': 80,
'honeypot': 'dionaea',
'hpfeed_id': {'$oid': '5ac00383616a1e781bfa54b2'},
'identifier': 'e8351d14-352d-11e8-a320-42010a800002',
'protocol': 'httpd',
'source_ip': '199.201.64.145',
'source_port': 38877,
'timestamp': {'$date': '2018-03-31T21:54:11.887+0000'}}, # etc ...
Load data into a Pandas DataFrame
df = pd.DataFrame.from_dict(data)
df.iloc[0] # Show first item in dataframe
Sample Output
_id {'$oid': '5ac00385616a1e781bfa54b3'}
destination_ip NaN
destination_port 80
honeypot dionaea
hpfeed_id {'$oid': '5ac00383616a1e781bfa54b2'}
identifier e8351d14-352d-11e8-a320-42010a800002
protocol httpd
sensor NaN
source_ip 199.201.64.145
source_port 38877
suricata NaN
timestamp {'$date': '2018-03-31T21:54:11.887+0000'}
Name: 0, dtype: object
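Notice that _id, hpfeed_id and timestamp are still nested dictionaries inside the DataFrame. If you would rather have flat columns, pd.json_normalize is one option. This is only a sketch: the first record below is trimmed from the sample above, the second one is made up, and the renamed column names (id, timestamp) are just my choice.

```python
import pandas as pd

records = [
    # Trimmed copy of the first record shown earlier
    {"_id": {"$oid": "5ac00385616a1e781bfa54b3"},
     "honeypot": "dionaea",
     "source_ip": "199.201.64.145",
     "timestamp": {"$date": "2018-03-31T21:54:11.887+0000"}},
    # Made-up second record, for illustration only
    {"_id": {"$oid": "000000000000000000000000"},
     "honeypot": "p0f",
     "source_ip": "192.0.2.1",
     "timestamp": {"$date": "2018-03-31T22:00:00.000+0000"}},
]

# Nested keys become dotted column names such as "_id.$oid"
flat = pd.json_normalize(records)

# Rename the dotted columns to something easier to type
flat = flat.rename(columns={"_id.$oid": "id", "timestamp.$date": "timestamp"})
print(flat.iloc[0])
```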
OK, that’s cool. Let’s see which IP hit me the most.
Show top 10 IPs and attack count
most_common = df['source_ip'].value_counts()[:10]
# and
most_common.to_dict()
Sample output
10.128.0.8 4457
199.201.64.145 1965
5.188.11.145 1295
199.201.64.139 992
191.101.167.7 764
5.62.39.237 658
5.62.43.21 657
77.72.85.25 512
5.188.9.25 441
5.188.11.63 410
Name: source_ip, dtype: int64
# As dictionary
{'10.128.0.8': 4457,
'191.101.167.7': 764,
'199.201.64.139': 992,
'199.201.64.145': 1965,
'5.188.11.145': 1295,
'5.188.11.63': 410,
'5.188.9.25': 441,
'5.62.39.237': 658,
'5.62.43.21': 657,
'77.72.85.25': 512}
Note: The dictionary is printed out of order. You can also loop over the most_common variable itself, but I wanted to demo the to_dict() conversion.
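If you do want to keep the ranking while looping, here is a small sketch that iterates over the Series directly with items(). The IPs and counts are made up for the example.

```python
import pandas as pd

# Made-up rows; only the source_ip column matters here
df = pd.DataFrame({
    "source_ip": ["192.0.2.1", "192.0.2.2", "192.0.2.1",
                  "192.0.2.3", "192.0.2.1", "192.0.2.2"],
})

# value_counts() sorts by count, highest first; head(2) keeps the top two
most_common = df["source_ip"].value_counts().head(2)

# items() yields (ip, count) pairs in the same order as the Series
ordered = list(most_common.items())
for ip, count in ordered:
    print(ip, count)
```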
I wonder what attacks those IPs run on my honeypots?
Show attacks for most common IPs
for ip in most_common.to_dict():
    print(ip, df[df['source_ip'] == ip]['honeypot'].unique())
Sample Output
10.128.0.8 ['suricata']
199.201.64.145 ['dionaea']
5.188.11.145 ['dionaea']
199.201.64.139 ['dionaea']
191.101.167.7 ['dionaea']
5.62.39.237 ['dionaea']
5.62.43.21 ['dionaea']
77.72.85.25 ['dionaea']
5.188.9.25 ['dionaea' 'p0f' 'suricata']
5.188.11.63 ['dionaea' 'p0f' 'suricata']
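The loop above filters the DataFrame once per IP. As a rough alternative, a single groupby can count how many distinct honeypots each IP touched. This is a sketch on made-up rows, and pots_per_ip is just a name I picked.

```python
import pandas as pd

# Made-up rows shaped like the real data
df = pd.DataFrame({
    "source_ip": ["192.0.2.1", "192.0.2.1",
                  "192.0.2.2", "192.0.2.2", "192.0.2.2"],
    "honeypot": ["dionaea", "dionaea",
                 "dionaea", "p0f", "suricata"],
})

# Number of distinct honeypots hit by each IP, widest reach first
pots_per_ip = (
    df.groupby("source_ip")["honeypot"]
      .nunique()
      .sort_values(ascending=False)
)
print(pots_per_ip)
```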
Cool, looks like two IPs were able to hit 3 out of the 4 honeypots. BTW, let’s check all of the attacks that took place.
Show all attacks that took place
df['honeypot'].value_counts()
Sample Output
dionaea 21657
suricata 5454
p0f 2403
elastichoney 6
Name: honeypot, dtype: int64
Interesting. There were only 6 elastichoney attacks. It’s something that the most common IPs check would have overlooked. Let’s see which IPs they came from.
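As an aside, one way to make rare honeypots like this stand out is to look at shares instead of raw counts. The sketch below uses made-up data, and the 15% cutoff is an arbitrary assumption of mine, not something from the assignment.

```python
import pandas as pd

# Made-up honeypot column: 6 dionaea, 3 suricata, 1 elastichoney
honeypots = pd.Series(["dionaea"] * 6 + ["suricata"] * 3 + ["elastichoney"])

# normalize=True returns fractions of the total instead of counts
share = honeypots.value_counts(normalize=True)

# Keep only honeypots below an arbitrary 15% share
rare = share[share < 0.15]
print(rare)
```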
Show all IPs for elastichoney attack
df[df['honeypot'] == 'elastichoney']['source_ip'].unique()
Sample Output
array(['125.212.217.215', '221.229.204.122', '216.218.206.68',
'211.23.154.138'], dtype=object)
And what kind of attacks did those IPs perform?
Types of attacks for each elastichoney IP found
for ip in df[df['honeypot'] == 'elastichoney']['source_ip'].unique():
    print(ip)
    print(df[df['source_ip'] == ip]['honeypot'].value_counts())
    print("\n")
Sample Output
125.212.217.215
dionaea 9
p0f 5
elastichoney 3
Name: honeypot, dtype: int64
221.229.204.122
dionaea 32
p0f 1
elastichoney 1
Name: honeypot, dtype: int64
216.218.206.68
dionaea 2
elastichoney 1
Name: honeypot, dtype: int64
211.23.154.138
dionaea 9
p0f 4
elastichoney 1
Name: honeypot, dtype: int64
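The same per-IP breakdown can also be produced as one table with pd.crosstab, which I find easier to scan than several printed Series. Again, this is a sketch on made-up rows, and es_ips is just my own variable name.

```python
import pandas as pd

# Made-up rows: two IPs hit elastichoney, one did not
df = pd.DataFrame({
    "source_ip": ["192.0.2.1"] * 3 + ["192.0.2.2"] * 2 + ["192.0.2.3"] * 2,
    "honeypot": ["dionaea", "p0f", "elastichoney",
                 "dionaea", "elastichoney",
                 "dionaea", "dionaea"],
})

# IPs that touched elastichoney at least once
es_ips = df.loc[df["honeypot"] == "elastichoney", "source_ip"].unique()

# All events from those IPs, counted per (IP, honeypot) pair
subset = df[df["source_ip"].isin(es_ips)]
table = pd.crosstab(subset["source_ip"], subset["honeypot"])
print(table)
```

Honeypots an IP never touched show up as 0 in the table instead of being left out.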
So it looks like every IP that hit the elastichoney pot also hit other honeypots, but only a few times each. Possibly that was to avoid getting detected by the sort of most common IP check that I ran first.