<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://greedybear-project.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://greedybear-project.github.io/" rel="alternate" type="text/html" hreflang="en" /><updated>2026-04-21T14:55:14+00:00</updated><id>https://greedybear-project.github.io/feed.xml</id><title type="html">GreedyBear Blog</title><subtitle>Official GreedyBear Blog</subtitle><author><name>Tim Leonhard</name></author><entry><title type="html">GreedyBear version 3 coming</title><link href="https://greedybear-project.github.io/greedybear_v3_release/" rel="alternate" type="text/html" title="GreedyBear version 3 coming" /><published>2026-01-29T00:00:00+00:00</published><updated>2026-01-29T00:00:00+00:00</updated><id>https://greedybear-project.github.io/greedybear_v3_release</id><content type="html" xml:base="https://greedybear-project.github.io/greedybear_v3_release/"><![CDATA[<p>Over the last months some new contributors helped us to implement a lot of new stuff in GreedyBear. Because of the huge number of new features and a different structure of the Feeds API responses, we are releasing a new major version in the next week.</p>

<h2 id="breaking-changes">Breaking changes</h2>
<ul>
  <li>Feeds API responses do not contain the fields “honeypots”, “cowrie” and “log4j” anymore.</li>
  <li>Log4Pot-specific data handling was removed, because the honeypot is not that relevant anymore.</li>
  <li>The possibility to use legacy extraction with an 11-minute time window has been removed. The LEGACY_EXTRACTION switch in the env_file will be ignored.</li>
</ul>

<h2 id="highlights">Highlights</h2>
<ul>
  <li>We are using the Elasticsearch client version 9 now to match T-Pot's recent migration to ES9.</li>
  <li>GreedyBear now dynamically supports all honeypots that are actively collecting data in the attached T-Pot instance.</li>
  <li>A shiny new API endpoint that aggregates IOC data by ASN was built by <a href="https://github.com/drona-gyawali">Dorna Raj Gyawali</a>.</li>
  <li>Automated ingestion of FireHol blocklists enriches IOCs with threat intelligence categories thanks to <a href="https://github.com/opbot-xd">Krishna Awasthi</a>.</li>
  <li>Users can now authenticate using email instead of just username, thanks to the work of <a href="https://github.com/ManaswibRane">ManaswibRane</a>.</li>
  <li>Self-hosted instances can now set their own license text (or none) via environment variable thanks to <a href="https://github.com/opbot-xd">Krishna Awasthi</a>.</li>
  <li>The monitoring jobs can now send alerts via ntfy thanks to <a href="https://github.com/HARSHVARANDANI">Varandani Harsh Pramod</a>.</li>
  <li>GreedyBear now extracts and tracks Tor exit nodes as a dedicated data source thanks to <a href="https://github.com/Sumit-ai-dev">Sumit Das</a>.</li>
  <li>And a lot of additional stuff happened under the hood. Thank you <a href="https://github.com/shivraj1182">Shivraj Suman</a>, <a href="https://github.com/srijan2607">Srijan</a>, <a href="https://github.com/amishhaa">Amisha Chhajed</a>, <a href="https://github.com/RaviTeja799">Ravi Teja Bhagavatula</a> and <a href="https://github.com/Eshaan-byte">Eshaan Gupta</a>.</li>
</ul>]]></content><author><name>Tim Leonhard</name></author><summary type="html"><![CDATA[Over the last months some new contributors helped us to implement a lot of new stuff in GreedyBear. Because of the huge number of new features and a different structure of the Feeds API responses, we are releasing a new major version in the next week. Breaking changes Feeds API responses do not contain the fields “honeypots”, “cowrie” and “log4j” anymore. Log4Pot-specific data handling was removed, because the honeypot is not that relevant anymore. The possibility to use legacy extraction with an 11 minute time window has been removed. The LEGACY_EXTRACTION switch in the env_file will be ignored.]]></summary></entry><entry><title type="html">GreedyBear version 2.0 released</title><link href="https://greedybear-project.github.io/greedybear_v2_release/" rel="alternate" type="text/html" title="GreedyBear version 2.0 released" /><published>2025-10-03T00:00:00+00:00</published><updated>2025-10-03T00:00:00+00:00</updated><id>https://greedybear-project.github.io/greedybear_v2_release</id><content type="html" xml:base="https://greedybear-project.github.io/greedybear_v2_release/"><![CDATA[<p>Almost four years have passed since the GreedyBear launch in 2021. Much has changed since then, and some of the underlying technologies require an update. That’s why we are releasing a new major version of GreedyBear which comes with the most current versions of Django (5.2) and PostgreSQL (18). These changes will ensure our project remains greedy and up-to-date for years to come but require some manual intervention. You can find a detailed upgrade guide <a href="https://intelowlproject.github.io/docs/GreedyBear/UpgradeToV2/">here</a>.</p>]]></content><author><name>Tim Leonhard</name></author><summary type="html"><![CDATA[Almost four years have passed since the GreedyBear launch in 2021. Much has changed since then, and some of the underlying technologies require an update. 
That’s why we are releasing a new major version of GreedyBear which comes with the most current versions of Django (5.2) and PostgreSQL (18). These changes will ensure our project remains greedy and up-to-date for years to come but require some manual intervention. You can find a detailed upgrade guide here.]]></summary></entry><entry><title type="html">Improvements to GreedyBear</title><link href="https://greedybear-project.github.io/improvements_to_greedybear/" rel="alternate" type="text/html" title="Improvements to GreedyBear" /><published>2025-05-28T00:00:00+00:00</published><updated>2025-05-28T00:00:00+00:00</updated><id>https://greedybear-project.github.io/improvements_to_greedybear</id><content type="html" xml:base="https://greedybear-project.github.io/improvements_to_greedybear/"><![CDATA[<p>Over the past few months I wrote my Master’s thesis about improving threat intelligence generated from honeypot data. For this purpose I made some changes to the <a href="https://github.com/intelowlproject/GreedyBear/">GreedyBear</a> project from Matteo Lodi, who greatly supported my coding work.</p>

<h3 id="new-feeds">New feeds</h3>
<p>The core of my work is the development and comparison of scoring models which try to predict future honeypot interactions. As a result of this comparison, two of these models were integrated into GreedyBear and already do their work on the <a href="https://greedybear.honeynet.org/">Honeynet instance</a>:</p>

<p>The first model is a Random Forest classifier, a machine learning model that predicts binary events. In our case, for each known IP address it estimates the probability that this IP address will hit any honeypot in the next 24 hours. GreedyBear now offers a <a href="https://greedybear.honeynet.org/api/feeds/all/all/likely_to_recur.json">feed</a> that orders its entries by that probability such that the most likely IP addresses to reoccur are at the top of the list.</p>

<p>The second model, a Random Forest regressor, predicts the number of honeypot hits that we can expect from an IP address in the next 24 hours. Analogous to the “likely to reoccur” feed from the classifier model, GreedyBear now also offers the “most expected hits” <a href="https://greedybear.honeynet.org/api/feeds/all/all/most_expected_hits.json">feed</a> which is based on the prediction of the regressor model.</p>

<p>Both predictions, along with some other new information, are also included in every JSON-based GreedyBear feed. For details about the different feeds and their contents, please refer to the <a href="https://intelowlproject.github.io/docs/GreedyBear/Usage/">documentation</a>.</p>

<h3 id="command-sequences">Command sequences</h3>
<p>The Cowrie honeypot records the sequence of commands which an attacker executes during an SSH session. These command sequences, and their relation to the IP addresses which executed them, are now also extracted and stored by GreedyBear. The new <a href="https://intelowlproject.github.io/docs/GreedyBear/Usage/#command-sequence">command sequence API</a> supports two kinds of requests:</p>
<ul>
  <li>You can send an IP address and receive every command sequence which was executed by this address.</li>
  <li>You can send a SHA256 hash of a (correctly formatted) command sequence and receive every IP address that executed this sequence.</li>
</ul>

<p>In addition, there is a clustering feature which groups similar command sequences together, allowing for a “fuzzy” search using the ‘include_similar’ query parameter. If this parameter is used, the result will also contain IP addresses that executed commands similar to the one requested. In my testing, this feature allowed me to attribute more than 2000 IP addresses to the ‘mdrfckr’ botnet on my personal instance of GreedyBear. On the Honeynet instance, the clustering feature is currently not activated, as it is very resource hungry. I’ll try to make it more efficient soon(ish). :)</p>

<p>If you are interested in reading into my full thesis, you can find it <a href="https://fx-tm.de/Master_Thesis_TimLeonhard_final.pdf">here</a>. If you want to get in touch, you can find me on <a href="https://23.social/@tim">Mastodon</a>.</p>]]></content><author><name>Tim Leonhard</name></author><summary type="html"><![CDATA[Over the past few months I wrote my Master’s thesis about improving threat intelligence generated from honeypot data. For this purpose I made some changes to the GreedyBear project from Matteo Lodi, who greatly supported my coding work.]]></summary></entry><entry><title type="html">Presenting GreedyBear</title><link href="https://greedybear-project.github.io/presenting_greedybear/" rel="alternate" type="text/html" title="Presenting GreedyBear" /><published>2023-07-20T00:00:00+00:00</published><updated>2023-07-20T00:00:00+00:00</updated><id>https://greedybear-project.github.io/presenting_greedybear</id><content type="html" xml:base="https://greedybear-project.github.io/presenting_greedybear/"><![CDATA[<p><a href="https://github.com/intelowlproject/GreedyBear">GreedyBear</a> is a tool that was created mainly to help to extract Indicators of Compromise from one or more available <a href="https://github.com/telekom-security/tpotce">TPOTs</a>. For those who do not know this tool, we are talking about the most popular all-in-one honeypot available in the community.
While T-Pot is great at allowing a fast, easy and reliable installation and collection of data, it struggles to organize that data in a way that lets it be easily collected and disseminated. This is where GreedyBear comes in and becomes the Threat Intelligence Platform for T-Pot.</p>

<p>Started as a personal Christmas project of <a href="https://twitter.com/matte_lodi">Matteo Lodi</a>, GreedyBear has since been improved mainly thanks to the efforts of the Certego Threat Intelligence Team.</p>

<p>It has evolved into a fully operational web application that provides convenient ways to explore and search the extracted data, and a fully fledged REST API to extract it programmatically.</p>

<p>Thanks to the efforts of The Honeynet Project, we have a public site which allows us to share the data collected from the TPOTs of this organization. Check the official site <a href="https://greedybear.honeynet.org/">here</a>!</p>

<p>Happy hunting!</p>]]></content><author><name>Matteo Lodi</name></author><summary type="html"><![CDATA[GreedyBear is a tool that was created mainly to help to extract Indicators of Compromise from one or more available TPOTs. For those who do not know this tool, we are talking about the most popular all-in-one honeypot available in the community. While the T-POT is great in allowing a fast, easy and reliable installation and collection of data, it struggles in organizing that data in a way that they can be easily collected and disseminated. This is where GreedyBear comes in and becomes the Threat Intelligence Platform for the TPOT.]]></summary></entry></feed>