Skip to content

Latest commit

 

History

History
executable file
·
86 lines (51 loc) · 2.96 KB

README.md

File metadata and controls

executable file
·
86 lines (51 loc) · 2.96 KB

Funnel

Funnel is a lightweight yara-based feed scraper. Give a list of inputs and it will check them. Put it in a crontab and it will regularly update the database. If the article gets matched to the yara rule, it will be put into the database. All matched results get put into an sqlite database, with the rule it flagged.

Installation

Install your required dependencies and you're good to go.

pip3 install -r requirements.txt

Usage:

Funnel.py [-h] [-v] [-u] rule_path target_path

positional arguments:
  rule_path      path to directory of rules used on list of feeds
  target_path    path to sources list or url

optional arguments:
  -h, --help     show this help message and exit
  -v, --verbose  increase output verbosity
  -u, --url      scan one url instead of using sources list

Example:

You want to get every new post on the internet that has your name or personal info in it. You would use as many sources as possible,and fill out the personal_info.yar rule.

Schedule this command to run regularly using crontab:

python3 Funnel.py rules/personal/ sources/sources-large.json

Want to scan just one url to see if it matches against any of your rule set?

python3 Funnel.py -u rules/ https://www.bbc.com/news/world-asia-47844000

A bar that wants all the newest margharita recipes? You could do that. Every single post about a politician, for a data visualization project on how much each person is talked about? Works too! Just add rules and sources.

Sources

The sources should be in a json file, with a url and a title for each source in the list. Here is a barebones example:

{
    "sources-rss":[
        {
                "url": "https://www.reddit.com/r/netsec/.rss",
                "title": "netsec subreddit"
        },
        {
                "url": "https://www.reddit.com/r/malware/.rss",
                "title": "malware subreddit"
        }

    ]

}

Tip: Extract sources from feedly by using the opml_to_json.py file in the util folder to turn your exported feedly opml file into a valid sources file

Rules

Some sample rules have been provided in the rules folder. Any standard yara rule will work, it is always being compared on text content at this point, no file analysis yet. You can pass in either a directory of rules, a nested directory of rules, or just one rule.

Database

The database is in sqlite, and works with two tables. The first, is a table of links of matched articles, which have a unique id. The second table is a table of the matched rules with the matched article's id together. This keeps duplicates out of the links table, and makes for easy reference.

Contribute

Feel free to add your suggestions for what to add to this project, even better if you give me a pull request!

Inspired by ThreatIngestor from InQuest