0xfiending/honey-faucet-rs


honey-faucet-rs

DESIGN

Event-Driven Data Warehouse Creation for NLP and on-chain analytics, with a focus on NFT data.

A 'flow' is an ETL pipeline sequence that generates an output, in the form of either transformed data or analysis.
A 'flow_step' is a single operation on a set of data (copy, move, transform, ...).
The modular design provides a framework for building scalable, custom ETL pipelines.
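
As a rough illustration of the flow / flow_step relationship described above, here is a minimal sketch in Rust. The type names (`StepOp`, `FlowStep`, `Flow`), the `apply` function pointer, and the two example steps are assumptions made for this example and are not taken from the repository's code.

```rust
/// Kinds of operation a flow step may perform (from the list above).
#[derive(Debug, Clone, PartialEq)]
pub enum StepOp {
    Copy,
    Move,
    Transform,
}

/// One operation on a set of data. All fields are hypothetical.
pub struct FlowStep {
    pub name: String,
    pub op: StepOp,
    /// The work the step does; rows are plain strings in this sketch.
    pub apply: fn(Vec<String>) -> Vec<String>,
}

/// An ordered sequence of steps that produces an output.
pub struct Flow {
    pub name: String,
    pub steps: Vec<FlowStep>,
}

impl Flow {
    /// Run every step in order, feeding each step's output into the next.
    pub fn run(&self, input: Vec<String>) -> Vec<String> {
        self.steps
            .iter()
            .fold(input, |data, step| (step.apply)(data))
    }
}

/// Example transform: lowercase every row.
fn lowercase(rows: Vec<String>) -> Vec<String> {
    rows.into_iter().map(|r| r.to_lowercase()).collect()
}

/// Example transform: drop rows that are empty or whitespace only.
fn drop_empty(rows: Vec<String>) -> Vec<String> {
    rows.into_iter().filter(|r| !r.trim().is_empty()).collect()
}

/// Builds a two-step flow out of the example transforms.
pub fn example_flow() -> Flow {
    Flow {
        name: "example".to_string(),
        steps: vec![
            FlowStep {
                name: "lowercase".to_string(),
                op: StepOp::Transform,
                apply: lowercase,
            },
            FlowStep {
                name: "drop_empty".to_string(),
                op: StepOp::Transform,
                apply: drop_empty,
            },
        ],
    }
}
```

Calling `example_flow().run(rows)` passes the rows through each step in order; adding a step means pushing another `FlowStep` onto `steps`.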

CLI

The CLI tool queries specific subjects of interest before a pipeline is set up.
All currently supported actions call the Twitter API v2 endpoints.

Suggested Use (Topic Search):

  • The [Counts] command should be used to gauge how much data there is for a particular topic.
  • The [Recent] command should be used to view the first 100 results of raw tweet data for a particular topic.
  • At this point, if enough data is available for a particular topic, a flow can be set up for it.
  • Otherwise, one can do a deep dive using the [Tweet Lookup], [User Timeline], [Mentions Timeline], or [Users Lookup] commands.
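
The topic-search decision above can be sketched as follows. This assumes the [Counts] result has already been parsed into per-interval buckets; the `CountBucket` and `NextAction` types and the `min_tweets` threshold are hypothetical, and the right threshold depends on the analysis being planned.

```rust
/// One bucket of a counts result: tweets matched in one time slice.
/// (Hypothetical type; only the count is modelled here.)
pub struct CountBucket {
    pub tweet_count: u64,
}

/// What to do after looking at the counts for a topic.
#[derive(Debug, PartialEq)]
pub enum NextAction {
    /// Enough data: set up a flow for the topic.
    SetUpFlow,
    /// Not enough data: dig deeper with the lookup / timeline commands.
    DeepDive,
}

/// Total number of tweets across all buckets.
pub fn total_count(buckets: &[CountBucket]) -> u64 {
    buckets.iter().map(|b| b.tweet_count).sum()
}

/// Compare the total against a caller-chosen threshold.
pub fn next_action(buckets: &[CountBucket], min_tweets: u64) -> NextAction {
    if total_count(buckets) >= min_tweets {
        NextAction::SetUpFlow
    } else {
        NextAction::DeepDive
    }
}
```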

Suggested Use (Account R&D - Persons/Projects of Interest):

  • If the unique user name is known, then the user_id can be found using the [Users Lookup] command.
  • Next, the [User Timeline] or [Mentions Timeline] commands can be used to view a portion of the timeline data for a particular user.
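
For reference, the two-step account lookup above maps onto Twitter API v2 request URLs roughly as sketched below. The helper function names are hypothetical and this is not the repository's code; the sketch only builds URLs and sends no request (a bearer token would be needed for that). The 5 to 100 clamp on `max_results` reflects the documented timeline limits as far as I know and should be checked against the current API reference.

```rust
/// Base URL of the Twitter API v2.
const API_BASE: &str = "https://api.twitter.com/2";

/// URL that resolves a user name to a user object (including its id).
/// A leading '@' is stripped if present.
pub fn users_lookup_url(username: &str) -> String {
    format!(
        "{}/users/by/username/{}",
        API_BASE,
        username.trim_start_matches('@')
    )
}

/// URL for a page of tweets authored by `user_id`.
pub fn user_timeline_url(user_id: &str, max_results: u32) -> String {
    format!(
        "{}/users/{}/tweets?max_results={}",
        API_BASE,
        user_id,
        max_results.clamp(5, 100) // assumed per-request limits
    )
}

/// URL for a page of tweets mentioning `user_id`.
pub fn mentions_timeline_url(user_id: &str, max_results: u32) -> String {
    format!(
        "{}/users/{}/mentions?max_results={}",
        API_BASE,
        user_id,
        max_results.clamp(5, 100) // assumed per-request limits
    )
}
```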

[Images: Recent Command, Counts Command, Tweet Lookup Command, User Timeline Command, Users Lookup Command]

AUTOMATED PIPELINE EXECUTION

flow-controller - Will start and stop each day's jobs based on their configured cron schedules.
job-controller - Will schedule and execute the individual job steps.
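
To make the cron-driven part concrete, here is a deliberately simplified sketch of how a controller might decide which jobs are due. It is not the repository's implementation. It only understands a small subset of cron syntax ("*", "*/n", single numbers, and comma lists) and only checks the minute and hour fields of a five-field expression; the `Job` struct and all function names are hypothetical.

```rust
/// Returns true when one cron field matches `value`.
/// Supported subset: "*", "*/n", "a", and comma lists such as "a,b,c".
pub fn field_matches(field: &str, value: u32) -> bool {
    field.split(',').any(|part| {
        if part == "*" {
            true
        } else if let Some(step) = part.strip_prefix("*/") {
            match step.parse::<u32>() {
                Ok(n) if n > 0 => value % n == 0,
                _ => false, // malformed or zero step never matches
            }
        } else {
            part.parse::<u32>().map_or(false, |n| n == value)
        }
    })
}

/// Checks the minute and hour fields of a five-field cron expression.
/// Day-of-month, month, and day-of-week are ignored in this sketch.
pub fn is_due(expr: &str, hour: u32, minute: u32) -> bool {
    let fields: Vec<&str> = expr.split_whitespace().collect();
    if fields.len() != 5 {
        return false; // not a five-field expression
    }
    field_matches(fields[0], minute) && field_matches(fields[1], hour)
}

/// A job with a name and a cron schedule (hypothetical shape).
pub struct Job {
    pub name: String,
    pub schedule: String,
}

/// Names of the jobs whose schedule matches the given time of day.
pub fn due_jobs<'a>(jobs: &'a [Job], hour: u32, minute: u32) -> Vec<&'a str> {
    jobs.iter()
        .filter(|j| is_due(&j.schedule, hour, minute))
        .map(|j| j.name.as_str())
        .collect()
}
```

A real controller would more likely rely on a maintained cron-parsing crate; the hand-rolled matcher here only keeps the example dependency-free.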

SUPPORTED FEATURES

nlp-recent-topic-land - This flow step pulls and lands recent-search data for a topic.
nlp-user-timeline-land - This flow step pulls and lands standard timeline data for a particular user.
nlp-topic-land - This flow step pulls and lands data for a topic over a specified date range. (WIP - R&D for v1.1 endpoint for archive search)

NOTES

*Design is subject to change as implementation progresses.
The methodology is agile; refactoring takes place after each feature is finished.

Current NFT Sentiment Analysis Design:
- Collect data for an initial Training and Test Set.
- Perform standard sentiment analysis on tweet text for a particular topic.
- R&D to include tweet impressions in analysis (likes + retweets + comments)
- Aggregate Persons of Interest as separate topics and include them for the analysis. (EX: Do Kwon for topic = "LUNA")
---- Avoid shitposters, anime pfps, and trolls
- Aggregate Projects of Interest as separate topics and include them for the analysis. (EX: Cyberkongz, Nansen.ai, LooksRare for topic = "NFT")
- R&D on analysis tuning
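
One possible way to fold the "impressions" figure (likes + retweets + comments) into the analysis is to weight each tweet's sentiment score by it. The sketch below is only an assumption about how that could work, not the design the project has settled on; the `ScoredTweet` type, the -1.0 to 1.0 score range, and the `1 + engagement` weighting are all hypothetical.

```rust
/// A tweet that already carries a sentiment score (hypothetical shape).
pub struct ScoredTweet {
    /// Sentiment in the range -1.0 (negative) to 1.0 (positive).
    pub sentiment: f64,
    pub likes: u64,
    pub retweets: u64,
    pub comments: u64,
}

impl ScoredTweet {
    /// likes + retweets + comments, as listed in the design notes.
    pub fn engagement(&self) -> u64 {
        self.likes + self.retweets + self.comments
    }
}

/// Engagement-weighted mean sentiment; None when there are no tweets.
/// Each tweet gets weight 1 + engagement so that tweets with zero
/// engagement still count.
pub fn weighted_sentiment(tweets: &[ScoredTweet]) -> Option<f64> {
    if tweets.is_empty() {
        return None;
    }
    let mut weighted_sum = 0.0;
    let mut total_weight = 0.0;
    for t in tweets {
        let w = 1.0 + t.engagement() as f64;
        weighted_sum += w * t.sentiment;
        total_weight += w;
    }
    Some(weighted_sum / total_weight)
}
```

Whether a linear weight is appropriate (as opposed to, say, a logarithmic one that dampens viral outliers) is exactly the kind of question the analysis-tuning R&D would need to answer.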

The current design will consist of a 4-step Flow.
- The first step will be called nlp-topic-land.
---- This step will pull data down from Twitter and save it in Parquet file format.
- The second step will remove duplicate tweets. (rm_dups)
- The third step will load a dataset into an ML pipeline in Spark to prepare the data.
- The fourth step will classify a dataset using various algorithms. (Naive Bayes, SVM, Logistic Regression, LSTM, etc.)
- The goal is to provide sentiment classification (positive / negative) for a particular topic over a particular interval (Past 24HR, Past 7D, Past 30D).
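
Two pieces of the flow above are simple enough to sketch without Spark or a model: removing duplicates (rm_dups) and summarising classifications per interval. The code below is an illustrative assumption, not the repository's implementation. The `Tweet` struct, its numeric `id`, its Unix-seconds `created_at`, and the `positive` label (standing in for the classifier's output) are all hypothetical.

```rust
use std::collections::HashSet;

/// A classified tweet (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Tweet {
    pub id: u64,
    /// Creation time in Unix seconds.
    pub created_at: u64,
    /// Output of the classification step: true = positive.
    pub positive: bool,
}

/// Keep the first occurrence of every tweet id, preserving order.
pub fn rm_dups(tweets: Vec<Tweet>) -> Vec<Tweet> {
    let mut seen = HashSet::new();
    // `insert` returns false when the id was already present.
    tweets.into_iter().filter(|t| seen.insert(t.id)).collect()
}

/// The reporting intervals named in the design notes.
#[derive(Clone, Copy)]
pub enum Interval {
    Past24Hr,
    Past7D,
    Past30D,
}

impl Interval {
    /// Length of the interval in seconds.
    pub fn seconds(self) -> u64 {
        const DAY: u64 = 24 * 60 * 60;
        match self {
            Interval::Past24Hr => DAY,
            Interval::Past7D => 7 * DAY,
            Interval::Past30D => 30 * DAY,
        }
    }
}

/// Share of positive tweets within the interval ending at `now`.
/// Returns None when no tweet falls inside the window.
pub fn positive_share(tweets: &[Tweet], now: u64, interval: Interval) -> Option<f64> {
    let cutoff = now.saturating_sub(interval.seconds());
    let in_window: Vec<&Tweet> = tweets
        .iter()
        .filter(|t| t.created_at >= cutoff && t.created_at <= now)
        .collect();
    if in_window.is_empty() {
        return None;
    }
    let positives = in_window.iter().filter(|t| t.positive).count();
    Some(positives as f64 / in_window.len() as f64)
}
```

Deduplicating before the window summary matters: a tweet landed twice by overlapping pulls would otherwise be counted twice in the share.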
