add tensorflow example (#42)
* first version from google docs

* in-progress effort on tensorflow. submitting for visibility of status

* accuracy tweak

* fix description

* small tweak

* save the link id

* fixing parsing of json article

* adding in rapidApi integration

* final working script, with minimal API usage

* final README.md

* fix formatting

* fix formatting

* removing duplicate import

* Update README.md

* adding line to main import

* fix url

Co-authored-by: margaretkennedy <[email protected]>
rachelmbrubaker and margaretkennedy authored Sep 23, 2021
1 parent a43729f commit c9e1dc2
Showing 4 changed files with 403 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -16,6 +16,8 @@ The following folders can be found in this repository:
- **[`metriccentury`](https://github.com/deephaven/examples/tree/main/metriccentury)** - Data recorded from a 100 km bike ride
- **[`pems`](https://pems.dot.ca.gov/)** - Traffic flow data collected near Davis, CA.
- **[`taxi`](https://azure.microsoft.com/en-us/services/open-datasets/catalog/nyc-taxi-limousine-commission-yellow-taxi-trip-records/)** - Yellow Taxi trip records
- **[`tensorflow`](https://www.tensorflow.org/)** - Statistically calculate positive/negative sentiment using machine-learning
training mechanisms based on an RSS feed from Seeking Alpha.
- **[`fit`](https://www.strava.com/)** - Workout results in the proprietary fit format developed by Garmin. Downloadable from Strava.
- **[`tickingHeartRate`]** - Simulated ticking heart rate data.

36 changes: 36 additions & 0 deletions tensorflow/README.md
@@ -0,0 +1,36 @@
# TensorFlow example using data from Seeking Alpha

Pull an RSS feed from Seeking Alpha, and statistically calculate positive/negative sentiment using machine-learning
training mechanisms.

## Table of contents

* `tensorflow.py` - Python script to run.
* `trainData.csv` - The input data used to train the sentiment model.

## Steps to run

1. Install Python modules:
`docker exec $(basename $(pwd))_grpc-api_1 pip install tensorflow tensorflow_hub sklearn spacy bs4 lxml`
Note: use this exact install command rather than the variations described in
[How to install Python packages](https://deephaven.io/core/docs/how-to-guides/install-python-packages).
The lxml installation is somewhat fragile: bs4 does not always detect that lxml has been installed.
See <https://github.com/deephaven/deephaven-core/discussions/1299> for more information.
1. Install the spaCy English model:
`docker exec $(basename $(pwd))_grpc-api_1 pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz`
Alternatively, use another version from here:
<https://github.com/explosion/spacy-models/releases/>
1. Drag/drop the file `trainData.csv` onto the Deephaven console.
1. Create a free account at <https://rapidapi.com/developer> and subscribe to <https://rapidapi.com/apidojo/api/seeking-alpha/>.
* Note that every run of the script consumes some of your API quota for this particular
endpoint. Usage is kept minimal: the script makes a single API call for each article that Seeking Alpha advertises
on any one day (tracked in the `knownLinks[]` variable within the script). However, to allow repeated iterations for
debugging and troubleshooting, all variables are reset on a new script run, so each run requires another round of
API calls.
* The number of API calls per day is usually small (~5-30), so provided the script is run only once per day, the free
tier of 500 calls/month should be adequate for demonstration purposes.
* API call usage can be seen here: <https://rapidapi.com/developer/dashboard>
1. Open any of the endpoint examples, then copy and **save** your unique API key. It is called `x-rapidapi-key`.
1. Import your key into Deephaven by running:
`ra_sa_key='enter-your-key-here'` (avoiding any additional space/quote characters)
1. Run `tensorflow.py`.
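
The API quota note above says each advertised article is fetched only once per run, tracked through `knownLinks[]`. The following is a minimal sketch of that kind of bookkeeping, not the code in `tensorflow.py`. The function `fetch_new_articles`, the stand-in `fake_fetch`, and the use of a `set` named `known_links` are all assumptions made for illustration; no real RapidAPI call is made.

```python
from typing import Callable, Dict, Iterable, List, Set


def fetch_new_articles(
    links: Iterable[str],
    known_links: Set[str],
    fetch: Callable[[str], str],
) -> Dict[str, str]:
    """Call fetch(link) only for links not already seen in this run."""
    articles: Dict[str, str] = {}
    for link in links:
        if link in known_links:
            continue  # already fetched once; do not spend another API call
        articles[link] = fetch(link)
        known_links.add(link)
    return articles


# Hypothetical stand-in for the paid API call; it only records that it was called.
api_calls: List[str] = []


def fake_fetch(link: str) -> str:
    api_calls.append(link)
    return f"body of {link}"


known_links: Set[str] = set()  # starts empty on every script run
first_poll = fetch_new_articles(["a1", "a2"], known_links, fake_fetch)
second_poll = fetch_new_articles(["a1", "a2", "a3"], known_links, fake_fetch)

print(len(api_calls))     # 3
print(list(second_poll))  # ['a3']
```

Because `known_links` starts empty each time the script is run, every run repeats the calls for that day's articles, which is why running once per day keeps usage low.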