Merge pull request #6 from CMUSTRUDEL/dev
Dev
user2589 authored Jul 25, 2020
2 parents 1eaad7f + c3299f7 commit a3a38aa
Showing 20 changed files with 604 additions and 1,869 deletions.
39 changes: 39 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,39 @@
name: Semantic Release

on:
  push:
    branches: [ master ]

jobs:
  release:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2
      with:
        fetch-depth: 0

    - name: Python Semantic Release
      uses: relekang/[email protected]
      with:
        pypi_token: ${{ secrets.PYPI_TOKEN }}

  pages:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2
      with:
        ref: gh-pages
        path: docs/build/html
    - name: GitHub pages
      run: |
        python -m pip install --upgrade pip
        pip install sphinx sphinx-autobuild
        sphinx-build -M html "docs" "docs/build"
        git config user.name github-actions
        git config user.email [email protected]
        cd docs/build/html
        git add .
        git commit -m "github pages"
        git push
38 changes: 38 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,38 @@
name: Run unit tests on every push

on: push

jobs:
  test:
    name: Python ${{ matrix.python-version }} tests
    runs-on: ubuntu-latest

    strategy:
      matrix:
        python-version: [2.7, 3.6, 3.7, 3.8]

    steps:
    - uses: actions/checkout@v2

    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v1
      with:
        python-version: ${{ matrix.python-version }}

    - name: Cache pip
      uses: actions/cache@v1
      with:
        path: ~/.cache/pip
        key: ${{ runner.os }}-pip-${{ hashFiles('setup.py') }}
        restore-keys: |
          ${{ runner.os }}-pip-
    - name: Install dependencies (Python ${{ matrix.python-version }})
      run: |
        python -m pip install --upgrade pip
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Run tests on Python ${{ matrix.python-version }}
      env:
        GITHUB_API_TOKENS: ${{ secrets.GH_API_TOKENS }}
      run: make test
51 changes: 0 additions & 51 deletions .travis.yml

This file was deleted.

115 changes: 1 addition & 114 deletions README.md
@@ -1,115 +1,2 @@
# Python interface for code hosting platforms API

It is intended to facilitate research of Open Source projects.
At this point, it is basically functional but is missing:

- tests
- documentation
- good architecture

Feel free to contribute any of those.

### Installation

```bash
pip install --user --upgrade strudel.scraper
```


### Usage

```python
import stscraper as scraper
import pandas as pd

gh_api = scraper.GitHubAPI()
# so far only GitHub, Bitbucket and GitLab are supported
# bb_api = scraper.BitbucketAPI()
# gl_api = scraper.GitLabAPI()

# repo_issues is a generator that can be used
# to instantiate a pandas dataframe
issues = pd.DataFrame(gh_api.repo_issues('cmustrudel/strudel.scraper'))
```



### Settings

GitHub and GitLab APIs limit request rate for unauthenticated requests
(although GitLab limit is much more generous).
There are several ways to set your API keys, listed below in order of priority.

**Important note:** API objects are reused in subsequent calls.
The same keys used to instantiate the first API object will be used by
ALL other instances.

#### Class instantiation:

```python
import stscraper

gh_api = stscraper.GitHubAPI(tokens="comma-separated list of tokens")
```

#### At runtime:

```python
import stscraper
import stutils

# IMPORTANT: do this before creation of the first API object!
stutils.CONFIG['GITHUB_API_TOKENS'] = 'comma-separated list of tokens'
stutils.CONFIG['GITLAB_API_TOKENS'] = 'comma-separated list of tokens'

# any API instance created after this will use the provided tokens
gh_api = stscraper.GitHubAPI()
```

#### Settings file:

```
project root
\
|- my_module
| \- my_file.py
|- settings.py
```

```python
# settings.py

GITHUB_API_TOKENS = 'comma-separated list of tokens'
GITLAB_API_TOKENS = 'comma-separated list of tokens'
```

```python
# my_file.py
import stscraper

# keys from settings.py will be reused automatically
gh_api = stscraper.GitHubAPI()
```

#### Environment variable:


```bash
# somewhere in ~/.bashrc
export GITHUB_API_TOKENS='comma-separated list of tokens'
export GITLAB_API_TOKENS='comma-separated list of tokens'
```

```python
# somewhere in the code
import stscraper

# keys from environment variables will be reused automatically
gh_api = stscraper.GitHubAPI()
```


#### Hub config:

If you have [hub](https://github.com/github/hub) installed and everything else
fails, its configuration will be reused for GitHub API.
Please see https://cmustrudel.github.io/strudel.scraper/ for documentation.
Empty file removed docs/bitbucket.rst
Empty file.
Empty file removed docs/github.rst
Empty file.
Empty file removed docs/gitlab.rst
Empty file.
65 changes: 62 additions & 3 deletions docs/index.rst
@@ -5,7 +5,66 @@ Reference
.. toctree::
    :maxdepth: 2

.. py:module:: stscraper

`stscraper` is a Python interface for the GitHub API.

Key features:

- utilize multiple API keys to speed up scraping
- transparently handle pagination and minor network errors
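
The first feature, spreading requests over several API keys, can be pictured with a small rotation helper. This is only a sketch under stated assumptions, not how `stscraper` is actually implemented: `TokenPool`, `acquire` and `mark_exhausted` are hypothetical names, and rate-limit state is tracked locally instead of being read from GitHub response headers.

```python
import time


class TokenPool:
    """Hypothetical round-robin pool of API tokens (illustration only)."""

    def __init__(self, tokens):
        # Accept the comma-separated format used in the examples in this page.
        self.tokens = [t.strip() for t in tokens.split(",") if t.strip()]
        if not self.tokens:
            raise ValueError("at least one token is required")
        # Per-token timestamp (epoch seconds) before which it must not be used.
        self.blocked_until = {t: 0.0 for t in self.tokens}
        self._next = 0

    def mark_exhausted(self, token, reset_at):
        """Record that `token` is rate limited until `reset_at`."""
        self.blocked_until[token] = reset_at

    def acquire(self, now=None):
        """Return the next usable token, or None if all are rate limited."""
        now = time.time() if now is None else now
        for offset in range(len(self.tokens)):
            idx = (self._next + offset) % len(self.tokens)
            token = self.tokens[idx]
            if self.blocked_until[token] <= now:
                # Advance the cursor so the following call starts at the next token.
                self._next = (idx + 1) % len(self.tokens)
                return token
        return None
```

With a pool like this, a caller would pick a token per request and call `mark_exhausted` when the API reports that the limit was hit, so the remaining tokens keep the scrape going.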

Installation
------------

.. code-block:: bash

    pip install --user --upgrade strudel.scraper

Usage
-----

The main way to use this module is through :py:class:`GitHubAPI` objects.

.. code-block::

    import stscraper as scraper
    import pandas as pd

    gh_api = scraper.GitHubAPI("token1,token2,...")
    # repo_issues is a generator that can be used
    # to instantiate a pandas dataframe
    issues = pd.DataFrame(gh_api.repo_issues('cmustrudel/strudel.scraper'))

Tokens can be provided either at class instantiation or through an environment
variable:

.. code-block:: bash

    # somewhere in ~/.bashrc
    export GITHUB_API_TOKENS='comma-separated list of tokens'

.. code-block::

    # later, in some Python file:
    gh_api = scraper.GitHubAPI()  # tokens from the environment var will be used

If no keys were passed at class instantiation and the `GITHUB_API_TOKENS`
environment variable is not defined, `stscraper` will also check the
`GITHUB_TOKEN` environment variable. This variable is created by the GitHub
Actions runner and is also used by the `hub <https://github.com/github/hub>`_
utility.
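
The lookup order described above can be sketched as a small function. This is an illustration under assumptions, not `stscraper`'s own code: `resolve_tokens` is a hypothetical helper, and it assumes the first variable consulted is `GITHUB_API_TOKENS`, as in the export example, with `GITHUB_TOKEN` as the fallback.

```python
import os


def resolve_tokens(tokens=None, environ=None):
    """Return a list of tokens using the documented fallback order.

    Hypothetical helper for illustration; not part of the stscraper API.
    """
    environ = os.environ if environ is None else environ
    # 1. explicit argument, 2. GITHUB_API_TOKENS, 3. GITHUB_TOKEN
    raw = (
        tokens
        or environ.get("GITHUB_API_TOKENS")
        or environ.get("GITHUB_TOKEN")
        or ""
    )
    # Tokens are given as a comma-separated string; drop blanks and whitespace.
    return [t.strip() for t in raw.split(",") if t.strip()]
```

Passing the environment as a parameter keeps the helper easy to exercise in tests without modifying the real process environment.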

REST (v3) API
-------------

.. autoclass:: GitHubAPI
    :members:
    :exclude-members:

GraphQL (v4) API
----------------

.. autoclass:: GitHubAPIv4
    :members:

:doc:`github`
:doc:`gitlab`
:doc:`BitBucket`
26 changes: 0 additions & 26 deletions scripts/check_gh_limits.py

This file was deleted.
