Merge dev branch into master #1

Merged
merged 38 commits into from
Jun 12, 2024
38 commits
594c641
fix: blob showcnt
hrz6976 Feb 26, 2024
978eed9
perf: fix hexstrify takes too long
hrz6976 Jun 3, 2024
b35eea5
chore: enable line profiling
hrz6976 Jun 3, 2024
223cb44
feat: add timer logs
hrz6976 Jun 3, 2024
1eeebcc
chore: add file exists check
hrz6976 Jun 3, 2024
f962bdc
feat: detect encoding when utf8 fails
hrz6976 Jun 3, 2024
0fa7f7c
feat: add perl-like shell api
hrz6976 Jun 3, 2024
5b1baf6
refactor: decouple woc.tch and woc.local
hrz6976 Jun 4, 2024
e56c4bc
test: add tests for woc.tch
hrz6976 Jun 4, 2024
c0a3757
test: fixture generation -> py3
hrz6976 Jun 4, 2024
8c70395
test: add test cases for woc.local
hrz6976 Jun 4, 2024
cabcaa9
build: toolchain upgrade
hrz6976 Jun 5, 2024
eeb6349
fix: test_tch with poetry
hrz6976 Jun 5, 2024
d9fc436
ci: add gh actions unit tests
hrz6976 Jun 5, 2024
b70703c
test: add coverage
hrz6976 Jun 5, 2024
4862c2d
fix: pyi compatibility with py38
hrz6976 Jun 5, 2024
e76a6fc
refactor: extract get_pos & parse_commit
hrz6976 Jun 5, 2024
df5462d
feat: add profile option to cli & add tests
hrz6976 Jun 5, 2024
edb8ebf
refactor: extract apis & rework parsers in cython
hrz6976 Jun 6, 2024
f0bd65a
feat: add objects api
hrz6976 Jun 6, 2024
ade23ab
test: add tests for objects api
hrz6976 Jun 6, 2024
f69d242
feat: port util methods from oscar.py
hrz6976 Jun 6, 2024
30477c6
feat: get_values support for (o)bb2cf & c2fbb
hrz6976 Jun 8, 2024
86bc8df
feat: add c2fbb, (o)bb2cf and other maps to oop api
hrz6976 Jun 8, 2024
8b8410b
refactor: TCHashDB(path) from bytes to str
hrz6976 Jun 9, 2024
790b845
feat: count the number of keys
hrz6976 Jun 9, 2024
bffcd9b
docs: add pdoc and ruff
hrz6976 Jun 10, 2024
a2d8bc5
build: add pre-commit config
hrz6976 Jun 11, 2024
888b099
style: format sources
hrz6976 Jun 11, 2024
8be2466
chore: cleanup unused assets
hrz6976 Jun 11, 2024
66f7baa
feat: expose WocMaps.maps / objects
hrz6976 Jun 11, 2024
a2038ad
docs: update README
hrz6976 Jun 11, 2024
88b5289
fix: handle decompress returns None
hrz6976 Jun 11, 2024
e11982b
style: add conventional commits checker
hrz6976 Jun 11, 2024
808eb51
revert: remove bb2cf quirk
hrz6976 Jun 12, 2024
4c2bb5a
docs: update contributing.md
hrz6976 Jun 12, 2024
44305bf
docs: add favicon, drop large images
hrz6976 Jun 12, 2024
2c28904
ci: add build wheel pipeline
hrz6976 Jun 12, 2024
51 changes: 51 additions & 0 deletions .github/workflows/build-wheel.yml
@@ -0,0 +1,51 @@
name: Run unit tests on every push

on: [push]

jobs:
  build:
    name: Build wheels for Python ${{ matrix.python-version }}
    runs-on: ubuntu-latest
    if: github.ref_name == github.event.repository.default_branch

    strategy:
      matrix:
        python-version: ["3.8", "3.9", "3.10", "3.11"]

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Install Poetry
        run: |
          PIPX_BIN_DIR=/usr/local/bin pipx install poetry

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
          cache: poetry
          cache-dependency-path: poetry.lock

      - name: Set Poetry environment
        run: |
          poetry env use ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          poetry install

      - name: Build wheels
        run: |
          poetry build

      - name: Upload wheels
        uses: actions/upload-artifact@v2
        with:
          path: dist/*.whl

      - name: Upload source distribution
        uses: actions/upload-artifact@v2
        if: matrix.python-version == '3.8'
        with:
          path: dist/*.tar.gz
File renamed without changes.
38 changes: 20 additions & 18 deletions .github/workflows/test.yml
@@ -5,33 +5,35 @@ on: [push, pull_request]
 jobs:
   test:
     name: Python ${{ matrix.python-version }} tests
-    runs-on: ubuntu-20.04
+    runs-on: ubuntu-latest

     strategy:
       matrix:
-        python-version: [3.6, 3.8]
+        python-version: ["3.8", "3.9", "3.10", "3.11"]

     steps:
-      - uses: actions/checkout@v2
+      - name: Checkout
+        uses: actions/checkout@v3

-      - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v1
+      - name: Install Poetry
+        run: |
+          PIPX_BIN_DIR=/usr/local/bin pipx install poetry
+
+      - name: Set up Python
+        uses: actions/setup-python@v4
         with:
           python-version: ${{ matrix.python-version }}
+          cache: poetry
+          cache-dependency-path: poetry.lock

-      - name: Cache pip
-        uses: actions/cache@v1
-        with:
-          path: ~/.cache/pip
-          key: ${{ runner.os }}-pip-${{ hashFiles('setup.py') }}
-          restore-keys: |
-            ${{ runner.os }}-pip-
+      - name: Set Poetry environment
+        run: |
+          poetry env use ${{ matrix.python-version }}

-      - name: Install dependencies (Python ${{ matrix.python-version }})
+      - name: Install dependencies
         run: |
-          python -m pip install --upgrade pip
-          pip install typing cython setuptools>=18.0
-          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
+          poetry install

-      - name: Run tests on Python ${{ matrix.python-version }}
-        run: make test_local
+      - name: Run tests
+        run: |
+          poetry run pytest -v
43 changes: 43 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,43 @@
ci:
  skip: [pytest]

default_language_version:
  python: python3.8

repos:
  # ruff: linting + formatting
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4
    hooks:
      - id: ruff
        args: ["--fix"]
      - id: ruff-format

  # pytest: testing
  - repo: local
    hooks:
      - id: pytest
        name: pytest
        entry: poetry run pytest
        language: system
        types: [python]
        pass_filenames: false

  # enforce conventional commit messages
  - repo: https://github.com/compilerla/conventional-pre-commit
    rev: v3.2.0
    hooks:
      - id: conventional-pre-commit
        stages: [commit-msg]
        args: []

  # # skip poetry check for now, it's large and slow
  # # poetry: check lock and generate requirements.txt
  # - repo: https://github.com/python-poetry/poetry
  #   rev: 1.8.3
  #   hooks:
  #     - id: poetry-check
  #       args: ["--lock"]
  #     - id: poetry-export
  #       args: ["-f", "requirements.txt", "--with", "build", "--output", "requirements.txt"]
  #       verbose: true
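
The `conventional-pre-commit` hook configured above rejects commit subjects that do not follow the Conventional Commits shape (`type(scope): description`). The check can be approximated as below. This is a rough sketch for illustration only: the type list and the regular expression are assumptions, not the hook's actual implementation, and `is_conventional` is a hypothetical helper.

```python
import re

# Commit types seen in this pull request plus a few common ones (assumed list).
TYPES = (
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
)

# <type>, optional (scope), optional "!", then ": " and a non-empty description.
PATTERN = re.compile(r"^(?:%s)(?:\([\w\-./]+\))?!?: \S.*" % "|".join(TYPES))


def is_conventional(subject: str) -> bool:
    """Return True if the commit subject line matches the simplified pattern."""
    return PATTERN.match(subject) is not None


for subject in ("fix: blob showcnt", "Merge dev branch into master"):
    print(subject, "->", is_conventional(subject))
```

Subjects such as `fix: blob showcnt` from the commit list pass, while a free-form title like the merge title does not.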
1 change: 0 additions & 1 deletion MANIFEST.in

This file was deleted.

33 changes: 0 additions & 33 deletions Makefile

This file was deleted.

138 changes: 122 additions & 16 deletions README.md
@@ -1,26 +1,132 @@
# Python interface for OSCAR data
# python-woc

**python-woc** is the Python interface to the World of Code (WoC) data.
It succeeds the [oscar.py](https://ssc-oscar.github.io/oscar.py) project and is hundreds of times faster than invoking the [lookup](https://github.com/ssc-oscar/lookup) scripts via subprocess.

This is a convenience library to access World of Code data
(WoC; it was referred internally as oscar while development, hence the name).
Since everything is stored in local files it won't work unless you have access
to one of the WoC servers.
## Requirements

### Installation
- Linux with a GNU toolchain (only tested on x86_64, Ubuntu / CentOS)

Normally it is preinstalled on WoC servers. To install manually,
e.g. to a virtual environment not using system packages, just use:
- Python 3.8 or later

```shell
python3 setup.py build_ext
python3 setup.py install --user
## Install python-woc

### From PyPI

The latest version of `python-woc` is available on PyPI and can be installed using `pip`:

```bash
pip3 install python-woc
```

### From Source

To try out latest features, you may install python-woc from source:

```bash
git clone https://github.com/ssc-oscar/python-woc.git
cd python-woc
python3 -m pip install -r requirements.txt
python3
```

## Generate Profiles

One of the major improvements in python-woc is profiles. A profile tells the driver which versions of which maps are available, decoupling the driver from the folder structure of the data. This lets the driver work with multiple versions of WoC, on a different machine, or even in the cloud.

Profiles are generated using the `woc.detect` script. The script takes a list of directories, scans for matched filenames, and generates a profile:

```bash
python3 -m woc.detect /path/to/woc/1 /path/to/woc/2 ... > wocprofile.json
```

By default, python-woc looks for the profile at `wocprofile.json`, `~/.wocprofile.json`, and `/etc/wocprofile.json`.
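
The default lookup described above can be sketched as follows. This is a minimal illustration rather than python-woc's own code: `find_profile`, `load_profile`, and `DEFAULT_PROFILE_PATHS` are hypothetical names, and the search order is assumed to follow the order in which the paths are listed.

```python
import json
import os

# Candidate locations, in the order listed above (assumed to be the search order).
DEFAULT_PROFILE_PATHS = (
    "wocprofile.json",
    "~/.wocprofile.json",
    "/etc/wocprofile.json",
)


def find_profile(candidates=DEFAULT_PROFILE_PATHS):
    """Return the first candidate path that exists as a file, or None."""
    for candidate in candidates:
        path = os.path.expanduser(candidate)
        if os.path.isfile(path):
            return path
    return None


def load_profile(candidates=DEFAULT_PROFILE_PATHS):
    """Parse the first profile found; raise FileNotFoundError if none exists."""
    path = find_profile(candidates)
    if path is None:
        raise FileNotFoundError("no profile found in: " + ", ".join(candidates))
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Passing an explicit list of candidates makes the same helper usable for a profile stored elsewhere.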

## Use CLI

python-woc's CLI is a drop-in replacement for the `getValues` and `showCnt` Perl scripts. We expect existing scripts to work just as well with the following aliases:

```bash
alias getValues='python3 -m woc.get_values'
alias showCnt='python3 -m woc.show_content'
```

Installing from sources requires extra tools to compile (cython,
manylinux docker image etc), but still possible. Refer to the
[Build page](https://ssc-oscar.github.io/oscar.py) in the reference.
The usage is the same as the original scripts, and the output should be identical:

```bash
# echo some_key | python3 -m woc.get_values some_map
> echo e4af89166a17785c1d741b8b1d5775f3223f510f | showCnt commit 3
tree f1b66dcca490b5c4455af319bc961a34f69c72c2
parent c19ff598808b181f1ab2383ff0214520cb3ec659
author Audris Mockus <[email protected]> 1410029988 -0400
committer Audris Mockus <[email protected]> 1410029988 -0400

News for Sep 5
```

### Reference
You may find more examples in the [lookup](https://github.com/ssc-oscar/lookup#ov-readme) repository.
If you find any incompatibilities, please [submit an issue report](https://github.com/ssc-oscar/python-woc/issues/new).

Please see <https://ssc-oscar.github.io/oscar.py> for the full reference.
## Use Python API

The Python API is designed to eliminate the overhead of invoking the Perl scripts via subprocess. It is also more idiomatic Python and provides a more intuitive interface.

With a `wocprofile.json`, you can create a `WocMapsLocal` object and access the maps in the file system:

```python
>>> from woc.local import WocMapsLocal
>>> woc = WocMapsLocal()
>>> woc.maps
{'p2c', 'a2b', 'c2ta', 'a2c', 'c2h', 'b2tac', 'a2p', 'a2f', 'c2pc', 'c2dat', 'b2c', 'P2p', 'P2c', 'c2b', 'f2b', 'b2f', 'c2p', 'P2A', 'b2fa', 'c2f', 'p2P', 'f2a', 'p2a', 'c2cc', 'f2c', 'c2r', 'b2P'}
```

To query the maps, you can use the `get_values` method:

```python
>>> woc.get_values("b2fa", "05fe634ca4c8386349ac519f899145c75fff4169")
('1410029988', 'Audris Mockus <[email protected]>', 'e4af89166a17785c1d741b8b1d5775f3223f510f')
>>> woc.get_values("c2b", "e4af89166a17785c1d741b8b1d5775f3223f510f")
['05fe634ca4c8386349ac519f899145c75fff4169']
>>> woc.get_values("b2tac", "05fe634ca4c8386349ac519f899145c75fff4169")
[('1410029988', 'Audris Mockus <[email protected]>', 'e4af89166a17785c1d741b8b1d5775f3223f510f')]
```
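
The three calls above return three different shapes: a single tuple, a list of strings, and a list of tuples. Code that handles several maps may want one uniform shape. The helper below is a small sketch of that idea; `as_records` is a hypothetical function, not part of python-woc, and the author string in the sample data is a placeholder.

```python
def as_records(result):
    """Normalize a get_values-style result into a list of tuples.

    Handles the shapes shown in the examples above: a bare string,
    a single tuple, a list of strings, or a list of tuples.
    """
    if isinstance(result, str):
        return [(result,)]
    if isinstance(result, tuple):
        return [result]
    return [item if isinstance(item, tuple) else (item,) for item in result]


# Sample values shaped like the output above; the author is a placeholder.
single = (
    "1410029988",
    "Author Name <author@example.com>",
    "e4af89166a17785c1d741b8b1d5775f3223f510f",
)
hashes = ["05fe634ca4c8386349ac519f899145c75fff4169"]

print(as_records(single))
print(as_records(hashes))
```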

Use `show_content` to get the content of a blob, a commit, or a tree:

```python
>>> woc.show_content("tree", "f1b66dcca490b5c4455af319bc961a34f69c72c2")
[('100644', 'README.md', '05fe634ca4c8386349ac519f899145c75fff4169'), ('100644', 'course.pdf', 'dfcd0359bfb5140b096f69d5fad3c7066f101389')]
>>> woc.show_content("commit", "e4af89166a17785c1d741b8b1d5775f3223f510f")
('f1b66dcca490b5c4455af319bc961a34f69c72c2', ('c19ff598808b181f1ab2383ff0214520cb3ec659',), ('Audris Mockus <[email protected]>', '1410029988', '-0400'), ('Audris Mockus <[email protected]>', '1410029988', '-0400'), 'News for Sep 5')
>>> woc.show_content("blob", "05fe634ca4c8386349ac519f899145c75fff4169")
'# Syllabus for "Fundamentals of Digital Archeology"\n\n## News\n\n* ...'
```

Note that the function yields different types for different maps. Please refer to the [documentation](https://ssc-oscar.github.io/python-woc) for details.
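
As an illustration of working with these return types, the commit tuple shown above can be unpacked into named fields. This is a sketch under assumptions: `CommitInfo` and `unpack_commit` are hypothetical names, the tuple layout is taken from the example output rather than from the API reference, and the author string is a placeholder.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple, Tuple


class CommitInfo(NamedTuple):
    tree: str
    parents: Tuple[str, ...]
    author: str
    authored_at: datetime
    committer: str
    committed_at: datetime
    message: str


def _parse_time(epoch: str, offset: str) -> datetime:
    """Combine a Unix timestamp and a +HHMM/-HHMM offset into an aware datetime."""
    sign = -1 if offset.startswith("-") else 1
    digits = offset.lstrip("+-")
    delta = timedelta(hours=int(digits[:2]), minutes=int(digits[2:4]))
    return datetime.fromtimestamp(int(epoch), tz=timezone(sign * delta))


def unpack_commit(raw) -> CommitInfo:
    """Turn (tree, parents, author triple, committer triple, message) into CommitInfo."""
    tree, parents, (author, a_time, a_tz), (committer, c_time, c_tz), message = raw
    return CommitInfo(
        tree, tuple(parents),
        author, _parse_time(a_time, a_tz),
        committer, _parse_time(c_time, c_tz),
        message,
    )


raw = (
    "f1b66dcca490b5c4455af319bc961a34f69c72c2",
    ("c19ff598808b181f1ab2383ff0214520cb3ec659",),
    ("Author Name <author@example.com>", "1410029988", "-0400"),
    ("Author Name <author@example.com>", "1410029988", "-0400"),
    "News for Sep 5",
)
info = unpack_commit(raw)
print(info.tree, info.authored_at.isoformat(), info.message)
```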

## Use Python Objects API

The objects API provides a more intuitive way to access the WoC data.
Note that the objects API is not a replacement for [oscar.py](https://ssc-oscar.github.io/oscar.py), even though it looks much the same: many of the methods have had their signatures changed and have been refactored to be more consistent, intuitive, and performant. Query results are cached, so you can access the same object multiple times without additional overhead.
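
To show what cached query results mean in practice, here is a generic sketch using `functools.lru_cache`. It does not reflect how python-woc implements its cache; `CountingBackend` and `tree_of` are hypothetical stand-ins for a slow data source and a lookup method.

```python
from functools import lru_cache


class CountingBackend:
    """Stand-in for a slow data source; counts how many lookups reach it."""

    def __init__(self, data):
        self._data = data
        self.calls = 0

    def get(self, key):
        self.calls += 1
        return self._data[key]


backend = CountingBackend({"c1": "tree-of-c1"})


@lru_cache(maxsize=None)
def tree_of(commit_key):
    """Look up a commit's tree; repeated calls with the same key hit the cache."""
    return backend.get(commit_key)


first = tree_of("c1")
second = tree_of("c1")
print(first, second, backend.calls)
```

The second call returns the stored value without reaching the backend, which is the behaviour the paragraph above describes.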

Call `init_woc_objects` to initialize the objects API with a WoC instance:

```python
from woc.local import WocMapsLocal
from woc.objects import init_woc_objects
woc = WocMapsLocal()
init_woc_objects(woc)
```

To get the tree of a commit:

```python
>>> from woc.objects import Commit
>>> c1 = Commit("91f4da4c173e41ffbf0d9ecbe2f07f3a3296933c")
>>> c1.tree
Tree(836f04d5b374033b1608269e2f3aaabae263a0db)
>>> c1.projects[0].url
'https://github.com/woc-hack/thebridge'
```

For more, check `woc.objects` in the documentation.
2 changes: 2 additions & 0 deletions docs/.gitignore
@@ -0,0 +1,2 @@
*.html
*.js
2 changes: 0 additions & 2 deletions docs/DataFormat.md
@@ -1,5 +1,3 @@


## Git objects

### Sequential access:
18 changes: 0 additions & 18 deletions docs/_static/custom.css

This file was deleted.
