Merge pull request #34 from UW-Macrostrat/auth-system
Macrostrat auth system module
davenquinn authored Oct 17, 2024
2 parents 0cca31c + 6e8de1d commit 488d117
Showing 20 changed files with 3,625 additions and 726 deletions.
43 changes: 29 additions & 14 deletions README.md
@@ -3,28 +3,40 @@
A monorepo containing Python-based tools and libraries for Earth data projects.

- This is still very early-stage.
- The intent is to share common subsystems between Sparrow, Macrostrat and other tools.
- All modules can be consumed as PyPI packages, or embedded locally as a submodule (though this is less recommended).
- The intent is to share common subsystems between Sparrow, Macrostrat and other
tools.
- All modules can be consumed as PyPI packages, or embedded locally as a
submodule (though this is less recommended).

## Modules

- `macrostrat.app_frame`: A control framework for managing Dockerized applications. Currently used by Sparrow, Mapboard GIS, and Macrostrat.
- `macrostrat.database`: Database connection and query utilities geared towards PostgreSQL
- `macrostrat.dinosaur`: Utilities for on-the-fly database migration and conformance testing
- `macrostrat.app_frame`: A control framework for managing Dockerized
applications. Currently used by Sparrow, Mapboard GIS, and Macrostrat.
- `macrostrat.auth_system`: Authentication utilities
- `macrostrat.database`: Database connection and query utilities geared towards
PostgreSQL
- `macrostrat.dinosaur`: Utilities for on-the-fly database migration and
conformance testing
- `macrostrat.utils`: Helpers for logging and command-line apps

## Development

You need `python >= 3.8` and the `poetry` package manager (installed separately) to develop the modules here.
Running `poetry install` (aliased to `make`) bootstraps the project in a local virtual environment.
You need `python >= 3.8` and the `poetry` package manager (installed separately)
to develop the modules here.
Running `poetry install` (aliased to `make`) bootstraps the project in a local
virtual environment.

Dependencies can be installed by adding them to the respective `pyproject.toml` files or by running `poetry add ...`.
Make sure to keep development dependencies (e.g., for testing) separate from core package dependencies.
`poetry add -D ...` adds dependencies that will only be installed in development, analogous to NPM and Yarn.
Dependencies can be installed by adding them to the respective `pyproject.toml`
files or by running `poetry add ...`.
Make sure to keep development dependencies (e.g., for testing) separate from
core package dependencies.
`poetry add -D ...` adds dependencies that will only be installed in
development, analogous to NPM and Yarn.

## Testing

Tests can be run using `make test`, or, for added control, `poetry run pytest ...`.
Tests can be run using `make test`, or, for added control,
`poetry run pytest ...`.
Docker is required to run all tests, as some of them require several containers.

### Testing the `macrostrat.app_frame` module
@@ -40,11 +52,14 @@ This repository is designed to facilitate rapid iteration of its components
and release to PyPI. All modules are part of the `macrostrat` namespace package:
`macrostrat.database`, `macrostrat.dinosaur`, `macrostrat.utils`, etc.

To release a new version of a module, increment its `pyproject.toml` file and
To release a new version of a module, increment its version in the appropriate
`pyproject.toml` file and
run `make publish`. This will run a publication script that checks for current
versions and publishes if none exist.

## Structure and similar projects

- [Macrostrat's web component libraries](https://github.com/UW-Macrostrat/web-components) are also structured as a monorepo.
- [Opendoor Labs' Python monorepo](https://medium.com/opendoor-labs/our-python-monorepo-d34028f2b6fa) is a reference for code organization
- [Macrostrat's web component libraries](https://github.com/UW-Macrostrat/web-components)
are also structured as a monorepo.
- [Opendoor Labs' Python monorepo](https://medium.com/opendoor-labs/our-python-monorepo-d34028f2b6fa)
is a reference for code organization
31 changes: 31 additions & 0 deletions auth-system/CHANGELOG.md
@@ -0,0 +1,31 @@
# Changelog

All notable changes to this project will be documented in this file.

## [0.1.0] - 2024-10-17

### Bring in legacy Sparrow authentication system

We started this module by copying Sparrow's authentication code
at commit `ff0284620462fcaff127069ee00cef91b6412fa5` (2024-09-19). We have
begun by excising Sparrow-specific code and replacing it with a more
generalized authentication system.

- Remove user model code from `sparrow.database`
- Get all tests to pass by mocking database

There are now 16 passing tests of the old auth system!
These can be run with `poetry run pytest auth-system`

### Begin bringing in Macrostrat's newer ORCID-based authentication system

- Copied the Macrostrat v2 security model from the Macrostrat-xdd repository [commit
`79d30fa`](https://github.com/UW-Macrostrat/macrostrat-xdd/commit/79d30fa3fe3be62ca80cedc69752d3825fabadbf).
- Made minimal changes to align with the new module structure.

## [1.0.0]

- Integrate the system more closely with the `macrostrat.database` module
- Update to newer versions of `pyjwt` and `werkzeug`.
- Use `ContextVar` rather than global variables for session storage.
- Rename `orcid` -> `core` to reflect the uncertain scope of the module.
31 changes: 31 additions & 0 deletions auth-system/README.md
@@ -0,0 +1,31 @@
# Macrostrat authentication system

This module contains tools to manipulate Macrostrat's user authentication
system. It is divided into two submodules:

- `macrostrat.auth_system.legacy`: A JWT-based authentication system relying on
local storage of hashed passwords. This system was created as part
of [Sparrow](https://sparrow-data.org) and is being phased out in favor of a
more modern system based on ORCID.
- `macrostrat.auth_system.core`: An ORCID-based user
authentication system. This system will become the primary authentication
system for Macrostrat, but it is still in development.

We plan to gradually converge the functionality of both versions while phasing
out the legacy system.

The system has tests that can be run with `poetry run pytest auth-system`
(currently, only the legacy system is covered).

## Key planned functionality

- Allow many Macrostrat-hosted services to easily integrate with Macrostrat's
login and token flow
- Allow APIs to easily validate user credentials and tokens with minimum
overhead
- Allow access to be checked in multiple ways:
  - Cookies and headers
  - Limited-time JWT tokens and long-duration, cancelable API tokens
  - Verify against Macrostrat "user group" or application-specific criteria
    (e.g., a list of authorized ORCID IDs)
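
As a rough illustration of checking both cookies and headers, the sketch below looks for a bearer token in the `Authorization` header and falls back to a cookie. It is not the module's implementation: the function name `extract_token` and the cookie name `access_token` are assumptions made for this example.

```python
from http.cookies import SimpleCookie
from typing import Mapping, Optional

# Hypothetical cookie name; the name used by Macrostrat may differ
COOKIE_NAME = "access_token"


def extract_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return a token from the Authorization header, else from a cookie."""
    # Header names are case-insensitive, so normalize them first
    normalized = {key.lower(): value for key, value in headers.items()}

    # 1. Programmatic clients: "Authorization: Bearer <token>"
    scheme, _, credentials = normalized.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and credentials.strip():
        return credentials.strip()

    # 2. Browsers: token stored in a cookie
    cookie = SimpleCookie()
    cookie.load(normalized.get("cookie", ""))
    if COOKIE_NAME in cookie:
        return cookie[COOKIE_NAME].value

    return None
```

Checking the header first lets scripts and API clients override whatever cookie a browser session might carry; the opposite order would be equally valid for other deployments.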

68 changes: 68 additions & 0 deletions auth-system/docs/Version 1.md
@@ -0,0 +1,68 @@
# Macrostrat auth system v1

The first version of Sparrow's auth system was built on Flask-JWT-Extended.

- https://flask-jwt-extended.readthedocs.io/en/stable/
- https://github.com/vimalloc/flask-jwt-extended

This library is something of a monolith, and Starlette's session middleware
infrastructure provides a way to build something much cleaner and more sophisticated.

Starlette authentication:

- https://www.starlette.io/authentication/

Prior art:

- https://github.com/amitripshtos/starlette-jwt/tree/master/starlette_jwt [Uses headers]
- https://github.com/retnikt/star_jwt/blob/master/star_jwt/backend.py [Uses cookies]

Context vars:

- https://github.com/encode/starlette/issues/420

A nice explanation of JWT:

- https://jwt.io/
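
To make the token format concrete, here is a minimal sketch of HS256 signing and verification with an expiry check, using only the standard library. The function names `encode_token` and `decode_token` are hypothetical, and this is not how the module itself handles tokens; real code should rely on a maintained library such as `pyjwt`.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 with the "=" padding removed
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped during encoding
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()


def encode_token(claims: dict, secret: str) -> str:
    """Build a token of the form header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def decode_token(
    token: str, secret: str, now: Optional[float] = None
) -> Optional[dict]:
    """Return the claims if the signature is valid and the token is unexpired."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = _sign(f"{header_b64}.{payload_b64}", secret)
        # Constant-time comparison avoids leaking signature bytes via timing
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        # Wrong number of segments, bad base64, or bad JSON
        return None
    if not isinstance(claims, dict):
        return None
    # "exp" is the standard expiry claim, in seconds since the epoch
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if exp is not None and current >= exp:
        return None
    return claims
```

The optional `now` argument exists only to make the expiry check easy to exercise in tests.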

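We store JWT tokens in cookies because a cookie marked `HttpOnly` cannot be read
by JavaScript running in the page, which makes the token harder to steal through
XSS than one kept in local storage.

- https://flask-jwt-extended.readthedocs.io/en/latest/tokens_in_cookies.html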
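
A small sketch of what a hardened token cookie can look like, built with the standard library's `http.cookies`. The helper name `build_token_cookie`, the cookie name `access_token`, and the one-hour lifetime are assumptions for illustration, not values taken from this module.

```python
from http.cookies import SimpleCookie


def build_token_cookie(token: str, max_age: int = 3600) -> str:
    """Return a Set-Cookie header value that stores a token in a cookie."""
    cookie = SimpleCookie()
    cookie["access_token"] = token  # hypothetical cookie name
    morsel = cookie["access_token"]
    morsel["httponly"] = True  # not readable from JavaScript
    morsel["secure"] = True  # only sent over HTTPS
    morsel["samesite"] = "Lax"  # restricts cross-site sending
    morsel["path"] = "/"
    morsel["max-age"] = max_age  # lifetime in seconds
    return morsel.OutputString()
```

Cookies are sent automatically by the browser, so a cookie-based scheme usually also needs some form of CSRF protection; the `SameSite` attribute is one partial mitigation.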

However, we may also want to support sending tokens in API request headers for programmatic access.

## Misc. links

- Implementing authentication using JSON Web Tokens:
  https://codeburst.io/jwt-authorization-in-flask-c63c1acf4eeb
- Authentication is managed in the browser using cookies, to protect against
  XSS attacks:
  https://flask-jwt-extended.readthedocs.io/en/latest/tokens_in_cookies.html
- ORCID login:
  https://members.orcid.org/api/integrate/orcid-sign-in

## Example usage

Usage in Sparrow:

```python
from os import environ

from macrostrat.auth_system.legacy.api import AuthAPI
from macrostrat.auth_system.legacy.backend import JWTBackend
from starlette.authentication import AuthenticationError  # noqa
from starlette.middleware.authentication import AuthenticationMiddleware
from sparrow.plugins import SparrowCorePlugin


class AuthPlugin(SparrowCorePlugin):
    name = "auth"

    # JWT backend configured with the application's secret key
    backend = JWTBackend(environ.get("SPARROW_SECRET_KEY", ""))

    def on_asgi_setup(self, api):
        # Run the authentication middleware on incoming requests
        api.add_middleware(AuthenticationMiddleware, backend=self.backend)

    def on_api_initialized_v2(self, api):
        # Mount the auth API under /auth
        api.mount("/auth", AuthAPI, name="auth")
```
Empty file.
7 changes: 7 additions & 0 deletions auth-system/macrostrat/auth_system/core/__init__.py
@@ -0,0 +1,7 @@
from .main import (
    get_groups_from_header_token,
    get_user_token_from_cookie,
    get_groups,
    has_access,
    get_user_id,
)
77 changes: 77 additions & 0 deletions auth-system/macrostrat/auth_system/core/database.py
@@ -0,0 +1,77 @@
import datetime
from contextvars import ContextVar
from typing import Optional

from sqlalchemy import Engine
from sqlalchemy import select, update
from sqlalchemy.orm import sessionmaker, declarative_base, Session

from macrostrat.database import Database
from .schema import Token


def get_access_token(token: str):
    """The sole database call"""

    session_maker = get_session_maker()
    with session_maker() as session:

        select_stmt = select(Token).where(Token.token == token)

        # Check that the token exists
        result = (session.scalars(select_stmt)).first()
        if result is None:
            return None

        # Check if it has expired
        if result.expires_on < datetime.datetime.now(datetime.timezone.utc):
            return None

        # Update the used_on column
        stmt = (
            update(Token)
            .where(Token.token == token)
            .values(used_on=datetime.datetime.utcnow())
        )
        session.execute(stmt)
        session.commit()

        return (session.scalars(select_stmt)).first()


_database: ContextVar[Optional[Database]] = ContextVar("database", default=None)
_base: ContextVar[Optional[declarative_base]] = ContextVar(
    "declarative_base", default=None
)


def get_database():
    return _database.get()


def get_engine() -> Engine:
    return get_database().engine


def get_base() -> declarative_base:
    return _base.get()


def connect_engine(uri: str, schema: str):
    database = Database(uri)

    base = declarative_base()
    base.metadata.reflect(database.engine)
    base.metadata.reflect(database.engine, schema=schema, views=True)

    # Store the connection and base so the getters above can retrieve them
    _database.set(database)
    _base.set(base)


def dispose_engine():
    get_engine().dispose()


def get_session_maker() -> sessionmaker:
    return sessionmaker(autocommit=False, autoflush=False, bind=get_engine())


def get_session() -> Session:
    # Generator: yields a session, which is closed once the caller is done
    with get_session_maker()() as s:
        yield s