obstore

Simple, fast integration with object storage services like Amazon S3, Google Cloud Storage, Azure Blob Storage, and S3-compliant APIs like Cloudflare R2.

  • Sync and async API.
  • Streaming downloads with configurable chunking.
  • Streaming list, with no need to paginate.
  • File-like object API and fsspec integration.
  • Support for conditional put ("put if not exists"), as well as custom tags and attributes.
  • Automatically uses multipart uploads under the hood for large file objects.
  • Optionally return list results as Arrow, which is faster than materializing Python dict/list objects.
  • Easy to install with no required Python dependencies.
  • The underlying Rust library, object_store, is production-quality and used in large-scale production systems, such as the Rust package registry crates.io.
  • Support for zero-copy data exchange from Rust into Python in get_range and get_ranges.
  • Simple API with static type checking.
  • Helpers for constructing stores from environment variables and boto3.Session objects.

Installation

To install obstore using pip:

pip install obstore

Obstore is on conda-forge and can be installed using conda, mamba, or pixi. To install obstore using conda:

conda install -c conda-forge obstore

Documentation

Full documentation is available on the website.

Usage

Constructing a store

Classes to construct a store are exported from the obstore.store submodule:

  • S3Store: Configure a connection to Amazon S3.
  • GCSStore: Configure a connection to Google Cloud Storage.
  • AzureStore: Configure a connection to Microsoft Azure Blob Storage.
  • HTTPStore: Configure a connection to a generic HTTP server.
  • LocalStore: Local filesystem storage providing the same object store interface.
  • MemoryStore: A fully in-memory implementation of ObjectStore.

Example

import boto3
from obstore.store import S3Store

session = boto3.Session()
store = S3Store.from_session(session, "bucket-name", config={"AWS_REGION": "us-east-1"})

Configuration

Each store class above has its own configuration, passed through the config named parameter. The available configuration keys are covered in the docs and are also listed as string literals in the type hints.

Additional HTTP client configuration is available via the client_options named parameter.

Interacting with a store

All methods for interacting with a store are exported as top-level functions (not methods on the store object):

  • copy: Copy an object from one path to another in the same object store.
  • delete: Delete the object at the specified location.
  • get: Return the bytes that are stored at the specified location.
  • head: Return the metadata for the specified location.
  • list: List all the objects with the given prefix.
  • put: Save the provided bytes to the specified location.
  • rename: Move an object from one path to another in the same object store.

There are a few additional APIs useful for specific use cases:

File-like object support is also provided:

All methods have a comparable async method with the same name plus an _async suffix.

Example

import obstore as obs

store = obs.store.MemoryStore()

obs.put(store, "file.txt", b"hello world!")
response = obs.get(store, "file.txt")
response.meta
# {'path': 'file.txt',
#  'last_modified': datetime.datetime(2024, 10, 21, 16, 19, 45, 102620, tzinfo=datetime.timezone.utc),
#  'size': 12,
#  'e_tag': '0',
#  'version': None}
assert response.bytes() == b"hello world!"

byte_range = obs.get_range(store, "file.txt", offset=0, length=5)
assert byte_range == b"hello"

obs.copy(store, "file.txt", "other.txt")
assert obs.get(store, "other.txt").bytes() == b"hello world!"

The same example using the async counterparts. Outside a notebook or async REPL, the awaits must run inside a coroutine, here driven by asyncio.run:

import asyncio

import obstore as obs


async def main():
    store = obs.store.MemoryStore()

    await obs.put_async(store, "file.txt", b"hello world!")
    response = await obs.get_async(store, "file.txt")
    print(response.meta)
    # {'path': 'file.txt',
    #  'last_modified': datetime.datetime(2024, 10, 21, 16, 20, 36, 477418, tzinfo=datetime.timezone.utc),
    #  'size': 12,
    #  'e_tag': '0',
    #  'version': None}
    assert await response.bytes_async() == b"hello world!"

    byte_range = await obs.get_range_async(store, "file.txt", offset=0, length=5)
    assert byte_range == b"hello"

    await obs.copy_async(store, "file.txt", "other.txt")
    resp = await obs.get_async(store, "other.txt")
    assert await resp.bytes_async() == b"hello world!"


# Top-level await only works in notebooks; scripts need an event loop.
asyncio.run(main())

Comparison to object-store-python

Read a detailed comparison to object-store-python, a previous Python library that also wraps the same Rust object_store crate.