Releases: trojblue/unibox

v0.4.13: adding ub.label_gallery() tool for data labelling

17 Nov 11:17

feat: ub.label_gallery:

view and label images within a Jupyter notebook:

import unibox as ub

uris = ["https://cdn.donmai.us/180x180/8e/ea/8eea944690c0c0b27e303420cb1e65bd.jpg"] * 9
labels = ['Image 1', 'Image 2', 'Image 3'] * 3

# label data interactively
ub.label_gallery(uris, labels)

# or: view images only
# ub.gallery(uris, labels)

v0.4.12: allow human-readable date in ub.presigns()

30 Sep 19:08

feat:

  • allow human-readable date in ub.presigns():
import unibox as ub

uri = "s3://bucket-external/dataset/dataset_qft/moody_qft_danbooru.json"
signed = ub.presigns(uri, expiration="1y")  # format: https://github.com/xolox/python-humanfriendly
signed
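
The note above says `expiration` accepts a humanfriendly-style string such as "1y". As a rough illustration of what turning such a string into seconds could involve, here is a standard-library sketch. The `to_seconds` helper and its unit table are hypothetical and are not taken from unibox or humanfriendly; in particular, a year is assumed to be 365 days here, which may differ from what the real parser uses.

```python
import re

# Hypothetical unit table (assumption): a year is treated as 365 days.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 365 * 86400}


def to_seconds(expiration):
    """Convert seconds (int/float) or a short string like '1y' or '12h' to whole seconds."""
    if isinstance(expiration, (int, float)):
        return int(expiration)
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([smhdwy])\s*", expiration.lower())
    if match is None:
        raise ValueError(f"unrecognised duration: {expiration!r}")
    value, unit = match.groups()
    return int(float(value) * _UNIT_SECONDS[unit])
```

Plain numbers pass through unchanged, so a caller written against the older numeric form would keep working under this scheme.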

v0.4.11: adding s3 presigning tools

30 Sep 18:06

feat:

  • ub.presigns(s3_uri): presigns an S3 URI to create an accessible URL. Useful for working with S3 URIs in transformers pipelines:
import unibox as ub
from transformers import pipeline

# More models in the model hub.
model_name = "openai/clip-vit-large-patch14"
classifier = pipeline("zero-shot-image-classification", model=model_name, device="cuda")

# s3 uri to url
image_to_classify = 's3://bucket-external/dataset/dataset_qft/qft_v5c_twitter-logfav_9.6_60k/100006176_p0.webp'
image_url = ub.presigns(image_to_classify)

# get results
labels = ["a girl", "a boy"]
scores = classifier(image_url, candidate_labels=labels)
scores
# [{'score': 0.9802619218826294, 'label': 'a girl'},
# {'score': 0.0197380892932415, 'label': 'a boy'}]
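
For readers curious what presigning an S3 URI involves, the sketch below splits the URI into bucket and key and asks boto3 for a presigned GET URL. This is an assumption about the general technique, not unibox's actual implementation: `split_s3_uri` and `presign` are hypothetical names, and `presign` needs boto3 installed plus valid AWS credentials to run.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both a bucket and a key: {uri!r}")
    return bucket, key


def presign(uri, expires_in=3600):
    """Return a presigned GET URL for an S3 object (hypothetical helper)."""
    import boto3  # third-party; imported lazily so split_s3_uri works without it

    bucket, key = split_s3_uri(uri)
    client = boto3.client("s3")
    return client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # lifetime of the URL in seconds
    )
```

The resulting HTTPS URL can be handed to tools that only understand URLs, which is the use case the pipeline example above relies on.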

tweak:

  • removing unused methods in s3_client.py

v0.4.10: further ipython import fix

22 Jul 16:52

fix:

  • import unibox: no longer requires IPython.

v0.4.9 IPython import fix

22 Jul 16:38

fix:

  • ub.peeks(): gracefully handles a missing IPython dependency (e.g. when using Python 3.8)
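
A common way to make a dependency like IPython optional is to guard the import and fall back to plain printing. The sketch below shows that pattern in isolation; the `show` helper is hypothetical and is not claimed to be how ub.peeks() is implemented.

```python
try:
    # Rich display is only available when IPython is installed.
    from IPython.display import display as _display
except ImportError:
    _display = None


def show(obj):
    """Display obj via IPython when available, otherwise fall back to print()."""
    if _display is not None:
        _display(obj)
        return "ipython"
    print(obj)
    return "print"
```

Because the import failure is caught at module load time, importing the module never fails just because IPython is absent.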

v0.4.8 concurrent_loads() order fix

18 Jul 11:49

fix:

  • ub.concurrent_loads(): now returns a list of files in the same order as the input list
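
One standard-library way to get order-preserving concurrent loading is `ThreadPoolExecutor.map`, which yields results in input order regardless of which task finishes first. The sketch below illustrates that idea only; `concurrent_map` and `_fake_load` are hypothetical stand-ins, not unibox code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def concurrent_map(load_one, uris, max_workers=8):
    """Apply load_one to every uri in parallel, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # even when later items finish before earlier ones.
        return list(pool.map(load_one, uris))


def _fake_load(uri):
    """Stand-in loader: the first item is deliberately the slowest."""
    time.sleep(0.05 if uri.endswith("0.todo.parquet") else 0.0)
    return f"loaded:{uri}"


results = concurrent_map(_fake_load, ["0.todo.parquet", "1.todo.parquet", "2.todo.parquet"])
```

Collecting results with `as_completed` instead would return them in completion order, which is the kind of behaviour an ordering fix has to avoid.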

v0.4.7 New Tools Update

18 Jul 10:46

import pandas as pd
import unibox as ub

feat:

  • ub.concurrent_loads([uris]): loads multiple URIs concurrently into a list (see notebook)
uris = ["s3://dataset-pixiv/sagemaker/20240228_half_pixiv_optmize_workflow/0.todo.parquet",
        "s3://dataset-pixiv/sagemaker/20240228_half_pixiv_optmize_workflow/1.todo.parquet",
        "s3://dataset-pixiv/sagemaker/20240228_half_pixiv_optmize_workflow/2.todo.parquet",]

result_df = pd.concat(ub.concurrent_loads(uris))
  • ub.peeks: now supports prettier dataframe printouts:
df = pd.DataFrame(...)
ub.peeks(df)
  • ub.ls(...): new shorthand wrapper for ub.traverses(...) that is quicker to type

  • ub.gallery([image_uris]): allow fast preview of many images within the notebook

fix:

  • ub.loads(): fixed a tempfile naming issue when loading from S3 on Windows

v0.4.6: UniSaver: Adding graceful handling for NaN when saving to json or jsonl

28 Jun 14:19

As the title says, saving to json or jsonl now handles NaN gracefully. The following should no longer produce a file containing NaN:

import numpy as np
import unibox as ub

test_dict = [{
    "id": 1,
    "name": "Alice",
    "age": None,  # Contains a None value
    "height": float('nan'), 
    "weight": np.nan, 
}]

ub.saves(test_dict, "test_invalid_values2.jsonl") # shouldn't contain "NaN" in the output file
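
To illustrate why NaN needs special handling: Python's `json` module writes `NaN` by default, which is not valid JSON. A minimal standard-library sketch of one possible approach is to replace NaN with `None` before serialising. The helper names `replace_nan` and `dumps_jsonl` are hypothetical, and this is not claimed to be what UniSaver does internally.

```python
import json
import math


def replace_nan(value):
    """Recursively replace float NaN with None inside dicts, lists, and tuples."""
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, dict):
        return {key: replace_nan(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [replace_nan(item) for item in value]
    return value


def dumps_jsonl(records):
    """Serialise records to JSONL text, one JSON object per line."""
    # allow_nan=False makes json raise ValueError on any remaining
    # non-finite float (such as inf) instead of writing invalid JSON.
    return "\n".join(json.dumps(replace_nan(record), allow_nan=False) for record in records)
```

Since `np.nan` is an ordinary Python float, the same check covers both `float('nan')` and `np.nan` from the example above.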

v0.4.5: UniLoader: Adding graceful handling for broken jsonl lines

28 Jun 13:57

tweak:

  • ub.loads(): if a jsonl file is partially corrupted, it now attempts to convert NaN in the affected lines to null, and otherwise skips the broken line and continues
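
The behaviour described above can be approximated with the standard library alone: `json.loads` accepts a `parse_constant` hook for `NaN` and the infinities, and a `try/except` around each line lets a loader skip lines that cannot be parsed. The sketch below is an assumption about the general approach, with the hypothetical name `loads_jsonl_lenient`; it is not UniLoader's code, and note that it maps `Infinity` and `-Infinity` to null as well.

```python
import json


def _constant_to_none(name):
    """Called by json for 'NaN', 'Infinity', and '-Infinity'; map them all to None."""
    return None


def loads_jsonl_lenient(text):
    """Parse JSONL text, returning (records, number_of_skipped_lines)."""
    records, skipped = [], 0
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        try:
            records.append(json.loads(line, parse_constant=_constant_to_none))
        except json.JSONDecodeError:
            skipped += 1  # broken line: skip it and keep going
    return records, skipped
```

Returning the skip count alongside the records makes it possible to warn when a file was only partially readable.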

v0.4.4: FIX: ub.saves() bug

14 Jun 12:52

fix:

  • ub.saves(list[str]): fixed a bug that prevented a list of strings from being saved