Releases: trojblue/unibox
v0.4.13: adding ub.label_gallery() tool for data labelling
feat:
- ub.label_gallery(): view and label images within a Jupyter notebook:
import unibox as ub
uris = ["https://cdn.donmai.us/180x180/8e/ea/8eea944690c0c0b27e303420cb1e65bd.jpg"] * 9
labels = ['Image 1', 'Image 2', 'Image 3'] * 3
# label data interactively
ub.label_gallery(uris, labels)
# or: view images only
# ub.gallery(uris, labels)
v0.4.12: allow human-readable date in ub.presigns()
feat:
- allow a human-readable expiration (such as "1y") in ub.presigns():
import unibox as ub
uri = "s3://bucket-external/dataset/dataset_qft/moody_qft_danbooru.json"
signed = ub.presigns(uri, expiration="1y") # format: https://github.com/xolox/python-humanfriendly
signed
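For readers unfamiliar with human-readable durations, here is a minimal sketch of how a string like "1y" could be turned into a number of seconds. It is not unibox's implementation: `parse_expiration` and the unit table are hypothetical, and a year is assumed to be 365 days, which may differ from what humanfriendly or ub.presigns() actually use.

```python
import re

# Hypothetical unit table (assumption: 1 year = 365 days).
_UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 60 * 60,
    "d": 60 * 60 * 24,
    "w": 60 * 60 * 24 * 7,
    "y": 60 * 60 * 24 * 365,
}


def parse_expiration(value):
    """Convert an int (seconds) or a string such as '1y' or '12h' to seconds."""
    if isinstance(value, int):
        return value
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([smhdwy])\s*", value.lower())
    if match is None:
        raise ValueError(f"unrecognised expiration: {value!r}")
    amount, unit = match.groups()
    return int(float(amount) * _UNIT_SECONDS[unit])
```

Plain integers pass through unchanged, so callers that already supply seconds keep working.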
v0.4.11: adding s3 presigning tools
feat:
- ub.presigns(s3_uri): presign an s3 uri to create an accessible url. Useful for working with s3 uris in transformers pipelines:
import unibox as ub
from transformers import pipeline
# More models in the model hub.
model_name = "openai/clip-vit-large-patch14"
classifier = pipeline("zero-shot-image-classification", model=model_name, device="cuda")
# s3 uri to url
image_to_classify = 's3://bucket-external/dataset/dataset_qft/qft_v5c_twitter-logfav_9.6_60k/100006176_p0.webp'
image_url = ub.presigns(image_to_classify)
# get results
labels = ["a girl", "a boy"]
scores = classifier(image_url, candidate_labels=labels)
scores
# [{'score': 0.9802619218826294, 'label': 'a girl'},
# {'score': 0.0197380892932415, 'label': 'a boy'}]
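As a rough illustration of what presigning involves, the sketch below splits an s3 uri into bucket and key and passes them to boto3's `generate_presigned_url`. This is an assumption about the general approach, not the code inside ub.presigns(); `split_s3_uri` and `presign` are hypothetical names, and `presign` needs boto3 plus valid AWS credentials to run.

```python
def split_s3_uri(s3_uri):
    """Split 's3://bucket/path/to/key' into (bucket, key)."""
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"not an s3 uri: {s3_uri!r}")
    bucket, _, key = s3_uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"s3 uri needs both a bucket and a key: {s3_uri!r}")
    return bucket, key


def presign(s3_uri, expires_in=3600):
    """Return a time-limited HTTPS url for an S3 object."""
    import boto3  # imported lazily so split_s3_uri works without boto3

    bucket, key = split_s3_uri(s3_uri)
    client = boto3.client("s3")
    return client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # lifetime of the url, in seconds
    )
```

The returned url can be handed to any tool that only understands http(s), which is why it works as pipeline input.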
tweak:
- removing unused methods in s3_client.py
v0.4.10: further ipython import fix
fix:
- import unibox: no longer requires IPython.
v0.4.9 IPython import fix
fix:
- ub.peeks(): handle a missing IPython dependency (e.g. when using Python 3.8) gracefully
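A common way to treat IPython as an optional dependency is to import it inside the function and fall back to print() when it is missing. The sketch below shows that pattern only; `show` is a hypothetical helper and not how ub.peeks() is necessarily written.

```python
def show(obj):
    """Render obj via IPython if available, otherwise fall back to print()."""
    try:
        from IPython.display import display  # optional dependency
    except ImportError:
        print(obj)
        return "print"
    display(obj)
    return "ipython"
```

Because the import happens at call time, importing the surrounding module never fails on an environment without IPython.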
v0.4.8 concurrent_loads() order fix
fix:
- ub.concurrent_loads(): it should now return a list of files in the same order as the input list
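One way to get input-ordered results from concurrent work is `Executor.map`, which yields results in submission order rather than completion order. The sketch below assumes a thread pool and a caller-supplied `load_one` function; `load_all` and `slow_upper` are hypothetical and this is not the ub.concurrent_loads() implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_all(uris, load_one, max_workers=8):
    """Apply load_one to every uri concurrently; results follow the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves the order of its inputs in its outputs.
        return list(pool.map(load_one, uris))


def slow_upper(name):
    """Stand-in loader: the first item deliberately finishes last."""
    time.sleep(0.05 if name == "a" else 0.0)
    return name.upper()


ordered = load_all(["a", "b", "c"], slow_upper)
```

Collecting results with `as_completed` instead would return them in finishing order, which is the kind of behaviour an ordering fix has to avoid.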
v0.4.7 New Tools Update
feat:
- ub.concurrent_loads([uris]): load multiple uris concurrently into a list (see notebook):
import pandas as pd
import unibox as ub
uris = ["s3://dataset-pixiv/sagemaker/20240228_half_pixiv_optmize_workflow/0.todo.parquet",
"s3://dataset-pixiv/sagemaker/20240228_half_pixiv_optmize_workflow/1.todo.parquet",
"s3://dataset-pixiv/sagemaker/20240228_half_pixiv_optmize_workflow/2.todo.parquet",]
# concatenate the loaded dataframes into one
result_df = pd.concat(ub.concurrent_loads(uris))
- ub.peeks(): now supports prettier dataframe printouts:
df = pd.DataFrame(...)
ub.peeks(df)
- ub.ls(...): new wrapper function that is faster to type than ub.traverses(...)
- ub.gallery([image_uris]): allow fast preview of many images within the notebook
fix:
- ub.loads(): fix a tempfile naming issue when loading from s3 on Windows
v0.4.6: UniSaver: Adding graceful handling for NaN when saving to json or jsonl
Saving to json or jsonl should no longer produce a file containing NaN:
import numpy as np
import unibox as ub
test_dict = [{
"id": 1,
"name": "Alice",
"age": None, # Contains a None value
"height": float('nan'),
"weight": np.nan,
}]
ub.saves(test_dict, "test_invalid_values2.jsonl") # shouldn't contain "NaN" in the output file
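For context, Python's json module writes NaN as the bare token `NaN` by default, which is not valid JSON. A minimal sketch of one way to avoid that is to replace NaN with None before serialising. `clean_nan` and `to_jsonl` are hypothetical helpers, not UniSaver's actual code.

```python
import json
import math


def clean_nan(value):
    """Recursively replace float NaN with None so it serialises as null."""
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, dict):
        return {key: clean_nan(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [clean_nan(item) for item in value]
    return value


def to_jsonl(records):
    """Serialise records to JSON Lines text, one object per line."""
    # allow_nan=False turns any remaining non-finite float into a ValueError.
    return "\n".join(json.dumps(clean_nan(r), allow_nan=False) for r in records)
```

Note that this sketch only handles NaN; infinities would still raise because of `allow_nan=False`.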
v0.4.5: UniLoader: Adding graceful handling for broken jsonl lines
tweak:
- ub.loads(): will attempt to convert NaN in jsonl lines to null, or skip the current line and continue, if a jsonl file is partially corrupted
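The behaviour described above can be sketched with the standard json module: `parse_constant` maps the non-standard tokens to None, and lines that still fail to parse are skipped. `read_jsonl_lenient` is a hypothetical name, and unlike the release note this sketch also maps Infinity and -Infinity to None; it is not UniLoader's actual code.

```python
import json


def read_jsonl_lenient(lines):
    """Parse JSON Lines, mapping NaN to None and skipping broken lines."""
    records, skipped = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            # parse_constant is called for 'NaN', 'Infinity' and '-Infinity'.
            records.append(json.loads(line, parse_constant=lambda _name: None))
        except json.JSONDecodeError:
            skipped += 1  # corrupted line: count it and move on
    return records, skipped
```

Returning the skipped count lets the caller decide whether a partially corrupted file is still acceptable.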
v0.4.4: FIX: ub.saves() bug
fix:
- ub.saves(list[str]): fixed a bug that prevented a list of strings from being saved