diff --git a/README.md b/README.md
index f9a18a5..83c3612 100644
--- a/README.md
+++ b/README.md
@@ -1,15 +1,21 @@
-# 🦊 Fog-RT-X
+# 🦊 Robo-DM
 
-🦊 Fog-RT-X: An Efficient and Scalable Data Collection and Management Framework For Robotics Learning. Support [Open-X-Embodiment](https://robotics-transformer-x.github.io/), 🤗[HuggingFace](https://huggingface.co/).
+🦊 Robo-DM : An Efficient and Scalable Data Collection and Management Framework For Robotics Learning. Support [Open-X-Embodiment](https://robotics-transformer-x.github.io/), 🤗[HuggingFace](https://huggingface.co/).
 
-🦊 Fog-RT-X considers both speed 🚀 and memory efficiency 📈 with active metadata and lazily-loaded trajectory data. It supports flexible and distributed dataset partitioning. It provides native support to cloud storage.
+🦊 Robo-DM (Former Name: fog_x) considers both speed 🚀 and memory efficiency 📈 with active metadata and lazily-loaded trajectory data. It supports flexible and distributed dataset partitioning. It provides native support to cloud storage.
 
 [Design Doc](https://docs.google.com/document/d/1woLQVLWsySGjFuz8aCsaLoc74dXQgIccnWRemjlNDws/edit#heading=h.irrfcedesnvr) | [Dataset Visualization](https://keplerc.github.io/openxvisualizer/)
 
+## Note to ICRA Reviewers
+We are actively developing the framework. See commit `a35a6` for the version we developed.
+
+
 ## Install
 
 ```bash
-pip install fog_x
+git clone https://github.com/BerkeleyAutomation/fog_x.git
+cd fog_x
+pip install -e .
 ```
 
 ## Usage
@@ -17,55 +23,30 @@ pip install fog_x
 ```py
 import fog_x
 
-# 🦊 Dataset Creation
-# from distributed dataset storage
-dataset = fog_x.Dataset(
-    name="demo_ds",
-    path="~/test_dataset", # can be AWS S3, Google Bucket!
-)
+path = "/tmp/output.vla"
 
 # 🦊 Data collection:
 # create a new trajectory
-episode = dataset.new_episode()
-# collect step data for the episode
-episode.add(feature = "arm_view", value = "image1.jpg")
+traj = fog_x.Trajectory(
+    path = path
+)
+
+traj.add(feature = "arm_view", value = "image1.jpg")
 # Automatically time-aligns and saves the trajectory
-episode.close()
+traj.close()
 
-# 🦊 Data Loading:
-# load from existing RT-X/Open-X datasets
-dataset.load_rtx_episodes(
-    name="berkeley_autolab_ur5",
-    additional_metadata={"collector": "User 2"}
+# load it
+fog_x.Trajectory(
+    path = path
 )
-
-# 🦊 Data Management and Analytics:
-# Compute and memory efficient filter, map, aggregate, groupby
-episode_info = dataset.get_episode_info()
-desired_episodes = episode_info.filter(episode_info["collector"] == "User 2")
-
-# 🦊 Data Sharing and Usage:
-# Export and share the dataset as standard Open-X-Embodiment format
-# it also supports hugging face, and more!
-dataset.export(desired_episodes, format="rtx")
-# Load with pytorch dataloader
-torch.utils.data.DataLoader(dataset.as_pytorch_dataset(desired_episodes))
 ```
 
-## Design
-🦊 Fog-RT-X recognizes most post-processing, analytics and management involves the trajectory-level data, such as tags, while actual trajectory steps are rarely read, written and transformed. Acessing and modifying trajectory data is very expensive and hard.
-
-As a result, 🦊 Fog-RT-X proposes
-* a user-friendly metadata table via Pandas Datframe for speed and freedom
-* a LazyFrame from Polars for the trajectory dataset that only loads and transform the data if needed
-* parquet as storage format for distributed storage and columnar support compared to tensorflow records
-* Easy and automatic RT-X/Open-X dataset export and pytorch dataloading
-
-
-## More Coming Soon!
-Currently we see a more than 60\% space saving on some existing RT-X datasets. This can be even more by re-paritioning the dataset. Our next steps can be found in the [planning doc](./design_doc/planning_doc.md). Feedback welcome through issues or PR to planning doc!
+## Examples
 
-We also note we are at a beta-testing phase. We make our best effort to be backward-compatible but interfaces may be unstable.
+* [Data Collection and Loading](./examples/data_collection_and_load.py)
+* [Convert From Open_X](./examples/openx_loader.py)
+* [Convert From H5](./examples/h5_loader.py)
+* [Running Benchmarks](./benchmarks/openx.py)
 
 ## Development
 