🍱 Add linkedin, tweet, fix typo
falexwolf committed Apr 3, 2024
1 parent bad1cb8 commit a973cb8
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion docs/arrayloader-benchmarks.md
@@ -18,6 +18,8 @@ affiliation:
falexwolf: Lamin Labs, Munich
db: https://lamin.ai/laminlabs/arrayloader-benchmarks
repo: https://github.com/laminlabs/arrayloader-benchmarks
+tweet: https://twitter.com/falexwolf/status/1775476575011553500
+linkedin: https://www.linkedin.com/posts/falexwolf_whats-a-good-way-of-organizing-scrna-seq-activity-7181245277415079937-caSw
---

---
@@ -72,7 +74,7 @@ Here, `MappedCollection` is a [map-style PyTorch data loader](https://lamin.ai/d

![](https://lamin-site-assets.s3.amazonaws.com/.lamindb/n9cf1yZzUpMNiPmZqo3m.svg)

-**Figure 1 ([source](https://lamin.ai/laminlabs/arrayloader-benchmarks/transform/faAhgiIDemaP4BB5))**: We compared NVIDIA Merlin based on a local collection of parquet files, `MappedCollection` based on a local collection of h5ad files, and `cellxgene_census` based on a `tiledbsoma` store in the cloud. Shown is the batch loading time (standard boxplot, **left**), the time per epoch (barplot, **center**), and the number of samples loaded per second (barplot, **right**) with statistics gathered across ~50k batch loading operations during 5 epochs for each method. The raw data consists of 138 `.h5ad` files hosted by CZI and was transformed into parquet files [here](https://lamin.ai/laminlabs/arrayloader-benchmarks/transform/GjHlkZOA4wKp5zKv). For `cellxgene_census`, we use the concatenated version `tiledbsoma` store hosted by CZI and access from within the same AWS data center `us-west-2` for maximal streaming speed ([benchmark](https://lamin.ai/laminlabs/arrayloader-benchmarks/transform/Md9ea0bLFozt65cN)). Outside of `us-west-2`, the speed is _much_ slower. We ran all benchmarks on AWS SageMaker using a `ml.g4dn.2xlarge` EC2 instance. NVIDIA Merlin runs into memory overflow during the benchmark, and we manually triggered the garbage collector.
+**Figure 1 ([source](https://lamin.ai/laminlabs/arrayloader-benchmarks/transform/faAhgiIDemaP4BB5))**: We compared NVIDIA Merlin based on a local collection of parquet files, `MappedCollection` based on a local collection of h5ad files, and `cellxgene_census` based on a `tiledbsoma` store in the cloud. Shown is the batch loading time (standard boxplot, **left**), the time per epoch (barplot, **center**), and the number of samples loaded per second (barplot, **right**) with statistics gathered across ~50k batch loading operations during 5 epochs for each method. The raw data consists of 138 `.h5ad` files hosted by CZI and was transformed into parquet files [here](https://lamin.ai/laminlabs/arrayloader-benchmarks/transform/GjHlkZOA4wKp5zKv). For `cellxgene_census`, we use the concatenated `tiledbsoma` store hosted by CZI and access it from within the same AWS data center `us-west-2` for maximal streaming speed ([benchmark](https://lamin.ai/laminlabs/arrayloader-benchmarks/transform/Md9ea0bLFozt65cN)). Outside of `us-west-2`, the speed is _much_ slower. We ran all benchmarks on AWS SageMaker using a `ml.g4dn.2xlarge` EC2 instance. NVIDIA Merlin runs into memory overflow during the benchmark, and we manually triggered the garbage collector.

### Sampling batches from large array collections

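As an aside on the figure caption changed above: the following is a minimal sketch, not taken from the benchmark repository, of how the three quantities reported in Figure 1 (per-batch loading time, time per epoch, samples per second) could be gathered for any map-style PyTorch data loader. The in-memory `TensorDataset` and all sizes are hypothetical stand-ins for the `MappedCollection`, Merlin, and `cellxgene_census` loaders actually benchmarked.

```python
# Sketch only: time batch loading over several epochs for a generic
# map-style PyTorch data loader. The TensorDataset below is a placeholder
# for the real h5ad / parquet / tiledbsoma-backed loaders.
import time

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10_000, 2_000))  # assumed shape, not the real data
loader = DataLoader(dataset, batch_size=128, shuffle=True)

n_epochs = 5
batch_times = []  # per-batch loading times across all epochs (boxplot in Figure 1)
for epoch in range(n_epochs):
    epoch_start = time.perf_counter()
    t0 = time.perf_counter()
    for (batch,) in loader:
        batch_times.append(time.perf_counter() - t0)  # time spent loading this batch
        t0 = time.perf_counter()
    epoch_time = time.perf_counter() - epoch_start  # time per epoch (center panel)
    samples_per_sec = len(dataset) / epoch_time  # samples loaded per second (right panel)
    print(f"epoch {epoch}: {epoch_time:.1f} s, {samples_per_sec:.0f} samples/s")
```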
