3.26

@alex-aizman alex-aizman released this 08 Feb 01:37

Version 3.26 arrives 4 months after the previous release and contains more than 400 commits.

The core changes in v3.26 include the resolution of the last remaining limitations. A major new feature is the scrub capability, which supports bidirectional diffing to detect remote out-of-band deletions and version changes. The cluster now also reloads updated user credentials at runtime without requiring downtime.

Enhancements to observability are detailed below, and performance optimizations include memory pooling for HTTP requests, global rebalance optimizations, and micro-optimizations across the entire codebase. Key fixes include improved error-handling logic (with an added category of IO errors and improved filesystem health checker operation) and enhanced object metadata caching.

For the detailed changelog, please see link.


CLI

The CLI in v3.26 features revamped inline help, reorganized command-line options with clearer descriptions, and added usage examples. Fixes include support for multi-object PUT with client-side checksumming and universal prefix support for all multi-object commands.

A notable new feature is the ais scrub command for validating in-cluster content. Additionally, the ais performance command has received several updates, including improved calculation of cluster-wide throughput. Top-level commands and their options have been reorganized for better clarity.

The ais scrub command in v3.26 focuses on detection rather than correction. It detects:

  • Misplaced objects (cluster-wide or within a specific multi-disk target)
  • Objects missing from the remote backend, and vice versa
  • In-cluster objects that no longer exist remotely
  • Objects with insufficient replicas
  • Objects larger or smaller than a specified size

The command generates both summary statistics and detailed reports for each identified issue. However, it does not attempt to fix misplaced or corrupted objects (those with invalid checksums). The ability to correct such issues is planned for v3.27.
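
The bidirectional comparison behind the detection-only report can be sketched roughly as follows. This is an illustrative sketch and not the ais scrub implementation: the function name diff_listings, the ScrubReport fields, and the idea of comparing plain name-to-version mappings are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ScrubReport:
    # All field names are hypothetical; they only mirror the categories listed above.
    deleted_remotely: list = field(default_factory=list)   # in cluster, gone from backend
    not_in_cluster: list = field(default_factory=list)     # in backend, absent in cluster
    version_changed: list = field(default_factory=list)    # present in both, versions differ


def diff_listings(in_cluster: dict, remote: dict) -> ScrubReport:
    """Compare two {object name: version} listings in both directions.

    Detection only: nothing is repaired, deleted, or re-fetched.
    """
    report = ScrubReport()
    # Direction 1: walk the in-cluster listing and look each object up remotely.
    for name, version in in_cluster.items():
        if name not in remote:
            report.deleted_remotely.append(name)
        elif remote[name] != version:
            report.version_changed.append(name)
    # Direction 2: walk the remote listing and look for objects the cluster lacks.
    for name in remote:
        if name not in in_cluster:
            report.not_in_cluster.append(name)
    # Sort for stable, readable output.
    report.deleted_remotely.sort()
    report.not_in_cluster.sort()
    report.version_changed.sort()
    return report


if __name__ == "__main__":
    cluster = {"a": "v1", "b": "v1", "c": "v1"}
    backend = {"a": "v1", "b": "v2", "d": "v1"}
    print(diff_listings(cluster, backend))
```

Walking both listings, rather than only the in-cluster one, is what lets the sketch separate out-of-band remote deletions from objects that simply were never brought into the cluster.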

For more details, see the full changelog here.


Observability

Version 3.26 includes several important observability updates. First, all built-in default go_* counters and gauges provided by the Prometheus library have been removed, including metrics for tracking goroutines and garbage collection. Metrics are now updated in real time, eliminating the previous periodic updates via the prometheus.Collect interface.

Latencies and throughputs are no longer published as internally computed metrics; instead, .ns.total (nanoseconds) and .size (bytes) metrics are used to compute latency and throughput based on time intervals controlled by the monitoring client.
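
As a rough illustration of the client-side computation, the sketch below derives average latency and throughput from two successive scrapes of cumulative counters. The Sample type, its field names, and the presence of a matching operation counter are assumptions made for the example; the actual metric names exposed by the cluster are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One scrape of cumulative counters (hypothetical field names)."""
    timestamp_s: float  # time of the scrape, in seconds
    count: int          # cumulative number of operations (assumed companion counter)
    ns_total: int       # cumulative time spent, in nanoseconds (the .ns.total metric)
    size_bytes: int     # cumulative bytes transferred (the .size metric)


def interval_stats(prev: Sample, curr: Sample) -> tuple:
    """Return (average latency in ms, throughput in bytes/s) over the interval."""
    dt = curr.timestamp_s - prev.timestamp_s
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    d_count = curr.count - prev.count
    d_ns = curr.ns_total - prev.ns_total
    d_bytes = curr.size_bytes - prev.size_bytes
    # Average latency: time spent in the interval divided by operations completed.
    avg_latency_ms = (d_ns / d_count) / 1e6 if d_count > 0 else 0.0
    # Throughput: bytes moved in the interval divided by the interval length.
    throughput_bps = d_bytes / dt
    return avg_latency_ms, throughput_bps


if __name__ == "__main__":
    before = Sample(timestamp_s=0.0, count=100, ns_total=1_000_000_000, size_bytes=0)
    after = Sample(timestamp_s=10.0, count=200, ns_total=3_000_000_000, size_bytes=50_000_000)
    print(interval_stats(before, after))
```

Because only cumulative totals are published, the monitoring client decides the averaging window simply by choosing which two samples to subtract.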

In addition to the total aggregated metrics, separate latency and throughput metrics are now included for each backend.

Metrics resulting from actions on a specific bucket now include the bucket name as a Prometheus variable label.

In-cluster writes generated by xactions (jobs) now also carry xaction labels, namely the respective kind and ID. As a result, more PUT metrics are reported, including those not generated by user PUT requests.

Finally, all GET, PUT, and DELETE errors include the bucket label, and FSHC-related IO errors now include the mount path (faulty disk) label.

Commit Highlights

  • Commit e6814a2: Added Prometheus variable labels; removed collector.
  • Commit 3b323ff: Polymorphic statsValue, removed switch kind.
  • Commit 9290dc5: Amended re-initializing backends.
  • Commit d2ceca3: Removed default metrics (go_gc_*, go_memstats_*), started counting PUTs generated by xactions.
  • Commit 118a821: Major update (with partial rewrite) - added variable labels.
  • Commit 2d181ab: Tracked and showed jobs run options (prefix, sync, range, etc.)
  • Commit 8690876: API change for xactions to provide initiating control message, added ctlmsg to all supported x-kinds.
  • Commit afef76b: Added CPU utilization tracking and alerts.

For more details, see the full changelog here.


Python SDK

Added the ObjectFileWriter class (extending io.BufferedWriter) for file-like writing operations. This enhancement builds upon the ObjectFile feature introduced in the previous release, providing zero-copy and resilient streaming capabilities. More information can be found in the tech blogs on enhancing ObjectFile performance and resilient streaming.
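
To illustrate what a file-like writer built on io.BufferedWriter looks like in general, here is a toy sketch that buffers writes and appends them to an in-memory dictionary standing in for an object store. It is not the SDK's ObjectFileWriter: the class names SketchObjectWriter and _AppendingRaw and the dictionary store are hypothetical, and the sketch does not attempt zero-copy or resilient streaming.

```python
import io


class _AppendingRaw(io.RawIOBase):
    """Raw stream that appends every flushed chunk to store[key] (toy stand-in for a remote append)."""

    def __init__(self, store: dict, key: str):
        super().__init__()
        self._store = store
        self._key = key
        self._store.setdefault(key, b"")

    def writable(self) -> bool:
        return True

    def write(self, b) -> int:
        data = bytes(b)  # the buffered layer may pass a memoryview
        self._store[self._key] += data
        return len(data)


class SketchObjectWriter(io.BufferedWriter):
    """Hypothetical file-like writer: standard buffering on top of the appending raw stream."""

    def __init__(self, store: dict, key: str, buffer_size: int = io.DEFAULT_BUFFER_SIZE):
        super().__init__(_AppendingRaw(store, key), buffer_size=buffer_size)


if __name__ == "__main__":
    store = {}
    with SketchObjectWriter(store, "demo-object") as writer:
        writer.write(b"hello, ")
        writer.write(b"world")
    # Closing the writer flushes the buffer, so the full payload is now in the store.
    print(store["demo-object"])
```

Extending io.BufferedWriter means that any code expecting a standard binary file object (write, flush, close, context manager) can use the writer unchanged.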

Additionally, this update includes various fixes and minor improvements, such as memory optimizations for ObjectFile, improved error handling, and enhancements to the API's usability and performance.

Support has also been added for:

  • multi-object transforms
  • OCI backend, and more.

Complete changelog is available here.


Erasure Coding

The v3.26 release introduces significant improvements to Erasure Coding in AIStore, focusing on enhanced performance, better data recovery, improved configuration options, and seamless integration with other features. Key updates include the ability to recover EC data in scenarios where multiple parts are lost, a reduced memory footprint, and improved CPU utilization during EC operations. Additionally, intra-cluster networking has been optimized, with reduced overhead when erasure coding is not in use.


Oracle (OCI) Object Storage

Until recently, AIStore natively supported three cloud storage providers: AWS S3, GCS, and Microsoft Azure Blob Storage. With the v3.26 release, OCI (Oracle Cloud Infrastructure) Object Storage has been added as the fourth supported backend. This enhancement allows AIStore to utilize OCI Object Storage directly, providing improved performance for large object uploads and downloads.

Native support for OCI Object Storage includes tunable optimizations for efficient data transfer between AIStore and OCI's infrastructure. This new addition ensures that AIStore offers the same level of support and value-added functionality for OCI as it does for AWS S3, GCS, and Microsoft Azure Blob Storage.


Kubernetes Operator

The AIS K8s Operator version 2.0 delivers improvements to lifecycle and rebalance state management, proxy communication, cluster configuration, and the logging sidecar.


ETL

Introducing two ETL transformers for speech data:

  • A production-ready FFmpeg transformer (5x faster than legacy methods) for bulk audio tasks like format, channel, or bitrate adjustments, and
  • A still experimental Audio Split-Consolidate tool enabling scalable split-process-recombine workflows with ongoing edge-case refinements.

In addition,

  • Updated supported ETL Python runtimes: added Python 3.9, 3.11, 3.12, and 3.13 (default).
  • Added new capability that allows trusted (in-cluster) ETL clients to perform GET requests directly on AIS targets.
  • Default to metadata.annotations.communication_type in spec unless overridden.
  • Improved ETL API parameter validation.

Commit Highlights

  • Commit 9eef79b: Inline transform metadata passing as query parameter etl_meta.
  • Commit 4b02ab3: Default to metadata.annotations.communication_type in spec unless overridden.
  • Commit 1792475: Add Bash completion for ETL details command and fix output formatting.
  • Commit e43940a: ETL package maintenance, improved ETL API parameter validation.
  • Commit 35552d3: Revise stats handling for inline ETL.
  • Commit 85cf866: CLI support for ETL delete command.
  • Commit 502a5ee: Support embedded prefix for ais cp, ais prefetch, and ais etl.

For more details, see the full changelog here.