Commit: adding xaeronet-s model; add validation plots; xaeronet-v model; formatting; update changelog; remove json file; address review comments; multi-scale support, minor fixes
# XAeroNet: Scalable Neural Models for External Aerodynamics

XAeroNet is a collection of scalable models for large-scale external
aerodynamic evaluations. It consists of two models, XAeroNet-S and XAeroNet-V,
for surface and volume predictions, respectively.

## Problem overview

External aerodynamics plays a crucial role in the design and optimization of vehicles,
aircraft, and other transportation systems. Accurate predictions of aerodynamic
properties such as drag, pressure distribution, and airflow characteristics are
essential for improving fuel efficiency, vehicle stability, and performance.
Traditional approaches, such as computational fluid dynamics (CFD) simulations,
are computationally expensive and time-consuming, especially when evaluating multiple
design iterations or large datasets.

XAeroNet addresses these challenges by leveraging neural network-based surrogate
models to provide fast, scalable, and accurate predictions for both surface-level
and volume-level aerodynamic properties. By using the DrivAerML dataset, which
contains high-fidelity CFD data for a variety of vehicle geometries, XAeroNet aims
to significantly reduce the computational cost while maintaining high prediction
accuracy. The two models in XAeroNet, XAeroNet-S for surface predictions and
XAeroNet-V for volume predictions, enable rapid aerodynamic evaluations across
different design configurations, making it easier to incorporate aerodynamic
considerations early in the design process.

## Model Overview and Architecture

### XAeroNet-S

XAeroNet-S is a scalable MeshGraphNet model that partitions large input graphs into
smaller subgraphs to reduce training memory overhead. Halo regions are added to these
subgraphs to prevent message-passing truncations at the boundaries. Gradient aggregation
is employed to accumulate gradients from each partition before updating the model
parameters. This approach ensures that training on partitions is equivalent to training
on the entire graph in terms of model updates and accuracy. Additionally, XAeroNet-S
does not rely on simulation meshes for training and inference, overcoming a significant
limitation of GNN models in simulation tasks.
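
The partition-equivalence claim above follows from the linearity of gradients: summing
appropriately scaled per-partition gradients reproduces the full-batch gradient. A
minimal sketch with a linear least-squares stand-in (illustrative only; the actual
XAeroNet-S trainer operates on the MeshGraphNet model, not this toy):

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(3)                            # stand-in model parameters
X = rng.normal(size=(16, 3))               # features for the full "graph"
y = rng.normal(size=16)                    # targets

parts = np.array_split(np.arange(16), 4)   # four partitions of the nodes

# Accumulate the gradient of 0.5 * mean((X @ w - y)**2) partition by partition.
grad = np.zeros_like(w)
for idx in parts:
    Xi, yi = X[idx], y[idx]
    grad += Xi.T @ (Xi @ w - yi) / len(X)  # each partition adds its share

# A single full-batch pass gives the identical gradient, so one aggregated
# update step matches training on the whole graph.
full_grad = X.T @ (X @ w - y) / len(X)
w -= 0.1 * grad
```

In the real model, each partition's forward/backward pass also includes its halo nodes,
so message passing near partition boundaries sees the same neighborhoods as it would in
the full graph.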

The inputs to the training pipeline are STL files, from which the model samples a point
cloud on the surface. It then constructs a connectivity graph by linking the N nearest
neighbors. This method also supports multi-mesh setups, where point clouds with different
resolutions are generated, their connectivity graphs are created, and all are
superimposed. The METIS library is used to partition the graph for efficient training.
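
The point-cloud-to-graph step can be sketched with a k-d tree. The point count and
neighbor count below are made up, and the repository's preprocessing additionally
superimposes multiple resolutions and partitions the result with METIS:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((100, 3))   # stand-in for points sampled on the STL surface
k = 6                           # N nearest neighbors to link

tree = cKDTree(points)
_, nbrs = tree.query(points, k=k + 1)  # k+1 because the closest hit is the point itself

# Directed edge list (src, dst) without self-edges
src = np.repeat(np.arange(len(points)), k)
dst = nbrs[:, 1:].ravel()
edges = np.stack([src, dst], axis=1)
```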

For the XAeroNet-S model, STL files are used to generate point clouds and establish graph
connectivity. Additionally, the .vtp files are used to interpolate the solution fields
onto the point clouds.

### XAeroNet-V

XAeroNet-V is a scalable 3D UNet model with attention gates, designed to partition large
voxel grids into smaller sub-grids to reduce memory overhead during training. Halo
regions are added to these partitions to avoid convolution truncations at the boundaries.
Gradient aggregation is used to accumulate gradients from each partition before updating
the model parameters, ensuring that training on partitions is equivalent to training on
the entire voxel grid in terms of model updates and accuracy. Additionally, XAeroNet-V
incorporates a continuity constraint as an additional loss term during training to
enhance model interpretability.
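
The continuity constraint penalizes non-zero divergence of the predicted velocity field.
One way to discretize it on a voxel grid is with central differences; this is a sketch of
the idea, not necessarily the exact loss used in XAeroNet-V:

```python
import numpy as np

def continuity_loss(vel, h=1.0):
    """Mean squared divergence of a velocity field on a voxel grid.

    vel has shape (3, nx, ny, nz) holding the (u, v, w) components;
    h is the voxel spacing. Central differences on interior voxels.
    """
    u, v, w = vel
    du_dx = (u[2:, 1:-1, 1:-1] - u[:-2, 1:-1, 1:-1]) / (2 * h)
    dv_dy = (v[1:-1, 2:, 1:-1] - v[1:-1, :-2, 1:-1]) / (2 * h)
    dw_dz = (w[1:-1, 1:-1, 2:] - w[1:-1, 1:-1, :-2]) / (2 * h)
    div = du_dx + dv_dy + dw_dz
    return float((div**2).mean())

# A uniform field is divergence-free, so the penalty vanishes.
zero_penalty = continuity_loss(np.ones((3, 8, 8, 8)))
```

Such a term is typically added to the data loss with a weighting coefficient, trading
physical consistency against fitting accuracy.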

For the XAeroNet-V model, the .vtu files are used to interpolate the volumetric
solution fields onto a voxel grid, while the .stl files are utilized to compute
the signed distance field (SDF) and its derivatives on the voxel grid.

## Dataset

We trained our models using the DrivAerML dataset from the [CAE ML Dataset collection](https://caemldatasets.org/drivaerml/).
This high-fidelity, open-source (CC-BY-SA) public dataset is specifically designed
for automotive aerodynamics research. It comprises 500 parametrically morphed variants
of the widely utilized DrivAer notchback generic vehicle. Mesh generation and
scale-resolving computational fluid dynamics (CFD) simulations were executed using
consistent and validated automatic workflows that represent the industrial
state-of-the-art. Geometries and comprehensive aerodynamic data are published in
open-source formats. For more technical details about this dataset, please refer to
their [paper](https://arxiv.org/pdf/2408.11969).

## Training the XAeroNet-S model

To train the XAeroNet-S model, follow these steps:

1. Download the DrivAerML dataset using the provided `download_aws_dataset.sh` script.

2. Navigate to the `surface` folder.

3. Specify the configurations in `conf/config.yaml`. Make sure the path to the dataset
   is specified correctly.

4. Run `combine_stl_solids.py`. The STL files in the DrivAerML dataset consist of
   multiple solids. Those should be combined into a single solid to properly generate a
   surface point cloud using the Modulus Tesselated geometry module.

5. Run `preprocessing.py`. This will prepare and save the partitioned graphs.

6. Create a `partitions_validation` folder, and move the samples you wish to use for
   validation to that folder.

7. Run `compute_stats.py` to compute the global mean and standard deviation from the
   training samples.

8. Run `train.py` to start the training.

9. Download the validation results (saved in the form of point clouds in `.vtp` format),
   and visualize them in ParaView.

![XAeroNet-S validation results for sample #500.](../../../docs/img/xaeronet_s_results.png)
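
Step 7's global mean and standard deviation can be accumulated in a single streaming
pass over the training samples, so no file has to stay in memory longer than necessary.
A minimal sketch of the idea (the actual `compute_stats.py` may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
samples = [rng.normal(size=50) for _ in range(4)]  # stand-ins for per-file fields

# Running count, sum, and sum of squares over all samples
count, s1, s2 = 0, 0.0, 0.0
for x in samples:
    count += x.size
    s1 += x.sum()
    s2 += (x**2).sum()

mean = s1 / count
std = np.sqrt(s2 / count - mean**2)  # population std, matching np.std
```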

## Training the XAeroNet-V model

To train the XAeroNet-V model, follow these steps:

1. Download the DrivAerML dataset using the provided `download_aws_dataset.sh` script.

2. Navigate to the `volume` folder.

3. Specify the configurations in `conf/config.yaml`. Make sure the path to the dataset
   is specified correctly.

4. Run `preprocessing.py`. This will prepare and save the voxel grids.

5. Create a `drivaer_aws_h5_validation` folder, and move the samples you wish to
   use for validation to that folder.

6. Run `compute_stats.py` to compute the global mean and standard deviation from
   the training samples.

7. Run `train.py` to start the training. Partitioning is performed prior to training.

8. Download the validation results (saved in the form of voxel grids in `.vti` format),
   and visualize them in ParaView.

![XAeroNet-V validation results.](../../../docs/img/xaeronet_v_results.png)

## Logging

We mainly use TensorBoard for logging training and validation losses, as well as
the learning rate during training. You can also optionally use Weights & Biases to
log training metrics. To visualize TensorBoard running in a Docker container on a
remote server from your local desktop, follow these steps:

1. **Expose the Port in Docker:**
   Expose port 6006 in the Docker container by including
   `-p 6006:6006` in your `docker run` command.

2. **Launch TensorBoard:**
   Start TensorBoard within the Docker container:

   ```bash
   tensorboard --logdir=/path/to/logdir --port=6006
   ```

3. **Set Up SSH Tunneling:**
   Create an SSH tunnel to forward port 6006 from the remote server to your local machine:

   ```bash
   ssh -L 6006:localhost:6006 <user>@<remote-server-ip>
   ```

   Replace `<user>` with your SSH username and `<remote-server-ip>` with the IP address
   of your remote server. You can use a different port if necessary.

4. **Access TensorBoard:**
   Open your web browser and navigate to `http://localhost:6006` to view TensorBoard.

**Note:** Ensure the remote server's firewall allows connections on port `6006`
and that your local machine's firewall allows outgoing connections.
#!/bin/bash

# This Bash script identifies and removes corrupted files after downloading the AWS DrivAer dataset.
# It defines two functions: check_and_remove_corrupted_extension and check_all_runs.
# check_and_remove_corrupted_extension looks for files in a given directory that have extra
# characters after their extension; any such file is considered corrupted and is removed.
# check_all_runs iterates over all run directories in LOCAL_DIR, checking for corrupted
# files with the extensions ".vtu", ".stl", and ".vtp".
# The cleanup process targets the "./drivaer_data_full" directory.

# Set the local directory to check the files
LOCAL_DIR="./drivaer_data_full" # <--- This is the directory where the files are downloaded.

# Function to check if a file has extra characters after the extension and remove it
check_and_remove_corrupted_extension() {
    local dir=$1
    local base_filename=$2
    local extension=$3

    # Find any files with extra characters after the extension
    for file in "$dir/$base_filename"$extension*; do
        if [[ -f "$file" && "$file" != "$dir/$base_filename$extension" ]]; then
            echo "Corrupted file detected: $file (extra characters after extension), removing it."
            rm "$file"
        fi
    done
}

# Function to go over all the run directories and check files
check_all_runs() {
    for RUN_DIR in "$LOCAL_DIR"/run_*; do
        echo "Checking folder: $RUN_DIR"

        # Check for corrupted .vtu files
        base_vtu="volume_${RUN_DIR##*_}"
        check_and_remove_corrupted_extension "$RUN_DIR" "$base_vtu" ".vtu"

        # Check for corrupted .stl files
        base_stl="drivaer_${RUN_DIR##*_}"
        check_and_remove_corrupted_extension "$RUN_DIR" "$base_stl" ".stl"

        # Check for corrupted .vtp files (downloaded as boundary_<i>.vtp)
        base_vtp="boundary_${RUN_DIR##*_}"
        check_and_remove_corrupted_extension "$RUN_DIR" "$base_vtp" ".vtp"
    done
}

# Start checking
check_all_runs
#!/bin/bash

# This Bash script downloads the AWS DrivAer files from the Amazon S3 bucket to a local directory.
# Only the volume files (.vtu), STL files (.stl), and VTP files (.vtp) are downloaded.
# It uses a function, download_run_files, to check for the existence of three specific files
# (".vtu", ".stl", ".vtp") in a run directory.
# If a file doesn't exist, it's downloaded from the S3 bucket. If it does exist, the download is skipped.
# The script runs multiple downloads in parallel, both within a single run and across multiple runs.
# It also limits the number of parallel downloads to avoid overloading the system.

# Set the local directory to download the files
LOCAL_DIR="./drivaer_data_full" # <--- This is the directory where the files will be downloaded.

# Set the S3 bucket and prefix
S3_BUCKET="caemldatasets"
S3_PREFIX="drivaer/dataset"

# Create the local directory if it doesn't exist
mkdir -p "$LOCAL_DIR"

# Function to download files for a specific run
download_run_files() {
    local i=$1
    RUN_DIR="run_$i"
    RUN_LOCAL_DIR="$LOCAL_DIR/$RUN_DIR"

    # Create the run directory if it doesn't exist
    mkdir -p "$RUN_LOCAL_DIR"

    # Check if the .vtu file exists before downloading
    if [ ! -f "$RUN_LOCAL_DIR/volume_$i.vtu" ]; then
        aws s3 cp --no-sign-request "s3://$S3_BUCKET/$S3_PREFIX/$RUN_DIR/volume_$i.vtu" "$RUN_LOCAL_DIR/" &
    else
        echo "File volume_$i.vtu already exists, skipping download."
    fi

    # Check if the .stl file exists before downloading
    if [ ! -f "$RUN_LOCAL_DIR/drivaer_$i.stl" ]; then
        aws s3 cp --no-sign-request "s3://$S3_BUCKET/$S3_PREFIX/$RUN_DIR/drivaer_$i.stl" "$RUN_LOCAL_DIR/" &
    else
        echo "File drivaer_$i.stl already exists, skipping download."
    fi

    # Check if the .vtp file exists before downloading
    if [ ! -f "$RUN_LOCAL_DIR/boundary_$i.vtp" ]; then
        aws s3 cp --no-sign-request "s3://$S3_BUCKET/$S3_PREFIX/$RUN_DIR/boundary_$i.vtp" "$RUN_LOCAL_DIR/" &
    else
        echo "File boundary_$i.vtp already exists, skipping download."
    fi

    wait # Ensure all three files for this run are downloaded before moving to the next run
}

# Loop through the run folders and download the files
for i in $(seq 1 500); do
    download_run_files "$i" &

    # Limit the number of parallel jobs to avoid overloading the system
    if (( $(jobs -r | wc -l) >= 8 )); then
        wait -n # Wait for a background job to finish before starting a new one
    fi
done

# Wait for all remaining background jobs to finish
wait
trimesh==4.5.0
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
This module provides functionality to convert STL files with multiple solids
to another STL file with a single combined solid. It includes support for
processing multiple files in parallel with progress tracking.
"""

import os
import trimesh
import hydra

from multiprocessing import Pool
from tqdm import tqdm
from hydra.utils import to_absolute_path
from omegaconf import DictConfig


def process_stl_file(task):
    """Combine all solids in an STL file into a single mesh and save the
    result next to the original file."""
    stl_path = task

    # Load the STL file using trimesh
    mesh = trimesh.load_mesh(stl_path)

    # If the STL file contains multiple solids (as a Scene object)
    if isinstance(mesh, trimesh.Scene):
        # Extract all geometries (solids) from the scene
        meshes = list(mesh.geometry.values())

        # Combine all the solids into a single mesh
        combined_mesh = trimesh.util.concatenate(meshes)
    else:
        # If it's a single solid, no need to combine
        combined_mesh = mesh

    # Prepare the output file path (next to the original file)
    base_name, ext = os.path.splitext(stl_path)
    output_file_path = to_absolute_path(f"{base_name}_single_solid{ext}")

    # Save the new combined mesh as an STL file
    combined_mesh.export(output_file_path)

    return f"Processed: {stl_path} -> {output_file_path}"


def process_directory(data_path, num_workers=16):
    """Process all STL files in the given directory using multiprocessing with progress tracking."""
    tasks = []
    for root, _, files in os.walk(data_path):
        stl_files = [f for f in files if f.endswith(".stl")]
        for stl_file in stl_files:
            stl_path = os.path.join(root, stl_file)

            # Add the STL file to the tasks list (no output dir needed; the
            # combined mesh is saved next to the original)
            tasks.append(stl_path)

    # Use multiprocessing to process the tasks with progress tracking
    with Pool(num_workers) as pool:
        for _ in tqdm(
            pool.imap_unordered(process_stl_file, tasks),
            total=len(tasks),
            desc="Processing STL Files",
            unit="file",
        ):
            pass


@hydra.main(version_base="1.3", config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # Process the directory with multiple STL files
    process_directory(
        to_absolute_path(cfg.data_path), num_workers=cfg.num_preprocess_workers
    )


if __name__ == "__main__":
    main()