Daily Papers

The project automatically fetches the latest papers from arXiv based on keywords.

The subheadings in the README file represent the search keywords.

Only the most recent articles for each keyword are retained, up to a maximum of 100 papers.

You can click the 'Watch' button to receive daily email notifications.

Last update: 2025-01-31

Time Series

Long-term prediction of El Niño-Southern Oscillation using reservoir computing with data-driven realtime filter 2025-01-29

In recent years, the application of machine learning approaches to time-series forecasting of climate dynamical phenomena has become increasingly active. It is known that applying a band-pass filter to time-series data is a key to obtaining a high-quality data-driven model. Here, to obtain longer-term predictability of machine learning models, we introduce a new type of band-pass filter. It can be applied to realtime operational prediction workflows since it relies solely on past time series. We combine the filter with reservoir computing, which is a machine-learning technique that employs a data-driven dynamical system. As an application, we predict the multi-year dynamics of the El Niño-Southern Oscillation with a prediction horizon of 24 months using only past time series.

21 pages, 7 figures
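
The pairing described above, a causal (past-only) band-pass filter feeding a reservoir computer, can be sketched in a few lines. The Butterworth filter, reservoir size, and synthetic index below are illustrative assumptions, not the paper's actual filter design or model.

```python
# Minimal sketch: causal band-pass filtering + a small echo state network (ESN).
# Filter design, reservoir hyperparameters, and the synthetic data are assumptions.
import numpy as np
from scipy.signal import butter, lfilter

rng = np.random.default_rng(0)

# Synthetic monthly index with a multi-year oscillation plus noise (stand-in for an ENSO index).
t = np.arange(1200)
x = np.sin(2 * np.pi * t / 48) + 0.3 * rng.standard_normal(len(t))

# Causal band-pass filter: lfilter uses only past samples, so it is usable in real time.
b, a = butter(N=4, Wn=[1 / 96, 1 / 18], btype="bandpass", fs=1.0)  # periods ~18-96 months
x_f = lfilter(b, a, x)

# Tiny echo state network trained to predict the filtered series one step ahead.
n_res, leak, rho = 200, 0.3, 0.9
W_in = rng.uniform(-0.5, 0.5, n_res)
W = rng.standard_normal((n_res, n_res))
W *= rho / max(abs(np.linalg.eigvals(W)))          # rescale spectral radius

def run_reservoir(series):
    states, s = [], np.zeros(n_res)
    for u in series:
        s = (1 - leak) * s + leak * np.tanh(W_in * u + W @ s)
        states.append(s.copy())
    return np.array(states)

train = x_f[:1000]
S = run_reservoir(train[:-1])
# Ridge-regularized readout mapping reservoir states to the next value.
ridge = 1e-4
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ train[1:])

# Free-running multi-step forecast from the end of the training window.
s = run_reservoir(train)[-1]
u, preds = train[-1], []
for _ in range(24):                                 # 24-month horizon, as in the abstract
    s = (1 - leak) * s + leak * np.tanh(W_in * u + W @ s)
    u = s @ W_out
    preds.append(u)
print(np.round(preds[:6], 3))
```
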
What is different between these datasets? 2025-01-29

The performance of machine learning models relies heavily on the quality of input data, yet real-world applications often face significant data-related challenges. A common issue arises when curating training data or deploying models: two datasets from the same domain may exhibit differing distributions. While many techniques exist for detecting such distribution shifts, there is a lack of comprehensive methods to explain these differences in a human-understandable way beyond opaque quantitative metrics. To bridge this gap, we propose a versatile toolbox of interpretable methods for comparing datasets. Using a variety of case studies, we demonstrate the effectiveness of our approach across diverse data modalities -- including tabular data, text data, images, and time series signals -- in both low and high-dimensional settings. These methods complement existing techniques by providing actionable and interpretable insights to better understand and address distribution shifts.
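
One common, interpretable baseline for asking "what is different between these datasets?" is a classifier two-sample test: train a model to tell the datasets apart and read off which features drive the separation. The sketch below illustrates that generic idea on synthetic tabular data; it is not the toolbox proposed in the paper.

```python
# Generic sketch: train a classifier to distinguish two tabular datasets and inspect
# which features drive the difference. Synthetic data; not the paper's method.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n, d = 2000, 5
A = rng.normal(0.0, 1.0, (n, d))
B = rng.normal(0.0, 1.0, (n, d))
B[:, 2] += 0.8                      # dataset B is shifted along feature 2

X = np.vstack([A, B])
y = np.r_[np.zeros(n), np.ones(n)]  # label = which dataset a row came from

clf = RandomForestClassifier(n_estimators=200, random_state=0)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
clf.fit(X, y)

# AUC near 0.5 means the datasets are indistinguishable; higher means they differ.
print(f"classifier two-sample AUC: {auc:.3f}")
print("feature importances:", np.round(clf.feature_importances_, 3))
```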

Fundamentals of non-parametric statistical inference for integrated quantiles 2025-01-29

We present a general non-parametric statistical inference theory for integrals of quantiles without assuming any specific sampling design or dependence structure. Technical considerations are accompanied by examples and discussions, including those pertaining to the bias of empirical estimators. To illustrate how the general results can be adapted to specific situations, we derive - at a stroke and under minimal conditions - consistency and asymptotic normality of the empirical tail-value-at-risk, Lorenz and Gini curves at any probability level in the case of simple random sampling, thus facilitating a comparison of our results with what is already known in the literature. Notes and references concerning underlying technicalities in the case of dependent (i.e., time series) data are offered. As a by-product, our general results provide new and unified proofs of large-sample properties of a number of classical statistical estimators, such as trimmed means, and give additional insights into the origins of, and the reasons for, various necessary and sufficient conditions.

66 pages, 6 figures, 1 table

Gradient-free training of recurrent neural networks 2025-01-29

Recurrent neural networks are a successful neural architecture for many time-dependent problems, including time series analysis, forecasting, and modeling of dynamical systems. Training such networks with backpropagation through time is a notoriously difficult problem because their loss gradients tend to explode or vanish. In this contribution, we introduce a computational approach to construct all weights and biases of a recurrent neural network without using gradient-based methods. The approach is based on a combination of random feature networks and Koopman operator theory for dynamical systems. The hidden parameters of a single recurrent block are sampled at random, while the outer weights are constructed using extended dynamic mode decomposition. This approach alleviates all problems with backpropagation commonly related to recurrent networks. The connection to Koopman operator theory also allows us to start using results in this area to analyze recurrent neural networks. In computational experiments on time series, forecasting for chaotic dynamical systems, and control problems, as well as on weather data, we observe that the training time and forecasting accuracy of the recurrent neural networks we construct are improved when compared to commonly used gradient-based methods.

Closing the Gap Between Synthetic and Ground Truth Time Series Distributions via Neural Mapping 2025-01-29

In this paper, we introduce Neural Mapper for Vector Quantized Time Series Generator (NM-VQTSG), a novel method aimed at addressing fidelity challenges in vector quantized (VQ) time series generation. VQ-based methods, such as TimeVQVAE, have demonstrated success in generating time series but are hindered by two critical bottlenecks: information loss during compression into discrete latent spaces and deviations in the learned prior distribution from the ground truth distribution. These challenges result in synthetic time series with compromised fidelity and distributional accuracy. To overcome these limitations, NM-VQTSG leverages a U-Net-based neural mapping model to bridge the distributional gap between synthetic and ground truth time series. To be more specific, the model refines synthetic data by addressing artifacts introduced during generation, effectively aligning the distributions of synthetic and real data. Importantly, NM-VQTSG can be used for synthetic time series generated by any VQ-based generative method. We evaluate NM-VQTSG across diverse datasets from the UCR Time Series Classification archive, demonstrating its capability to consistently enhance fidelity in both unconditional and conditional generation tasks. The improvements are evidenced by significant improvements in FID, IS, and conditional FID, additionally backed up by visual inspection in a data space and a latent space. Our findings establish NM-VQTSG as a new method to improve the quality of synthetic time series. Our implementation is available on \url{https://github.com/ML4ITS/TimeVQVAE}.

Large Language Models for Single-Step and Multi-Step Flight Trajectory Prediction 2025-01-29

Flight trajectory prediction is a critical time series task in aviation. While deep learning methods have shown significant promise, the application of large language models (LLMs) to this domain remains underexplored. This study pioneers the use of LLMs for flight trajectory prediction by reframing it as a language modeling problem. Specifically, we extract features representing the aircraft's position and status from ADS-B flight data to construct a prompt-based dataset, where trajectory waypoints are converted into language tokens. The dataset is then employed to fine-tune LLMs, enabling them to learn complex spatiotemporal patterns for accurate predictions. Comprehensive experiments demonstrate that LLMs achieve notable performance improvements in both single-step and multi-step predictions compared to traditional methods, with the LLaMA-3.1 model achieving the highest overall accuracy. However, the high inference latency of LLMs poses a challenge for real-time applications, underscoring the need for further research in this promising direction.

9 pages, 7 figures
NF-MKV Net: A Constraint-Preserving Neural Network Approach to Solving Mean-Field Games Equilibrium 2025-01-29

Neural network-based methods for solving Mean-Field Games (MFGs) equilibria have garnered significant attention for their effectiveness in high-dimensional problems. However, many algorithms struggle with ensuring that the evolution of the density distribution adheres to the required mathematical constraints. This paper investigates a neural network approach to solving MFGs equilibria through a stochastic process perspective. It integrates process-regularized Normalizing Flow (NF) frameworks with state-policy-connected time-series neural networks to address McKean-Vlasov-type Forward-Backward Stochastic Differential Equation (MKV FBSDE) fixed-point problems, equivalent to MFGs equilibria.

7 pages
Applying non-negative matrix factorization with covariates to multivariate time series data as a vector autoregression model 2025-01-29

Non-negative matrix factorization (NMF) is a powerful technique for dimensionality reduction, but its application to time series data remains limited. This paper proposes a novel framework that integrates NMF with a vector autoregression (VAR) model to capture both latent structure and temporal dependencies in multivariate time series data. By representing the NMF coefficient matrix as a VAR model, the framework leverages the interpretability of NMF while incorporating the dynamic characteristics of time series data. This approach allows for the extraction of meaningful features and accurate predictions in time series data.

7 figures
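
A minimal sketch of the workflow the abstract describes: factor a non-negative multivariate series with NMF, fit a VAR to the temporal coefficient matrix, forecast the coefficients, and map them back through the basis. The synthetic data, rank, and lag order are assumptions for illustration.

```python
# Minimal sketch of NMF + VAR forecasting: X (T x d, non-negative) ~ W H,
# fit a VAR to the time-varying coefficients W, forecast them, and map back via H.
# Synthetic data, rank, and lag order are illustrative assumptions.
import numpy as np
from sklearn.decomposition import NMF
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
T, d, r = 300, 8, 3

# Synthetic non-negative multivariate series driven by r smooth latent factors.
t = np.arange(T)
factors = np.c_[np.sin(t / 20) + 1.2, np.cos(t / 35) + 1.2, 0.5 + 0.3 * np.sin(t / 8)]
loadings = rng.uniform(0.2, 1.0, (r, d))
X = factors @ loadings + 0.05 * rng.random((T, d))

nmf = NMF(n_components=r, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)            # (T, r) temporal coefficients
H = nmf.components_                 # (r, d) basis

var = VAR(W).fit(maxlags=2)         # temporal dynamics on the coefficient series
W_fc = var.forecast(W[-var.k_ar:], steps=10)
X_fc = W_fc @ H                     # forecasts back in the original data space
print(X_fc.shape)                   # (10, 8)
```
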
A large synthetic dataset for machine learning applications in power transmission grids 2025-01-29

With the ongoing energy transition, power grids are evolving fast. They operate more and more often close to their technical limit, under more and more volatile conditions. Fast, essentially real-time computational approaches to evaluate their operational safety, stability and reliability are therefore highly desirable. Machine Learning methods have been advocated to solve this challenge; however, they are heavy consumers of training and testing data, while historical operational data for real-world power grids are hard if not impossible to access. This manuscript presents a large synthetic dataset of power injections in an electric transmission grid model of continental Europe, and describes the algorithm developed for its generation. The method allows one to generate arbitrarily large time series from the knowledge of the grid -- the admittance of its lines as well as the location, type and capacity of its power generators -- and aggregated power consumption data, such as the national load data given by ENTSO-E. The obtained datasets are statistically validated against real-world data.

17 pages, 8 figures, 3 tables. Dataset available at https://zenodo.org/records/13378476

Gaze Prediction as a Function of Eye Movement Type and Individual Differences 2025-01-28

Eye movement prediction is a promising area of research with the potential to improve performance and the user experience of systems based on eye-tracking technology. In this study, we analyze individual differences in gaze prediction performance. We use three fundamentally different models within the analysis: the lightweight Long Short-Term Memory network (LSTM), the transformer-based network for multivariate time series representation learning (TST), and the Oculomotor Plant Mathematical Model wrapped in the Kalman Filter framework (OPKF). Each solution was assessed on different eye-movement types. We show important subject-to-subject variation for all models and eye-movement types. We found that fixation noise is associated with poorer gaze prediction in fixation. For saccades, higher velocities are associated with poorer gaze prediction performance. We think these individual differences are important and propose that future research should report statistics related to inter-subject variation. We also propose that future models should be designed to reduce subject-to-subject variation.

12 pages
A 1-D CNN inference engine for constrained platforms 2025-01-28

1D-CNNs are used for time series classification in various domains with a high degree of accuracy. Most implementations collect the incoming data samples in a buffer before performing inference on it. On edge devices, which are typically constrained and single-threaded, such an implementation may interfere with time-critical tasks. One such task is that of sample acquisition. In this work, we propose an inference scheme that interleaves the convolution operations between sample intervals, which allows us to reduce the inference latency. Furthermore, our scheme is well-suited for storing data in ring buffers, yielding a small memory footprint. We demonstrate these improvements by comparing our approach to TFLite's inference method, giving a 10% reduction in the inference delay while almost halving the memory usage. Our approach is feasible on common consumer devices, which we show using an AVR-based Arduino board and an ARM-based Arduino board.
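
The two ingredients above, a ring buffer and convolutions interleaved with sample arrival, can be illustrated with a small Python sketch (the paper targets constrained AVR/ARM boards; the kernel and stream here are assumptions).

```python
# Sketch of ring-buffer storage plus per-sample (interleaved) 1-D convolution.
# On arrival of each sample, the dot product for the newest window is computed
# immediately, so no full-window buffering step is needed before inference.
# Buffer size, kernel, and the sample stream are illustrative assumptions.
import numpy as np

KERNEL = np.array([0.25, 0.5, 1.0, 0.5, 0.25])   # example 1-D conv kernel
K = len(KERNEL)

class RingConv1D:
    def __init__(self, kernel):
        self.kernel = kernel
        self.buf = np.zeros(len(kernel))          # ring buffer of the last K samples
        self.head = 0
        self.count = 0

    def push(self, sample):
        """Store one sample and, once the buffer is full, return one conv output."""
        self.buf[self.head] = sample
        self.head = (self.head + 1) % K
        self.count += 1
        if self.count < K:
            return None
        # Reassemble the window in chronological order (oldest first).
        window = np.r_[self.buf[self.head:], self.buf[:self.head]]
        return float(window @ self.kernel[::-1])  # convolution = flipped-kernel dot product

conv = RingConv1D(KERNEL)
stream = np.sin(np.linspace(0, 4, 50))            # stand-in for incoming ADC samples
outputs = [y for y in (conv.push(s) for s in stream) if y is not None]
print(len(outputs), round(outputs[0], 4))
```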

Amplifier: Bringing Attention to Neglected Low-Energy Components in Time Series Forecasting 2025-01-28

We propose an energy amplification technique to address the issue that existing models easily overlook low-energy components in time series forecasting. This technique comprises an energy amplification block and an energy restoration block. The energy amplification block enhances the energy of low-energy components to improve the model's learning efficiency for these components, while the energy restoration block returns the energy to its original level. Moreover, considering that the energy-amplified data typically displays two distinct energy peaks in the frequency spectrum, we integrate the energy amplification technique with a seasonal-trend forecaster to model the temporal relationships of these two peaks independently, serving as the backbone for our proposed model, Amplifier. Additionally, we propose a semi-channel interaction temporal relationship enhancement block for Amplifier, which enhances the model's ability to capture temporal relationships from the perspective of the commonality and specificity of each channel in the data. Extensive experiments on eight time series forecasting benchmarks consistently demonstrate our model's superiority in both effectiveness and efficiency compared to state-of-the-art methods.

Accepted by AAAI 2025
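
A rough frequency-domain sketch of the amplify-then-restore idea: boost the magnitude of low-energy bins before modeling, then divide the same gains back out. The gain rule below is an assumption for illustration, not the paper's actual blocks.

```python
# Illustrative sketch of "amplify low-energy components, then restore":
# scale up weak frequency bins before modeling and undo the scaling afterwards.
# The gain rule is an assumption, not the paper's exact design.
import numpy as np

t = np.arange(512)
# Dominant low-frequency seasonality plus a weak high-frequency component.
x = 3.0 * np.sin(2 * np.pi * t / 64) + 0.1 * np.sin(2 * np.pi * t / 8)

X = np.fft.rfft(x)
energy = np.abs(X)

# Energy amplification block: push weak bins toward the strongest bin's level.
eps = 1e-8
gain = (energy.max() / (energy + eps)) ** 0.5     # sqrt tempering keeps gains moderate
gain[energy < 1e-6] = 1.0                         # leave numerically empty bins alone
x_amp = np.fft.irfft(X * gain, n=len(x))

# ... a forecaster would be trained on x_amp here ...

# Energy restoration block: divide the same gains back out in the frequency domain.
x_restored = np.fft.irfft(np.fft.rfft(x_amp) / gain, n=len(x))
print("max reconstruction error:", float(np.max(np.abs(x_restored - x))))
```
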

Adversarial Vulnerabilities in Large Language Models for Time Series Forecasting 2025-01-28

Large Language Models (LLMs) have recently demonstrated significant potential in the field of time series forecasting, offering impressive capabilities in handling complex temporal data. However, their robustness and reliability in real-world applications remain under-explored, particularly concerning their susceptibility to adversarial attacks. In this paper, we introduce a targeted adversarial attack framework for LLM-based time series forecasting. By employing both gradient-free and black-box optimization methods, we generate minimal yet highly effective perturbations that significantly degrade the forecasting accuracy across multiple datasets and LLM architectures. Our experiments, which include models like TimeGPT and LLM-Time with GPT-3.5, GPT-4, LLaMa, and Mistral, show that adversarial attacks lead to much more severe performance degradation than random noise, and demonstrate the broad effectiveness of our attacks across different LLMs. The results underscore the critical vulnerabilities of LLMs in time series forecasting, highlighting the need for robust defense mechanisms to ensure their reliable deployment in practical applications.

AISTATS 2025
Benchmarking Quantum Convolutional Neural Networks for Signal Classification in Simulated Gamma-Ray Burst Detection 2025-01-28

This study evaluates the use of Quantum Convolutional Neural Networks (QCNNs) for identifying signals resembling Gamma-Ray Bursts (GRBs) within simulated astrophysical datasets in the form of light curves. The task addressed here focuses on distinguishing GRB-like signals from background noise in simulated Cherenkov Telescope Array Observatory (CTAO) data, the next-generation astrophysical observatory for very high-energy gamma-ray science. QCNNs, a quantum counterpart of classical Convolutional Neural Networks (CNNs), leverage quantum principles to process and analyze high-dimensional data efficiently. We implemented a hybrid quantum-classical machine learning technique using the Qiskit framework, with the QCNNs trained on a quantum simulator. Several QCNN architectures were tested, employing different encoding methods such as Data Reuploading and Amplitude encoding. Key findings include that QCNNs achieved accuracy comparable to classical CNNs, often surpassing 90%, while using fewer parameters, potentially leading to more efficient models in terms of computational resources. A benchmark study further examined how hyperparameters like the number of qubits and encoding methods affected performance, with more qubits and advanced encoding methods generally enhancing accuracy but increasing complexity. QCNNs showed robust performance on time-series datasets, successfully detecting GRB signals with high precision. The research is a pioneering effort in applying QCNNs to astrophysics, offering insights into their potential and limitations. This work sets the stage for future investigations to fully realize the advantages of QCNNs in astrophysical data analysis.

9 pages, Accepted for publication in 33rd Euromicro/IEEE International Conference on Parallel, Distributed and Network-Based Processing (PDP 2025)

Stack Overflow Meets Replication: Security Research Amid Evolving Code Snippets (Extended Version) 2025-01-28

We study the impact of Stack Overflow code evolution on the stability of prior research findings derived from Stack Overflow data and provide recommendations for future studies. We systematically reviewed papers published between 2005--2023 to identify key aspects of Stack Overflow that can affect study results, such as the language or context of code snippets. Our analysis reveals that certain aspects are non-stationary over time, which could lead to different conclusions if experiments are repeated at different times. We replicated six studies using a more recent dataset to demonstrate this risk. Our findings show that four papers produced significantly different results than the original findings, preventing the same conclusions from being drawn with a newer dataset version. Consequently, we recommend treating Stack Overflow as a time series data source to provide context for interpreting cross-sectional research conclusions.

CENTS: Generating synthetic electricity consumption time series for rare and unseen scenarios 2025-01-28

Recent breakthroughs in large-scale generative modeling have demonstrated the potential of foundation models in domains such as natural language, computer vision, and protein structure prediction. However, their application in the energy and smart grid sector remains limited due to the scarcity and heterogeneity of high-quality data. In this work, we propose a method for creating high-fidelity electricity consumption time series data for rare and unseen context variables (e.g. location, building type, photovoltaics). Our approach, Context Encoding and Normalizing Time Series Generation, or CENTS, includes three key innovations: (i) A context normalization approach that enables inverse transformation for time series context variables unseen during training, (ii) a novel context encoder to condition any state-of-the-art time-series generator on arbitrary numbers and combinations of context variables, (iii) a framework for training this context encoder jointly with a time-series generator using an auxiliary context classification loss designed to increase expressivity of context embeddings and improve model performance. We further provide a comprehensive overview of different evaluation metrics for generative time series models. Our results highlight the efficacy of the proposed method in generating realistic household-level electricity consumption data, paving the way for training larger foundation models in the energy domain on synthetic as well as real-world data.

Data-Driven vs Traditional Approaches to Power Transformer's Top-Oil Temperature Estimation 2025-01-28

Power transformers are subjected to electrical currents and temperature fluctuations that, if not properly controlled, can lead to major deterioration of their insulation system. Therefore, monitoring the temperature of a power transformer is fundamental to ensure a long-term operational life. Models presented in the IEC 60076-7 and IEEE standards, for example, monitor the temperature by calculating the top-oil and the hot-spot temperatures. However, these models are not very accurate and rely on the power transformers' properties. This paper focuses on finding an alternative method to predict the top-oil temperatures given previous measurements. Given the large quantities of data available, machine learning methods for time series forecasting are analyzed and compared to the real measurements and the corresponding prediction of the IEC standard. The methods tested are Artificial Neural Networks (ANNs), Time-series Dense Encoder (TiDE), and Temporal Convolutional Networks (TCN) using different combinations of historical measurements. Each of these methods outperformed the IEC 60076-7 model and they are extended to estimate the temperature rise over ambient. To enhance prediction reliability, we explore the application of quantile regression to construct prediction intervals for the expected top-oil temperature ranges. The best-performing model successfully estimates conditional quantiles that provide sufficient coverage.
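
The quantile-regression step for prediction intervals can be sketched with standard gradient boosting under a pinball loss, fitting separate models for the lower, median, and upper quantiles. The lag features and synthetic transformer data below are assumptions, not the paper's setup.

```python
# Sketch of prediction intervals via quantile regression: fit separate models for the
# 5th, 50th, and 95th percentiles of the next top-oil temperature given lagged inputs.
# Synthetic data and lag features are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000
load = 0.6 + 0.3 * np.sin(np.arange(n) / 24) + 0.05 * rng.standard_normal(n)
ambient = 20 + 8 * np.sin(np.arange(n) / 96) + rng.standard_normal(n)
top_oil = 30 + 25 * load + 0.8 * ambient + 2.0 * rng.standard_normal(n)

# Features: current load/ambient plus one lag of each; target: next-step top-oil temperature.
X = np.c_[load[1:-1], ambient[1:-1], load[:-2], ambient[:-2]]
y = top_oil[2:]
X_tr, X_te, y_tr, y_te = X[:1500], X[1500:], y[:1500], y[1500:]

models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q, n_estimators=200).fit(X_tr, y_tr)
    for q in (0.05, 0.5, 0.95)
}
lo, med, hi = (models[q].predict(X_te) for q in (0.05, 0.5, 0.95))

coverage = np.mean((y_te >= lo) & (y_te <= hi))
print(f"empirical coverage of the 5%-95% interval: {coverage:.2%}")
```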

Toward Relative Positional Encoding in Spiking Transformers 2025-01-28

Spiking neural networks (SNNs) are bio-inspired networks that model how neurons in the brain communicate through discrete spikes, which have great potential in various tasks due to their energy efficiency and temporal processing capabilities. SNNs with self-attention mechanisms (Spiking Transformers) have recently shown great advancements in various tasks such as sequential modeling and image classifications. However, integrating positional information, which is essential for capturing sequential relationships in data, remains a challenge in Spiking Transformers. In this paper, we introduce an approximate method for relative positional encoding (RPE) in Spiking Transformers, leveraging Gray Code as the foundation for our approach. We provide comprehensive proof of the method's effectiveness in partially capturing relative positional information for sequential tasks. Additionally, we extend our RPE approach by adapting it to a two-dimensional form suitable for image patch processing. We evaluate the proposed RPE methods on several tasks, including time series forecasting, text classification, and patch-based image classification. Our experimental results demonstrate that the incorporation of RPE significantly enhances performance by effectively capturing relative positional information.

LLM Assisted Anomaly Detection Service for Site Reliability Engineers: Enhancing Cloud Infrastructure Resilience 2025-01-28

This paper introduces a scalable Anomaly Detection Service with a generalizable API tailored for industrial time-series data, designed to assist Site Reliability Engineers (SREs) in managing cloud infrastructure. The service enables efficient anomaly detection in complex data streams, supporting proactive identification and resolution of issues. Furthermore, it presents an innovative approach to anomaly modeling in cloud infrastructure by utilizing Large Language Models (LLMs) to understand key components, their failure modes, and behaviors. A suite of algorithms for detecting anomalies is offered in univariate and multivariate time series data, including regression-based, mixture-model-based, and semi-supervised approaches. We provide insights into the usage patterns of the service, with over 500 users and 200,000 API calls in a year. The service has been successfully applied in various industrial settings, including IoT-based AI applications. We have also evaluated our system on public anomaly benchmarks to show its effectiveness. By leveraging it, SREs can proactively identify potential issues before they escalate, reducing downtime and improving response times to incidents, ultimately enhancing the overall customer experience. We plan to extend the system to include time series foundation models, enabling zero-shot anomaly detection capabilities.

Accepted at the AAAI-2025 Deployable AI Workshop

Explainability and AI Confidence in Clinical Decision Support Systems: Effects on Trust, Diagnostic Performance, and Cognitive Load in Breast Cancer Care 2025-01-28

Artificial Intelligence (AI) has demonstrated potential in healthcare, particularly in enhancing diagnostic accuracy and decision-making through Clinical Decision Support Systems (CDSSs). However, the successful implementation of these systems relies on user trust and reliance, which can be influenced by explainable AI. This study explores the impact of varying explainability levels on clinicians' trust, cognitive load, and diagnostic performance in breast cancer detection. Utilizing an interrupted time series design, we conducted a web-based experiment involving 28 healthcare professionals. The results revealed that high confidence scores substantially increased trust but also led to overreliance, reducing diagnostic accuracy. In contrast, low confidence scores decreased trust and agreement while increasing diagnosis duration, reflecting more cautious behavior. Some explainability features influenced cognitive load by increasing stress levels. Additionally, demographic factors such as age, gender, and professional role shaped participants' perceptions and interactions with the system. This study provides valuable insights into how explainability impacts clinicians' behavior and decision-making. The findings highlight the importance of designing AI-driven CDSSs that balance transparency, usability, and cognitive demands to foster trust and improve integration into clinical workflows.

Variational Schrödinger Momentum Diffusion 2025-01-28

The momentum Schrödinger Bridge (mSB) has emerged as a leading method for accelerating generative diffusion processes and reducing transport costs. However, the lack of simulation-free properties inevitably results in high training costs and affects scalability. To obtain a trade-off between transport properties and scalability, we introduce variational Schrödinger momentum diffusion (VSMD), which employs linearized forward score functions (variational scores) to eliminate the dependence on simulated forward trajectories. Our approach leverages a multivariate diffusion process with adaptively transport-optimized variational scores. Additionally, we apply a critical-damping transform to stabilize training by removing the need for score estimations for both velocity and samples. Theoretically, we prove the convergence of samples generated with optimal variational scores and momentum diffusion. Empirical results demonstrate that VSMD efficiently generates anisotropic shapes while maintaining transport efficacy, outperforming overdamped alternatives, and avoiding complex denoising processes. Our approach also scales effectively to real-world data, achieving competitive results in time series and image generation.

AISTATS 25
Bubble Modeling and Tagging: A Stochastic Nonlinear Autoregression Approach 2025-01-28

Economic and financial time series can feature locally explosive behavior when a bubble is formed. The economic or financial bubble, especially its dynamics, is an intriguing topic that has been attracting longstanding attention. To illustrate the dynamics of the local explosion itself, the paper presents a novel, simple, yet useful time series model, called the stochastic nonlinear autoregressive model, which is always strictly stationary and geometrically ergodic and can create long swings or persistence observed in many macroeconomic variables. When a nonlinear autoregressive coefficient is outside of a certain range, the model has periodically explosive behaviors and can then be used to portray the bubble dynamics. Further, the quasi-maximum likelihood estimation (QMLE) of our model is considered, and its strong consistency and asymptotic normality are established under minimal assumptions on innovation. A new model diagnostic checking statistic is developed for model fitting adequacy. In addition, two methods for bubble tagging are proposed, one from the residual perspective and the other from the null-state perspective. Monte Carlo simulation studies are conducted to assess the performances of the QMLE and the two bubble tagging methods in finite samples. Finally, the usefulness of the model is illustrated by an empirical application to the monthly Hang Seng Index.

41 pages, 6 figures
Applying Ensemble Models based on Graph Neural Network and Reinforcement Learning for Wind Power Forecasting 2025-01-28

Accurately predicting the wind power output of a wind farm across various time scales utilizing Wind Power Forecasting (WPF) is a critical issue in wind power trading and utilization. The WPF problem remains unresolved due to numerous influencing variables, such as wind speed, temperature, latitude, and longitude. Furthermore, achieving high prediction accuracy is crucial for maintaining electric grid stability and ensuring supply security. In this paper, we model all wind turbines within a wind farm as graph nodes in a graph built by their geographical locations. Accordingly, we propose an ensemble model based on graph neural networks and reinforcement learning (EMGRL) for WPF. Our approach includes: (1) applying graph neural networks to capture the time-series data from neighboring wind farms relevant to the target wind farm; (2) establishing a general state embedding that integrates the target wind farm's data with the historical performance of base models on the target wind farm; (3) ensembling and leveraging the advantages of all base models through an actor-critic reinforcement learning framework for WPF.

Tailored Forecasting from Short Time Series via Meta-learning 2025-01-27

Machine learning (ML) models can be effective for forecasting the dynamics of unknown systems from time-series data, but they often require large amounts of data and struggle to generalize across systems with varying dynamics. Combined, these issues make forecasting from short time series particularly challenging. To address this problem, we introduce Meta-learning for Tailored Forecasting from Related Time Series (METAFORS), which uses related systems with longer time-series data to supplement limited data from the system of interest. By leveraging a library of models trained on related systems, METAFORS builds tailored models to forecast system evolution with limited data. Using a reservoir computing implementation and testing on simulated chaotic systems, we demonstrate METAFORS' ability to predict both short-term dynamics and long-term statistics, even when test and related systems exhibit significantly different behaviors and the available data are scarce, highlighting its robustness and versatility in data-limited scenarios.

25 pages, 14 figures
Enhancing Visual Inspection Capability of Multi-Modal Large Language Models on Medical Time Series with Supportive Conformalized and Interpretable Small Specialized Models 2025-01-27

Large language models (LLMs) exhibit remarkable capabilities in visual inspection of medical time-series data, achieving proficiency comparable to human clinicians. However, their broad scope limits domain-specific precision, and proprietary weights hinder fine-tuning for specialized datasets. In contrast, small specialized models (SSMs) excel in targeted tasks but lack the contextual reasoning required for complex clinical decision-making. To address these challenges, we propose ConMIL (Conformalized Multiple Instance Learning), a decision-support SSM that integrates seamlessly with LLMs. By using Multiple Instance Learning (MIL) to identify clinically significant signal segments and conformal prediction for calibrated set-valued outputs, ConMIL enhances LLMs' interpretative capabilities for medical time-series analysis. Experimental results demonstrate that ConMIL significantly improves the performance of state-of-the-art LLMs, such as ChatGPT4.0 and Qwen2-VL-7B. Specifically, ConMIL-supported Qwen2-VL-7B achieves 94.92% and 96.82% precision for confident samples in arrhythmia detection and sleep staging, compared to standalone LLM accuracy of 46.13% and 13.16%. These findings highlight the potential of ConMIL to bridge task-specific precision and broader contextual reasoning, enabling more reliable and interpretable AI-driven clinical decision support.
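
The conformal-prediction ingredient can be illustrated with plain split conformal classification: calibrate a score threshold on held-out data so the set-valued outputs contain the true label with a target probability. The toy probabilities below are assumptions, and the sketch omits ConMIL's multiple-instance-learning pipeline.

```python
# Sketch of split conformal prediction for set-valued classification outputs.
# A threshold on (1 - softmax probability of the true class) is calibrated on a
# held-out set so prediction sets cover the true label ~90% of the time.
# Toy probabilities only; this is not the full ConMIL pipeline.
import numpy as np

rng = np.random.default_rng(0)
n_cal, n_classes, alpha = 500, 4, 0.1

# Stand-in for a classifier's softmax outputs and true labels on calibration data.
logits = rng.normal(0, 1, (n_cal, n_classes))
labels = rng.integers(0, n_classes, n_cal)
logits[np.arange(n_cal), labels] += 2.0               # make the model reasonably good
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Nonconformity score: 1 - probability assigned to the true class.
scores = 1.0 - probs[np.arange(n_cal), labels]
k = int(np.ceil((n_cal + 1) * (1 - alpha)))
qhat = np.sort(scores)[k - 1]                          # conformal quantile

def prediction_set(p):
    """All classes whose probability is high enough to stay under the threshold."""
    return np.where(1.0 - p <= qhat)[0]

test_probs = probs[:5]                                 # reuse a few rows as 'test' examples
for p in test_probs:
    print(prediction_set(p))
```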

SWIFT: Mapping Sub-series with Wavelet Decomposition Improves Time Series Forecasting 2025-01-27

In recent work on time-series prediction, Transformers and even large language models have garnered significant attention due to their strong capabilities in sequence modeling. However, in practical deployments, time-series prediction often requires operation in resource-constrained environments, such as edge devices, which are unable to handle the computational overhead of large models. To address such scenarios, some lightweight models have been proposed, but they exhibit poor performance on non-stationary sequences. In this paper, we propose SWIFT, a lightweight model that is not only powerful, but also efficient in deployment and inference for Long-term Time Series Forecasting (LTSF). Our model is based on three key points: (i) Utilizing wavelet transform to perform lossless downsampling of time series. (ii) Achieving cross-band information fusion with a learnable filter. (iii) Using only one shared linear layer or one shallow MLP for sub-series' mapping. We conduct comprehensive experiments, and the results show that SWIFT achieves state-of-the-art (SOTA) performance on multiple datasets, offering a promising method for edge computing and deployment in this task. Moreover, it is noteworthy that the number of parameters in SWIFT-Linear is only 25% of what it would be with a single-layer linear model for time-domain prediction. Our code is available at https://github.com/LancelotXWX/SWIFT.
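
The ingredients named above, wavelet downsampling into sub-series and a single shared linear map, can be sketched with PyWavelets and a least-squares fit. The wavelet, window, and horizon are assumptions, and the learnable cross-band filter is omitted.

```python
# Sketch of wavelet-downsampled linear forecasting in the spirit of the abstract:
# (i) one-level DWT halves the series into approximation/detail sub-series,
# (ii) a single linear map shared across both sub-series predicts their future,
# (iii) an inverse DWT reconstructs the time-domain forecast.
# Wavelet choice, window sizes, and data are illustrative assumptions.
import numpy as np
import pywt

rng = np.random.default_rng(0)
t = np.arange(1024)
x = np.sin(2 * np.pi * t / 48) + 0.5 * np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(len(t))

cA, cD = pywt.dwt(x, "db2")               # one-level decomposition into two half-rate bands

def make_pairs(series, window, horizon):
    Xs, Ys = [], []
    for i in range(len(series) - window - horizon + 1):
        Xs.append(series[i:i + window])
        Ys.append(series[i + window:i + window + horizon])
    return np.array(Xs), np.array(Ys)

window, horizon = 48, 12
XA, YA = make_pairs(cA, window, horizon)
XD, YD = make_pairs(cD, window, horizon)

# One shared linear layer for both sub-series (least-squares fit with a bias term).
X_all = np.vstack([XA, XD]); Y_all = np.vstack([YA, YD])
X_aug = np.c_[X_all, np.ones(len(X_all))]
W, *_ = np.linalg.lstsq(X_aug, Y_all, rcond=None)

# Forecast each band from its most recent window, then reconstruct in the time domain.
fA = np.r_[cA[-window:], 1.0] @ W
fD = np.r_[cD[-window:], 1.0] @ W
forecast = pywt.idwt(fA, fD, "db2")       # roughly 2*horizon future samples in the original domain
print(forecast.shape)
```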

Contrastive Representation Learning Helps Cross-institutional Knowledge Transfer: A Study in Pediatric Ventilation Management 2025-01-27

Clinical machine learning deployment across institutions faces significant challenges when patient populations and clinical practices differ substantially. We present a systematic framework for cross-institutional knowledge transfer in clinical time series, demonstrated through pediatric ventilation management between a general pediatric intensive care unit (PICU) and a cardiac-focused unit. Using contrastive predictive coding (CPC) for representation learning, we investigate how different data regimes and fine-tuning strategies affect knowledge transfer across institutional boundaries. Our results show that while direct model transfer performs poorly, CPC with appropriate fine-tuning enables effective knowledge sharing between institutions, with benefits particularly evident in limited data scenarios. Analysis of transfer patterns reveals an important asymmetry: temporal progression patterns transfer more readily than point-of-care decisions, suggesting practical pathways for cross-institutional deployment. Through a systematic evaluation of fine-tuning approaches and transfer patterns, our work provides insights for developing more generalizable clinical decision support systems while enabling smaller specialized units to leverage knowledge from larger centers.

Modeling Latent Non-Linear Dynamical System over Time Series 2025-01-27

We study the problem of modeling a non-linear dynamical system when given a time series by deriving equations directly from the data. Despite the fact that time series data are given as input, models for dynamics and estimation algorithms that incorporate long-term temporal dependencies are largely absent from existing studies. In this paper, we introduce a latent state to allow time-dependent modeling and formulate this problem as a dynamics estimation problem in latent states. We face multiple technical challenges, including (1) modeling latent non-linear dynamics and (2) solving circular dependencies caused by the presence of latent states. To tackle these challenging problems, we propose a new method, Latent Non-Linear equation modeling (LaNoLem), that can model a latent non-linear dynamical system and a novel alternating minimization algorithm for effectively estimating latent states and model parameters. In addition, we introduce criteria to control model complexity without human intervention. Compared with the state-of-the-art model, LaNoLem achieves competitive performance for estimating dynamics while outperforming other methods in prediction.

Accepted by AAAI'25
TimeHF: Billion-Scale Time Series Models Guided by Human Feedback 2025-01-27

Time series neural networks perform exceptionally well in real-world applications but encounter challenges such as limited scalability, poor generalization, and suboptimal zero-shot performance. Inspired by large language models, there is interest in developing large time series models (LTM) to address these issues. However, current methods struggle with training complexity, adapting human feedback, and achieving high predictive accuracy. We introduce TimeHF, a novel pipeline for creating LTMs with 6 billion parameters, incorporating human feedback. We use patch convolutional embedding to capture long time series information and design a human feedback mechanism called time-series policy optimization. Deployed in JD.com's supply chain, TimeHF handles automated replenishment for over 20,000 products, improving prediction accuracy by 33.21% over existing methods. This work advances LTM technology and shows significant industrial benefits.

T-Graphormer: Using Transformers for Spatiotemporal Forecasting 2025-01-27

Multivariate time series data is ubiquitous, and forecasting it has important applications in many domains. However, its complex spatial dependencies and non-linear temporal dynamics can be challenging for traditional techniques. Existing methods tackle these challenges by learning the two dimensions separately. Here, we introduce Temporal Graphormer (T-Graphormer), a Transformer-based approach capable of modelling spatiotemporal correlations simultaneously. By incorporating temporal dynamics in the Graphormer architecture, each node attends to all other nodes within the graph sequence. Our design enables the model to capture rich spatiotemporal patterns with minimal reliance on predefined spacetime inductive biases. We validate the effectiveness of T-Graphormer on real-world traffic prediction benchmark datasets. Compared to state-of-the-art methods, T-Graphormer reduces root mean squared error (RMSE) and mean absolute percentage error (MAPE) by up to 10%.

Deterministic Reservoir Computing for Chaotic Time Series Prediction 2025-01-26

Reservoir Computing has been shown in recent years to be a useful and efficient approach for learning networks in time series tasks. Its randomized initialization, while a computational benefit, creates drawbacks for the theoretical analysis of large random graphs, which is why deterministic variants remain an open field of research. Building upon Next-Gen Reservoir Computing and the Temporal Convolution Derived Reservoir Computing, we propose a deterministic alternative to the higher-dimensional mapping therein, TCRC-LM and TCRC-CM, utilizing the parametrized but deterministic Logistic mapping and Chebyshev maps. To further enhance the predictive capabilities in the task of time series forecasting, we propose the novel utilization of the Lobachevsky function as a non-linear activation function. As a result, we observe a new, fully deterministic network able to outperform TCRCs and classical Reservoir Computing in the form of the prominent Echo State Networks by up to 99.99% for the non-chaotic time series and 87.13% for the chaotic ones.

Enhancing Glucose Level Prediction of ICU Patients through Hierarchical Modeling of Irregular Time-Series 2025-01-26

Accurately predicting blood glucose (BG) levels of ICU patients is critical, as both hypoglycemia (BG < 70 mg/dL) and hyperglycemia (BG > 180 mg/dL) are associated with increased morbidity and mortality. This study presents a proof-of-concept machine learning framework, the Multi-source Irregular Time-Series Transformer (MITST), designed to predict blood glucose (BG) levels in ICU patients. Unlike existing approaches that rely on manual feature engineering or are limited to a small number of Electronic Health Record (EHR) data sources, MITST demonstrates the feasibility of integrating diverse clinical data (e.g., lab results, medications, vital signs) and handling irregular time-series data without predefined aggregation. MITST employs a hierarchical architecture of Transformers, comprising feature-level, timestamp-level, and source-level components, to capture fine-grained temporal dynamics and enable learning-based data integration. This eliminates the need for traditional aggregation and manual feature engineering. In a large-scale evaluation using the eICU database (200,859 ICU stays across 208 hospitals), MITST achieves an average improvement of 1.7% (p < 0.001) in AUROC and 1.8% (p < 0.001) in AUPRC over a state-of-the-art baseline. For hypoglycemia, MITST achieves an AUROC of 0.915 and an AUPRC of 0.247, both significantly outperforming the baseline. The flexible architecture of MITST allows seamless integration of new data sources without retraining the entire model, enhancing its adaptability for clinical decision support. While this study focuses on predicting BG levels, MITST can easily be extended to other critical event prediction tasks in ICU settings, offering a robust solution for analyzing complex, multi-source, irregular time-series data.

18 pages, 7 figures V2: Updated the title and abstract. Added the Related work section. Added a few notes

Guaranteed Multidimensional Time Series Prediction via Deterministic Tensor Completion Theory 2025-01-26

In recent years, the prediction of multidimensional time series data has become increasingly important due to its wide-ranging applications. Tensor-based prediction methods have gained attention for their ability to preserve the inherent structure of such data. However, existing approaches, such as tensor autoregression and tensor decomposition, have often failed to provide clear assertions regarding the number of samples that can be exactly predicted. While matrix-based methods using nuclear norms address this limitation, their reliance on matrices limits accuracy and increases computational costs when handling multidimensional data. To overcome these challenges, we reformulate multidimensional time series prediction as a deterministic tensor completion problem and propose a novel theoretical framework. Specifically, we develop a deterministic tensor completion theory and introduce the Temporal Convolutional Tensor Nuclear Norm (TCTNN) model. By convolving the multidimensional time series along the temporal dimension and applying the tensor nuclear norm, our approach identifies the maximum forecast horizon for exact predictions. Additionally, TCTNN achieves superior performance in prediction accuracy and computational efficiency compared to existing methods across diverse real-world datasets, including climate temperature, network flow, and traffic ride data. Our implementation is publicly available at https://github.com/HaoShu2000/TCTNN.

A Transfer Learning Framework for Anomaly Detection in Multivariate IoT Traffic Data 2025-01-26

In recent years, rapid technological advancements and expanded Internet access have led to a significant rise in anomalies within network traffic and time-series data. Prompt detection of these irregularities is crucial for ensuring service quality, preventing financial losses, and maintaining robust security standards. While machine learning algorithms have shown promise in achieving high accuracy for anomaly detection, their performance is often constrained by the specific conditions of their training data. A persistent challenge in this domain is the scarcity of labeled data for anomaly detection in time-series datasets. This limitation hampers the training efficacy of both traditional machine learning and advanced deep learning models. To address this, unsupervised transfer learning emerges as a viable solution, leveraging unlabeled data from a source domain to identify anomalies in an unlabeled target domain. However, many existing approaches still depend on a small amount of labeled data from the target domain. To overcome these constraints, we propose a transfer learning-based model for anomaly detection in multivariate time-series datasets. Unlike conventional methods, our approach does not require labeled data in either the source or target domains. Empirical evaluations on novel intrusion detection datasets demonstrate that our model outperforms existing techniques in accurately identifying anomalies within an entirely unlabeled target domain.

6 pages, 3 figures
Stochastic Volatility under Informative Missingness 2025-01-25

Stochastic volatility models that treat the variance of a time series as a stochastic process have proven to be important tools for analyzing dynamic variability. Current methods for fitting and conducting inference on stochastic volatility models are limited by the assumption that any missing data are missing at random. With a recent explosion in technology to facilitate the collection of dynamic self-response data for which mechanisms underlying missing data are inherently scientifically informative, this limitation in statistical methodology also limits scientific advancement. The goal of this article is to develop the first statistical methodology for modeling, fitting, and conducting inference on stochastic volatility with data that are missing not at random. The approach is based upon a novel imputation method derived using Tukey's representation, which utilizes the Markovian nature of stochastic volatility models to overcome unidentifiable components often faced when modeling informative missingness in other settings. This imputation method is combined with a new conditional particle filtering with ancestor sampling procedure that accounts for variability in imputation to formulate a complete particle Gibbs sampling scheme. The use of the method is illustrated through the analysis of mobile phone self-reported mood from individuals being monitored after unsuccessful suicide attempts.

41 total pages, 1 cover page, 27 pages main text, 4 pages references, 9 pages appendices, 7 figures, 8 tables

Grid Monitoring with Synchro-Waveform and AI Foundation Model Technologies 2025-01-25

Purpose: This article advocates for the development of a next-generation grid monitoring and control system designed for future grids dominated by inverter-based resources. Leveraging recent progress in generative artificial intelligence (AI), machine learning, and networking technology, we develop a physics-based AI foundation model with high-resolution synchro-waveform measurement technology to enhance grid resilience and reduce economic losses from outages. Methods and Results: The proposed framework adopts the AI Foundation Model paradigm, where a generative and pre-trained (GPT) foundation model extracts physical features from power system measurements, enabling adaptation to a wide range of grid operation tasks. Replacing the large language models used in popular AI foundation models, this approach is based on the Wiener-Kallianpur-Rosenblatt innovation model for power system time series, trained to capture the physical laws of power flows and sinusoidal characteristics of grid measurements. The pre-trained foundation model causally extracts sufficient statistics from grid measurement time series for various downstream applications, including anomaly detection, over-current protection, probabilistic forecasting, and data compression for streaming synchro-waveform data. Numerical simulations using field-collected data demonstrate significantly improved fault detection accuracy and detection speed. Conclusion: The future grid will be rich in inverter-based resources, making it highly dynamic, stochastic, and low inertia. This work underscores the limitations of existing Supervisory-Control-and-Data-Acquisition and Phasor-Measurement-Unit monitoring systems and advocates for AI-enabled monitoring and control with high-resolution synchro-waveform technology to provide accurate situational awareness, rapid response to faults, and robust network protection.

Ister: Inverted Seasonal-Trend Decomposition Transformer for Explainable Multivariate Time Series Forecasting 2025-01-25

In long-term time series forecasting, Transformer-based models have achieved great success, due to their ability to capture long-range dependencies. However, existing models face challenges in identifying critical components for prediction, leading to limited interpretability and suboptimal performance. To address these issues, we propose the Inverted Seasonal-Trend Decomposition Transformer (Ister), a novel Transformer-based model for multivariate time series forecasting. Ister decomposes time series into seasonal and trend components, further modeling multi-periodicity and inter-series dependencies using a Dual Transformer architecture. We introduce a novel Dot-attention mechanism that improves interpretability, computational efficiency, and predictive accuracy. Comprehensive experiments on benchmark datasets demonstrate that Ister outperforms existing state-of-the-art models, achieving up to 10% improvement in MSE. Moreover, Ister enables intuitive visualization of component contributions, shedding light on the model's decision process and enhancing transparency in prediction results.

Determining The Number of Factors in Two-Way Factor Model of High-Dimensional Matrix-Variate Time Series: A White-Noise based Method for Serial Correlation Models 2025-01-25

In this paper, we study a new two-way factor model for high-dimensional matrix-variate time series. To estimate the number of factors in this two-way factor model, we decompose the series into two parts: one being a non-weakly correlated series and the other being a weakly correlated noise. By comparing the difference between the two series, we can construct white-noise based signal statistics to determine the number of factors in the row loading matrix (column loading matrix). Furthermore, to mitigate the negative impact on the accuracy of the estimation, which is caused by the interaction between the row loading matrix and the column loading matrix, we propose a transformation so that the transformed model only contains the row loading matrix (column loading matrix). We define sequences of ratios of two test statistics as signal statistics to determine the number of factors and derive the consistency of the estimation. We conduct numerical studies to examine the performance of the new methods.

A Review on Self-Supervised Learning for Time Series Anomaly Detection: Recent Advances and Open Challenges 2025-01-25

Time series anomaly detection presents various challenges due to the sequential and dynamic nature of time-dependent data. Traditional unsupervised methods frequently encounter difficulties in generalization, often overfitting to known normal patterns observed during training and struggling to adapt to unseen normality. In response to this limitation, self-supervised techniques for time series have garnered attention as a potential solution to overcome this obstacle and enhance the performance of anomaly detectors. This paper presents a comprehensive review of the recent methods that make use of self-supervised learning for time series anomaly detection. A taxonomy is proposed to categorize these methods based on their primary characteristics, facilitating a clear understanding of their diversity within this field. The information contained in this survey, along with additional details that will be periodically updated, is available on the following GitHub repository: https://github.com/Aitorzan3/Awesome-Self-Supervised-Time-Series-Anomaly-Detection.

FreqMoE: Enhancing Time Series Forecasting through Frequency Decomposition Mixture of Experts 2025-01-25

Long-term time series forecasting is essential in areas like finance and weather prediction. Besides traditional methods that operate in the time domain, many recent models transform time series data into the frequency domain to better capture complex patterns. However, these methods often use filtering techniques to remove certain frequency signals as noise, which may unintentionally discard important information and reduce prediction accuracy. To address this, we propose the Frequency Decomposition Mixture of Experts (FreqMoE) model, which dynamically decomposes time series data into frequency bands, each processed by a specialized expert. A gating mechanism adjusts the importance of each expert's output based on frequency characteristics, and the aggregated results are fed into a prediction module that iteratively refines the forecast using residual connections. Our experiments demonstrate that FreqMoE outperforms state-of-the-art models, achieving the best performance on 51 out of 70 metrics across all tested datasets, while significantly reducing the number of required parameters to under 50k, providing notable efficiency advantages.

International Conference on Artificial Intelligence and Statistics 2025
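
The decomposition-plus-gating idea can be sketched compactly: split the input spectrum into bands, give each band its own small expert, and weight the experts by a gate computed from band energies. The band edges, linear experts, and energy-proportional gate below are assumptions, not the paper's architecture.

```python
# Illustrative sketch of a frequency-band mixture of experts for forecasting:
# split the input spectrum into bands, let one small linear "expert" handle each
# band, and weight experts by their band's share of the input energy.
# Band edges, experts, and gate are assumptions, not the paper's architecture.
import numpy as np

rng = np.random.default_rng(0)
window, horizon, n_bands = 96, 24, 3
t = np.arange(4096)
x = np.sin(2 * np.pi * t / 96) + 0.3 * np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(len(t))

def band_split(segment, n_bands):
    """Return n_bands time-domain components, each keeping one slice of the spectrum."""
    F = np.fft.rfft(segment)
    edges = np.linspace(0, len(F), n_bands + 1, dtype=int)
    comps = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        Fb = np.zeros_like(F)
        Fb[lo:hi] = F[lo:hi]
        comps.append(np.fft.irfft(Fb, n=len(segment)))
    return comps

# Build training pairs and fit one least-squares linear expert per frequency band.
starts = np.arange(0, len(x) - window - horizon, 8)
Y = np.array([x[s + window:s + window + horizon] for s in starts])
experts = []
for b in range(n_bands):
    Xb = np.array([band_split(x[s:s + window], n_bands)[b] for s in starts])
    Wb, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    experts.append(Wb)

# Inference: weight each expert by its band's share of the input energy (stand-in gate).
seg = x[-window:]
comps = band_split(seg, n_bands)
energies = np.array([np.sum(c ** 2) for c in comps])
gate = energies / energies.sum()
forecast = sum(g * (c @ W) for g, c, W in zip(gate, comps, experts))
print(forecast.shape)   # (24,)
```
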

Unifying Prediction and Explanation in Time-Series Transformers via Shapley-based Pretraining 2025-01-25

In this paper, we propose ShapTST, a framework that enables time-series transformers to efficiently generate Shapley-value-based explanations alongside predictions in a single forward pass. Shapley values are widely used to evaluate the contribution of different time-steps and features in a test sample, and are commonly generated through repeatedly inferring on each sample with different parts of information removed. Therefore, it requires expensive inference-time computations that occur at every request for model explanations. In contrast, our framework unifies the explanation and prediction in training through a novel Shapley-based pre-training design, which eliminates the undesirable test-time computation and replaces it with a single-time pre-training. Moreover, this specialized pre-training benefits the prediction performance by making the transformer model more effectively weigh different features and time-steps in the time-series, particularly improving the robustness against data noise that is common to raw time-series data. We experimentally validated our approach on eight public datasets, where our time-series model achieved competitive results in both classification and regression tasks, while providing Shapley-based explanations similar to those obtained with post-hoc computation. Our work offers an efficient and explainable solution for time-series analysis tasks in safety-critical applications.

6 pages, Accepted to 21st IEEE CSPA 2025
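
For a handful of time-steps, the Shapley values that the pretraining above is meant to deliver in a single forward pass can be computed exactly by enumerating coalitions. The toy model and zero baseline below are assumptions, purely to show the quantity being approximated.

```python
# Exact Shapley values over a few time-steps of a toy model, computed by enumerating
# all coalitions. This is the quantity a Shapley-based explainer approximates;
# the toy model and the zero "removed feature" baseline are assumptions.
from itertools import combinations
from math import factorial
import numpy as np

def model(x):
    """Toy forecaster: weighted sum of the last 4 time-steps."""
    return float(np.dot(x, [0.1, 0.2, 0.3, 0.4]))

def shapley_values(x, baseline):
    n = len(x)
    phi = np.zeros(n)
    idx = list(range(n))
    for i in idx:
        others = [j for j in idx if j != i]
        for r in range(n):
            for S in combinations(others, r):
                # Inputs: coalition S (and optionally i) take real values, the rest the baseline.
                with_i, without_i = baseline.copy(), baseline.copy()
                for j in S:
                    with_i[j] = without_i[j] = x[j]
                with_i[i] = x[i]
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi

x = np.array([1.0, 2.0, -1.0, 0.5])
baseline = np.zeros(4)                       # "feature removed" = replaced by zero
phi = shapley_values(x, baseline)
print(np.round(phi, 3))                      # for a linear model: weight_i * x_i
print(round(phi.sum(), 3), model(x) - model(baseline))  # efficiency: contributions sum to the gap
```
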

Exploring the impact of Optimised Hyperparameters on Bi-LSTM-based Contextual Anomaly Detector 2025-01-25

The exponential growth in the usage of the Internet of Things in daily life has caused an immense increase in the generation of time series data. Smart homes are one such domain where the bulk of data is being generated, and anomaly detection is one of the many challenges addressed by researchers in recent years. Contextual anomaly is a kind of anomaly that may show deviation from the normal pattern like point or sequence anomalies, but it also requires prior knowledge about the data domain and the actions that caused the deviation. Recent studies based on Recurrent Neural Networks (RNN) have demonstrated strong performance in anomaly detection. This study explores the impact of automatically tuned hyperparameters on the Unsupervised Online Contextual Anomaly Detection (UoCAD) approach by proposing UoCAD with Optimised Hyperparameters (UoCAD-OH). UoCAD-OH conducts hyperparameter optimisation on a Bi-LSTM model in an offline phase and uses the fine-tuned hyperparameters to detect anomalies during the online phase. The experiments involve evaluating the proposed framework on two smart home air quality datasets containing contextual anomalies. The evaluation metrics used are Precision, Recall, and F1 score.

6 pages, 1 figure
Harnessing Contrastive Learning and Neural Transformation for Time Series Anomaly Detection 2025-01-25
Show

Time series anomaly detection (TSAD) plays a vital role in many industrial applications. While contrastive learning has gained momentum in the time series domain for its prowess in extracting meaningful representations from unlabeled data, its straightforward application to anomaly detection is not without hurdles. Firstly, contrastive learning typically requires negative sampling to avoid the representation collapse issue, where the encoder converges to a constant solution. However, drawing from the same dataset for dissimilar samples is ill-suited for TSAD as most samples are ``normal'' in the training dataset. Secondly, conventional contrastive learning focuses on instance discrimination, which may overlook anomalies that are detectable when compared to their temporal context. In this study, we propose a novel approach, CNT, that incorporates a window-based contrastive learning strategy fortified with learnable transformations. This dual configuration focuses on capturing temporal anomalies in local regions while simultaneously mitigating the representation collapse issue. Our theoretical analysis validates the effectiveness of CNT in circumventing constant encoder solutions. Through extensive experiments on diverse real-world industrial datasets, we show the superiority of our framework by outperforming various baselines and model variants.

Accepted by 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025)

Stable Neural Stochastic Differential Equations in Analyzing Irregular Time Series Data 2025-01-24
Show

Irregular sampling intervals and missing values in real-world time series data present challenges for conventional methods that assume consistent intervals and complete data. Neural Ordinary Differential Equations (Neural ODEs) offer an alternative approach, utilizing neural networks combined with ODE solvers to learn continuous latent representations through parameterized vector fields. Neural Stochastic Differential Equations (Neural SDEs) extend Neural ODEs by incorporating a diffusion term, although this addition is not trivial, particularly when addressing irregular intervals and missing values. Consequently, careful design of drift and diffusion functions is crucial for maintaining stability and enhancing performance, while incautious choices can result in adverse properties such as the absence of strong solutions, stochastic destabilization, or unstable Euler discretizations, significantly affecting Neural SDEs' performance. In this study, we propose three stable classes of Neural SDEs: Langevin-type SDE, Linear Noise SDE, and Geometric SDE. Then, we rigorously demonstrate their robustness in maintaining excellent performance under distribution shift, while effectively preventing overfitting. To assess the effectiveness of our approach, we conduct extensive experiments on four benchmark datasets for interpolation, forecasting, and classification tasks, and analyze the robustness of our methods with 30 public datasets under different missing rates. Our results demonstrate the efficacy of the proposed method in handling real-world irregular time series data.

Published at the Twelfth International Conference on Learning Representations (ICLR 2024), Spotlight presentation (Notable Top 5%). https://openreview.net/forum?id=4VIgNuQ1pY
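
As a generic illustration of the Langevin-type SDE class named in the abstract (not the paper's model), the sketch below integrates dX_t = -grad U(X_t) dt + sigma dW_t with a neural potential U using the Euler-Maruyama scheme; the network size, step size, and constant diffusion are assumptions.

```python
# Euler-Maruyama simulation of a Langevin-type neural SDE
#   dX_t = -grad U(X_t) dt + sigma dW_t,
# where the potential U is a small neural network. Generic sketch of the
# SDE class named in the abstract, not the paper's model or training setup.
import torch
import torch.nn as nn


class NeuralPotential(nn.Module):
    def __init__(self, dim: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)                        # scalar potential U(x)


def euler_maruyama(potential, x0, sigma=0.1, dt=0.01, n_steps=100):
    x = x0.clone()
    path = [x]
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        u = potential(x).sum()
        (drift,) = torch.autograd.grad(u, x)                  # gradient of the potential
        noise = torch.randn_like(x) * (dt ** 0.5)
        x = (x - drift * dt + sigma * noise).detach()         # Langevin update
        path.append(x)
    return torch.stack(path)                                  # (n_steps + 1, batch, dim)


if __name__ == "__main__":
    pot = NeuralPotential(dim=2)
    traj = euler_maruyama(pot, torch.zeros(4, 2))
    print(traj.shape)                                         # torch.Size([101, 4, 2])
```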

DualDynamics: Synergizing Implicit and Explicit Methods for Robust Irregular Time Series Analysis 2025-01-24
Show

Real-world time series analysis faces significant challenges when dealing with irregular and incomplete data. While Neural Differential Equation (NDE) based methods have shown promise, they struggle with limited expressiveness, scalability issues, and stability concerns. Conversely, Neural Flows offer stability but falter with irregular data. We introduce 'DualDynamics', a novel framework that synergistically combines NDE-based method and Neural Flow-based method. This approach enhances expressive power while balancing computational demands, addressing critical limitations of existing techniques. We demonstrate DualDynamics' effectiveness across diverse tasks: classification of robustness to dataset shift, irregularly-sampled series analysis, interpolation of missing data, and forecasting with partial observations. Our results show consistent outperformance over state-of-the-art methods, indicating DualDynamics' potential to advance irregular time series analysis significantly.

Published at the 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025)

Hierarchical Count Echo State Network Models with Application to Graduate Student Enrollments 2025-01-24
Show

Poisson autoregressive count models have evolved into a time series staple for correlated count data. This paper proposes an alternative to Poisson autoregressions: count echo state networks. Echo state networks can be statistically analyzed in frequentist manners via optimizing penalized likelihoods, or in Bayesian manners via MCMC sampling. This paper develops Poisson echo state techniques for count data and applies them to a massive count data set containing the number of graduate students from 1,758 United States universities during the years 1972-2021 inclusive. Negative binomial models are also implemented to better handle overdispersion in the counts. The performance of the proposed models is compared via their forecasting accuracy as judged by several methods. In the end, a hierarchical negative-binomial-based echo state network is judged to be the superior model.
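
For readers unfamiliar with echo state networks, the following sketch shows the generic pattern of a fixed random reservoir driven by a count series with a Poisson log-linear readout fitted on top. The reservoir size, spectral radius, log1p input scaling, and the use of scikit-learn's PoissonRegressor are illustrative assumptions rather than the paper's specification.

```python
# Generic echo state network with a Poisson log-linear readout for counts.
# Reservoir sizes, spectral radius, and the sklearn readout are assumptions.
import numpy as np
from sklearn.linear_model import PoissonRegressor


def reservoir_states(counts, n_reservoir=100, spectral_radius=0.9, seed=0):
    rng = np.random.default_rng(seed)
    W_in = rng.normal(size=(n_reservoir, 1))
    W = rng.normal(size=(n_reservoir, n_reservoir))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))   # echo state property
    h = np.zeros(n_reservoir)
    states = []
    for y in counts:
        h = np.tanh(W_in[:, 0] * np.log1p(y) + W @ h)             # leakless reservoir update
        states.append(h.copy())
    return np.array(states)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    counts = rng.poisson(lam=5 + 3 * np.sin(np.arange(300) / 10.0))
    H = reservoir_states(counts[:-1])                  # reservoir state at time t
    targets = counts[1:]                               # predict the next count
    readout = PoissonRegressor(alpha=1.0).fit(H, targets)
    print("one-step forecast:", readout.predict(H[-1:])[0])
```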

Time-MMD: Multi-Domain Multimodal Dataset for Time Series Analysis 2025-01-24
Show

Time series data are ubiquitous across a wide range of real-world domains. While real-world time series analysis (TSA) requires human experts to integrate numerical series data with multimodal domain-specific knowledge, most existing TSA models rely solely on numerical data, overlooking the significance of information beyond numerical series. This oversight is due to the untapped potential of textual series data and the absence of a comprehensive, high-quality multimodal dataset. To overcome this obstacle, we introduce Time-MMD, the first multi-domain, multimodal time series dataset covering 9 primary data domains. Time-MMD ensures fine-grained modality alignment, eliminates data contamination, and provides high usability. Additionally, we develop MM-TSFlib, the first-cut multimodal time-series forecasting (TSF) library, seamlessly pipelining multimodal TSF evaluations based on Time-MMD for in-depth analyses. Extensive experiments conducted on Time-MMD through MM-TSFlib demonstrate significant performance enhancements by extending unimodal TSF to multimodality, evidenced by over 15% mean squared error reduction in general, and up to 40% in domains with rich textual data. More importantly, our datasets and library open up broader applications, impacts, and research topics to advance TSA. The dataset is available at https://github.com/AdityaLab/Time-MMD.

Accepted by NeurIPS 2024 Datasets and Benchmarks Track

CAMEO: Autocorrelation-Preserving Line Simplification for Lossy Time Series Compression 2025-01-24
Show

Time series data from a variety of sensors and IoT devices need effective compression to reduce storage and I/O bandwidth requirements. While most time series databases and systems rely on lossless compression, lossy techniques offer even greater space-saving with a small loss in precision. However, the unknown impact on downstream analytics applications requires a semi-manual trial-and-error exploration. We initiate work on lossy compression that provides guarantees on complex statistical features (which are strongly correlated with the accuracy of the downstream analytics). Specifically, we propose a new lossy compression method that provides guarantees on the autocorrelation and partial-autocorrelation functions (ACF/PACF) of a time series. Our method leverages line simplification techniques as well as incremental maintenance of aggregates, blocking, and parallelization strategies for effective and efficient compression. The results show that our method improves compression ratios by 2x on average and up to 54x on selected datasets, compared to previous lossy and lossless compression methods. Moreover, we maintain -- and sometimes even improve -- the forecasting accuracy by preserving the autocorrelation properties of the time series. Our framework is extensible to multivariate time series and other statistical features of the time series.

14 pages, 13 figures
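
The snippet below is not CAMEO's line-simplification algorithm; it only illustrates the kind of check an ACF-preserving compressor has to perform, comparing the autocorrelation function of an original series against that of a toy lossy reconstruction. The downsample-and-interpolate "compressor" and the 20-lag horizon are assumptions.

```python
# Measure how much a toy lossy reconstruction distorts the autocorrelation
# function (ACF). A real ACF-guaranteeing compressor would tighten its
# approximation until this deviation falls under the user's error bound.
import numpy as np


def acf(x, n_lags=20):
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: -k or None], x[k:]) / denom for k in range(n_lags + 1)])


def downsample_and_interpolate(x, keep_every=4):
    idx = np.arange(0, len(x), keep_every)
    return np.interp(np.arange(len(x)), idx, x[idx])   # toy lossy reconstruction


if __name__ == "__main__":
    t = np.arange(2000)
    series = np.sin(t / 15.0) + 0.2 * np.random.default_rng(0).normal(size=len(t))
    recon = downsample_and_interpolate(series)
    deviation = np.max(np.abs(acf(series) - acf(recon)))
    print(f"max ACF deviation over 20 lags: {deviation:.4f}")
```
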
A Survey on Diffusion Models for Anomaly Detection 2025-01-24
Show

Diffusion models (DMs) have emerged as a powerful class of generative AI models, showing remarkable potential in anomaly detection (AD) tasks across various domains, such as cybersecurity, fraud detection, healthcare, and manufacturing. The intersection of these two fields, termed diffusion models for anomaly detection (DMAD), offers promising solutions for identifying deviations in increasingly complex and high-dimensional data. In this survey, we review recent advances in DMAD research. We begin by presenting the fundamental concepts of AD and DMs, followed by a comprehensive analysis of classic DM architectures including DDPMs, DDIMs, and Score SDEs. We further categorize existing DMAD methods into reconstruction-based, density-based, and hybrid approaches, providing detailed examinations of their methodological innovations. We also explore the diverse tasks across different data modalities, encompassing image, time series, video, and multimodal data analysis. Furthermore, we discuss critical challenges and emerging research directions, including computational efficiency, model interpretability, robustness enhancement, edge-cloud collaboration, and integration with large language models. The collection of DMAD research papers and resources is available at https://github.com/fdjingliu/DMAD.

Detecting Cointegrating Relations in Non-stationary Matrix-Valued Time Series 2025-01-24
Show

This paper proposes a Matrix Error Correction Model to identify cointegration relations in matrix-valued time series. We hereby allow separate cointegrating relations along the rows and columns of the matrix-valued time series and use information criteria to select the cointegration ranks. Through Monte Carlo simulations and a macroeconomic application, we demonstrate that our approach provides a reliable estimation of the number of cointegrating relationships.

10 pages, 2 figures
VarDrop: Enhancing Training Efficiency by Reducing Variate Redundancy in Periodic Time Series Forecasting 2025-01-24
Show

Variate tokenization, which independently embeds each variate as separate tokens, has achieved remarkable improvements in multivariate time series forecasting. However, employing self-attention with variate tokens incurs a quadratic computational cost with respect to the number of variates, thus limiting its training efficiency for large-scale applications. To address this issue, we propose VarDrop, a simple yet efficient strategy that reduces the token usage by omitting redundant variate tokens during training. VarDrop adaptively excludes redundant tokens within a given batch, thereby reducing the number of tokens used for dot-product attention while preserving essential information. Specifically, we introduce k-dominant frequency hashing (k-DFH), which utilizes the ranked dominant frequencies in the frequency domain as a hash value to efficiently group variate tokens exhibiting similar periodic behaviors. Then, only representative tokens in each group are sampled through stratified sampling. By performing sparse attention with these selected tokens, the computational cost of scaled dot-product attention is significantly alleviated. Experiments conducted on public benchmark datasets demonstrate that VarDrop outperforms existing efficient baselines.
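
A minimal sketch of the k-dominant frequency hashing idea described above: hash each variate by the indices of its k largest rFFT magnitudes, group colliding variates, and keep one representative per group. Exact-match grouping and uniform sampling of representatives are illustrative assumptions, not VarDrop's stratified sampling procedure.

```python
# k-dominant frequency hashing (k-DFH) sketch: variates sharing the same
# dominant frequencies are treated as redundant and only a representative
# is kept. Exact-hash grouping and uniform sampling are assumptions.
import numpy as np
from collections import defaultdict


def k_dfh_groups(series, k=3):
    """series: (n_variates, seq_len) array; returns {hash: [variate indices]}."""
    groups = defaultdict(list)
    for i, x in enumerate(series):
        spec = np.abs(np.fft.rfft(x))
        spec[0] = 0.0                                             # ignore the DC component
        top_k = tuple(int(j) for j in sorted(np.argsort(spec)[-k:]))  # dominant frequencies
        groups[top_k].append(i)
    return groups


def sample_representatives(groups, rng=None):
    rng = rng or np.random.default_rng(0)
    return sorted(int(rng.choice(members)) for members in groups.values())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.arange(256)
    # 8 variates built from two underlying periodic patterns plus noise.
    base = np.stack([np.sin(2 * np.pi * 5 * t / 256), np.sin(2 * np.pi * 17 * t / 256)])
    series = np.vstack([base[i % 2] + 0.05 * rng.normal(size=256) for i in range(8)])
    groups = k_dfh_groups(series, k=1)
    print("groups:", dict(groups))
    print("kept variates:", sample_representatives(groups))
```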

Argos: Agentic Time-Series Anomaly Detection with Autonomous Rule Generation via Large Language Models 2025-01-24
Show

Observability in cloud infrastructure is critical for service providers, driving the widespread adoption of anomaly detection systems for monitoring metrics. However, existing systems often struggle to simultaneously achieve explainability, reproducibility, and autonomy, which are three indispensable properties for production use. We introduce Argos, an agentic system for detecting time-series anomalies in cloud infrastructure by leveraging large language models (LLMs). Argos proposes to use explainable and reproducible anomaly rules as an intermediate representation and employs LLMs to autonomously generate such rules. The system efficiently trains error-free and accuracy-guaranteed anomaly rules through multiple collaborative agents and deploys the trained rules for low-cost online anomaly detection. Through evaluation results, we demonstrate that Argos outperforms state-of-the-art methods, increasing F1 scores by up to 9.5% and 28.3% on public anomaly detection datasets and an internal dataset collected from Microsoft, respectively.

Look Into the LITE in Deep Learning for Time Series Classification 2025-01-23
Show

Deep learning models have been shown to be a powerful solution for Time Series Classification (TSC). State-of-the-art architectures, while producing promising results on the UCR and the UEA archives, present a high number of trainable parameters. This can lead to long training with high CO2 emission, power consumption and a possible increase in the number of FLoating-point Operations Per Second (FLOPS). In this paper, we present a new architecture for TSC, the Light Inception with boosTing tEchnique (LITE), with only 2.34% of the number of parameters of the state-of-the-art InceptionTime model, while preserving performance. This architecture, with only 9,814 trainable parameters due to the usage of DepthWise Separable Convolutions (DWSC), is boosted by three techniques: multiplexing, custom filters, and dilated convolution. The LITE architecture, trained on the UCR, is 2.78 times faster than InceptionTime and consumes 2.79 times less CO2 and power. To evaluate the performance of the proposed architecture on multivariate time series data, we adapt LITE to handle multivariate time series; we call this version LITEMV. To bring theory into application, we also conducted experiments using LITEMV on multivariate time series representing human rehabilitation movements, showing that LITEMV is not only the most efficient model but also the best performing for this application on the Kimore dataset, a skeleton-based human rehabilitation exercises dataset. Moreover, to address the interpretability of LITEMV, we present a study using Class Activation Maps to understand the classification decisions taken by the model during evaluation.
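
The parameter savings quoted above come largely from depthwise separable convolutions. The block below is a generic 1-D DWSC building block of that kind, not the full LITE architecture (no multiplexing, custom filters, or dilation); channel counts and kernel size are assumptions chosen for the demo.

```python
# Depthwise-separable 1-D convolution: a per-channel (depthwise) convolution
# followed by a 1x1 (pointwise) convolution. The demo compares its parameter
# count with a standard convolution of the same receptive field.
import torch
import torch.nn as nn


class DepthwiseSeparableConv1d(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 9):
        super().__init__()
        self.depthwise = nn.Conv1d(
            in_channels, in_channels, kernel_size,
            padding=kernel_size // 2, groups=in_channels, bias=False,
        )
        self.pointwise = nn.Conv1d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm1d(out_channels)
        self.act = nn.ReLU()

    def forward(self, x):                          # x: (batch, channels, length)
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


if __name__ == "__main__":
    block = DepthwiseSeparableConv1d(in_channels=3, out_channels=32)
    standard = nn.Conv1d(3, 32, kernel_size=9, padding=4, bias=False)
    n_params = lambda m: sum(p.numel() for p in m.parameters())
    print("separable:", n_params(block), "standard:", n_params(standard))
    print(block(torch.randn(2, 3, 128)).shape)     # torch.Size([2, 32, 128])
```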

One Transformer for All Time Series: Representing and Training with Time-Dependent Heterogeneous Tabular Data 2025-01-23
Show

There is a recent growing interest in applying Deep Learning techniques to tabular data, in order to replicate the success of other Artificial Intelligence areas in this structured domain. Specifically interesting is the case in which tabular data have a time dependence, such as financial transactions. However, the heterogeneity of the tabular values, in which categorical elements are mixed with numerical items, makes this adaptation difficult. In this paper we propose a Transformer architecture to represent heterogeneous time-dependent tabular data, in which numerical features are represented using a set of frequency functions and the whole network is uniformly trained with a unique loss function.

29 pages, 2 figures, 16 tables

Reconstructing ecological community dynamics from limited observations 2025-01-23
Show

Ecosystems tend to fluctuate around stable equilibria in response to internal dynamics and environmental factors. Occasionally, they enter an unstable tipping region and collapse into an alternative stable state. Our understanding of how ecological communities vary over time and respond to perturbations depends on our ability to quantify and predict these dynamics. However, the scarcity of long, dense time series data poses a severe bottleneck for characterising community dynamics using existing methods. We overcome this limitation by combining information across multiple short time series using Bayesian inference. By decomposing dynamics into deterministic and stochastic components using Gaussian process priors, we predict stable and tipping regions along the community landscape and quantify resilience while addressing uncertainty. After validation with simulated and real ecological time series, we use the model to question common assumptions underlying classical potential analysis and re-evaluate the stability of previously proposed "tipping elements" in the human gut microbiota.

Information-theoretic limits and approximate message-passing for high-dimensional time series 2025-01-23
Show

High-dimensional time series appear in many scientific setups, demanding a nuanced approach to model and analyze the underlying dependence structure. However, theoretical advancements so far often rely on stringent assumptions regarding the sparsity of the underlying signals. In this contribution, we expand the scope by investigating a high-dimensional time series model wherein the number of features grows proportionally to the number of sampling points, without assuming sparsity in the signal. Specifically, we consider the stochastic regression model and derive a single-letter formula for the normalized mutual information between observations and the signal. We also empirically study the vector approximate message passing (VAMP) algorithm and show that, despite a lack of theoretical guarantees, its performance for inference in our time series model is robust and often statistically optimal.

GCAD: Anomaly Detection in Multivariate Time Series from the Perspective of Granger Causality 2025-01-23
Show

Multivariate time series anomaly detection has numerous real-world applications and is being extensively studied. Modeling pairwise correlations between variables is crucial. Existing methods employ learnable graph structures and graph neural networks to explicitly model the spatial dependencies between variables. However, these methods are primarily based on prediction or reconstruction tasks, which can only learn similarity relationships between sequence embeddings and lack interpretability in how graph structures affect time series evolution. In this paper, we designed a framework that models spatial dependencies using interpretable causal relationships and detects anomalies through changes in causal patterns. Specifically, we propose a method to dynamically discover Granger causality using gradients in nonlinear deep predictors and employ a simple sparsification strategy to obtain a Granger causality graph, detecting anomalies from a causal perspective. Experiments on real-world datasets demonstrate that the proposed model achieves more accurate anomaly detection compared to baseline methods.

Accepted to AAAI 2025

FreEformer: Frequency Enhanced Transformer for Multivariate Time Series Forecasting 2025-01-23
Show

This paper presents FreEformer, a simple yet effective model that leverages a Frequency Enhanced Transformer for multivariate time series forecasting. Our work is based on the assumption that the frequency spectrum provides a global perspective on the composition of series across various frequencies and is highly suitable for robust representation learning. Specifically, we first convert time series into the complex frequency domain using the Discrete Fourier Transform (DFT). The Transformer architecture is then applied to the frequency spectra to capture cross-variate dependencies, with the real and imaginary parts processed independently. However, we observe that the vanilla attention matrix exhibits a low-rank characteristic, thus limiting representation diversity. This could be attributed to the inherent sparsity of the frequency domain and the strong-value-focused nature of Softmax in vanilla attention. To address this, we enhance the vanilla attention mechanism by introducing an additional learnable matrix to the original attention matrix, followed by row-wise L1 normalization. Theoretical analysis demonstrates that this enhanced attention mechanism improves both feature diversity and gradient flow. Extensive experiments demonstrate that FreEformer consistently outperforms state-of-the-art models on eighteen real-world benchmarks covering electricity, traffic, weather, healthcare and finance. Notably, the enhanced attention mechanism also consistently improves the performance of state-of-the-art Transformer-based forecasters.
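
As a rough sketch of the attention modification described in the abstract, the layer below adds a learnable matrix to the vanilla softmax attention matrix and then renormalises each row by its L1 norm. The single-head layout and the ReLU used to keep the added scores non-negative are assumptions, not details taken from the paper.

```python
# Vanilla softmax attention plus a learnable additive matrix, followed by
# row-wise L1 renormalisation. Head handling is simplified and illustrative.
import torch
import torch.nn as nn


class EnhancedAttention(nn.Module):
    def __init__(self, n_tokens: int, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # Extra learnable matrix added to the attention scores.
        self.learnable = nn.Parameter(torch.zeros(n_tokens, n_tokens))

    def forward(self, x):                                   # x: (batch, tokens, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scale = q.shape[-1] ** 0.5
        attn = torch.softmax(q @ k.transpose(-2, -1) / scale, dim=-1)
        attn = attn + torch.relu(self.learnable)            # keep added scores non-negative
        attn = attn / attn.sum(dim=-1, keepdim=True)        # row-wise L1 normalisation
        return attn @ v


if __name__ == "__main__":
    layer = EnhancedAttention(n_tokens=7, d_model=16)
    print(layer(torch.randn(2, 7, 16)).shape)               # torch.Size([2, 7, 16])
```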

S4Sleep: Elucidating the design space of deep-learning-based sleep stage classification models 2025-01-23
Show

Scoring sleep stages in polysomnography recordings is a time-consuming task plagued by significant inter-rater variability. Therefore, it stands to benefit from the application of machine learning algorithms. While many algorithms have been proposed for this purpose, certain critical architectural decisions have not received systematic exploration. In this study, we meticulously investigate these design choices within the broad category of encoder-predictor architectures. We identify robust architectures applicable to both time series and spectrogram input representations. These architectures incorporate structured state space models as integral components and achieve statistically significant performance improvements compared to state-of-the-art approaches on the extensive Sleep Heart Health Study dataset. We anticipate that the architectural insights gained from this study along with the refined methodology for architecture search demonstrated herein will not only prove valuable for future research in sleep staging but also hold relevance for other time series annotation tasks.

version accepted by Computers in Biology and Medicine, code available at https://github.com/AI4HealthUOL/s4sleep

RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data 2025-01-23
Show

We present RelCon, a novel self-supervised Relative Contrastive learning approach that uses a learnable distance measure in combination with a softened contrastive loss for training a motion foundation model from wearable sensors. The learnable distance measure captures motif similarity and domain-specific semantic information such as rotation invariance. The learned distance provides a measurement of semantic similarity between a pair of accelerometer time-series segments, which is used to measure the distance between an anchor and various other sampled candidate segments. The self-supervised model is trained on 1 billion segments from 87,376 participants from a large wearables dataset. The model achieves strong performance across multiple downstream tasks, encompassing both classification and regression. To our knowledge, we are the first to show the generalizability of a self-supervised learning model with motion data from wearables across distinct evaluation tasks.

Accepted to ICLR 2025

Numerically Robust Fixed-Point Smoothing Without State Augmentation 2025-01-23
Show

Practical implementations of Gaussian smoothing algorithms have received a great deal of attention in the last 60 years. However, almost all work focuses on estimating complete time series ("fixed-interval smoothing", $\mathcal{O}(K)$ memory) through variations of the Rauch-Tung-Striebel smoother, rarely on estimating the initial states ("fixed-point smoothing", $\mathcal{O}(1)$ memory). Since fixed-point smoothing is a crucial component of algorithms for dynamical systems with unknown initial conditions, we close this gap by introducing a new formulation of a Gaussian fixed-point smoother. In contrast to prior approaches, our perspective admits a numerically robust Cholesky-based form (without downdates) and avoids state augmentation, which would needlessly inflate the state-space model and reduce the numerical practicality of any fixed-point smoother code. The experiments demonstrate how a JAX implementation of our algorithm matches the runtime of the fastest methods and the robustness of the most robust techniques while existing implementations must always sacrifice one for the other.

Time Series Embedding Methods for Classification Tasks: A Review 2025-01-23
Show

Time series analysis has become crucial in various fields, from engineering and finance to healthcare and social sciences. In this paper, we present a comprehensive review and evaluation of time series embedding methods for effective representations in machine learning and deep learning models. We introduce a taxonomy of embedding techniques, categorizing them based on their theoretical foundations and application contexts. Unlike previous surveys, our work provides a quantitative evaluation of representative methods from each category by assessing their performance on downstream classification tasks across diverse real-world datasets. Our experimental results demonstrate that the performance of embedding methods varies significantly depending on the dataset and classification algorithm used, highlighting the importance of careful model selection and extensive experimentation for specific applications, including engineering systems. To facilitate further research and practical applications, we provide an open-source code repository implementing these embedding methods. This study contributes to the field by offering a systematic comparison of time series embedding techniques, guiding practitioners in selecting appropriate methods for their specific applications, and providing a foundation for future advancements in time series analysis.

Synergizing Large Language Models and Task-specific Models for Time Series Anomaly Detection 2025-01-23
Show

In anomaly detection, methods based on large language models (LLMs) can incorporate expert knowledge by reading professional documents, while task-specific small models excel at extracting normal data patterns and detecting value fluctuations from training data of target applications. Inspired by the human nervous system, where the brain stores expert knowledge and the peripheral nervous system and spinal cord handle specific tasks like withdrawal and knee-jerk reflexes, we propose CoLLaTe, a framework designed to facilitate collaboration between LLMs and task-specific models, leveraging the strengths of both models for anomaly detection. In particular, we first formulate the collaboration process and identify two key challenges in the collaboration: (1) the misalignment between the expression domains of the LLMs and task-specific small models, and (2) error accumulation arising from the predictions of both models. To address these challenges, we then introduce two key components in CoLLaTe: a model alignment module and a collaborative loss function. Through theoretical analysis and experimental validation, we demonstrate that these components effectively mitigate the identified challenges and achieve better performance than both LLM-based and task-specific models.

The Dynamic Triple Gamma Prior as a Shrinkage Process Prior for Time-Varying Parameter Models 2025-01-22
Show

Many existing shrinkage approaches for time-varying parameter (TVP) models assume constant innovation variances across time points, inducing sparsity by shrinking these variances toward zero. However, this assumption falls short when states exhibit large jumps or structural changes, as often seen in empirical time series analysis. To address this, we propose the dynamic triple gamma prior -- a stochastic process that induces time-dependent shrinkage by modeling dependence among innovations while retaining a well-known triple gamma marginal distribution. This framework encompasses various special and limiting cases, including the horseshoe shrinkage prior, making it highly flexible. We derive key properties of the dynamic triple gamma that highlight its dynamic shrinkage behavior and develop an efficient Markov chain Monte Carlo algorithm for posterior sampling. The proposed approach is evaluated through sparse covariance modeling and forecasting of the returns of the EURO STOXX 50 index, demonstrating favorable forecasting performance.

Multiple change point detection based on Hodrick-Prescott and $l_1$ filtering method for random walk time series data 2025-01-22
Show

We propose new methods for detecting multiple change points in time series, specifically designed for random walk processes, where stationarity and variance changes present challenges. Our approach combines two trend estimation methods: the Hodrick-Prescott (HP) filter and the l1 filter. A major challenge in these methods is selecting the tuning parameter lambda, which we address by introducing two selection techniques. For HP-based change point detection, we propose a probability-based threshold to select lambda under the assumption of an exponential distribution. For the l1-based method, we suggest a selection strategy assuming normality. Additionally, we introduce a technique to estimate the maximum number of change points in time segments using the l1-based method. We validate our methods by comparing them to similar techniques, such as PELT, using simulated data. We also demonstrate the practical application of our approach to real-world SNP stock data, showcasing its effectiveness in detecting change points.
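
The two trend estimators underlying the method can be written down compactly. The sketch below, which assumes CVXPY is available, computes a Hodrick-Prescott trend by solving (I + lam * D^T D) x = y and an l1 trend by penalising the l1 norm of second differences; the lambda values and the kink-based change-point read-out are illustrative assumptions, not the selection rules proposed in the paper.

```python
# HP filter (quadratic smoothness penalty) and l1 trend filter (piecewise-
# linear trend), both penalising second differences of the fitted trend.
# Lambda values and the kink threshold below are illustrative only.
import numpy as np
import cvxpy as cp


def second_difference(n):
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D


def hp_filter(y, lam=1600.0):
    n = len(y)
    D = second_difference(n)
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)   # closed-form HP trend


def l1_trend(y, lam=50.0):
    n = len(y)
    x = cp.Variable(n)
    objective = cp.Minimize(
        0.5 * cp.sum_squares(y - x) + lam * cp.norm1(second_difference(n) @ x)
    )
    cp.Problem(objective).solve()
    return x.value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.cumsum(rng.normal(size=300))        # random-walk series
    y[150:] += 8.0                             # inject a level shift
    trend = l1_trend(y)
    kinks = np.where(np.abs(np.diff(trend, 2)) > 1e-2)[0] + 1   # kinks of the l1 trend
    print("HP trend around the shift:", hp_filter(y)[148:153].round(2))
    print("candidate change points (l1 kinks):", kinks[:10])
```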

Real-Time Multi-Modal Subcomponent-Level Measurements for Trustworthy System Monitoring and Malware Detection 2025-01-22
Show

With increasingly sophisticated cyber-adversaries able to access a wider repertoire of mechanisms to implant malware such as ransomware, CPU/GPU keyloggers, and stealthy kernel rootkits, there is an urgent need for techniques to detect and mitigate such attacks. While state of the art relies on digital and analog side channel measurements assuming trustworthiness of measurements obtained on the main processor, such an approach has limitations since processor-based side channel measurements are potentially untrustworthy. Sophisticated adversaries (especially in late stage cyber attacks when they have breached the computer and network security systems such as firewalls and antivirus and penetrated the computer's OS) can compromise user-space and kernel-space measurements. To address this key limitation of state of the art, we propose a "subcomponent-level" approach to collect side channel measurements so as to enable robust anomaly detection in a modern computer even when the main processor is compromised. Our proposed approach leverages the fact that modern computers are complex systems with multiple interacting subcomponents and measurements from subcomponents can be used to detect anomalies even when the main processor is no longer trustworthy. We develop mechanisms to obtain time series measurements of activity of several subcomponents and methodologies to process and fuse these measurements for anomaly detection. The subcomponents include network interface controller, GPU, CPU Hardware Performance Counters, CPU power, and keyboard. Our main hypothesis is that subcomponent measurements can enable detection of security threats without requiring a trustworthy main processor. By enabling real-time measurements from multiple subcomponents, the goal is to provide a deeper visibility into system operation, thereby yielding a powerful tool to track system operation and detect anomalies.

12 pages, 29 figures
Developing Cryptocurrency Trading Strategy Based on Autoencoder-CNN-GANs Algorithms 2025-01-22
Show

This paper leverages machine learning algorithms to forecast and analyze financial time series. The process begins with a denoising autoencoder to filter out random noise fluctuations from the main contract price data. Then, one-dimensional convolution reduces the dimensionality of the filtered data and extracts key information. The filtered and dimensionality-reduced price data is fed into a GANs network, and its output serves as the input of a fully connected network. Through cross-validation, a model is trained to capture features that precede large price fluctuations. The model predicts the likelihood and direction of significant price changes in real-time price sequences, placing trades at moments of high prediction accuracy. Empirical results demonstrate that using autoencoders and convolution to filter and denoise financial data, combined with GANs, achieves a certain level of predictive performance, validating the capabilities of machine learning algorithms to discover underlying patterns in financial sequences. Keywords: CNN; GANs; Cryptocurrency; Prediction.

The paper was accepted by 2024 4th International Conference on Artificial Intelligence, Robotics, and Communication(ICAIRC 2024)

TimeFilter: Patch-Specific Spatial-Temporal Graph Filtration for Time Series Forecasting 2025-01-22
Show

Current time series forecasting methods can be broadly classified into two categories: Channel Independent (CI) and Channel Dependent (CD) strategies, both aiming to capture the complex dependencies within time series data. However, the CI strategy fails to exploit highly correlated covariate information, while the CD strategy integrates all dependencies, including irrelevant or noisy ones, thus compromising generalization. To mitigate these issues, recent works have introduced the Channel Clustering (CC) strategy by grouping channels with similar characteristics and applying different modeling techniques to each cluster. However, coarse-grained clustering cannot flexibly capture complex, time-varying interactions. Addressing the above challenges, we propose TimeFilter, a graph-based framework for adaptive and fine-grained dependency modeling. Specifically, after constructing the graph with the input sequence, TimeFilter filters out irrelevant correlations and preserves the most critical ones through patch-specific filtering. Extensive experiments on 13 real-world datasets from various application domains demonstrate the state-of-the-art performance of TimeFilter. The code is available at https://github.com/TROUBADOUR000/TimeFilter.

Causality for VARMA processes with instantaneous effects: The global Markov property, faithfulness and instrumental variables 2025-01-22
Show

Causal reasoning has gained great attention over the last half century as it allows (or at least intends) to answer questions that go beyond the capabilities of classical inferential statistics using just observational data. So far, causal research has focused mostly on the i.i.d. setting. However, there are many situations in which a non-trivial dependence structure exists between sequential observations. Motivated by this fact, the main purpose of this work is to study causal properties of time series under the structural assumption of a VARMA model with instantaneous effects. First, the global Markov property is studied, building on existing work for VAR processes without instantaneous effects. Infinite graphs which represent the dependencies of the process are defined so that separation statements translate to conditional independencies in the stationary distribution of the process. Second, faithfulness is examined as a counterpart of this Markov property. Conditions are given so that the stationary distribution of the process is almost surely faithful to said infinite graphs. In addition, an instrumental variable regression framework is developed for VARMA models with instantaneous effects. This allows one to identify and consistently estimate total causal effects.

42 pages, 9 figures
Multifractal hopscotch in "Hopscotch" by Julio Cortazar 2025-01-22
Show

Punctuation is the main factor introducing correlations in natural language written texts and it crucially impacts their overall effectiveness, expressiveness, and readability. Punctuation marks at the end of sentences are of particular importance as their distribution can determine various complexity features of written natural language. Here, the sentence length variability (SLV) time series representing "Hopscotch" by Julio Cortazar are subjected to quantitative analysis with an attempt to identify their distribution type, long-memory effects, and potential multiscale patterns. The analyzed novel is an important and innovative piece of literature whose essential property is freedom of movement between its building blocks given to a reader by the author. The statistical consequences of this freedom are closely investigated in both the original, Spanish version of the novel, and its translations into English and Polish. Clear evidence of rich multifractality in the SLV dynamics, with a left-sided asymmetry, however, is observed in all three language versions as well as in the versions with differently ordered chapters.

Budget-constrained Collaborative Renewable Energy Forecasting Market 2025-01-22
Show

Accurate power forecasting from renewable energy sources (RES) is crucial for integrating additional RES capacity into the power system and realizing sustainability goals. This work emphasizes the importance of integrating decentralized spatio-temporal data into forecasting models. However, decentralized data ownership presents a critical obstacle to the success of such spatio-temporal models, and incentive mechanisms to foster data-sharing need to be considered. The main contributions are a) a comparative analysis of the forecasting models, advocating for efficient and interpretable spline LASSO regression models, and b) a bidding mechanism within the data/analytics market to ensure fair compensation for data providers and enable both buyers and sellers to express their data price requirements. Furthermore, an incentive mechanism for time series forecasting is proposed, effectively incorporating price constraints and preventing redundant feature allocation. Results show significant accuracy improvements and potential monetary gains for data sellers. For wind power data, an average root mean squared error improvement of over 10% was achieved by comparing forecasts generated by the proposal with locally generated ones.

Treatment-aware Diffusion Probabilistic Model for Longitudinal MRI Generation and Diffuse Glioma Growth Prediction 2025-01-22
Show

Diffuse gliomas are malignant brain tumors that spread widely through the brain. The complex interactions between neoplastic cells and normal tissue, as well as the treatment-induced changes often encountered, make glioma tumor growth modeling challenging. In this paper, we present a novel end-to-end network capable of future predictions of tumor masks and multi-parametric magnetic resonance images (MRI) of how the tumor will look at any future time points for different treatment plans. Our approach is based on cutting-edge diffusion probabilistic models and deep-segmentation neural networks. We included sequential multi-parametric MRI and treatment information as conditioning inputs to guide the generative diffusion process as well as a joint segmentation process. This allows for tumor growth estimates and realistic MRI generation at any given treatment and time point. We trained the model using real-world postoperative longitudinal MRI data with glioma tumor growth trajectories represented as tumor segmentation maps over time. The model demonstrates promising performance across various tasks, including generating high-quality multi-parametric MRI with tumor masks, performing time-series tumor segmentations, and providing uncertainty estimates. Combined with the treatment-aware generated MRI, the tumor growth predictions with uncertainty estimates can provide useful information for clinical decision-making.

preprints in IEEE-TMI, 14 pages

Data re-uploading in Quantum Machine Learning for time series: application to traffic forecasting 2025-01-22
Show

Accurate traffic forecasting plays a crucial role in modern Intelligent Transportation Systems (ITS), as it enables real-time traffic flow management, reduces congestion, and improves the overall efficiency of urban transportation networks. With the rise of Quantum Machine Learning (QML), a new paradigm has emerged with the potential to enhance predictive capabilities beyond what classical machine learning models can achieve. In the present work we pursue a heuristic approach to explore the potential of QML, and focus on a specific transport issue. In particular, as a case study we investigate a traffic forecast task for a major urban area in Athens (Greece), for which we possess high-resolution data. In this endeavor we explore the application of Quantum Neural Networks (QNN), and, notably, we present the first application of quantum data re-uploading in the context of transport forecasting. This technique allows quantum models to better capture complex patterns, such as traffic dynamics, by repeatedly encoding classical data into a quantum state. Aside from providing a prediction model, we spend considerable effort in comparing the performance of our hybrid quantum-classical neural networks with classical deep learning approaches. Our results show that hybrid models achieve competitive accuracy with state-of-the-art classical methods, especially when the number of qubits and re-uploading blocks is increased. While the classical models demonstrate lower computational demands, we provide evidence that increasing the complexity of the quantum model improves predictive accuracy. These findings indicate that QML techniques, and specifically the data re-uploading approach, hold promise for advancing traffic forecasting models and could be instrumental in addressing challenges inherent in ITS environments.

Trajectory

Title Date Abstract Comment
Large Language Models for Single-Step and Multi-Step Flight Trajectory Prediction 2025-01-29
Show

Flight trajectory prediction is a critical time series task in aviation. While deep learning methods have shown significant promise, the application of large language models (LLMs) to this domain remains underexplored. This study pioneers the use of LLMs for flight trajectory prediction by reframing it as a language modeling problem. Specifically, we extract features representing the aircraft's position and status from ADS-B flight data to construct a prompt-based dataset, where trajectory waypoints are converted into language tokens. The dataset is then employed to fine-tune LLMs, enabling them to learn complex spatiotemporal patterns for accurate predictions. Comprehensive experiments demonstrate that LLMs achieve notable performance improvements in both single-step and multi-step predictions compared to traditional methods, with the LLaMA-3.1 model achieving the highest overall accuracy. However, the high inference latency of LLMs poses a challenge for real-time applications, underscoring the need for further research in this promising direction.

9 pages, 7 figures
Realtime Limb Trajectory Optimization for Humanoid Running Through Centroidal Angular Momentum Dynamics 2025-01-29
Show

One of the essential aspects of humanoid robot running is determining the limb-swinging trajectories. During the flight phases, where the ground reaction forces are not available for regulation, the limb swinging trajectories are significant for the stability of the next stance phase. Due to the conservation of angular momentum, improper leg and arm swinging results in highly tilted and unsustainable body configurations at the next stance phase landing. In such cases, the robotic system fails to maintain locomotion independent of the stability of the center of mass trajectories. This problem is more apparent for fast and high flight time trajectories. This paper proposes a real-time nonlinear limb trajectory optimization problem for humanoid running. The optimization problem is tested on two different humanoid robot models, and the generated trajectories are verified using a running algorithm for both robots in a simulation environment.

Submitted to IEEE ICRA2025

Target-driven Self-Distillation for Partial Observed Trajectories Forecasting 2025-01-28
Show

Accurate prediction of future trajectories of traffic agents is essential for ensuring safe autonomous driving. However, partially observed trajectories can significantly degrade the performance of even state-of-the-art models. Previous approaches often rely on knowledge distillation to transfer features from fully observed trajectories to partially observed ones. This involves firstly training a fully observed model and then using a distillation process to create the final model. While effective, they require multi-stage training, making the training process very expensive. Moreover, knowledge distillation can lead to a performance degradation of the model. In this paper, we introduce a Target-driven Self-Distillation method (TSD) for motion forecasting. Our method leverages predicted accurate targets to guide the model in making predictions under partial observation conditions. By employing self-distillation, the model learns from the feature distributions of both fully observed and partially observed trajectories during a single end-to-end training process. This enhances the model's ability to predict motion accurately in both fully observed and partially observed scenarios. We evaluate our method on multiple datasets and state-of-the-art motion forecasting models. Extensive experimental results demonstrate that our approach achieves significant performance improvements in both settings. To facilitate further research, we will release our code and model checkpoints.

Hierarchical Trajectory (Re)Planning for a Large Scale Swarm 2025-01-28
Show

We consider the trajectory replanning problem for a large-scale swarm in a cluttered environment. Our path planner replans for robots by utilizing a hierarchical approach, dividing the workspace, and computing collision-free paths for robots within each cell in parallel. Distributed trajectory optimization generates a deadlock-free trajectory for efficient execution and maintains the control feasibility even when the optimization fails. Our hierarchical approach combines the benefits of both centralized and decentralized methods, achieving a high task success rate while providing real-time replanning capability. Compared to decentralized approaches, our approach effectively avoids deadlocks and collisions, significantly increasing the task success rate. We demonstrate the real-time performance of our algorithm with up to 142 robots in simulation, and a representative 24 physical Crazyflie nano-quadrotor experiment.

13 pages, 14 figures. arXiv admin note: substantial text overlap with arXiv:2407.02777

Toward Safe Integration of UAM in Terminal Airspace: UAM Route Feasibility Assessment using Probabilistic Aircraft Trajectory Prediction 2025-01-28
Show

Integrating Urban Air Mobility (UAM) into airspace managed by Air Traffic Control (ATC) poses significant challenges, particularly in congested terminal environments. This study proposes a framework to assess the feasibility of UAM route integration using probabilistic aircraft trajectory prediction. By leveraging conditional Normalizing Flows, the framework predicts short-term trajectory distributions of conventional aircraft, enabling UAM vehicles to dynamically adjust speeds and maintain safe separations. The methodology was applied to airspace over Seoul metropolitan area, encompassing interactions between UAM and conventional traffic at multiple altitudes and lanes. The results reveal that different physical locations of lanes and routes experience varying interaction patterns and encounter dynamics. For instance, Lane 1 at lower altitudes (1,500 ft and 2,000 ft) exhibited minimal interactions with conventional aircraft, resulting in the largest separations and the most stable delay proportions. In contrast, Lane 4 near the airport experienced more frequent and complex interactions due to its proximity to departing traffic. The limited trajectory data for departing aircraft in this region occasionally led to tighter separations and increased operational challenges. This study underscores the potential of predictive modeling in facilitating UAM integration while highlighting critical trade-offs between safety and efficiency. The findings contribute to refining airspace management strategies and offer insights for scaling UAM operations in complex urban environments.

10 pages, 7 figures
On characterizing optimal learning trajectories in a class of learning problems 2025-01-27
Show

In this brief paper, we provide a mathematical framework that exploits the relationship between the maximum principle and dynamic programming for characterizing optimal learning trajectories in a class of learning problems related to point estimation for modeling high-dimensional nonlinear functions. Here, such characterization of the optimal learning trajectories is associated with the solution of an optimal control problem for a weakly-controlled gradient system with small parameters, whose time-evolution is guided by a model training dataset and its perturbed version, while the optimization problem consists of a cost functional that summarizes how to gauge the quality/performance of the estimated model parameters at a certain fixed final time w.r.t. a model validating dataset. Moreover, using a successive Galerkin approximation method, we provide an algorithmic recipe for constructing the corresponding optimal learning trajectories leading to the optimal estimated model parameters for such a class of learning problems.

5 Pages (A further extension of the paper: arXiv:2412.08772)

Beyond In-Distribution Performance: A Cross-Dataset Study of Trajectory Prediction Robustness 2025-01-27
Show

We study the Out-of-Distribution (OoD) generalization ability of three SotA trajectory prediction models with comparable In-Distribution (ID) performance but different model designs. We investigate the influence of inductive bias, size of training data and data augmentation strategy by training the models on Argoverse 2 (A2) and testing on Waymo Open Motion (WO) and vice versa. We find that the smallest model with highest inductive bias exhibits the best OoD generalization across different augmentation strategies when trained on the smaller A2 dataset and tested on the large WO dataset. In the converse setting, training all models on the larger WO dataset and testing on the smaller A2 dataset, we find that all models generalize poorly, even though the model with the highest inductive bias still exhibits the best generalization ability. We discuss possible reasons for this surprising finding and draw conclusions about the design and test of trajectory prediction models and benchmarks.

arXiv admin note: text overlap with arXiv:2407.13431

Error-State LQR Formulation for Quadrotor UAV Trajectory Tracking 2025-01-27
Show

This article presents an error-state Linear Quadratic Regulator (LQR) formulation for robust trajectory tracking in quadrotor Unmanned Aerial Vehicles (UAVs). The proposed approach leverages error-state dynamics and employs exponential coordinates to represent orientation errors, enabling a linearized system representation for real-time control. The control strategy integrates an LQR-based full-state feedback controller for trajectory tracking, combined with a cascaded bodyrate controller to handle actuator dynamics. Detailed derivations of the error-state dynamics, the linearization process, and the controller design are provided, highlighting the applicability of the method for precise and stable quadrotor control in dynamic environments.
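
A generic continuous-time LQR gain computation of the kind an error-state formulation relies on is shown below: linearise the error dynamics to (A, B), solve the algebraic Riccati equation, and feed back u = -Ke. The double-integrator (A, B) is a toy stand-in for the quadrotor's linearised error dynamics, not the model derived in the paper.

```python
# Continuous-time LQR gain via the algebraic Riccati equation. The toy
# double-integrator error dynamics below are illustrative assumptions.
import numpy as np
from scipy.linalg import solve_continuous_are


def lqr_gain(A, B, Q, R):
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)           # K = R^{-1} B^T P


if __name__ == "__main__":
    # Toy 1-axis double integrator: error state e = [position error, velocity error].
    A = np.array([[0.0, 1.0], [0.0, 0.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.diag([10.0, 1.0])                      # penalise tracking error
    R = np.array([[0.1]])                         # penalise control effort
    K = lqr_gain(A, B, Q, R)
    e = np.array([1.0, 0.0])                      # 1 m position error
    print("gain K:", K.round(3), "control u = -K e:", (-K @ e).round(3))
```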

TEA: Trajectory Encoding Augmentation for Robust and Transferable Policies in Offline Reinforcement Learning 2025-01-26
Show

In this paper, we investigate offline reinforcement learning (RL) with the goal of training a single robust policy that generalizes effectively across environments with unseen dynamics. We propose a novel approach, Trajectory Encoding Augmentation (TEA), which extends the state space by integrating latent representations of environmental dynamics obtained from sequence encoders, such as AutoEncoders. Our findings show that incorporating these encodings with TEA improves the transferability of a single policy to novel environments with new dynamics, surpassing methods that rely solely on unmodified states. These results indicate that TEA captures critical, environment-specific characteristics, enabling RL agents to generalize effectively across dynamic conditions.

Accepted to ESANN 2025

Improving Out-of-Distribution Generalization of Trajectory Prediction for Autonomous Driving via Polynomial Representations 2025-01-25
Show

Robustness against Out-of-Distribution (OoD) samples is a key performance indicator of a trajectory prediction model. However, the development and ranking of state-of-the-art (SotA) models are driven by their In-Distribution (ID) performance on individual competition datasets. We present an OoD testing protocol that homogenizes datasets and prediction tasks across two large-scale motion datasets. We introduce a novel prediction algorithm based on polynomial representations for agent trajectory and road geometry on both the input and output sides of the model. With a much smaller model size, training effort, and inference time, we reach near SotA performance for ID testing and significantly improve robustness in OoD testing. Within our OoD testing protocol, we further study two augmentation strategies of SotA models and their effects on model generalization. Highlighting the contrast between ID and OoD performance, we suggest adding OoD testing to the evaluation criteria of trajectory prediction models.
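
To make the notion of a polynomial trajectory representation concrete, the toy sketch below fits low-order polynomials to an observed 2-D track and extrapolates them over the prediction horizon; the polynomial order, sampling rate, and horizon are assumptions, and this is not the paper's prediction algorithm.

```python
# Fit low-order polynomials to an observed 2-D trajectory and extrapolate
# them as a crude forecast. Order, sampling rate, and horizon are toy choices.
import numpy as np


def fit_and_extrapolate(xy, dt=0.1, horizon_steps=30, order=3):
    """xy: (T, 2) observed positions sampled every dt seconds."""
    t_obs = np.arange(len(xy)) * dt
    t_fut = t_obs[-1] + dt * np.arange(1, horizon_steps + 1)
    coeff_x = np.polyfit(t_obs, xy[:, 0], order)
    coeff_y = np.polyfit(t_obs, xy[:, 1], order)
    return np.stack([np.polyval(coeff_x, t_fut), np.polyval(coeff_y, t_fut)], axis=1)


if __name__ == "__main__":
    t = np.arange(20) * 0.1
    observed = np.stack([5.0 * t, 0.5 * t ** 2], axis=1)   # gently curving track
    future = fit_and_extrapolate(observed)
    print("predicted positions (first 3):\n", future[:3].round(2))
```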

Towards Robust Spacecraft Trajectory Optimization via Transformers 2025-01-25
Show

Future multi-spacecraft missions require robust autonomous trajectory optimization capabilities to ensure safe and efficient rendezvous operations. This capability hinges on solving non-convex optimal control problems in real-time, although traditional iterative methods such as sequential convex programming impose significant computational challenges. To mitigate this burden, the Autonomous Rendezvous Transformer (ART) introduced a generative model trained to provide near-optimal initial guesses. This approach provides convergence to better local optima (e.g., fuel optimality), improves feasibility rates, and results in faster convergence speed of optimization algorithms through warm-starting. This work extends the capabilities of ART to address robust chance-constrained optimal control problems. Specifically, ART is applied to challenging rendezvous scenarios in Low Earth Orbit (LEO), ensuring fault-tolerant behavior under uncertainty. Through extensive experimentation, the proposed warm-starting strategy is shown to consistently produce high-quality reference trajectories, achieving up to 30% cost improvement and 50% reduction in infeasible cases compared to conventional methods, demonstrating robust performance across multiple state representations. Additionally, a post hoc evaluation framework is proposed to assess the quality of generated trajectories and mitigate runtime failures, marking an initial step toward the reliable deployment of AI-driven solutions in safety-critical autonomous systems such as spacecraft.

Submitted to the IEEE Aerospace Conference 2025. 13 pages, 10 figures

Where Do You Go? Pedestrian Trajectory Prediction using Scene Features 2025-01-23
Show

Accurate prediction of pedestrian trajectories is crucial for enhancing the safety of autonomous vehicles and reducing traffic fatalities involving pedestrians. While numerous studies have focused on modeling interactions among pedestrians to forecast their movements, the influence of environmental factors and scene-object placements has been comparatively underexplored. In this paper, we present a novel trajectory prediction model that integrates both pedestrian interactions and environmental context to improve prediction accuracy. Our approach captures spatial and temporal interactions among pedestrians within a sparse graph framework. To account for pedestrian-scene interactions, we employ advanced image enhancement and semantic segmentation techniques to extract detailed scene features. These scene and interaction features are then fused through a cross-attention mechanism, enabling the model to prioritize relevant environmental factors that influence pedestrian movements. Finally, a temporal convolutional network processes the fused features to predict future pedestrian trajectories. Experimental results demonstrate that our method significantly outperforms existing state-of-the-art approaches, achieving ADE and FDE values of 0.252 and 0.372 meters, respectively, underscoring the importance of incorporating both social interactions and environmental context in pedestrian trajectory prediction.

Accepted by 2024 International Conference on Intelligent Computing and its Emerging Applications
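
For reference, the ADE/FDE figures quoted in abstracts like the one above are conventionally the mean and final-step Euclidean errors between predicted and ground-truth trajectories. A minimal sketch (array shapes are assumptions, not the authors' code):

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and Final Displacement Error for predicted trajectories.

    pred, gt: arrays of shape (num_agents, horizon, 2), in meters.
    """
    dists = np.linalg.norm(pred - gt, axis=-1)   # (num_agents, horizon)
    ade = dists.mean()                           # mean over agents and time steps
    fde = dists[:, -1].mean()                    # mean error at the final step
    return ade, fde
```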

In-Trajectory Inverse Reinforcement Learning: Learn Incrementally Before An Ongoing Trajectory Terminates 2025-01-23
Show

Inverse reinforcement learning (IRL) aims to learn a reward function and a corresponding policy that best fit the demonstrated trajectories of an expert. However, current IRL works cannot learn incrementally from an ongoing trajectory because they have to wait to collect at least one complete trajectory to learn. To bridge the gap, this paper considers the problem of learning a reward function and a corresponding policy while observing the initial state-action pair of an ongoing trajectory and continually updating the learned reward and policy as new state-action pairs of the ongoing trajectory are observed. We formulate this problem as an online bi-level optimization problem where the upper level dynamically adjusts the learned reward according to the newly observed state-action pairs with the help of a meta-regularization term, and the lower level learns the corresponding policy. We propose a novel algorithm to solve this problem and guarantee that the algorithm achieves sub-linear local regret $O(\sqrt{T}+\log T+\sqrt{T}\log T)$. If the reward function is linear, we prove that the proposed algorithm achieves sub-linear regret $O(\log T)$. Experiments are used to validate the proposed algorithm.
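
A highly simplified, hedged sketch of the incremental loop structure only (not the paper's algorithm): a linear reward is updated from each newly observed state-action pair via a myopic feature-matching gradient plus a regularizer pulling toward a meta-prior, without waiting for a full trajectory. Features, the "expert" rule, and all constants are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 4, 5
w = np.zeros(n_features)          # reward parameters, r(s, a) = w @ phi(s, a)
w_meta = np.zeros(n_features)     # meta-prior used for regularization
lr, lam = 0.1, 0.01

def phi(state, action):
    """Toy state-action features; a real system would define these per task."""
    return np.tanh(state * (action + 1))

def policy_probs(state, w):
    """Myopic softmax policy over a discrete action set under the current reward."""
    scores = np.array([w @ phi(state, a) for a in range(n_actions)])
    scores -= scores.max()
    p = np.exp(scores)
    return p / p.sum()

# Streaming updates: one (state, action) pair at a time, no complete trajectory needed.
for t in range(100):
    state = rng.normal(size=n_features)
    expert_action = 0 if state.sum() > 0 else n_actions - 1   # synthetic "expert"
    p = policy_probs(state, w)
    expected_phi = sum(p[a] * phi(state, a) for a in range(n_actions))
    grad = phi(state, expert_action) - expected_phi           # match expert features
    w += lr * grad - lr * lam * (w - w_meta)                  # meta-regularized step
```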

Towards spiking analog hardware implementation of a trajectory interpolation mechanism for smooth closed-loop control of a spiking robot arm 2025-01-23
Show

Neuromorphic engineering aims to incorporate the computational principles found in animal brains, into modern technological systems. Following this approach, in this work we propose a closed-loop neuromorphic control system for an event-based robotic arm. The proposed system consists of a shifted Winner-Take-All spiking network for interpolating a reference trajectory and a spiking comparator network responsible for controlling the flow continuity of the trajectory, which is fed back to the actual position of the robot. The comparator model is based on a differential position comparison neural network, which governs the execution of the next trajectory points to close the control loop between both components of the system. To evaluate the system, we implemented and deployed the model on a mixed-signal analog-digital neuromorphic platform, the DYNAP-SE2, to facilitate integration and communication with the ED-Scorbot robotic arm platform. Experimental results on one joint of the robot validate the use of this architecture and pave the way for future neuro-inspired control of the entire robot.

5 pages, 7 figures, conference, ISCAS 2025, accepted for publication, Spiking Neural Network

Knowledge-Informed Multi-Agent Trajectory Prediction at Signalized Intersections for Infrastructure-to-Everything 2025-01-23
Show

Multi-agent trajectory prediction at signalized intersections is crucial for developing efficient intelligent transportation systems and safe autonomous driving systems. Due to the complexity of intersection scenarios and the limitations of single-vehicle perception, the performance of vehicle-centric prediction methods has reached a plateau. Furthermore, most works underutilize critical intersection information, including traffic signals, and behavior patterns induced by road structures. Therefore, we propose a multi-agent trajectory prediction framework at signalized intersections dedicated to Infrastructure-to-Everything (I2XTraj). Our framework leverages dynamic graph attention to integrate knowledge from traffic signals and driving behaviors. A continuous signal-informed mechanism is proposed to adaptively process real-time traffic signals from infrastructure devices. Additionally, leveraging the prior knowledge of the intersection topology, we propose a driving strategy awareness mechanism to model the joint distribution of goal intentions and maneuvers. To the best of our knowledge, I2XTraj represents the first multi-agent trajectory prediction framework explicitly designed for infrastructure deployment, supplying subscribable prediction services to all vehicles at intersections. I2XTraj demonstrates state-of-the-art performance on both the Vehicle-to-Infrastructure dataset V2X-Seq and the aerial-view dataset SinD for signalized intersections. Quantitative evaluations show that our approach outperforms existing methods by more than 30% in both multi-agent and single-agent scenarios.

Zero-Shot Trajectory Planning for Signal Temporal Logic Tasks 2025-01-23
Show

Signal Temporal Logic (STL) is a powerful specification language for describing complex temporal behaviors of continuous signals, making it well-suited for high-level robotic task descriptions. However, generating executable plans for STL tasks is challenging, as it requires consideration of the coupling between the task specification and the system dynamics. Existing approaches either follow a model-based setting that explicitly requires knowledge of the system dynamics or adopt a task-oriented data-driven approach to learn plans for specific tasks. In this work, we investigate the problem of generating executable STL plans for systems whose dynamics are unknown a priori. We propose a new planning framework that uses only task-agnostic data during the offline training stage, enabling zero-shot generalization to new STL tasks. Our framework is hierarchical, involving: (i) decomposing the STL task into a set of progress and time constraints, (ii) searching for time-aware waypoints guided by task-agnostic data, and (iii) generating trajectories using a pre-trained safe diffusion model. Simulation results demonstrate the effectiveness of our method in achieving zero-shot generalization to various STL tasks.

submitted
One Fits All: General Mobility Trajectory Modeling via Masked Conditional Diffusion 2025-01-23
Show

Trajectory data play a crucial role in many applications, ranging from network optimization to urban planning. Existing studies on trajectory data are task-specific, and their applicability is limited to the specific tasks on which they have been trained, such as generation, recovery, or prediction. However, the potential of a unified model has not yet been fully explored in trajectory modeling. Although various trajectory tasks differ in inputs, outputs, objectives, and conditions, they share common mobility patterns. Based on these common patterns, we can construct a general framework that enables a single model to address different tasks. However, building a trajectory task-general framework faces two critical challenges: 1) the diversity in the formats of different tasks and 2) the complexity of the conditions imposed on different tasks. In this work, we propose a general trajectory modeling framework via masked conditional diffusion (named GenMove). Specifically, we utilize mask conditions to unify diverse formats. To adapt to complex conditions associated with different tasks, we utilize historical trajectory data to obtain contextual trajectory embeddings, which include rich contexts such as spatiotemporal characteristics and user preferences. Integrating the contextual trajectory embedding into diffusion models through a classifier-free guidance approach allows the model to flexibly adjust its outputs based on different conditions. Extensive experiments on mainstream tasks demonstrate that our model significantly outperforms state-of-the-art baselines, with the highest performance improvement exceeding 13% in generation tasks.

A Spatio-temporal Graph Network Allowing Incomplete Trajectory Input for Pedestrian Trajectory Prediction 2025-01-22
Show

Pedestrian trajectory prediction is important in the research of mobile robot navigation in environments with pedestrians. Most pedestrian trajectory prediction algorithms require the input historical trajectories to be complete. If a pedestrian is unobservable in any past frame, its historical trajectory becomes incomplete and the algorithm will not predict its future trajectory. To address this limitation, we propose STGN-IT, a spatio-temporal graph network allowing incomplete trajectory input, which can predict the future trajectories of pedestrians with incomplete historical trajectories. STGN-IT uses a spatio-temporal graph with an additional encoding method to represent the historical trajectories and observation states of pedestrians. Moreover, STGN-IT introduces static obstacles in the environment that may affect the future trajectories as nodes to further improve the prediction accuracy. A clustering algorithm is also applied in the construction of spatio-temporal graphs. Experiments on public datasets show that STGN-IT outperforms state-of-the-art algorithms on standard prediction metrics.
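
One way to picture the "incomplete trajectory input" handled above is to pair each historical position with a binary observation flag and zero-fill the missing frames. The sketch below shows only this encoding step; all names are hypothetical and this is not the authors' code.

```python
import numpy as np

def encode_incomplete_track(positions, observed):
    """Stack positions with an observation mask so missing frames stay usable.

    positions: (T, 2) array; rows for unobserved frames may contain NaN.
    observed:  (T,) boolean array, True where the pedestrian was actually seen.
    """
    filled = np.where(observed[:, None], np.nan_to_num(positions), 0.0)
    mask = observed.astype(np.float32)[:, None]
    return np.concatenate([filled, mask], axis=1)   # (T, 3): x, y, observed flag

track = np.array([[0.0, 0.0], [np.nan, np.nan], [1.0, 0.4], [1.5, 0.7]])
seen = np.array([True, False, True, True])
features = encode_incomplete_track(track, seen)
```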

Trajectory tracking model-following control using Lyapunov redesign with output time-derivatives to compensate unmatched uncertainties 2025-01-22
Show

We study trajectory tracking for flat nonlinear systems with unmatched uncertainties using the model-following control (MFC) architecture. We apply state feedback linearisation control for the process and propose a simplified implementation of the model control loop, which results in a simple model in Brunovsky form that represents the nominal feedback-linearised dynamics of the nonlinear process. To compensate for possibly unmatched model uncertainties, we employ Lyapunov redesign with numeric derivatives of the output. It turns out that for a special initialisation of the model, the MFC reduces to a single-loop control design. We illustrate our results by a numerical example.

Learning segmentation from point trajectories 2025-01-21
Show

We consider the problem of segmenting objects in videos based on their motion and no other forms of supervision. Prior work has often approached this problem by using the principle of common fate, namely the fact that the motion of points that belong to the same object is strongly correlated. However, most authors have only considered instantaneous motion from optical flow. In this work, we present a way to train a segmentation network using long-term point trajectories as a supervisory signal to complement optical flow. The key difficulty is that long-term motion, unlike instantaneous motion, is difficult to model -- any parametric approximation is unlikely to capture complex motion patterns over long periods of time. We instead draw inspiration from subspace clustering approaches, proposing a loss function that seeks to group the trajectories into low-rank matrices where the motion of object points can be approximately explained as a linear combination of other point tracks. Our method outperforms the prior art on motion-based segmentation, which shows the utility of long-term motion and the effectiveness of our formulation.

NeurIPS 2024 Spotlight. Project https://www.robots.ox.ac.uk/~vgg/research/lrtl/

Algorithm Selection with Probing Trajectories: Benchmarking the Choice of Classifier Model 2025-01-20
Show

Recent approaches to training algorithm selectors in the black-box optimisation domain have advocated for the use of training data that is algorithm-centric in order to encapsulate information about how an algorithm performs on an instance, rather than relying on information derived from features of the instance itself. Probing-trajectories that consist of a sequence of objective performance per function evaluation obtained from a short run of an algorithm have recently shown particular promise in training accurate selectors. However, training models on this type of data requires an appropriately chosen classifier given the sequential nature of the data. There are currently no clear guidelines for choosing the most appropriate classifier for algorithm selection using time-series data from the plethora of models available. To address this, we conduct a large benchmark study using 17 different classifiers and three types of trajectory on a classification task over the BBOB benchmark suite, with both leave-one-instance-out and leave-one-problem-out cross-validation. In contrast to previous studies using tabular data, we find that the choice of classifier has a significant impact, showing that feature-based and interval-based models are the best choices.

To appear in Applications of Evolutionary Computation 28th International Conference, EvoApplications 2025
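
The leave-one-instance-out protocol mentioned above can be reproduced with scikit-learn's grouped cross-validation. The feature extraction and classifier below are placeholders (the study uses sequence-aware time-series classifiers on raw probing trajectories); the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

# X: summary features of probing trajectories, y: best algorithm per run,
# groups: the benchmark instance each run came from (all synthetic here).
rng = np.random.default_rng(1)
X = rng.normal(size=(240, 16))
y = rng.integers(0, 3, size=240)          # three candidate algorithms
groups = np.repeat(np.arange(24), 10)     # 24 instances, 10 runs each

logo = LeaveOneGroupOut()
scores = []
for train_idx, test_idx in logo.split(X, y, groups):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))
print(f"leave-one-instance-out accuracy: {np.mean(scores):.3f}")
```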

Spatio-temporal characterisation of underwater noise through semantic trajectories 2025-01-19
Show

Underwater noise pollution from human activities, particularly shipping, has been recognised as a serious threat to marine life. The sound generated by vessels can have various adverse effects on fish and aquatic ecosystems in general. In this setting, the estimation and analysis of the underwater noise produced by vessels is an important challenge for the preservation of the marine environment. In this paper we propose a model for the spatio-temporal characterisation of the underwater noise generated by vessels. The approach is based on the reconstruction of the vessels' trajectories from Automatic Identification System (AIS) data and on their deployment in a spatio-temporal database. Trajectories are enriched with semantic information like the acoustic characteristics of the vessels' engines or the activity performed by the vessels. We define a model for underwater noise propagation and use the trajectories' information to infer how noise propagates in the area of interest. We develop our approach for the case study of the fishery activities in the Northern Adriatic sea, an area of the Mediterranean sea which is well known to be highly exploited. We implement our approach using MobilityDB, an open source geospatial trajectory data management and analysis platform, which offers spatio-temporal operators and indexes improving the efficiency of our system. We use this platform to conduct various analyses of the underwater noise generated in the Northern Adriatic Sea, aiming at estimating the impact of fishing activities on underwater noise pollution and at demonstrating the flexibility and expressiveness of our approach.

TAME: Temporal Audio-based Mamba for Enhanced Drone Trajectory Estimation and Classification 2025-01-19
Show

The increasing prevalence of compact UAVs has introduced significant risks to public safety, while traditional drone detection systems are often bulky and costly. To address these challenges, we present TAME, the Temporal Audio-based Mamba for Enhanced Drone Trajectory Estimation and Classification. This innovative anti-UAV detection model leverages a parallel selective state-space model to simultaneously capture and learn both the temporal and spectral features of audio, effectively analyzing propagation of sound. To further enhance temporal features, we introduce a Temporal Feature Enhancement Module, which integrates spectral features into temporal data using residual cross-attention. This enhanced temporal information is then employed for precise 3D trajectory estimation and classification. Our model sets a new standard of performance on the MMUAD benchmarks, demonstrating superior accuracy and effectiveness. The code and trained models are publicly available on GitHub \url{https://github.com/AmazingDay1/TAME}.

This paper has been accepted for presentation at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2025. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses

Audio Array-Based 3D UAV Trajectory Estimation with LiDAR Pseudo-Labeling 2025-01-19
Show

As small unmanned aerial vehicles (UAVs) become increasingly prevalent, there is growing concern regarding their impact on public safety and privacy, highlighting the need for advanced tracking and trajectory estimation solutions. In response, this paper introduces a novel framework that utilizes audio array for 3D UAV trajectory estimation. Our approach incorporates a self-supervised learning model, starting with the conversion of audio data into mel-spectrograms, which are analyzed through an encoder to extract crucial temporal and spectral information. Simultaneously, UAV trajectories are estimated using LiDAR point clouds via unsupervised methods. These LiDAR-based estimations act as pseudo labels, enabling the training of an Audio Perception Network without requiring labeled data. In this architecture, the LiDAR-based system operates as the Teacher Network, guiding the Audio Perception Network, which serves as the Student Network. Once trained, the model can independently predict 3D trajectories using only audio signals, with no need for LiDAR data or external ground truth during deployment. To further enhance precision, we apply Gaussian Process modeling for improved spatiotemporal tracking. Our method delivers top-tier performance on the MMAUD dataset, establishing a new benchmark in trajectory estimation using self-supervised learning techniques without reliance on ground truth annotations.

Accepted for ICASSP
Risk-Informed Diffusion Transformer for Long-Tail Trajectory Prediction in the Crash Scenario 2025-01-18
Show

Trajectory prediction methods have been widely applied in autonomous driving technologies. Although the overall accuracy of trajectory prediction is relatively high, the lack of trajectory data from critical scenarios in the training data leads to the long-tail phenomenon. Normally, the trajectories of the tail data are more critical and more difficult to predict and may include rare scenarios such as crashes. To solve this problem, we extracted the trajectory data from real-world crash scenarios, which contain more long-tail data. Based on the trajectory data in this scenario, we integrated graph-based risk information and diffusion with a transformer and proposed the Risk-Informed Diffusion Transformer (RI-DiT) trajectory prediction method. Extensive experiments were conducted on trajectory data in the real-world crash scenario, and the results show that the proposed algorithm performs well. When predicting the tail 10% (Top 10%) of the data, the minADE and minFDE indicators are 0.016/2.667 m. We also examined trajectories at different points of the long-tail distribution: the closer the trajectory data are to the tail, the less smooth the trajectories become. Using trajectory data from real-world crash scenarios, our work expands the methods available to overcome long-tail challenges in trajectory prediction. Our method, RI-DiT, integrates inverse time to collision (ITTC) and traffic flow features, allowing it to predict long-tail trajectories more accurately and improve the safety of autonomous driving systems.

Three-dimensional Trajectory Optimization for Quadrotor Tail-sitter UAVs: Traversing through Given Waypoints 2025-01-18
Show

Given the evolving application scenarios of current fixed-wing unmanned aerial vehicles (UAVs), it is necessary for UAVs to possess agile and rapid 3-dimensional flight capabilities. Typically, the trajectory of a tail-sitter is generated separately for vertical and level flights. This limits the tail-sitter's ability to move in a 3-dimensional airspace and makes it difficult to establish a smooth transition between vertical and level flights. In the present work, a 3-dimensional trajectory optimization method is proposed for quadrotor tail-sitters. In particular, the differential dynamics constraints are eliminated when generating the trajectory of the tail-sitter by utilizing the differential flatness method. Additionally, the temporal parameters of the trajectory are generated using the state-of-the-art trajectory generation method MINCO (minimum control). Subsequently, we convert the speed constraint on the vehicle into a soft constraint by discretizing the trajectory in time. This increases the likelihood that the control input limits are satisfied and that the trajectory is feasible. We then utilize a model predictive control (MPC) method to track the trajectories. Even when the tail-sitter's motion is restricted to a 2-dimensional horizontal plane, the solutions still outperform those of the L1 Guidance Law and Dubins paths.
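
The "soft constraint" idea above, turning a hard speed limit into a penalty on discretized trajectory samples, can be sketched as follows. Function and variable names are hypothetical; the actual MINCO-based optimizer is considerably more involved.

```python
import numpy as np

def speed_penalty(positions, dt, v_max, weight=10.0):
    """Quadratic penalty on speed-limit violations at discrete trajectory samples.

    positions: (N, 3) waypoints sampled every dt seconds.
    """
    velocities = np.diff(positions, axis=0) / dt       # finite-difference velocities
    speeds = np.linalg.norm(velocities, axis=1)
    violation = np.maximum(0.0, speeds - v_max)
    return weight * np.sum(violation ** 2)

# Example: penalize a candidate trajectory sampled at 20 Hz with a 15 m/s limit.
traj = np.cumsum(np.full((50, 3), [0.9, 0.0, 0.1]), axis=0)
cost = speed_penalty(traj, dt=0.05, v_max=15.0)
```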

Efficient and Safe Trajectory Planning for Autonomous Agricultural Vehicle Headland Turning in Cluttered Orchard Environments 2025-01-18
Show

Autonomous agricultural vehicles (AAVs), including field robots and autonomous tractors, are becoming essential in modern farming by improving efficiency and reducing labor costs. A critical task in AAV operations is headland turning between crop rows. This task is challenging in orchards with limited headland space, irregular boundaries, operational constraints, and static obstacles. While traditional trajectory planning methods work well in arable farming, they often fail in cluttered orchard environments. This letter presents a novel trajectory planner that enhances the safety and efficiency of AAV headland maneuvers, leveraging advancements in autonomous driving. Our approach includes an efficient front-end algorithm and a high-performance back-end optimization. Applied to vehicles with various implements, it outperforms state-of-the-art methods in both standard and challenging orchard fields. This work bridges agricultural and autonomous driving technologies, facilitating a broader adoption of AAVs in complex orchards.

On Learning Informative Trajectory Embeddings for Imitation, Classification and Regression 2025-01-17
Show

In real-world sequential decision making tasks like autonomous driving, robotics, and healthcare, learning from observed state-action trajectories is critical for tasks like imitation, classification, and clustering. For example, self-driving cars must replicate human driving behaviors, while robots and healthcare systems benefit from modeling decision sequences, whether or not they come from expert data. Existing trajectory encoding methods often focus on specific tasks or rely on reward signals, limiting their ability to generalize across domains and tasks. Inspired by the success of embedding models like CLIP and BERT in static domains, we propose a novel method for embedding state-action trajectories into a latent space that captures the skills and competencies in the dynamic underlying decision-making processes. This method operates without the need for reward labels, enabling better generalization across diverse domains and tasks. Our contributions are threefold: (1) We introduce a trajectory embedding approach that captures multiple abilities from state-action data. (2) The learned embeddings exhibit strong representational power across downstream tasks, including imitation, classification, clustering, and regression. (3) The embeddings demonstrate unique properties, such as controlling agent behaviors in IQ-Learn and an additive structure in the latent space. Experimental results confirm that our method outperforms traditional approaches, offering more flexible and powerful trajectory representations for various applications. Our code is available at https://github.com/Erasmo1015/vte.

AAMAS 2025
STPOTR: Simultaneous Human Trajectory and Pose Prediction Using a Non-Autoregressive Transformer for Robot Following Ahead 2025-01-17
Show

In this paper, we develop a neural network model to predict future human motion from an observed human motion history. We propose a non-autoregressive transformer architecture to leverage its parallel nature for easier training and fast, accurate predictions at test time. The proposed architecture divides human motion prediction into two parts: 1) the human trajectory, which is the 3D position of the hip joint over time, and 2) the human pose, which comprises the 3D positions of all other joints over time with respect to a fixed hip joint. We propose to make the two predictions simultaneously, as the shared representation can improve the model performance. Therefore, the model consists of two sets of encoders and decoders. First, a multi-head attention module applied to encoder outputs improves the human trajectory prediction. Second, another multi-head self-attention module applied to encoder outputs concatenated with decoder outputs facilitates learning of temporal dependencies. Our model is well-suited for robotic applications in terms of test accuracy and speed, and compares favorably with respect to state-of-the-art methods. We demonstrate the real-world applicability of our work via the Robot Follow-Ahead task, a challenging yet practical case study for our proposed model.

BILTS: A Bi-Invariant Similarity Measure for Robust Object Trajectory Recognition under Reference Frame Variations 2025-01-17
Show

When similar object motions are performed in diverse contexts but are meant to be recognized under a single classification, these contextual variations act as disturbances that negatively affect accurate motion recognition. In this paper, we focus on contextual variations caused by reference frame variations. To robustly deal with these variations, similarity measures have been introduced that compare object motion trajectories in a context-invariant manner. However, most are highly sensitive to noise near singularities, where the measure is not uniquely defined, and lack bi-invariance (invariance to both world and body frame variations). To address these issues, we propose the novel \textit{Bi-Invariant Local Trajectory-Shape Similarity} (BILTS) measure. Compared to other measures, the BILTS measure uniquely offers bi-invariance, boundedness, and third-order shape identity. Aimed at practical implementations, we devised a discretized and regularized version of the BILTS measure which shows exceptional robustness to singularities. This is demonstrated through rigorous recognition experiments using multiple datasets. On average, BILTS attained the highest recognition ratio and least sensitivity to contextual variations compared to other invariant object motion similarity measures. We believe that the BILTS measure is a valuable tool for recognizing motions performed in diverse contexts and has potential in other applications, including the recognition, segmentation, and adaptation of both motion and force trajectories.

This work has been submitted as a regular research paper for consideration in the Journal of Intelligent & Robotic Systems. The content in this preprint is identical to the version submitted for peer review, except for formatting differences required by the journal

ASTRA: A Scene-aware TRAnsformer-based model for trajectory prediction 2025-01-16
Show

We present ASTRA (A Scene-aware TRAnsformer-based model for trajectory prediction), a light-weight pedestrian trajectory forecasting model that integrates the scene context, spatial dynamics, social inter-agent interactions and temporal progressions for precise forecasting. We utilised a U-Net-based feature extractor, via its latent vector representation, to capture scene representations and a graph-aware transformer encoder for capturing social interactions. These components are integrated to learn an agent-scene aware embedding, enabling the model to learn spatial dynamics and forecast the future trajectory of pedestrians. The model is designed to produce both deterministic and stochastic outcomes, with the stochastic predictions being generated by incorporating a Conditional Variational Auto-Encoder (CVAE). ASTRA also proposes a simple yet effective weighted penalty loss function, which helps to yield predictions that outperform a wide array of state-of-the-art deterministic and generative models. ASTRA demonstrates an average improvement of 27%/10% in deterministic/stochastic settings on the ETH-UCY dataset, and 26% improvement on the PIE dataset, respectively, along with seven times fewer parameters than the existing state-of-the-art model (see Figure 1). Additionally, the model's versatility allows it to generalize across different perspectives, such as Bird's Eye View (BEV) and Ego-Vehicle View (EVV).

Real-Time Generation of Near-Minimum-Energy Trajectories via Constraint-Informed Residual Learning 2025-01-16
Show

Industrial robotics demands significant energy to operate, making energy-reduction methodologies increasingly important. Strategies for planning minimum-energy trajectories typically involve solving nonlinear optimal control problems (OCPs), which rarely cope with real-time requirements. In this paper, we propose a paradigm for generating near minimum-energy trajectories for manipulators by learning from optimal solutions. Our paradigm leverages a residual learning approach, which embeds boundary conditions while focusing on learning only the adjustments needed to steer a standard solution to an optimal one. Compared to a computationally expensive OCP-based planner, our paradigm achieves 87.3% of the performance near the training dataset and 50.8% far from the dataset, while being two to three orders of magnitude faster.

Control Barrier Function-Based Safety Filters: Characterization of Undesired Equilibria, Unbounded Trajectories, and Limit Cycles 2025-01-16
Show

This paper focuses on safety filters designed based on Control Barrier Functions (CBFs): these are modifications of a nominal stabilizing controller typically utilized in safety-critical control applications to render a given subset of states forward invariant. The paper investigates the dynamical properties of the closed-loop systems, with a focus on characterizing undesirable behaviors that may emerge due to the use of CBF-based filters. These undesirable behaviors include unbounded trajectories, limit cycles, and undesired equilibria, which can be locally stable and even form a continuum. Our analysis offers the following contributions: (i) conditions under which trajectories remain bounded; (ii) conditions under which limit cycles do not exist; (iii) we show that undesired equilibria can be characterized by solving an algebraic equation; and (iv) we provide examples that show that asymptotically stable undesired equilibria can exist for a large class of nominal controllers and design parameters of the safety filter (even for convex safe sets). Further, for the specific class of planar systems, (v) we provide explicit formulas for the total number of undesired equilibria and the proportion of saddle points and asymptotically stable equilibria, and (vi) in the case of linear planar systems, we present an exhaustive analysis of their global stability properties. Examples throughout the paper illustrate the results.
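
For a single affine CBF constraint, the safety filter discussed above reduces to a small QP whose minimizer has a closed form: project the nominal input onto the half-space $L_f h + L_g h\,u \ge -\alpha h$. Below is a minimal single-constraint sketch with an illustrative single-integrator example, not tied to the paper's systems.

```python
import numpy as np

def cbf_filter(u_nom, Lf_h, Lg_h, h, alpha=1.0):
    """Minimally modify u_nom so that Lf_h + Lg_h @ u >= -alpha * h.

    Closed-form solution of min ||u - u_nom||^2 subject to one affine
    constraint (the usual CBF-QP with a single barrier)."""
    residual = Lf_h + Lg_h @ u_nom + alpha * h
    if residual >= 0.0:                       # nominal input is already safe
        return u_nom
    # Project onto the constraint boundary along Lg_h.
    return u_nom - residual * Lg_h / (Lg_h @ Lg_h)

# Single-integrator example: keep x inside the disk h(x) = 1 - ||x||^2 >= 0.
x = np.array([0.9, 0.3])
h = 1.0 - x @ x
Lf_h = 0.0                                    # drift-free dynamics x_dot = u
Lg_h = -2.0 * x                               # gradient of h times g(x) = I
u_safe = cbf_filter(np.array([1.0, 0.0]), Lf_h, Lg_h, h)
```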

Estimation-Aware Trajectory Optimization with Set-Valued Measurement Uncertainties 2025-01-15
Show

In this paper, we present an optimization-based framework for generating estimation-aware trajectories in scenarios where measurement (output) uncertainties are state-dependent and set-valued. The framework leverages the concept of regularity for set-valued output maps. Specifically, we demonstrate that, for output-regular maps, one can utilize a set-valued observability measure that is concave with respect to finite-horizon state trajectories. By maximizing this measure, optimized estimation-aware trajectories can be designed for a broad class of systems, including those with locally linearized dynamics. To illustrate the effectiveness of the proposed approach, we provide a representative example in the context of trajectory planning for vision-based estimation. We present an estimation-aware trajectory for an uncooperative target-tracking problem that uses a machine learning (ML)-based estimation module on an ego-satellite.

25 pages, 5 figures
MGF: Mixed Gaussian Flow for Diverse Trajectory Prediction 2025-01-15
Show

To predict future trajectories, a normalizing flow with a standard Gaussian prior suffers from weak diversity. The ineffectiveness stems from the conflict between the asymmetric, multi-modal distribution of likely outcomes and the symmetric, single-modal prior distribution and supervision losses. Instead, we propose constructing a mixed Gaussian prior for a normalizing flow model for trajectory prediction. The prior is constructed by analyzing the trajectory patterns in the training samples without requiring extra annotations, and it offers better expressiveness by being multi-modal and asymmetric. Besides diversity, it also provides better controllability for probabilistic trajectory generation. We name our method Mixed Gaussian Flow (MGF). It achieves state-of-the-art performance in the evaluation of both trajectory alignment and diversity on the popular UCY/ETH and SDD datasets. Code is available at https://github.com/mulplue/MGF.

Accepted by Neurips 2024. Code: https://github.com/mulplue/MGF
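
The mixed Gaussian prior idea can be pictured as fitting a Gaussian mixture to training trajectory patterns and sampling latents from it instead of a standard normal. A hedged sketch using scikit-learn on synthetic data; the released MGF code builds the mixture differently and couples it to a flow model.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Latent "trajectory pattern" vectors from the training set (synthetic here),
# e.g. flattened future displacements of observed agents.
rng = np.random.default_rng(0)
patterns = np.concatenate([
    rng.normal(loc=[+2.0, 0.5], scale=0.3, size=(500, 2)),   # e.g. "veer right"
    rng.normal(loc=[-2.0, 0.5], scale=0.3, size=(500, 2)),   # e.g. "veer left"
])

# Multi-modal, possibly asymmetric prior instead of a single standard Gaussian.
prior = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
prior.fit(patterns)

# Draw diverse latent samples; a flow model would then map them to trajectories.
z_samples, component_ids = prior.sample(n_samples=20)
```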

Predicting 4D Hand Trajectory from Monocular Videos 2025-01-14
Show

We present HaPTIC, an approach that infers coherent 4D hand trajectories from monocular videos. Current video-based hand pose reconstruction methods primarily focus on improving frame-wise 3D pose using adjacent frames rather than studying consistent 4D hand trajectories in space. Despite the additional temporal cues, they generally underperform compared to image-based methods due to the scarcity of annotated video data. To address these issues, we repurpose a state-of-the-art image-based transformer to take in multiple frames and directly predict a coherent trajectory. We introduce two types of lightweight attention layers: cross-view self-attention to fuse temporal information, and global cross-attention to bring in larger spatial context. Our method infers 4D hand trajectories similar to the ground truth while maintaining strong 2D reprojection alignment. We apply the method to both egocentric and allocentric videos. It significantly outperforms existing methods in global trajectory accuracy while being comparable to the state-of-the-art in single-image pose estimation. Project website: https://judyye.github.io/haptic-www

Pedestrian Trajectory Prediction Based on Social Interactions Learning With Random Weights 2025-01-13
Show

Pedestrian trajectory prediction is a critical technology in the evolution of self-driving cars toward complete artificial intelligence. Over recent years, focusing on the trajectories of pedestrians to model their social interactions has surged with great interest in more accurate trajectory predictions. However, existing methods for modeling pedestrian social interactions rely on pre-defined rules, struggling to capture non-explicit social interactions. In this work, we propose a novel framework named DTGAN, which extends the application of Generative Adversarial Networks (GANs) to graph sequence data, with the primary objective of automatically capturing implicit social interactions and achieving precise predictions of pedestrian trajectory. DTGAN innovatively incorporates random weights within each graph to eliminate the need for pre-defined interaction rules. We further enhance the performance of DTGAN by exploring diverse task loss functions during adversarial training, which yields improvements of 16.7% and 39.3% on metrics ADE and FDE, respectively. The effectiveness and accuracy of our framework are verified on two public datasets. The experimental results show that our proposed DTGAN achieves superior performance and is well able to understand pedestrians' intentions.

13 pages,7 figures,Accepted to IEEE Transactions on Multimedia (TMM)

Computing Safety Margins of Parameterized Nonlinear Systems for Vulnerability Assessment via Trajectory Sensitivities 2025-01-13
Show

Physical systems experience nonlinear disturbances which have the potential to disrupt desired behavior. For a particular disturbance, whether or not the system recovers from the disturbance to a desired stable equilibrium point depends on system parameter values, which are typically uncertain and time-varying. Therefore, to quantify proximity to vulnerability we define the safety margin to be the smallest change in parameter values from a nominal value such that the system will no longer be able to recover from the disturbance. Safety margins are valuable but challenging to compute as related methods, such as those for robust region of attraction estimation, are often either overly conservative or computationally intractable for high dimensional systems. Recently, we developed algorithms to compute safety margins efficiently and non-conservatively by exploiting the large sensitivity of the system trajectory near the region of attraction boundary to small perturbations. Although these algorithms have enjoyed empirical success, they lack theoretical guarantees that would ensure their generalizability. This work develops a novel characterization of safety margins in terms of trajectory sensitivities, and uses this to derive well-posedness and convergence guarantees for these algorithms, enabling their generalizability and successful application to a large class of nonlinear systems.

16 pages
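
Operationally, a safety margin like the one defined above can be approximated by simulating the post-disturbance trajectory and bisecting on the parameter perturbation until recovery is lost. The toy scalar example below is only for intuition; the paper instead exploits trajectory sensitivities to avoid this brute-force search.

```python
import numpy as np
from scipy.integrate import solve_ivp

def recovers(delta, x0=1.5, a_nominal=2.0, t_final=30.0):
    """Simulate x_dot = -sin(pi * x / a), with a = a_nominal - delta, from the
    disturbed state x0 and check whether it settles back to the equilibrium 0."""
    a = a_nominal - delta
    sol = solve_ivp(lambda t, x: -np.sin(np.pi * x / a),
                    (0.0, t_final), [x0], rtol=1e-8)
    return abs(sol.y[0, -1]) < 1e-2

# Bisect on the parameter change: the margin is where recovery is first lost.
lo, hi = 0.0, 1.5          # recovers(0.0) holds, recovers(1.5) fails in this toy
for _ in range(25):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if recovers(mid) else (lo, mid)
safety_margin = 0.5 * (lo + hi)   # close to 0.5 for this example
```
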
Global Search for Optimal Low Thrust Spacecraft Trajectories using Diffusion Models and the Indirect Method 2025-01-13
Show

Long time-duration low-thrust nonlinear optimal spacecraft trajectory global search is a computationally and time expensive problem characterized by clustering patterns in locally optimal solutions. During preliminary mission design, mission parameters are subject to frequent changes, necessitating that trajectory designers efficiently generate high-quality control solutions for these new scenarios. Generative machine learning models can be trained to learn how the solution structure varies with respect to a conditional parameter, thereby accelerating the global search for missions with updated parameters. In this work, state-of-the-art diffusion models are integrated with the indirect approach for trajectory optimization within a global search framework. This framework is tested on two low-thrust transfers of different complexity in the circular restricted three-body problem. By generating and analyzing a training data set, we develop mathematical relations and techniques to understand the complex structures in the costate domain of locally optimal solutions for these problems. A diffusion model is trained on this data and successfully accelerates the global search for both problems. The model predicts how the costate solution structure changes, based on the maximum spacecraft thrust magnitude. Warm-starting a numerical solver with diffusion model samples for the costates at the initial time increases the number of solutions generated per minute for problems with unseen thrust magnitudes by one to two orders of magnitude in comparison to samples from a uniform distribution and from an adjoint control transformation.

Efficient Estimation of Relaxed Model Parameters for Robust UAV Trajectory Optimization 2025-01-13
Show

Online trajectory optimization and optimal control methods are crucial for enabling sustainable unmanned aerial vehicle (UAV) services, such as agriculture, environmental monitoring, and transportation, where available actuation and energy are limited. However, optimal controllers are highly sensitive to model mismatch, which can occur due to loaded equipment, packages to be delivered, or pre-existing variability in fundamental structural and thrust-related parameters. To circumvent this problem, optimal controllers can be paired with parameter estimators to improve their trajectory planning performance and perform adaptive control. However, UAV platforms are limited in terms of onboard processing power, oftentimes making nonlinear parameter estimation too computationally expensive to consider. To address these issues, we propose a relaxed, affine-in-parameters multirotor model along with an efficient optimal parameter estimator. We convexify the nominal Moving Horizon Parameter Estimation (MHPE) problem into a linear-quadratic form (LQ-MHPE) via an affine-in-parameter relaxation on the nonlinear dynamics, resulting in fast quadratic programs (QPs) that facilitate adaptive Model Predictive Control (MPC) in real time. We compare this approach to the equivalent nonlinear estimator in Monte Carlo simulations, demonstrating a decrease in average solve time and trajectory optimality cost by 98.2% and 23.9-56.2%, respectively.

8 pages, 5 figures, to be published in IEEE Sustech 2025
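
The core convenience of an affine-in-parameters model is that, over a sliding window and without constraints, the estimation step reduces to linear least squares. A minimal sketch with hypothetical names and a toy mass/drag model; the paper solves a constrained QP (LQ-MHPE) rather than plain least squares.

```python
import numpy as np
from collections import deque

class MovingHorizonLeastSquares:
    """Estimate theta in y ≈ Phi(x, u) @ theta from the last N samples."""

    def __init__(self, horizon=20):
        self.buffer = deque(maxlen=horizon)

    def update(self, Phi_k, y_k):
        """Add one regressor/measurement pair and re-solve over the window."""
        self.buffer.append((Phi_k, y_k))
        Phi = np.vstack([p for p, _ in self.buffer])
        y = np.concatenate([m for _, m in self.buffer])
        theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        return theta

# Toy usage: identify [1/m, d/m] in  accel = (1/m) * thrust - (d/m) * vel.
rng = np.random.default_rng(0)
est = MovingHorizonLeastSquares(horizon=50)
true_theta = np.array([1.0 / 1.2, 0.3 / 1.2])
for _ in range(100):
    thrust, vel = rng.uniform(0, 10), rng.uniform(-5, 5)
    accel = true_theta @ np.array([thrust, -vel]) + rng.normal(scale=0.01)
    theta_hat = est.update(np.array([[thrust, -vel]]), np.array([accel]))
```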

Snapshot: Towards Application-centered Models for Pedestrian Trajectory Prediction in Urban Traffic Environments 2025-01-09
Show

This paper explores pedestrian trajectory prediction in urban traffic while focusing on both model accuracy and real-world applicability. While promising approaches exist, they often revolve around pedestrian datasets excluding traffic-related information, or resemble architectures that are either not real-time capable or robust. To address these limitations, we first introduce a dedicated benchmark based on Argoverse 2, specifically targeting pedestrians in traffic environments. Following this, we present Snapshot, a modular, feed-forward neural network that outperforms the current state of the art, reducing the Average Displacement Error (ADE) by 8.8% while utilizing significantly less information. Despite its agent-centric encoding scheme, Snapshot demonstrates scalability, real-time performance, and robustness to varying motion histories. Moreover, by integrating Snapshot into a modular autonomous driving software stack, we showcase its real-world applicability.

8 Pages, 9 Figures
Pitch Plane Trajectory Tracking Control for Sounding Rockets via Adaptive Feedback Linearization 2025-01-09
Show

This paper proposes a pitch plane trajectory tracking control solution for suborbital launch vehicles relying on adaptive feedback linearization. Initially, the 2D dynamics and kinematics for a single-engine, thrust-vector-controlled sounding rocket are obtained for control design purposes. Then, an inner-outer control strategy, which simultaneously tackles attitude and position control, is adopted, with the inner loop comprising the altitude and pitch control and the outer loop addressing the horizontal (downrange) position control. Feedback linearization is used to cancel out the non-linearities in both the inner and outer dynamics. Making use of Lyapunov stability theory, an adaptation law, which provides online estimates of the inner-loop aerodynamic uncertainty, is jointly designed with the output tracking controller via adaptive backstepping, ensuring global reference tracking in the region where the feedback linearization is well-defined. The zero dynamics of the inner-stabilized system are then exploited to obtain the outer-loop dynamics and derive a Linear Quadratic Regulator (LQR) with integral action, which can stabilize them as well as reject external disturbances. In the outermost loop, the estimate of the corresponding aerodynamic uncertainty is indirectly obtained by using the inner-loop estimates together with known aerodynamic relations. The resulting inner-outer position control solution is proven to be asymptotically stable in the region of interest. Using a single-stage sounding rocket, propelled by a liquid engine, as the reference vehicle, different mission scenarios are tested in a simulation environment to verify the adaptability of the proposed control strategy. The system is able to track the requested trajectories while rejecting external wind disturbances. Furthermore, the need to re-tune the control gains between different mission scenarios is minimal to none.

Paper accepted to the IEEE Aerospace Conference 2025. Copyright: 979-8-3503-5597-0/25/$31.00 @2025 IEEE

Towards Generalizable Trajectory Prediction Using Dual-Level Representation Learning And Adaptive Prompting 2025-01-08
Show

Existing vehicle trajectory prediction models struggle with generalizability, prediction uncertainties, and handling complex interactions. This is often due to limitations like complex architectures customized for a specific dataset and inefficient multimodal handling. We propose Perceiver with Register queries (PerReg+), a novel trajectory prediction framework that introduces: (1) Dual-Level Representation Learning via Self-Distillation (SD) and Masked Reconstruction (MR), capturing global context and fine-grained details. Additionally, our approach of reconstructing segment-level trajectories and lane segments from masked inputs with query drop enables effective use of contextual information and improves generalization; (2) Enhanced Multimodality using register-based queries and pretraining, eliminating the need for clustering and suppression; and (3) Adaptive Prompt Tuning during fine-tuning, freezing the main architecture and optimizing a small number of prompts for efficient adaptation. PerReg+ sets a new state-of-the-art performance on nuScenes [1], Argoverse 2 [2], and Waymo Open Motion Dataset (WOMD) [3]. Remarkably, our pretrained model reduces the error by 6.8% on smaller datasets, and multi-dataset training enhances generalization. In cross-domain tests, PerReg+ reduces B-FDE by 11.8% compared to its non-pretrained variant.

Development of an Adaptive Sliding Mode Controller using Neural Networks for Trajectory Tracking of a Cylindrical Manipulator 2025-01-08
Show

Cylindrical manipulators are extensively used in industrial automation, especially in emerging technologies like 3D printing, which represents a significant future trend. However, controlling the trajectory of nonlinear models with system uncertainties remains a critical challenge, often leading to reduced accuracy and reliability. To address this, the study develops an Adaptive Sliding Mode Controller (ASMC) integrated with Neural Networks (NNs) to improve trajectory tracking for cylindrical manipulators. The ASMC leverages the robustness of sliding mode control and the adaptability of neural networks to handle uncertainties and dynamic variations effectively. Simulation results validate that the proposed ASMC-NN achieves high trajectory tracking accuracy, fast response time, and enhanced reliability, making it a promising solution for applications in 3D printing and beyond.

Task Coordination and Trajectory Optimization for Multi-Aerial Systems via Signal Temporal Logic: A Wind Turbine Inspection Study 2025-01-08
Show

This paper presents a method for task allocation and trajectory generation in cooperative inspection missions using a fleet of multirotor drones, with a focus on wind turbine inspection. The approach generates safe, feasible flight paths that adhere to time-sensitive constraints and vehicle limitations by formulating an optimization problem based on Signal Temporal Logic (STL) specifications. An event-triggered replanning mechanism addresses unexpected events and delays, while a generalized robustness scoring method incorporates user preferences and minimizes task conflicts. The approach is validated through simulations in MATLAB and Gazebo, as well as field experiments in a mock-up scenario.

2 pages, Accepted for discussion at the workshop session "Formal methods techniques in robotics systems: Design and control" at IROS'24 in Abu Dhabi, UAE

Motion Manifold Flow Primitives for Task-Conditioned Trajectory Generation under Complex Task-Motion Dependencies 2025-01-08
Show

Effective movement primitives should be capable of encoding and generating a rich repertoire of trajectories -- typically collected from human demonstrations -- conditioned on task-defining parameters such as vision or language inputs. While recent methods based on the motion manifold hypothesis, which assumes that a set of trajectories lies on a lower-dimensional nonlinear subspace, address challenges such as limited dataset size and the high dimensionality of trajectory data, they often struggle to capture complex task-motion dependencies, i.e., when motion distributions shift drastically with task variations. To address this, we introduce Motion Manifold Flow Primitives (MMFP), a framework that decouples the training of the motion manifold from task-conditioned distributions. Specifically, we employ flow matching models, state-of-the-art conditional deep generative models, to learn task-conditioned distributions in the latent coordinate space of the learned motion manifold. Experiments are conducted on language-guided trajectory generation tasks, where many-to-many text-motion correspondences introduce complex task-motion dependencies, highlighting MMFP's superiority over existing methods.

8 pages, 11 figures
Future Success Prediction in Open-Vocabulary Object Manipulation Tasks Based on End-Effector Trajectories 2025-01-08
Show

This study addresses a task designed to predict the future success or failure of open-vocabulary object manipulation. In this task, the model is required to make predictions based on natural language instructions, egocentric view images before manipulation, and the given end-effector trajectories. Conventional methods typically perform success prediction only after the manipulation is executed, limiting their efficiency in executing the entire task sequence. We propose a novel approach that enables the prediction of success or failure by aligning the given trajectories and images with natural language instructions. We introduce Trajectory Encoder to apply learnable weighting to the input trajectories, allowing the model to consider temporal dynamics and interactions between objects and the end effector, improving the model's ability to predict manipulation outcomes accurately. We constructed a dataset based on the RT-1 dataset, a large-scale benchmark for open-vocabulary object manipulation tasks, to evaluate our method. The experimental results show that our method achieved a higher prediction accuracy than baseline approaches.

Accepted for presentation at LangRob @ CoRL 2024

Frenet-Serret-Based Trajectory Prediction 2025-01-08
Show

Trajectory prediction is a crucial element of guidance, navigation, and control systems. This paper presents two novel trajectory-prediction methods based on real-time position measurements and adaptive input and state estimation (AISE). The first method, called AISE/va, uses position measurements to estimate the target velocity and acceleration. The second method, called AISE/FS, models the target trajectory as a 3D curve using the Frenet-Serret formulas, which require estimates of velocity, acceleration, and jerk. To estimate velocity, acceleration, and jerk in real time, AISE computes first, second, and third derivatives of the position measurements. AISE does not rely on assumptions about the target maneuver, measurement noise, or disturbances. For trajectory prediction, both methods use measurements of the target position and estimates of its derivatives to extrapolate from the current position. The performance of AISE/va and AISE/FS is compared numerically with the $\alpha$-$\beta$-$\gamma$ filter, which shows that AISE/FS provides more accurate trajectory prediction than AISE/va and traditional methods, especially for complex target maneuvers.

8 pages, 6 figures. Submitted to ACC 2025
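
The simpler of the two predictors above (AISE/va) extrapolates from estimated velocity and acceleration. The toy finite-difference stand-in below only illustrates the extrapolation step; it is not AISE itself, which estimates the derivatives adaptively and noise-robustly.

```python
import numpy as np

def extrapolate_const_accel(positions, dt, horizon_steps):
    """Predict future positions from finite-difference velocity and acceleration.

    positions: (T, 3) recent position measurements sampled every dt seconds.
    """
    v = (positions[-1] - positions[-2]) / dt
    a = (positions[-1] - 2 * positions[-2] + positions[-3]) / dt**2
    taus = dt * np.arange(1, horizon_steps + 1)[:, None]
    return positions[-1] + v * taus + 0.5 * a * taus**2

# Toy target: constant-acceleration motion sampled at 10 Hz, predicted 2 s ahead.
dt = 0.1
t = np.arange(0.0, 3.0, dt)[:, None]
meas = np.hstack([5.0 * t, 2.0 * t**2, np.zeros_like(t)])
pred = extrapolate_const_accel(meas, dt, horizon_steps=20)
```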

ImageFlowNet: Forecasting Multiscale Image-Level Trajectories of Disease Progression with Irregularly-Sampled Longitudinal Medical Images 2025-01-07
Show

Advances in medical imaging technologies have enabled the collection of longitudinal images, which involve repeated scanning of the same patients over time, to monitor disease progression. However, predictive modeling of such data remains challenging due to high dimensionality, irregular sampling, and data sparsity. To address these issues, we propose ImageFlowNet, a novel model designed to forecast disease trajectories from initial images while preserving spatial details. ImageFlowNet first learns multiscale joint representation spaces across patients and time points, then optimizes deterministic or stochastic flow fields within these spaces using a position-parameterized neural ODE/SDE framework. The model leverages a UNet architecture to create robust multiscale representations and mitigates data scarcity by combining knowledge from all patients. We provide theoretical insights that support our formulation of ODEs, and motivate our regularizations involving high-level visual features, latent space organization, and trajectory smoothness. We validate ImageFlowNet on three longitudinal medical image datasets depicting progression in geographic atrophy, multiple sclerosis, and glioblastoma, demonstrating its ability to effectively forecast disease progression and outperform existing methods. Our contributions include the development of ImageFlowNet, its theoretical underpinnings, and empirical validation on real-world datasets. The official implementation is available at https://github.com/KrishnaswamyLab/ImageFlowNet.

Accepted to ICASSP 2025

Hierarchical Light Transformer Ensembles for Multimodal Trajectory Forecasting 2025-01-07
Show

Accurate trajectory forecasting is crucial for the performance of various systems, such as advanced driver-assistance systems and self-driving vehicles. These forecasts allow us to anticipate events that lead to collisions and, therefore, to mitigate them. Deep Neural Networks have excelled in motion forecasting, but overconfidence and weak uncertainty quantification persist. Deep Ensembles address these concerns, yet applying them to multimodal distributions remains challenging. In this paper, we propose a novel approach named Hierarchical Light Transformer Ensembles (HLT-Ens) aimed at efficiently training an ensemble of Transformer architectures using a novel hierarchical loss function. HLT-Ens leverages grouped fully connected layers, inspired by grouped convolution techniques, to capture multimodal distributions effectively. We demonstrate that HLT-Ens achieves state-of-the-art performance levels through extensive experimentation, offering a promising avenue for improving trajectory forecasting techniques.

WACV 2025
Hybrid Machine Learning Model with a Constrained Action Space for Trajectory Prediction 2025-01-07
Show

Trajectory prediction is crucial to advance autonomous driving, improving safety, and efficiency. Although end-to-end models based on deep learning have great potential, they often do not consider vehicle dynamic limitations, leading to unrealistic predictions. To address this problem, this work introduces a novel hybrid model that combines deep learning with a kinematic motion model. It is able to predict object attributes such as acceleration and yaw rate and generate trajectories based on them. A key contribution is the incorporation of expert knowledge into the learning objective of the deep learning model. This results in the constraint of the available action space, thus enabling the prediction of physically feasible object attributes and trajectories, thereby increasing safety and robustness. The proposed hybrid model facilitates enhanced interpretability, thereby reinforcing the trustworthiness of deep learning methods and promoting the development of safe planning solutions. Experiments conducted on the publicly available real-world Argoverse dataset demonstrate realistic driving behaviour, with benchmark comparisons and ablation studies showing promising results.

Submitted to 2025 IEEE Intelligent Vehicles Symposium (IV)
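
The hybrid idea above, letting the network output acceleration and yaw rate and rolling them through a kinematic model, can be sketched with a simple unicycle integration. The function name and model details are illustrative assumptions; the paper's kinematic model and constraints may differ.

```python
import numpy as np

def rollout_kinematic(x0, y0, heading0, speed0, accels, yaw_rates, dt=0.1):
    """Integrate predicted accelerations and yaw rates through a unicycle model,
    so every generated trajectory is dynamically plausible by construction."""
    x, y, heading, speed = x0, y0, heading0, speed0
    trajectory = []
    for a, r in zip(accels, yaw_rates):
        speed = max(0.0, speed + a * dt)         # no reversing in this toy model
        heading += r * dt
        x += speed * np.cos(heading) * dt
        y += speed * np.sin(heading) * dt
        trajectory.append((x, y))
    return np.array(trajectory)

# Example: 3 s horizon at 10 Hz with mild braking and a gentle left turn.
traj = rollout_kinematic(0.0, 0.0, 0.0, 10.0,
                         accels=np.full(30, -0.5),
                         yaw_rates=np.full(30, 0.1))
```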

Collision Risk Quantification and Conflict Resolution in Trajectory Tracking for Acceleration-Actuated Multi-Robot Systems 2025-01-07
Show

One of the pivotal challenges in a multi-robot system is how to maintain accuracy and efficiency while ensuring safety. Prior approaches either cannot strictly guarantee collision-free operation for an arbitrarily large number of robots or yield considerably conservative results. The smoothness of the avoidance trajectory also needs to be further optimized. This paper proposes an acceleration-actuated simultaneous obstacle avoidance and trajectory tracking method for arbitrarily large teams of robots, which provides a non-conservative collision avoidance strategy and gives approaches for deadlock avoidance. We propose two ways of deadlock resolution; one involves incorporating an auxiliary velocity vector into the error function of the trajectory tracking module, which is proven to have no influence on the global convergence of the tracking error. Furthermore, unlike traditional methods that address conflicts only after a deadlock occurs, our decision-making mechanism avoids near-zero velocities, which is much safer and more efficient in crowded environments. Extensive comparisons show that the proposed method is superior to existing studies when deployed in a large-scale robot system, with minimal invasiveness.

Modeling Cell Type Developmental Trajectory using Multinomial Unbalanced Optimal Transport 2025-01-07
Show

Single-cell trajectory analysis aims to reconstruct the biological developmental processes of cells as they evolve over time, leveraging temporal correlations in gene expression. During cellular development, gene expression patterns typically change and vary across different cell types. A significant challenge in this analysis is that RNA sequencing destroys the cell, making it impossible to track gene expression across multiple stages for the same cell. Recent advances have introduced the use of optimal transport tools to model the trajectory of individual cells. In this paper, our focus shifts to a question of greater practical importance: we examine the differentiation of cell types over time. Specifically, we propose a novel method based on discrete unbalanced optimal transport to model the developmental trajectory of cell types. Our method detects biological changes in cell types and infers their transitions to different states by analyzing the transport matrix. We validated our method using single-cell RNA sequencing data from mouse embryonic fibroblasts. The results accurately identified major developmental changes in cell types, which were corroborated by experimental evidence. Furthermore, the inferred transition probabilities between cell types are highly congruent to biological ground truth.
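
The discrete unbalanced optimal transport step described above can be prototyped with the POT library, assuming its entropic unbalanced solver `ot.sinkhorn_unbalanced` is available; the masses, cost matrix, and regularization values below are toy placeholders rather than anything from the paper.

```python
import numpy as np
import ot  # Python Optimal Transport (POT); assumed installed

# Toy cell-type proportions at two time points; totals need not match,
# which is exactly why the *unbalanced* formulation is used.
a = np.array([0.5, 0.3, 0.2])             # source cell-type masses
b = np.array([0.2, 0.2, 0.4, 0.2])        # target cell-type masses
M = np.random.rand(3, 4)                  # stand-in cost between type centroids

# Entropic unbalanced OT; each row of T hints at how a source type redistributes.
T = ot.sinkhorn_unbalanced(a, b, M, reg=0.05, reg_m=1.0)
print(T / T.sum(axis=1, keepdims=True))   # row-normalized transition estimates
```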

Physics-Constrained Generative Artificial Intelligence for Rapid Takeoff Trajectory Design 2025-01-07
Show

Electric vertical takeoff and landing (eVTOL) aircraft are being pursued to aid urban air mobility (UAM). Conventional multidisciplinary analysis and optimization (MDAO) can be expensive, while surrogate-based optimization can struggle with challenging physical constraints. This work proposes physics-constrained generative adversarial networks (physicsGAN) to intelligently parameterize the takeoff control profiles of an eVTOL aircraft and to transform the original design space to a feasible space. Specifically, the transformed feasible space refers to a space where all designs directly satisfy all design constraints. The physicsGAN-enabled surrogate-based takeoff trajectory design framework was demonstrated on the Airbus A3 Vahana. The physicsGAN generated only feasible control profiles of power and wing angle in the feasible space with around 98.9% of designs satisfying all constraints. The proposed design framework obtained 99.6% accuracy compared with simulation-based optimal design and took only 2.2 seconds, which reduced the computational time by around 200 times. Meanwhile, data-driven GAN-enabled surrogate-based optimization took 21.9 seconds using a derivative-free optimizer, which was around an order of magnitude slower than the proposed framework. Moreover, the data-driven GAN-based optimization using gradient-based optimizers could not consistently find the optimal design during random trials and got stuck in an infeasible region, which is problematic in real practice. Therefore, the proposed physicsGAN-based design framework outperformed data-driven GAN-based design in terms of efficiency (2.2 seconds), optimality (99.6% accurate), and feasibility (100% feasible). According to the literature review, this is the first physics-constrained generative artificial intelligence enabled by surrogate models.

Conference version with 10 pages and 7 figures

Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation 2025-01-06
Show

We consider the task of Image-to-Video (I2V) generation, which involves transforming static images into realistic video sequences based on a textual description. While recent advancements produce photorealistic outputs, they frequently struggle to create videos with accurate and consistent object motion, especially in multi-object scenarios. To address these limitations, we propose a two-stage compositional framework that decomposes I2V generation into: (i) an explicit intermediate representation generation stage, followed by (ii) a video generation stage that is conditioned on this representation. Our key innovation is the introduction of a mask-based motion trajectory as an intermediate representation that captures both semantic object information and motion, enabling an expressive but compact representation of motion and semantics. To incorporate the learned representation in the second stage, we utilize object-level attention objectives. Specifically, we consider a spatial, per-object, masked cross-attention objective, integrating object-specific prompts into corresponding latent space regions, and a masked spatio-temporal self-attention objective, ensuring frame-to-frame consistency for each object. We evaluate our method on challenging benchmarks with multi-object and high-motion scenarios and empirically demonstrate that the proposed method achieves state-of-the-art results in temporal coherence, motion realism, and text-prompt faithfulness. Additionally, we introduce a new challenging benchmark for single-object and multi-object I2V generation and demonstrate our method's superiority on this benchmark. The project page is available at https://guyyariv.github.io/TTM/.

Generating Counterfactual Trajectories with Latent Diffusion Models for Concept Discovery 2025-01-06
Show

Trustworthiness is a major prerequisite for the safe application of opaque deep learning models in high-stakes domains like medicine. Understanding the decision-making process not only contributes to fostering trust but might also reveal previously unknown decision criteria of complex models that could advance the state of medical research. The discovery of decision-relevant concepts from black box models is a particularly challenging task. This study proposes Concept Discovery through Latent Diffusion-based Counterfactual Trajectories (CDCT), a novel three-step framework for concept discovery leveraging the superior image synthesis capabilities of diffusion models. In the first step, CDCT uses a Latent Diffusion Model (LDM) to generate a counterfactual trajectory dataset. This dataset is used to derive a disentangled representation of classification-relevant concepts using a Variational Autoencoder (VAE). Finally, a search algorithm is applied to identify relevant concepts in the disentangled latent space. The application of CDCT to a classifier trained on the largest public skin lesion dataset revealed not only the presence of several biases but also meaningful biomarkers. Moreover, the counterfactuals generated within CDCT show better FID scores than those produced by a previously established state-of-the-art method, while being 12 times more resource-efficient. Unsupervised concept discovery holds great potential for the application of trustworthy AI and the further development of human knowledge in various domains. CDCT represents a further step in this direction.

Published at International Conference on Pattern Recognition (ICPR) 2024

Holistic Semantic Representation for Navigational Trajectory Generation 2025-01-06
Show

Trajectory generation has garnered significant attention from researchers in the field of spatio-temporal analysis, as it can generate substantial synthesized human mobility trajectories that enhance user privacy and alleviate data scarcity. However, existing trajectory generation methods often focus on improving trajectory generation quality from a singular perspective, lacking a comprehensive semantic understanding across various scales. Consequently, we are inspired to develop a HOlistic SEmantic Representation (HOSER) framework for navigational trajectory generation. Given an origin-and-destination (OD) pair and the starting time point of a latent trajectory, we first propose a Road Network Encoder to expand the receptive field of road- and zone-level semantics. Second, we design a Multi-Granularity Trajectory Encoder to integrate the spatio-temporal semantics of the generated trajectory at both the point and trajectory levels. Finally, we employ a Destination-Oriented Navigator to seamlessly integrate destination-oriented guidance. Extensive experiments on three real-world datasets demonstrate that HOSER outperforms state-of-the-art baselines by a significant margin. Moreover, the model's performance in few-shot learning and zero-shot learning scenarios further verifies the effectiveness of our holistic semantic representation.

Accepted by AAAI 2025

CUQDS: Conformal Uncertainty Quantification under Distribution Shift for Trajectory Prediction 2025-01-03
Show

Trajectory prediction models that can infer both finite future trajectories and their associated uncertainties of the target vehicles in an online setting (e.g., real-world application scenarios) are crucial for ensuring the safe and robust navigation and path planning of autonomous vehicle motion. However, the majority of existing trajectory prediction models have neither considered reducing the uncertainty as one objective during the training stage nor provided reliable uncertainty quantification during the inference stage under potential distribution shift. Therefore, in this paper, we propose the Conformal Uncertainty Quantification under Distribution Shift framework, CUQDS, to quantify the uncertainty of the predicted trajectories of existing trajectory prediction models under potential data distribution shift, while improving the prediction accuracy of the models and reducing the estimated uncertainty during the training stage. Specifically, CUQDS includes 1) a learning-based Gaussian process regression module that models the output distribution of the base model (any existing trajectory prediction or time series forecasting neural network) and reduces the estimated uncertainty by an additional loss term, and 2) a statistical-based Conformal P control module to calibrate the estimated uncertainty from the Gaussian process regression module in an online setting under potential distribution shift between training and testing data.

9 pages, 2 figures
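
The "Conformal P control" component of CUQDS suggests an online, feedback-style calibration of an uncertainty threshold. A generic adaptive-conformal sketch of that idea (our update rule and constants, not the authors' exact algorithm) is shown below.

```python
import numpy as np

def conformal_p_control(scores, alpha=0.1, lr=0.05):
    """Online threshold calibration: grow the threshold after a miss,
    shrink it otherwise, so empirical coverage tracks 1 - alpha."""
    q = np.quantile(scores[:10], 1 - alpha)   # warm start on a few scores
    hits, thresholds = [], []
    for s in scores[10:]:
        miss = float(s > q)                   # 1 if the interval failed
        q += lr * (miss - alpha)              # P-control style update
        hits.append(1.0 - miss)
        thresholds.append(q)
    return np.array(thresholds), float(np.mean(hits))

thresholds, coverage = conformal_p_control(np.abs(np.random.randn(500)))
print(coverage)                               # should hover near 0.9
```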
Architecture for Trajectory-Based Fishing Ship Classification with AIS Data 2025-01-03
Show

This paper proposes a data preparation process for managing real-world kinematic data and detecting fishing vessels. The solution is a binary classification that classifies ship trajectories into either fishing or non-fishing ships. The data used are characterized by the typical problems found in classic data mining applications using real-world data, such as noise and inconsistencies. The two classes are also clearly unbalanced in the data, a problem which is addressed using algorithms that resample the instances. For classification, a series of features are extracted from spatiotemporal data that represent the trajectories of the ships, available from sequences of Automatic Identification System (AIS) reports. These features are proposed for the modelling of ship behavior but, because they do not contain context-related information, the classification can be applied in other scenarios. Experimentation shows that the proposed data preparation process is useful for the presented classification problem. In addition, positive results are obtained using minimal information.

Sensors 2020
Enhancing Large Vision Model in Street Scene Semantic Understanding through Leveraging Posterior Optimization Trajectory 2025-01-03
Show

To improve the generalization of the autonomous driving (AD) perception model, vehicles need to update the model over time based on the continuously collected data. As time progresses, the amount of data fitted by the AD model expands, which helps to improve the AD model generalization substantially. However, such ever-expanding data is a double-edged sword for the AD model. Specifically, as the fitted data volume grows to exceed the AD model's fitting capacity, the AD model is prone to under-fitting. To address this issue, we propose to use a pretrained Large Vision Model (LVM) as the backbone, coupled with a downstream perception head, to understand AD semantic information. This design can not only surmount the aforementioned under-fitting problem due to the LVM's powerful fitting capabilities, but also enhance the perception generalization thanks to the LVM's vast and diverse training data. On the other hand, to mitigate the vehicles' computational burden of training the perception head while running the LVM backbone, we introduce a Posterior Optimization Trajectory (POT)-Guided optimization scheme (POTGui) to accelerate the convergence. Concretely, we propose a POT Generator (POTGen) to generate the posterior (future) optimization direction in advance to guide the current optimization iteration, through which the model can generally converge within 10 epochs. Extensive experiments demonstrate that the proposed method improves the performance by over 66.48% and converges over 6 times faster, compared to the existing state-of-the-art approach.

7 pages
Trajectory Representation Learning on Road Networks and Grids with Spatio-Temporal Dynamics 2025-01-02
Show

Trajectory representation learning is a fundamental task for applications in fields including smart cities and urban planning, as it facilitates the utilization of trajectory data (e.g., vehicle movements) for various downstream applications, such as trajectory similarity computation or travel time estimation. This is achieved by learning low-dimensional representations from high-dimensional and raw trajectory data. However, existing methods for trajectory representation learning either rely on grid-based or road-based representations, which are inherently different and could thus lose information contained in the other modality. Moreover, these methods overlook the dynamic nature of urban traffic, relying on static road network features rather than time-varying traffic patterns. In this paper, we propose TIGR, a novel model designed to integrate grid and road network modalities while incorporating spatio-temporal dynamics to learn rich, general-purpose representations of trajectories. We evaluate TIGR on two real-world datasets and demonstrate the effectiveness of combining both modalities by substantially outperforming state-of-the-art methods, i.e., up to 43.22% for trajectory similarity, up to 16.65% for travel time estimation, and up to 10.16% for destination prediction.

Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks 2025-01-02
Show

Recent advancements in Large Language Models (LLMs) have led to significant breakthroughs in various natural language processing tasks. However, generating factually consistent responses in knowledge-intensive scenarios remains a challenge due to issues such as hallucination, difficulty in acquiring long-tailed knowledge, and limited memory expansion. This paper introduces SMART, a novel multi-agent framework that leverages external knowledge to enhance the interpretability and factual consistency of LLM-generated responses. SMART comprises four specialized agents, each performing a specific sub-trajectory action to navigate complex knowledge-intensive tasks. We propose a multi-agent co-training paradigm, Long-Short Trajectory Learning, which ensures synergistic collaboration among agents while maintaining fine-grained execution by each agent. Extensive experiments on five knowledge-intensive tasks demonstrate SMART's superior performance compared to widely adopted knowledge internalization and knowledge enhancement methods. Our framework can extend beyond knowledge-intensive tasks to more complex scenarios. Our code is available at https://github.com/yueshengbin/SMART.

Accepted by AAAI2025
Diffusion Policies for Generative Modeling of Spacecraft Trajectories 2025-01-01
Show

Machine learning has demonstrated remarkable promise for solving the trajectory generation problem and in paving the way for online use of trajectory optimization for resource-constrained spacecraft. However, a key shortcoming in current machine learning-based methods for trajectory generation is that they require large datasets and even small changes to the original trajectory design requirements necessitate retraining new models to learn the parameter-to-solution mapping. In this work, we leverage compositional diffusion modeling to efficiently adapt out-of-distribution data and problem variations in a few-shot framework for 6 degree-of-freedom (DoF) powered descent trajectory generation. Unlike traditional deep learning methods that can only learn the underlying structure of one specific trajectory optimization problem, diffusion models are a powerful generative modeling framework that represents the solution as a probability density function (PDF) and this allows for the composition of PDFs encompassing a variety of trajectory design specifications and constraints. We demonstrate the capability of compositional diffusion models for inference-time 6 DoF minimum-fuel landing site selection and composable constraint representations. Using these samples as initial guesses for 6 DoF powered descent guidance enables dynamically feasible and computationally efficient trajectory generation.

AIAA SCITECH 2025 Forum

Spatial Temporal Attention based Target Vehicle Trajectory Prediction for Internet of Vehicles 2025-01-01
Show

Forecasting vehicle behavior within complex traffic environments is pivotal within Intelligent Transportation Systems (ITS). Though this technology plays a significant role in alleviating the prevalent operational difficulties in logistics and transportation systems, the precise prediction of vehicle trajectories still poses a substantial challenge. To address this, our study introduces the Spatio-Temporal Attention-based methodology for Target Vehicle Trajectory Prediction (STATVTPred). This approach integrates Global Positioning System (GPS) localization technology to track target movement and dynamically predict the vehicle's future path using comprehensive spatio-temporal trajectory data. We map the vehicle trajectory onto a directed graph, after which spatial attributes are extracted via Graph Attention Networks (GATs). The Transformer technology is employed to yield temporal features from the sequence. These elements are then amalgamated with local road network structure maps to filter and deliver a smooth trajectory sequence, resulting in precise vehicle trajectory prediction. This study validates our proposed STATVTPred method on the T-Drive and Chengdu taxi-trajectory datasets. The experimental results demonstrate that STATVTPred achieves 6.38% and 10.55% higher Average Match Rate (AMR) than the Transformer model on the Beijing and Chengdu datasets, respectively. Compared to the LSTM Encoder-Decoder model, STATVTPred boosts AMR by 37.45% and 36.06% on the same datasets. This is expected to establish STATVTPred as a new approach for handling trajectory prediction of targets in logistics and transportation scenarios, thereby enhancing prediction accuracy.

Trajectories of Change: Approaches for Tracking Knowledge Evolution 2024-12-31
Show

We explore local vs. global evolution of knowledge systems through the framework of socio-epistemic networks (SEN), applying two complementary methods to a corpus of scientific texts. The framework comprises three interconnected layers (social, semiotic/material, and semantic), proposing a multilayered approach to understanding structural developments of knowledge. To analyse diachronic changes on the semantic layer, we first use information-theoretic measures based on relative entropy to detect semantic shifts, assess their significance, and identify key driving features. Second, variations in document embedding densities reveal changes in semantic neighbourhoods, tracking how concentrations of similar documents increase, remain stable, or disperse. This enables us to trace document trajectories based on content (topics) or metadata (authorship, institution). Case studies of Joseph Silk and Hans-Jürgen Treder illustrate how an individual scholar's work aligns with broader disciplinary shifts in general relativity and gravitation research, demonstrating the applications, limitations, and further potential of this approach.

TrajLearn: Trajectory Prediction Learning using Deep Generative Models 2024-12-30
Show

Trajectory prediction aims to estimate an entity's future path using its current position and historical movement data, benefiting fields like autonomous navigation, robotics, and human movement analytics. Deep learning approaches have become key in this area, utilizing large-scale trajectory datasets to model movement patterns, but face challenges in managing complex spatial dependencies and adapting to dynamic environments. To address these challenges, we introduce TrajLearn, a novel model for trajectory prediction that leverages generative modeling of higher-order mobility flows based on hexagonal spatial representation. TrajLearn predicts the next $k$ steps by integrating a customized beam search for exploring multiple potential paths while maintaining spatial continuity. We conducted a rigorous evaluation of TrajLearn, benchmarking it against leading state-of-the-art approaches and meaningful baselines. The results indicate that TrajLearn achieves significant performance gains, with improvements of up to ~40% across multiple real-world trajectory datasets. In addition, we evaluated different prediction horizons (i.e., various values of $k$), conducted resolution sensitivity analysis, and performed ablation studies to assess the impact of key model components. Furthermore, we developed a novel algorithm to generate mixed-resolution maps by hierarchically subdividing hexagonal regions into finer segments within a specified observation area. This approach supports selective detailing, applying finer resolution to areas of interest or high activity (e.g., urban centers) while using coarser resolution for less significant regions (e.g., rural areas), effectively reducing data storage requirements and computational overhead. We promote reproducibility and adaptability by offering complete code, data, and detailed documentation with flexible configuration options for various applications.
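
The customized beam search over hexagonal cells described in TrajLearn can be sketched generically as below; `step_probs` is a hypothetical stand-in for the model's next-cell distribution and is also where a spatial-continuity check would live.

```python
import math

def beam_search(start, step_probs, k=5, beam_width=3):
    """Keep the `beam_width` most probable k-step paths over discrete cells."""
    beams = [([start], 0.0)]                      # (path, log-probability)
    for _ in range(k):
        candidates = []
        for path, logp in beams:
            for cell, p in step_probs(path).items():
                candidates.append((path + [cell], logp + math.log(p + 1e-12)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

# Toy stand-in model: always proposes two neighbours of the last cell.
toy = lambda path: {path[-1] + 1: 0.7, path[-1] + 2: 0.3}
print(beam_search(0, toy, k=3))
```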

STITCHER: Real-Time Trajectory Planning with Motion Primitive Search 2024-12-30
Show

Autonomous high-speed navigation through large, complex environments requires real-time generation of agile trajectories that are dynamically feasible, collision-free, and satisfy state or actuator constraints. Most modern trajectory planning techniques rely on numerical optimization because high-quality, expressive trajectories that satisfy various constraints can be systematically computed. However, meeting computation time constraints and the potential for numerical instabilities can limit the use of optimization-based planners in safety-critical scenarios. This work presents an optimization-free planning framework that stitches short trajectory segments together with graph search to compute long range, expressive, and near-optimal trajectories in real-time. Our STITCHER algorithm is shown to outperform modern optimization-based planners through our innovative planning architecture and several algorithmic developments that make real-time planning possible. Extensive simulation testing is conducted to analyze the algorithmic components that make up STITCHER, and a thorough comparison with two state-of-the-art optimization planners is performed. It is shown STITCHER can generate trajectories through complex environments over long distances (tens of meters) with low computation times (milliseconds).

V1 Draft
DEMO: A Dynamics-Enhanced Learning Model for Multi-Horizon Trajectory Prediction in Autonomous Vehicles 2024-12-30
Show

Autonomous vehicles (AVs) rely on accurate trajectory prediction of surrounding vehicles to ensure the safety of both passengers and other road users. Trajectory prediction spans both short-term and long-term horizons, each requiring distinct considerations: short-term predictions rely on accurately capturing the vehicle's dynamics, while long-term predictions rely on accurately modeling the interaction patterns within the environment. However, current approaches, whether physics-based or learning-based, ignore these distinct considerations, so they struggle to find optimal predictions across both short-term and long-term horizons. In this paper, we introduce the Dynamics-Enhanced Learning MOdel (DEMO), a novel approach that combines a physics-based Vehicle Dynamics Model with advanced deep learning algorithms. DEMO employs a two-stage architecture, featuring a Dynamics Learning Stage and an Interaction Learning Stage, where the former stage focuses on capturing vehicle motion dynamics and the latter focuses on modeling interaction. By capitalizing on the respective strengths of both methods, DEMO facilitates multi-horizon predictions for future trajectories. Experimental results on the Next Generation Simulation (NGSIM), Macau Connected Autonomous Driving (MoCAD), Highway Drone (HighD), and nuScenes datasets demonstrate that DEMO outperforms state-of-the-art (SOTA) baselines in both short-term and long-term prediction horizons.

Accepted by Information Fusion

ESI-GAL: EEG Source Imaging-based Trajectory Estimation for Grasp and Lift Task 2024-12-30
Show

Electroencephalogram (EEG) signal-based motor kinematics prediction (MKP) has been an active area of research to develop brain-computer interface (BCI) systems such as exosuits, prostheses, and rehabilitation devices. However, EEG source imaging (ESI)-based kinematics prediction is sparsely explored in the literature. In this study, pre-movement EEG features are utilized to predict three-dimensional (3D) hand kinematics for the grasp-and-lift motor task. A public dataset, WAY-EEG-GAL, is utilized for MKP analysis. In particular, sensor-domain (EEG data) and source-domain (ESI data) based features from the frontoparietal region are explored for MKP. Deep learning-based models are explored to achieve efficient kinematics decoding. Various time lags and window sizes are analyzed for hand kinematics prediction. Subsequently, intra-subject and inter-subject MKP analysis is performed to investigate the subject-specific and subject-independent motor-learning capabilities of the neural decoders. The Pearson correlation coefficient (PCC) is used as the performance metric for kinematics trajectory decoding. The rEEGNet neural decoder achieved the best performance with sensor-domain and source-domain features with a time lag and window size of 100 ms and 450 ms, respectively. The highest mean PCC values of 0.790, 0.795, and 0.637 are achieved using sensor-domain features, while 0.769, 0.777, and 0.647 are achieved using source-domain features in the x, y, and z-directions, respectively. This study explores the feasibility of trajectory prediction using EEG sensor-domain and source-domain features for the grasp-and-lift task. Furthermore, inter-subject trajectory estimation is performed using the proposed deep learning decoder with EEG source-domain features.

Learning Optimal Control and Dynamical Structure of Global Trajectory Search Problems with Diffusion Models 2024-12-29
Show

Spacecraft trajectory design is a global search problem, where previous work has revealed specific solution structures that can be captured with data-driven methods. This paper explores two global search problems in the circular restricted three-body problem: hybrid cost function of minimum fuel/time-of-flight and transfers to energy-dependent invariant manifolds. These problems display a fundamental structure either in the optimal control profile or the use of dynamical structures. We build on our prior generative machine learning framework to apply diffusion models to learn the conditional probability distribution of the search problem and analyze the model's capability to capture these structures.

This paper was presented at the AAS/AIAA Astrodynamics Specialist Conference

Leveraging Symmetry to Accelerate Learning of Trajectory Tracking Controllers for Free-Flying Robotic Systems 2024-12-29
Show

Tracking controllers enable robotic systems to accurately follow planned reference trajectories. In particular, reinforcement learning (RL) has shown promise in the synthesis of controllers for systems with complex dynamics and modest online compute budgets. However, the poor sample efficiency of RL and the challenges of reward design make training slow and sometimes unstable, especially for high-dimensional systems. In this work, we leverage the inherent Lie group symmetries of robotic systems with a floating base to mitigate these challenges when learning tracking controllers. We model a general tracking problem as a Markov decision process (MDP) that captures the evolution of both the physical and reference states. Next, we prove that symmetry in the underlying dynamics and running costs leads to an MDP homomorphism, a mapping that allows a policy trained on a lower-dimensional "quotient" MDP to be lifted to an optimal tracking controller for the original system. We compare this symmetry-informed approach to an unstructured baseline, using Proximal Policy Optimization (PPO) to learn tracking controllers for three systems: the Particle (a forced point mass), the Astrobee (a fully-actuated space robot), and the Quadrotor (an underactuated system). Results show that a symmetry-aware approach both accelerates training and reduces tracking error after the same number of training steps.

The first three authors contributed equally to this work. This version resolves PDF compatibility issues in some browsers
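
The symmetry-based reduction described above effectively lets the policy see only error coordinates expressed in the reference frame. A minimal planar (SE(2)) sketch of that reduction, with our own state layout and no claim to match the paper's formulation, is:

```python
import numpy as np

def quotient_obs(state, ref):
    """Map (state, reference) to symmetry-reduced error coordinates:
    position/velocity errors rotated into the reference heading frame."""
    x, y, th, vx, vy = state
    xr, yr, thr, vxr, vyr = ref
    c, s = np.cos(-thr), np.sin(-thr)
    R = np.array([[c, -s], [s, c]])               # rotation by -thr
    dp = R @ np.array([x - xr, y - yr])           # position error, ref frame
    dv = R @ np.array([vx - vxr, vy - vyr])       # velocity error, ref frame
    dth = (th - thr + np.pi) % (2 * np.pi) - np.pi
    return np.concatenate([dp, [dth], dv])

obs = quotient_obs(np.array([1.0, 2.0, 0.3, 1.0, 0.0]),
                   np.array([0.5, 2.0, 0.1, 1.0, 0.1]))
```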

Global Search of Optimal Spacecraft Trajectories using Amortization and Deep Generative Models 2024-12-28
Show

Preliminary spacecraft trajectory optimization is a parameter dependent global search problem that aims to provide a set of solutions that are of high quality and diverse. In the case of numerical solution, it is dependent on the original optimal control problem, the choice of a control transcription, and the behavior of a gradient based numerical solver. In this paper we formulate the parameterized global search problem as the task of sampling a conditional probability distribution with support on the neighborhoods of local basins of attraction to the high quality solutions. The conditional distribution is learned and represented using deep generative models that allow for prediction of how the local basins change as parameters vary. The approach is benchmarked on a low thrust spacecraft trajectory optimization problem in the circular restricted three-body problem, showing significant speed-up over a simple multi-start method and vanilla machine learning approaches. The paper also provides an in-depth analysis of the multi-modal funnel structure of a low-thrust spacecraft trajectory optimization problem.

47 pages, 23 figures, initial content of this paper appears in Paper 23-352 at the AAS/AIAA Astrodynamics Specialist Conference, Big Sky, MT, August 13-17 2023

UAV-Enabled Secure ISAC Against Dual Eavesdropping Threats: Joint Beamforming and Trajectory Design 2024-12-27
Show

In this work, we study an unmanned aerial vehicle (UAV)-enabled secure integrated sensing and communication (ISAC) system, where a UAV serves as an aerial base station (BS) to simultaneously perform communication with a user and detect a target on the ground, while a dual-functional eavesdropper attempts to intercept the signals for both sensing and communication. Facing the dual eavesdropping threats, we aim to enhance the average achievable secrecy rate for the communication user by jointly designing the UAV trajectory together with the transmit information and sensing beamforming, while satisfying the requirements on sensing performance and sensing security, as well as the UAV power and flight constraints. To address the non-convex nature of the optimization problem, we employ the alternating optimization (AO) strategy, jointly with the successive convex approximation (SCA) and semidefinite relaxation (SDR) methods. Numerical results validate the proposed approach, demonstrating its ability to achieve a high secrecy rate while meeting the required sensing and security constraints.

7 pages, 6 figures, submitted for possible publication

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis 2024-12-27
Show

Graphical User Interface (GUI) agents powered by Vision-Language Models (VLMs) have demonstrated human-like computer control capability. Despite their utility in advancing digital automation, a critical bottleneck persists: collecting high-quality trajectory data for training. Common practices for collecting such data rely on human supervision or synthetic data generation through executing pre-defined tasks, which are either resource-intensive or unable to guarantee data quality. Moreover, these methods suffer from limited data diversity and significant gaps between synthetic data and real-world environments. To address these challenges, we propose OS-Genesis, a novel GUI data synthesis pipeline that reverses the conventional trajectory collection process. Instead of relying on pre-defined tasks, OS-Genesis enables agents first to perceive environments and perform step-wise interactions, then retrospectively derive high-quality tasks to enable trajectory-level exploration. A trajectory reward model is then employed to ensure the quality of the generated trajectories. We demonstrate that training GUI agents with OS-Genesis significantly improves their performance on highly challenging online benchmarks. In-depth analysis further validates OS-Genesis's efficiency and its superior data quality and diversity compared to existing synthesis methods. Our codes, data, and checkpoints are available at the OS-Genesis Homepage: https://qiushisun.github.io/OS-Genesis-Home/.

Work in progress
A data driven approach to classify descriptors based on their efficiency in translating noisy trajectories into physically-relevant information 2024-12-27
Show

Reconstructing the physical complexity of many-body dynamical systems can be challenging. Starting from the trajectories of their constitutive units (raw data), typical approaches require selecting appropriate descriptors to convert them into time-series, which are then analyzed to extract interpretable information. However, identifying the most effective descriptor is often non-trivial. Here, we report a data-driven approach to compare the efficiency of various descriptors in extracting information from noisy trajectories and translating it into physically relevant insights. As a prototypical system with non-trivial internal complexity, we analyze molecular dynamics trajectories of an atomistic system where ice and water coexist in equilibrium near the solid/liquid transition temperature. We compare general and specific descriptors often used in aqueous systems: number of neighbors, molecular velocities, Smooth Overlap of Atomic Positions (SOAP), Local Environments and Neighbors Shuffling (LENS), Orientational Tetrahedral Order, and distance from the fifth neighbor ($d_5$). Using Onion Clustering -- an efficient unsupervised method for single-point time-series analysis -- we assess the maximum extractable information for each descriptor and rank them via a high-dimensional metric. Our results show that advanced descriptors like SOAP and LENS outperform classical ones due to higher signal-to-noise ratios. Nonetheless, even simple descriptors can rival or exceed advanced ones after local signal denoising. For example, $d_5$, initially among the weakest, becomes the most effective at resolving the system's non-local dynamical complexity after denoising. This work highlights the critical role of noise in information extraction from molecular trajectories and offers a data-driven approach to identify optimal descriptors for systems with characteristic internal complexity.

19 pages, 5 figures + 3 in supporting information (at the bottom of the manuscript)
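
As a toy illustration of the denoising effect discussed above, one can smooth each descriptor's time series and rank descriptors by a crude signal-to-noise proxy; the paper's actual pipeline relies on Onion Clustering and an information-based metric, so the snippet below is only a rough stand-in.

```python
import numpy as np

def denoise_and_rank(series_by_descriptor, window=11):
    """Moving-average denoising per descriptor, then a naive SNR ranking."""
    kernel = np.ones(window) / window
    scores = {}
    for name, series in series_by_descriptor.items():
        smooth = np.convolve(series, kernel, mode='same')
        noise = series - smooth
        scores[name] = smooth.var() / (noise.var() + 1e-12)  # SNR proxy
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

t = np.linspace(0, 20, 2000)
demo = {'d5': np.sin(t) + 0.8 * np.random.randn(t.size),
        'SOAP_pc1': np.sin(t) + 0.2 * np.random.randn(t.size)}
print(denoise_and_rank(demo))
```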

TrajGEOS: Trajectory Graph Enhanced Orientation-based Sequential Network for Mobility Prediction 2024-12-26
Show

Human mobility studies how people move to access their needed resources and plays a significant role in urban planning and location-based services. As a paramount task of human mobility modeling, next location prediction is challenging because of the diversity of users' historical trajectories that gives rise to complex mobility patterns and various contexts. Deep sequential models have been widely used to predict the next location by leveraging the inherent sequentiality of trajectory data. However, they do not fully leverage the relationship between locations and fail to capture users' multi-level preferences. This work constructs a trajectory graph from users' historical traces and proposes a Trajectory Graph Enhanced Orientation-based Sequential network (TrajGEOS) for next-location prediction tasks. TrajGEOS introduces hierarchical graph convolution to capture location and user embeddings. Such embeddings consider not only the contextual feature of locations but also the relation between them, and serve as additional features in downstream modules. In addition, we design an orientation-based module to learn users' mid-term preferences from sequential modeling modules and their recent trajectories. Extensive experiments on three real-world LBSN datasets corroborate the value of graph and orientation-based modules and demonstrate that TrajGEOS outperforms the state-of-the-art methods on the next location prediction task.

CAPER: Enhancing Career Trajectory Prediction using Temporal Knowledge Graph and Ternary Relationship 2024-12-26
Show

The problem of career trajectory prediction (CTP) aims to predict one's future employer or job position. While several CTP methods have been developed for this problem, we posit that none of these methods (1) jointly considers the mutual ternary dependency between three key units (i.e., user, position, and company) of a career and (2) captures the characteristic shifts of key units in career over time, leading to an inaccurate understanding of the job movement patterns in the labor market. To address the above challenges, we propose a novel solution, named as CAPER, that solves the challenges via sophisticated temporal knowledge graph (TKG) modeling. It enables the utilization of a graph-structured knowledge base with rich expressiveness, effectively preserving the changes in job movement patterns. Furthermore, we devise an extrapolated career reasoning task on TKG for a realistic evaluation. The experiments on a real-world career trajectory dataset demonstrate that CAPER consistently and significantly outperforms four baselines, two recent TKG reasoning methods, and five state-of-the-art CTP methods in predicting one's future companies and positions--i.e., on average, yielding 6.80% and 34.58% more accurate predictions, respectively. The codebase of CAPER is available at https://github.com/Bigdasgit/CAPER.

Accepted by ACM KDD 2025

Single Trajectory Distillation for Accelerating Image and Video Style Transfer 2024-12-25
Show

Diffusion-based stylization methods typically denoise from a specific partial noise state for image-to-image and video-to-video tasks. This multi-step diffusion process is computationally expensive and hinders real-world application. A promising solution to speed up the process is to obtain few-step consistency models through trajectory distillation. However, current consistency models only force the initial-step alignment between the probability flow ODE (PF-ODE) trajectories of the student and the imperfect teacher models. This training strategy can not ensure the consistency of whole trajectories. To address this issue, we propose single trajectory distillation (STD) starting from a specific partial noise state. We introduce a trajectory bank to store the teacher model's trajectory states, mitigating the time cost during training. Besides, we use an asymmetric adversarial loss to enhance the style and quality of the generated images. Extensive experiments on image and video stylization demonstrate that our method surpasses existing acceleration models in terms of style similarity and aesthetic evaluations. Our code and results will be available on the project page: https://single-trajectory-distillation.github.io.

CausalTAD: Causal Implicit Generative Model for Debiased Online Trajectory Anomaly Detection 2024-12-25
Show

Trajectory anomaly detection, aiming to estimate the anomaly risk of trajectories given the Source-Destination (SD) pairs, has become a critical problem for many real-world applications. Existing solutions directly train a generative model for observed trajectories and calculate the conditional generative probability $P(T \mid C)$ as the anomaly risk, where $T$ and $C$ represent the trajectory and the SD pair, respectively. However, we argue that the observed trajectories are confounded by road network preference, which is a common cause of both the SD distribution and the trajectories. Existing methods ignore this issue, limiting their generalization ability on out-of-distribution trajectories. In this paper, we define the debiased trajectory anomaly detection problem and propose a causal implicit generative model, namely CausalTAD, to solve it. CausalTAD adopts do-calculus to eliminate the confounding bias of road network preference and estimates $P(T \mid do(C))$ instead.
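
For reference, the textbook backdoor-adjustment identity behind the do-calculus step mentioned above is the following, with $Z$ standing for the road-network-preference confounder (notation ours, assuming $Z$ blocks all backdoor paths from $C$ to $T$):

$$
P(T \mid do(C)) = \sum_{z} P(T \mid C, Z = z)\, P(Z = z).
$$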
SCKF-LSTM Based Trajectory Tracking for Electricity-Gas Integrated Energy System 2024-12-24
Show

This paper introduces a novel approach for tracking the dynamic trajectories of integrated natural gas and power systems, leveraging a Kalman filter-based structure. To predict the states of the system, Holt's exponential smoothing technique and the nonlinear dynamic equations of gas pipelines are applied to establish the power and gas system equations, respectively. The square-root cubature Kalman filter algorithm is utilized to address the numerical challenges posed by the strongly nonlinear system equations. The boundary conditions in the gas system include the flow balances at sink nodes, and the mass flow rates of loads have to be predicted at each computation step. For the prediction of load mass flows, the long short-term memory network is employed, known for its effectiveness in time series prediction. Consequently, a combined method based on the square-root cubature Kalman filter and the long short-term memory network is proposed for tracking integrated gas and power systems. To evaluate the tracking performance of the proposed method, the IEEE 39-bus power system and GasLib-40 node gas system are used to form the testing system. Simulation results demonstrate high precision in tracking the dynamic states of power and gas systems. Two indexes are introduced for a numerical analysis of the tracking results, indicating that the accuracy of this method surpasses that of traditional measurements.

Accepted by IEEE Transactions on Industrial Informatics
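
Holt's exponential smoothing, used above for the power-system state prediction, is straightforward to reproduce; the snippet below is a textbook sketch with arbitrary smoothing constants, not the paper's implementation.

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=1):
    """Holt's (double) exponential smoothing: track a level and a trend,
    then extrapolate them linearly for `horizon` steps ahead."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]

print(holt_forecast([10.0, 10.4, 10.9, 11.5, 12.0], horizon=3))
```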

Quantum framework for Reinforcement Learning: integrating Markov Decision Process, quantum arithmetic, and trajectory search 2024-12-24
Show

This paper introduces a quantum framework for addressing reinforcement learning (RL) tasks, grounded in quantum principles and leveraging a fully quantum model of the classical Markov Decision Process (MDP). By employing quantum concepts and a quantum search algorithm, this work presents the implementation and optimization of the agent-environment interactions entirely within the quantum domain, eliminating reliance on classical computations. Key contributions include the quantum-based state transitions, return calculation, and trajectory search mechanism that utilize quantum principles to demonstrate the realization of RL processes through quantum phenomena. The implementation emphasizes the fundamental role of quantum superposition in enhancing computational efficiency for RL tasks. Experimental results demonstrate the capacity of a quantum model to achieve quantum advantage in RL, highlighting the potential of fully quantum implementations in decision-making tasks. This work not only underscores the applicability of quantum computing in machine learning but also contributes to the field of quantum reinforcement learning (QRL) by offering a robust framework for understanding and exploiting quantum computing in RL systems.

Graph Neural Networks

Title Date Abstract Comment
Through the Dual-Prism: A Spectral Perspective on Graph Data Augmentation for Graph Classification 2025-01-29
Show

Graph Neural Networks have become the preferred tool to process graph data, with their efficacy being boosted through graph data augmentation techniques. Despite the evolution of augmentation methods, issues like graph property distortions and restricted structural changes persist. This leads to the question: Is it possible to develop more property-conserving and structure-sensitive augmentation methods? Through a spectral lens, we investigate the interplay between graph properties, their augmentation, and their spectral behavior, and observe that keeping the low-frequency eigenvalues unchanged can preserve the critical properties at a large scale when generating augmented graphs. These observations inform our introduction of the Dual-Prism (DP) augmentation methods, including DP-Noise and DP-Mask, which retain essential graph properties while diversifying augmented graphs. Extensive experiments validate the efficiency of our approach, providing a new and promising direction for graph data augmentation.

Accepted by AAAI'25
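
The property-preserving augmentation idea above (keep low-frequency Laplacian eigenvalues fixed, perturb the rest) can be sketched in NumPy as follows; the noise level and the number of preserved eigenvalues are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def spectral_augment(A, keep_low=10, sigma=0.1, seed=0):
    """Eigendecompose the Laplacian, add noise only to high-frequency
    eigenvalues, and rebuild a (weighted) adjacency matrix."""
    rng = np.random.default_rng(seed)
    L = np.diag(A.sum(axis=1)) - A            # combinatorial Laplacian
    w, V = np.linalg.eigh(L)                  # eigenvalues in ascending order
    noise = rng.normal(0.0, sigma, size=w.shape)
    noise[:keep_low] = 0.0                    # low frequencies left untouched
    w_aug = np.clip(w + noise, 0.0, None)
    L_aug = V @ np.diag(w_aug) @ V.T
    return np.diag(np.diag(L_aug)) - L_aug    # back to adjacency form

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(spectral_augment(A, keep_low=1))
```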
SynthFormer: Equivariant Pharmacophore-based Generation of Synthesizable Molecules for Ligand-Based Drug Design 2025-01-29
Show

Drug discovery is a complex, resource-intensive process requiring significant time and cost to bring new medicines to patients. Many generative models aim to accelerate drug discovery, but few produce synthetically accessible molecules. Conversely, synthesis-focused models do not leverage the 3D information crucial for effective drug design. We introduce SynthFormer, a novel machine learning model that generates fully synthesizable molecules, structured as synthetic trees, by introducing both 3D information and pharmacophores as input. SynthFormer features a 3D equivariant graph neural network to encode pharmacophores, followed by a Transformer-based synthesis-aware decoding mechanism for constructing synthetic trees as a sequence of tokens. It is a first-of-its-kind approach that could provide capabilities for designing active molecules based on pharmacophores, exploring the local synthesizable chemical space around hit molecules and optimizing their properties. We demonstrate its effectiveness through various challenging tasks, including designing active compounds for a range of proteins, performing hit expansion and optimizing molecular properties.

LHGNN: Local-Higher Order Graph Neural Networks For Audio Classification and Tagging 2025-01-29
Show

Transformers have set new benchmarks in audio processing tasks, leveraging self-attention mechanisms to capture complex patterns and dependencies within audio data. However, their focus on pairwise interactions limits their ability to process the higher-order relations essential for identifying distinct audio objects. To address this limitation, this work introduces the Local-Higher Order Graph Neural Network (LHGNN), a graph-based model that enhances feature understanding by integrating local neighbourhood information with higher-order data from Fuzzy C-Means clusters, thereby capturing a broader spectrum of audio relationships. Evaluation of the model on three publicly available audio datasets shows that it outperforms Transformer-based models across all benchmarks while operating with substantially fewer parameters. Moreover, LHGNN demonstrates a distinct advantage in scenarios lacking ImageNet pretraining, establishing its effectiveness and efficiency in environments where extensive pretraining data is unavailable.

RegionGCN: Spatial-Heterogeneity-Aware Graph Convolutional Networks 2025-01-29
Show

Modeling spatial heterogeneity in the data generation process is essential for understanding and predicting geographical phenomena. Despite their prevalence in geospatial tasks, neural network models usually assume spatial stationarity, which could limit their performance in the presence of spatial process heterogeneity. By allowing model parameters to vary over space, several approaches have been proposed to incorporate spatial heterogeneity into neural networks. However, current geographically weighting approaches are ineffective on graph neural networks, yielding no significant improvement in prediction accuracy. We assume the crux lies in the over-fitting risk brought by a large number of local parameters. Accordingly, we propose to model spatial process heterogeneity at the regional level rather than at the individual level, which largely reduces the number of spatially varying parameters. We further develop a heuristic optimization procedure to learn the region partition adaptively in the process of model training. Our proposed spatial-heterogeneity-aware graph convolutional network, named RegionGCN, is applied to the spatial prediction of county-level vote share in the 2016 US presidential election based on socioeconomic attributes. Results show that RegionGCN achieves significant improvement over the basic and geographically weighted GCNs. We also offer an exploratory analysis tool for the spatial variation of non-linear relationships through ensemble learning of regional partitions from RegionGCN. Our work contributes to the practice of Geospatial Artificial Intelligence (GeoAI) in tackling spatial heterogeneity.

28 pages, 6 figures
Query-Aware Learnable Graph Pooling Tokens as Prompt for Large Language Models 2025-01-29
Show

Graph-structured data plays a vital role in numerous domains, such as social networks, citation networks, commonsense reasoning graphs and knowledge graphs. While graph neural networks have been employed for graph processing, recent advancements have explored integrating large language models for graph-based tasks. In this paper, we propose a novel approach named Learnable Graph Pooling Token (LGPT), which addresses the scalability issues of node-level projection and the information loss of graph-level projection. LGPT enables flexible and efficient graph representation by introducing learnable parameters that act as tokens in large language models, balancing fine-grained and global graph information. Additionally, we investigate an Early Query Fusion technique, which fuses query context before constructing the graph representation, leading to more effective graph embeddings. Our method achieves a 4.13% performance improvement on the GraphQA benchmark without training the large language model, demonstrating significant gains in handling complex textual-attributed graph data.

Channel Estimation for XL-MIMO Systems with Decentralized Baseband Processing: Integrating Local Reconstruction with Global Refinement 2025-01-29
Show

In this paper, we investigate the channel estimation problem for extremely large-scale multiple-input multiple-output (XL-MIMO) systems with a hybrid analog-digital architecture, implemented within a decentralized baseband processing (DBP) framework with a star topology. Existing centralized and fully decentralized channel estimation methods face limitations due to excessive computational complexity or degraded performance. To overcome these challenges, we propose a novel two-stage channel estimation scheme that integrates local sparse reconstruction with global fusion and refinement. Specifically, in the first stage, by exploiting the sparsity of channels in the angular-delay domain, the local reconstruction task is formulated as a sparse signal recovery problem. To solve it, we develop a graph neural networks-enhanced sparse Bayesian learning (SBL-GNNs) algorithm, which effectively captures dependencies among channel coefficients, significantly improving estimation accuracy. In the second stage, the local estimates from the local processing units (LPUs) are aligned into a global angular domain for fusion at the central processing unit (CPU). Based on the aggregated observations, the channel refinement is modeled as a Bayesian denoising problem. To efficiently solve it, we devise a variational message passing algorithm that incorporates a Markov chain-based hierarchical sparse prior, effectively leveraging both the sparsity and the correlations of the channels in the global angular-delay domain. Simulation results validate the effectiveness and superiority of the proposed SBL-GNNs algorithm over existing methods, demonstrating improved estimation performance and reduced computational complexity.

This manuscript has been submitted to IEEE journal for possible publication

Mesh-Informed Reduced Order Models for Aneurysm Rupture Risk Prediction 2025-01-29
Show

The complexity of the cardiovascular system needs to be accurately reproduced in order to promptly acknowledge health conditions; to this aim, advanced multifidelity and multiphysics numerical models are crucial. On one side, Full Order Models (FOMs) deliver accurate hemodynamic assessments, but their high computational demands hinder their real-time clinical application. In contrast, ROMs provide more efficient yet accurate solutions, essential for personalized healthcare and timely clinical decision-making. In this work, we explore the application of computational fluid dynamics (CFD) in cardiovascular medicine by integrating FOMs with ROMs for predicting the risk of aortic aneurysm growth and rupture. Wall Shear Stress (WSS) and the Oscillatory Shear Index (OSI), sampled at different growth stages of the thoracic aortic aneurysm, are predicted by means of Graph Neural Networks (GNNs). GNNs exploit the natural graph structure of the mesh obtained by the Finite Volume (FV) discretization, taking into account the spatial local information, regardless of the dimension of the input graph. Our experimental validation framework yields promising results, confirming our method as a valid alternative that overcomes the curse of dimensionality.

Graph-Sequential Alignment and Uniformity: Toward Enhanced Recommendation Systems 2025-01-29
Show

Graph-based and sequential methods are two popular recommendation paradigms, each excelling in its domain but lacking the ability to leverage signals from the other. To address this, we propose a novel method that integrates both approaches for enhanced performance. Our framework uses Graph Neural Network (GNN)-based and sequential recommenders as separate submodules while sharing a unified embedding space optimized jointly. To enable positive knowledge transfer, we design a loss function that enforces alignment and uniformity both within and across submodules. Experiments on three real-world datasets demonstrate that the proposed method significantly outperforms using either approach alone and achieves state-of-the-art results. Our implementations are publicly available at https://github.com/YuweiCao-UIC/GSAU.git.

Accepted to The Web Conference 2025
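
The alignment and uniformity terms mentioned above are commonly instantiated as in Wang and Isola's hypersphere formulation; a hedged sketch of such losses over graph-view and sequence-view embeddings (dimensions and weighting are our own choices) is:

```python
import torch
import torch.nn.functional as F

def alignment_loss(x, y):
    """Pull matched graph-view / sequence-view embeddings together."""
    x, y = F.normalize(x, dim=-1), F.normalize(y, dim=-1)
    return (x - y).pow(2).sum(dim=-1).mean()

def uniformity_loss(x, t=2.0):
    """Spread embeddings over the unit sphere (log mean Gaussian potential)."""
    x = F.normalize(x, dim=-1)
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()

g_emb, s_emb = torch.randn(64, 128), torch.randn(64, 128)
loss = alignment_loss(g_emb, s_emb) + 0.5 * (uniformity_loss(g_emb) + uniformity_loss(s_emb))
```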

Reqo: A Robust and Explainable Query Optimization Cost Model 2025-01-29
Show

In recent years, there has been a growing interest in using machine learning (ML) in query optimization to select more efficient plans. Existing learning-based query optimizers use certain model architectures to convert tree-structured query plans into representations suitable for downstream ML tasks. As the design of these architectures significantly impacts cost estimation, we propose a tree model architecture based on Bidirectional Graph Neural Networks (Bi-GNN) aggregated by Gated Recurrent Units (GRUs) to achieve more accurate cost estimates. The inherent uncertainty of data and model parameters also leads to inaccurate cost estimates, resulting in suboptimal plans and less robust query performance. To address this, we implement a novel learning-to-rank cost model that effectively quantifies the uncertainty in cost estimates using approximate probabilistic ML. This model adaptively integrates quantified uncertainty with estimated costs and learns from comparing pairwise plans, achieving more robust performance. In addition, we propose the first explainability technique specifically designed for learning-based cost models. This technique explains the contribution of any subgraphs in the query plan to the final predicted cost, which can be integrated and trained with any learning-based cost model to significantly boost the model's explainability. By incorporating these innovations, we propose a cost model for a Robust and Explainable Query Optimizer, Reqo, that improves the accuracy, robustness, and explainability of cost estimation, outperforming state-of-the-art approaches in all three dimensions.

A Geometric Perspective for High-Dimensional Multiplex Graphs 2025-01-29
Show

High-dimensional multiplex graphs are characterized by their high number of complementary and divergent dimensions. The existence of multiple hierarchical latent relations between the graph dimensions poses significant challenges to embedding methods. In particular, the geometric distortions that might occur in the representational space have been overlooked in the literature. This work studies the problem of high-dimensional multiplex graph embedding from a geometric perspective. We find that the node representations reside on highly curved manifolds, thus rendering their exploitation more challenging for downstream tasks. Moreover, our study reveals that increasing the number of graph dimensions can cause further distortions to the highly curved manifolds. To address this problem, we propose a novel multiplex graph embedding method that harnesses hierarchical dimension embedding and Hyperbolic Graph Neural Networks. The proposed approach hierarchically extracts hyperbolic node representations that reside on Riemannian manifolds while gradually learning fewer and more expressive latent dimensions of the multiplex graph. Experimental results on real-world high-dimensional multiplex graphs show that the synergy between hierarchical and hyperbolic embeddings incurs much fewer geometric distortions and brings notable improvements over state-of-the-art approaches on downstream tasks.

Published in Proceedings of the ACM Conference on Information and Knowledge Management (CIKM) 2024, DOI: 10.1145/3627673.3679541

Compositional Models for Estimating Causal Effects 2025-01-28
Show

Many real-world systems can be represented as sets of interacting components. Examples of such systems include computational systems such as query processors, natural systems such as cells, and social systems such as families. Many approaches have been proposed in traditional (associational) machine learning to model such structured systems, including statistical relational models and graph neural networks. Despite this prior work, existing approaches to estimating causal effects typically treat such systems as single units, represent them with a fixed set of variables and assume a homogeneous data-generating process. We study a compositional approach for estimating individual treatment effects (ITE) in structured systems, where each unit is represented by the composition of multiple heterogeneous components. This approach uses a modular architecture to model potential outcomes at each component and aggregates component-level potential outcomes to obtain the unit-level potential outcomes. We discover novel benefits of the compositional approach in causal inference - systematic generalization to estimate counterfactual outcomes of unseen combinations of components and improved overlap guarantees between treatment and control groups compared to the classical methods for causal effect estimation. We also introduce a set of novel environments for empirically evaluating the compositional approach and demonstrate the effectiveness of our approach using both simulated and real-world data.

Conditional Distribution Learning on Graphs 2025-01-28
Show

Leveraging the diversity and quantity of data provided by various graph-structured data augmentations while preserving intrinsic semantic information is challenging. Additionally, successive layers in a graph neural network (GNN) tend to produce more similar node embeddings, while graph contrastive learning aims to increase the dissimilarity between negative pairs of node embeddings. This inevitably results in a conflict between the message-passing mechanism (MPM) of GNNs and the contrastive learning (CL) of negative pairs within intra-views. In this paper, we propose a conditional distribution learning (CDL) method that learns graph representations from graph-structured data for semi-supervised graph classification. Specifically, we present an end-to-end graph representation learning model to align the conditional distributions of weakly and strongly augmented features over the original features. This alignment enables the CDL model to effectively preserve intrinsic semantic information when both weak and strong augmentations are applied to graph-structured data. To avoid the conflict between the MPM and the CL of negative pairs, positive pairs of node representations are retained for measuring the similarity between the original features and the corresponding weakly augmented features. Extensive experiments with several benchmark graph datasets demonstrate the effectiveness of the proposed CDL method.

9 pages
Few Edges Are Enough: Few-Shot Network Attack Detection with Graph Neural Networks 2025-01-28
Show

Detecting cyberattacks using Graph Neural Networks (GNNs) has seen promising results recently. Most of the state-of-the-art models that leverage these techniques require labeled examples, which are hard to obtain in many real-world scenarios. To address this issue, unsupervised learning and Self-Supervised Learning (SSL) have emerged as interesting approaches to reduce the dependency on labeled data. Nonetheless, these methods tend to yield anomaly detection algorithms rather than effective attack detection systems. This paper introduces Few Edges Are Enough (FEAE), a GNN-based architecture trained with SSL and Few-Shot Learning (FSL) to better distinguish between false positive anomalies and actual attacks. To maximize the potential of few-shot examples, our model employs a hybrid self-supervised objective that combines the advantages of contrastive-based and reconstruction-based SSL. By leveraging only a minimal number of labeled attack events, represented as attack edges, FEAE achieves competitive performance on two well-known network datasets compared to both supervised and unsupervised methods. Remarkably, our experimental results reveal that employing only one malicious event for each attack type in the dataset is sufficient to achieve substantial improvements. FEAE not only outperforms self-supervised GNN baselines but also surpasses some supervised approaches on one of the datasets.

This is the version of the author, accepted for publication at IWSEC 2024. Published version available at https://link.springer.com/chapter/10.1007/978-981-97-7737-2_15

Exact Computation of Any-Order Shapley Interactions for Graph Neural Networks 2025-01-28
Show

Despite the ubiquitous use of Graph Neural Networks (GNNs) in machine learning (ML) prediction tasks involving graph-structured data, their interpretability remains challenging. In explainable artificial intelligence (XAI), the Shapley Value (SV) is the predominant method to quantify contributions of individual features to an ML model's output. Addressing the limitations of SVs in complex prediction models, Shapley Interactions (SIs) extend the SV to groups of features. In this work, we explain single graph predictions of GNNs with SIs that quantify node contributions and interactions among multiple nodes. By exploiting the GNN architecture, we show that the structure of interactions in node embeddings is preserved for graph prediction. As a result, the exponential complexity of SIs depends only on the receptive fields, i.e., the message-passing ranges determined by the connectivity of the graph and the number of convolutional layers. Based on our theoretical results, we introduce GraphSHAP-IQ, an efficient approach to compute any-order SIs exactly. GraphSHAP-IQ is applicable to popular message passing techniques in conjunction with a linear global pooling and output layer. We showcase that GraphSHAP-IQ substantially reduces the exponential complexity of computing exact SIs on multiple benchmark datasets. Beyond exact computation, we evaluate GraphSHAP-IQ's approximation of SIs on popular GNN architectures and compare it with existing baselines. Lastly, we visualize SIs of real-world water distribution networks and molecule structures using a SI-Graph.

Preprint Version. Accepted at ICLR 2025
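
Note: for readers unfamiliar with the Shapley Value that GraphSHAP-IQ builds on, the brute-force enumeration below computes exact SVs for a small player set. The value function here is a hypothetical stand-in for a masked GNN prediction; it illustrates the quantity being computed, not the paper's efficient algorithm.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions (exponential in len(players)).

    players : list of player ids (e.g. node indices of a graph prediction).
    value   : callable mapping a set of players to a model output.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for coalition in combinations(others, k):
                S = set(coalition)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[p] += weight * (value(S | {p}) - value(S))
    return phi

# hypothetical value function: players 0 and 1 only matter jointly
v = lambda S: 1.0 if {0, 1} <= S else 0.2 * len(S)
print(shapley_values([0, 1, 2, 3], v))
```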

Dynamic Hypergraph Representation for Bone Metastasis Cancer Analysis 2025-01-28
Show

Bone metastasis analysis is a significant challenge in pathology and plays a critical role in determining patient quality of life and treatment strategies. The microenvironment and specific tissue structures are essential for pathologists to predict the primary bone cancer origins and primary bone cancer subtyping. By digitizing bone tissue sections into whole slide images (WSIs) and leveraging deep learning to model slide embeddings, this analysis can be enhanced. However, tumor metastasis involves complex multivariate interactions with diverse bone tissue structures, which traditional WSI analysis methods such as multiple instance learning (MIL) fail to capture. Moreover, graph neural networks (GNNs), limited to modeling pairwise relationships, struggle to represent high-order biological associations. To address these challenges, we propose a dynamic hypergraph neural network (DyHG) that overcomes the edge construction limitations of traditional graph representations by connecting multiple nodes via hyperedges. A low-rank strategy is used to reduce the complexity of parameters in learning hypergraph structures, while a Gumbel-Softmax-based sampling strategy optimizes the patch distribution across hyperedges. An MIL aggregator is then used to derive a graph-level embedding for comprehensive WSI analysis. To evaluate the effectiveness of DyHG, we construct two large-scale datasets for primary bone cancer origins and subtyping classification based on real-world bone metastasis scenarios. Extensive experiments demonstrate that DyHG significantly outperforms state-of-the-art (SOTA) baselines, showcasing its ability to model complex biological interactions and improve the accuracy of bone metastasis analysis.

12 pages, 11 figures
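
Note: the Gumbel-Softmax sampling mentioned above for assigning patches to hyperedges is a standard relaxation of categorical sampling. A minimal NumPy sketch follows; the logits, temperature, and the patch/hyperedge counts are purely illustrative.

```python
import numpy as np

def gumbel_softmax(logits, tau=0.5, rng=None):
    """Relaxed (differentiable) one-hot samples from a categorical distribution."""
    if rng is None:
        rng = np.random.default_rng(0)
    gumbel_noise = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel_noise) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))
    return y / y.sum(axis=-1, keepdims=True)

# toy usage: soft assignment of 4 patches to 3 hyperedges
logits = np.array([[2.0, 0.1, 0.1],
                   [0.1, 1.5, 0.3],
                   [0.2, 0.2, 1.0],
                   [0.5, 0.5, 0.5]])
print(gumbel_softmax(logits, tau=0.5))
```
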
Hypergraph Diffusion for High-Order Recommender Systems 2025-01-28
Show

Recommender systems rely on Collaborative Filtering (CF) to predict user preferences by leveraging patterns in historical user-item interactions. While traditional CF methods primarily focus on learning compact vector embeddings for users and items, graph neural network (GNN)-based approaches have emerged as a powerful alternative, utilizing the structure of user-item interaction graphs to enhance recommendation accuracy. However, existing GNN-based models, such as LightGCN and UltraGCN, often struggle with two major limitations: an inability to fully account for heterophilic interactions, where users engage with diverse item categories, and the over-smoothing problem in multi-layer GNNs, which hinders their ability to model complex, high-order relationships. To address these gaps, we introduce WaveHDNN, an innovative wavelet-enhanced hypergraph diffusion framework. WaveHDNN integrates a Heterophily-aware Collaborative Encoder, designed to capture user-item interactions across diverse categories, with a Multi-scale Group-wise Structure Encoder, which leverages wavelet transforms to effectively model localized graph structures. Additionally, cross-view contrastive learning is employed to maintain robust and consistent representations. Experiments on benchmark datasets validate the efficacy of WaveHDNN, demonstrating its superior ability to capture both heterophilic and localized structural information, leading to improved recommendation performance.

Technical Report
Data Mining in Transportation Networks with Graph Neural Networks: A Review and Outlook 2025-01-28
Show

Data mining in transportation networks (DMTNs) refers to using diverse types of spatio-temporal data for various transportation tasks, including pattern analysis, traffic prediction, and traffic controls. Graph neural networks (GNNs) are essential in many DMTN problems due to their capability to represent spatial correlations between entities. Between 2016 and 2024, the notable applications of GNNs in DMTNs have extended to multiple fields such as traffic prediction and operation. However, existing reviews have primarily focused on traffic prediction tasks. To fill this gap, this study provides a timely and insightful summary of GNNs in DMTNs, highlighting new progress in prediction and operation from academic and industry perspectives since 2023. First, we present and analyze various DMTN problems, followed by classical and recent GNN models. Second, we delve into key works in three areas: (1) traffic prediction, (2) traffic operation, and (3) industry involvement, such as Google Maps, Amap, and Baidu Maps. Along these directions, we discuss new research opportunities based on the significance of transportation problems and data availability. Finally, we compile resources such as data, code, and other learning materials to foster interdisciplinary communication. This review, driven by recent trends in GNNs in DMTN studies since 2023, could democratize abundant datasets and efficient GNN methods for various transportation problems including prediction and operation.

41 pages, 6 figures
Applying Ensemble Models based on Graph Neural Network and Reinforcement Learning for Wind Power Forecasting 2025-01-28
Show

Accurately predicting the wind power output of a wind farm across various time scales utilizing Wind Power Forecasting (WPF) is a critical issue in wind power trading and utilization. The WPF problem remains unresolved due to numerous influencing variables, such as wind speed, temperature, latitude, and longitude. Furthermore, achieving high prediction accuracy is crucial for maintaining electric grid stability and ensuring supply security. In this paper, we model all wind turbines within a wind farm as graph nodes in a graph built by their geographical locations. Accordingly, we propose an ensemble model based on graph neural networks and reinforcement learning (EMGRL) for WPF. Our approach includes: (1) applying graph neural networks to capture the time-series data from neighboring wind farms relevant to the target wind farm; (2) establishing a general state embedding that integrates the target wind farm's data with the historical performance of base models on the target wind farm; (3) ensembling and leveraging the advantages of all base models through an actor-critic reinforcement learning framework for WPF.
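
Note: the abstract above builds a graph over wind turbines from their geographical locations but does not spell out the construction. The sketch below shows one common, assumed choice — a symmetric k-nearest-neighbour adjacency computed from coordinates — as a concrete starting point only.

```python
import numpy as np

def knn_adjacency(coords, k=3):
    """Symmetric k-NN adjacency matrix from 2D coordinates (an assumed construction)."""
    n = len(coords)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # no self-loops
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        adj[i, np.argsort(dists[i])[:k]] = 1   # connect each node to its k closest neighbours
    return np.maximum(adj, adj.T)              # symmetrize into an undirected graph

coords = np.random.default_rng(0).uniform(0.0, 10.0, size=(8, 2))  # 8 hypothetical turbine locations
print(knn_adjacency(coords, k=3))
```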

GCBF+: A Neural Graph Control Barrier Function Framework for Distributed Safe Multi-Agent Control 2025-01-27
Show

Distributed, scalable, and safe control of large-scale multi-agent systems (MAS) is a challenging problem. In this paper, we design a distributed framework for safe multi-agent control in large-scale environments with obstacles, where a large number of agents are required to maintain safety using only local information and reach their goal locations. We introduce a new class of certificates, termed graph control barrier function (GCBF), which are based on the well-established control barrier function theory for safety guarantees and utilize a graph structure for scalable and generalizable distributed control of MAS. We develop a novel theoretical framework to prove the safety of an arbitrary-sized MAS with a single GCBF. We propose a new training framework GCBF+ that uses graph neural networks to parameterize a candidate GCBF and a distributed control policy. The proposed framework is distributed and is capable of taking point clouds from LiDAR, instead of actual state information, for real-world robotic applications. We illustrate the efficacy of the proposed method through various hardware experiments on a swarm of drones with objectives ranging from exchanging positions to docking on a moving target without collision. Additionally, we perform extensive numerical experiments, where the number and density of agents, as well as the number of obstacles, increase. Empirical results show that in complex environments with agents with nonlinear dynamics (e.g., Crazyflie drones), GCBF+ outperforms the hand-crafted CBF-based method with the best performance by up to 20% for relatively small-scale MAS with up to 256 agents, and leading reinforcement learning (RL) methods by up to 40% for MAS with 1024 agents. Furthermore, the proposed method does not compromise on the performance, in terms of goal reaching, for achieving high safety rates, which is a common trade-off in RL-based methods.

20 pages, 15 figures; Accepted by IEEE Transactions on Robotics (T-RO)

Graph Neural Network Based Hybrid Beamforming Design in Wideband Terahertz MIMO-OFDM Systems 2025-01-27
Show

6G wireless technology is projected to adopt higher and wider frequency bands, enabled by highly directional beamforming. However, the vast bandwidths available also make the impact of beam squint in massive multiple input and multiple output (MIMO) systems non-negligible. Traditional approaches such as adding a true-time-delay line (TTD) on each antenna are costly due to the massive antenna arrays required. This paper puts forth a signal processing alternative, specifically adapted to the multicarrier structure of OFDM systems, through an innovative application of Graph Neural Networks (GNNs) to optimize hybrid beamforming. By integrating two types of graph nodes to represent the analog and the digital beamforming matrices efficiently, our approach not only reduces the computational and memory burdens but also achieves high spectral efficiency performance, approaching that of all digital beamforming. The GNN runtime and memory requirement are at a fraction of the processing time and resource consumption of traditional signal processing methods, hence enabling real-time adaptation of hybrid beamforming. Furthermore, the proposed GNN exhibits strong resiliency to beam squinting, achieving almost constant spectral efficiency even as the system bandwidth increases at higher carrier frequencies.

6 pages, 7 figures. This conference paper was published in the 2024 IEEE International Symposium on Phased Array Systems and Technology

From Molecules to Mixtures: Learning Representations of Olfactory Mixture Similarity using Inductive Biases 2025-01-27
Show

Olfaction -- how molecules are perceived as odors to humans -- remains poorly understood. Recently, the principal odor map (POM) was introduced to digitize the olfactory properties of single compounds. However, smells in real life are not pure single molecules, but complex mixtures of molecules, whose representations remain relatively under-explored. In this work, we introduce POMMix, an extension of the POM to represent mixtures. Our representation builds upon the symmetries of the problem space in a hierarchical manner: (1) graph neural networks for building molecular embeddings, (2) attention mechanisms for aggregating molecular representations into mixture representations, and (3) cosine prediction heads to encode olfactory perceptual distance in the mixture embedding space. POMMix achieves state-of-the-art predictive performance across multiple datasets. We also evaluate the generalizability of the representation on multiple splits when applied to unseen molecules and mixture sizes. Our work advances the effort to digitize olfaction, and highlights the synergy of domain expertise and deep learning in crafting expressive representations in low-data regimes.

25 pages, 12 figures
ReFill: Reinforcement Learning for Fill-In Minimization 2025-01-27
Show

Efficiently solving sparse linear systems $Ax=b$, where $A$ is a large, sparse, symmetric positive semi-definite matrix, is a core challenge in scientific computing, machine learning, and optimization. A major bottleneck in Gaussian elimination for these systems is fill-in, the creation of non-zero entries that increase memory and computational cost. Minimizing fill-in is NP-hard, and existing heuristics like Minimum Degree and Nested Dissection offer limited adaptability across diverse problem instances. We introduce \textit{ReFill}, a reinforcement learning framework enhanced by Graph Neural Networks (GNNs) to learn adaptive ordering strategies for fill-in minimization. ReFill trains a GNN-based heuristic to predict efficient elimination orders, outperforming traditional heuristics by dynamically adapting to the structure of input matrices. Experiments demonstrate that ReFill outperforms strong heuristics in reducing fill-in, highlighting the untapped potential of learning-based methods for this well-studied classical problem.
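
Note: to make the fill-in notion concrete, the sketch below implements the classical Minimum Degree heuristic mentioned in the abstract and counts the fill edges an elimination order creates on the sparsity graph of a symmetric matrix. The toy graph is illustrative; ReFill itself learns the ordering instead.

```python
def minimum_degree_order(adj):
    """Greedy Minimum Degree ordering on an undirected graph given as {node: set(neighbours)}.

    Returns the elimination order and the number of fill-in edges it creates.
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    order, fill_in = [], 0
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))        # eliminate a node of minimum degree
        nbrs = adj.pop(v)
        for u in nbrs:
            adj[u].discard(v)
        nbrs = [u for u in nbrs if u in adj]
        for i, a in enumerate(nbrs):                   # eliminating v connects its neighbours pairwise
            for b in nbrs[i + 1:]:
                if b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill_in += 1
        order.append(v)
    return order, fill_in

# toy sparsity pattern: a 5-node path with one extra chord
graph = {0: {1}, 1: {0, 2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(minimum_degree_order(graph))
```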

ScaDyG:A New Paradigm for Large-scale Dynamic Graph Learning 2025-01-27
Show

Dynamic graphs (DGs), which capture time-evolving relationships between graph entities, have widespread real-world applications. To efficiently encode DGs for downstream tasks, most dynamic graph neural networks follow the traditional message-passing mechanism and extend it with time-based techniques. Despite their effectiveness, the growth of historical interactions introduces significant scalability issues, particularly in industry scenarios. To address this limitation, we propose ScaDyG, with the core idea of designing a time-aware scalable learning paradigm as follows: 1) Time-aware Topology Reformulation: ScaDyG first segments historical interactions into time steps (intra and inter) based on dynamic modeling, enabling weight-free and time-aware graph propagation within pre-processing. 2) Dynamic Temporal Encoding: To further achieve fine-grained graph propagation within time steps, ScaDyG integrates temporal encoding through a combination of exponential functions in a scalable manner. 3) Hypernetwork-driven Message Aggregation: After obtaining the propagated features (i.e., messages), ScaDyG utilizes hypernetwork to analyze historical dependencies, implementing node-wise representation by an adaptive temporal fusion. Extensive experiments on 12 datasets demonstrate that ScaDyG performs comparably well or even outperforms other SOTA methods in both node and link-level downstream tasks, with fewer learnable parameters and higher efficiency.

Multi-View Attention Syntactic Enhanced Graph Convolutional Network for Aspect-based Sentiment Analysis 2025-01-27
Show

Aspect-based Sentiment Analysis (ABSA) is the task aimed at predicting the sentiment polarity of aspect words within sentences. Recently, incorporating graph neural networks (GNNs) to capture additional syntactic structure information in the dependency tree derived from syntactic dependency parsing has been proven to be an effective paradigm for boosting ABSA. Despite GNNs enhancing model capability by fusing more types of information, most works only utilize a single topology view of the dependency tree or simply conflate different perspectives of information without distinction, which limits the model performance. To address these challenges, in this paper, we propose a new multi-view attention syntactic enhanced graph convolutional network (MASGCN) that weighs different syntactic information of views using attention mechanisms. Specifically, we first construct distance mask matrices from the dependency tree to obtain multiple subgraph views for GNNs. To aggregate features from different views, we propose a multi-view attention mechanism to calculate the attention weights of views. Furthermore, to incorporate more syntactic information, we fuse the dependency type information matrix into the adjacency matrices and present a structural entropy loss to learn the dependency type adjacency matrix. Comprehensive experiments on four benchmark datasets demonstrate that our model outperforms state-of-the-art methods. The codes and datasets are available at https://github.com/SELGroup/MASGCN.

This paper is accepted by DASFAA 2025

Graph Condensation: A Survey 2025-01-27
Show

The rapid growth of graph data poses significant challenges in storage, transmission, and particularly the training of graph neural networks (GNNs). To address these challenges, graph condensation (GC) has emerged as an innovative solution. GC focuses on synthesizing a compact yet highly representative graph, enabling GNNs trained on it to achieve performance comparable to those trained on the original large graph. The notable efficacy of GC and its broad prospects have garnered significant attention and spurred extensive research. This survey paper provides an up-to-date and systematic overview of GC, organizing existing research into five categories aligned with critical GC evaluation criteria: effectiveness, generalization, efficiency, fairness, and robustness. To facilitate an in-depth and comprehensive understanding of GC, this paper examines various methods under each category and thoroughly discusses two essential components within GC: optimization strategies and condensed graph generation. We also empirically compare and analyze representative GC methods with diverse optimization strategies based on the five proposed GC evaluation criteria. Finally, we explore the applications of GC in various fields, outline the related open-source libraries, and highlight the present challenges and novel insights, with the aim of promoting advancements in future research. The related resources can be found at https://github.com/XYGaoG/Graph-Condensation-Papers.

Transactions on Knowledge and Data Engineering (TKDE) 2025

An Efficient Sparse Kernel Generator for O(3)-Equivariant Deep Networks 2025-01-27
Show

Rotation equivariant graph neural networks, i.e., networks designed to guarantee certain geometric relations between their inputs and outputs, yield state-of-the-art performance on spatial deep learning tasks. They exhibit high data efficiency during training and significantly reduced inference time for interatomic potential calculations compared to classical approaches. Key to these models is the Clebsch-Gordon (CG) tensor product, a kernel that contracts two dense feature vectors with a highly structured sparse tensor to produce a dense output vector. The operation, which may be repeated millions of times for typical equivariant models, is a costly and inefficient bottleneck. We introduce a GPU sparse kernel generator for the CG tensor product that provides significant speedup over the best existing open and closed-source implementations. Our implementation achieves high performance by carefully managing GPU shared memory through static analysis at model compile-time, minimizing reads and writes to global memory. We break the tensor product into a series of kernels with operands that fit entirely into registers, enabling us to emit long arithmetic instruction streams that maximize instruction-level parallelism. By fusing the CG tensor product with a subsequent graph convolution, we reduce both intermediate storage and global memory traffic over naive approaches that duplicate input data. We also provide optimized kernels for the gradient of the CG tensor product and a novel identity for the higher partial derivatives required to predict interatomic forces. Our fused kernels offer up to 4.5x speedup for the forward pass and 3x for the backward pass over NVIDIA cuEquivariance, as well as >10x speedup over the widely-used e3nn package. We offer up to 5.3x inference-time speedup for the MACE chemistry foundation model over the original unoptimized version.

12 pages, 9 figures, 3 tables

Adaptive AI-based Decentralized Resource Management in the Cloud-Edge Continuum 2025-01-27
Show

The increasing complexity of application requirements and the dynamic nature of the Cloud-Edge Continuum present significant challenges for efficient resource management. These challenges stem from the ever-changing infrastructure, which is characterized by additions, removals, and reconfigurations of nodes and links, as well as the variability of application workloads. Traditional centralized approaches struggle to adapt to these changes due to their static nature, while decentralized solutions face challenges such as limited global visibility and coordination overhead. This paper proposes a hybrid decentralized framework for dynamic application placement and resource management. The framework utilizes Graph Neural Networks (GNNs) to embed resource and application states, enabling comprehensive representation and efficient decision-making. It employs a collaborative multi-agent reinforcement learning (MARL) approach, where local agents optimize resource management in their neighborhoods and a global orchestrator ensures system-wide coordination. By combining decentralized application placement with centralized oversight, our framework addresses the scalability, adaptability, and accuracy challenges inherent in the Cloud-Edge Continuum. This work contributes to the development of decentralized application placement strategies, the integration of GNN embeddings, and collaborative MARL systems, providing a foundation for efficient, adaptive and scalable resource management.

GraphICL: Unlocking Graph Learning Potential in LLMs through Structured Prompt Design 2025-01-27
Show

The growing importance of textual and relational systems has driven interest in enhancing large language models (LLMs) for graph-structured data, particularly Text-Attributed Graphs (TAGs), where samples are represented by textual descriptions interconnected by edges. While research has largely focused on developing specialized graph LLMs through task-specific instruction tuning, a comprehensive benchmark for evaluating LLMs solely through prompt design remains surprisingly absent. Without such a carefully crafted evaluation benchmark, most, if not all, tailored graph LLMs are compared against general LLMs using simplistic queries (e.g., zero-shot reasoning with LLaMA), which can potentially camouflage many of their advantages as well as their unexpected shortcomings. To achieve more general evaluations and unveil the true potential of LLMs for graph tasks, we introduce the Graph In-context Learning (GraphICL) Benchmark, a comprehensive benchmark comprising novel prompt templates designed to capture graph structure and handle limited label knowledge. Our systematic evaluation shows that general-purpose LLMs equipped with our GraphICL outperform state-of-the-art specialized graph LLMs and graph neural network models in resource-constrained settings and out-of-domain tasks. These findings highlight the significant potential of prompt engineering to enhance LLM performance on graph learning tasks without training and offer a strong baseline for advancing research in graph LLMs.

Vehicle-group-based Crash Risk Prediction and Interpretation on Highways 2025-01-27
Show

Previous studies in predicting crash risks primarily associated the number or likelihood of crashes on a road segment with traffic parameters or geometric characteristics, usually neglecting the impact of vehicles' continuous movement and interactions with nearby vehicles. Recent technology advances, such as Connected and Automated Vehicles (CAVs) and Unmanned Aerial Vehicles (UAVs) are able to collect high-resolution trajectory data, which enables trajectory-based risk analysis. This study investigates a new vehicle group (VG) based risk analysis method and explores risk evolution mechanisms considering VG features. An impact-based vehicle grouping method is proposed to cluster vehicles into VGs by evaluating their responses to the erratic behaviors of nearby vehicles. The risk of a VG is aggregated based on the risk between each vehicle pair in the VG, measured by inverse Time-to-Collision (iTTC). A Logistic Regression and a Graph Neural Network (GNN) are then employed to predict VG risks using aggregated and disaggregated VG information. Both methods achieve excellent performance with AUC values exceeding 0.93. For the GNN model, GNNExplainer with feature perturbation is applied to identify critical individual vehicle features and their directional impact on VG risks. Overall, this research contributes a new perspective for identifying, predicting, and interpreting traffic risks.

13 pages, 12 figures; vehicle grouping method updated, explainable GNN framework incorporated
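
Note: the inverse Time-to-Collision surrogate used above to measure pairwise risk has a simple closed form, iTTC = max(v_follower - v_leader, 0) / gap. The sketch below just evaluates that formula with toy kinematics; how the paper aggregates pairwise iTTC into a vehicle-group risk is not reproduced here.

```python
def inverse_ttc(gap_m, v_follower_mps, v_leader_mps):
    """Inverse time-to-collision for a follower-leader pair (1/s); higher means riskier."""
    closing_speed = v_follower_mps - v_leader_mps
    return max(closing_speed, 0.0) / gap_m   # 0 when the follower is not closing in

# toy pair: follower at 30 m/s, leader at 25 m/s, 20 m apart -> iTTC = 0.25 1/s
print(inverse_ttc(gap_m=20.0, v_follower_mps=30.0, v_leader_mps=25.0))
```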

Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems 2025-01-26
Show

Preconditioning is at the heart of iterative solutions of large, sparse linear systems of equations in scientific disciplines. Several algebraic approaches, which access no information beyond the matrix itself, are widely studied and used, but ill-conditioned matrices remain very challenging. We take a machine learning approach and propose using graph neural networks as a general-purpose preconditioner. They show attractive performance for many problems and can be used when the mainstream preconditioners perform poorly. Empirical evaluation on over 800 matrices suggests that the construction time of these graph neural preconditioners (GNPs) is more predictable and can be much shorter than that of other widely used ones, such as ILU and AMG, while the execution time is faster than using a Krylov method as the preconditioner, such as in inner-outer GMRES. GNPs have a strong potential for solving large-scale, challenging algebraic problems arising from not only partial differential equations, but also economics, statistics, graph, and optimization, to name a few.

ICLR 2025. Code is available at https://github.com/jiechenjiechen/GNP. (Important update from v1 to v2: Updated the timing experiments and evaluation metrics for fairer and better results.)
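
Note: for context on where a learned preconditioner plugs into an iterative solver, the sketch below wraps a classical ILU factorization as the preconditioner `M` for SciPy's GMRES. The ILU step is only a stand-in for the graph neural preconditioner described in the paper; the plumbing (a `LinearOperator` passed as `M`) is the part being illustrated.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# small sparse test system (illustrative only)
n = 200
A = sp.diags([-1.0, 2.2, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# classical ILU stands in here for a learned graph neural preconditioner
ilu = spla.spilu(A)
M = spla.LinearOperator((n, n), matvec=ilu.solve)   # r -> approximate A^{-1} r

x, info = spla.gmres(A, b, M=M)
print("gmres info:", info, "residual:", np.linalg.norm(A @ x - b))
```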

Field-level simulation-based inference with galaxy catalogs: the impact of systematic effects 2025-01-26
Show

It has been recently shown that a powerful way to constrain cosmological parameters from galaxy redshift surveys is to train graph neural networks to perform field-level likelihood-free inference without imposing cuts on scale. In particular, de Santi et al. (2023) developed models that could accurately infer the value of $\Omega_{\rm m}$ from catalogs that only contain the positions and radial velocities of galaxies that are robust to uncertainties in astrophysics and subgrid models. However, observations are affected by many effects, including 1) masking, 2) uncertainties in peculiar velocities and radial distances, and 3) different galaxy selections. Moreover, observations only allow us to measure redshift, intertwining galaxies' radial positions and velocities. In this paper we train and test our models on galaxy catalogs, created from thousands of state-of-the-art hydrodynamic simulations run with different codes from the CAMELS project, that incorporate these observational effects. We find that, although the presence of these effects degrades the precision and accuracy of the models, and increases the fraction of catalogs where the model breaks down, the fraction of galaxy catalogs where the model performs well is over 90 %, demonstrating the potential of these models to constrain cosmological parameters even when applied to real data.

36 pages, 12 figures. For the reference in the abstract see: de Santi et al. 2023, arXiv:2302.14101

Distributionally Robust Graph Out-of-Distribution Recommendation via Diffusion Model 2025-01-26
Show

The distributionally robust optimization (DRO)-based graph neural network methods improve recommendation systems' out-of-distribution (OOD) generalization by optimizing the model's worst-case performance. However, these studies fail to consider the impact of noisy samples in the training data, which results in diminished generalization capabilities and lower accuracy. Through experimental and theoretical analysis, this paper reveals that current DRO-based graph recommendation methods assign greater weight to the noise distribution, leading to model parameter learning being dominated by it. When the model overly focuses on fitting noise samples in the training data, it may learn irrelevant or meaningless features that cannot be generalized to OOD data. To address this challenge, we design a Distributionally Robust Graph model for OOD recommendation (DRGO). Specifically, our method first employs a simple and effective diffusion paradigm to alleviate the noisy effect in the latent space. Additionally, an entropy regularization term is introduced in the DRO objective function to avoid extreme sample weights in the worst-case distribution. Finally, we provide a theoretical proof of the generalization error bound of DRGO as well as a theoretical analysis of how our approach mitigates noisy sample effects, which helps to better understand the proposed framework from a theoretical perspective. We conduct extensive experiments on four datasets to evaluate the effectiveness of our framework against three typical distribution shifts, and the results demonstrate its superiority under both independent and identically distributed (IID) and OOD settings.

14 pages, Accepted by WWW'25

Graph Representation Learning via Causal Diffusion for Out-of-Distribution Recommendation 2025-01-26
Show

Graph Neural Networks (GNNs)-based recommendation algorithms typically assume that training and testing data are drawn from independent and identically distributed (IID) spaces. However, this assumption often fails in the presence of out-of-distribution (OOD) data, resulting in significant performance degradation. In this study, we construct a Structural Causal Model (SCM) to analyze interaction data, revealing that environmental confounders (e.g., the COVID-19 pandemic) lead to unstable correlations in GNN-based models, thus impairing their generalization to OOD data. To address this issue, we propose a novel approach, graph representation learning via causal diffusion (CausalDiffRec) for OOD recommendation. This method enhances the model's generalization on OOD data by eliminating environmental confounding factors and learning invariant graph representations. Specifically, we use backdoor adjustment and variational inference to infer the real environmental distribution, thereby eliminating the impact of environmental confounders. This inferred distribution is then used as prior knowledge to guide the representation learning in the reverse phase of the diffusion process to learn the invariant representation. In addition, we provide a theoretical derivation that proves optimizing the objective function of CausalDiffRec can encourage the model to learn environment-invariant graph representations, thereby achieving excellent generalization performance in recommendations under distribution shifts. Our extensive experiments validate the effectiveness of CausalDiffRec in improving the generalization of OOD data, and the average improvement is up to 10.69% on Food, 18.83% on KuaiRec, 22.41% on Yelp2018, and 11.65% on Douban datasets.

14 pages, accepted by WWW2025

Mamba-Based Graph Convolutional Networks: Tackling Over-smoothing with Selective State Space 2025-01-26
Show

Graph Neural Networks (GNNs) have shown great success in various graph-based learning tasks. However, they often face the issue of over-smoothing as model depth increases, which causes all node representations to converge to a single value and become indistinguishable. This issue stems from the inherent limitations of GNNs, which struggle to distinguish the importance of information from different neighborhoods. In this paper, we introduce MbaGCN, a novel graph convolutional architecture that draws inspiration from the Mamba paradigm, originally designed for sequence modeling. MbaGCN presents a new backbone for GNNs, consisting of three key components: the Message Aggregation Layer, the Selective State Space Transition Layer, and the Node State Prediction Layer. These components work in tandem to adaptively aggregate neighborhood information, providing greater flexibility and scalability for deep GNN models. While MbaGCN may not consistently outperform all existing methods on each dataset, it provides a foundational framework that demonstrates the effective integration of the Mamba paradigm into graph representation learning. Through extensive experiments on benchmark datasets, we demonstrate that MbaGCN paves the way for future advancements in graph neural network research.

11 pages, 4 figures
An Aspect Performance-aware Hypergraph Neural Network for Review-based Recommendation 2025-01-26
Show

Online reviews allow consumers to provide detailed feedback on various aspects of items. Existing methods utilize these aspects to model users' fine-grained preferences for specific item features through graph neural networks. We argue that the performance of items on different aspects is important for making precise recommendations, which has not been taken into account by existing approaches, due to lack of data. In this paper, we propose an aspect performance-aware hypergraph neural network (APH) for the review-based recommendation, which learns the performance of items from the conflicting sentiment polarity of user reviews. Specifically, APH comprehensively models the relationships among users, items, aspects, and sentiment polarity by systematically constructing an aspect hypergraph based on user reviews. In addition, APH aggregates aspects representing users and items by employing an aspect performance-aware hypergraph aggregation method. It aggregates the sentiment polarities from multiple users by jointly considering user preferences and the semantics of their sentiments, determining the weights of sentiment polarities to infer the performance of items on various aspects. Such performances are then used as weights to aggregate neighboring aspects. Experiments on six real-world datasets demonstrate that APH improves MSE, Precision@5, and Recall@5 by an average of 2.30%, 4.89%, and 1.60% over the best baseline. The source code and data are available at https://github.com/dianziliu/APH.

12 pages, accepted by WSDM'25

MAPPING: Debiasing Graph Neural Networks for Fair Node Classification with Limited Sensitive Information Leakage 2025-01-26
Show

Despite remarkable success in diverse web-based applications, Graph Neural Networks (GNNs) inherit and further exacerbate historical discrimination and social stereotypes, which critically hinder their deployment in high-stakes domains such as online clinical diagnosis, financial crediting, etc. However, current fairness research, which is primarily crafted for i.i.d. data, cannot be trivially replicated on non-i.i.d. graph structures with topological dependence among samples. Existing fair graph learning typically favors pairwise constraints to achieve fairness but fails to cast off dimensional limitations and generalize to multiple sensitive attributes; besides, most studies focus on in-processing techniques to enforce and calibrate fairness, while constructing a model-agnostic debiasing GNN framework at the pre-processing stage to prevent downstream misuse and improve training reliability remains largely under-explored. Furthermore, previous work on GNNs tends to enhance either fairness or privacy individually, but few studies probe into their interplay. In this paper, we propose a novel model-agnostic debiasing framework named MAPPING (Masking And Pruning and Message-Passing trainING) for fair node classification, in which we adopt distance covariance (dCov)-based fairness constraints to simultaneously reduce feature and topology biases in arbitrary dimensions, and combine them with adversarial debiasing to confine the risks of attribute inference attacks. Experiments on real-world datasets with different GNN variants demonstrate the effectiveness and flexibility of MAPPING. Our results show that MAPPING can achieve better trade-offs between utility and fairness, and mitigate privacy risks of sensitive information leakage.

Accepted by WWW Journal. Code is available at https://github.com/yings0930/MAPPING
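
Note: the distance covariance (dCov) used above as a fairness constraint has a standard sample estimator based on double-centred pairwise distance matrices. The NumPy sketch below computes that statistic only; the masking, pruning, and adversarial components of MAPPING are not reproduced.

```python
import numpy as np

def distance_covariance(x, y):
    """Sample distance covariance between paired samples x (n, p) and y (n, q)."""
    def centred_dist(z):
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)   # pairwise distances
        return d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean()
    A, B = centred_dist(x), centred_dist(y)
    return np.sqrt(np.maximum(np.mean(A * B), 0.0))

# toy check: a dependent pair yields a clearly larger dCov than an independent one
rng = np.random.default_rng(0)
s = rng.normal(size=(500, 1))                                        # e.g. a sensitive attribute
print(distance_covariance(s, s + 0.1 * rng.normal(size=(500, 1))))   # dependent
print(distance_covariance(s, rng.normal(size=(500, 1))))             # independent
```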

ReInc: Scaling Training of Dynamic Graph Neural Networks 2025-01-25
Show

Dynamic Graph Neural Networks (DGNNs) have gained widespread attention due to their applicability in diverse domains such as traffic network prediction, epidemiological forecasting, and social network analysis. In this paper, we present ReInc, a system designed to enable efficient and scalable training of DGNNs on large-scale graphs. ReInc introduces key innovations that capitalize on the unique combination of Graph Neural Networks (GNNs) and Recurrent Neural Networks (RNNs) inherent in DGNNs. By reusing intermediate results and incrementally computing aggregations across consecutive graph snapshots, ReInc significantly enhances computational efficiency. To support these optimizations, ReInc incorporates a novel two-level caching mechanism with a specialized caching policy aligned to the DGNN execution workflow. Additionally, ReInc addresses the challenges of managing structural and temporal dependencies in dynamic graphs through a new distributed training strategy. This approach eliminates communication overheads associated with accessing remote features and redistributing intermediate results. Experimental results demonstrate that ReInc achieves up to an order of magnitude speedup compared to state-of-the-art frameworks, tested across various dynamic GNN architectures and real-world graph datasets.

Data Center Cooling System Optimization Using Offline Reinforcement Learning 2025-01-25
Show

The recent advances in information technology and artificial intelligence have fueled a rapid expansion of the data center (DC) industry worldwide, accompanied by an immense appetite for electricity to power the DCs. In a typical DC, around 30–40% of the energy is spent on the cooling system rather than on computer servers, posing a pressing need for developing new energy-saving optimization technologies for DC cooling systems. However, optimizing such real-world industrial systems faces numerous challenges, including but not limited to a lack of reliable simulation environments, limited historical data, and stringent safety and control robustness requirements. In this work, we present a novel physics-informed offline reinforcement learning (RL) framework for energy efficiency optimization of DC cooling systems. The proposed framework models the complex dynamical patterns and physical dependencies inside a server room using a purposely designed graph neural network architecture that is compliant with the fundamental time-reversal symmetry. Because of its well-behaved and generalizable state-action representations, the model enables sample-efficient and robust latent space offline policy learning using limited real-world operational data. Our framework has been successfully deployed and verified in a large-scale production DC for closed-loop control of its air-cooling units (ACUs). We conducted a total of 2000 hours of short and long-term experiments in the production DC environment. The results show that our method achieves 14–21% energy savings in the DC cooling system, without any violation of the safety or operational constraints. Our results have demonstrated the significant potential of offline RL in solving a broad range of data-limited, safety-critical real-world industrial control problems.

Accepted in ICLR 2025

Utilizing Graph Neural Networks for Effective Link Prediction in Microservice Architectures 2025-01-25
Show

Managing microservice architectures in distributed systems is complex and resource-intensive due to the high frequency and dynamic nature of inter-service interactions. Accurate prediction of these future interactions can enhance adaptive monitoring, enabling proactive maintenance and resolution of potential performance issues before they escalate. This study introduces a Graph Neural Network (GNN)-based approach, specifically using a Graph Attention Network (GAT), for link prediction in microservice Call Graphs. Unlike social networks, where interactions tend to occur sporadically and are often less frequent, microservice Call Graphs involve highly frequent and time-sensitive interactions that are essential to operational performance. Our approach leverages temporal segmentation, advanced negative sampling, and GAT's attention mechanisms to model these complex interactions accurately. Using real-world data, we evaluate our model across performance metrics such as AUC, Precision, Recall, and F1 Score, demonstrating its high accuracy and robustness in predicting microservice interactions. Our findings support the potential of GNNs for proactive monitoring in distributed systems, paving the way for applications in adaptive resource management and performance optimization.

Accepted for presentation and publication at the ICPE 2025 conference

Personalized Layer Selection for Graph Neural Networks 2025-01-24
Show

Graph Neural Networks (GNNs) combine node attributes over a fixed granularity of the local graph structure around a node to predict its label. However, different nodes may relate to a node-level property with a different granularity of its local neighborhood, and using the same level of smoothing for all nodes can be detrimental to their classification. In this work, we challenge the common assumption that a single GNN layer can classify all nodes of a graph, by training GNNs with a distinct personalized layer for each node. Inspired by metric learning, we propose a novel algorithm, MetSelect1, to select the optimal representation layer to classify each node. In particular, we identify a prototype representation of each class in a transformed GNN layer and then classify each node using the layer where its distance to a class prototype, normalized by that layer's variance, is smallest. Results on 10 datasets and 3 different GNNs show that we significantly improve the node classification accuracy of GNNs in a plug-and-play manner. We also find that using variable layers for prediction enables GNNs to be deeper and more robust to poisoning attacks. We hope this work can inspire future works to learn more adaptive and personalized graph representations.
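
Note: a schematic reading of the layer-selection rule described above — per node, pick the layer whose variance-normalized distance to a class prototype is smallest — is sketched below on precomputed per-layer embeddings. It illustrates the idea only, not the authors' MetSelect code; the normalization choice and the toy data are assumptions.

```python
import numpy as np

def select_layer_and_predict(layer_embs, labels, train_mask):
    """layer_embs: list of (N, d_l) arrays, one per GNN layer (assumed precomputed)."""
    classes = np.unique(labels[train_mask])
    n = layer_embs[0].shape[0]
    best_dist, pred = np.full(n, np.inf), np.zeros(n, dtype=labels.dtype)
    for H in layer_embs:
        # class prototypes from training nodes at this layer
        protos = np.stack([H[train_mask][labels[train_mask] == c].mean(0) for c in classes])
        d = np.linalg.norm(H[:, None, :] - protos[None, :, :], axis=-1) / (H.std() + 1e-8)
        layer_pred, layer_dist = classes[d.argmin(1)], d.min(1)
        better = layer_dist < best_dist          # keep the layer with the closest prototype per node
        pred[better], best_dist[better] = layer_pred[better], layer_dist[better]
    return pred

# toy usage: 20 nodes, 2 classes, two random "layers" of embeddings
rng = np.random.default_rng(0)
labels = np.tile([0, 1], 10)
mask = np.zeros(20, dtype=bool); mask[:10] = True
print(select_layer_and_predict([rng.normal(size=(20, 8)), rng.normal(size=(20, 8))], labels, mask))
```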

FIT-GNN: Faster Inference Time for GNNs Using Coarsening 2025-01-24
Show

Scalability of Graph Neural Networks (GNNs) remains a significant challenge, particularly when dealing with large-scale graphs. To tackle this, coarsening-based methods are used to reduce the graph into a smaller graph, resulting in faster computation. Nonetheless, prior research has not adequately addressed the computational costs during the inference phase. This paper presents a novel approach to improve the scalability of GNNs by reducing computational burden during both training and inference phases. We demonstrate two different methods (Extra-Nodes and Cluster-Nodes). Our study also proposes a unique application of the coarsening algorithm for graph-level tasks, including graph classification and graph regression, which have not yet been explored. We conduct extensive experiments on multiple benchmark datasets in the order of $100K$ nodes to evaluate the performance of our approach. The results demonstrate that our method achieves competitive performance in tasks involving classification and regression on nodes and graphs, compared to traditional GNNs, while having single-node inference times that are orders of magnitude faster. Furthermore, our approach significantly reduces memory consumption, allowing training and inference on low-resource devices where traditional methods struggle.

Signed Graph Autoencoder for Explainable and Polarization-Aware Network Embeddings 2025-01-24
Show

Autoencoders based on Graph Neural Networks (GNNs) have garnered significant attention in recent years for their ability to extract informative latent representations, characterizing the structure of complex topologies, such as graphs. Despite the prevalence of Graph Autoencoders, there has been limited focus on developing and evaluating explainable neural-based graph generative models specifically designed for signed networks. To address this gap, we propose the Signed Graph Archetypal Autoencoder (SGAAE) framework. SGAAE extracts node-level representations that express node memberships over distinct extreme profiles, referred to as archetypes, within the network. This is achieved by projecting the graph onto a learned polytope, which governs its polarization. The framework employs a recently proposed likelihood for analyzing signed networks based on the Skellam distribution, combined with relational archetypal analysis and GNNs. Our experimental evaluation demonstrates SGAAE's capability to successfully infer node memberships over the different underlying latent structures while extracting competing communities formed through the participation of the opposing views in the network. Additionally, we introduce the 2-level network polarization problem and show how SGAAE is able to characterize such a setting. The proposed model achieves high performance in different tasks of signed link prediction across four real-world datasets, outperforming several baseline models.

Preprint
Integrating Physics Inspired Features with Graph Convolution 2025-01-24
Show

With the advent of advanced machine learning techniques, boosted object tagging has witnessed significant progress. In this article, we take this field further by introducing novel architectural modifications compatible with a wide array of Graph Neural Network (GNN) architectures. Our approach advocates for integrating capsule layers, replacing the conventional decoding blocks in standard GNNs. These capsules are groups of neurons with vector activations. The orientation of these vectors represents important properties of the objects under study, with their magnitude characterizing whether the object under study belongs to the class represented by the capsule. Moreover, capsule networks incorporate a regularization-by-reconstruction mechanism, facilitating the seamless integration of expert-designed high-level features into the analysis. We have studied the usefulness of our architecture with the LorentzNet architecture for quark-gluon tagging. Here, we have replaced the decoding block of LorentzNet with a capsulated decoding block and have called the resulting architecture CapsLorentzNet. Our new architecture can enhance the performance of LorentzNet by 20% for the quark-gluon tagging task.

16 pages, 3 figures
An Attentive Graph Agent for Topology-Adaptive Cyber Defence 2025-01-24
Show

As cyber threats grow increasingly sophisticated, reinforcement learning is emerging as a promising technique to create intelligent, self-improving defensive systems. However, most existing autonomous defensive agents have overlooked the inherent graph structure of computer networks subject to cyber attacks, potentially missing critical information. To address this gap, we developed a custom version of the Cyber Operations Research Gym (CybORG) environment that encodes the observable network state as a directed graph, utilizing realistic and interpretable low-level features, such as the number of open ports and unexpectedly detected connections. We leverage a Graph Attention Network (GAT) architecture to process node, edge, and global features, and modify its output to be compatible with policy gradient methods in reinforcement learning. GAT policies offer several advantages over standard approaches based on simplistic flattened state observations. They can handle the changes in network topology that occur at runtime when dynamic connections between hosts appear. Policies can be deployed to networks that differ in size from the ones seen during training, enabling a degree of generalisation inaccessible with alternative approaches. Furthermore, the graph neural network policies' outputs are explainable in terms of tangible network properties, providing enhanced interpretability of defensive actions. We verify that our low-level graph observations are meaningful enough to train GAT defensive policies that are able to adapt to changing topologies. We evaluate how our trained policies perform when deployed on networks of varying sizes with the same subnetwork structure, comparing them against policies specifically trained for each network configuration. Our study contributes to the development of robust cyber defence systems that can better adapt to real-world network security challenges.

On the Homophily of Heterogeneous Graphs: Understanding and Unleashing 2025-01-24
Show

Homophily, the tendency of similar nodes to connect, is a fundamental phenomenon in network science and a critical factor in the performance of graph neural networks (GNNs). While existing studies primarily explore homophily in homogeneous graphs, where nodes share the same type, real-world networks are often more accurately modeled as heterogeneous graphs (HGs) with diverse node types and intricate cross-type interactions. This structural diversity complicates the analysis of homophily, as traditional homophily metrics fail to account for distinct label spaces across node types. To address this limitation, we introduce the Cross-Type Homophily Ratio, a novel metric that quantifies homophily based on the similarity of target information across different node types. Furthermore, we introduce Cross-Type Homophily-guided Heterogeneous Graph Pruning, a method designed to selectively remove low-homophily cross-type edges, thereby enhancing the Cross-Type Homophily Ratio and boosting the performance of heterogeneous graph neural networks (HGNNs). Extensive experiments on five real-world HG datasets validate the effectiveness of our approach, which delivers up to 13.36% average relative performance improvement for HGNNs, offering a fresh perspective on cross-type homophily in heterogeneous graph learning.
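
Note: as a reference point for the metric introduced above, the sketch below computes a plain edge-homophily ratio restricted to cross-type edges (the fraction of such edges whose endpoints share a label). The paper's Cross-Type Homophily Ratio is defined on the similarity of target information rather than raw labels, so this is a deliberate simplification.

```python
def cross_type_edge_homophily(edges, labels, node_types):
    """Fraction of cross-type edges whose endpoints carry the same label (simplified metric).

    edges      : iterable of (u, v) pairs
    labels     : dict node -> label
    node_types : dict node -> node type
    """
    cross = [(u, v) for u, v in edges if node_types[u] != node_types[v]]
    if not cross:
        return 0.0
    return sum(labels[u] == labels[v] for u, v in cross) / len(cross)

# toy heterogeneous graph with author ('a') and paper ('p') nodes
edges = [(0, 2), (0, 3), (1, 3), (1, 2), (0, 1)]
types = {0: "a", 1: "a", 2: "p", 3: "p"}
labels = {0: "ML", 1: "DB", 2: "ML", 3: "DB"}
print(cross_type_edge_homophily(edges, labels, types))   # 0.5 on this toy graph
```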

Disentangled Condensation for Large-scale Graphs 2025-01-24
Show

Graph condensation has emerged as an intriguing technique to save the expensive training costs of Graph Neural Networks (GNNs) by substituting a condensed small graph with the original graph. Despite the promising results achieved, previous methods usually employ an entangled paradigm of redundant parameters (nodes, edges, GNNs), which incurs complex joint optimization during condensation. This paradigm has considerably impeded the scalability of graph condensation, making it challenging to condense extremely large-scale graphs and generate high-fidelity condensed graphs. Therefore, we propose to disentangle the condensation process into a two-stage GNN-free paradigm, independently condensing nodes and generating edges while eliminating the need to optimize GNNs at the same time. The node condensation module avoids the complexity of GNNs by focusing on node feature alignment with anchors of the original graph, while the edge translation module constructs the edges of the condensed nodes by transferring the original structure knowledge with neighborhood anchors. This simple yet effective approach is at least 10 times faster than state-of-the-art methods while achieving comparable accuracy on medium-scale graphs. Moreover, the proposed DisCo can successfully scale up to the Ogbn-papers100M graph containing over 100 million nodes with flexible reduction rates and improves performance on the second-largest Ogbn-products dataset by over 5%. Extensive downstream tasks and ablation studies on five common datasets further demonstrate the effectiveness of the proposed DisCo framework. Our code is available at https://github.com/BangHonor/DisCo.

Backdoor Attack on Vertical Federated Graph Neural Network Learning 2025-01-24
Show

Federated Graph Neural Networks (FedGNN) integrate federated learning (FL) with graph neural networks (GNNs) to enable privacy-preserving training on distributed graph data. Vertical Federated Graph Neural Network (VFGNN), a key branch of FedGNN, handles scenarios where data features and labels are distributed among participants. Despite the robust privacy-preserving design of VFGNN, we have found that it still faces the risk of backdoor attacks, even in situations where labels are inaccessible. This paper proposes BVG, a novel backdoor attack method that leverages multi-hop triggers and backdoor retention, requiring only four target-class nodes to execute effective attacks. Experimental results demonstrate that BVG achieves nearly 100% attack success rates across three commonly used datasets and three GNN models, with minimal impact on the main task accuracy. We also evaluated various defense methods, and the BVG method maintained high attack effectiveness even under existing defenses. This finding highlights the need for advanced defense mechanisms to counter sophisticated backdoor attacks in practical VFGNN applications.

Polynomial Selection in Spectral Graph Neural Networks: An Error-Sum of Function Slices Approach 2025-01-24
Show

Spectral graph neural networks are proposed to harness spectral information inherent in graph-structured data through the application of polynomial-defined graph filters, recently achieving notable success in graph-based web applications. Existing studies reveal that various polynomial choices greatly impact spectral GNN performance, underscoring the importance of polynomial selection. However, this selection process remains a critical and unresolved challenge. Although prior work suggests a connection between the approximation capabilities of polynomials and the efficacy of spectral GNNs, there is a lack of theoretical insights into this relationship, rendering polynomial selection a largely heuristic process. To address the issue, this paper examines polynomial selection from an error-sum of function slices perspective. Inspired by conventional signal decomposition, we represent graph filters as a sum of disjoint function slices. Building on this, we then bridge polynomial capability and spectral GNN efficacy by proving that the construction error of the graph convolution layer is bounded by the sum of polynomial approximation errors on function slices. This result leads us to develop an advanced filter based on trigonometric polynomials, a widely adopted option for approximating narrow signal slices. The proposed filter retains provable parameter efficiency, with a novel Taylor-based parameter decomposition that achieves a streamlined, effective implementation. With this foundation, we propose TFGNN, a scalable spectral GNN operating in a decoupled paradigm. We validate the efficacy of TFGNN via benchmark node classification tasks, along with an example graph anomaly detection application to show its practical utility.

Accepted in ACM The Web Conference 2025, WWW 2025

Convergence of gradient based training for linear Graph Neural Networks 2025-01-24
Show

Graph Neural Networks (GNNs) are powerful tools for addressing learning problems on graph structures, with a wide range of applications in molecular biology and social networks. However, the theoretical foundations underlying their empirical performance are not well understood. In this article, we examine the convergence of gradient dynamics in the training of linear GNNs. Specifically, we prove that the gradient flow training of a linear GNN with mean squared loss converges to the global minimum at an exponential rate. The convergence rate depends explicitly on the initial weights and the graph shift operator, which we validate on synthetic datasets from well-known graph models and real-world datasets. Furthermore, we discuss the gradient flow that minimizes the total weights at the global minimum. In addition to the gradient flow, we study the convergence of linear GNNs under gradient descent training, an iterative scheme viewed as a discretization of gradient flow.
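A small synthetic experiment along these lines, assuming numpy; the shift operator, data, and learning rate are stand-ins used only to illustrate the (near-)exponential decay of the mean-squared loss under gradient descent for a two-layer linear GNN.

```python
# Gradient descent on a two-layer *linear* GNN: out = S X W1 W2, mean-squared loss.
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 20, 5, 4
A = (rng.random((n, n)) < 0.2).astype(float)
A = np.maximum(A, A.T)                         # synthetic undirected graph
S = A / (A.sum(1, keepdims=True) + 1e-9)       # a simple row-normalised graph shift operator
X = rng.normal(size=(n, d))
Y = S @ X @ rng.normal(size=(d, 1))            # targets realisable by a linear GNN

W1 = 0.1 * rng.normal(size=(d, h))
W2 = 0.1 * rng.normal(size=(h, 1))
Z, lr = S @ X, 0.05
for t in range(2001):
    err = Z @ W1 @ W2 - Y
    G = Z.T @ err * (2.0 / n)                  # gradient of the MSE pulled back through Z
    gW1, gW2 = G @ W2.T, W1.T @ G
    W1, W2 = W1 - lr * gW1, W2 - lr * gW2
    if t % 500 == 0:
        print(t, float((err ** 2).mean()))     # the loss typically decays (near-)exponentially
```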

27 pages, 8 figures
GraFPrint: A GNN-Based Approach for Audio Identification 2025-01-24
Show

This paper introduces GraFPrint, an audio identification framework that leverages the structural learning capabilities of Graph Neural Networks (GNNs) to create robust audio fingerprints. Our method constructs a k-nearest neighbor (k-NN) graph from time-frequency representations and applies max-relative graph convolutions to encode local and global information. The network is trained using a self-supervised contrastive approach, which enhances resilience to ambient distortions by optimizing feature representation. GraFPrint demonstrates superior performance on large-scale datasets at various levels of granularity, proving to be both lightweight and scalable, making it suitable for real-world applications with extensive reference databases.
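A minimal sketch of the graph-construction step, assuming scikit-learn; the choice of node features and neighbourhood size are illustrative assumptions.

```python
# Build a k-NN graph from a time-frequency representation: each frame becomes a node
# whose feature is its spectral column, with edges to its k nearest frames in feature space.
import numpy as np
from sklearn.neighbors import kneighbors_graph

def spectrogram_to_knn_graph(spec, k=8):
    """spec: (freq_bins, frames) magnitude spectrogram. Returns node features and a COO edge list."""
    nodes = spec.T                                  # (frames, freq_bins) node feature matrix
    adj = kneighbors_graph(nodes, n_neighbors=k, mode="connectivity", include_self=False)
    edge_index = np.vstack(adj.nonzero())           # (2, num_edges)
    return nodes, edge_index
```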

Submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025)

MeshMask: Physics-Based Simulations with Masked Graph Neural Networks 2025-01-24
Show

We introduce a novel masked pre-training technique for graph neural networks (GNNs) applied to computational fluid dynamics (CFD) problems. By randomly masking up to 40% of input mesh nodes during pre-training, we force the model to learn robust representations of complex fluid dynamics. We pair this masking strategy with an asymmetric encoder-decoder architecture and gated multi-layer perceptrons to further enhance performance. The proposed method achieves state-of-the-art results on seven CFD datasets, including a new challenging dataset of 3D intracranial aneurysm simulations with over 250,000 nodes per mesh. Moreover, it significantly improves model performance and training efficiency across such a diverse range of fluid simulation tasks. We demonstrate improvements of up to 60% in long-term prediction accuracy compared to previous best models, while maintaining similar computational costs. Notably, our approach enables effective pre-training on multiple datasets simultaneously, significantly reducing the time and data required to achieve high performance on new tasks. Through extensive ablation studies, we provide insights into the optimal masking ratio, architectural choices, and training strategies.
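A minimal sketch of the node-masking idea, assuming PyTorch; tensor shapes, the mask token, and the reconstruction objective in the trailing comment are illustrative assumptions.

```python
# Hide a random fraction of mesh-node inputs during pre-training and ask the
# network to reconstruct them.
import torch

def mask_mesh_nodes(node_features, mask_ratio=0.4, mask_token=None):
    """node_features: (num_nodes, feat_dim). Returns masked features and the boolean mask."""
    num_nodes = node_features.size(0)
    mask = torch.rand(num_nodes) < mask_ratio
    masked = node_features.clone()
    if mask_token is None:
        mask_token = torch.zeros(node_features.size(1))
    masked[mask] = mask_token
    return masked, mask

# Pre-training objective (sketch): reconstruct only the masked nodes, e.g.
# loss = ((decoder(encoder(masked, edge_index))[mask] - node_features[mask]) ** 2).mean()
```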

Adaptive Progressive Attention Graph Neural Network for EEG Emotion Recognition 2025-01-24
Show

In recent years, numerous neuroscientific studies have shown that human emotions are closely linked to specific brain regions, with these regions exhibiting variability across individuals and emotional states. To fully leverage these neural patterns, we propose an Adaptive Progressive Attention Graph Neural Network (APAGNN), which dynamically captures the spatial relationships among brain regions during emotional processing. The APAGNN employs three specialized experts that progressively analyze brain topology. The first expert captures global brain patterns, the second focuses on region-specific features, and the third examines emotion-related channels. This hierarchical approach enables increasingly refined analysis of neural activity. Additionally, a weight generator integrates the outputs of all three experts, balancing their contributions to produce the final predictive label. Extensive experiments on three publicly available datasets (SEED, SEED-IV and MPED) demonstrate that the proposed method enhances EEG emotion recognition performance, achieving superior results compared to baseline methods.

Top Ten Challenges Towards Agentic Neural Graph Databases 2025-01-24
Show

Graph databases (GDBs) like Neo4j and TigerGraph excel at handling interconnected data but lack advanced inference capabilities. Neural Graph Databases (NGDBs) address this by integrating Graph Neural Networks (GNNs) for predictive analysis and reasoning over incomplete or noisy data. However, NGDBs rely on predefined queries and lack autonomy and adaptability. This paper introduces Agentic Neural Graph Databases (Agentic NGDBs), which extend NGDBs with three core functionalities: autonomous query construction, neural query execution, and continuous learning. We identify ten key challenges in realizing Agentic NGDBs, including semantic unit representation, abductive reasoning, scalable query execution, and integration with foundation models such as large language models (LLMs). By addressing these challenges, Agentic NGDBs can enable intelligent, self-improving systems for modern data-driven applications, paving the way for adaptable and autonomous data management solutions.

12 Pages
Motif-aware Attribute Masking for Molecular Graph Pre-training 2025-01-24
Show

Attribute reconstruction is used to predict node or edge features in the pre-training of graph neural networks. Given a large number of molecules, they learn to capture structural knowledge, which is transferable for various downstream property prediction tasks and vital in chemistry, biomedicine, and material science. Previous strategies that randomly select nodes to do attribute masking leverage the information of local neighbors. However, over-reliance on these neighbors inhibits the model's ability to learn from higher-level substructures. For example, the model would learn little from predicting three carbon atoms in a benzene ring based on the other three but could learn more from the inter-connections between the functional groups, also called chemical motifs. In this work, we propose and investigate motif-aware attribute masking strategies to capture inter-motif structures by leveraging the information of atoms in neighboring motifs. Once each graph is decomposed into disjoint motifs, the features for every node within a sample motif are masked. The graph decoder then predicts the masked features of each node within the motif for reconstruction. We evaluate our approach on eight molecular property prediction datasets and demonstrate its advantages.
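A minimal sketch of motif-level masking, assuming PyTorch and that a motif assignment per node is already available (e.g. from a chemistry-aware decomposition); all names here are illustrative assumptions.

```python
# Mask the features of *every* node inside each sampled motif, rather than random nodes.
import torch

def motif_attribute_mask(node_features, node_to_motif, motif_ratio=0.15):
    """node_features: (num_nodes, feat_dim); node_to_motif: list of motif ids, one per node."""
    motifs = list(set(node_to_motif))
    num_sampled = max(1, int(motif_ratio * len(motifs)))
    sampled = set(torch.randperm(len(motifs))[:num_sampled].tolist())
    sampled_motifs = {motifs[i] for i in sampled}
    mask = torch.tensor([m in sampled_motifs for m in node_to_motif])
    masked = node_features.clone()
    masked[mask] = 0.0          # masked nodes are later reconstructed by the graph decoder
    return masked, mask
```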

Extractive Schema Linking for Text-to-SQL 2025-01-23
Show

Text-to-SQL is emerging as a practical interface for real world databases. The dominant paradigm for Text-to-SQL is cross-database or schema-independent, supporting application schemas unseen during training. The schema of a database defines the tables, columns, column types and foreign key connections between tables. Real world schemas can be large, containing hundreds of columns, but for any particular query only a small fraction will be relevant. Placing the entire schema in the prompt for an LLM can be impossible for models with smaller token windows and expensive even when the context window is large enough to allow it. Even apart from computational considerations, the accuracy of the model can be improved by focusing the SQL generation on only the relevant portion of the database. Schema linking identifies the portion of the database schema useful for the question. Previous work on schema linking has used graph neural networks, generative LLMs, and cross encoder classifiers. We introduce a new approach to adapt decoder-only LLMs to schema linking that is both computationally more efficient and more accurate than the generative approach. Additionally our extractive approach permits fine-grained control over the precision-recall trade-off for schema linking.

Crossfire: An Elastic Defense Framework for Graph Neural Networks Under Bit Flip Attacks 2025-01-23
Show

Bit Flip Attacks (BFAs) are a well-established class of adversarial attacks, originally developed for Convolutional Neural Networks within the computer vision domain. Most recently, these attacks have been extended to target Graph Neural Networks (GNNs), revealing significant vulnerabilities. This new development naturally raises questions about the best strategies to defend GNNs against BFAs, a challenge for which no solutions currently exist. Given the applications of GNNs in critical fields, any defense mechanism must not only maintain network performance, but also verifiably restore the network to its pre-attack state. Verifiably restoring the network to its pre-attack state also eliminates the need for costly evaluations on test data to ensure network quality. We offer first insights into the effectiveness of existing honeypot- and hashing-based defenses against BFAs adapted from the computer vision domain to GNNs, and characterize the shortcomings of these approaches. To overcome their limitations, we propose Crossfire, a hybrid approach that exploits weight sparsity and combines hashing and honeypots with bit-level correction of out-of-distribution weight elements to restore network integrity. Crossfire is retraining-free and does not require labeled data. Averaged over 2,160 experiments on six benchmark datasets, Crossfire offers a 21.8% higher probability than its competitors of reconstructing a GNN attacked by a BFA to its pre-attack state. These experiments cover up to 55 bit flips from various attacks. Moreover, it improves post-repair prediction quality by 10.85%. Computational and storage overheads are negligible compared to the inherent complexity of even the simplest GNNs.

Accepted at AAAI 2025, DOI will be included after publication

Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function 2025-01-23
Show

Modern machine learning algorithms, especially deep learning based techniques, typically involve careful hyperparameter tuning to achieve the best performance. Despite the surge of intense interest in practical techniques like Bayesian optimization and random search based approaches to automating this laborious and compute-intensive task, the fundamental learning theoretic complexity of tuning hyperparameters for deep neural networks is poorly understood. Inspired by this glaring gap, we initiate the formal study of hyperparameter tuning complexity in deep learning through a recently introduced data driven setting. We assume that we have a series of deep learning tasks, and we have to tune hyperparameters to do well on average over the distribution of tasks. A major difficulty is that the utility function as a function of the hyperparameter is very volatile and furthermore, it is given implicitly by an optimization problem over the model parameters. This is unlike previous work in data driven design, where one can typically explicitly model the algorithmic behavior as a function of the hyperparameters. To tackle this challenge, we introduce a new technique to characterize the discontinuities and oscillations of the utility function on any fixed problem instance as we vary the hyperparameter; our analysis relies on subtle concepts, including tools from differential/algebraic geometry and constrained optimization. This can be used to show that the learning theoretic complexity of the corresponding family of utility functions is bounded. We instantiate our results and provide sample complexity bounds for concrete applications: tuning a hyperparameter that interpolates between neural activation functions, and setting the kernel parameter in graph neural networks.

48 pages, 4 figures
The Road to Learning Explainable Inverse Kinematic Models: Graph Neural Networks as Inductive Bias for Symbolic Regression 2025-01-23
Show

This paper shows how a Graph Neural Network (GNN) can be used to learn an Inverse Kinematics (IK) based on an automatically generated dataset. The generated Inverse Kinematics is generalized to a family of manipulators with the same Degree of Freedom (DOF), but varying link length configurations. The results indicate a position error of less than 1.0 cm for 3 DOF and 4.5 cm for 5 DOF, and orientation error of 2$^\circ$ for 3 DOF and 8.2$^\circ$ for 6 DOF, which allows deployment to certain real-world problems. However, out-of-domain errors and lack of extrapolation can be observed in the resulting GNN. An extensive analysis of these errors indicates potential for enhancement in the future. Consequently, the generated GNNs are tailored to be used in future work as an inductive bias to generate analytical equations through symbolic regression.

FedGrAINS: Personalized SubGraph Federated Learning with Adaptive Neighbor Sampling 2025-01-23
Show

Graphs are crucial for modeling relational and biological data. As datasets grow larger in real-world scenarios, the risk of exposing sensitive information increases, making privacy-preserving training methods like federated learning (FL) essential to ensure data security and compliance with privacy regulations. Recently proposed personalized subgraph FL methods have become the de-facto standard for training personalized Graph Neural Networks (GNNs) in a federated manner while dealing with the missing links across clients' subgraphs due to privacy restrictions. However, personalized subgraph FL faces significant challenges due to the heterogeneity in client subgraphs, such as degree distributions among the nodes, which complicate federated training of graph models. To address these challenges, we propose \textit{FedGrAINS}, a novel data-adaptive and sampling-based regularization method for subgraph FL. FedGrAINS leverages generative flow networks (GFlowNets) to evaluate node importance concerning clients' tasks, dynamically adjusting the message-passing step in clients' GNNs. This adaptation reflects task-optimized sampling aligned with a trajectory balance objective. Experimental results demonstrate that the inclusion of \textit{FedGrAINS} as a regularizer consistently improves the FL performance compared to baselines that do not leverage such regularization.

Accepted to SDM2025 (SIAM Data Mining 2025)

VARFVV: View-Adaptive Real-Time Interactive Free-View Video Streaming with Edge Computing 2025-01-23
Show

Free-view video (FVV) allows users to explore immersive video content from multiple views. However, delivering FVV poses significant challenges due to the uncertainty in view switching, combined with the substantial bandwidth and computational resources required to transmit and decode multiple video streams, which may result in frequent playback interruptions. Existing approaches, either client-based or cloud-based, struggle to meet high Quality of Experience (QoE) requirements under limited bandwidth and computational resources. To address these issues, we propose VARFVV, a bandwidth- and computationally-efficient system that enables real-time interactive FVV streaming with high QoE and low switching delay. Specifically, VARFVV introduces a low-complexity FVV generation scheme that reassembles multiview video frames at the edge server based on user-selected view tracks, eliminating the need for transcoding and significantly reducing computational overhead. This design makes it well-suited for large-scale, mobile-based UHD FVV experiences. Furthermore, we present a popularity-adaptive bit allocation method, leveraging a graph neural network, that predicts view popularity and dynamically adjusts bit allocation to maximize QoE within bandwidth constraints. We also construct an FVV dataset comprising 330 videos from 10 scenes, including basketball, opera, etc. Extensive experiments show that VARFVV surpasses existing methods in video quality, switching latency, computational efficiency, and bandwidth usage, supporting over 500 users on a single edge server with a switching delay of 71.5ms. Our code and dataset are available at https://github.com/qianghu-huber/VARFVV.

GCAD: Anomaly Detection in Multivariate Time Series from the Perspective of Granger Causality 2025-01-23
Show

Multivariate time series anomaly detection has numerous real-world applications and is being extensively studied. Modeling pairwise correlations between variables is crucial. Existing methods employ learnable graph structures and graph neural networks to explicitly model the spatial dependencies between variables. However, these methods are primarily based on prediction or reconstruction tasks, which can only learn similarity relationships between sequence embeddings and lack interpretability in how graph structures affect time series evolution. In this paper, we designed a framework that models spatial dependencies using interpretable causal relationships and detects anomalies through changes in causal patterns. Specifically, we propose a method to dynamically discover Granger causality using gradients in nonlinear deep predictors and employ a simple sparsification strategy to obtain a Granger causality graph, detecting anomalies from a causal perspective. Experiments on real-world datasets demonstrate that the proposed model achieves more accurate anomaly detection compared to baseline methods.
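A heavily simplified sketch of gradient-based Granger-causality discovery, assuming PyTorch; the predictor, lag handling, and threshold-based sparsification are illustrative assumptions rather than the paper's exact procedure.

```python
# Train a nonlinear predictor on lagged inputs, then read off input-gradient magnitudes
# and sparsify them into a binary Granger-causality graph.
import torch

def granger_causality_graph(model, x_lagged, threshold=0.05):
    """x_lagged: (batch, num_vars, lags); model maps it to next-step predictions (batch, num_vars).
    Returns a binary matrix where entry [j, i] = 1 if variable i influences variable j."""
    x = x_lagged.clone().requires_grad_(True)
    pred = model(x)
    num_vars = pred.size(1)
    scores = torch.zeros(num_vars, num_vars)
    for j in range(num_vars):
        grad = torch.autograd.grad(pred[:, j].sum(), x, retain_graph=True)[0]
        scores[j] = grad.abs().sum(dim=(0, 2))   # influence of each input variable on output j
    scores = scores / (scores.max() + 1e-12)
    return (scores > threshold).float()          # simple sparsification into a causal graph
```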

Accepted to AAAI 2025

RIDA: A Robust Attack Framework on Incomplete Graphs 2025-01-23
Show

Graph Neural Networks (GNNs) are vital in data science but are increasingly susceptible to adversarial attacks. To help researchers develop more robust GNN models, it's essential to focus on designing strong attack models as foundational benchmarks and guiding references. Among adversarial attacks, gray-box poisoning attacks are noteworthy due to their effectiveness and fewer constraints. These attacks exploit GNNs' need for retraining on updated data, thereby impacting their performance by perturbing these datasets. However, current research overlooks the real-world scenario of incomplete graphs. To address this gap, we introduce the Robust Incomplete Deep Attack Framework (RIDA). It is the first algorithm for robust gray-box poisoning attacks on incomplete graphs. The approach innovatively aggregates distant vertex information and ensures powerful data utilization. Extensive tests against 9 SOTA baselines on 3 real-world datasets demonstrate RIDA's superiority in handling incompleteness and high attack performance on the incomplete graph.

Rethinking and Accelerating Graph Condensation: A Training-Free Approach with Class Partition 2025-01-23
Show

The increasing prevalence of large-scale graphs poses a significant challenge for graph neural network training, attributed to their substantial computational requirements. In response, graph condensation (GC) emerges as a promising data-centric solution aiming to substitute the large graph with a small yet informative condensed graph to facilitate data-efficient GNN training. However, existing GC methods suffer from intricate optimization processes, necessitating excessive computing resources and training time. In this paper, we revisit existing GC optimization strategies and identify two pervasive issues therein: (1) various GC optimization strategies converge to coarse-grained class-level node feature matching between the original and condensed graphs; (2) existing GC methods rely on a Siamese graph network architecture that requires time-consuming bi-level optimization with iterative gradient computations. To overcome these issues, we propose a training-free GC framework termed Class-partitioned Graph Condensation (CGC), which refines the node distribution matching from the class-to-class paradigm into a novel class-to-node paradigm, transforming the GC optimization into a class partition problem which can be efficiently solved by any clustering method. Moreover, CGC incorporates a pre-defined graph structure to enable a closed-form solution for condensed node features, eliminating the need for back-and-forth gradient descent in existing GC approaches. Extensive experiments demonstrate that CGC achieves an exceedingly efficient condensation process with advanced accuracy. Compared with the state-of-the-art GC methods, CGC condenses the Ogbn-products graph within 30 seconds, achieving a speedup ranging from $10^2$X to $10^4$X and increasing accuracy by up to 4.2%.
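A minimal sketch of the class-to-node idea, assuming scikit-learn: cluster each class's node features and keep the centroids as condensed nodes. This omits the paper's predefined graph structure and closed-form feature refinement.

```python
# Per-class clustering as a stand-in for class-partitioned node condensation.
import numpy as np
from sklearn.cluster import KMeans

def condense_nodes(X, y, reduction=0.01, seed=0):
    """X: (n, d) node features, y: (n,) labels. Returns condensed features and labels."""
    cond_X, cond_y = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        k = max(1, int(round(reduction * len(Xc))))
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Xc)
        cond_X.append(km.cluster_centers_)          # centroids become condensed node features
        cond_y.append(np.full(k, c))
    return np.vstack(cond_X), np.concatenate(cond_y)
```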

ACM Web Conference 2025 (WWW '25)

KAA: Kolmogorov-Arnold Attention for Enhancing Attentive Graph Neural Networks 2025-01-23
Show

Graph neural networks (GNNs) with attention mechanisms, often referred to as attentive GNNs, have emerged as a prominent paradigm in advanced GNN models in recent years. However, our understanding of the critical process of scoring neighbor nodes remains limited, leading to the underperformance of many existing attentive GNNs. In this paper, we unify the scoring functions of current attentive GNNs and propose Kolmogorov-Arnold Attention (KAA), which integrates the Kolmogorov-Arnold Network (KAN) architecture into the scoring process. KAA enhances the performance of scoring functions across the board and can be applied to nearly all existing attentive GNNs. To compare the expressive power of KAA with other scoring functions, we introduce Maximum Ranking Distance (MRD) to quantitatively estimate their upper bounds in ranking errors for node importance. Our analysis reveals that, under limited parameters and constraints on width and depth, both linear transformation-based and MLP-based scoring functions exhibit finite expressive power. In contrast, our proposed KAA, even with a single-layer KAN parameterized by zero-order B-spline functions, demonstrates nearly infinite expressive power. Extensive experiments on both node-level and graph-level tasks using various backbone models show that KAA-enhanced scoring functions consistently outperform their original counterparts, achieving performance improvements of over 20% in some cases.

Generative Graphical Inverse Kinematics 2025-01-23
Show

Quickly and reliably finding accurate inverse kinematics (IK) solutions remains a challenging problem for many robot manipulators. Existing numerical solvers are broadly applicable but typically only produce a single solution and rely on local search techniques to minimize nonconvex objective functions. More recent learning-based approaches that approximate the entire feasible set of solutions have shown promise as a means to generate multiple fast and accurate IK results in parallel. However, existing learning-based techniques have a significant drawback: each robot of interest requires a specialized model that must be trained from scratch. To address this key shortcoming, we propose a novel distance-geometric robot representation coupled with a graph structure that allows us to leverage the sample efficiency of Euclidean equivariant functions and the generalizability of graph neural networks (GNNs). Our approach is generative graphical inverse kinematics (GGIK), the first learned IK solver able to accurately and efficiently produce a large number of diverse solutions in parallel while also displaying the ability to generalize -- a single learned model can be used to produce IK solutions for a variety of different robots. When compared to several other learned IK methods, GGIK provides more accurate solutions with the same amount of data. GGIK can generalize reasonably well to robot manipulators unseen during training. Additionally, GGIK can learn a constrained distribution that encodes joint limits and scales efficiently to larger robots and a high number of sampled solutions. Finally, GGIK can be used to complement local IK solvers by providing reliable initializations for a local optimization process.

17 pages, 9 figures
Stress Predictions in Polycrystal Plasticity using Graph Neural Networks with Subgraph Training 2025-01-23
Show

Numerical modeling of polycrystal plasticity is computationally intensive. We employ Graph Neural Networks (GNN) to predict stresses on complex geometries for polycrystal plasticity from Finite Element Method (FEM) simulations. We present a novel message-passing GNN that encodes nodal strain and edge distances between FEM mesh cells, and aggregates to obtain embeddings and combines the decoded embeddings with the nodal strains to predict stress tensors on graph nodes. The GNN is trained on subgraphs generated from FEM mesh graphs, in which the mesh cells are converted to nodes and edges are created between adjacent cells. We apply the trained GNN to periodic polycrystals with complex geometries and learn the strain-stress maps based on crystal plasticity theory. The GNN is accurately trained on FEM graphs, in which the $R^2$ for both training and testing sets is larger than 0.99. The proposed GNN approach is more than 150 times faster than FEM on stress predictions. We also apply the trained GNN to unseen simulations for validation, and the GNN generalizes well with an overall $R^2$ of 0.992. The GNN accurately predicts the von Mises stress on polycrystals. The proposed model does not overfit and generalizes well beyond the training data, as the error distributions demonstrate. This work points toward surrogate modeling of crystal plasticity simulations using graph data.
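A minimal message-passing layer in this spirit, assuming PyTorch Geometric; feature dimensions and MLP sizes are illustrative assumptions.

```python
# Messages encode neighbour strain plus edge distance; the aggregated embedding is
# combined with the nodal strain to predict a stress tensor per node.
import torch
from torch_geometric.nn import MessagePassing

class StrainStressLayer(MessagePassing):
    def __init__(self, strain_dim=6, hidden=64, stress_dim=6):
        super().__init__(aggr="mean")
        self.msg_mlp = torch.nn.Sequential(
            torch.nn.Linear(strain_dim + 1, hidden), torch.nn.ReLU(), torch.nn.Linear(hidden, hidden))
        self.out_mlp = torch.nn.Sequential(
            torch.nn.Linear(hidden + strain_dim, hidden), torch.nn.ReLU(), torch.nn.Linear(hidden, stress_dim))

    def forward(self, strain, edge_index, edge_dist):
        h = self.propagate(edge_index, strain=strain, edge_dist=edge_dist)
        return self.out_mlp(torch.cat([h, strain], dim=-1))   # combine embedding with nodal strain

    def message(self, strain_j, edge_dist):
        return self.msg_mlp(torch.cat([strain_j, edge_dist.view(-1, 1)], dim=-1))
```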

25 pages, 11 figures (main manuscript)

Deep Inverse Design for High-Level Synthesis 2025-01-22
Show

High-level synthesis (HLS) has significantly advanced the automation of digital circuits design, yet the need for expertise and time in pragma tuning remains challenging. Existing solutions for the design space exploration (DSE) adopt either heuristic methods, lacking essential information for further optimization potential, or predictive models, missing sufficient generalization due to the time-consuming nature of HLS and the exponential growth of the design space. To address these challenges, we propose Deep Inverse Design for HLS (DID4HLS), a novel approach that integrates graph neural networks and generative models. DID4HLS iteratively optimizes hardware designs aimed at compute-intensive algorithms by learning conditional distributions of design features from post-HLS data. Compared to four state-of-the-art DSE baselines, our method achieved an average improvement of 42.8% on average distance to reference set (ADRS) compared to the best-performing baselines across six benchmarks, while demonstrating high robustness and efficiency.

HierPromptLM: A Pure PLM-based Framework for Representation Learning on Heterogeneous Text-rich Networks 2025-01-22
Show

Representation learning on heterogeneous text-rich networks (HTRNs), which consist of multiple types of nodes and edges with each node associated with textual information, is essential for various real-world applications. Given the success of pretrained language models (PLMs) in processing text data, recent efforts have focused on integrating PLMs into HTRN representation learning. These methods typically handle textual and structural information separately, using both PLMs and heterogeneous graph neural networks (HGNNs). However, this separation fails to capture the critical interactions between these two types of information within HTRNs. Additionally, it necessitates an extra alignment step, which is challenging due to the fundamental differences between distinct embedding spaces generated by PLMs and HGNNs. To deal with it, we propose HierPromptLM, a novel pure PLM-based framework that seamlessly models both text data and graph structures without the need for separate processing. Firstly, we develop a Hierarchical Prompt module that employs prompt learning to integrate text data and heterogeneous graph structures at both the node and edge levels, within a unified textual space. Building upon this foundation, we further introduce two innovative HTRN-tailored pretraining tasks to fine-tune PLMs for representation learning by emphasizing the inherent heterogeneity and interactions between textual and structural information within HTRNs. Extensive experiments on two real-world HTRN datasets demonstrate HierPromptLM outperforms state-of-the-art methods, achieving significant improvements of up to 6.08% for node classification and 10.84% for link prediction.

GRAMA: Adaptive Graph Autoregressive Moving Average Models 2025-01-22
Show

Graph State Space Models (SSMs) have recently been introduced to enhance Graph Neural Networks (GNNs) in modeling long-range interactions. Despite their success, existing methods either compromise on permutation equivariance or limit their focus to pairwise interactions rather than sequences. Building on the connection between Autoregressive Moving Average (ARMA) and SSM, in this paper, we introduce GRAMA, a Graph Adaptive method based on a learnable Autoregressive Moving Average (ARMA) framework that addresses these limitations. By transforming from static to sequential graph data, GRAMA leverages the strengths of the ARMA framework, while preserving permutation equivariance. Moreover, GRAMA incorporates a selective attention mechanism for dynamic learning of ARMA coefficients, enabling efficient and flexible long-range information propagation. We also establish theoretical connections between GRAMA and Selective SSMs, providing insights into its ability to capture long-range dependencies. Extensive experiments on 14 synthetic and real-world datasets demonstrate that GRAMA consistently outperforms backbone models and performs competitively with state-of-the-art methods.

KAN KAN Buff Signed Graph Neural Networks? 2025-01-22
Show

Graph Representation Learning aims to create effective embeddings for nodes and edges that encapsulate their features and relationships. Graph Neural Networks (GNNs) leverage neural networks to model complex graph structures. Recently, the Kolmogorov-Arnold Neural Network (KAN) has emerged as a promising alternative to the traditional Multilayer Perceptron (MLP), offering improved accuracy and interpretability with fewer parameters. In this paper, we propose the integration of KANs into Signed Graph Convolutional Networks (SGCNs), leading to the development of KAN-enhanced SGCNs (KASGCN). We evaluate KASGCN on tasks such as signed community detection and link sign prediction to improve embedding quality in signed networks. Our experimental results indicate that KASGCN exhibits competitive or comparable performance to standard SGCNs across the tasks evaluated, with performance variability depending on the specific characteristics of the signed graph and the choice of parameter settings. These findings suggest that KASGCNs hold promise for enhancing signed graph analysis with context-dependent effectiveness.

Exploring Heterogeneity and Uncertainty for Graph-based Cognitive Diagnosis Models in Intelligent Education 2025-01-22
Show

Graph-based Cognitive Diagnosis (CD) has attracted much research interest due to its strong ability on inferring students' proficiency levels on knowledge concepts. While graph-based CD models have demonstrated remarkable performance, we contend that they still cannot achieve optimal performance due to the neglect of edge heterogeneity and uncertainty. Edges involve both correct and incorrect response logs, indicating heterogeneity. Meanwhile, a response log can have uncertain semantic meanings, e.g., a correct log can indicate true mastery or fortunate guessing, and a wrong log can indicate a lack of understanding or a careless mistake. In this paper, we propose an Informative Semantic-aware Graph-based Cognitive Diagnosis model (ISG-CD), which focuses on how to utilize the heterogeneous graph in CD and minimize effects of uncertain edges. Specifically, to explore heterogeneity, we propose a semantic-aware graph neural networks based CD model. To minimize effects of edge uncertainty, we propose an Informative Edge Differentiation layer from an information bottleneck perspective, which suggests keeping a minimal yet sufficient reliable graph for CD in an unsupervised way. We formulate this process as maximizing mutual information between the reliable graph and response logs, while minimizing mutual information between the reliable graph and the original graph. After that, we prove that mutual information maximization can be theoretically converted to the classic binary cross entropy loss function, while minimizing mutual information can be realized by the Hilbert-Schmidt Independence Criterion. Finally, we adopt an alternating training strategy for optimizing learnable parameters of both the semantic-aware graph neural networks based CD model and the edge differentiation layer. Extensive experiments on three real-world datasets have demonstrated the effectiveness of ISG-CD.

Inferring Past Human Actions in Homes with Abductive Reasoning 2025-01-22
Show

Abductive reasoning aims to make the most likely inference for a given set of incomplete observations. In this paper, we introduce "Abductive Past Action Inference", a novel research task aimed at identifying the past actions performed by individuals within homes to reach specific states captured in a single image, using abductive inference. The research explores three key abductive inference problems: past action set prediction, past action sequence prediction, and abductive past action verification. We introduce several models tailored for abductive past action inference, including a relational graph neural network, a relational bilinear pooling model, and a relational transformer model. Notably, the newly proposed object-relational bilinear graph encoder-decoder (BiGED) model emerges as the most effective among all methods evaluated, demonstrating good proficiency in handling the intricacies of the Action Genome dataset. The contributions of this research significantly advance the ability of deep learning models to reason about current scene evidence and make highly plausible inferences about past human actions. This advancement enables a deeper understanding of events and behaviors, which can enhance decision-making and improve system capabilities across various real-world applications such as Human-Robot Interaction and Elderly Care and Health Monitoring. Code and data available at https://github.com/LUNAProject22/AAR

15 pages, 8 figures, Accepted to WACV 2025

A Unified Invariant Learning Framework for Graph Classification 2025-01-22
Show

Invariant learning demonstrates substantial potential for enhancing the generalization of graph neural networks (GNNs) with out-of-distribution (OOD) data. It aims to recognize stable features in graph data for classification, based on the premise that these features causally determine the target label, and their influence is invariant to changes in distribution. Along this line, most studies have attempted to pinpoint these stable features by emphasizing explicit substructures in the graph, such as masked or attentive subgraphs, and primarily enforcing the invariance principle in the semantic space, i.e., graph representations. However, we argue that focusing only on the semantic space may not accurately identify these stable features. To address this, we introduce the Unified Invariant Learning (UIL) framework for graph classification. It provides a unified perspective on invariant graph learning, emphasizing both structural and semantic invariance principles to identify more robust stable features. In the graph space, UIL adheres to the structural invariance principle by reducing the distance between graphons over a set of stable features across different environments. Simultaneously, to confirm semantic invariance, UIL underscores that the acquired graph representations should demonstrate exemplary performance across diverse environments. We present both theoretical and empirical evidence to confirm our method's ability to recognize superior stable features. Moreover, through a series of comprehensive experiments complemented by in-depth analyses, we demonstrate that UIL considerably enhances OOD generalization, surpassing the performance of leading baseline methods. Our codes are available at https://github.com/yongduosui/UIL.

Accepted to KDD 2025
A Hybrid Supervised and Self-Supervised Graph Neural Network for Edge-Centric Applications 2025-01-21
Show

This paper presents a novel graph-based deep learning model for tasks involving relations between two nodes (edge-centric tasks), where the focus lies on predicting relationships and interactions between pairs of nodes rather than node properties themselves. This model combines supervised and self-supervised learning, with a loss function that takes into account both the learned embeddings and patterns observed with and without ground truth. Additionally it incorporates an attention mechanism that leverages both node and edge features. The architecture, trained end-to-end, comprises two primary components: embedding generation and prediction. First, a graph neural network (GNN) transforms raw node features into dense, low-dimensional embeddings, incorporating edge attributes. Then, a feedforward neural model processes the node embeddings to produce the final output. Experiments demonstrate that our model matches or exceeds existing methods for protein-protein interactions prediction and Gene Ontology (GO) terms prediction. The model also performs effectively with one-hot encoding for node features, providing a solution for the previously unsolved problem of predicting similarity between compounds with unknown structures.

Word and Phrase Features in Graph Convolutional Network for Automatic Question Classification 2025-01-21
Show

Effective question classification is crucial for AI-driven educational tools, enabling adaptive learning systems to categorize questions by skill area, difficulty level, and competence. This classification not only supports educational diagnostics and analytics but also enhances complex tasks like information retrieval and question answering by associating questions with relevant categories. Traditional methods, often based on word embeddings and conventional classifiers, struggle to capture the nuanced relationships in natural language, leading to suboptimal performance. To address this, we propose a novel approach leveraging graph convolutional networks, named Phrase Question-Graph Convolutional Network (PQ-GCN), to better model the inherent structure of questions. By representing questions as graphs, where nodes signify words or phrases and edges denote syntactic or semantic relationships, our method allows the model to learn from the interconnected nature of language more effectively. Additionally, we explore the incorporation of phrase-based features to enhance classification performance on question datasets of various domains and characteristics. Our findings demonstrate that the proposed model, augmented with these features, offers a promising solution for more robust and context-aware question classification, bridging the gap between graph neural network research and practical educational applications of AI.

SVGS-DSGAT: An IoT-Enabled Innovation in Underwater Robotic Object Detection Technology 2025-01-21
Show

With the advancement of Internet of Things (IoT) technology, underwater target detection and tracking have become increasingly important for ocean monitoring and resource management. Existing methods often fall short in handling high-noise and low-contrast images in complex underwater environments, lacking precision and robustness. This paper introduces a novel SVGS-DSGAT model that combines GraphSage, SVAM, and DSGAT modules, enhancing feature extraction and target detection capabilities through graph neural networks and attention mechanisms. The model integrates IoT technology to facilitate real-time data collection and processing, optimizing resource allocation and model responsiveness. Experimental results demonstrate that the SVGS-DSGAT model achieves an mAP of 40.8% on the URPC 2020 dataset and 41.5% on the SeaDronesSee dataset, significantly outperforming existing mainstream models. This IoT-enhanced approach not only excels in high-noise and complex backgrounds but also improves the overall efficiency and scalability of the system. This research provides an effective IoT solution for underwater target detection technology, offering significant practical application value and broad development prospects.

17 pages, 8 figures
SafePowerGraph-HIL: Real-Time HIL Validation of Heterogeneous GNNs for Bridging Sim-to-Real Gap in Power Grids 2025-01-21
Show

As machine learning (ML) techniques gain prominence in power system research, validating these methods' effectiveness under real-world conditions requires real-time hardware-in-the-loop (HIL) simulations. HIL simulation platforms enable the integration of computational models with physical devices, allowing rigorous testing across diverse scenarios critical to system resilience and reliability. In this study, we develop a SafePowerGraph-HIL framework that utilizes HIL simulations on the IEEE 9-bus system, modeled in Hypersim, to generate high-fidelity data, which is then transmitted in real-time via SCADA to an AWS cloud database before being input into a Heterogeneous Graph Neural Network (HGNN) model designed for power system state estimation and dynamic analysis. By leveraging Hypersim's capabilities, we simulate complex grid interactions, providing a robust dataset that captures critical parameters for HGNN training. The trained HGNN is subsequently validated using newly generated data under varied system conditions, demonstrating accuracy and robustness in predicting power system states. The results underscore the potential of integrating HIL with advanced neural network architectures to enhance the real-time operational capabilities of power systems. This approach represents a significant advancement toward the development of intelligent, adaptive control strategies that support the robustness and resilience of evolving power grids.

5 pages, 5 figures
Attending To Syntactic Information In Biomedical Event Extraction Via Graph Neural Networks 2025-01-21
Show

Many models are proposed in the literature on biomedical event extraction (BEE). Some of them use the shortest dependency path (SDP) information to represent the argument classification task. There is an issue with this representation since even missing one word from the dependency parsing graph may totally change the final prediction. To this end, the full adjacency matrix of the dependency graph is used to embed individual tokens using a graph convolutional network (GCN). An ablation study is also done to show the effect of the dependency graph on the overall performance. The results show a significant improvement when dependency graph information is used. The proposed model slightly outperforms state-of-the-art models on BEE over different datasets.

6 figures, 4 tables
Training Graph Neural Networks Using Non-Robust Samples 2025-01-21
Show

Graph Neural Networks (GNNs) are a highly effective neural network architecture for processing graph-structured data. Unlike traditional neural networks that rely solely on the features of the data as input, GNNs leverage both the graph structure, which represents the relationships between data points, and the feature matrix of the data to optimize their feature representation. This unique capability enables GNNs to achieve superior performance across various tasks. However, it also makes GNNs more susceptible to noise from both the graph structure and data features, which can significantly increase the training difficulty and degrade their performance. To address this issue, this paper proposes a novel method for selecting noise-sensitive training samples from the original training set to construct a smaller yet more effective training set for model training. These samples are used to help improve the model's ability to correctly process data in noisy environments. We have evaluated our approach on three of the most classical GNN models -- GCN, GAT, and GraphSAGE -- as well as three widely used benchmark datasets: Cora, Citeseer, and PubMed. Our experiments demonstrate that the proposed method can substantially boost the training of Graph Neural Networks compared to using randomly sampled training sets of the same size from the original training set and the larger original full training set.
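A hedged sketch of one way such noise-sensitive training nodes could be selected, assuming PyTorch: perturb the node features several times and keep the training nodes whose predictions flip most often. The paper's actual selection criterion may differ; this is only illustrative.

```python
# Rank training nodes by how often their predicted label changes under feature noise.
import torch

def select_noise_sensitive_nodes(model, x, edge_index, train_idx, budget, sigma=0.1, trials=5):
    """train_idx: 1-D tensor of training node indices; returns the `budget` most sensitive ones."""
    model.eval()
    with torch.no_grad():
        base = model(x, edge_index).argmax(dim=-1)
        flips = torch.zeros(x.size(0))
        for _ in range(trials):
            noisy = x + sigma * torch.randn_like(x)
            pred = model(noisy, edge_index).argmax(dim=-1)
            flips += (pred != base).float()
    sensitivity = flips[train_idx]
    top = sensitivity.topk(min(budget, len(train_idx))).indices
    return train_idx[top]
```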

Coarse-to-Fine Lightweight Meta-Embedding for ID-Based Recommendation 2025-01-21
Show

The state-of-the-art recommendation systems have shifted the attention to efficient recommendation, e.g., on-device recommendation, under memory constraints. To this end, the existing methods either focused on lightweight embeddings for both users and items, or involved on-device systems enjoying compact embeddings to enhance reusability and reduce space complexity. However, they focus solely on the coarse granularity of embeddings while overlooking fine-grained semantic nuances, which adversarially downgrades the efficacy of meta-embeddings in capturing the intricate relationships between users and items, consequently resulting in suboptimal recommendations. In this paper, we aim to study how the meta-embedding can efficiently learn varied grained semantics, together with how the fine-grained meta-embedding can strengthen the representation of the coarse-grained meta-embedding. To answer these questions, we develop a novel graph neural networks (GNNs) based recommender where each user and item serves as a node, linked directly to coarse-grained virtual nodes and indirectly to fine-grained virtual nodes, ensuring different grained semantic learning, while disclosing that: 1) in contrast to coarse-grained semantics, fine-grained semantics are well captured through sparse meta-embeddings, which 2) adaptively balance embedding uniqueness and the memory constraint. Additionally, the initialization method builds upon SparsePCA, along with a soft-thresholding activation function, to induce sparsity in the meta-embeddings. We propose a weight bridging update strategy that focuses on matching each coarse-grained meta-embedding with several fine-grained meta-embeddings based on the users/items' semantics. Extensive experiments substantiate our method's superiority over existing baselines. Our code is available at https://github.com/htyjers/C2F-MetaEmbed.

16 pages, 6 figures
OpenFGL: A Comprehensive Benchmark for Federated Graph Learning 2025-01-21
Show

Federated graph learning (FGL) is a promising distributed training paradigm for graph neural networks across multiple local systems without direct data sharing. This approach inherently involves large-scale distributed graph processing, which closely aligns with the challenges and research focuses of graph-based data systems. Despite the proliferation of FGL, the diverse motivations from real-world applications, spanning various research backgrounds and settings, pose a significant challenge to fair evaluation. To fill this gap, we propose OpenFGL, a unified benchmark designed for the primary FGL scenarios: Graph-FL and Subgraph-FL. Specifically, OpenFGL includes 42 graph datasets from 18 application domains, 8 federated data simulation strategies that emphasize different graph properties, and 5 graph-based downstream tasks. Additionally, it offers 18 recently proposed SOTA FGL algorithms through a user-friendly API, enabling a thorough comparison and comprehensive evaluation of their effectiveness, robustness, and efficiency. Our empirical results demonstrate the capabilities of FGL while also highlighting its potential limitations, providing valuable insights for future research in this growing field, particularly in fostering greater interdisciplinary collaboration between FGL and data systems.

Accepted by VLDB 2025

Large Language Models Meet Graph Neural Networks for Text-Numeric Graph Reasoning 2025-01-21
Show

In real-world scientific discovery, human beings make use of accumulated prior knowledge and imagination to select one or a few of the most promising hypotheses from large and noisy data analysis results. In this study, we introduce a new type of graph structure, the text-numeric graph (TNG), in which graph entities and associations carry both text-attributed information and numeric information. The TNG is an ideal data structure model for novel scientific discovery via graph reasoning because it integrates human-understandable textual annotations or prior knowledge, with numeric values that represent the observed or activation levels of graph entities or associations in different samples. Together, the textual information and numeric values determine the importance of graph entities and associations in graph reasoning for novel scientific knowledge discovery. We further propose integrating large language models (LLMs) and graph neural networks (GNNs) to analyze the TNGs for graph understanding and reasoning. To demonstrate the utility, we generated text-omic (numeric) signaling graphs (TOSGs), one type of TNG, in which all graphs have the same entities, associations and annotations, but have sample-specific entity numeric (omic) values derived from single cell RNAseq (scRNAseq) datasets of different diseases. We proposed joint LLM-GNN models for key entity mining and signaling pathway mining on the TOSGs. The evaluation results showed that the TNGs and joint LLM-GNN models significantly improve classification accuracy and network inference. In conclusion, the TNGs and joint LLM-GNN models are important approaches for scientific discovery.

29 pages, 6 figures
DCOR: Anomaly Detection in Attributed Networks via Dual Contrastive Learning Reconstruction 2025-01-20
Show

Anomaly detection using a network-based approach is one of the most efficient ways to identify abnormal events such as fraud, security breaches, and system faults in a variety of applied domains. While most of the earlier works address the complex nature of graph-structured data and predefined anomalies, the impact of data attributes and emerging anomalies is often neglected. This paper introduces DCOR, a novel approach on attributed networks that integrates reconstruction-based anomaly detection with Contrastive Learning. Utilizing a Graph Neural Network (GNN) framework, DCOR contrasts the reconstructed adjacency and feature matrices from both the original and augmented graphs to detect subtle anomalies. We employed comprehensive experimental studies on benchmark datasets through standard evaluation measures. The results show that DCOR significantly outperforms state-of-the-art methods. The obtained results demonstrate the efficacy of the proposed approach on attributed networks, with the potential of uncovering new patterns of anomalies.

Accepted at the Thirteenth International Conference on Complex Networks and Their Applications

Leveraging graph neural networks and mobility data for COVID-19 forecasting 2025-01-20
Show

The COVID-19 pandemic has victimized over 7 million people to date, prompting diverse research efforts. Spatio-temporal models combining mobility data with machine learning have gained attention for disease forecasting. Here, we explore Graph Convolutional Recurrent Network (GCRN) and Graph Convolutional Long Short-Term Memory (GCLSTM), which combine the power of Graph Neural Networks (GNN) with traditional architectures that deal with sequential data. The aim is to forecast future values of COVID-19 cases in Brazil and China by leveraging human mobility networks, whose nodes represent geographical locations and links are flows of vehicles or people. We show that employing backbone extraction to filter out negligible connections in the mobility network enhances predictive stability. Comparing regression and classification tasks demonstrates that binary classification yields smoother, more interpretable results. Interestingly, we observe qualitatively equivalent results for both Brazil and China datasets by introducing sliding windows of variable size and prediction horizons. Compared to prior studies, introducing the sliding window and the network backbone extraction strategies yields improvements of about 80% in root mean squared errors.
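A minimal sketch of the sliding-window construction discussed above, assuming numpy; the window and horizon lengths are the tunable choices, and the binarisation in the trailing comment is only one illustrative option for the classification setting.

```python
# Pair a window of past case counts per location with a target h steps ahead.
import numpy as np

def sliding_windows(series, window=14, horizon=7):
    """series: (timesteps, num_locations). Returns inputs (samples, window, locations) and targets."""
    X, y = [], []
    for t in range(len(series) - window - horizon + 1):
        X.append(series[t:t + window])
        y.append(series[t + window + horizon - 1])
    return np.stack(X), np.stack(y)

# For the binary classification task, targets can be thresholded, e.g.
# y_cls = (y > X[:, -1]).astype(int)   # "cases increased relative to the last observed day"
```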

Graph Defense Diffusion Model 2025-01-20
Show

Graph Neural Networks (GNNs) demonstrate significant potential in various applications but remain highly vulnerable to adversarial attacks, which can greatly degrade their performance. Existing graph purification methods attempt to address this issue by filtering attacked graphs; however, they struggle to defend against multiple types of adversarial attacks simultaneously due to their limited flexibility, and they lack comprehensive modeling of graph data due to their heavy reliance on heuristic prior knowledge. To overcome these challenges, we propose a more versatile approach for defending against adversarial attacks on graphs. In this work, we introduce the Graph Defense Diffusion Model (GDDM), a flexible purification method that leverages the denoising and modeling capabilities of diffusion models. The iterative nature of diffusion models aligns well with the stepwise process of adversarial attacks, making them particularly suitable for defense. By iteratively adding and removing noise, GDDM effectively purifies attacked graphs, restoring their original structure and features. GDDM consists of two key components: (1) a Graph Structure-Driven Refiner, which preserves the basic fidelity of the graph during denoising and ensures that the generated graph remains consistent with the original scope; and (2) a Node Feature-Constrained Regularizer, which removes residual impurities from the denoised graph, further enhancing the purification effect. Additionally, we design tailored denoising strategies to handle different types of adversarial attacks, improving the model's adaptability to various attack scenarios. Extensive experiments conducted on three real-world datasets demonstrate that GDDM outperforms state-of-the-art methods in defending against a wide range of adversarial attacks, showcasing its robustness and effectiveness.

13 pages, 5 figures
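
A highly simplified sketch of the iterative add-noise / denoise loop that diffusion-based graph purification relies on; `denoiser` stands in for a trained model, and everything here is illustrative rather than the authors' GDDM:

```python
import numpy as np

def purify(A_attacked, denoiser, n_steps=5, noise_scale=0.1):
    """Toy purification loop: perturb the (soft) adjacency, then let the
    denoiser pull it back toward the clean-graph manifold, step by step."""
    A = A_attacked.astype(float)
    for step in range(n_steps):
        A_noisy = A + noise_scale * np.random.randn(*A.shape)  # forward noise
        A = denoiser(A_noisy, step)                             # learned reverse step
        A = np.clip((A + A.T) / 2, 0.0, 1.0)                    # keep symmetric, in [0, 1]
    return (A > 0.5).astype(int)                                # discretize edges

# Placeholder "denoiser": shrinks values toward 0/1; a real one is a trained model.
toy_denoiser = lambda A, step: np.round(A * 0.7 + 0.15)
A_attacked = np.random.randint(0, 2, (6, 6))
print(purify(A_attacked, toy_denoiser))
```
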
UniGraph: Learning a Unified Cross-Domain Foundation Model for Text-Attributed Graphs 2025-01-20
Show

Foundation models like ChatGPT and GPT-4 have revolutionized artificial intelligence, exhibiting remarkable abilities to generalize across a wide array of tasks and applications beyond their initial training objectives. However, graph learning has predominantly focused on single-graph models, tailored to specific tasks or datasets, lacking the ability to transfer learned knowledge to different domains. This limitation stems from the inherent complexity and diversity of graph structures, along with the different feature and label spaces specific to graph data. In this paper, we recognize text as an effective unifying medium and employ Text-Attributed Graphs (TAGs) to leverage this potential. We present our UniGraph framework, designed to learn a foundation model for TAGs, which is capable of generalizing to unseen graphs and tasks across diverse domains. Unlike single-graph models that use pre-computed node features of varying dimensions as input, our approach leverages textual features for unifying node representations, even for graphs such as molecular graphs that do not naturally have textual features. We propose a novel cascaded architecture of Language Models (LMs) and Graph Neural Networks (GNNs) as backbone networks. Additionally, we propose the first pre-training algorithm specifically designed for large-scale self-supervised learning on TAGs, based on Masked Graph Modeling. We introduce graph instruction tuning using Large Language Models (LLMs) to enable zero-shot prediction ability. Our comprehensive experiments across various graph learning tasks and domains demonstrate the model's effectiveness in self-supervised representation learning on unseen graphs, few-shot in-context transfer, and zero-shot transfer, even surpassing or matching the performance of GNNs that have undergone supervised training on target datasets.

KDD 2025
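
The core architectural idea above is a cascade: a language model embeds each node's text, and a GNN then propagates those embeddings over the graph so representations are comparable across domains. A minimal sketch with a stand-in text encoder (hashed bag of words) and one round of mean-neighbor aggregation; the real UniGraph uses a pre-trained LM and masked-graph pre-training, so all details below are placeholders:

```python
import numpy as np

def embed_text(texts, dim=16):
    """Stand-in for an LM encoder: hashed bag-of-words embeddings."""
    E = np.zeros((len(texts), dim))
    for i, t in enumerate(texts):
        for tok in t.lower().split():
            E[i, hash(tok) % dim] += 1.0
    return E

def gnn_layer(E, A):
    """One mean-aggregation message-passing step over adjacency A."""
    deg = A.sum(axis=1, keepdims=True) + 1e-9
    return 0.5 * E + 0.5 * (A @ E) / deg

texts = ["tumor suppressor protein", "cell cycle regulator", "benzene ring molecule"]
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = gnn_layer(embed_text(texts), A)
print(H.shape)  # unified node representations usable across graphs and domains
```
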
Graph neural networks informed locally by thermodynamics 2025-01-20
Show

Thermodynamics-informed neural networks employ inductive biases to enforce the first and second principles of thermodynamics. To construct these biases, a metriplectic evolution of the system is assumed. This provides excellent results compared to uninformed, black-box networks. While the accuracy can be increased by one or two orders of magnitude, in the case of graph networks this requires assembling global Poisson and dissipation matrices, which breaks the local structure of such networks. To avoid this drawback, a local version of the metriplectic biases is developed in this work, which avoids the aforementioned matrix assembly and thus preserves the node-by-node structure of the graph networks. We apply this framework to examples in the fields of solid and fluid mechanics. Our approach demonstrates significant computational efficiency and strong generalization capabilities, accurately making inferences on examples significantly different from those encountered during training.
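
For context, the metriplectic (GENERIC-style) evolution that such thermodynamics-informed biases typically assume can be written as below; this is a sketch of the standard form with its degeneracy conditions, and the paper's exact local formulation may differ:

```latex
% Metriplectic evolution of state z with energy E and entropy S:
% reversible (Poisson) part plus irreversible (dissipative) part.
\dot{z} = L(z)\,\nabla E(z) + M(z)\,\nabla S(z),
\qquad L(z)\,\nabla S(z) = 0, \quad M(z)\,\nabla E(z) = 0,
% With L skew-symmetric and M symmetric positive semidefinite, the degeneracy
% conditions yield the first and second laws:
\dot{E} = 0, \qquad \dot{S} = \nabla S^{\top} M\, \nabla S \ge 0 .
```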

Advancing Multi-Party Dialogue Systems with Speaker-ware Contrastive Learning 2025-01-20
Show

Dialogue response generation has made significant progress, but most research has focused on dyadic dialogue. In contrast, multi-party dialogues involve more participants, each potentially discussing different topics, making the task more complex. Current methods often rely on graph neural networks to model dialogue context, which helps capture the structural dynamics of multi-party conversations. However, these methods are heavily dependent on intricate graph structures and dataset annotations, and they often overlook the distinct speaking styles of participants. To address these challenges, we propose CMR, a Contrastive learning-based Multi-party dialogue Response generation model. CMR uses self-supervised contrastive learning to better distinguish "who says what." Additionally, by comparing speakers within the same conversation, the model captures differences in speaking styles and thematic transitions. To the best of our knowledge, this is the first approach to apply contrastive learning in multi-party dialogue generation. Experimental results show that CMR significantly outperforms state-of-the-art models in multi-party dialogue response tasks.
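
A rough sketch of the speaker-contrastive idea ("who says what"): pull utterance representations from the same speaker together and push different speakers apart within a conversation. This is a generic supervised-contrastive loss over placeholder embeddings, not the authors' exact objective:

```python
import numpy as np

def speaker_contrastive_loss(U, speakers, temperature=0.1):
    """Toy InfoNCE-style loss over utterance embeddings U (n x d):
    same-speaker utterances are positives, all others are negatives."""
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-9)
    sim = U @ U.T / temperature
    np.fill_diagonal(sim, -np.inf)                 # exclude self-pairs
    loss, n = 0.0, len(speakers)
    for i in range(n):
        pos = [j for j in range(n) if j != i and speakers[j] == speakers[i]]
        if not pos:
            continue
        log_den = np.log(np.exp(sim[i]).sum())
        loss += -np.mean([sim[i, j] - log_den for j in pos])
    return loss / n

U = np.random.randn(6, 8)                          # 6 utterance embeddings
print(speaker_contrastive_loss(U, ["A", "B", "A", "C", "B", "A"]))
```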

Spatiotemporal Air Quality Mapping in Urban Areas Using Sparse Sensor Data, Satellite Imagery, Meteorological Factors, and Spatial Features 2025-01-20
Show

Monitoring air pollution is crucial for protecting human health from exposure to harmful substances. Traditional methods of air quality monitoring, such as ground-based sensors and satellite-based remote sensing, face limitations due to high deployment costs, sparse sensor coverage, and environmental interferences. To address these challenges, this paper proposes a framework for high-resolution spatiotemporal Air Quality Index (AQI) mapping using sparse sensor data, satellite imagery, and various spatiotemporal factors. By leveraging Graph Neural Networks (GNNs), we estimate AQI values at unmonitored locations based on both spatial and temporal dependencies. The framework incorporates a wide range of environmental features, including meteorological data, road networks, points of interest (PoIs), population density, and urban green spaces, which enhance prediction accuracy. We illustrate the use of our approach through a case study in Lahore, Pakistan, where multi-resolution data is used to generate the air quality index map at a fine spatiotemporal scale.
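
A minimal sketch of the underlying estimation step: build a spatial k-nearest-neighbour graph over grid cells, attach sparse sensor readings, and let neighbour aggregation propagate information to unmonitored cells. The crude averaging rule stands in for a trained GNN, and all names and values are illustrative:

```python
import numpy as np

def knn_graph(coords, k=3):
    """Adjacency over grid cells: connect each cell to its k nearest neighbours."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d[i])[1:k + 1]:
            A[i, j] = A[j, i] = 1.0
    return A

def propagate_aqi(A, aqi, observed_mask, n_iters=10):
    """Fill unmonitored cells by repeatedly averaging neighbours' estimates,
    keeping observed sensor values fixed (a stand-in for a trained GNN)."""
    est = np.where(observed_mask, aqi, np.nanmean(aqi[observed_mask]))
    deg = A.sum(axis=1) + 1e-9
    for _ in range(n_iters):
        est = np.where(observed_mask, est, (A @ est) / deg)
    return est

coords = np.random.rand(20, 2)                      # toy grid-cell centroids
aqi = np.full(20, np.nan); mask = np.zeros(20, dtype=bool)
mask[:5] = True; aqi[:5] = np.random.uniform(50, 200, 5)  # 5 sparse sensors
print(propagate_aqi(knn_graph(coords), aqi, mask)[5:10])
```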

Fast instance-specific algorithm configuration with graph neural network 2025-01-20
Show

Combinatorial optimization (CO) problems are pivotal across various industrial applications, where the speed of solving these problems is crucial. Improving the performance of CO solvers across diverse input instances requires fine-tuning solver parameters for each instance. However, this tuning process is time-consuming, and the time required increases with the number of instances. To address this, a method called instance-specific algorithm configuration (ISAC) has been devised. This approach involves two main steps: training and execution. During the training step, features are extracted from various instances and grouped into clusters, and parameters are fine-tuned for each cluster. This cluster-specific tuning yields a set of generalized parameters for the instances belonging to each class. In the execution step, features are extracted from an unknown instance to determine its cluster, and the corresponding pre-tuned parameters are applied. Generally, the running time of a solver is evaluated by the time to solution ($TTS$). However, methods like ISAC require preprocessing, so the total execution time is $T_{tot}=TTS+T_{tune}$, where $T_{tune}$ is the tuning time. While the goal is to minimize $T_{tot}$, extracting features in the ISAC method requires a non-negligible amount of computation: the extracted features include summary statistics of the solver execution logs, which take several tens of seconds to compute. This research presents a method to significantly reduce the time of the ISAC execution step by streamlining feature extraction and class determination with a graph neural network. Experimental results show that $T_{tune}$ in the execution step, which takes several tens of seconds with the original ISAC approach, can be reduced to under a second.
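
The quantity being minimised is $T_{tot}=TTS+T_{tune}$, and the proposed speed-up replaces slow log-based feature extraction with a fast classifier that maps a new instance straight to its parameter cluster. A schematic sketch of that execution step, where `gnn_predict_cluster`, the parameter table, and the toy solver are all placeholders rather than the paper's artifacts:

```python
import time

# Pre-tuned parameters per cluster, produced offline in the ISAC training step.
CLUSTER_PARAMS = {0: {"restarts": 4, "temperature": 0.8},
                  1: {"restarts": 16, "temperature": 0.2}}

def gnn_predict_cluster(instance_graph):
    """Placeholder for the fast GNN-based class determination (sub-second)."""
    return int(instance_graph["n_nodes"] > 100)     # toy decision rule

def solve_with_isac(instance_graph, solver):
    t0 = time.time()
    cluster = gnn_predict_cluster(instance_graph)   # replaces slow log-based features
    params = CLUSTER_PARAMS[cluster]
    t_tune = time.time() - t0
    t1 = time.time()
    solution = solver(instance_graph, **params)
    tts = time.time() - t1
    return solution, t_tune + tts                   # T_tot = T_tune + TTS

toy_solver = lambda g, restarts, temperature: f"solved with {restarts} restarts"
print(solve_with_isac({"n_nodes": 250}, toy_solver))
```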

Mitigating Spatial Disparity in Urban Prediction Using Residual-Aware Spatiotemporal Graph Neural Networks: A Chicago Case Study 2025-01-20
Show

Urban prediction tasks, such as forecasting traffic flow, temperature, and crime rates, are crucial for efficient urban planning and management. However, existing Spatiotemporal Graph Neural Networks (ST-GNNs) often optimize solely for accuracy, overlooking spatial and demographic disparities in their predictions. This oversight can lead to imbalanced resource allocation and exacerbate existing inequities in urban areas. This study introduces a Residual-Aware Attention (RAA) Block and an equality-enhancing loss function to address these disparities. By adapting the adjacency matrix during training and incorporating spatial disparity metrics, our approach aims to reduce local segregation of residuals and errors. We applied our methodology to urban prediction tasks in Chicago, using a travel demand dataset as an example. Our model achieved a significant 48% improvement in fairness metrics with only a 9% increase in error metrics. Spatial analysis of residual distributions revealed that models with RAA Blocks produced more equitable prediction results, particularly by reducing errors clustered in central regions. Attention maps demonstrated the model's ability to dynamically adjust focus, leading to more balanced predictions. Case studies of various community areas in Chicago further illustrated the effectiveness of our approach in addressing spatial and demographic disparities, supporting more balanced and equitable urban planning and policy-making.
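
A toy version of trading a little accuracy for spatial equity: augment the usual error term with a penalty on how unevenly residuals are distributed across regions. The grouping by region and the variance-based disparity term are assumptions for illustration, not the paper's exact loss:

```python
import numpy as np

def equity_aware_loss(y_true, y_pred, region_ids, lam=0.5):
    """Mean squared error plus a penalty on the variance of mean absolute
    residuals across regions (large variance = spatially uneven errors)."""
    residuals = np.abs(y_true - y_pred)
    mse = np.mean((y_true - y_pred) ** 2)
    region_means = np.array([residuals[region_ids == r].mean()
                             for r in np.unique(region_ids)])
    disparity = region_means.var()
    return mse + lam * disparity

y_true = np.random.rand(100)
y_pred = y_true + np.random.randn(100) * 0.1
regions = np.random.randint(0, 5, 100)              # toy community-area labels
print(equity_aware_loss(y_true, y_pred, regions))
```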

Achieving Network Resilience through Graph Neural Network-enabled Deep Reinforcement Learning 2025-01-19
Show

Deep reinforcement learning (DRL) has been widely used in many important tasks of communication networks. To improve the perception ability of DRL on the network, some studies have combined graph neural networks (GNNs) with DRL, using GNNs to extract unstructured features of the network. However, as networks continue to evolve and become increasingly complex, existing GNN-DRL methods still face challenges in terms of scalability and robustness, and they are inadequate for addressing network security issues. From the perspective of security and robustness, this paper explores how combining GNNs with DRL can build a resilient network. The article starts with a brief tutorial on GNNs and DRL and introduces their existing applications in networks. Furthermore, we introduce the network security methods that can be strengthened by GNN-DRL approaches. We then design a framework based on GNN-DRL to defend against attacks and enhance network resilience. Additionally, we conduct a case study using an encrypted traffic dataset collected from real IoT environments; the results demonstrate the effectiveness and superiority of our framework. Finally, we highlight key open challenges and opportunities for enhancing network resilience with GNN-DRL.

Machine Learning Surrogates for Optimizing Transportation Policies with Agent-Based Models 2025-01-19
Show

Rapid urbanization and growing urban populations worldwide present significant challenges for cities, including increased traffic congestion and air pollution. Effective strategies are needed to manage traffic volumes and reduce emissions. In practice, traditional traffic flow simulations are used to test such strategies. However, their high computational cost usually limits their applicability for investigating a multitude of different scenarios to identify the best policies. This paper presents a first approach to using Graph Neural Networks (GNNs) as surrogates for large-scale agent-based simulation models. In a case study using the MATSim model of Paris, the GNN effectively learned the impacts of capacity-reduction policies on citywide traffic flow. Performance analysis across various road types and scenarios revealed that the GNN could accurately capture policy-induced effects on edge-based traffic volumes, particularly on roads directly affected by the policies and those with higher traffic volumes.
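
A schematic of what such a surrogate learns: given a road graph and a per-node capacity-reduction policy, predict edge-level traffic scores without running the full agent-based simulation. The two rounds of neighbour averaging, the random weights, and the edge read-out below are placeholders for a trained GNN, not the paper's model:

```python
import numpy as np

def surrogate_edge_volumes(A, capacity, policy_factor, W1, W2):
    """Toy surrogate: features = policy-adjusted capacities, two rounds of
    neighbour averaging, then a per-edge volume score from endpoint states."""
    x = (capacity * policy_factor)[:, None]          # policy-adjusted capacities
    deg = A.sum(axis=1, keepdims=True) + 1e-9
    h = np.tanh(((A @ x) / deg) @ W1)
    h = np.tanh(((A @ h) / deg) @ W2)
    i, j = np.nonzero(np.triu(A))
    return {(a, b): float(h[a] @ h[b]) for a, b in zip(i, j)}

n = 6
A = (np.random.rand(n, n) > 0.6).astype(float); A = np.triu(A, 1); A = A + A.T
capacity = np.random.uniform(500, 2000, n)
policy = np.where(np.arange(n) < 2, 0.5, 1.0)        # halve capacity on two roads
W1, W2 = np.random.randn(1, 8), np.random.randn(8, 8)
print(surrogate_edge_volumes(A, capacity, policy, W1, W2))
```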

Image Segmentation: Inducing graph-based learning 2025-01-19
Show

This study explores the potential of graph neural networks (GNNs) to enhance semantic segmentation across diverse image modalities. We evaluate the effectiveness of a novel GNN-based U-Net architecture on three distinct datasets: PascalVOC, a standard benchmark for natural image segmentation; WoodScape, a challenging dataset of fisheye images commonly used in autonomous driving that introduces significant geometric distortions; and ISIC2016, a dataset of dermoscopic images for skin lesion segmentation. We compare our proposed UNet-GNN model against established convolutional neural network (CNN)-based segmentation models, including U-Net and U-Net++, as well as the transformer-based SwinUNet. Unlike these methods, which primarily rely on local convolutional operations or global self-attention, GNNs explicitly model relationships between image regions by constructing and operating on a graph representation of the image features. This approach allows the model to capture long-range dependencies and complex spatial relationships, which we hypothesize will be particularly beneficial for handling geometric distortions present in fisheye imagery and for capturing intricate boundaries in medical images. Our analysis demonstrates the versatility of GNNs in addressing diverse segmentation challenges and highlights their potential to improve segmentation accuracy in various applications, including autonomous driving and medical image analysis.
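
The key step the abstract describes is turning image features into a graph so a GNN can relate distant regions. A minimal sketch, assuming each feature-map cell becomes a node connected in a 4-neighbour grid with one mean-aggregation pass; a real UNet-GNN variant would embed this inside an encoder-decoder, so every detail here is illustrative:

```python
import numpy as np

def grid_edges(h, w):
    """4-neighbour grid graph over an h x w feature map (nodes = cells)."""
    idx = lambda r, c: r * w + c
    edges = []
    for r in range(h):
        for c in range(w):
            if r + 1 < h: edges.append((idx(r, c), idx(r + 1, c)))
            if c + 1 < w: edges.append((idx(r, c), idx(r, c + 1)))
    return edges

def gnn_step(feat_map):
    """Mean-aggregate each cell with its grid neighbours (one message pass)."""
    h, w, d = feat_map.shape
    x = feat_map.reshape(h * w, d).astype(float)
    agg, count = x.copy(), np.ones((h * w, 1))
    for u, v in grid_edges(h, w):
        agg[u] += x[v]; agg[v] += x[u]
        count[u] += 1; count[v] += 1
    return (agg / count).reshape(h, w, d)

feat = np.random.rand(8, 8, 16)       # toy encoder feature map
print(gnn_step(feat).shape)           # smoothed, relation-aware features
```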

About

Daily ArXiv Papers.
