The project automatically fetches the latest papers from arXiv based on keywords.
The subheadings in the README file represent the search keywords.
Only the most recent articles for each keyword are retained, up to a maximum of 100 papers.
You can click the 'Watch' button to receive daily email notifications.
Last update: 2024-12-05
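
The fetch-and-trim behaviour described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper names `build_query_url` and `keep_most_recent`, the `date` field, and the choice to sort by submission date are assumptions. The query parameters (`search_query`, `sortBy`, `sortOrder`, `max_results`) are those of the public arXiv API.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"
MAX_PAPERS = 100  # cap on retained papers per keyword, as stated above


def build_query_url(keyword: str, max_results: int = MAX_PAPERS) -> str:
    """Build an arXiv API query URL for one keyword, newest submissions first."""
    params = {
        "search_query": f'all:"{keyword}"',  # phrase search over all fields
        "sortBy": "submittedDate",           # assumption: sort by submission date
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def keep_most_recent(papers: list[dict], limit: int = MAX_PAPERS) -> list[dict]:
    """Sort papers by ISO date string (newest first) and keep at most `limit`."""
    return sorted(papers, key=lambda p: p["date"], reverse=True)[:limit]
```

Calling `build_query_url("time series")` yields a URL that could be fetched once per README subheading; `keep_most_recent` then trims the parsed entries to the 100-paper cap.
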
Title | Date | Abstract | Comment |
---|---|---|---|
LLMForecaster: Improving Seasonal Event Forecasts with Unstructured Textual Data | 2024-12-03 | Modern time-series forecasting models often fail to make full use of rich unstructured information about the time series themselves. This lack of proper conditioning can lead to obvious model failures; for example, models may be unaware of the details of a particular product, and hence fail to anticipate seasonal surges in customer demand in the lead up to major exogenous events like holidays for clearly relevant products. To address this shortcoming, this paper introduces a novel forecast post-processor -- which we call LLMForecaster -- that fine-tunes large language models (LLMs) to incorporate unstructured semantic and contextual information and historical data to improve the forecasts from an existing demand forecasting pipeline. In an industry-scale retail application, we demonstrate that our technique yields statistically significant forecast improvements across several sets of products subject to holiday-driven demand surges. | Presented at NeurIPS Time Series in the Age of Large Models (2024) |
Quantile-Crossing Spectrum and Spline Autoregression Estimation | 2024-12-03 | The quantile-crossing spectrum is the spectrum of quantile-crossing processes created from a time series by the indicator function that shows whether or not the time series lies above or below a given quantile at a given time. This bivariate function of frequency and quantile level provides a richer view of serial dependence than that offered by the ordinary spectrum. We propose a new method for estimating the quantile-crossing spectrum as a bivariate function of frequency and quantile level. The proposed method, called spline autoregression (SAR), jointly fits an AR model to the quantile-crossing series across multiple quantiles; the AR coefficients are represented as spline functions of the quantile level and penalized for their roughness. Numerical experiments show that when the underlying spectrum is smooth in quantile level the proposed method is able to produce more accurate estimates in comparison with the alternative that ignores the smoothness. | |
F-SE-LSTM: A Time Series Anomaly Detection Method with Frequency Domain Information | 2024-12-03 | With the development of society, time series anomaly detection plays an important role in network and IoT services. However, most existing anomaly detection methods directly analyze time series in the time domain and cannot distinguish some relatively hidden anomaly sequences. We attempt to analyze the impact of frequency on time series from a frequency domain perspective, thus proposing a new time series anomaly detection method called F-SE-LSTM. This method utilizes two sliding windows and fast Fourier transform (FFT) to construct a frequency matrix. Simultaneously, Squeeze-and-Excitation Networks (SENet) and Long Short-Term Memory (LSTM) are employed to extract frequency-related features within and between periods. Through comparative experiments on multiple datasets such as Yahoo Webscope S5 and Numenta Anomaly Benchmark, the results demonstrate that the frequency matrix constructed by F-SE-LSTM exhibits better discriminative ability than ordinary time domain and frequency domain data. Furthermore, F-SE-LSTM outperforms existing state-of-the-art deep learning anomaly detection methods in terms of anomaly detection capability and execution efficiency. | 14 pages, 7 figures |
Time-Series-Informed Closed-loop Learning for Sequential Decision Making and Control | 2024-12-03 | Closed-loop performance of sequential decision making algorithms, such as model predictive control, depends strongly on the parameters of cost functions, models, and constraints. Bayesian optimization is a common approach to learning these parameters based on closed-loop experiments. However, traditional Bayesian optimization approaches treat the learning problem as a black box, ignoring valuable information and knowledge about the structure of the underlying problem, resulting in slow convergence and high experimental resource use. We propose a time-series-informed optimization framework that incorporates intermediate performance evaluations from early iterations of each experimental episode into the learning procedure. Additionally, probabilistic early stopping criteria are proposed to terminate unpromising experiments, significantly reducing experimental time. Simulation results show that our approach achieves baseline performance with approximately half the resources. Moreover, with the same resource budget, our approach outperforms the baseline in terms of final closed-loop performance, highlighting its efficiency in sequential decision making scenarios. | 12 pages, 3 figures, submitted to L4DC 2025 |
PITN: Physics-Informed Temporal Networks for Cuffless Blood Pressure Estimation | 2024-12-03 | Monitoring blood pressure with non-invasive sensors has gained popularity for providing comfortable user experiences, one of which is a significant function of smart wearables. Although providing a comfortable user experience, such methods are suffering from the demand for a significant amount of realistic data to train an individual model for each subject, especially considering the invasive or obtrusive BP ground-truth measurements. To tackle this challenge, we introduce a novel physics-informed temporal network (PITN) with adversarial contrastive learning to enable precise BP estimation with very limited data. Specifically, we first enhance the physics-informed neural network (PINN) with the temporal block for investigating BP dynamics' multi-periodicity for personal cardiovascular cycle modeling and temporal variation. We then employ adversarial training to generate extra physiological time series data, improving PITN's robustness in the face of sparse subject-specific training data. Furthermore, we utilize contrastive learning to capture the discriminative variations of cardiovascular physiologic phenomena. This approach aggregates physiological signals with similar blood pressure values in latent space while separating clusters of samples with dissimilar blood pressure values. Experiments on three widely-adopted datasets with different modalities (i.e., bioimpedance, PPG, millimeter-wave) demonstrate the superiority and effectiveness of the proposed methods over previous state-of-the-art approaches. The code is available at https://github.com/Zest86/ACL-PITN. | 12 pages, 6 figures |
LLM-ABBA: Understanding time series via symbolic approximation | 2024-12-03 | The success of large language models (LLMs) for time series has been demonstrated in previous work. Utilizing a symbolic time series representation, one can efficiently bridge the gap between LLMs and time series. However, the remaining challenge is to exploit the semantic information hidden in time series by using symbols or existing tokens of LLMs, while aligning the embedding space of LLMs according to the hidden information of time series. The symbolic time series approximation (STSA) method called adaptive Brownian bridge-based symbolic aggregation (ABBA) shows outstanding efficacy in preserving salient time series features by modeling time series patterns in terms of amplitude and period while using existing tokens of LLMs. In this paper, we introduce a method, called LLM-ABBA, that integrates ABBA into large language models for various downstream time series tasks. By symbolizing time series, LLM-ABBA compares favorably to the recent state-of-the-art (SOTA) in UCR and three medical time series classification tasks. Meanwhile, a fixed-polygonal chain trick in ABBA is introduced to avoid obvious drifting during prediction tasks by significantly mitigating the effects of cumulative error arising from misused symbols during the transition from symbols to numerical values. In time series regression tasks, LLM-ABBA achieves the new SOTA on Time Series Extrinsic Regression (TSER) benchmarks. LLM-ABBA also shows competitive prediction capability compared to recent SOTA time series prediction results. We believe this framework can also seamlessly extend to other time series tasks. | |
FSMLP: Modelling Channel Dependencies With Simplex Theory Based Multi-Layer Perceptions In Frequency Domain | 2024-12-03 | Time series forecasting (TSF) plays a crucial role in various domains, including web data analysis, energy consumption prediction, and weather forecasting. While Multi-Layer Perceptrons (MLPs) are lightweight and effective for capturing temporal dependencies, they are prone to overfitting when used to model inter-channel dependencies. In this paper, we investigate the overfitting problem in channel-wise MLPs using Rademacher complexity theory, revealing that extreme values in time series data exacerbate this issue. To mitigate this issue, we introduce a novel Simplex-MLP layer, where the weights are constrained within a standard simplex. This strategy encourages the model to learn simpler patterns, thereby reducing overfitting to extreme values. Based on the Simplex-MLP layer, we propose a novel Frequency Simplex MLP (FSMLP) framework for time series forecasting, comprising two kinds of modules: Simplex Channel-Wise MLP (SCWM) and Frequency Temporal MLP (FTM). The SCWM effectively leverages the Simplex-MLP to capture inter-channel dependencies, while the FTM is a simple yet efficient temporal MLP designed to extract temporal information from the data. Our theoretical analysis shows that the upper bound of the Rademacher Complexity for Simplex-MLP is lower than that for standard MLPs. Moreover, we validate our proposed method on seven benchmark datasets, demonstrating significant improvements in forecasting accuracy and efficiency, while also showcasing superior scalability. Additionally, we demonstrate that Simplex-MLP can improve other methods that use channel-wise MLP to achieve less overfitting and improved performance. Code is available at https://github.com/FMLYD/FSMLP. | |
Zero-shot forecasting of chaotic systems | 2024-12-03 | Time-series forecasting is a challenging problem that traditionally requires specialized models custom-trained for the specific task at hand. Recently, inspired by the success of large language models, foundation models pre-trained on vast amounts of time-series data from diverse domains have emerged as a promising candidate for general-purpose time-series forecasting. The defining characteristic of these foundation models is their ability to perform zero-shot learning, that is, forecasting a new system from limited context data without explicit re-training or fine-tuning. Here, we evaluate whether the zero-shot learning paradigm extends to the challenging task of forecasting chaotic systems. Across 135 distinct chaotic dynamical systems and | Added new experiments probing in-context learning and a simple mechanism for zero-shot forecasting |
Comparing Clustering Approaches for Smart Meter Time Series: Investigating the Influence of Dataset Properties on Performance | 2024-12-02 | The widespread adoption of smart meters for monitoring energy consumption has generated vast quantities of high-resolution time series data which remains underutilised. While clustering has emerged as a fundamental tool for mining smart meter time series (SMTS) data, selecting appropriate clustering methods remains challenging despite numerous comparative studies. These studies often rely on problematic methodologies and consider a limited scope of methods, frequently overlooking compelling methods from the broader time series clustering literature. Consequently, they struggle to provide dependable guidance for practitioners designing their own clustering approaches. This paper presents a comprehensive comparative framework for SMTS clustering methods using expert-informed synthetic datasets that emphasise peak consumption behaviours as fundamental cluster concepts. Using a phased methodology, we first evaluated 31 distance measures and 8 representation methods using leave-one-out classification, then examined the better-suited methods in combination with 11 clustering algorithms. We further assessed the robustness of these combinations to systematic changes in key dataset properties that affect clustering performance on real-world datasets, including cluster balance, noise, and the presence of outliers. Our results revealed that methods accommodating local temporal shifts while maintaining amplitude sensitivity, particularly Dynamic Time Warping and | |
TSA on AutoPilot: Self-tuning Self-supervised Time Series Anomaly Detection | 2024-12-02 | Time series anomaly detection (TSAD) finds many applications such as monitoring environmental sensors, industry KPIs, patient biomarkers, etc. A two-fold challenge for TSAD is a versatile and unsupervised model that can detect various different types of time series anomalies (spikes, discontinuities, trend shifts, etc.) without any labeled data. Modern neural networks have outstanding ability in modeling complex time series. Self-supervised models in particular tackle unsupervised TSAD by transforming the input via various augmentations to create pseudo anomalies for training. However, their performance is sensitive to the choice of augmentation, which is hard to choose in practice, while there exists no effort in the literature on data augmentation tuning for TSAD without labels. Our work aims to fill this gap. We introduce TSAP for TSA "on autoPilot", which can (self-)tune augmentation hyperparameters end-to-end. It stands on two key components: a differentiable augmentation architecture and an unsupervised validation loss to effectively assess the alignment between augmentation type and anomaly type. Case studies show TSAP's ability to effectively select the (discrete) augmentation type and associated (continuous) hyperparameters. In turn, it outperforms established baselines, including SOTA self-supervised models, on diverse TSAD tasks exhibiting different anomaly types. | Accepted at NeurIPS workshop on Self-Supervised Learning |
FGATT: A Robust Framework for Wireless Data Imputation Using Fuzzy Graph Attention Networks and Transformer Encoders | 2024-12-02 | Missing data is a pervasive challenge in wireless networks and many other domains, often compromising the performance of machine learning and deep learning models. To address this, we propose a novel framework, FGATT, that combines the Fuzzy Graph Attention Network (FGAT) with the Transformer encoder to perform robust and accurate data imputation. FGAT leverages fuzzy rough sets and graph attention mechanisms to capture spatial dependencies dynamically, even in scenarios where predefined spatial information is unavailable. The Transformer encoder is employed to model temporal dependencies, utilizing its self-attention mechanism to focus on significant time-series patterns. A self-adaptive graph construction method is introduced to enable dynamic connectivity learning, ensuring the framework's applicability to a wide range of wireless datasets. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods in imputation accuracy and robustness, particularly in scenarios with substantial missing data. The proposed model is well-suited for applications in wireless sensor networks and IoT environments, where data integrity is critical. | |
Reconstructing shared dynamics with a deep neural network | 2024-12-02 | Determining hidden shared patterns behind dynamic phenomena can be a game-changer in multiple areas of research. Here we present the principles and show a method to identify hidden shared dynamics from time series by a two-module, feedforward neural network architecture: the Mapper-Coach network. We reconstruct unobserved, continuous latent variable input, the time series generated by a chaotic logistic map, from the observed values of two simultaneously forced chaotic logistic maps. The network has been trained to predict one of the observed time series based on its own past and conditioned on the other observed time series by error-back propagation. It was shown that after this prediction has been learned successfully, the activity of the bottleneck neuron, connecting the mapper and the coach module, correlated strongly with the latent shared input variable. The method has the potential to reveal hidden components of dynamical systems, where experimental intervention is not possible. | |
MPBD-LSTM: A Predictive Model for Colorectal Liver Metastases Using Time Series Multi-phase Contrast-Enhanced CT Scans | 2024-12-02 | Colorectal cancer is a prevalent form of cancer, and many patients develop colorectal cancer liver metastasis (CRLM) as a result. Early detection of CRLM is critical for improving survival rates. Radiologists usually rely on a series of multi-phase contrast-enhanced computed tomography (CECT) scans done during follow-up visits to perform early detection of the potential CRLM. These scans form unique five-dimensional data (time, phase, and axial, sagittal, and coronal planes in 3D CT). Most of the existing deep learning models can readily handle four-dimensional data (e.g., time-series 3D CT images) and it is not clear how well they can be extended to handle the additional dimension of phase. In this paper, we build a dataset of time-series CECT scans to aid in the early diagnosis of CRLM, and build upon state-of-the-art deep learning techniques to evaluate how to best predict CRLM. Our experimental results show that a multi-plane architecture based on 3D bi-directional LSTM, which we call MPBD-LSTM, works best, achieving an area under curve (AUC) of 0.79. On the other hand, analysis of the results shows that there is still great room for further improvement. | |
Enhancing Crop Segmentation in Satellite Image Time Series with Transformer Networks | 2024-12-02 | Recent studies have shown that Convolutional Neural Networks (CNNs) achieve impressive results in crop segmentation of Satellite Image Time Series (SITS). However, the emergence of transformer networks in various vision tasks raises the question of whether they can outperform CNNs in this task as well. This paper presents a revised version of the Transformer-based Swin UNETR model, specifically adapted for crop segmentation of SITS. The proposed model demonstrates significant advancements, achieving a validation accuracy of 96.14% and a test accuracy of 95.26% on the Munich dataset, surpassing the previous best results of 93.55% for validation and 92.94% for the test. Additionally, the model's performance on the Lombardia dataset is comparable to UNet3D and superior to FPN and DeepLabV3. Experiments of this study indicate that the model will likely achieve comparable or superior accuracy to CNNs while requiring significantly less training time. These findings highlight the potential of transformer-based architectures for crop segmentation in SITS, opening new avenues for remote sensing applications. | |
Discovering group dynamics in coordinated time series via hierarchical recurrent switching-state models | 2024-12-02 | We seek a computationally efficient model for a collection of time series arising from multiple interacting entities (a.k.a. "agents"). Recent models of spatiotemporal patterns across individuals fail to incorporate explicit system-level collective behavior that can influence the trajectories of individual entities. To address this gap in the literature, we present a new hierarchical switching-state model that can be trained in an unsupervised fashion to simultaneously learn both system-level and individual-level dynamics. We employ a latent system-level discrete state Markov chain that provides top-down influence on latent entity-level chains which in turn govern the emission of each observed time series. Recurrent feedback from the observations to the latent chains at both entity and system levels allows recent situational context to inform how dynamics unfold at all levels in bottom-up fashion. We hypothesize that including both top-down and bottom-up influences on group dynamics will improve interpretability of the learned dynamics and reduce error when forecasting. Our hierarchical switching recurrent dynamical model can be learned via closed-form variational coordinate ascent updates to all latent chains that scale linearly in the number of entities. This is asymptotically no more costly than fitting a separate model for each entity. Analysis of both synthetic data and real basketball team movements suggests our lean parametric model can achieve competitive forecasts compared to larger neural network models that require far more computational resources. Further experiments on soldier data as well as a synthetic task with 64 cooperating entities show how our approach can yield interpretable insights about team dynamics over time. | |
Bridging the Gap Between Data-Driven and Theory-Driven Modelling -- Leveraging Causal Machine Learning for Integrative Modelling of Dynamical Systems | 2024-12-02 | Classical machine learning techniques often struggle with overfitting and unreliable predictions when exposed to novel conditions. Introducing causality into the modelling process offers a promising way to mitigate these challenges by enhancing interpretability and predictive reliability. However, constructing an initial causal graph manually using domain knowledge is time-consuming, particularly in complex time series with numerous variables. To address this, causal discovery algorithms can provide a preliminary causal structure that domain experts can refine. This study investigates causal feature selection with domain knowledge using a data centre system as an example. We use simulated time-series data to compare different causal feature selection methods with traditional machine-learning feature selection methods. Our results show that predictions based on causal features are more robust and interpretable compared to those derived from traditional methods. These findings underscore the potential of combining causal discovery algorithms with human expertise to improve machine learning applications. | 11 pages, 11 figures and 5 tables |
A Self-Supervised Task for Fault Detection in Satellite Multivariate Time Series | 2024-12-02 | In the space sector, due to environmental conditions and restricted accessibility, robust fault detection methods are imperative for ensuring mission success and safeguarding valuable assets. This work proposes a novel approach leveraging Physics-Informed Real NVP neural networks, renowned for their ability to model complex and high-dimensional distributions, augmented with a self-supervised task based on sensors' data permutation. It focuses on enhancing fault detection within the satellite multivariate time series. The experiments involve various configurations, including pre-training with self-supervision, multi-task learning, and standalone self-supervised training. Results indicate significant performance improvements across all settings. In particular, employing only the self-supervised loss yields the best overall results, suggesting its efficacy in guiding the network to extract relevant features for fault detection. This study presents a promising direction for improving fault detection in space systems and warrants further exploration in other datasets and applications. | SPAICE: AI in and for Space, 2024 |
On the Weak Convergence of the Function-Indexed Sequential Empirical Process and its Smoothed Analogue under Nonstationarity | 2024-12-02 | We study the sequential empirical process indexed by general function classes and its smoothed set-indexed analogue. Sufficient conditions for asymptotic equicontinuity and weak convergence are provided for nonstationary arrays of time series, in terms of uniform moment bounds for partial sums and, for the set-indexed smoothed process, | |
How Much Can Time-related Features Enhance Time Series Forecasting? | 2024-12-02 | Recent advancements in long-term time series forecasting (LTSF) have primarily focused on capturing cross-time and cross-variate (channel) dependencies within historical data. However, a critical aspect often overlooked by many existing methods is the explicit incorporation of time-related features (e.g., season, month, day of the week, hour, minute), which are essential components of time series data. The absence of this explicit time-related encoding limits the ability of current models to capture cyclical or seasonal trends and long-term dependencies, especially with limited historical input. To address this gap, we introduce a simple yet highly efficient module designed to encode time-related features, Time Stamp Forecaster (TimeSter), thereby enhancing the backbone's forecasting performance. By integrating TimeSter with a linear backbone, our model, TimeLinear, significantly improves the performance of a single linear projector, reducing MSE by an average of 23% on benchmark datasets such as Electricity and Traffic. Notably, TimeLinear achieves these gains while maintaining exceptional computational efficiency, delivering results that are on par with or exceed state-of-the-art models, despite using a fraction of the parameters. | |
Generalized Principal Component Analysis for Large-dimensional Matrix Factor Model | 2024-12-02 | Matrix factor models have been growing popular dimension reduction tools for large-dimensional matrix time series. However, the heteroscedasticity of the idiosyncratic components has barely received any attention. Starting from the pseudo likelihood function, this paper introduces a Generalized Principal Component Analysis (GPCA) method for matrix factor model which takes the heteroscedasticity into account. Theoretically, we first derive the asymptotic distributions of the GPCA estimators by assuming the separable covariance matrices are known in advance. We then propose adaptive thresholding estimators for the separable covariance matrices and derive their convergence rates, which is of independent interest. We also show that this would not alter the asymptotic distributions of the GPCA estimators under certain regular sparsity conditions in the high-dimensional covariance matrix estimation literature. The GPCA estimators are shown to be more efficient than the state-of-the-art methods under certain heteroscedasticity conditions. Thorough numerical studies are conducted to demonstrate the superiority of our method over the existing approaches. Analysis of a financial portfolio dataset illustrates the empirical usefulness of the proposed method. | |
Protecting Federated Learning from Extreme Model Poisoning Attacks via Multidimensional Time Series Anomaly Detection | 2024-12-02 | Current defense mechanisms against model poisoning attacks in federated learning (FL) systems have proven effective up to a certain threshold of malicious clients. In this work, we introduce FLANDERS, a novel pre-aggregation filter for FL resilient to large-scale model poisoning attacks, i.e., when malicious clients far exceed legitimate participants. FLANDERS treats the sequence of local models sent by clients in each FL round as a matrix-valued time series. Then, it identifies malicious client updates as outliers in this time series by comparing actual observations with estimates generated by a matrix autoregressive forecasting model maintained by the server. Experiments conducted in several non-iid FL setups show that FLANDERS significantly improves robustness across a wide spectrum of attacks when paired with standard and robust existing aggregation methods. | |
Reliable Generation of Privacy-preserving Synthetic Electronic Health Record Time Series via Diffusion Models | 2024-12-02 | Electronic Health Records (EHRs) are rich sources of patient-level data, offering valuable resources for medical data analysis. However, privacy concerns often restrict access to EHRs, hindering downstream analysis. Current EHR de-identification methods are flawed and can lead to potential privacy leakage. Additionally, existing publicly available EHR databases are limited, preventing the advancement of medical research using EHR. This study aims to overcome these challenges by generating realistic and privacy-preserving synthetic electronic health records (EHRs) time series efficiently. We introduce a new method for generating diverse and realistic synthetic EHR time series data using Denoising Diffusion Probabilistic Models (DDPM). We conducted experiments on six databases: Medical Information Mart for Intensive Care III and IV (MIMIC-III/IV), the eICU Collaborative Research Database (eICU), and non-EHR datasets on Stocks and Energy. We compared our proposed method with eight existing methods. Our results demonstrate that our approach significantly outperforms all existing methods in terms of data fidelity while requiring less training effort. Additionally, data generated by our method yields a lower discriminative accuracy compared to other baseline methods, indicating the proposed method can generate data with less privacy risk. The proposed diffusion-model-based method can reliably and efficiently generate synthetic EHR time series, which facilitates the downstream medical data analysis. Our numerical results show the superiority of the proposed method over all other existing methods. | |
Option Pricing with Convolutional Kolmogorov-Arnold Networks | 2024-12-02 | With the rapid advancement of neural networks, methods for option pricing have evolved significantly. This study employs the Black-Scholes-Merton (B-S-M) model, incorporating an additional variable to improve the accuracy of predictions compared to the traditional Black-Scholes (B-S) model. Furthermore, Convolutional Kolmogorov-Arnold Networks (Conv-KANs) and Kolmogorov-Arnold Networks (KANs) are introduced to demonstrate that networks with enhanced non-linear capabilities yield superior fitting performance. For comparative analysis, Conv-LSTM and LSTM models, which are widely used in time series forecasting, are also applied. Additionally, a novel data selection strategy is proposed to simulate a real trading environment, thereby enhancing the robustness of the model. | |
FD-LLM: Large Language Model for Fault Diagnosis of Machines | 2024-12-02 | ShowLarge language models (LLMs) are effective at capturing complex, valuable conceptual representations from textual data for a wide range of real-world applications. However, in fields like Intelligent Fault Diagnosis (IFD), incorporating additional sensor data-such as vibration signals, temperature readings, and operational metrics-is essential but it is challenging to capture such sensor data information within traditional text corpora. This study introduces a novel IFD approach by effectively adapting LLMs to numerical data inputs for identifying various machine faults from time-series sensor data. We propose FD-LLM, an LLM framework specifically designed for fault diagnosis by formulating the training of the LLM as a multi-class classification problem. We explore two methods for encoding vibration signals: the first method uses a string-based tokenization technique to encode vibration signals into text representations, while the second extracts statistical features from both the time and frequency domains as statistical summaries of each signal. We assess the fault diagnosis capabilities of four open-sourced LLMs based on the FD-LLM framework, and evaluate the models' adaptability and generalizability under various operational conditions and machine components, namely for traditional fault diagnosis, cross-operational conditions, and cross-machine component settings. Our results show that LLMs such as Llama3 and Llama3-instruct demonstrate strong fault detection capabilities and significant adaptability across different operational conditions, outperforming state-of-the-art deep learning (DL) approaches in many cases. |
20 pages, 2 figures, 16 tables, including the tables in the appendix |
Towards a robust frequency-domain analysis: Spectral Rényi divergence revisited | 2024-12-02 | This paper studies a specific category of statistical divergences for spectral densities of time series: the spectral |
|
Recurrences reveal shared causal drivers of complex time series | 2024-12-02 | Unmeasured causal forces influence diverse experimental time series, such as the transcription factors that regulate genes, or the descending neurons that steer motor circuits. Combining the theory of skew-product dynamical systems with topological data analysis, we show that simultaneous recurrence events across multiple time series reveal the structure of their shared unobserved driving signal. We introduce a physics-based unsupervised learning algorithm that reconstructs causal drivers by iteratively building a recurrence graph with glass-like structure. As the amount of data increases, a percolation transition on this graph leads to weak ergodicity breaking for random walks -- revealing the shared driver's dynamics, even from strongly-corrupted measurements. We relate reconstruction accuracy to the rate of information transfer from a chaotic driver to the response systems, and we find that effective reconstruction proceeds through gradual approximation of the driver's dynamical attractor. Through extensive benchmarks against classical signal processing and machine learning techniques, we demonstrate our method's ability to extract causal drivers from diverse experimental datasets spanning ecology, genomics, fluid dynamics, and physiology. |
Physical Review X (to appear). Code available online at https://github.com/williamgilpin/shrec |
Representation Learning for Time-Domain High-Energy Astrophysics: Discovery of Extragalactic Fast X-ray Transient XRT 200515 | 2024-12-02 | We present a novel representation learning method for downstream tasks such as anomaly detection and unsupervised transient classification in high-energy datasets. This approach enabled the discovery of a new fast X-ray transient (FXT) in the Chandra archive, XRT 200515, a needle-in-the-haystack event and the first Chandra FXT of its kind. Recent serendipitous breakthroughs in X-ray astronomy, including FXTs from binary neutron star mergers and an extragalactic planetary transit candidate, highlight the need for systematic transient searches in X-ray archives. We introduce new event file representations, E-t Maps and E-t-dt Cubes, designed to capture both temporal and spectral information, effectively addressing the challenges posed by variable-length event file time series in machine learning applications. Our pipeline extracts low-dimensional, informative features from these representations using principal component analysis or sparse autoencoders, followed by clustering in the embedding space with DBSCAN. New transients are identified within transient-dominant clusters or through nearest-neighbor searches around known transients, producing a catalog of 3,539 candidates (3,427 flares and 112 dips). XRT 200515 exhibits unique temporal and spectral variability, including an intense, hard <10 s initial burst followed by spectral softening in an ~800 s oscillating tail. We interpret XRT 200515 as either the first giant magnetar flare observed at low X-ray energies or the first extragalactic Type I X-ray burst from a faint LMXB in the LMC. Our method extends to datasets from other observatories such as XMM-Newton, Swift-XRT, eROSITA, Einstein Probe, and upcoming missions like AXIS. |
25 pages, submitted to Monthly Notices of the Royal Astronomical Society, presented at the 2023 Conference on Machine Learning in Astronomical Surveys (ML-IAP/CCA-2023) |
MuSiCNet: A Gradual Coarse-to-Fine Framework for Irregularly Sampled Multivariate Time Series Analysis | 2024-12-02 | Irregularly sampled multivariate time series (ISMTS) are prevalent in reality. Most existing methods treat ISMTS as synchronized regularly sampled time series with missing values, neglecting that the irregularities are primarily attributed to variations in sampling rates. In this paper, we introduce a novel perspective that irregularity is essentially relative in some senses. With sampling rates artificially determined from low to high, an irregularly sampled time series can be transformed into a hierarchical set of relatively regular time series from coarse to fine. We observe that additional coarse-grained relatively regular series not only mitigate the challenges of irregular sampling to some extent but also incorporate broad-view temporal information, thereby serving as a valuable asset for representation learning. Therefore, following the learning philosophy of "seeing the big picture first, then delving into the details", we present the Multi-Scale and Multi-Correlation Attention Network (MuSiCNet), which combines multiple scales to iteratively refine the ISMTS representation. Specifically, within each scale, we explore time attention and frequency correlation matrices to aggregate intra- and inter-series information, naturally enhancing the representation quality with richer and more intrinsic details. Across adjacent scales, we employ a representation rectification method containing contrastive learning and reconstruction results adjustment to further improve representation consistency. MuSiCNet is an ISMTS analysis framework that is consistently competitive with SOTA in three mainstream tasks: classification, interpolation, and forecasting. |
IJCAI2024 AI4TS workshop best paper runner-up |
DSSRNN: Decomposition-Enhanced State-Space Recurrent Neural Network for Time-Series Analysis | 2024-12-01 | Time series forecasting is a crucial yet challenging task in machine learning, requiring domain-specific knowledge due to its wide-ranging applications. While recent Transformer models have improved forecasting capabilities, they come with high computational costs. Linear-based models have shown better accuracy than Transformers but still fall short of ideal performance. To address these challenges, we introduce the Decomposition State-Space Recurrent Neural Network (DSSRNN), a novel framework designed for both long-term and short-term time series forecasting. DSSRNN uniquely combines decomposition analysis to capture seasonal and trend components with state-space models and physics-based equations. We evaluate DSSRNN's performance on indoor air quality datasets, focusing on CO2 concentration prediction across various forecasting horizons. Results demonstrate that DSSRNN consistently outperforms state-of-the-art models, including transformer-based architectures, in terms of both Mean Squared Error (MSE) and Mean Absolute Error (MAE). For example, at the shortest horizon (T=96) in Office 1, DSSRNN achieved an MSE of 0.378 and an MAE of 0.401, significantly lower than competing models. Additionally, DSSRNN exhibits superior computational efficiency compared to more complex models. While not as lightweight as the DLinear model, DSSRNN achieves a balance between performance and efficiency, with only 0.11G MACs and 437MiB memory usage, and an inference time of 0.58ms for long-term forecasting. This work not only showcases DSSRNN's success but also establishes a new benchmark for physics-informed machine learning in environmental forecasting and potentially other domains. |
|
State-Space Modeling of Shape-constrained Functional Time Series | 2024-12-01 | Functional time series data frequently appears in econometric analyses, where the functions of interest are subject to some shape constraints, including monotonicity and convexity, as typical of the estimation of the Lorenz curve. This paper proposes a state-space model for time-varying functions to extract trends and serial dependence from functional time series while imposing the shape constraints on the estimated functions. The function of interest is modeled by a convex combination of selected basis functions to satisfy the shape constraints, where the time-varying convex weights on simplex follow the dynamic multi-logit models. To enable posterior computation by an efficient Markov chain Monte Carlo method, a novel data augmentation technique is devised for the complicated likelihood of this model. The proposed method is applied to the estimation of time-varying Lorenz curves, and its utility is illustrated through numerical experiments and analysis of panel data of household incomes in Japan. |
34 pages, 7 figures, 6 tables |
A Wave is Worth 100 Words: Investigating Cross-Domain Transferability in Time Series | 2024-12-01 | Time series analysis is a fundamental data mining task, and supervised training methods based on empirical risk minimization have proven effective on specific tasks and datasets. However, the acquisition of well-annotated data is costly and a large amount of unlabeled series data is under-utilized. Due to distributional shifts across various domains and different patterns of interest across multiple tasks, the problem of cross-domain multi-task migration of time series remains a significant challenge. To address these problems, this paper proposes a novel cross-domain pretraining method based on Wave Quantization (termed WQ4TS), which can be combined with any advanced time series model and applied to multiple downstream tasks. Specifically, we transfer the time series data from different domains into a common spectral latent space, and enable the model to learn the temporal pattern knowledge of different domains directly from the common space and utilize it for the inference of downstream tasks, thereby mitigating the challenge of heterogeneous cross-domain migration. The establishment of the spectral latent space brings at least three benefits: cross-domain migration capability, thus adapting to zero- and few-shot scenarios without relying on prior knowledge of the dataset; a generally compatible cross-domain migration framework that does not change the existing model structure; and robust modeling capability, thus achieving SOTA results in multiple downstream tasks. To demonstrate the effectiveness of the proposed approach, we conduct extensive experiments including three important tasks: forecasting, imputation, and classification. Three common real-world data scenarios are also simulated: full-data, few-shot, and zero-shot. The proposed WQ4TS achieves the best performance on 87.5% of all tasks, and the average improvement of the metrics on all the tasks is up to 34.7%. |
|
Well log data generation and imputation using sequence-based generative adversarial networks | 2024-12-01 | Well log analysis is crucial for hydrocarbon exploration, providing detailed insights into subsurface geological formations. However, gaps and inaccuracies in well log data, often due to equipment limitations, operational challenges, and harsh subsurface conditions, can introduce significant uncertainties in reservoir evaluation. Addressing these challenges requires effective methods for both synthetic data generation and precise imputation of missing data, ensuring data completeness and reliability. This study introduces a novel framework utilizing sequence-based generative adversarial networks (GANs) specifically designed for well log data generation and imputation. The framework integrates two distinct sequence-based GAN models: Time Series GAN (TSGAN) for generating synthetic well log data and Sequence GAN (SeqGAN) for imputing missing data. Both models were tested on a dataset from the North Sea, Netherlands region, focusing on different sections of 5, 10, and 50 data points. Experimental results demonstrate that this approach achieves superior accuracy in filling data gaps compared to other deep learning models for spatial series analysis. The method yielded R^2 values of 0.921, 0.899, and 0.594, with corresponding mean absolute percentage error (MAPE) values of 8.320, 0.005, and 151.154, and mean absolute error (MAE) values of 0.012, 0.005, and 0.032, respectively. These results set a new benchmark for data integrity and utility in geosciences, particularly in well log data analysis. |
|
Fairness at Every Intersection: Uncovering and Mitigating Intersectional Biases in Multimodal Clinical Predictions | 2024-11-30 | Biases in automated clinical decision-making using Electronic Healthcare Records (EHR) impose significant disparities in patient care and treatment outcomes. Conventional approaches have primarily focused on bias mitigation strategies stemming from single attributes, overlooking intersectional subgroups -- groups formed across various demographic intersections (such as race, gender, ethnicity, etc.). Rendering single-attribute mitigation strategies to intersectional subgroups becomes statistically irrelevant due to the varying distribution and bias patterns across these subgroups. The multimodal nature of EHR -- data from various sources such as combinations of text, time series, tabular, events, and images -- adds another layer of complexity as the influence on minority groups may fluctuate across modalities. In this paper, we take the initial steps to uncover potential intersectional biases in predictions by sourcing extensive multimodal datasets, MIMIC-Eye1 and MIMIC-IV ED, and propose mitigation at the intersectional subgroup level. We perform and benchmark downstream tasks and bias evaluation on the datasets by learning a unified text representation from multimodal sources, harnessing the enormous capabilities of the pre-trained clinical Language Models (LM), MedBERT, Clinical BERT, and Clinical BioBERT. Our findings indicate that the proposed sub-group-specific bias mitigation is robust across different datasets, subgroups, and embeddings, demonstrating effectiveness in addressing intersectional biases in multimodal settings. |
|
Test Time Learning for Time Series Forecasting | 2024-11-30 | Time-series forecasting has seen significant advancements with the introduction of token prediction mechanisms such as multi-head attention. However, these methods often struggle to achieve the same performance as in language modeling, primarily due to the quadratic computational cost and the complexity of capturing long-range dependencies in time-series data. State-space models (SSMs), such as Mamba, have shown promise in addressing these challenges by offering efficient solutions with linear RNNs capable of modeling long sequences with larger context windows. However, there remains room for improvement in accuracy and scalability. We propose the use of Test-Time Training (TTT) modules in a parallel architecture to enhance performance in long-term time series forecasting. Through extensive experiments on standard benchmark datasets, we demonstrate that TTT modules consistently outperform state-of-the-art models, including the Mamba-based TimeMachine, particularly in scenarios involving extended sequence and prediction lengths. Our results show significant improvements in Mean Squared Error (MSE) and Mean Absolute Error (MAE), especially on larger datasets such as Electricity, Traffic, and Weather, underscoring the effectiveness of TTT in capturing long-range dependencies. Additionally, we explore various convolutional architectures within the TTT framework, showing that even simple configurations like 1D convolution with small filters can achieve competitive results. This work sets a new benchmark for time-series forecasting and lays the groundwork for future research in scalable, high-performance forecasting models. |
|
Development of a Statistical Predictive Model for Daily Water Table Depth and Important Variables Selection for Inference | 2024-11-30 | Accurately predicting water table dynamics is vital for sustaining groundwater resources that support ecological functions and anthropogenic activities. This study evaluates a statistical model (BigVAR) that handles three major flexibilities: (a) prediction under a sparsity assumption in coefficients, (b) consideration of a time series autoregression framework, and (c) allowance for lags in both dependent and independent variables for estimating water table depth using daily hydroclimatic data from the USDA Forest Service Santee Experimental Forest (SC) and a site in NC. Data from 2006--2019 (SC) and 1988--2008 (NC) were used, with key predictors including soil and air temperature, precipitation, wind, and radiation. For WS80, RMSE during the dormant season was 10.09 cm, with a daily testing phase RMSE of 14.94 cm. The model achieved an R^2 of 0.93 for 2019 (a dry year) and 0.96 for 2016 (a wet year). Solar radiation, rainfall, and wind direction were among the most influential variables. This predictive model aids in managing wetland hydrology and supports decision-making for forest managers and hydrologists. |
Autoregressive, Hydrology, time series modeling, variable selection, water table depth, hydrology |
Cross-Subject Domain Adaptation for Classifying Working Memory Load with Multi-Frame EEG Images | 2024-11-30 | Working memory (WM), denoting the information temporarily stored in the mind, is a fundamental research topic in the field of human cognition. Electroencephalograph (EEG), which can monitor the electrical activity of the brain, has been widely used in measuring the level of WM. However, one of the critical challenges is that individual differences may cause ineffective results, especially when the established model meets an unfamiliar subject. In this work, we propose a cross-subject deep adaptation model with spatial attention (CS-DASA) to generalize the workload classifications across subjects. First, we transform EEG time series into multi-frame EEG images incorporating spatial, spectral, and temporal information. Next, the Subject-Shared module in CS-DASA receives multi-frame EEG image data from both source and target subjects and learns the common feature representations. Then, in the subject-specific module, the maximum mean discrepancy is implemented to measure the domain distribution divergence in a reproducing kernel Hilbert space, which can add an effective penalty loss for domain adaptation. Additionally, the subject-to-subject spatial attention mechanism is employed to focus on the discriminative spatial features from the target image data. Experiments conducted on a public WM EEG dataset containing 13 subjects show that the proposed model is capable of achieving better performance than existing state-of-the-art methods. |
|
Supervised Autoencoders with Fractionally Differentiated Features and Triple Barrier Labelling Enhance Predictions on Noisy Data | 2024-11-30 | This paper investigates the enhancement of financial time series forecasting with the use of neural networks through supervised autoencoders (SAE), to improve investment strategy performance. Using the Sharpe and Information Ratios, it specifically examines the impact of noise augmentation and triple barrier labeling on risk-adjusted returns. The study focuses on Bitcoin, Litecoin, and Ethereum as the traded assets from January 1, 2016, to April 30, 2022. Findings indicate that supervised autoencoders, with balanced noise augmentation and bottleneck size, significantly boost strategy effectiveness. However, excessive noise and large bottleneck sizes can impair performance. |
arXiv admin note: substantial text overlap with arXiv:2404.01866 |
Fine-Tuning Pre-trained Large Time Series Models for Prediction of Wind Turbine SCADA Data | 2024-11-30 | The remarkable achievements of large models in the fields of natural language processing (NLP) and computer vision (CV) have sparked interest in their application to time series forecasting within industrial contexts. This paper explores the application of a pre-trained large time series model, Timer, which was initially trained on a wide range of time series data from multiple domains, in the prediction of Supervisory Control and Data Acquisition (SCADA) data collected from wind turbines. The model was fine-tuned on SCADA datasets sourced from two wind farms, which exhibited differing characteristics, and its accuracy was subsequently evaluated. Additionally, the impact of data volume was studied to evaluate the few-shot ability of Timer. Finally, an application study on one-turbine fine-tuning for whole-plant prediction was implemented, where both few-shot and cross-turbine generalization capacity are required. The results reveal that the pre-trained large model does not consistently outperform other baseline models in terms of prediction accuracy, whether the data is abundant or not, but demonstrates superior performance in the application study. This result underscores the distinctive advantages of the pre-trained large time series model in facilitating swift deployment. |
|
Sorting-based FPGA Sliding Window Aggregation Engine without off-chip Memories | 2024-11-30 | Aggregation queries are a series of computationally-demanding analytics operations on grouped and time series data. They include tasks such as summation or finding the median among the items of a group sharing a group ID, and within a specified number of the last observed tuples for sliding window aggregation (SWAG). They have a wide range of applications including in database analytics, operating systems, bank security and medical sensors. Existing challenges include the hardware complexity that comes with efficiently handling per-group states using hash-based approaches. This paper presents a pipelined and adaptable approach for calculating a wide range of aggregation queries with high throughput. It is then adapted for SWAG to achieve up to 476x speedup over the CPU of the same platform. It outperforms the state of the art, for example by processing 7.14x more tuples per second and supporting 4x the window sizes, with a fraction of the resources and no DRAM. |
|
Differentiable High-Order Markov Models for Spectrum Prediction | 2024-11-30 | The advent of deep learning and recurrent neural networks revolutionized the field of time-series processing. Therefore, recent research on spectrum prediction has focused on the use of these tools. However, spectrum prediction, which involves forecasting wireless spectrum availability, is an older field where many "classical" tools were considered around the 2010s, such as Markov models. This work revisits high-order Markov models for spectrum prediction in dynamic wireless environments. We introduce a framework to address mismatches between sensing length and model order as well as state-space complexity arising with large order. Furthermore, we extend this Markov framework by enabling fine-tuning of the probability transition matrix through gradient-based supervised learning, offering a hybrid approach that bridges probabilistic modeling and modern machine learning. Simulations on real-world Wi-Fi traffic demonstrate the competitive performance of high-order Markov models compared to deep learning methods, particularly in scenarios with constrained datasets containing outliers. |
|
Linear Simple Cycle Reservoirs at the edge of stability perform Fourier decomposition of the input driving signals | 2024-11-30 | This paper explores the representational structure of linear Simple Cycle Reservoirs (SCR) operating at the edge of stability. We view SCR as providing in their state space feature representations of the input-driving time series. By endowing the state space with the canonical dot-product, we "reverse engineer" the corresponding kernel (inner product) operating in the original time series space. The action of this time-series kernel is fully characterized by the eigenspace of the corresponding metric tensor. We demonstrate that when linear SCRs are constructed at the edge of stability, the eigenvectors of the time-series kernel align with the Fourier basis. This theoretical insight is supported by numerical experiments. |
20 pages |
Hybrid Spiking Neural Network -- Transformer Video Classification Model | 2024-11-29 | In recent years, Spiking Neural Networks (SNNs) have gathered significant interest due to their temporal understanding capabilities. This work introduces, to the best of our knowledge, the first cortical-column-like hybrid architecture for the time-series data classification task that leverages SNNs and is inspired by both brain structure and previous hybrid models. We introduce several encoding methods to use with this model. Finally, we develop a procedure for training this network on the training dataset. In an effort to make these models simpler to use, we make all the implementations available to the public. |
37 pages, 11 figures. BSc Thesis in Computer Science. Code available |
SIMS: Simulating Human-Scene Interactions with Real World Script Planning | 2024-11-29 | Simulating long-term human-scene interaction is a challenging yet fascinating task. Previous works have not effectively addressed the generation of long-term human-scene interactions with detailed narratives for physics-based animation. This paper introduces a novel framework for the planning and controlling of long-horizon, physically plausible human-scene interaction. On the one hand, films and shows with stylish human locomotions or interactions with scenes are abundantly available on the internet, providing a rich source of data for script planning. On the other hand, Large Language Models (LLMs) can understand and generate logical storylines. This motivates us to marry the two by using an LLM-based pipeline to extract scripts from videos, and then employ LLMs to imitate and create new scripts, capturing complex, time-series human behaviors and interactions with environments. Building on this, we utilize a dual-aware policy that achieves both language comprehension and scene understanding to guide character motions within contextual and spatial constraints. To facilitate training and evaluation, we contribute a comprehensive planning dataset containing diverse motion sequences extracted from real-world videos and expanded with large language models. We also collect and re-annotate motion clips from existing kinematic datasets to enable our policy to learn diverse skills. Extensive experiments demonstrate the effectiveness of our framework in versatile task execution and its generalization ability to various scenarios, showing remarkably enhanced performance compared with existing methods. Our code and data will be publicly available soon. |
|
Image segmentation of treated and untreated tumor spheroids by Fully Convolutional Networks | 2024-11-29 | Multicellular tumor spheroids (MCTS) are advanced cell culture systems for assessing the impact of combinatorial radio(chemo)therapy. They exhibit therapeutically relevant in-vivo-like characteristics from 3D cell-cell and cell-matrix interactions to radial pathophysiological gradients related to proliferative activity and nutrient/oxygen supply, altering cellular radioresponse. State-of-the-art assays quantify long-term curative endpoints based on collected brightfield image time series from large treated spheroid populations per irradiation dose and treatment arm. Here, spheroid control probabilities are documented analogous to in-vivo tumor control probabilities based on Kaplan-Meier curves. These analyses require laborious spheroid segmentation of up to 100,000 images per treatment arm to extract relevant structural information from the images, e.g., diameter, area, volume and circularity. While several image analysis algorithms are available for spheroid segmentation, they all focus on compact MCTS with a clearly distinguishable outer rim throughout growth. However, treated MCTS may partly be detached and destroyed and are usually obscured by dead cell debris. We successfully train two Fully Convolutional Networks, UNet and HRNet, and optimize their hyperparameters to develop an automatic segmentation for both untreated and treated MCTS. We systematically validate the automatic segmentation on larger, independent data sets of spheroids derived from two human head-and-neck cancer cell lines. We find an excellent overlap between manual and automatic segmentation for most images, quantified by Jaccard indices at around 90%. For images with smaller overlap of the segmentations, we demonstrate that this error is comparable to the variations across segmentations from different biological experts, suggesting that these images represent biologically unclear or ambiguous cases. |
30 pages, 23 figures |
Modelling Networked Dynamical System by Temporal Graph Neural ODE with Irregularly Partial Observed Time-series Data | 2024-11-29 | Modeling the evolution of a system with time-series data is a challenging and critical task in a wide range of fields, especially when the time-series data is irregularly sampled and partially observable. Some methods have been proposed to estimate the hidden dynamics between intervals, such as Neural ODE or an exponential decay dynamic function, and combine them with an RNN to estimate the evolution. However, it is difficult for these methods to capture the spatial and temporal dependencies existing within graph-structured time-series data and take full advantage of the available relational information to impute missing data and predict the future states. Besides, traditional RNN-based methods leverage a shared RNN cell to update the hidden state, which does not capture the impact of various intervals and missing state information on the reliability of estimating the hidden state. To solve this problem, in this paper, we propose a method embedding Graph Neural ODE with a reliability and time-aware mechanism which can capture the spatial and temporal dependencies in irregularly sampled and partially observable time-series data to reconstruct the dynamics. Also, a loss function is designed considering the reliability of the augmented data from the above proposed method to make further predictions. The proposed method has been validated in experiments on different networked dynamical systems. |
|
A data driven approach to classify descriptors based on their efficiency in translating noisy trajectories into physically-relevant information | 2024-11-29 | Reconstructing the physical complexity of many-body dynamical systems can be challenging. Starting from the trajectories of their constitutive units (raw data), typical approaches require selecting appropriate descriptors to convert them into time-series, which are then analyzed to extract interpretable information. However, identifying the most effective descriptor is often non-trivial. Here, we report a data-driven approach to compare the efficiency of various descriptors in extracting information from noisy trajectories and translating it into physically relevant insights. As a prototypical system with non-trivial internal complexity, we analyze molecular dynamics trajectories of an atomistic system where ice and water coexist in equilibrium near the solid/liquid transition temperature. We compare general and specific descriptors often used in aqueous systems: number of neighbors, molecular velocities, Smooth Overlap of Atomic Positions (SOAP), Local Environments and Neighbors Shuffling (LENS), Orientational Tetrahedral Order, and distance from the fifth neighbor ( |
19 pages, 5 figures + 3 in supporting information (at the bottom of the manuscript) |
TEAM: Topological Evolution-aware Framework for Traffic Forecasting--Extended Version | 2024-11-29 | Due to the global trend towards urbanization, people increasingly move to and live in cities that then continue to grow. Traffic forecasting plays an important role in the intelligent transportation systems of cities as well as in spatio-temporal data mining. State-of-the-art forecasting is achieved by deep-learning approaches due to their ability to contend with complex spatio-temporal dynamics. However, existing methods assume the input is fixed-topology road networks and static traffic time series. These assumptions fail to align with urbanization, where time series are collected continuously and road networks evolve over time. In such settings, deep-learning models require frequent re-initialization and re-training, imposing high computational costs. To enable much more efficient training without jeopardizing model accuracy, we propose the Topological Evolution-aware Framework (TEAM) for traffic forecasting that incorporates convolution and attention. This combination of mechanisms enables better adaptation to newly collected time series, while being able to maintain learned knowledge from old time series. TEAM features a continual learning module based on the Wasserstein metric that acts as a buffer that can identify the most stable and the most changing network nodes. Then, only data related to stable nodes is employed for re-training when consolidating a model. Further, only data of new nodes and their adjacent nodes as well as data pertaining to changing nodes are used to re-train the model. Empirical studies with two real-world traffic datasets offer evidence that TEAM is capable of much lower re-training costs than existing methods are, without jeopardizing forecasting accuracy. |
16 pa...16 pages. An extended version of "TEAM: Topological Evolution-aware Framework for Traffic Forecasting" accepted at PVLDB 2025 |
Scalable Order-Preserving Pattern Mining | 2024-11-29 | Time series are ubiquitous in domains ranging from medicine to marketing and finance. Frequent Pattern Mining (FPM) from a time series has thus received much attention. Recently, it has been studied under the order-preserving (OP) matching relation stating that a match occurs when two time series have the same relative order on their elements. Here, we propose exact, highly scalable algorithms for FPM in the OP setting. Our algorithms employ an OP suffix tree (OPST) as an index to store and query time series efficiently. Unfortunately, there are no practical algorithms for OPST construction. Thus, we first propose a novel and practical |
ICDM 2024; abstract abridged to satisfy arXiv requirements |
Hybridization of Persistent Homology with Neural Networks for Time-Series Prediction: A Case Study in Wave Height | 2024-11-29 | Time-series prediction is an active area of research across various fields, often challenged by the fluctuating influence of short-term and long-term factors. In this study, we introduce a feature engineering method that enhances the predictive performance of neural network models. Specifically, we leverage computational topology techniques to derive valuable topological features from input data, boosting the predictive accuracy of our models. Our focus is on predicting wave heights, utilizing models based on topological features within feedforward neural networks (FNNs), recurrent neural networks (RNNs), long short-term memory networks (LSTM), and RNNs with gated recurrent units (GRU). For time-ahead predictions, the enhancements in |
The work has problems in methods and results |
Multi-task CNN Behavioral Embedding Model For Transaction Fraud Detection | 2024-11-29 | The burgeoning e-Commerce sector requires advanced solutions for the detection of transaction fraud. With an increasing risk of financial information theft and account takeovers, deep learning methods have become integral to the embedding of behavior sequence data in fraud detection. However, these methods often struggle to balance modeling capabilities and efficiency and to incorporate domain knowledge. To address these issues, we introduce the multitask CNN behavioral Embedding Model for Transaction Fraud Detection. Our contributions include 1) introducing a single-layer CNN design featuring multirange kernels which outperform LSTM and Transformer models in terms of scalability and domain-focused inductive bias, 2) integrating positional encoding with the CNN to introduce sequence-order signals, enhancing overall performance, and 3) implementing multitask learning with randomly assigned label weights, thus removing the need for manual tuning. Testing on real-world data reveals our model's enhanced performance of downstream transaction models and comparable competitiveness with the Transformer Time Series (TST) model. |
7 pages, 2 figures, ICDMW 2024 |
Unsupervised Learning Approach to Anomaly Detection in Gravitational Wave Data | 2024-11-29 | Gravitational waves (GW), predicted by Einstein's General Theory of Relativity, provide a powerful probe of astrophysical phenomena and fundamental physics. In this work, we propose an unsupervised anomaly detection method using variational autoencoders (VAEs) to analyze GW time-series data. By training on noise-only data, the VAE accurately reconstructs noise inputs while failing to reconstruct anomalies, such as GW signals, which results in measurable spikes in the reconstruction error. The method was applied to data from the LIGO H1 and L1 detectors. Evaluation on testing datasets containing both noise and GW events demonstrated reliable detection, achieving an area under the ROC curve (AUC) of 0.89. This study introduces VAEs as a robust, unsupervised approach for identifying anomalies in GW data, which offers a scalable framework for detecting known and potentially new phenomena in physics. |
|
An Adversarial Learning Approach to Irregular Time-Series Forecasting | 2024-11-28 | Forecasting irregular time series presents significant challenges due to two key issues: the vulnerability of models to mean regression, driven by the noisy and complex nature of the data, and the limitations of traditional error-based evaluation metrics, which fail to capture meaningful patterns and penalize unrealistic forecasts. These problems result in forecasts that often misalign with human intuition. To tackle these challenges, we propose an adversarial learning framework with a deep analysis of adversarial components. Specifically, we emphasize the importance of balancing the modeling of global distribution (overall patterns) and transition dynamics (localized temporal changes) to better capture the nuances of irregular time series. Overall, this research provides practical insights for improving models and evaluation metrics, and pioneers the application of adversarial learning in the domain of irregular time-series forecasting. |
Accepted to AdvML-Frontiers Workshop @ NeurIPS 2024 |
Fractal Conditional Correlation Dimension Infers Complex Causal Networks | 2024-11-28 | Determining causal inference has become popular in physical and engineering applications. While the problem has immense challenges, it provides a way to model the complex networks by observing the time series. In this paper, we present the optimal conditional correlation dimensional geometric information flow principle ( |
|
Plots Unlock Time-Series Understanding in Multimodal Models | 2024-11-28 | While multimodal foundation models can now natively work with data beyond text, they remain underutilized in analyzing the considerable amounts of multi-dimensional time-series data in fields like healthcare, finance, and social sciences, representing a missed opportunity for richer, data-driven insights. This paper proposes a simple but effective method that leverages the existing vision encoders of these models to "see" time-series data via plots, avoiding the need for additional, potentially costly, model training. Our empirical evaluations show that this approach outperforms providing the raw time-series data as text, with the additional benefit that visual time-series representations demonstrate up to a 90% reduction in model API costs. We validate our hypothesis through synthetic data tasks of increasing complexity, progressing from simple functional form identification on clean data, to extracting trends from noisy scatter plots. To demonstrate generalizability from synthetic tasks with clear reasoning steps to more complex, real-world scenarios, we apply our approach to consumer health tasks - specifically fall detection, activity recognition, and readiness assessment - which involve heterogeneous, noisy data and multi-step reasoning. The overall success in plot performance over text performance (up to a 120% performance increase on zero-shot synthetic tasks, and up to a 150% performance increase on real-world tasks), across both GPT and Gemini model families, highlights our approach's potential for making the best use of the native capabilities of foundation models. |
57 pages |
Sparse optimization for estimating the cross-power spectrum in linear inverse models : from theory to the application in brain connectivity | 2024-11-28 | In this work we present a computationally efficient linear optimization approach for estimating the cross-power spectrum of a hidden multivariate stochastic process from that of another observed process. Sparsity in the resulting estimator of the cross-power is induced through |
|
TimeGPT in Load Forecasting: A Large Time Series Model Perspective | 2024-11-28 | Machine learning models have made significant progress in load forecasting, but their forecast accuracy is limited in cases where historical load data is scarce. Inspired by the outstanding performance of large language models (LLMs) in computer vision and natural language processing, this paper aims to discuss the potential of large time series models in load forecasting with scarce historical data. Specifically, the large time series model is constructed as a time series generative pre-trained transformer (TimeGPT), which is trained on massive and diverse time series datasets consisting of 100 billion data points (e.g., finance, transportation, banking, web traffic, weather, energy, healthcare, etc.). Then, the scarce historical load data is used to fine-tune the TimeGPT, which helps it to adapt to the data distribution and characteristics associated with load forecasting. Simulation results show that TimeGPT outperforms the benchmarks (e.g., popular machine learning models and statistical models) for load forecasting on several real datasets with scarce training samples, particularly for short look-ahead times. However, it cannot be guaranteed that TimeGPT is always superior to benchmarks for load forecasting with scarce data, since the performance of TimeGPT may be affected by the distribution differences between the load data and the training data. In practical applications, we can divide the historical data into a training set and a validation set, and then use the validation set loss to decide whether TimeGPT is the best choice for a specific dataset. |
10 pages. It was published in Applied Energy |
SoftED: Metrics for Soft Evaluation of Time Series Event Detection | 2024-11-28 | Time series event detection methods are evaluated mainly by standard classification metrics that focus solely on detection accuracy. However, inaccuracy in detecting an event can often result from its preceding or delayed effects reflected in neighboring detections. These detections are valuable to trigger necessary actions or help mitigate unwelcome consequences. In this context, current metrics are insufficient and inadequate for event detection. There is a demand for metrics that incorporate both the concept of time and temporal tolerance for neighboring detections. This paper introduces SoftED metrics, a new set of metrics designed for the soft evaluation of event detection methods. They enable the evaluation of both detection accuracy and the degree to which their detections represent events. They improved event detection evaluation by associating events and their representative detections, incorporating temporal tolerance in over 36% of experiments compared to the usual classification metrics. SoftED metrics were validated by domain specialists who indicated their contribution to detection evaluation and method selection. |
19 pages |
Data Augmentation with Diffusion Models for Colon Polyp Localization on the Low Data Regime: How much real data is enough? | 2024-11-28 | The scarcity of data in medical domains hinders the performance of Deep Learning models. Data augmentation techniques can alleviate that problem, but they usually rely on functional transformations of the data that are not guaranteed to preserve the original tasks. Approximating the distribution of the data using generative models is a way of reducing that problem and of obtaining new samples that resemble the original data. Denoising Diffusion models are a promising Deep Learning technique that can learn good approximations of different kinds of data like images, time series or tabular data. Automatic colonoscopy analysis, and specifically Polyp localization in colonoscopy videos, is a task that can assist clinical diagnosis and treatment. The annotation of video frames for training a deep learning model is a time-consuming task and usually only small datasets can be obtained. The fine tuning of application models using a large dataset of generated data could be an alternative to improve their performance. We conduct a set of experiments training different diffusion models that can jointly generate colonoscopy images with localization annotations using a combination of existing open datasets. The generated data is used in various transfer learning experiments in the task of polyp localization with a model based on YOLO v9 in the low data regime. |
|
On Consistency of Signature Using Lasso | 2024-11-28 | Signatures are iterated path integrals of continuous and discrete-time processes, and their universal nonlinearity linearizes the problem of feature selection in time series data analysis. This paper studies the consistency of signature using Lasso regression, both theoretically and numerically. We establish conditions under which the Lasso regression is consistent both asymptotically and in finite sample. Furthermore, we show that the Lasso regression is more consistent with the Itô signature for time series and processes that are closer to the Brownian motion and with weaker inter-dimensional correlations, while it is more consistent with the Stratonovich signature for mean-reverting time series and processes. We demonstrate that signature can be applied to learn nonlinear functions and option prices with high accuracy, and the performance depends on properties of the underlying process and the choice of the signature. |
|
RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data | 2024-11-27 | We present RelCon, a novel self-supervised *Rel*ative *Con*trastive learning approach that uses a learnable distance measure in combination with a softened contrastive loss for training a motion foundation model from wearable sensors. The learnable distance measure captures motif similarity and domain-specific semantic information such as rotation invariance. The learned distance provides a measurement of semantic similarity between a pair of accelerometer time-series segments, which is used to measure the distance between an anchor and various other sampled candidate segments. The self-supervised model is trained on 1 billion segments from 87,376 participants from a large wearables dataset. The model achieves strong performance across multiple downstream tasks, encompassing both classification and regression. To our knowledge, we are the first to show the generalizability of a self-supervised learning model with motion data from wearables across distinct evaluation tasks. |
|
The Performance of the LSTM-based Code Generated by Large Language Models (LLMs) in Forecasting Time Series Data | 2024-11-27 | An intriguing case is the goodness of the machine and deep learning models generated by these LLMs in conducting automated scientific data analysis, where a data analyst may not have enough expertise in manually coding and optimizing complex deep learning models and code and thus may opt to leverage LLMs to generate the required models. This paper investigates and compares the performance of the mainstream LLMs, such as ChatGPT, PaLM, LLama, and Falcon, in generating deep learning models for analyzing time series data, an important and popular data type with prevalent applications in many domains, including finance and the stock market. This research conducts a set of controlled experiments where the prompts for generating deep learning-based models are controlled with respect to sensitivity levels of four criteria including 1) Clarity and Specificity, 2) Objective and Intent, 3) Contextual Information, and 4) Format and Style. While the results are relatively mixed, we observe some distinct patterns. We notice that using LLMs, we are able to generate deep learning-based models with executable code for each dataset separately whose performance is comparable with that of the manually crafted and optimized LSTM models for predicting the whole time series dataset. We also noticed that ChatGPT outperforms the other LLMs in generating more accurate models. Furthermore, we observed that the goodness of the generated models varies with respect to the "temperature" parameter used in configuring LLMs. The results can be beneficial for data analysts and practitioners who would like to leverage generative AIs to produce good prediction models with acceptable goodness. |
|
CanFields: Consolidating 4D Dynamic Shapes from Raw Scans | 2024-11-27 | We introduce Canonical Consolidation Fields (CanFields), a new method for reconstructing a time series of independently captured 3D scans into a single, coherent deforming shape. This 4D representation enables continuous refinement across both space and time. Unlike prior methods that often over-smooth the geometry or produce topological and geometric artifacts, CanFields effectively learns geometry and deformation in an unsupervised way by incorporating two geometric priors. First, we introduce a dynamic consolidator module that adjusts the input and assigns confidence scores, balancing the learning of the canonical shape and its deformations. Second, we use low-frequency velocity fields to guide deformation while preserving fine details in canonical shapes through high-frequency bias. We validate the robustness and accuracy of CanFields on diverse raw scans, demonstrating its superior performance even with missing regions, sparse frames, and noise. Code is available in the supplementary materials and will be released publicly upon acceptance. |
|
Weakly Supervised Framework Considering Multi-temporal Information for Large-scale Cropland Mapping with Satellite Imagery | 2024-11-27 | Accurately mapping large-scale cropland is crucial for agricultural production management and planning. Currently, the combination of remote sensing data and deep learning techniques has shown outstanding performance in cropland mapping. However, those approaches require massive precise labels, which are labor-intensive. To reduce the label cost, this study presented a weakly supervised framework considering multi-temporal information for large-scale cropland mapping. Specifically, we extract high-quality labels according to their consistency among global land cover (GLC) products to construct the supervised learning signal. On the one hand, to alleviate the overfitting problem caused by the model's over-trust of remaining errors in high-quality labels, we encode the similarity/aggregation of cropland in the visual/spatial domain to construct the unsupervised learning signal, and take it as the regularization term to constrain the supervised part. On the other hand, to sufficiently leverage the plentiful information in the samples without high-quality labels, we also incorporate the unsupervised learning signal in these samples, enriching the diversity of the feature space. After that, to capture the phenological features of croplands, we introduce dense satellite image time series (SITS) to extend the proposed framework in the temporal dimension. We also visualized the high dimensional phenological features to uncover how multi-temporal information benefits cropland extraction, and assessed the method's robustness under conditions of data scarcity. The proposed framework has been experimentally validated for strong adaptability across three study areas (Hunan Province, Southeast France, and Kansas) in large-scale cropland mapping, and the internal mechanism and temporal generalizability are also investigated. |
|
Ridge Regression for Manifold-valued Time-Series with Application to Meteorological Forecast | 2024-11-27 | We propose a natural intrinsic extension of the ridge regression from Euclidean spaces to general manifolds, which relies on Riemannian least-squares fitting, empirical covariance, and Mahalanobis distance. We utilize it for time-series prediction and apply the approach to forecast hurricane tracks and their wind speeds. |
|
Visual Adversarial Attack on Vision-Language Models for Autonomous Driving | 2024-11-27 | Vision-language models (VLMs) have significantly advanced autonomous driving (AD) by enhancing reasoning capabilities. However, these models remain highly vulnerable to adversarial attacks. While existing research has primarily focused on general VLM attacks, the development of attacks tailored to the safety-critical AD context has been largely overlooked. In this paper, we take the first step toward designing adversarial attacks specifically targeting VLMs in AD, exposing the substantial risks these attacks pose within this critical domain. We identify two unique challenges for effective adversarial attacks on AD VLMs: the variability of textual instructions and the time-series nature of visual scenarios. To this end, we propose ADvLM, the first visual adversarial attack framework specifically designed for VLMs in AD. Our framework introduces Semantic-Invariant Induction, which uses a large language model to create a diverse prompt library of textual instructions with consistent semantic content, guided by semantic entropy. Building on this, we introduce Scenario-Associated Enhancement, an approach where attention mechanisms select key frames and perspectives within driving scenarios to optimize adversarial perturbations that generalize across the entire scenario. Extensive experiments on several AD VLMs over multiple benchmarks show that ADvLM achieves state-of-the-art attack effectiveness. Moreover, real-world attack studies further validate its applicability and potential in practice. |
|
Tree species classification at the pixel-level using deep learning and multispectral time series in an imbalanced context | 2024-11-27 | This paper investigates tree species classification using Sentinel-2 multispectral satellite image time-series. Despite their critical importance for many applications, such maps are often unavailable, outdated, or inaccurate for large areas. The interest of using remote sensing time series to produce these maps has been highlighted in many studies. However, many methods proposed in the literature still rely on a standard classification algorithm, usually the Random Forest (RF) algorithm with vegetation indices. This study shows that the use of deep learning models can lead to a significant improvement in classification results, especially in an imbalanced context where the RF algorithm tends to predict towards the majority class. In our use case in the center of France with 10 tree species, we obtain an overall accuracy (OA) around 95% and an F1-macro score around 80% using three different benchmark deep learning architectures. In contrast, using the RF algorithm yields an OA of 93% and an F1 of 60%, indicating that the minority classes are not classified with sufficient accuracy. Therefore, the proposed framework is a strong baseline that can be easily implemented in most scenarios, even with a limited amount of reference data. Our results highlight that a standard multilayer perceptron can be competitive with batch normalization and a sufficient number of parameters. Other architectures (convolutional or attention-based) can also achieve strong results when tuned properly. Furthermore, our results show that DL models are naturally robust to imbalanced data, although similar results can be obtained using dedicated techniques. |
|
Citywide Electric Vehicle Charging Demand Prediction Approach Considering Urban Region and Dynamic Influences | 2024-11-27 | Electric vehicle charging demand prediction is important for vacant charging pile recommendation and charging infrastructure planning, thus facilitating vehicle electrification and green energy development. The performance of previous spatio-temporal studies is still far from satisfactory nowadays because urban region attributes and multivariate temporal influences are not adequately taken into account. To tackle these issues, we propose a learning approach for citywide electric vehicle charging demand prediction, named CityEVCP. To learn non-pairwise relationships in urban areas, we cluster service areas by the types and numbers of points of interest in the areas and develop attentive hypergraph networks accordingly. Graph attention mechanisms are employed for information propagation between neighboring areas. Additionally, we propose a variable selection network to adaptively learn dynamic auxiliary information and improve the Transformer encoder utilizing gated mechanisms for fluctuating charging time-series data. Experiments on a citywide electric vehicle charging dataset demonstrate the performances of our proposed approach compared with a broad range of competing baselines. Furthermore, we demonstrate the impact of dynamic influences on prediction results in different areas of the city and the effectiveness of our area clustering method. |
|
Federated Learning for Time-Series Healthcare Sensing with Incomplete Modalities | 2024-11-27 | Many healthcare sensing applications utilize multimodal time-series data from sensors embedded in mobile and wearable devices. Federated Learning (FL), with its privacy-preserving advantages, is particularly well-suited for health applications. However, most multimodal FL methods assume the availability of complete modality data for local training, which is often unrealistic. Moreover, recent approaches tackling incomplete modalities scale poorly and become inefficient as the number of modalities increases. To address these limitations, we propose FLISM, an efficient FL training algorithm with incomplete sensing modalities while maintaining high accuracy. FLISM employs three key techniques: (1) modality-invariant representation learning to extract effective features from clients with a diverse set of modalities, (2) modality quality-aware aggregation to prioritize contributions from clients with higher-quality modality data, and (3) global-aligned knowledge distillation to reduce local update shifts caused by modality differences. Extensive experiments on real-world datasets show that FLISM not only achieves high accuracy but is also faster and more efficient compared with state-of-the-art methods handling incomplete modality problems in FL. We release the code as open-source at https://github.com/AdibaOrz/FLISM. |
|
Heterogeneous Relationships of Subjects and Shapelets for Semi-supervised Multivariate Series Classification | 2024-11-27 | Multivariate time series (MTS) classification is widely applied in fields such as industry, healthcare, and finance, aiming to extract key features from complex time series data for accurate decision-making and prediction. However, existing methods for MTS often struggle due to the challenges of effectively modeling high-dimensional data and the lack of labeled data, resulting in poor classification performance. To address this issue, we propose a heterogeneous relationships of subjects and shapelets method for semi-supervised MTS classification. This method offers a novel perspective by integrating various types of additional information while capturing the relationships between them. Specifically, we first utilize a contrast temporal self-attention module to obtain sparse MTS representations, and then model the similarities between these representations using soft dynamic time warping to construct a similarity graph. Secondly, we learn the shapelets for different subject types, incorporating both the subject features and their shapelets as additional information to further refine the similarity graph, ultimately generating a heterogeneous graph. Finally, we use a dual level graph attention network to obtain predictions. Through this method, we successfully transform the dataset into a heterogeneous graph, integrating multiple types of additional information and achieving precise semi-supervised node classification. Experiments on the Human Activity Recognition, sleep stage classification and University of East Anglia datasets demonstrate that our method outperforms current state-of-the-art methods in MTS classification tasks, validating its superiority. |
Submitted to IEEE International Conference on Data Engineering (ICDE) 2025 |
Causal and Local Correlations Based Network for Multivariate Time Series Classification | 2024-11-27 | Recently, time series classification has attracted the attention of a large number of researchers, and hundreds of methods have been proposed. However, these methods often ignore the spatial correlations among dimensions and the local correlations among features. To address this issue, the causal and local correlations based network (CaLoNet) is proposed in this study for multivariate time series classification. First, pairwise spatial correlations between dimensions are modeled using causality modeling to obtain the graph structure. Then, a relationship extraction network is used to fuse local correlations to obtain long-term dependency features. Finally, the graph structure and long-term dependency features are integrated into the graph neural network. Experiments on the UEA datasets show that CaLoNet can obtain competitive performance compared with state-of-the-art methods. |
Submitted on April 03, 2023; major revisions on March 25, 2024; minor revisions on July 9, 2024 |
FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting | 2024-11-27 | Time Series Forecasting (TSF) is a key functionality in numerous fields, including finance, weather services, and energy management. While TSF methods are emerging these days, many of them require domain-specific data collection and model training and struggle with poor generalization performance on new domains. Foundation models aim to overcome this limitation. Pre-trained on large-scale language or time series data, they exhibit promising inferencing capabilities on new or unseen data. This has spurred a surge in new TSF foundation models. We propose a new benchmark, FoundTS, to enable thorough and fair evaluation and comparison of such models. FoundTS covers a variety of TSF foundation models, including those based on large language models and those pretrained on time series. Next, FoundTS supports different forecasting strategies, including zero-shot, few-shot, and full-shot, thereby facilitating more thorough evaluations. Finally, FoundTS offers a pipeline that standardizes evaluation processes such as dataset splitting, loading, normalization, and few-shot sampling, thereby facilitating fair evaluations. Building on this, we report on an extensive evaluation of TSF foundation models on a broad range of datasets from diverse domains and with different statistical characteristics. Specifically, we identify pros and cons and inherent limitations of existing foundation models, and we identify directions for future model design. We make our code and datasets available at https://anonymous.4open.science/r/FoundTS-C2B0. |
|
Enhancing Project Performance Forecasting using Machine Learning Techniques | 2024-11-26 | Accurate forecasting of project performance metrics is crucial for successfully managing and delivering urban road reconstruction projects. Traditional methods often rely on static baseline plans and fail to consider the dynamic nature of project progress and external factors. This research proposes a machine learning-based approach to forecast project performance metrics, such as cost variance and earned value, for each Work Breakdown Structure (WBS) category in an urban road reconstruction project. The proposed model utilizes time series forecasting techniques, including Autoregressive Integrated Moving Average (ARIMA) and Long Short-Term Memory (LSTM) networks, to predict future performance based on historical data and project progress. The model also incorporates external factors, such as weather patterns and resource availability, as features to enhance the accuracy of forecasts. By applying the predictive power of machine learning, the performance forecasting model enables proactive identification of potential deviations from the baseline plan, which allows project managers to take timely corrective actions. The research aims to validate the effectiveness of the proposed approach using a case study of an urban road reconstruction project, comparing the model's forecasts with actual project performance data. The findings of this research contribute to the advancement of project management practices in the construction industry, offering a data-driven solution for improving project performance monitoring and control. |
|
Single Proxy Synthetic Control | 2024-11-26 | ShowSynthetic control methods are widely used to estimate the treatment effect on a single treated unit in time-series settings. A common approach to estimate synthetic control weights is to regress the treated unit's pre-treatment outcome and covariates' time series measurements on those of untreated units via ordinary least squares. However, this approach can perform poorly if the pre-treatment fit is not near perfect, whether the weights are normalized or not. In this paper, we introduce a single proxy synthetic control approach, which views the outcomes of untreated units as proxies of the treatment-free potential outcome of the treated unit, a perspective we leverage to construct a valid synthetic control. Under this framework, we establish an alternative identification strategy and corresponding estimation methods for synthetic controls and the treatment effect on the treated unit. Notably, unlike existing proximal synthetic control methods, which require two types of proxies for identification, ours relies on a single type of proxy, thus facilitating its practical relevance. Additionally, we adapt a conformal inference approach to perform inference about the treatment effect, obviating the need for a large number of post-treatment observations. Lastly, our framework can accommodate time-varying covariates and nonlinear models. We demonstrate the proposed approach in a simulation study and a real-world application. |
|
Scalable Spatiotemporal Prediction with Bayesian Neural Fields | 2024-11-26 | ShowSpatiotemporal datasets, which consist of spatially-referenced time series, are ubiquitous in diverse applications, such as air pollution monitoring, disease tracking, and cloud-demand forecasting. As the scale of modern datasets increases, there is a growing need for statistical methods that are flexible enough to capture complex spatiotemporal dynamics and scalable enough to handle many observations. This article introduces the Bayesian Neural Field (BayesNF), a domain-general statistical model that infers rich spatiotemporal probability distributions for data-analysis tasks including forecasting, interpolation, and variography. BayesNF integrates a deep neural network architecture for high-capacity function estimation with hierarchical Bayesian inference for robust predictive uncertainty quantification. Evaluations against prominent baselines show that BayesNF delivers improvements on prediction problems from climate and public health data containing tens to hundreds of thousands of measurements. Accompanying the paper is an open-source software package (https://github.com/google/bayesnf) that runs on GPU and TPU accelerators through the JAX machine learning platform. |
29 pages, 7 figures, 2 tables, 1 listing |
Rock the KASBA: Blazingly Fast and Accurate Time Series Clustering | 2024-11-26 | ShowTime series data has become increasingly prevalent across numerous domains, driving a growing demand for time series machine learning techniques. Among these, time series clustering (TSCL) stands out as one of the most popular machine learning tasks. TSCL serves as a powerful exploratory analysis tool and is also employed as a preprocessing step or subroutine for various tasks, including anomaly detection, segmentation, and classification. The most popular TSCL algorithms are either fast (in terms of run time) but perform poorly on benchmark problems, or perform well on benchmarks but scale poorly. We present a new TSCL algorithm, the |
Title | Date | Abstract | Comment |
---|---|---|---|
Motion Prompting: Controlling Video Generation with Motion Trajectories | 2024-12-03 | ShowMotion control is crucial for generating expressive and compelling video content; however, most existing video generation models rely mainly on text prompts for control, which struggle to capture the nuances of dynamic actions and temporal compositions. To this end, we train a video generation model conditioned on spatio-temporally sparse or dense motion trajectories. In contrast to prior motion conditioning work, this flexible representation can encode any number of trajectories, object-specific or global scene motion, and temporally sparse motion; due to its flexibility we refer to this conditioning as motion prompts. While users may directly specify sparse trajectories, we also show how to translate high-level user requests into detailed, semi-dense motion prompts, a process we term motion prompt expansion. We demonstrate the versatility of our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing. Our results showcase emergent behaviors, such as realistic physics, suggesting the potential of motion prompts for probing video models and interacting with future generative world models. Finally, we evaluate quantitatively, conduct a human study, and demonstrate strong performance. Video results are available on our webpage: https://motion-prompting.github.io/ |
Project page: https://motion-prompting.github.io/ |
Resonance: Learning to Predict Social-Aware Pedestrian Trajectories as Co-Vibrations | 2024-12-03 | ShowLearning to forecast the trajectories of intelligent agents like pedestrians has caught more researchers' attention. Despite researchers' efforts, it remains a challenge to accurately account for social interactions among agents when forecasting, and in particular, to simulate such social modifications to future trajectories in an explainable and decoupled way. Inspired by the resonance phenomenon of vibration systems, we propose the Resonance (short for Re) model to forecast pedestrian trajectories as co-vibrations, and regard that social interactions are associated with spectral properties of agents' trajectories. It forecasts future trajectories as three distinct vibration terms to represent agents' future plans from different perspectives in a decoupled way. Also, agents' social interactions and how they modify scheduled trajectories will be considered in a resonance-like manner by learning the similarities of their trajectory spectrums. Experiments on multiple datasets, whether pedestrian or vehicle, have verified the usefulness of our method both quantitatively and qualitatively. |
|
Who Walks With You Matters: Perceiving Social Interactions with Groups for Pedestrian Trajectory Prediction | 2024-12-03 | ShowUnderstanding and anticipating human movement has become more critical and challenging in diverse applications such as autonomous driving and surveillance. The complex interactions brought by different relations between agents are a crucial reason that poses challenges to this task. Researchers have put much effort into designing a system using rule-based or data-based models to extract and validate the patterns between pedestrian trajectories and these interactions, which has not been adequately addressed yet. Inspired by how humans perceive social interactions with different levels of relations to themselves, this work proposes the GrouP ConCeption (short for GPCC) model composed of the Group method, which categorizes nearby agents into either group members or non-group members based on a long-term distance kernel function, and the Conception module, which perceives both visual and acoustic information surrounding the target agent. Evaluated across multiple datasets, the GPCC model demonstrates significant improvements in trajectory prediction accuracy, validating its effectiveness in modeling both social and individual dynamics. The qualitative analysis also indicates that the GPCC framework leverages grouping and perception cues in an intuitive, human-like way, supporting the proposed model's explainability in pedestrian trajectory forecasting. |
15 pages, 10 figures, submitted to CVPR 2025 |
Trajectory-based Road Autolabeling with Lidar-Camera Fusion in Winter Conditions | 2024-12-03 | ShowRobust road segmentation in all road conditions is required for safe autonomous driving and advanced driver assistance systems. Supervised deep learning methods provide accurate road segmentation in the domain of their training data but cannot be trusted in out-of-distribution scenarios. Including the whole distribution in the trainset is challenging as each sample must be labeled by hand. Trajectory-based self-supervised methods offer a potential solution as they can learn from the traversed route without manual labels. However, existing trajectory-based methods use learning schemes that rely only on the camera or only on the lidar. In this paper, trajectory-based learning is implemented jointly with lidar and camera for increased performance. Our method outperforms recent standalone camera- and lidar-based methods when evaluated with a challenging winter driving dataset including countryside and suburb driving scenes. The source code is available at https://github.com/eerik98/lidar-camera-road-autolabeling.git |
|
FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL | 2024-12-03 | ShowMulti-agent reinforcement learning has demonstrated significant potential in addressing complex cooperative tasks across various real-world applications. However, existing MARL approaches often rely on the restrictive assumption that the number of entities (e.g., agents, obstacles) remains constant between training and inference. This overlooks scenarios where entities are dynamically removed or added during the inference trajectory -- a common occurrence in real-world environments like search and rescue missions and dynamic combat situations. In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization, where such dynamic changes cannot be anticipated beforehand. Our empirical studies reveal that existing MARL methods suffer significant performance degradation and increased uncertainty in these scenarios. In response, we propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods. FlickerFusion stochastically drops out parts of the observation space, emulating being in-domain when inferenced OOD. The results show that FlickerFusion not only achieves superior inference rewards but also uniquely reduces uncertainty vis-à-vis the backbone, compared to existing methods. Benchmarks, implementations, and model weights are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings. |
NeurIPS '24 Open-World Agents Workshop |
Optimizing Latent Goal by Learning from Trajectory Preference | 2024-12-03 | ShowA growing body of work has emerged focusing on instruction-following policies for open-world agents, aiming to better align the agent's behavior with human intentions. However, the performance of these policies is highly susceptible to the initial prompt, which leads to extra efforts in selecting the best instructions. We propose a framework named Preference Goal Tuning (PGT). PGT allows an instruction-following policy to interact with the environment to collect several trajectories, which will be categorized into positive and negative samples based on preference. Then we use preference learning to fine-tune the initial goal latent representation with the categorized trajectories while keeping the policy backbone frozen. Experimental results show that with minimal data and training, PGT achieves an average relative improvement of 72.0% and 81.6% over 17 tasks in 2 different foundation policies respectively, and outperforms the best human-selected instructions. Moreover, PGT surpasses full fine-tuning in the out-of-distribution (OOD) task-execution environments by 13.4%, indicating that our approach retains strong generalization capabilities. Since our approach stores a single latent representation for each task independently, it can be viewed as an efficient method for continual learning, without the risk of catastrophic forgetting or task interference. In short, PGT enhances the performance of agents across nearly all tasks in the Minecraft Skillforge benchmark and demonstrates robustness to the execution environment. |
|
Another Vertical View: A Hierarchical Network for Heterogeneous Trajectory Prediction via Spectrums | 2024-12-03 | ShowWith the fast development of AI-related techniques, the applications of trajectory prediction are no longer limited to easier scenes and trajectories. More and more trajectories with different forms, such as coordinates, bounding boxes, and even high-dimensional human skeletons, need to be analyzed and forecasted. Among these heterogeneous trajectories, interactions between different elements within a frame of trajectory, which we call "Dimension-wise Interactions", would be more complex and challenging. However, most previous approaches focus mainly on a specific form of trajectories, and potential dimension-wise interactions are less concerned. In this work, we expand the trajectory prediction task by introducing the trajectory dimensionality |
|
Driving Scene Synthesis on Free-form Trajectories with Generative Prior | 2024-12-02 | ShowDriving scene synthesis along free-form trajectories is essential for driving simulations to enable closed-loop evaluation of end-to-end driving policies. While existing methods excel at novel view synthesis on recorded trajectories, they face challenges with novel trajectories due to limited views of driving videos and the vastness of driving environments. To tackle this challenge, we propose a novel free-form driving view synthesis approach, dubbed DriveX, by leveraging video generative prior to optimize a 3D model across a variety of trajectories. Concretely, we crafted an inverse problem that enables a video diffusion model to be utilized as a prior for many-trajectory optimization of a parametric 3D model (e.g., Gaussian splatting). To seamlessly use the generative prior, we iteratively conduct this process during optimization. Our resulting model can produce high-fidelity virtual driving environments outside the recorded trajectory, enabling free-form trajectory driving simulation. Beyond real driving scenes, DriveX can also be utilized to simulate virtual driving worlds from AI-generated videos. |
|
Outstanding framework for simulating and generating anchor trajectory in wireless sensor networks | 2024-12-02 | ShowThis paper proposes a framework that has the ability to animate and generate different scenarios for the mobility of a movable anchor which can follow various paths in wireless sensor networks (WSNs). When the researchers use NS-2 to simulate a single anchor-assisted localization model, they face the problem of creating the movement file of the movable anchor. The proposed framework solved this problem by allowing them to create the movement scenario regarding different trajectories. The proposed framework lets the researcher set the needed parameters for simulating various static path models, which can be displayed through the graphical user interface. The researcher can also view the mobility of the movable anchor with control of its speed and communication range. The proposed framework has been validated by comparing its results to NS-2 outputs plus comparing it against existing tools. Finally, this framework has been published on the Code Project website and downloaded by many users. |
|
Differential Flatness-based Fast Trajectory Planning for Fixed-wing Unmanned Aerial Vehicles | 2024-12-02 | ShowDue to strong nonlinearity and nonholonomic dynamics, although various general trajectory optimization methods have been presented, few of them can guarantee efficient computation and physical feasibility for relatively complicated fixed-wing UAV dynamics. Aiming at this issue, this paper investigates a differential flatness-based trajectory optimization method for fixed-wing UAVs (DFTO-FW), which transcribes the trajectory optimization into a lightweight, unconstrained, gradient-analytical optimization with linear time complexity in each iteration to achieve fast trajectory generation. Through differential flat characteristics analysis and polynomial parameterization, the customized trajectory representation is presented, which implies the equality constraints to avoid the heavy computational burdens of solving complex dynamics. Through the design of integral performance costs and deduction of analytical gradients, the original trajectory optimization is transcribed into an unconstrained, gradient-analytical optimization with linear time complexity to further improve efficiency. The simulation experiments illustrate the superior efficiency of the DFTO-FW, which takes sub-second CPU time, orders of magnitude less than competing methods, to generate fixed-wing UAV trajectories in randomly generated obstacle environments. |
Submitted to IEEE Transactions on Systems, Man, and Cybernetics: Systems; received a decision of reject with major revision and encouragement to resubmit (31-Oct-2024) |
TAS-TsC: A Data-Driven Framework for Estimating Time of Arrival Using Temporal-Attribute-Spatial Tri-space Coordination of Truck Trajectories | 2024-12-02 | ShowAccurately estimating time of arrival (ETA) for trucks is crucial for optimizing transportation efficiency in logistics. GPS trajectory data offers valuable information for ETA, but challenges arise due to temporal sparsity, variable sequence lengths, and the interdependencies among multiple trucks. To address these issues, we propose the Temporal-Attribute-Spatial Tri-space Coordination (TAS-TsC) framework, which leverages three feature spaces (temporal, attribute, and spatial) to enhance ETA. Our framework consists of a Temporal Learning Module (TLM) using state space models to capture temporal dependencies, an Attribute Extraction Module (AEM) that transforms sequential features into structured attribute embeddings, and a Spatial Fusion Module (SFM) that models the interactions among multiple trajectories using graph representation learning. These modules collaboratively learn trajectory embeddings, which are then used by a Downstream Prediction Module (DPM) to estimate arrival times. We validate TAS-TsC on real truck trajectory datasets collected from Shenzhen, China, demonstrating its superior performance compared to existing methods. |
|
BIGCity: A Universal Spatiotemporal Model for Unified Trajectory and Traffic State Data Analysis | 2024-12-01 | ShowTypical dynamic spatiotemporal (ST) data includes trajectory data (representing individual-level mobility) and traffic state data (representing population-level mobility). Traditional studies often treat trajectory and traffic state data as distinct, independent modalities, each tailored to specific tasks within a single modality. However, real-world applications, such as navigation apps, require joint analysis of trajectory and traffic state data. Treating these data types as two separate domains can lead to suboptimal model performance. Although recent advances in ST data pre-training and ST foundation models aim to develop universal models for ST data analysis, most existing models are "multi-task, solo-data modality" (MTSM), meaning they can handle multiple tasks within either trajectory data or traffic state data, but not both simultaneously. To address this gap, this paper introduces BIGCity, the first multi-task, multi-data modality (MTMD) model for ST data analysis. The model targets two key challenges in designing an MTMD ST model: (1) unifying the representations of different ST data modalities, and (2) unifying heterogeneous ST analysis tasks. To overcome the first challenge, BIGCity introduces a novel ST-unit that represents both trajectories and traffic states in a unified format. Additionally, for the second challenge, BIGCity adopts a tunable large model with ST task-oriented prompt, enabling it to perform a range of heterogeneous tasks without the need for fine-tuning. Extensive experiments on real-world datasets demonstrate that BIGCity achieves state-of-the-art performance across 8 tasks, outperforming 18 baselines. To the best of our knowledge, BIGCity is the first model capable of handling both trajectories and traffic states for diverse heterogeneous tasks. Our code is available at https://github.com/bigscity/BIGCity |
|
Modification of muscle antagonistic relations and hand trajectory on the dynamic motion of Musculoskeletal Humanoid | 2024-12-01 | ShowIn recent years, research on musculoskeletal humanoids has been progressing. However, there are some challenges, such as unmeasurable transformation of body structure and muscle path, and difficulty in measuring the robot's own motion because of the lack of joint angle sensors. In this study, we propose two motion acquisition methods. One is a method to acquire antagonistic relations of muscles by tension sensing, and the other is a method to acquire correct hand trajectory by vision sensing. Finally, we realize badminton shuttlecock-hitting motion of Kengoro with these two acquisition methods. |
Accepted at Humanoids2019 |
TraCS: Trajectory Collection in Continuous Space under Local Differential Privacy | 2024-12-01 | ShowTrajectory collection is fundamental for location-based services but often involves sensitive information, such as a user's daily routine, raising privacy concerns. Local differential privacy (LDP) provides provable privacy guarantees for users, even when the data collector is untrusted. Existing trajectory collection methods ensure LDP only for discrete location spaces, where the number of locations affects their privacy guarantees and trajectory utility. Moreover, the location space is often naturally continuous, such as in flying and sailing trajectories, making these methods unsuitable. This paper proposes two trajectory collection methods that ensure LDP for continuous spaces: TraCS-D, which perturbs the direction and distance of locations, and TraCS-C, which perturbs the Cartesian coordinates of locations. Both methods are theoretically and experimentally analyzed for trajectory utility. TraCS can also be applied to discrete spaces by rounding perturbed locations to the nearest discrete points. It is independent of the number of locations and has only |
Submitted to VLDB 2025 |
Learning Dynamic Weight Adjustment for Spatial-Temporal Trajectory Planning in Crowd Navigation | 2024-11-30 | ShowRobot navigation in dense human crowds poses a significant challenge due to the complexity of human behavior in dynamic and obstacle-rich environments. In this work, we propose a dynamic weight adjustment scheme using a neural network to predict the optimal weights of objectives in an optimization-based motion planner. We adopt a spatial-temporal trajectory planner and incorporate diverse objectives to achieve a balance among safety, efficiency, and goal achievement in complex and dynamic environments. We design the network structure, observation encoding, and reward function to effectively train the policy network using reinforcement learning, allowing the robot to adapt its behavior in real time based on environmental and pedestrian information. Simulation results show improved safety compared to the fixed-weight planner and the state-of-the-art learning-based methods, and verify the ability of the learned policy to adaptively adjust the weights based on the observed situations. The approach's feasibility is demonstrated in a navigation task using an autonomous delivery robot across a crowded corridor over a 300 m distance. |
submitted to ICRA 2025 |
Strategic Application of AIGC for UAV Trajectory Design: A Channel Knowledge Map Approach | 2024-11-30 | ShowUnmanned Aerial Vehicles (UAVs) are increasingly utilized in wireless communication, yet accurate channel loss prediction remains a significant challenge, limiting resource optimization performance. To address this issue, this paper leverages Artificial Intelligence Generated Content (AIGC) for the efficient construction of Channel Knowledge Maps (CKM) and UAV trajectory design. Given the time-consuming nature of channel data collection, AI techniques are employed in a Wasserstein Generative Adversarial Network (WGAN) to extract environmental features and augment the data. Experiment results demonstrate the effectiveness of the proposed framework in improving CKM construction accuracy. Moreover, integrating CKM into UAV trajectory planning reduces channel gain uncertainty, demonstrating its potential to enhance wireless communication efficiency. |
|
InterHub: A Naturalistic Trajectory Dataset with Dense Interaction for Autonomous Driving | 2024-11-30 | ShowThe driving interaction, a critical yet complex aspect of daily driving, lies at the core of autonomous driving research. However, real-world driving scenarios sparsely capture rich interaction events, limiting the availability of comprehensive trajectory datasets for this purpose. To address this challenge, we present InterHub, a dense interaction dataset derived by mining interaction events from extensive naturalistic driving records. We employ formal methods to describe and extract multi-agent interaction events, exposing the limitations of existing autonomous driving solutions. Additionally, we introduce a user-friendly toolkit enabling the expansion of InterHub with both public and private data. By unifying, categorizing, and analyzing diverse interaction events, InterHub facilitates cross-comparative studies and large-scale research, thereby advancing the evaluation and development of autonomous driving technologies. |
|
A Multi-Loss Strategy for Vehicle Trajectory Prediction: Combining Off-Road, Diversity, and Directional Consistency Losses | 2024-11-29 | ShowTrajectory prediction is essential for the safety and efficiency of planning in autonomous vehicles. However, current models often fail to fully capture complex traffic rules and the complete range of potential vehicle movements. Addressing these limitations, this study introduces three novel loss functions: Offroad Loss, Direction Consistency Error, and Diversity Loss. These functions are designed to keep predicted paths within driving area boundaries, aligned with traffic directions, and cover a wider variety of plausible driving scenarios. As all prediction modes should adhere to road rules and conditions, this work overcomes the shortcomings of traditional "winner takes all" training methods by applying the loss functions to all prediction modes. These loss functions not only improve model training but can also serve as metrics for evaluating the realism and diversity of trajectory predictions. Extensive validation on the nuScenes and Argoverse 2 datasets with leading baseline models demonstrates that our approach not only maintains accuracy but significantly improves safety and robustness, reducing offroad errors on average by 47% on original and by 37% on attacked scenes. This work sets a new benchmark for trajectory prediction in autonomous driving, offering substantial improvements in navigating complex environments. Our code is available at https://github.com/vita-epfl/stay-on-track . |
Preprint, 7 pages, 4 figures and 2 tables |
A data driven approach to classify descriptors based on their efficiency in translating noisy trajectories into physically-relevant information | 2024-11-29 | ShowReconstructing the physical complexity of many-body dynamical systems can be challenging. Starting from the trajectories of their constitutive units (raw data), typical approaches require selecting appropriate descriptors to convert them into time-series, which are then analyzed to extract interpretable information. However, identifying the most effective descriptor is often non-trivial. Here, we report a data-driven approach to compare the efficiency of various descriptors in extracting information from noisy trajectories and translating it into physically relevant insights. As a prototypical system with non-trivial internal complexity, we analyze molecular dynamics trajectories of an atomistic system where ice and water coexist in equilibrium near the solid/liquid transition temperature. We compare general and specific descriptors often used in aqueous systems: number of neighbors, molecular velocities, Smooth Overlap of Atomic Positions (SOAP), Local Environments and Neighbors Shuffling (LENS), Orientational Tetrahedral Order, and distance from the fifth neighbor ( |
19 pages, 5 figures + 3 in supporting information (at the bottom of the manuscript) |
Dynamic High-Order Control Barrier Functions with Diffuser for Safety-Critical Trajectory Planning at Signal-Free Intersections | 2024-11-29 | ShowPlanning safe and efficient trajectories through signal-free intersections presents significant challenges for autonomous vehicles (AVs), particularly in dynamic, multi-task environments with unpredictable interactions and an increased possibility of conflicts. This study aims to address these challenges by developing a robust, adaptive framework to ensure safety in such complex scenarios. Existing approaches often struggle to provide reliable safety mechanisms in dynamic environments and to learn multi-task behaviors from demonstrations in signal-free intersections. This study proposes a safety-critical planning method that integrates Dynamic High-Order Control Barrier Functions (DHOCBF) with a diffusion-based model, called Dynamic Safety-Critical Diffuser (DSC-Diffuser), offering a robust solution for adaptive, safe, and multi-task driving in signal-free intersections. Our approach incorporates a goal-oriented, task-guided diffusion model, enabling the model to learn multiple driving tasks simultaneously from real-world data. To further ensure driving safety in dynamic environments, the proposed DHOCBF framework dynamically adjusts to account for the movements of surrounding vehicles, offering enhanced adaptability compared to traditional control barrier functions. Validity evaluations of DHOCBF, conducted through numerical simulations, demonstrate its robustness in adapting to variations in obstacle velocities, sizes, uncertainties, and locations, effectively maintaining driving safety across a wide range of complex and uncertain scenarios. Performance evaluations across various scenes confirm that DSC-Diffuser provides realistic, stable, and generalizable policies, equipping it with the flexibility to adapt to diverse driving tasks. |
7 figures, 3 tables, 12 pages |
Barrier-Enhanced Parallel Homotopic Trajectory Optimization for Safety-Critical Autonomous Driving | 2024-11-29 | Enforcing safety while preventing overly conservative behaviors is essential for autonomous vehicles to achieve high task performance. In this paper, we propose a barrier-enhanced parallel homotopic trajectory optimization (BPHTO) approach with the over-relaxed alternating direction method of multipliers (ADMM) for real-time integrated decision-making and planning. To facilitate safety interactions between the ego vehicle (EV) and surrounding vehicles, a spatiotemporal safety module exhibiting bi-convexity is developed on the basis of barrier function. Varying barrier coefficients are adopted for different time steps in a planning horizon to account for the motion uncertainties of surrounding HVs and mitigate conservative behaviors. Additionally, we exploit the discrete characteristics of driving maneuvers to initialize nominal behavior-oriented free-end homotopic trajectories based on reachability analysis, and each trajectory is locally constrained to a specific driving maneuver while sharing the same task objectives. By leveraging the bi-convexity of the safety module and the kinematics of the EV, we formulate the BPHTO as a bi-convex optimization problem. Then constraint transcription and the over-relaxed ADMM are employed to streamline the optimization process, such that multiple trajectories are generated in real time with feasibility guarantees. Through a series of experiments, the proposed development demonstrates improved task accuracy, stability, and consistency in various traffic scenarios using synthetic and real-world traffic datasets. |
17 pages, 10 figures, accepted for publication in IEEE Transactions on Intelligent Transportation Systems |
Trajectory Attention for Fine-grained Video Motion Control | 2024-11-28 | Recent advancements in video generation have been greatly driven by video diffusion models, with camera motion control emerging as a crucial challenge in creating view-customized visual content. This paper introduces trajectory attention, a novel approach that performs attention along available pixel trajectories for fine-grained camera motion control. Unlike existing methods that often yield imprecise outputs or neglect temporal correlations, our approach possesses a stronger inductive bias that seamlessly injects trajectory information into the video generation process. Importantly, our approach models trajectory attention as an auxiliary branch alongside traditional temporal attention. This design enables the original temporal attention and the trajectory attention to work in synergy, ensuring both precise motion control and new content generation capability, which is critical when the trajectory is only partially available. Experiments on camera motion control for images and videos demonstrate significant improvements in precision and long-range consistency while maintaining high-quality generation. Furthermore, we show that our approach can be extended to other video motion control tasks, such as first-frame-guided video editing, where it excels in maintaining content consistency over large spatial and temporal ranges. |
Project Page: xizaoqu.github.io/trajattn/ |
Computationally efficient trajectory design from motion primitives for near time-optimal transitions for systems with oscillating internal dynamics | 2024-11-28 | An efficient approach to compute near time-optimal trajectories for linear kinematic systems with oscillatory internal dynamics is presented. Thereby, kinematic constraints with respect to velocity, acceleration and jerk are taken into account. The trajectories are composed of several motion primitives, the most crucial of which is termed jerk segment. Within this contribution, the focus is put on the composition of the overall trajectories, assuming the required motion primitives to be readily available. Since the scheme considered is not time-optimal, even decreasing particular constraints can reduce the overall transition time, which is analysed in detail. This observation implies that replanning of the underlying jerk segments is required as an integral part of the motion planning scheme, further insight into which has been analysed in a complementary contribution. Although the proposed scheme is not time-optimal, it allows for significantly shorter transition times than established methods, such as zero-vibration shaping, while requiring significantly lower computational power than a fully time-optimal scheme. |
|
TEA: Trajectory Encoding Augmentation for Robust and Transferable Policies in Offline Reinforcement Learning | 2024-11-28 | In this paper, we investigate offline reinforcement learning (RL) with the goal of training a single robust policy that generalizes effectively across environments with unseen dynamics. We propose a novel approach, Trajectory Encoding Augmentation (TEA), which extends the state space by integrating latent representations of environmental dynamics obtained from sequence encoders, such as AutoEncoders. Our findings show that incorporating these encodings with TEA improves the transferability of a single policy to novel environments with new dynamics, surpassing methods that rely solely on unmodified states. These results indicate that TEA captures critical, environment-specific characteristics, enabling RL agents to generalize effectively across dynamic conditions. |
|
RED: Effective Trajectory Representation Learning with Comprehensive Information | 2024-11-28 | Trajectory representation learning (TRL) maps trajectories to vectors that can then be used for various downstream tasks, including trajectory similarity computation, trajectory classification, and travel-time estimation. However, existing TRL methods often produce vectors that, when used in downstream tasks, yield insufficiently accurate results. A key reason is that they fail to utilize the comprehensive information encompassed by trajectories. We propose a self-supervised TRL framework, called RED, which effectively exploits multiple types of trajectory information. Overall, RED adopts the Transformer as the backbone model and masks the constituting paths in trajectories to train a masked autoencoder (MAE). In particular, RED considers the moving patterns of trajectories by employing a Road-aware masking strategy that retains key paths of trajectories during masking, thereby preserving crucial information of the trajectories. RED also adopts a spatial-temporal-user joint Embedding scheme to encode comprehensive information when preparing the trajectories as model inputs. To conduct training, RED adopts Dual-objective task learning: the Transformer encoder predicts the next segment in a trajectory, and the Transformer decoder reconstructs the entire trajectory. RED also considers the spatial-temporal correlations of trajectories by modifying the attention mechanism of the Transformer. We compare RED with 9 state-of-the-art TRL methods for 4 downstream tasks on 3 real-world datasets, finding that RED can usually improve the accuracy of the best-performing baseline by over 5%. |
This paper is accepted by VLDB2025 |
Synergizing Decision Making and Trajectory Planning Using Two-Stage Optimization for Autonomous Vehicles | 2024-11-28 | This paper introduces a local planner that synergizes the decision making and trajectory planning modules towards autonomous driving. The decision making and trajectory planning tasks are jointly formulated as a nonlinear programming problem with an integrated objective function. However, integrating the discrete decision variables into the continuous trajectory optimization leads to a mixed-integer programming (MIP) problem with inherent nonlinearity and nonconvexity. To address the challenge in solving the problem, the original problem is decomposed into two sub-stages, and a two-stage optimization (TSO) based approach is presented to ensure the coherence in outcomes for the two stages. The optimization problem in the first stage determines the optimal decision sequence that acts as an informed initialization. With the outputs from the first stage, the second stage necessitates the use of a high-fidelity vehicle model and strict enforcement of the collision avoidance constraints as part of the trajectory planning problem. We evaluate the effectiveness of our proposed planner across diverse multi-lane scenarios. The results demonstrate that the proposed planner simultaneously generates a sequence of optimal decisions and the corresponding trajectory that significantly improves driving performance in terms of driving safety and traveling efficiency as compared to alternative methods. Additionally, we implement the closed-loop simulation in CARLA, and the results showcase the effectiveness of the proposed planner to adapt to changing driving situations with high computational efficiency. |
|
Non-iterative Optimization of Trajectory and Radio Resource for Aerial Network | 2024-11-28 | We address a joint trajectory planning, user association, resource allocation, and power control problem to maximize proportional fairness in the aerial IoT network, considering practical end-to-end quality-of-service (QoS) and communication schedules. Though the problem is rather ancient, apart from the fact that the previous approaches have never considered user- and time-specific QoS, we point out a prevalent mistake in coordinate optimization approaches adopted by the majority of the literature. Coordinate optimization approaches, which repetitively optimize radio resources for a fixed trajectory and vice versa, generally converge to local optima when all variables are differentiable. However, these methods often stagnate at a non-stationary point, significantly degrading the network utility in mixed-integer problems such as joint trajectory and radio resource optimization. We detour this problem by converting the formulated problem into the Markov decision process (MDP). Exploiting the beneficial characteristics of the MDP, we design a non-iterative framework that cooperatively optimizes trajectory and radio resources without initial trajectory choice. The proposed framework can incorporate various trajectory-planning algorithms such as the genetic algorithm, tree search, and reinforcement learning. Extensive comparisons with diverse baselines verify that the proposed framework significantly outperforms the state-of-the-art method, nearly achieving the global optimum. Our implementation code is available at https://github.com/hslyu/dbspf. |
This paper has been accepted for publication in the IEEE Transactions on Wireless Communications |
ETSM: Automating Dissection Trajectory Suggestion and Confidence Map-Based Safety Margin Prediction for Robot-assisted Endoscopic Submucosal Dissection | 2024-11-28 | Robot-assisted Endoscopic Submucosal Dissection (ESD) improves the surgical procedure by providing a more comprehensive view through advanced robotic instruments and bimanual operation, thereby enhancing dissection efficiency and accuracy. Accurate prediction of dissection trajectories is crucial for better decision-making, reducing intraoperative errors, and improving surgical training. Nevertheless, predicting these trajectories is challenging due to variable tumor margins and dynamic visual conditions. To address this issue, we create the ESD Trajectory and Confidence Map-based Safety Margin (ETSM) dataset with
|
Using iterated local alignment to aggregate trajectory data into a traffic flow map | 2024-11-27 | Vehicle trajectories, with their detailed geolocations, are a promising data source to compute traffic flow maps which facilitate the understanding of traffic flows at scales ranging from the city/regional level to the road level. The trade-off is that trajectory data are prone to measurement noise. While this is negligible for large-scale flow aggregation, it poses substantial obstacles for small-scale aggregation. To overcome these obstacles, we introduce innovative local alignment algorithms, where we infer road segments to serve as local reference segments, and proceed to align nearby road segments to them. We then deploy these algorithms in an iterative workflow to compute locally aligned flow maps. By applying this workflow to synthetic and empirical trajectories, we verify that our locally aligned flow maps provide high levels of accuracy and spatial resolution of flow aggregation at multiple scales. |
|
DMVC-Tracker: Distributed Multi-Agent Trajectory Planning for Target Tracking Using Dynamic Buffered Voronoi and Inter-Visibility Cells | 2024-11-27 | This letter presents a distributed trajectory planning method for multi-agent aerial tracking. The proposed method uses a Dynamic Buffered Voronoi Cell (DBVC) and a Dynamic Inter-Visibility Cell (DIVC) to formulate the distributed trajectory generation. Specifically, the DBVC and the DIVC are time-variant spaces that prevent mutual collisions and occlusions among agents, while enabling them to maintain suitable distances from the moving target. We combine the DBVC and the DIVC with an efficient Bernstein polynomial motion primitive-based tracking generation method, which has been refined into a less conservative approach than in our previous work. The proposed algorithm can compute each agent's trajectory within several milliseconds on an Intel i7 desktop. We validate the tracking performance in challenging scenarios, including environments with dozens of obstacles. |
8 pages, 5 figures |
RealTraj: Towards Real-World Pedestrian Trajectory Forecasting | 2024-11-27 | This paper jointly addresses three key limitations in conventional pedestrian trajectory forecasting: pedestrian perception errors, real-world data collection costs, and person ID annotation costs. We propose a novel framework, RealTraj, that enhances the real-world applicability of trajectory forecasting. Our approach includes two training phases--self-supervised pretraining on synthetic data and weakly-supervised fine-tuning with limited real-world data--to minimize data collection efforts. To improve robustness to real-world errors, we focus on both model design and training objectives. Specifically, we present Det2TrajFormer, a trajectory forecasting model that remains invariant to tracking noise by using past detections as inputs. Additionally, we pretrain the model using multiple pretext tasks, which enhance robustness and improve forecasting performance based solely on detection data. Unlike previous trajectory forecasting methods, our approach fine-tunes the model using only ground-truth detections, significantly reducing the need for costly person ID annotations. In the experiments, we comprehensively verify the effectiveness of the proposed method against the limitations, and the method outperforms state-of-the-art trajectory forecasting methods on multiple datasets. The code will be released at https://fujiry0.github.io/RealTraj-project-page. |
|
QP Chaser: Polynomial Trajectory Generation for Autonomous Aerial Tracking | 2024-11-27 | Maintaining the visibility of the target is one of the major objectives of aerial tracking missions. This paper proposes a target-visible trajectory planning pipeline using quadratic programming (QP). Our approach can handle various tracking settings, including 1) single- and dual-target following and 2) both static and dynamic environments, unlike other works that focus on a single specific setup. In contrast to other studies that fully trust the predicted trajectory of the target and consider only the visibility of the target's center, our pipeline considers error in target path prediction and the entire body of the target to maintain the target visibility robustly. First, a prediction module uses a sample-check strategy to quickly calculate the reachable sets of moving objects, which represent the areas their bodies can reach, considering obstacles. Subsequently, the planning module formulates a single QP problem, considering path topology, to generate a tracking trajectory that maximizes the visibility of the target's reachable set among obstacles. The performance of the planner is validated in multiple scenarios, through high-fidelity simulations and real-world experiments. |
18 pages, 16 figures |
Dynamic Trajectory Adaptation for Efficient UAV Inspections of Wind Energy Units | 2024-11-26 | The research presents an automated method for determining the trajectory of an unmanned aerial vehicle (UAV) for wind turbine inspection. The proposed method enables efficient data collection from multiple wind installations using UAV optical sensors, considering the spatial positioning of blades and other components of the wind energy installation. It includes component segmentation of the wind energy unit (WEU), determination of the blade pitch angle, and generation of optimal flight trajectories, considering safe distances and optimal viewing angles. The results of computational experiments have demonstrated the advantage of the proposed method in monitoring WEU, achieving a 78% reduction in inspection time, a 17% decrease in total trajectory length, and a 6% increase in average blade surface coverage compared to traditional methods. Furthermore, the process minimizes the average deviation from the optimal trajectory by 68%, indicating its high accuracy and ability to compensate for external influences. |
Unmanned aerial vehicles, wind turbine inspection, automated trajectory determination, dynamic trajectory adaptation, image segmentation, computer vision, optical sensors, wind energy unit |
Enhancing Lane Segment Perception and Topology Reasoning with Crowdsourcing Trajectory Priors | 2024-11-26 | In autonomous driving, recent advances in lane segment perception provide autonomous vehicles with a comprehensive understanding of driving scenarios. Moreover, incorporating prior information as input to such perception models is an effective approach to ensure robustness and accuracy. However, utilizing diverse sources of prior information still faces three key challenges: the acquisition of high-quality prior information, alignment between prior and online perception, and efficient integration. To address these issues, we investigate prior augmentation from a novel perspective of trajectory priors. In this paper, we first extract crowdsourcing trajectory data from the Argoverse2 motion forecasting dataset and encode the trajectory data into rasterized heatmaps and vectorized instance tokens; we then incorporate such prior information into the online mapping model in different ways. Moreover, to mitigate the misalignment between prior and online perception, we design a confidence-based fusion module that takes alignment into account during the fusion process. We conduct extensive experiments on the OpenLane-V2 dataset. The results indicate that our method significantly outperforms the current state-of-the-art methods. |
|
Characterized Diffusion Networks for Enhanced Autonomous Driving Trajectory Prediction | 2024-11-25 | In this paper, we present a novel trajectory prediction model for autonomous driving, combining a Characterized Diffusion Module and a Spatial-Temporal Interaction Network to address the challenges posed by dynamic and heterogeneous traffic environments. Our model enhances the accuracy and reliability of trajectory predictions by incorporating uncertainty estimation and complex agent interactions. Through extensive experimentation on public datasets such as NGSIM, HighD, and MoCAD, our model significantly outperforms existing state-of-the-art methods. We demonstrate its ability to capture the underlying spatial-temporal dynamics of traffic scenarios and improve prediction precision, especially in complex environments. The proposed model showcases strong potential for application in real-world autonomous driving systems. |
7 pages, 0 figures |
InTraGen: Trajectory-controlled Video Generation for Object Interactions | 2024-11-25 | Advances in video generation have significantly improved the realism and quality of created scenes. This has fueled interest in developing intuitive tools that let users leverage video generation as world simulators. Text-to-video (T2V) generation is one such approach, enabling video creation from text descriptions only. Yet, due to the inherent ambiguity in texts and the limited temporal information offered by text prompts, researchers have explored additional control signals like trajectory-guided systems, for more accurate T2V generation. Nonetheless, methods to evaluate whether T2V models can generate realistic interactions between multiple objects are lacking. We introduce InTraGen, a pipeline for improved trajectory-based generation of object interaction scenarios. We propose 4 new datasets and a novel trajectory quality metric to evaluate the performance of the proposed InTraGen. To achieve object interaction, we introduce a multi-modal interaction encoding pipeline with an object ID injection mechanism that enriches object-environment interactions. Our results demonstrate improvements in both visual fidelity and quantitative performance. Code and datasets are available at https://github.com/insait-institute/InTraGen |
|
A Parameter Adaptive Trajectory Tracking and Motion Control Framework for Autonomous Vehicle | 2024-11-25 | This paper studies the trajectory tracking and motion control problems for autonomous vehicles (AVs). A parameter adaptive control framework for AVs is proposed to enhance tracking accuracy and yaw stability. While establishing linear quadratic regulator (LQR) and three robust controllers, the control framework addresses trajectory tracking and motion control in a modular fashion, without introducing complexity into each controller. The robust performance has been guaranteed in three robust controllers by considering the parameter uncertainties, mismatch of unmodeled subsystem as well as external disturbance, comprehensively. Also, the dynamic characteristics of uncertain parameters are identified by Recursive Least Squares (RLS) algorithm, while the boundaries of three robust factors are determined through combining Gaussian Process Regression (GPR) and Bayesian optimization machine learning methods, reducing the conservatism of the controller. Sufficient conditions for closed-loop stability under the diverse robust factors are provided by the Lyapunov method analytically. The simulation results on MATLAB/Simulink and Carsim joint platform demonstrate that the proposed methodology considerably improves tracking accuracy, driving stability, and robust performance, guaranteeing the feasibility and capability of driving in extreme scenarios. |
|
Bring the Heat: Rapid Trajectory Optimization with Pseudospectral Techniques and the Affine Geometric Heat Flow Equation | 2024-11-24 | Generating optimal trajectories for high-dimensional robotic systems in a time-efficient manner while adhering to constraints is a challenging task. This paper introduces PHLAME, which applies pseudospectral collocation and spatial vector algebra to efficiently solve the Affine Geometric Heat Flow (AGHF) Partial Differential Equation (PDE) for trajectory optimization. Unlike traditional PDE approaches like the Hamilton-Jacobi-Bellman (HJB) PDE, which solve for a function over the entire state space, computing a solution to the AGHF PDE scales more efficiently because its solution is defined over a two-dimensional domain, thereby avoiding the intractability of state-space scaling. To solve the AGHF one usually applies the Method of Lines (MOL), which discretizes one variable of the AGHF PDE, and converts the PDE into a system of ordinary differential equations (ODEs) that are solved using standard time-integration methods. Though powerful, this method requires a fine discretization to generate accurate solutions and requires evaluating the AGHF PDE which is computationally expensive for high-dimensional systems. PHLAME overcomes this deficiency by using a pseudospectral method, which reduces the number of function evaluations required to yield a high accuracy solution thereby allowing it to scale efficiently to high-dimensional robotic systems. To further increase computational speed, this paper presents analytical expressions for the AGHF and its Jacobian, both of which can be computed efficiently using rigid body dynamics algorithms. PHLAME is tested across various dynamical systems, with and without obstacles and compared to a number of state-of-the-art techniques. PHLAME generates trajectories for a 44-dimensional state-space system in
26 pages, 8 figures, A project page can be found at https://roahmlab.github.io/PHLAME/ |
FollowGen: A Scaled Noise Conditional Diffusion Model for Car-Following Trajectory Prediction | 2024-11-23 | Vehicle trajectory prediction is crucial for advancing autonomous driving and advanced driver assistance systems (ADAS). Although deep learning-based approaches - especially those utilizing transformer-based and generative models - have markedly improved prediction accuracy by capturing complex, non-linear patterns in vehicle dynamics and traffic interactions, they frequently overlook detailed car-following behaviors and the inter-vehicle interactions critical for real-world driving applications, particularly in fully autonomous or mixed traffic scenarios. To address the issue, this study introduces a scaled noise conditional diffusion model for car-following trajectory prediction, which integrates detailed inter-vehicular interactions and car-following dynamics into a generative framework, improving both the accuracy and plausibility of predicted trajectories. The model utilizes a novel pipeline to capture historical vehicle dynamics by scaling noise with encoded historical features within the diffusion process. Particularly, it employs a cross-attention-based transformer architecture to model intricate inter-vehicle dependencies, effectively guiding the denoising process and enhancing prediction accuracy. Experimental results on diverse real-world driving scenarios demonstrate the state-of-the-art performance and robustness of the proposed method. |
arXiv admin note: text overlap with arXiv:2406.11941 |
Learning-based Trajectory Tracking for Bird-inspired Flapping-Wing Robots | 2024-11-22 | Bird-sized flapping-wing robots offer significant potential for agile flight in complex environments, but achieving agile and robust trajectory tracking remains a challenge due to the complex aerodynamics and highly nonlinear dynamics inherent in flapping-wing flight. In this work, a learning-based control approach is introduced to unlock the versatility and adaptiveness of flapping-wing flight. We propose a model-free reinforcement learning (RL)-based framework for a high degree-of-freedom (DoF) bird-inspired flapping-wing robot that allows for multimodal flight and agile trajectory tracking. Stability analysis was performed on the closed-loop system comprising the flapping-wing system and the RL policy. Additionally, simulation results demonstrate that the RL-based controller can successfully learn complex wing trajectory patterns, achieve stable flight, switch between flight modes spontaneously, and track different trajectories under various aerodynamic conditions. |
|
Trajectory Planning and Control for Robotic Magnetic Manipulation | 2024-11-22 | Robotic magnetic manipulation offers a minimally invasive approach to gastrointestinal examinations through capsule endoscopy. However, controlling such systems using external permanent magnets (EPM) is challenging due to nonlinear magnetic interactions, especially when there are complex navigation requirements such as avoidance of sensitive tissues. In this work, we present a novel trajectory planning and control method incorporating dynamics and navigation requirements, using a single EPM fixed to a robotic arm to manipulate an internal permanent magnet (IPM). Our approach employs a constrained iterative linear quadratic regulator that considers the dynamics of the IPM to generate optimal trajectories for both the EPM and IPM. Extensive simulations and real-world experiments, motivated by capsule endoscopy operations, demonstrate the robustness of the method, showcasing resilience to external disturbances and precise control under varying conditions. The experimental results show that the IPM reaches the goal position with a maximum mean error of 0.18 cm and a standard deviation of 0.21 cm. This work introduces a unified framework for constrained trajectory optimization in magnetic manipulation, directly incorporating both the IPM's dynamics and the EPM's manipulability. |
8 pages, 6 figures |
Bi-level Trajectory Optimization on Uneven Terrains with Differentiable Wheel-Terrain Interaction Model | 2024-11-22 | Navigation of wheeled vehicles on uneven terrain necessitates going beyond the 2D approaches for trajectory planning. Specifically, it is essential to incorporate the full 6dof variation of vehicle pose and its associated stability cost in the planning process. To this end, most recent works aim to learn a neural network model to predict the vehicle evolution. However, such approaches are data-intensive and fraught with generalization issues. In this paper, we present a purely model-based approach that just requires the digital elevation information of the terrain. Specifically, we express the wheel-terrain interaction and 6dof pose prediction as a non-linear least squares (NLS) problem. As a result, trajectory planning can be viewed as a bi-level optimization. The inner optimization layer predicts the pose on the terrain along a given trajectory, while the outer layer deforms the trajectory itself to reduce the stability and kinematic costs of the pose. We improve the state-of-the-art in the following respects. First, we show that our NLS based pose prediction closely matches the output from a high-fidelity physics engine. This result, coupled with the fact that we can query gradients of the NLS solver, makes our pose predictor a differentiable wheel-terrain interaction model. We further leverage this differentiability to efficiently solve the proposed bi-level trajectory optimization problem. Finally, we perform extensive experiments and comparisons with a baseline to showcase the effectiveness of our approach in obtaining smooth, stable trajectories. |
8 pages, 7 figures, submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024) |
Grid and Road Expressions Are Complementary for Trajectory Representation Learning | 2024-11-22 | Trajectory representation learning (TRL) maps trajectories to vectors that can be used for many downstream tasks. Existing TRL methods use either grid trajectories, capturing movement in free space, or road trajectories, capturing movement in a road network, as input. We observe that the two types of trajectories are complementary, providing either region and location information or providing road structure and movement regularity. Therefore, we propose a novel multimodal TRL method, dubbed GREEN, to jointly utilize Grid and Road trajectory Expressions for Effective representatioN learning. In particular, we transform raw GPS trajectories into both grid and road trajectories and tailor two encoders to capture their respective information. To align the two encoders such that they complement each other, we adopt a contrastive loss to encourage them to produce similar embeddings for the same raw trajectory and design a mask language model (MLM) loss to use grid trajectories to help reconstruct masked road trajectories. To learn the final trajectory representation, a dual-modal interactor is used to fuse the outputs of the two encoders via cross-attention. We compare GREEN with 7 state-of-the-art TRL methods for 3 downstream tasks, finding that GREEN consistently outperforms all baselines and improves the accuracy of the best-performing baseline by an average of 15.99%. |
This paper is accepted by KDD2025 (August Cycle) |
Landing Trajectory Prediction for UAS Based on Generative Adversarial Network | 2024-11-21 | Models for trajectory prediction are an essential component of many advanced air mobility studies. These models help aircraft detect conflict and plan avoidance maneuvers, which is especially important in Unmanned Aircraft Systems (UAS) landing management due to the congested airspace near vertiports. In this paper, we propose a landing trajectory prediction model for UAS based on Generative Adversarial Network (GAN). The GAN is a well-established neural network that has been developed for many years. In previous research, GAN has achieved many state-of-the-art results in many generation tasks. The GAN consists of one neural network generator and a neural network discriminator. Because of the learning capacity of the neural networks, the generator is capable of understanding the features of the sample trajectory. The generator takes the previous trajectory as input and outputs some random status of a flight. According to the results of the experiments, the proposed model can output more accurate predictions than the baseline method (GMR) in various datasets. To evaluate the proposed model, we also create a real UAV landing dataset that includes more than 2600 trajectories of drones controlled manually by real pilots. |
9 pages, AIAA SCITECH 2023 |
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning | 2024-11-21 | Learning from multiple domains is a primary factor that influences the generalization of a single unified robot system. In this paper, we aim to learn the trajectory prediction model by using broad out-of-domain data to improve its performance and generalization ability. The trajectory model is designed to predict any-point trajectories in the current frame given an instruction and can provide detailed control guidance for robotic policy learning. To handle the diverse out-of-domain data distribution, we propose a sparsely-gated MoE (Top-1 gating strategy) architecture for the trajectory model, coined Tra-MoE. The sparse activation design enables a good balance between parameter cooperation and specialization, effectively benefiting from large-scale out-of-domain data while maintaining constant FLOPs per token. In addition, we further introduce an adaptive policy conditioning technique by learning 2D mask representations for predicted trajectories, which is explicitly aligned with image observations to guide action prediction more flexibly. We perform extensive experiments on both simulation and real-world scenarios to verify the effectiveness of Tra-MoE and the adaptive policy conditioning technique. We also conduct a comprehensive empirical study to train Tra-MoE, demonstrating that our Tra-MoE consistently exhibits superior performance compared to the dense baseline model, even when the latter is scaled to match Tra-MoE's parameter count. |
15 pages, 5 figures |
FlightPatchNet: Multi-Scale Patch Network with Differential Coding for Flight Trajectory Prediction | 2024-11-21 | Accurate multi-step flight trajectory prediction plays an important role in Air Traffic Control, which can ensure the safety of air transportation. Two main issues limit the flight trajectory prediction performance of existing works. The first issue is the negative impact on prediction accuracy caused by the significant differences in data range. The second issue is that real-world flight trajectories involve underlying temporal dependencies, and existing methods fail to reveal the hidden complex temporal variations and only extract features from one single time scale. To address the above issues, we propose FlightPatchNet, a multi-scale patch network with differential coding for flight trajectory prediction. Specifically, FlightPatchNet first utilizes the differential coding to encode the original values of longitude and latitude into first-order differences and generates embeddings for all variables at each time step. Then, a global temporal attention is introduced to explore the dependencies between different time steps. To fully explore the diverse temporal patterns in flight trajectories, a multi-scale patch network is delicately designed to serve as the backbone. The multi-scale patch network exploits stacked patch mixer blocks to capture inter- and intra-patch dependencies under different time scales, and further integrates multi-scale temporal features across different scales and variables. Finally, FlightPatchNet ensembles multiple predictors to make direct multi-step prediction. Extensive experiments on ADS-B datasets demonstrate that our model outperforms the competitive baselines. |
|
Dynamic Trajectory and Power Control in Ultra-Dense UAV Networks: A Mean-Field Reinforcement Learning Approach | 2024-11-21 | In ultra-dense unmanned aerial vehicle (UAV) networks, it is challenging to coordinate the resource allocation and interference management among large-scale UAVs, for providing flexible and efficient service coverage to the ground users (GUs). In this paper, we propose a learning-based resource allocation scheme in an ultra-dense UAV communication network, where the GUs' service demands are time-varying with unknown distributions. We formulate the non-cooperative game among multiple co-channel UAVs as a stochastic game, where each UAV jointly optimizes its trajectory, user association, and downlink power control to maximize the expectation of its locally cumulative energy efficiency under the interference and energy constraints. To cope with the scalability issue in a large-scale network, we further formulate the problem as a mean-field game (MFG), which simplifies the interactions among the UAVs into a two-player game between a representative UAV and a mean-field. We prove the existence and uniqueness of the equilibrium for the MFG, and propose a model-free mean-field reinforcement learning algorithm named maximum entropy mean-field deep Q network (ME-MFDQN) to solve the mean-field equilibrium in both fully and partially observable scenarios. The simulation results reveal that the proposed algorithm improves the energy efficiency compared with the benchmark algorithms. Moreover, the performance can be further enhanced if the GUs' service demands exhibit higher temporal correlation or if the UAVs have wider observation capabilities over their nearby GUs. |
|
Trajectory Representation Learning on Road Networks and Grids with Spatio-Temporal Dynamics | 2024-11-21 | Trajectory representation learning is a fundamental task for applications in fields such as smart cities and urban planning, as it facilitates the utilization of trajectory data (e.g., vehicle movements) for various downstream applications, such as trajectory similarity computation or travel time estimation. This is achieved by learning low-dimensional representations from high-dimensional and raw trajectory data. However, existing methods for trajectory representation learning rely on either grid-based or road-based representations, which are inherently different and thus could lose information contained in the other modality. Moreover, these methods overlook the dynamic nature of urban traffic, relying on static road network features rather than time-varying traffic patterns. In this paper, we propose TIGR, a novel model designed to integrate grid and road network modalities while incorporating spatio-temporal dynamics to learn rich, general-purpose representations of trajectories. We evaluate TIGR on two real-world datasets and demonstrate the effectiveness of combining both modalities by substantially outperforming state-of-the-art methods, by up to 43.22% for trajectory similarity, up to 16.65% for travel time estimation, and up to 10.16% for destination prediction. |
|
Trajectory Tracking Using Frenet Coordinates with Deep Deterministic Policy Gradient | 2024-11-21 | This paper studies the application of the DDPG algorithm in trajectory-tracking tasks and proposes a trajectory-tracking control method combined with the Frenet coordinate system. By converting the vehicle's position and velocity information from the Cartesian coordinate system to the Frenet coordinate system, this method can more accurately describe the vehicle's deviation and travel distance relative to the center line of the road. The DDPG algorithm adopts the Actor-Critic framework, uses deep neural networks for strategy and value evaluation, and combines the experience replay mechanism and target network to improve the algorithm's stability and data utilization efficiency. Experimental results show that the DDPG algorithm based on the Frenet coordinate system performs well in trajectory-tracking tasks in complex environments, achieves high-precision and stable path tracking, and demonstrates its application potential in autonomous driving and intelligent transportation systems. Keywords: DDPG; path tracking; robot navigation |
|
Almost Global Trajectory Tracking for Quadrotors Using Thrust Direction Control on $\mathcal{S}^2$ | 2024-11-20 | Many of the existing works on quadrotor control address the trajectory tracking problem by employing a cascade design in which the translational and rotational dynamics are stabilized by two separate controllers. The stability of the cascade is often proved by employing trajectory-based arguments, most notably, integral input-to-state stability. In this paper, we follow a different route and present a control law ensuring that a composite function constructed from the translational and rotational tracking errors is a Lyapunov function for the closed-loop cascade. In particular, starting from a generic control law for the double integrator, we develop a suitable attitude control extension, by leveraging a backstepping-like procedure. Using this construction, we provide an almost global stability certificate. The proposed design employs the unit sphere
|
Hierarchical Diffusion Policy: manipulation trajectory generation via contact guidance | 2024-11-20 | Decision-making in robotics using denoising diffusion processes has increasingly become a hot research topic, but end-to-end policies perform poorly in tasks with rich contact and have limited controllability. This paper proposes Hierarchical Diffusion Policy (HDP), a new imitation learning method that uses objective contacts to guide the generation of robot trajectories. The policy is divided into two layers: the high-level policy predicts the contact for the robot's next object manipulation based on 3D information, while the low-level policy predicts the action sequence toward the high-level contact based on the latent variables of observation and contact. We represent the policies at both levels as conditional denoising diffusion processes, and combine behavioral cloning and Q-learning to optimize the low-level policy for accurately guiding actions towards contact. We benchmark Hierarchical Diffusion Policy across 6 different tasks and find that it significantly outperforms the existing state-of-the-art imitation learning method Diffusion Policy with an average improvement of 20.8%. We find that contact guidance yields significant improvements, including superior performance, greater interpretability, and stronger controllability, especially on contact-rich tasks. To further unlock the potential of HDP, this paper proposes a set of key technical contributions including snapshot gradient optimization, 3D conditioning, and prompt guidance, which improve the policy's optimization efficiency, spatial awareness, and controllability respectively. Finally, real-world experiments verify that HDP can handle both rigid and deformable objects. |
arXiv admin note: text overlap with arXiv:2303.04137 by other authors |
C$^{2}$INet: Realizing Incremental Trajectory Prediction with Prior-Aware Continual Causal Intervention | 2024-11-19 | Trajectory prediction for multi-agents in complex scenarios is crucial for applications like autonomous driving. However, existing methods often overlook environmental biases, which leads to poor generalization. Additionally, hardware constraints limit the use of large-scale data across environments, and continual learning settings exacerbate the challenge of catastrophic forgetting. To address these issues, we propose the Continual Causal Intervention (C$^{2}$INet) method for generalizable multi-agent trajectory prediction within a continual learning framework. Using variational inference, we align environment-related prior with posterior estimator of confounding factors in the latent space, thereby intervening in causal correlations that affect trajectory representation. Furthermore, we store optimal variational priors across various scenarios using a memory queue, ensuring continuous debiasing during incremental task training. The proposed C$^{2}$INet enhances adaptability to diverse tasks while preserving previous task information to prevent catastrophic forgetting. It also incorporates pruning strategies to mitigate overfitting. Comparative evaluations on three real and synthetic complex datasets against state-of-the-art methods demonstrate that our proposed method consistently achieves reliable prediction performance, effectively mitigating confounding factors unique to different scenarios. This highlights the practical value of our method for real-world applications. |
|
Age of Information Minimization in UAV-Assisted Covert Communication: Trajectory and Beamforming Design | 2024-11-19 | Unmanned aerial vehicles (UAVs) have the potential for time-sensitive applications. Due to wireless channel variation, received data may have an expiration time, particularly in critical situations such as rescue operations, natural disasters, or the military. Age of Information (AoI) is a metric that measures the freshness of received packets to specify the validity period of information. In addition, it is necessary to guarantee the privacy of confidential information transmission through air-to-ground links against eavesdroppers. This paper investigates UAV-assisted covert communication to minimize AoI in the presence of an aerial eavesdropper for the first time. However, to ensure the eavesdropper's error detection rate, UAV-enabled beamforming employs the power-domain non-orthogonal multiple access (PD-NOMA) technique to cover the covert user by a public user. The PD-NOMA technique also significantly improves the user's AoI. The joint optimization problem contains non-convex constraints and coupled optimization variables, including UAV trajectory, beamforming design, and the user's AoI, which makes it challenging to derive a direct solution. We have developed an efficient alternating optimization technique to address the formulated optimization problem. Numerical results demonstrate the impact of the main parameters on the performance of the proposed communication system. |
|
A Linear Differential Inclusion for Contraction Analysis to Known Trajectories | 2024-11-18 | Infinitesimal contraction analysis provides exponential convergence rates between arbitrary pairs of trajectories of a system by studying the system's linearization. An essentially equivalent viewpoint arises through stability analysis of a linear differential inclusion (LDI) encompassing the incremental behavior of the system. In this note, we study contraction of a system to a particular known trajectory, deriving a new LDI characterizing the error between arbitrary trajectories and this known trajectory. As with classical contraction analysis, this new inclusion is constructed via first partial derivatives of the system's vector field, and contraction rates are obtained with familiar tools: uniform bounding of the logarithmic norm and LMI-based Lyapunov conditions. Our LDI is guaranteed to outperform a usual contraction analysis in two special circumstances: i) when the bound on the logarithmic norm arises from an interval overapproximation of the Jacobian matrix, and ii) when the norm considered is the
|
Enhancing Decision Transformer with Diffusion-Based Trajectory Branch Generation | 2024-11-18 | Decision Transformer (DT) can learn an effective policy from offline datasets by converting offline reinforcement learning (RL) into a supervised sequence modeling task, where the trajectory elements are generated auto-regressively conditioned on the return-to-go (RTG). However, the sequence modeling learning approach tends to learn policies that converge on the sub-optimal trajectories within the dataset, for lack of bridging data to move to better trajectories, even if the condition is set to the highest RTG. To address this issue, we introduce Diffusion-Based Trajectory Branch Generation (BG), which expands the trajectories of the dataset with branches generated by a diffusion model. The trajectory branch is generated based on the segment of the trajectory within the dataset, and leads to trajectories with higher returns. We concatenate the generated branch with the trajectory segment as an expansion of the trajectory. After expanding, DT has more opportunities to learn policies to move to better trajectories, preventing it from converging to the sub-optimal trajectories. Empirically, after processing with BG, DT outperforms state-of-the-art sequence modeling methods on the D4RL benchmark, demonstrating the effectiveness of adding branches to the dataset without further modifications. |
|
Map-Free Trajectory Prediction with Map Distillation and Hierarchical Encoding | 2024-11-17 | Reliable motion forecasting of surrounding agents is essential for ensuring the safe operation of autonomous vehicles. Many existing trajectory prediction methods rely heavily on high-definition (HD) maps as strong driving priors. However, the availability and accuracy of these priors are not guaranteed due to substantial costs to build, localization errors of vehicles, or ongoing road construction. In this paper, we introduce MFTP, a Map-Free Trajectory Prediction method that offers several advantages. First, it eliminates the need for HD maps during inference while still benefiting from map priors during training via knowledge distillation. Second, we present a novel hierarchical encoder that effectively extracts spatial-temporal agent features and aggregates them into multiple trajectory queries. Additionally, we introduce an iterative decoder that sequentially decodes trajectory queries to generate the final predictions. Extensive experiments show that our approach achieves state-of-the-art performance on the Argoverse dataset under the map-free setting. |
|
Efficient Estimation of Relaxed Model Parameters for Robust UAV Trajectory Optimization | 2024-11-17 | Online trajectory optimization and optimal control methods are crucial for enabling sustainable unmanned aerial vehicle (UAV) services, such as agriculture, environmental monitoring, and transportation, where available actuation and energy are limited. However, optimal controllers are highly sensitive to model mismatch, which can occur due to loaded equipment, packages to be delivered, or pre-existing variability in fundamental structural and thrust-related parameters. To circumvent this problem, optimal controllers can be paired with parameter estimators to improve their trajectory planning performance and perform adaptive control. However, UAV platforms are limited in terms of onboard processing power, oftentimes making nonlinear parameter estimation too computationally expensive to consider. To address these issues, we propose a relaxed, affine-in-parameters multirotor model along with an efficient optimal parameter estimator. We convexify the nominal Moving Horizon Parameter Estimation (MHPE) problem into a linear-quadratic form (LQ-MHPE) via an affine-in-parameter relaxation on the nonlinear dynamics, resulting in fast quadratic programs (QPs) that facilitate adaptive Model Predictive Control (MPC) in real time. We compare this approach to the equivalent nonlinear estimator in Monte Carlo simulations, demonstrating a decrease in average solve time and trajectory optimality cost by 98.2% and 23.9-56.2%, respectively. |
8 pages, 5 figures, submitted to IEEE Sustech 2025 |
Stable Continual Reinforcement Learning via Diffusion-based Trajectory Replay | 2024-11-16 | Given the inherent non-stationarity prevalent in real-world applications, continual Reinforcement Learning (RL) aims to equip the agent with the capability to address a series of sequentially presented decision-making tasks. Within this problem setting, a pivotal challenge revolves around the catastrophic forgetting issue, wherein the agent is prone to effortlessly erode the decisional knowledge associated with past encountered tasks when learning the new one. In recent work, generative replay methods have showcased substantial potential by employing generative models to replay the data distribution of past tasks. Compared to storing the data from past tasks directly, this category of methods circumvents the growing storage overhead and possible data privacy concerns. However, constrained by the expressive capacity of generative models, existing generative replay methods face challenges in faithfully reconstructing the data distribution of past tasks, particularly in scenarios with a myriad of tasks or high-dimensional data. Inspired by the success of diffusion models in various generative tasks, this paper introduces a novel continual RL algorithm DISTR (Diffusion-based Trajectory Replay) that employs a diffusion model to memorize the high-return trajectory distribution of each encountered task and wakes up these distributions during the policy learning on new tasks. Besides, considering the impracticality of replaying all past data each time, a prioritization mechanism is proposed to prioritize the trajectory replay of pivotal tasks in our method. Empirical experiments on the popular continual RL benchmark Continual World demonstrate that our proposed method obtains a favorable balance between stability and plasticity, surpassing various existing continual RL baselines in average success rate. |
10 pages, 3 figures, 1 table, inclusion at ICLR 2024 Workshop on Generative Models for Decision Making |
UniTraj: Learning a Universal Trajectory Foundation Model from Billion-Scale Worldwide Traces | 2024-11-16 | Human trajectory modeling is essential for deciphering movement patterns and supporting advanced applications across various domains. However, existing methods are often tailored to specific tasks and regions, resulting in limitations related to task specificity, regional dependency, and data quality sensitivity. Addressing these challenges requires a universal human trajectory foundation model capable of generalizing and scaling across diverse tasks and geographic contexts. To this end, we propose UniTraj, a Universal human Trajectory foundation model that is task-adaptive, region-independent, and highly generalizable. To further enhance performance, we construct WorldTrace, the first large-scale, high-quality, globally distributed dataset sourced from open web platforms, encompassing 2.45 million trajectories with billions of points across 70 countries. Through multiple resampling and masking strategies designed for pre-training, UniTraj effectively overcomes geographic and task constraints, adapting to heterogeneous data quality. Extensive experiments across multiple trajectory analysis tasks and real-world datasets demonstrate that UniTraj consistently outperforms existing approaches in terms of scalability and adaptability. These results underscore the potential of UniTraj as a versatile, robust solution for a wide range of trajectory analysis applications, with WorldTrace serving as an ideal but non-exclusive foundation for training. |
|
Tenure and Research Trajectories | 2024-11-15 | Tenure is a cornerstone of the US academic system, yet its relationship to faculty research trajectories remains poorly understood. Conceptually, tenure systems may act as a selection mechanism, screening in high-output researchers; a dynamic incentive mechanism, encouraging high output prior to tenure but low output after tenure; and a creative search mechanism, encouraging tenured individuals to undertake high-risk work. Here, we integrate data from seven different sources to trace US tenure-line faculty and their research outputs at an unprecedented scale and scope, covering over 12,000 researchers across 15 disciplines. Our analysis reveals that faculty publication rates typically increase sharply during the tenure track and peak just before obtaining tenure. Post-tenure trends, however, vary across disciplines: in lab-based fields, such as biology and chemistry, research output typically remains high post-tenure, whereas in non-lab-based fields, such as mathematics and sociology, research output typically declines substantially post-tenure. Turning to creative search, faculty increasingly produce novel, high-risk research after securing tenure. However, this shift toward novelty and risk-taking comes with a decline in impact, with post-tenure research yielding fewer highly cited papers. Comparing outcomes across common career ages but different tenure years or comparing research trajectories in tenure-based and non-tenure-based research settings underscores that breaks in the research trajectories are sharply tied to the individual's tenure year. Overall, these findings provide a new empirical basis for understanding the tenure system, individual research trajectories, and the shape of scientific output. |
|
Temporal Patterns of Multiple Long-Term Conditions in Individuals with Intellectual Disability Living in Wales: An Unsupervised Clustering Approach to Disease Trajectories | 2024-11-15 | Identifying and understanding the co-occurrence of multiple long-term conditions (MLTC) in individuals with intellectual disabilities (ID) is vital for effective healthcare management. These individuals often face earlier onset and higher prevalence of MLTCs, yet specific co-occurrence patterns remain unexplored. This study applies an unsupervised approach to characterise MLTC clusters based on shared disease trajectories using electronic health records (EHRs) from 13069 individuals with ID in Wales (2000-2021). Disease associations and temporal directionality were assessed, followed by spectral clustering to group shared trajectories. The population consisted of 52.3% males and 47.7% females, with an average of 4.5 conditions per patient. Males under 45 formed a single cluster dominated by neurological conditions (32.4%), while males above 45 had three clusters, the largest characterised by circulatory conditions (51.8%). Females under 45 formed one cluster with digestive conditions (24.6%) as most prevalent, while those aged 45 and older showed two clusters: one dominated by circulatory (34.1%), and the other by digestive (25.9%) and musculoskeletal (21.9%) system conditions. Mental illness, epilepsy, and reflux were common across groups. These clusters offer insights into disease progression in individuals with ID, informing targeted interventions and personalised healthcare strategies. |
|
Explanation for Trajectory Planning using Multi-modal Large Language Model for Autonomous Driving | 2024-11-15 | End-to-end style autonomous driving models have been developed recently. These models lack interpretability of the decision-making process from perception to control of the ego vehicle, resulting in anxiety for passengers. To alleviate it, it is effective to build a model which outputs captions describing future behaviors of the ego vehicle and the reasons for them. However, the existing approaches generate reasoning text that inadequately reflects the future plans of the ego vehicle, because they train models to output captions using momentary control signals as inputs. In this study, we propose a reasoning model that takes future planning trajectories of the ego vehicle as inputs to solve this limitation, using a newly collected dataset. |
Accepted and presented at ECCV 2024 2nd Workshop on Vision-Centric Autonomous Driving (VCAD) on September 30, 2024. 13 pages, 5 figures |
Enhancing Maritime Trajectory Forecasting via H3 Index and Causal Language Modelling (CLM) | 2024-11-14 | The prediction of ship trajectories is a growing field of study in artificial intelligence. Traditional methods rely on the use of LSTM, GRU networks, and even Transformer architectures for the prediction of spatio-temporal series. This study proposes a viable alternative for predicting these trajectories using only GNSS positions. It considers this spatio-temporal problem as a natural language processing problem. The latitude/longitude coordinates of AIS messages are transformed into cell identifiers using the H3 index. Thanks to the pseudo-octal representation, it becomes easier for language models to learn the spatial hierarchy of the H3 index. The method is compared with a classical Kalman filter, widely used in the maritime domain, and introduces the Fréchet distance as the main evaluation metric. We show that it is possible to predict ship trajectories quite precisely up to 8 hours ahead with 30 minutes of context, using solely GNSS positions, without relying on any additional information such as speed, course, or external conditions - unlike many traditional methods. We demonstrate that this alternative works well enough to predict trajectories worldwide. |
28 pages, 18 figures |
Integrated Precoder and Trajectory Design for MIMO UAV-Assisted Relay System With Finite-Alphabet Inputs | 2024-11-13 | Unmanned aerial vehicles (UAVs) are gaining widespread use in wireless relay systems due to their exceptional flexibility and cost-effectiveness. This paper focuses on the integrated design of UAV trajectories and the precoders at both the transmitter and UAV in a UAV-assisted relay communication system, accounting for transmit power constraints and UAV flight limitations. Unlike previous works that primarily address multiple-input single-output (MISO) systems with Gaussian inputs, we investigate a more realistic scenario involving multiple-input multiple-output (MIMO) systems with finite-alphabet inputs. To tackle the challenging and inherently non-convex problem, we propose an efficient solution algorithm that leverages successive convex approximation and alternating optimization techniques. Simulation results validate the effectiveness of the proposed algorithm, demonstrating its capability to optimize system performance. |
|
DiVR: incorporating context from diverse VR scenes for human trajectory prediction | 2024-11-13 | Virtual environments provide a rich and controlled setting for collecting detailed data on human behavior, offering unique opportunities for predicting human trajectories in dynamic scenes. However, most existing approaches have overlooked the potential of these environments, focusing instead on static contexts without considering user-specific factors. Employing the CREATTIVE3D dataset, our work models trajectories recorded in virtual reality (VR) scenes for diverse situations including road-crossing tasks with user interactions and simulated visual impairments. We propose Diverse Context VR Human Motion Prediction (DiVR), a cross-modal transformer based on the Perceiver architecture that integrates both static and dynamic scene context using a heterogeneous graph convolution network. We conduct extensive experiments comparing DiVR against existing architectures including MLP, LSTM, and transformers with gaze and point cloud context. We also stress-test our model's generalizability across different users, tasks, and scenes. Results show that DiVR achieves higher accuracy and adaptability compared to other models and to static graphs. This work highlights the advantages of using VR datasets for context-aware human trajectory modeling, with potential applications in enhancing user experiences in the metaverse. Our source code is publicly available at https://gitlab.inria.fr/ffrancog/creattive3d-divr-model. |
|
Efficient Trajectory Generation in 3D Environments with Multi-Level Map Construction | 2024-11-13 | We propose a robust and efficient framework to generate global trajectories for ground robots in complex 3D environments. The proposed method takes a point cloud as input and efficiently constructs a multi-level map using triangular patches as the basic elements. A kinematic path search is adopted on the patches, where motion primitives on different patches combine to form the global min-time cost initial trajectory. We use a same-level expansion method to locate the nearest obstacle for each trajectory waypoint and construct an objective function with curvature, smoothness and obstacle terms for optimization. We evaluate the method on several complex 3D point cloud maps. Compared to existing methods, our method demonstrates higher robustness to point cloud noise, enabling the generation of high-quality trajectories while maintaining high computational efficiency. Our code will be publicly available at https://github.com/ck-tian/MLMC-planner. |
|
In-Trajectory Inverse Reinforcement Learning: Learn Incrementally Before An Ongoing Trajectory Terminates | 2024-11-12 | Inverse reinforcement learning (IRL) aims to learn a reward function and a corresponding policy that best fit the demonstrated trajectories of an expert. However, current IRL works cannot learn incrementally from an ongoing trajectory because they have to wait to collect at least one complete trajectory to learn. To bridge the gap, this paper considers the problem of learning a reward function and a corresponding policy while observing the initial state-action pair of an ongoing trajectory and continually updating the learned reward and policy when new state-action pairs of the ongoing trajectory are observed. We formulate this problem as an online bi-level optimization problem where the upper level dynamically adjusts the learned reward according to the newly observed state-action pairs with the help of a meta-regularization term, and the lower level learns the corresponding policy. We propose a novel algorithm to solve this problem and guarantee that the algorithm achieves sub-linear local regret |
|
UniTE: A Survey and Unified Pipeline for Pre-training Spatiotemporal Trajectory Embeddings | 2024-11-12 | Spatiotemporal trajectories are sequences of timestamped locations, which enable a variety of analyses that in turn enable important real-world applications. It is common to map trajectories to vectors, called embeddings, before subsequent analyses. Thus, the qualities of embeddings are very important. Methods for pre-training embeddings, which leverage unlabeled trajectories for training universal embeddings, have shown promising applicability across different tasks, thus attracting considerable interest. However, research progress on this topic faces two key challenges: a lack of a comprehensive overview of existing methods, resulting in several related methods not being well-recognized, and the absence of a unified pipeline, complicating the development of new methods and the analysis of methods. We present UniTE, a survey and a unified pipeline for this domain. In doing so, we present a comprehensive list of existing methods for pre-training trajectory embeddings, which includes methods that either explicitly or implicitly employ pre-training techniques. Further, we present a unified and modular pipeline with publicly available underlying code, simplifying the process of constructing and evaluating methods for pre-training trajectory embeddings. Additionally, we contribute a selection of experimental results using the proposed pipeline on real-world datasets. Implementation of the pipeline is publicly available at https://github.com/Logan-Lin/UniTE. |
|
Cross-Domain Transfer Learning using Attention Latent Features for Multi-Agent Trajectory Prediction | 2024-11-12 | With the advancements of sensor hardware, traffic infrastructure and deep learning architectures, trajectory prediction of vehicles has established a solid foundation in intelligent transportation systems. However, existing solutions are often tailored to specific traffic networks at particular time periods. Consequently, deep learning models trained on one network may struggle to generalize effectively to unseen networks. To address this, we propose a novel spatial-temporal trajectory prediction framework that performs cross-domain adaptation on the attention representation of a Transformer-based model. A graph convolutional network is also integrated to construct dynamic graph feature embeddings that accurately model the complex spatial-temporal interactions between the multi-agent vehicles across multiple traffic domains. The proposed framework is validated on two case studies involving the cross-city and cross-period settings. Experimental results show that our proposed framework achieves superior trajectory prediction and domain adaptation performance over the state-of-the-art models. |
Accepted at the IEEE International Conference on Systems, Man, and Cybernetics 2024 |
Tracing the Roots: Leveraging Temporal Dynamics in Diffusion Trajectories for Origin Attribution | 2024-11-12 | Diffusion models have revolutionized image synthesis, garnering significant research interest in recent years. Diffusion is an iterative algorithm in which samples are generated step-by-step, starting from pure noise. This process introduces the notion of diffusion trajectories, i.e., paths from the standard Gaussian distribution to the target image distribution. In this context, we study discriminative algorithms operating on these trajectories. Specifically, given a pre-trained diffusion model, we consider the problem of classifying images as part of the training dataset, generated by the model or originating from an external source. Our approach demonstrates the presence of patterns across steps that can be leveraged for classification. We also conduct ablation studies, which reveal that using higher-order gradient features to characterize the trajectories leads to significant performance gains and more robust algorithms. |
|
'Explaining RL Decisions with Trajectories': A Reproducibility Study | 2024-11-11 | This work investigates the reproducibility of the paper 'Explaining RL decisions with trajectories'. The original paper introduces a novel approach in explainable reinforcement learning based on the attribution decisions of an agent to specific clusters of trajectories encountered during training. We verify the main claims from the paper, which state that (i) training on fewer trajectories induces a lower initial state value, (ii) trajectories in a cluster present similar high-level patterns, (iii) distant trajectories influence the decision of an agent, and (iv) humans correctly identify the attributed trajectories to the decision of the agent. We recover the environments used by the authors based on the partial original code they provided for one of the environments (Grid-World), and implement the remaining ones from scratch (Seaquest, HalfCheetah, Breakout and Q*Bert). While we confirm that (i), (ii), and (iii) partially hold, we extend the largely qualitative experiments from the authors by introducing a quantitative metric to further support (iii), and new experiments and visual results for (i). Moreover, we investigate the use of different clustering algorithms and encoder architectures to further support (ii). We could not support (iv), given the limited extent of the original experiments. We conclude that, while some of the claims can be supported, further investigations and experiments could be of interest. We recognise the novelty of the work from the authors and hope that our work paves the way for clearer and more transparent approaches. |
|
Quadrotor Trajectory Tracking Using Linear and Nonlinear Model Predictive Control | 2024-11-11 | Accurate trajectory tracking is an essential characteristic for the safe navigation of a quadrotor in cluttered or disturbed environments. In this paper, we present in detail two state-of-the-art model-based control frameworks for trajectory tracking: the Linear Model Predictive Controller (LMPC) and the Nonlinear Model Predictive Controller (NMPC). Additionally, the kinematic and dynamic models of the quadrotor are comprehensively described. Finally, a simulation system is implemented to verify feasibility, demonstrating the effectiveness of both controllers. |
In Vietnamese language, in the 25th National Conference on Electronics, Communications and Information Technology (REV-ECIT 2022), Hanoi, Vietnam |
$\mathsf{QuITO}$ $\textsf{v.2}$: Trajectory Optimization with Uniform Error Guarantees under Path Constraints | 2024-11-11 | This article introduces a new transcription, change point localization, and mesh refinement scheme for direct optimization-based solutions and for uniform approximation of optimal control trajectories associated with a class of nonlinear constrained optimal control problems (OCPs). The base transcription algorithm for which we establish the refinement algorithm is a direct multiple shooting technique -- |
Submitted; 42 pages, comments are welcome |
Time-delayed Dynamic Mode Decomposition for families of periodic trajectories in Cislunar Space | 2024-11-10 | In recent years, the development of the Lunar Gateway and Artemis missions has renewed interest in lunar exploration, including both manned and unmanned missions. This interest necessitates accurate initial orbit determination (IOD) and orbit prediction (OP) in this domain, which faces significant challenges such as severe nonlinearity, sensitivity to initial conditions, large state-space volume, and sparse, faint, and unreliable measurements. This paper explores the capability of data-driven Koopman operator-based approximations for OP in these scenarios. Three stable periodic trajectories from distinct cislunar families are analyzed. The analysis includes theoretical justification for using a linear time-invariant system as the data-driven surrogate. This theoretical framework is supported by experimental validation. Furthermore, the accuracy is assessed by comparing the spectral content captured to period estimates derived from the fast Fourier transform (FFT) and Poincare-like sections. |
arXiv admin note: text overlap with arXiv:2401.13784 |
RRT* Based Optimal Trajectory Generation with Linear Temporal Logic Specifications under Kinodynamic Constraints | 2024-11-09 | In this paper, we present a novel RRT*-based strategy for generating kinodynamically feasible paths that satisfy temporal logic specifications. Our approach integrates a robustness metric for Linear Temporal Logics (LTL) with the system's motion constraints, ensuring that the resulting trajectories are both optimal and executable. We introduce a cost function that recursively computes the robustness of temporal logic specifications while penalizing time and control effort, striking a balance between path feasibility and logical correctness. We validate our approach with simulations and real-world experiments in complex environments, demonstrating its effectiveness in producing robust and practical motion plans. This work represents a significant step towards expanding the applicability of motion planning algorithms to more complex, real-world scenarios. |
|
Online Omnidirectional Jumping Trajectory Planning for Quadrupedal Robots on Uneven Terrains | 2024-11-09 | Natural terrain complexity often necessitates agile movements like jumping in animals to improve traversal efficiency. To enable similar capabilities in quadruped robots, complex real-time jumping maneuvers are required. Current research does not adequately address the problem of online omnidirectional jumping and neglects the robot's kinodynamic constraints during trajectory generation. This paper proposes a general and complete cascade online optimization framework for omnidirectional jumping for quadruped robots. Our solution systematically encompasses jumping trajectory generation, a trajectory tracking controller, and a landing controller. It also incorporates environmental perception to navigate obstacles that standard locomotion cannot bypass, such as jumping from high platforms. We introduce a novel jumping plane to parameterize omnidirectional jumping motion and formulate a tightly coupled optimization problem accounting for the kinodynamic constraints, simultaneously optimizing CoM trajectory, Ground Reaction Forces (GRFs), and joint states. To meet the online requirements, we propose an accelerated evolutionary algorithm as the trajectory optimizer to address the complexity of kinodynamic constraints. To ensure stability and accuracy in environmental perception post-landing, we introduce a coarse-to-fine relocalization method that combines global Branch and Bound (BnB) search with Maximum a Posteriori (MAP) estimation for precise positioning during navigation and jumping. The proposed framework achieves jump trajectory generation in approximately 0.1 seconds with a warm start and has been successfully validated on two quadruped robots on uneven terrains. Additionally, we extend the framework's versatility to humanoid robots. |
Submitted to IJRR |
TranSPORTmer: A Holistic Approach to Trajectory Understanding in Multi-Agent Sports | 2024-11-09 | Understanding trajectories in multi-agent scenarios requires addressing various tasks, including predicting future movements, imputing missing observations, inferring the status of unseen agents, and classifying different global states. Traditional data-driven approaches often handle these tasks separately with specialized models. We introduce TranSPORTmer, a unified transformer-based framework capable of addressing all these tasks, showcasing its application to the intricate dynamics of multi-agent sports scenarios like soccer and basketball. Using Set Attention Blocks, TranSPORTmer effectively captures temporal dynamics and social interactions in an equivariant manner. The model's tasks are guided by an input mask that conceals missing or yet-to-be-predicted observations. Additionally, we introduce a CLS extra agent to classify states along soccer trajectories, including passes, possessions, uncontrolled states, and out-of-play intervals, contributing to an enhancement in modeling trajectories. Evaluations on soccer and basketball datasets show that TranSPORTmer outperforms state-of-the-art task-specific models in player forecasting, player forecasting-imputation, ball inference, and ball imputation. https://youtu.be/8VtSRm8oGoE |
Accepted to ACCV 2024 |
Energy-efficient Hybrid Model Predictive Trajectory Planning for Autonomous Electric Vehicles | 2024-11-09 | To tackle the twin challenges of limited battery life and lengthy charging durations in electric vehicles (EVs), this paper introduces an Energy-efficient Hybrid Model Predictive Planner (EHMPP), which employs an energy-saving optimization strategy. EHMPP focuses on refining the design of the motion planner to be seamlessly integrated with the existing automatic driving algorithms, without additional hardware. It has been validated through simulation experiments on the Prescan, CarSim, and Matlab platforms, demonstrating that it can increase passive recovery energy by 11.74% and effectively track motor speed and acceleration at optimal power. To sum up, EHMPP not only aids in trajectory planning but also significantly boosts energy efficiency in autonomous EVs. |
Accepted at the IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2024 |
Safe Reinforcement Learning of Robot Trajectories in the Presence of Moving Obstacles | 2024-11-08 | In this paper, we present an approach for learning collision-free robot trajectories in the presence of moving obstacles. As a first step, we train a backup policy to generate evasive movements from arbitrary initial robot states using model-free reinforcement learning. When learning policies for other tasks, the backup policy can be used to estimate the potential risk of a collision and to offer an alternative action if the estimated risk is considered too high. No matter which action is selected, our action space ensures that the kinematic limits of the robot joints are not violated. We analyze and evaluate two different methods for estimating the risk of a collision. A physics simulation performed in the background is computationally expensive but provides the best results in deterministic environments. If a data-based risk estimator is used instead, the computational effort is significantly reduced, but an additional source of error is introduced. For evaluation, we successfully learn a reaching task and a basketball task while keeping the risk of collisions low. The results demonstrate the effectiveness of our approach for deterministic and stochastic environments, including a human-robot scenario and a ball environment, where no state can be considered permanently safe. By conducting experiments with a real robot, we show that our approach can generate safe trajectories in real time. |
IEEE Robotics and Automation Letters (RA-L); 8 pages; 7 figures |
Generating Synthetic Functional Data for Privacy-Preserving GPS Trajectories | 2024-11-08 | This research presents FDASynthesis, a novel algorithm designed to generate synthetic GPS trajectory data while preserving privacy. After pre-processing the input GPS data, human mobility traces are modeled as multidimensional curves using Functional Data Analysis (FDA). Then, the synthesis process identifies the K-nearest trajectories and averages their Square-Root Velocity Functions (SRVFs) to generate synthetic data. This results in synthetic trajectories that maintain the utility of the original data while ensuring privacy. Although applied for human mobility research, FDASynthesis is highly adaptable to different types of functional data, offering a scalable solution in various application domains. |
Updated version, correction of the notation |
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation | 2024-11-07 | Methods for image-to-video generation have achieved impressive, photo-realistic quality. However, adjusting specific elements in generated videos, such as object motion or camera movement, is often a tedious process of trial and error, e.g., involving re-generating videos with different random seeds. Recent techniques address this issue by fine-tuning a pre-trained model to follow conditioning signals, such as bounding boxes or point trajectories. Yet, this fine-tuning procedure can be computationally expensive, and it requires datasets with annotated object motion, which can be difficult to procure. In this work, we introduce SG-I2V, a framework for controllable image-to-video generation that is self-guided, offering zero-shot control by relying solely on the knowledge present in a pre-trained image-to-video diffusion model without the need for fine-tuning or external knowledge. Our zero-shot method outperforms unsupervised baselines while being competitive with supervised models in terms of visual quality and motion fidelity. |
Project page: https://kmcode1.github.io/Projects/SG-I2V/ |
Optimal Flow Matching: Learning Straight Trajectories in Just One Step | 2024-11-07 | Over the past several years, there has been a boom in the development of Flow Matching (FM) methods for generative modeling. One intriguing property pursued by the community is the ability to learn flows with straight trajectories which realize the Optimal Transport (OT) displacements. Straightness is crucial for the fast integration (inference) of the learned flow's paths. Unfortunately, most existing flow straightening methods are based on non-trivial iterative FM procedures which accumulate the error during training or exploit heuristics based on minibatch OT. To address these issues, we develop and theoretically justify the novel Optimal Flow Matching (OFM) approach which allows recovering the straight OT displacement for the quadratic transport in just one FM step. The main idea of our approach is the use of vector fields for FM that are parameterized by convex functions. |
Title | Date | Abstract | Comment |
---|---|---|---|
MERGE: Multi-faceted Hierarchical Graph-based GNN for Gene Expression Prediction from Whole Slide Histopathology Images | 2024-12-03 | Recent advances in Spatial Transcriptomics (ST) pair histology images with spatially resolved gene expression profiles, enabling predictions of gene expression across different tissue locations based on image patches. This opens up new possibilities for enhancing whole slide image (WSI) prediction tasks with localized gene expression. However, existing methods fail to fully leverage the interactions between different tissue locations, which are crucial for accurate joint prediction. To address this, we introduce MERGE (Multi-faceted hiErarchical gRaph for Gene Expressions), which combines a multi-faceted hierarchical graph construction strategy with graph neural networks (GNN) to improve gene expression predictions from WSIs. By clustering tissue image patches based on both spatial and morphological features, and incorporating intra- and inter-cluster edges, our approach fosters interactions between distant tissue locations during GNN learning. As an additional contribution, we evaluate different data smoothing techniques that are necessary to mitigate artifacts in ST data, often caused by technical imperfections. We advocate for adopting gene-aware smoothing methods that are more biologically justified. Experimental results on gene expression prediction show that our GNN method outperforms state-of-the-art techniques across multiple metrics. |
Main Paper: 8 pages, Supplementary Material: 9 pages, Figures: 16 |
Mobile Cell-Free Massive MIMO with Multi-Agent Reinforcement Learning: A Scalable Framework | 2024-12-03 | Cell-free massive multiple-input multiple-output (mMIMO) offers significant advantages in mobility scenarios, mainly due to the elimination of cell boundaries and strong macro diversity. In this paper, we examine the downlink performance of cell-free mMIMO systems equipped with mobile-APs utilizing the concept of unmanned aerial vehicles, where mobility and power control are jointly considered to effectively enhance coverage and suppress interference. However, the high computational complexity, poor collaboration, limited scalability, and uneven reward distribution of conventional optimization schemes lead to serious performance degradation and instability. These factors complicate the provision of consistent and high-quality service across all user equipments in downlink cell-free mMIMO systems. Consequently, we propose a novel scalable framework enhanced by multi-agent reinforcement learning (MARL) to tackle these challenges. The established framework incorporates a graph neural network (GNN)-aided communication mechanism to facilitate effective collaboration among agents, a permutation architecture to improve scalability, and a directional decoupling architecture to accurately distinguish contributions. In the numerical results, we present comparisons of different optimization schemes and network architectures, which reveal that the proposed scheme can effectively enhance system performance compared to conventional schemes due to the adoption of advanced technologies. In particular, appropriately compressing the observation space of agents is beneficial for achieving a better balance between performance and convergence. |
|
The Descriptive Complexity of Graph Neural Networks | 2024-12-03 | We analyse the power of graph neural networks (GNNs) in terms of Boolean circuit complexity and descriptive complexity. We prove that the graph queries that can be computed by a polynomial-size bounded-depth family of GNNs are exactly those definable in the guarded fragment GFO+C of first-order logic with counting and with built-in relations. This puts GNNs in the circuit complexity class (non-uniform) |
Journal version for TheoretiCS |
Interpolation and differentiation of alchemical degrees of freedom in machine learning interatomic potentials | 2024-12-03 | Machine learning interatomic potentials (MLIPs) have become a workhorse of modern atomistic simulations, and recently published universal MLIPs, pre-trained on large datasets, have demonstrated remarkable accuracy and generalizability. However, the computational cost of MLIPs limits their applicability to chemically disordered systems requiring large simulation cells or to sample-intensive statistical methods. Here, we report the use of continuous and differentiable alchemical degrees of freedom in atomistic materials simulations, exploiting the fact that graph neural network MLIPs represent discrete elements as real-valued tensors. The proposed method introduces alchemical atoms with corresponding weights into the input graph, alongside modifications to the message-passing and readout mechanisms of MLIPs, and allows smooth interpolation between the compositional states of materials. The end-to-end differentiability of MLIPs enables efficient calculation of the gradient of energy with respect to the compositional weights. With this modification, we propose methodologies for optimizing the composition of solid solutions towards target macroscopic properties, characterizing order and disorder in multicomponent oxides, and conducting alchemical free energy simulations to quantify the free energy of vacancy formation and composition changes. The approach offers an avenue for extending the capabilities of universal MLIPs in the modeling of compositional disorder and characterizing the phase stability of complex materials systems. |
|
An Automated Data Mining Framework Using Autoencoders for Feature Extraction and Dimensionality Reduction | 2024-12-03 | This study proposes an automated data mining framework based on autoencoders and experimentally verifies its effectiveness in feature extraction and data dimensionality reduction. Through the encoding-decoding structure, the autoencoder can capture the data's potential characteristics and achieve noise reduction and anomaly detection, providing an efficient and stable solution for the data mining process. The experiment compared the performance of the autoencoder with traditional dimensionality reduction methods (such as PCA, FA, T-SNE, and UMAP). The results showed that the autoencoder performed best in terms of reconstruction error and root mean square error and could better retain data structure and enhance the generalization ability of the model. The autoencoder-based framework not only reduces manual intervention but also significantly improves the automation of data processing. In the future, with the advancement of deep learning and big data technology, the autoencoder method combined with a generative adversarial network (GAN) or graph neural network (GNN) is expected to be more widely used in the fields of complex data processing, real-time data analysis and intelligent decision-making. |
|
Generalizing Weisfeiler-Lehman Kernels to Subgraphs | 2024-12-03 | Subgraph representation learning has been effective in solving various real-world problems. However, current graph neural networks (GNNs) produce suboptimal results for subgraph-level tasks due to their inability to capture complex interactions within and between subgraphs. To provide a more expressive and efficient alternative, we propose WLKS, a Weisfeiler-Lehman (WL) kernel generalized for subgraphs by applying the WL algorithm on induced |
15 pages |
GNN-based Auto-Encoder for Short Linear Block Codes: A DRL Approach | 2024-12-03 | This paper presents a novel auto-encoder based end-to-end channel encoding and decoding. It integrates deep reinforcement learning (DRL) and graph neural networks (GNN) in code design by modeling the generation of code parity-check matrices as a Markov Decision Process (MDP), to optimize key coding performance metrics such as error-rates and code algebraic properties. An edge-weighted GNN (EW-GNN) decoder is proposed, which operates on the Tanner graph with an iterative message-passing structure. Once trained on a single linear block code, the EW-GNN decoder can be directly used to decode other linear block codes of different code lengths and code rates. An iterative joint training of the DRL-based code designer and the EW-GNN decoder is performed to optimize the end-to-end encoding and decoding process. Simulation results show the proposed auto-encoder significantly surpasses several traditional coding schemes at short block lengths, including low-density parity-check (LDPC) codes with the belief propagation (BP) decoding and the maximum-likelihood decoding (MLD), and BCH with BP decoding, offering superior error-correction capabilities while maintaining low decoding complexity. |
13 pages; submitted to IEEE Trans. arXiv admin note: text overlap with arXiv:2211.06962 |
Structure-Guided Input Graph for GNNs facing Heterophily | 2024-12-02 | Graph Neural Networks (GNNs) have emerged as a promising tool to handle data exhibiting an irregular structure. However, most GNN architectures perform well on homophilic datasets, where the labels of neighboring nodes are likely to be the same. In recent years, an increasing body of work has been devoted to the development of GNN architectures for heterophilic datasets, where labels do not exhibit this low-pass behavior. In this work, we create a new graph in which nodes are connected if they share structural characteristics, meaning a higher chance of sharing their labels, and then use this new graph in the GNN architecture. To do this, we compute the k-nearest neighbors graph according to distances between structural features, which are either (i) role-based, such as degree, or (ii) global, such as centrality measures. Experiments show that the labels are smoother in this newly defined graph and that the performance of GNN architectures improves when using this alternative structure. |
Presented as a conference paper in the Asilomar Conference on Signals, Systems, and Computers 2024 |
End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization | 2024-12-02 | Speaker diarization, the task of segmenting an audio recording based on speaker identity, constitutes an important speech pre-processing step for several downstream applications. The conventional approach to diarization involves multiple steps of embedding extraction and clustering, which are often optimized in an isolated fashion. While end-to-end diarization systems attempt to learn a single model for the task, they are often cumbersome to train and require large supervised datasets. In this paper, we propose an end-to-end supervised hierarchical clustering algorithm based on graph neural networks (GNN), called End-to-end Supervised HierARchical Clustering (E-SHARC). The embedding extractor is initialized using a pre-trained x-vector model while the GNN model is trained initially using the x-vector embeddings from the pre-trained model. Finally, the E-SHARC model uses the front-end mel-filterbank features as input and jointly optimizes the embedding extractor and the GNN clustering module, performing representation learning, metric learning, and clustering with end-to-end optimization. Further, with additional inputs from an external overlap detector, the E-SHARC approach is capable of predicting the speakers in the overlapping speech regions. The experimental evaluation on benchmark datasets like AMI, Voxconverse and DISPLACE illustrates that the proposed E-SHARC framework provides competitive diarization results using graph based clustering methods. |
11 pages. Under review IEEE TASLP. © 2024 IEEE |
DGNN-YOLO: Dynamic Graph Neural Networks with YOLO11 for Small Object Detection and Tracking in Traffic Surveillance | 2024-12-02 | Accurate detection and tracking of small objects such as pedestrians, cyclists, and motorbikes are critical for traffic surveillance systems, which are crucial in improving road safety and decision-making in intelligent transportation systems. However, traditional methods struggle with challenges such as occlusion, low resolution, and dynamic traffic conditions, necessitating innovative approaches to address these limitations. This paper introduces DGNN-YOLO, a novel framework integrating dynamic graph neural networks (DGNN) with YOLO11 to enhance small object detection and tracking in traffic surveillance systems. The framework leverages YOLO11's advanced spatial feature extraction capabilities for precise object detection and incorporates DGNN to model spatial-temporal relationships for robust real-time tracking dynamically. By constructing and updating graph structures, DGNN-YOLO effectively represents objects as nodes and their interactions as edges, ensuring adaptive and accurate tracking in complex and dynamic environments. Extensive experiments demonstrate that DGNN-YOLO consistently outperforms state-of-the-art methods in detecting and tracking small objects under diverse traffic conditions, achieving the highest precision (0.8382), recall (0.6875), and mAP@0.5:0.95 (0.6476), showcasing its robustness and scalability, particularly in challenging scenarios involving small and occluded objects. This work provides a scalable, real-time traffic surveillance and analysis solution, significantly contributing to intelligent transportation systems. |
|
Probabilistic Graph Rewiring via Virtual Nodes | 2024-12-02 | Message-passing graph neural networks (MPNNs) have emerged as a powerful paradigm for graph-based machine learning. Despite their effectiveness, MPNNs face challenges such as under-reaching and over-squashing, where limited receptive fields and structural bottlenecks hinder information flow in the graph. While graph transformers hold promise in addressing these issues, their scalability is limited due to quadratic complexity regarding the number of nodes, rendering them impractical for larger graphs. Here, we propose implicitly rewired message-passing neural networks (IPR-MPNNs), a novel approach that integrates implicit probabilistic graph rewiring into MPNNs. By introducing a small number of virtual nodes, i.e., adding additional nodes to a given graph and connecting them to existing nodes, in a differentiable, end-to-end manner, IPR-MPNNs enable long-distance message propagation, circumventing quadratic complexity. Theoretically, we demonstrate that IPR-MPNNs surpass the expressiveness of traditional MPNNs. Empirically, we validate our approach by showcasing its ability to mitigate under-reaching and over-squashing effects, achieving state-of-the-art performance across multiple graph datasets. Notably, IPR-MPNNs outperform graph transformers while maintaining significantly faster computational efficiency. |
Accepted at 38th Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, Canada |
A Survey on Deep Neural Networks in Collaborative Filtering Recommendation Systems | 2024-12-02 | This survey provides an examination of the use of Deep Neural Networks (DNN) in Collaborative Filtering (CF) recommendation systems. As the digital world increasingly relies on data-driven approaches, traditional CF techniques face limitations in scalability and flexibility. DNNs can address these challenges by effectively modeling complex, non-linear relationships within the data. We begin by exploring the fundamental principles of both collaborative filtering and deep neural networks, laying the groundwork for understanding their integration. Subsequently, we review key advancements in the field, categorizing various deep learning models that enhance CF systems, including Multilayer Perceptrons (MLP), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Graph Neural Networks (GNN), autoencoders, Generative Adversarial Networks (GAN), and Restricted Boltzmann Machines (RBM). The paper also discusses evaluation protocols, various publicly available auxiliary information, and data features. Furthermore, the survey concludes with a discussion of the challenges and future research opportunities in enhancing collaborative filtering systems with deep learning. |
32 pages, 12 figures |
Morphological-Symmetry-Equivariant Heterogeneous Graph Neural Network for Robotic Dynamics Learning | 2024-12-02 | We present a morphological-symmetry-equivariant heterogeneous graph neural network, namely MS-HGNN, for robotic dynamics learning, that integrates robotic kinematic structures and morphological symmetries into a single graph network. These structural priors are embedded into the learning architecture as constraints, ensuring high generalizability, sample and model efficiency. The proposed MS-HGNN is a versatile and general architecture that is applicable to various multi-body dynamic systems and a wide range of dynamics learning problems. We formally prove the morphological-symmetry-equivariant property of our MS-HGNN and validate its effectiveness across multiple quadruped robot learning problems using both real-world and simulated data. Our code is made publicly available at https://github.com/lunarlab-gatech/MorphSym-HGNN/. |
|
Superhypergraph Neural Networks and Plithogenic Graph Neural Networks: Theoretical Foundations | 2024-12-02 | Hypergraphs extend traditional graphs by allowing edges to connect multiple nodes, while superhypergraphs further generalize this concept to represent even more complex relationships. Neural networks, inspired by biological systems, are widely used for tasks such as pattern recognition, data classification, and prediction. Graph Neural Networks (GNNs), a well-established framework, have recently been extended to Hypergraph Neural Networks (HGNNs), with their properties and applications being actively studied. The Plithogenic Graph framework enhances graph representations by integrating multi-valued attributes, as well as membership and contradiction functions, enabling the detailed modeling of complex relationships. In the context of handling uncertainty, concepts such as Fuzzy Graphs and Neutrosophic Graphs have gained prominence. It is well established that Plithogenic Graphs serve as a generalization of both Fuzzy Graphs and Neutrosophic Graphs. Furthermore, the Fuzzy Graph Neural Network has been proposed and is an active area of research. This paper establishes the theoretical foundation for the development of SuperHyperGraph Neural Networks (SHGNNs) and Plithogenic Graph Neural Networks, expanding the applicability of neural networks to these advanced graph structures. While mathematical generalizations and proofs are presented, future computational experiments are anticipated. |
77 pages; 3 figures |
Lossless and Privacy-Preserving Graph Convolution Network for Federated Item Recommendation | 2024-12-02 | Graph neural network (GNN) has emerged as a state-of-the-art solution for item recommendation. However, existing GNN-based recommendation methods rely on a centralized storage of fragmented user-item interaction sub-graphs and training on an aggregated global graph, which will lead to privacy concerns. As a response, some recent works develop GNN-based federated recommendation methods by exploiting decentralized and fragmented user-item sub-graphs in order to preserve user privacy. However, due to privacy constraints, the graph convolution process in existing federated recommendation methods is incomplete compared with the centralized counterpart, causing a degradation of the recommendation performance. In this paper, we propose a novel lossless and privacy-preserving graph convolution network (LP-GCN), which fully completes the graph convolution process with decentralized user-item interaction sub-graphs while ensuring privacy. It is worth mentioning that its performance is equivalent to that of the non-federated (i.e., centralized) counterpart. Moreover, we validate its effectiveness through both theoretical analysis and empirical studies. Extensive experiments on three real-world datasets show that our LP-GCN outperforms the existing federated recommendation methods. The code will be publicly available once the paper is accepted. |
|
Recurrent Aggregators in Neural Algorithmic Reasoning | 2024-12-01 | Neural algorithmic reasoning (NAR) is an emerging field that seeks to design neural networks that mimic classical algorithmic computations. Today, graph neural networks (GNNs) are widely used in neural algorithmic reasoners due to their message passing framework and permutation equivariance. In this extended abstract, we challenge this design choice, and replace the equivariant aggregation function with a recurrent neural network. While seemingly counter-intuitive, this approach has appropriate grounding when nodes have a natural ordering -- and this is the case frequently in established reasoning benchmarks like CLRS-30. Indeed, our recurrent NAR (RNAR) model performs very strongly on such tasks, while handling many others gracefully. A notable achievement of RNAR is its decisive state-of-the-art result on the Heapsort and Quickselect tasks, both deemed as a significant challenge for contemporary neural algorithmic reasoners -- especially the latter, where RNAR achieves a mean micro-F1 score of 87%. |
Presented at the Third Learning on Graphs Conference (LoG 2024). 10 pages, 1 figure |
A Cognac shot to forget bad memories: Corrective Unlearning in GNNs | 2024-12-01 | Graph Neural Networks (GNNs) are increasingly being used for a variety of ML applications on graph data. As graph data does not follow the independently and identically distributed (i.i.d.) assumption, adversarial manipulations or incorrect data can propagate to other data points through message passing, deteriorating the model's performance. To allow model developers to remove the adverse effects of manipulated entities from a trained GNN, we study the recently formulated problem of Corrective Unlearning. We find that current graph unlearning methods fail to unlearn the effect of manipulations even when the whole manipulated set is known. We introduce a new graph unlearning method, Cognac, which can unlearn the effect of the manipulation set even when only 5% of it is identified. It recovers most of the performance of a strong oracle with fully corrected training data, even beating retraining from scratch without the deletion set while being 8x more efficient. We hope our work guides GNN developers in fixing harmful effects due to issues in real-world data post-training. |
|
BEV-SUSHI: Multi-Target Multi-Camera 3D Detection and Tracking in Bird's-Eye View | 2024-12-01 | Object perception from multi-view cameras is crucial for intelligent systems, particularly in indoor environments, e.g., warehouses, retail stores, and hospitals. Most traditional multi-target multi-camera (MTMC) detection and tracking methods rely on 2D object detection, single-view multi-object tracking (MOT), and cross-view re-identification (ReID) techniques, without properly handling important 3D information by multi-view image aggregation. In this paper, we propose a 3D object detection and tracking framework, named BEV-SUSHI, which first aggregates multi-view images with necessary camera calibration parameters to obtain 3D object detections in bird's-eye view (BEV). Then, we introduce hierarchical graph neural networks (GNNs) to track these 3D detections in BEV for MTMC tracking results. Unlike existing methods, BEV-SUSHI has impressive generalizability across different scenes and diverse camera settings, with exceptional capability for long-term association handling. As a result, our proposed BEV-SUSHI establishes the new state-of-the-art on the AICity'24 dataset with 81.22 HOTA, and 95.6 IDF1 on the WildTrack dataset. |
|
Towards Dynamic Message Passing on Graphs | 2024-12-01 | Message passing plays a vital role in graph neural networks (GNNs) for effective feature learning. However, the over-reliance on input topology diminishes the efficacy of message passing and restricts the ability of GNNs. Despite efforts to mitigate the reliance, existing study encounters message-passing bottlenecks or high computational expense problems, which invokes the demands for flexible message passing with low complexity. In this paper, we propose a novel dynamic message-passing mechanism for GNNs. It projects graph nodes and learnable pseudo nodes into a common space with measurable spatial relations between them. With nodes moving in the space, their evolving relations facilitate flexible pathway construction for a dynamic message-passing process. Associating pseudo nodes to input graphs with their measured relations, graph nodes can communicate with each other intermediately through pseudo nodes under linear complexity. We further develop a GNN model named |
Accepted by NeurIPS 2024 |
Large Language Models as Interpolated and Extrapolated Event Predictors | 2024-11-30 | Salient facts of sociopolitical events are distilled into quadruples following a format of subject, relation, object, and timestamp. Machine learning methods, such as graph neural networks (GNNs) and recurrent neural networks (RNNs), have been built to make predictions and infer relations on the quadruple-based knowledge graphs (KGs). In many applications, quadruples are extended to quintuples with auxiliary attributes such as text summaries that describe the quadruple events. In this paper, we comprehensively investigate how large language models (LLMs) streamline the design of event prediction frameworks using quadruple-based or quintuple-based data while maintaining competitive accuracy. We propose LEAP, a unified framework that leverages large language models as event predictors. Specifically, we develop multiple prompt templates to frame the object prediction (OP) task as a standard question-answering (QA) task, suitable for instruction fine-tuning with an encoder-decoder LLM. For multi-event forecasting (MEF) task, we design a simple yet effective prompt template for each event quintuple. This novel approach removes the need for GNNs and RNNs, instead utilizing an encoder-only LLM to generate fixed intermediate embeddings, which are processed by a customized downstream head with a self-attention mechanism to predict potential relation occurrences in the future. Extensive experiments on multiple real-world datasets using various evaluation metrics validate the effectiveness of our approach. |
11 pages, 3 figures, 10 tables |
Exact Certification of (Graph) Neural Networks Against Label Poisoning | 2024-11-30 | Machine learning models are highly vulnerable to label flipping, i.e., the adversarial modification (poisoning) of training labels to compromise performance. Thus, deriving robustness certificates is important to guarantee that test predictions remain unaffected and to understand worst-case robustness behavior. However, for Graph Neural Networks (GNNs), the problem of certifying label flipping has so far been unsolved. We change this by introducing an exact certification method, deriving both sample-wise and collective certificates. Our method leverages the Neural Tangent Kernel (NTK) to capture the training dynamics of wide networks enabling us to reformulate the bilevel optimization problem representing label flipping into a Mixed-Integer Linear Program (MILP). We apply our method to certify a broad range of GNN architectures in node classification tasks. Thereby, concerning the worst-case robustness to label flipping: |
Under review |
A Self-Explainable Heterogeneous GNN for Relational Deep Learning | 2024-11-30 | Recently, significant attention has been given to the idea of viewing relational databases as heterogeneous graphs, enabling the application of graph neural network (GNN) technology for predictive tasks. However, existing GNN methods struggle with the complexity of the heterogeneous graphs induced by databases with numerous tables and relations. Traditional approaches either consider all possible relational meta-paths, thus failing to scale with the number of relations, or rely on domain experts to identify relevant meta-paths. A recent solution does manage to learn informative meta-paths without expert supervision, but assumes that a node's class depends solely on the existence of a meta-path occurrence. In this work, we present a self-explainable heterogeneous GNN for relational data that supports models in which class membership depends on aggregate information obtained from multiple occurrences of a meta-path. Experimental results show that in the context of relational databases, our approach effectively identifies informative meta-paths that faithfully capture the model's reasoning mechanisms. It significantly outperforms existing methods in both synthetic and real-world scenarios. |
|
Graph-to-SFILES: Control structure prediction from process topologies using generative artificial intelligence | 2024-11-30 | Control structure design is an important but tedious step in P&ID development. Generative artificial intelligence (AI) promises to reduce P&ID development time by supporting engineers. Previous research on generative AI in chemical process design mainly represented processes by sequences. However, graphs offer a promising alternative because of their permutation invariance. We propose the Graph-to-SFILES model, a generative AI method to predict control structures from flowsheet topologies. The Graph-to-SFILES model takes the flowsheet topology as a graph input and returns a control-extended flowsheet as a sequence in the SFILES 2.0 notation. We compare four different graph encoder architectures, one of them being a graph neural network (GNN) proposed in this work. The Graph-to-SFILES model achieves a top-5 accuracy of 73.2% when trained on 10,000 flowsheet topologies. In addition, the proposed GNN performs best among the encoder architectures. Compared to a purely sequence-based approach, the Graph-to-SFILES model improves the top-5 accuracy for a relatively small training dataset of 1,000 flowsheets from 0.9% to 28.4%. However, the sequence-based approach performs better on a large-scale dataset of 100,000 flowsheets. These results highlight the potential of graph-based AI models to accelerate P&ID development in small-data regimes but their effectiveness on industry relevant case studies still needs to be investigated. |
|
Toward Fair Graph Neural Networks Via Dual-Teacher Knowledge Distillation | 2024-11-30 | Graph Neural Networks (GNNs) have demonstrated strong performance in graph representation learning across various real-world applications. However, they often produce biased predictions caused by sensitive attributes, such as religion or gender, an issue that has been largely overlooked in existing methods. Recently, numerous studies have focused on reducing biases in GNNs. However, these approaches often rely on training with partial data (e.g., using either node features or graph structure alone), which can enhance fairness but frequently compromises model utility due to the limited utilization of available graph information. To address this tradeoff, we propose an effective strategy to balance fairness and utility in knowledge distillation. Specifically, we introduce FairDTD, a novel Fair representation learning framework built on Dual-Teacher Distillation, leveraging a causal graph model to guide and optimize the design of the distillation process. Specifically, FairDTD employs two fairness-oriented teacher models: a feature teacher and a structure teacher, to facilitate dual distillation, with the student model learning fairness knowledge from the teachers while also leveraging full data to mitigate utility loss. To enhance information transfer, we incorporate graph-level distillation to provide an indirect supplement of graph information during training, as well as a node-specific temperature module to improve the comprehensive transfer of fair knowledge. Experiments on diverse benchmark datasets demonstrate that FairDTD achieves optimal fairness while preserving high model utility, showcasing its effectiveness in fair representation learning for GNNs. |
|
Towards Neural Scaling Laws on Graphs | 2024-11-30 | Deep graph models (e.g., graph neural networks and graph transformers) have become important techniques for leveraging knowledge across various types of graphs. Yet, the neural scaling laws on graphs, i.e., how the performance of deep graph models changes with model and dataset sizes, have not been systematically investigated, casting doubts on the feasibility of achieving large graph models. To fill this gap, we benchmark many graph datasets from different tasks and make an attempt to establish the neural scaling laws on graphs from both model and data perspectives. The model size we investigated is up to 100 million parameters, and the dataset size investigated is up to 50 million samples. We first verify the validity of such laws on graphs, establishing proper formulations to describe the scaling behaviors. For model scaling, we identify that despite the parameter numbers, the model depth also plays an important role in affecting the model scaling behaviors, which differs from observations in other domains such as CV and NLP. For data scaling, we suggest that the number of graphs cannot effectively measure the graph data volume in scaling law since the sizes of different graphs are highly irregular. Instead, we reform the data scaling law with the number of nodes or edges as the metric to address the irregular graph sizes. We further demonstrate that the reformed law offers a unified view of the data scaling behaviors for various fundamental graph tasks including node classification, link prediction, and graph classification. This work provides valuable insights into neural scaling laws on graphs, which can serve as an important tool for collecting new graph data and developing large graph models. |
|
One Model for One Graph: A New Perspective for Pretraining with Cross-domain Graphs | 2024-11-30 | Graph Neural Networks (GNNs) have emerged as a powerful tool to capture intricate network patterns, achieving success across different domains. However, existing GNNs require careful domain-specific architecture designs and training from scratch on each dataset, leading to an expertise-intensive process with difficulty in generalizing across graphs from different domains. Therefore, it can be hard for practitioners to infer which GNN model can generalize well to graphs from their domains. To address this challenge, we propose a novel cross-domain pretraining framework, "one model for one graph," which overcomes the limitations of previous approaches that failed to use a single GNN to capture diverse graph patterns across domains with significant gaps. Specifically, we pretrain a bank of expert models, with each one corresponding to a specific dataset. When inferring to a new graph, gating functions choose a subset of experts to effectively integrate prior model knowledge while avoiding negative transfer. Extensive experiments consistently demonstrate the superiority of our proposed method on both link prediction and node classification tasks. |
|
Attribute-Enhanced Similarity Ranking for Sparse Link Prediction | 2024-11-29 | Link prediction is a fundamental problem in graph data. In its most realistic setting, the problem consists of predicting missing or future links between random pairs of nodes from the set of disconnected pairs. Graph Neural Networks (GNNs) have become the predominant framework for link prediction. GNN-based methods treat link prediction as a binary classification problem and handle the extreme class imbalance -- real graphs are very sparse -- by sampling (uniformly at random) a balanced number of disconnected pairs not only for training but also for evaluation. However, we show that the reported performance of GNNs for link prediction in the balanced setting does not translate to the more realistic imbalanced setting and that simpler topology-based approaches are often better at handling sparsity. These findings motivate Gelato, a similarity-based link-prediction method that applies (1) graph learning based on node attributes to enhance a topological heuristic, (2) a ranking loss for addressing class imbalance, and (3) a negative sampling scheme that efficiently selects hard training pairs via graph partitioning. Experiments show that Gelato outperforms existing GNN-based alternatives. |
To appear at the 31st SIGKDD Conference on Knowledge Discovery and Data Mining - Research Track (August 2024 Deadline) |
Multigraph Message Passing with Bi-Directional Multi-Edge Aggregations | 2024-11-29 | Graph Neural Networks (GNNs) have seen significant advances in recent years, yet their application to multigraphs, where parallel edges exist between the same pair of nodes, remains under-explored. Standard GNNs, designed for simple graphs, compute node representations by combining all connected edges at once, without distinguishing between edges from different neighbors. There are some GNN architectures proposed specifically for multigraph tasks, yet these architectures perform only node-level aggregation in their message-passing layers, which limits their expressive power. Furthermore, these approaches either lack permutation equivariance when a strict total edge ordering is absent, or fail to preserve the topological structure of the multigraph. To address all these shortcomings, we propose MEGA-GNN, a unified framework for message passing on multigraphs that can effectively perform diverse graph learning tasks. Our approach introduces a two-stage aggregation process in the message passing layers: first, parallel edges are aggregated, followed by a node-level aggregation that operates on aggregated messages from distinct neighbors. We show that MEGA-GNN supports permutation equivariance and invariance properties. We also show that MEGA-GNN is universal given a strict total order on the edges. Experiments on synthetic and real-world financial transaction datasets demonstrate that MEGA-GNN either significantly outperforms or is on par with the accuracy of state-of-the-art solutions. |
19 pages, 5 figures |
Twisted Convolutional Networks (TCNs): Enhancing Feature Interactions for Non-Spatial Data Classification | 2024-11-29 | Twisted Convolutional Networks (TCNs) are introduced as a novel neural network architecture designed to effectively process one-dimensional data with arbitrary feature order and minimal spatial relationships. Unlike traditional Convolutional Neural Networks (CNNs), which excel at handling structured two-dimensional data like images, TCNs reduce dependency on feature order by combining input features in innovative ways to create new representations. By explicitly enhancing feature interactions and employing diverse feature combinations, TCNs generate richer and more informative representations, making them especially effective for classification tasks on datasets with arbitrary feature arrangements. This paper details the TCN architecture and its feature combination strategy, providing a comprehensive comparison with traditional CNNs, DeepSets, Transformers, and Graph Neural Networks (GNNs). Extensive experiments on benchmark datasets demonstrate that TCNs achieve superior performance, particularly in classification scenarios involving one-dimensional data. |
The source code for the TCNs can be accessed at https://github.com/junbolian/Twisted-Convolutional-Networks |
Spatial Clustering of Molecular Localizations with Graph Neural Networks | 2024-11-29 | Single-molecule localization microscopy generates point clouds corresponding to fluorophore localizations. Spatial cluster identification and analysis of these point clouds are crucial for extracting insights about molecular organization. However, this task becomes challenging in the presence of localization noise, high point density, or complex biological structures. Here, we introduce MIRO (Multimodal Integration through Relational Optimization), an algorithm that uses recurrent graph neural networks to transform the point clouds in order to improve clustering efficiency when applying conventional clustering techniques. We show that MIRO supports simultaneous processing of clusters of different shapes and at multiple scales, demonstrating improved performance across varied datasets. Our comprehensive evaluation demonstrates MIRO's transformative potential for single-molecule localization applications, showcasing its capability to revolutionize cluster analysis and provide accurate, reliable details of molecular architecture. In addition, MIRO's robust clustering capabilities hold promise for applications in various fields such as neuroscience, for the analysis of neural connectivity patterns, and environmental science, for studying spatial distributions of ecological data. |
|
SDR-GNN: Spectral Domain Reconstruction Graph Neural Network for Incomplete Multimodal Learning in Conversational Emotion Recognition | 2024-11-29 | Multimodal Emotion Recognition in Conversations (MERC) aims to classify utterance emotions using textual, auditory, and visual modal features. Most existing MERC methods assume each utterance has complete modalities, overlooking the common issue of incomplete modalities in real-world scenarios. Recently, graph neural networks (GNNs) have achieved notable results in Incomplete Multimodal Emotion Recognition in Conversations (IMERC). However, traditional GNNs focus on binary relationships between nodes, limiting their ability to capture more complex, higher-order information. Moreover, repeated message passing can cause over-smoothing, reducing their capacity to preserve essential high-frequency details. To address these issues, we propose a Spectral Domain Reconstruction Graph Neural Network (SDR-GNN) for incomplete multimodal learning in conversational emotion recognition. SDR-GNN constructs an utterance semantic interaction graph using a sliding window based on both speaker and context relationships to model emotional dependencies. To capture higher-order and high-frequency information, SDR-GNN utilizes weighted relationship aggregation, ensuring consistent semantic feature extraction across utterances. Additionally, it performs multi-frequency aggregation in the spectral domain, enabling efficient recovery of incomplete modalities by extracting both high- and low-frequency information. Finally, multi-head attention is applied to fuse and optimize features for emotion recognition. Extensive experiments on various real-world datasets demonstrate that our approach is effective in incomplete multimodal learning and outperforms current state-of-the-art methods. |
17 pages, 8 figures |
PerLA: Perceptive 3D Language Assistant | 2024-11-29 | Enabling Large Language Models (LLMs) to understand the 3D physical world is an emerging yet challenging research direction. Current strategies for processing point clouds typically downsample the scene or divide it into smaller parts for separate analysis. However, both approaches risk losing key local details or global contextual information. In this paper, we introduce PerLA, a 3D language assistant designed to be more perceptive to both details and context, making visual representations more informative for the LLM. PerLA captures high-resolution (local) details in parallel from different point cloud areas and integrates them with (global) context obtained from a lower-resolution whole point cloud. We present a novel algorithm that preserves point cloud locality through the Hilbert curve and effectively aggregates local-to-global information via cross-attention and a graph neural network. Lastly, we introduce a novel loss for local representation consensus to promote training stability. PerLA outperforms state-of-the-art 3D language assistants, with gains of up to +1.34 CIDEr on ScanQA for question answering, and +4.22 on ScanRefer and +3.88 on Nr3D for dense captioning. https://gfmei.github.io/PerLA/ |
|
Graph Neural Networks for Heart Failure Prediction on an EHR-Based Patient Similarity Graph | 2024-11-29 | Objective: In modern healthcare, accurately predicting diseases is a crucial matter. This study introduces a novel approach using graph neural networks (GNNs) and a Graph Transformer (GT) to predict the incidence of heart failure (HF) on a patient similarity graph at the next hospital visit. Materials and Methods: We used electronic health records (EHR) from the MIMIC-III dataset and applied the K-Nearest Neighbors (KNN) algorithm to create a patient similarity graph using embeddings from diagnoses, procedures, and medications. Three models - GraphSAGE, Graph Attention Network (GAT), and Graph Transformer (GT) - were implemented to predict HF incidence. Model performance was evaluated using F1 score, AUROC, and AUPRC metrics, and results were compared against baseline algorithms. An interpretability analysis was performed to understand the model's decision-making process. Results: The GT model demonstrated the best performance (F1 score: 0.5361, AUROC: 0.7925, AUPRC: 0.5168). Although the Random Forest (RF) baseline achieved a similar AUPRC value, the GT model offered enhanced interpretability due to the use of patient relationships in the graph structure. A joint analysis of attention weights, graph connectivity, and clinical features provided insight into model predictions across different classification groups. Discussion and Conclusion: Graph-based approaches such as GNNs provide an effective framework for predicting HF. By leveraging a patient similarity graph, GNNs can capture complex relationships in EHR data, potentially improving prediction accuracy and clinical interpretability. |
|
Powerformer: A Section-adaptive Transformer for Power Flow Adjustment | 2024-11-29 | In this paper, we present a novel transformer architecture tailored for learning robust power system state representations, which strives to optimize power dispatch for the power flow adjustment across different transmission sections. Specifically, our proposed approach, named Powerformer, develops a dedicated section-adaptive attention mechanism, separating itself from the self-attention used in conventional transformers. This mechanism effectively integrates power system states with transmission section information, which facilitates the development of robust state representations. Furthermore, by considering the graph topology of the power system and the electrical attributes of bus nodes, we introduce two customized strategies to further enhance the expressiveness: graph neural network propagation and a multi-factor attention mechanism. Extensive evaluations are conducted on three power system scenarios, including the IEEE 118-bus system, a realistic 300-bus system in China, and a large-scale European system with 9241 buses, where Powerformer demonstrates its superior performance over several baseline methods. |
8 figures |
RL-MILP Solver: A Reinforcement Learning Approach for Solving Mixed-Integer Linear Programs with Graph Neural Networks | 2024-11-29 | Mixed-Integer Linear Programming (MILP) is an optimization technique widely used in various fields. Primal heuristics, which reduce the search space of MILP, have enabled traditional solvers (e.g., Gurobi) to efficiently find high-quality solutions. However, traditional primal heuristics rely on expert knowledge, motivating the advent of machine learning (ML)-based primal heuristics that learn repetitive patterns in MILP. Nonetheless, existing ML-based primal heuristics do not guarantee solution feasibility (i.e., satisfying all constraints) and primarily focus on prediction for binary decision variables. When addressing MILP involving non-binary integer variables using ML-based approaches, feasibility issues can become even more pronounced. Since finding an optimal solution requires satisfying all constraints, addressing feasibility is critical. To overcome these limitations, we propose a novel reinforcement learning (RL)-based solver that interacts with MILP to find feasible solutions, rather than delegating sub-problems to traditional solvers. We design reward functions tailored for MILP, which enables the RL agent to learn relationships between decision variables and constraints. Additionally, to effectively model complex relationships among decision variables, we leverage a Transformer encoder-based graph neural network (GNN). Our experimental results demonstrate that the proposed method can solve MILP problems and find near-optimal solutions without delegating the remainder to traditional solvers. The proposed method provides a meaningful step forward as an initial study in solving MILP problems end-to-end based solely on ML. |
|
ContextGNN: Beyond Two-Tower Recommendation Systems | 2024-11-29 | Recommendation systems predominantly utilize two-tower architectures, which evaluate user-item rankings through the inner product of their respective embeddings. However, one key limitation of two-tower models is that they learn a pair-agnostic representation of users and items. In contrast, pair-wise representations either scale poorly due to their quadratic complexity or are too restrictive on the candidate pairs to rank. To address these issues, we introduce Context-based Graph Neural Networks (ContextGNNs), a novel deep learning architecture for link prediction in recommendation systems. The method employs a pair-wise representation technique for familiar items situated within a user's local subgraph, while leveraging two-tower representations to facilitate the recommendation of exploratory items. A final network then predicts how to fuse both pair-wise and two-tower recommendations into a single ranking of items. We demonstrate that ContextGNN is able to adapt to different data characteristics and outperforms existing methods, both traditional and GNN-based, on a diverse set of practical recommendation tasks, improving performance by 20% on average. |
14 pages, 1 figure, 5 tables |
Graph-Enhanced EEG Foundation Model | 2024-11-29 | Electroencephalography (EEG) signals provide critical insights for applications in disease diagnosis and healthcare. However, the scarcity of labeled EEG data poses a significant challenge. Foundation models offer a promising solution by leveraging large-scale unlabeled data through pre-training, enabling strong performance across diverse tasks. While both temporal dynamics and inter-channel relationships are vital for understanding EEG signals, existing EEG foundation models primarily focus on the former, overlooking the latter. To address this limitation, we propose a novel foundation model for EEG that integrates both temporal and inter-channel information. Our architecture combines Graph Neural Networks (GNNs), which effectively capture relational structures, with a masked autoencoder to enable efficient pre-training. We evaluated our approach using three downstream tasks and experimented with various GNN architectures. The results demonstrate that our proposed model, particularly when employing the GCN architecture with optimized configurations, consistently outperformed baseline methods across all tasks. These findings suggest that our model serves as a robust foundation model for EEG analysis. |
|
Gradient Inversion Attack on Graph Neural Networks | 2024-11-29 | Graph federated learning is of essential importance for training over large graph datasets while protecting data privacy, where each client stores a subset of local graph data, while the server collects the local gradients and broadcasts only the aggregated gradients. Recent studies reveal that a malicious attacker can steal private image data from gradient exchanging of neural networks during federated learning. However, none of the existing works have studied the vulnerability of graph data and graph neural networks under such attack. To answer this question, the present paper studies the problem of whether private data can be recovered from leaked gradients in both node classification and graph classification tasks and proposes a novel attack named Graph Leakage from Gradients (GLG). Two widely-used GNN frameworks are analyzed, namely GCN and GraphSAGE. The effects of different model settings on recovery are extensively discussed. Through theoretical analysis and empirical validation, it is shown that parts of the graph data can be leaked from the gradients. |
|
Scale Invariance of Graph Neural Networks | 2024-11-28 | We address two fundamental challenges in Graph Neural Networks (GNNs): (1) the lack of theoretical support for invariance learning, a critical property in image processing, and (2) the absence of a unified model capable of excelling on both homophilic and heterophilic graph datasets. To tackle these issues, we establish and prove scale invariance in graphs, extending this key property to graph learning, and validate it through experiments on real-world datasets. Leveraging directed multi-scaled graphs and an adaptive self-loop strategy, we propose ScaleNet, a unified network architecture that achieves state-of-the-art performance across four homophilic and two heterophilic benchmark datasets. Furthermore, we show that through graph transformation based on scale invariance, uniform weights can replace computationally expensive edge weights in digraph inception networks while maintaining or improving performance. For another popular GNN approach to digraphs, we demonstrate the equivalence between Hermitian Laplacian methods and GraphSAGE with incidence normalization. ScaleNet bridges the gap between homophilic and heterophilic graph learning, offering both theoretical insights into scale invariance and practical advancements in unified graph learning. Our implementation is publicly available at https://github.com/Qin87/ScaleNet/tree/Aug23. |
13 pages. arXiv admin note: substantial text overlap with arXiv:2411.08758 |
Multi-modal graph neural networks for localized off-grid weather forecasting | 2024-11-28 | Urgent applications like wildfire management and renewable energy generation require precise, localized weather forecasts near the Earth's surface. However, weather forecast products from machine learning or numerical weather models are currently generated on a global regular grid, on which a naive interpolation cannot accurately reflect fine-grained weather patterns close to the ground. In this work, we train a heterogeneous graph neural network (GNN) end-to-end to downscale gridded forecasts to off-grid locations of interest. This multi-modal GNN takes advantage of local historical weather observations (e.g., wind, temperature) to correct the gridded weather forecast at different lead times towards locally accurate forecasts. Each data modality is modeled as a different type of node in the graph. Using message passing, the node at the prediction location aggregates information from its heterogeneous neighbor nodes. Experiments using weather stations across the Northeastern United States show that our model outperforms a range of data-driven and non-data-driven off-grid forecasting methods. Our approach demonstrates how the gap between global large-scale weather models and locally accurate predictions can be bridged to inform localized decision-making. |
|
Balancing Molecular Information and Empirical Data in the Prediction of Physico-Chemical Properties | 2024-11-28 | Predicting the physico-chemical properties of pure substances and mixtures is a central task in thermodynamics. Established prediction methods range from fully physics-based ab-initio calculations, which are only feasible for very simple systems, over descriptor-based methods that use some information on the molecules to be modeled together with fitted model parameters (e.g., quantitative-structure-property relationship methods or classical group contribution methods), to representation-learning methods, which may, in extreme cases, completely ignore molecular descriptors and extrapolate only from existing data on the property to be modeled (e.g., matrix completion methods). In this work, we propose a general method for combining molecular descriptors with representation learning using the so-called expectation maximization algorithm from the probabilistic machine learning literature, which uses uncertainty estimates to trade off between the two approaches. The proposed hybrid model exploits chemical structure information using graph neural networks, but it automatically detects cases where structure-based predictions are unreliable, in which case it corrects them by representation-learning based predictions that can better specialize to unusual cases. The effectiveness of the proposed method is demonstrated using the prediction of activity coefficients in binary mixtures as an example. The results are compelling, as the method significantly improves predictive accuracy over the current state of the art, showcasing its potential to advance the prediction of physico-chemical properties in general. |
14 pages, including 11 pages of main text and 3 pages of appendix, added analysis of improvements in predictive accuracy, added Figure 5, Figure 6, Figure 7 |
GRU-PFG: Extract Inter-Stock Correlation from Stock Factors with Graph Neural Network | 2024-11-28 | The complexity of stocks and industries presents challenges for stock prediction. Currently, stock prediction models can be divided into two categories. One category, represented by GRU and ALSTM, relies solely on stock factors for prediction, with limited effectiveness. The other category, represented by HIST and TRA, incorporates not only stock factors but also industry information, industry financial reports, public sentiment, and other inputs for prediction. The second category of models can capture correlations between stocks by introducing additional information, but the extra data is difficult to standardize and generalize. Considering the current state and limitations of these two types of models, this paper proposes the GRU-PFG (Project Factors into Graph) model. This model only takes stock factors as input and extracts inter-stock correlations using graph neural networks. It achieves prediction results that not only outperform the other models that rely solely on stock factors, but are also comparable to those of the second-category models. The experimental results show that on the CSI300 dataset, the IC of GRU-PFG is 0.134, outperforming HIST's 0.131 and significantly surpassing GRU and Transformer, achieving results better than the second-category models. Moreover, as a model that relies solely on stock factors, it has greater potential for generalization. |
17 pages |
NeuroLifting: Neural Inference on Markov Random Fields at Scale | 2024-11-28 | Inference in large-scale Markov Random Fields (MRFs) is a critical yet challenging task, traditionally approached through approximate methods like belief propagation and mean field, or exact methods such as the Toulbar2 solver. These strategies often fail to strike an optimal balance between efficiency and solution quality, particularly as the problem scale increases. This paper introduces NeuroLifting, a novel technique that leverages Graph Neural Networks (GNNs) to reparameterize decision variables in MRFs, facilitating the use of standard gradient descent optimization. By extending traditional lifting techniques into a non-parametric neural network framework, NeuroLifting benefits from the smooth loss landscape of neural networks, enabling efficient and parallelizable optimization. Empirical results demonstrate that, on moderate scales, NeuroLifting performs very close to the exact solver Toulbar2 in terms of solution quality, significantly surpassing existing approximate methods. Notably, on large-scale MRFs, NeuroLifting delivers superior solution quality against all baselines, as well as exhibiting linear computational complexity growth. This work presents a significant advancement in MRF inference, offering a scalable and effective solution for large-scale problems. |
|
Towards Data-centric Machine Learning on Directed Graphs: a Survey | 2024-11-28 | In recent years, Graph Neural Networks (GNNs) have made significant advances in processing structured data. However, most of them primarily adopted a model-centric approach, which simplifies graphs by converting them into undirected formats and emphasizes model designs. This approach is inherently constrained in real-world applications due to inevitable information loss in simple undirected graphs and data-driven model optimization dilemmas associated with exceeding the upper bounds of representational capacity. As a result, there has been a shift toward data-centric methods that prioritize improving graph quality and representation. Specifically, various types of graphs can be derived from naturally structured data, including heterogeneous graphs, hypergraphs, and directed graphs. Among these, directed graphs offer distinct advantages in topological systems by modeling causal relationships, and directed GNNs have been extensively studied in recent years. However, a comprehensive survey of this emerging topic is still lacking. Therefore, we aim to provide a comprehensive review of directed graph learning, with a particular focus on a data-centric perspective. Specifically, we first introduce a novel taxonomy for existing studies. Subsequently, we re-examine these methods from the data-centric perspective, with an emphasis on understanding and improving data representation. It demonstrates that a deep understanding of directed graphs and their quality plays a crucial role in model performance. Additionally, we explore the diverse applications of directed GNNs across 10+ domains, highlighting their broad applicability. Finally, we identify key opportunities and challenges within the field, offering insights that can guide future research and development in directed graph learning. |
In Progress |
Federated Continual Graph Learning | 2024-11-28 | In the era of big data, managing evolving graph data poses substantial challenges due to storage costs and privacy issues. Training graph neural networks (GNNs) on such evolving data usually causes catastrophic forgetting, impairing performance on earlier tasks. Despite existing continual graph learning (CGL) methods mitigating this to some extent, they predominantly operate in centralized architectures and overlook the potential of distributed graph databases to harness collective intelligence for enhanced performance optimization. To address these challenges, we present a pioneering study on Federated Continual Graph Learning (FCGL), which adapts GNNs to multiple evolving graphs within decentralized settings while adhering to storage and privacy constraints. Our work begins with a comprehensive empirical analysis of FCGL, assessing its data characteristics, feasibility, and effectiveness, and reveals two principal challenges: local graph forgetting (LGF), where local GNNs forget prior knowledge when adapting to new tasks, and global expertise conflict (GEC), where the global GNN exhibits sub-optimal performance in both adapting to new tasks and retaining old ones, arising from inconsistent client expertise during server-side parameter aggregation. To tackle these, we propose the POWER framework, which mitigates LGF by preserving and replaying experience nodes with maximum local-global coverage at each client and addresses GEC by using a pseudo prototype reconstruction strategy and trajectory-aware knowledge transfer at the central server. Extensive evaluations across multiple graph datasets demonstrate POWER's superior performance over straightforward federated extensions of the centralized CGL algorithms and vision-focused federated continual learning algorithms. Our code is available at https://github.com/zyl24/FCGL_POWER. |
Under Review |
FedRGL: Robust Federated Graph Learning for Label Noise | 2024-11-28 | Federated Graph Learning (FGL) is a distributed machine learning paradigm based on graph neural networks, enabling secure and collaborative modeling of local graph data among clients. However, label noise can degrade the global model's generalization performance. Existing federated label noise learning methods, primarily focused on computer vision, often yield suboptimal results when applied to FGL. To address this, we propose a robust federated graph learning method with label noise, termed FedRGL. FedRGL introduces dual-perspective consistency noise node filtering, leveraging both the global model and subgraph structure under class-aware dynamic thresholds. To enhance client-side training, we incorporate graph contrastive learning, which improves encoder robustness and assigns high-confidence pseudo-labels to noisy nodes. Additionally, we measure model quality via predictive entropy of unlabeled nodes, enabling adaptive robust aggregation of the global model. Comparative experiments on multiple real-world graph datasets show that FedRGL outperforms 12 baseline methods across various noise rates, types, and numbers of clients. |
|
Perturbation Ontology based Graph Attention Networks | 2024-11-27 | In recent years, graph representation learning has undergone a paradigm shift, driven by the emergence and proliferation of graph neural networks (GNNs) and their heterogeneous counterparts. Heterogeneous GNNs have shown remarkable success in extracting low-dimensional embeddings from complex graphs that encompass diverse entity types and relationships. While meta-path-based techniques have long been recognized for their ability to capture semantic affinities among nodes, their dependence on manual specification poses a significant limitation. In contrast, matrix-focused methods accelerate processing by utilizing structural cues but often overlook contextual richness. In this paper, we challenge the current paradigm by introducing ontology as a fundamental semantic primitive within complex graphs. Our goal is to integrate the strengths of both matrix-centric and meta-path-based approaches into a unified framework. We propose Perturbation Ontology-based Graph Attention Networks (POGAT), a novel methodology that combines ontology subgraphs with an advanced self-supervised learning paradigm to achieve a deep contextual understanding. The core innovation of POGAT lies in our enhanced homogeneous perturbing scheme designed to generate rigorous negative samples, encouraging the model to explore minimal contextual features more thoroughly. Through extensive empirical evaluations, we demonstrate that POGAT significantly outperforms state-of-the-art baselines, achieving a groundbreaking improvement of up to 10.78% in F1-score for the critical task of link prediction and 12.01% in Micro-F1 for the critical task of node classification. |
|
Multiscale Hodge Scattering Networks for Data Analysis | 2024-11-27 | We propose new scattering networks for signals measured on simplicial complexes, which we call Multiscale Hodge Scattering Networks (MHSNs). Our construction is based on multiscale basis dictionaries on simplicial complexes, i.e., the |
20 Pages, Comments Welcome |
Learning optimal objective values for MILP | 2024-11-27 | Modern Mixed Integer Linear Programming (MILP) solvers use the Branch-and-Bound algorithm together with a plethora of auxiliary components that speed up the search. In recent years, there has been an explosive development in the use of machine learning for enhancing and supporting these algorithmic components. Within this line, we propose a methodology for predicting the optimal objective value, or, equivalently, predicting if the current incumbent is optimal. For this task, we introduce a predictor based on a graph neural network (GNN) architecture, together with a set of dynamic features. Experimental results on diverse benchmarks demonstrate the efficacy of our approach, achieving high accuracy in the prediction task and outperforming existing methods. These findings suggest new opportunities for integrating ML-driven predictions into MILP solvers, enabling smarter decision-making and improved performance. |
|
CaT-GNN: Enhancing Credit Card Fraud Detection via Causal Temporal Graph Neural Networks | 2024-11-27 | Credit card fraud poses a significant threat to the economy. While Graph Neural Network (GNN)-based fraud detection methods perform well, they often overlook the causal effect of a node's local structure on predictions. This paper introduces a novel method for credit card fraud detection, the Causal Temporal Graph Neural Network (CaT-GNN), which leverages causal invariant learning to reveal inherent correlations within transaction data. By decomposing the problem into discovery and intervention phases, CaT-GNN identifies causal nodes within the transaction graph and applies a causal mixup strategy to enhance the model's robustness and interpretability. CaT-GNN consists of two key components: Causal-Inspector and Causal-Intervener. The Causal-Inspector utilizes attention weights in the temporal attention mechanism to identify causal and environment nodes without introducing additional parameters. Subsequently, the Causal-Intervener performs a causal mixup enhancement on environment nodes based on the set of nodes. Evaluated on three datasets, including a private financial dataset and two public datasets, CaT-GNN demonstrates superior performance over existing state-of-the-art methods. Our findings highlight the potential of integrating causal reasoning with graph neural networks to improve fraud detection capabilities in financial transactions. |
|
G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks | 2024-11-27 | Recent advancements in large language model (LLM)-based agents have demonstrated that collective intelligence can significantly surpass the capabilities of individual agents, primarily due to well-crafted inter-agent communication topologies. Despite the diverse and high-performing designs available, practitioners often face confusion when selecting the most effective pipeline for their specific task: "Which topology is the best choice for my task, avoiding unnecessary communication token overhead while ensuring high-quality solution?" In response to this dilemma, we introduce G-Designer, an adaptive, efficient, and robust solution for multi-agent deployment, which dynamically designs task-aware, customized communication topologies. Specifically, G-Designer models the multi-agent system as a multi-agent network, leveraging a variational graph auto-encoder to encode both the nodes (agents) and a task-specific virtual node, and decodes a task-adaptive and high-performing communication topology. Extensive experiments on six benchmarks showcase that G-Designer is: (1) high-performing, achieving superior results on MMLU with accuracy at |
|
Enhancing Signed Graph Neural Networks through Curriculum-Based Training | 2024-11-27 | Signed graphs are powerful models for representing complex relations with both positive and negative connections. Recently, Signed Graph Neural Networks (SGNNs) have emerged as potent tools for analyzing such graphs. To our knowledge, no prior research has been conducted on devising a training plan specifically for SGNNs. The prevailing training approach feeds samples (edges) to models in a random order, resulting in equal contributions from each sample during the training process, but fails to account for varying learning difficulties based on the graph's structure. We contend that SGNNs can benefit from a curriculum that progresses from easy to difficult, similar to human learning. The main challenge is evaluating the difficulty of edges in a signed graph. We address this by theoretically analyzing the difficulty of SGNNs in learning adequate representations for edges in unbalanced cycles and propose a lightweight difficulty measurer. This forms the basis for our innovative Curriculum representation learning framework for Signed Graphs, referred to as CSG. The process involves using the measurer to assign difficulty scores to training samples, adjusting their order using a scheduler and training the SGNN model accordingly. We empirically validate our approach on six real-world signed graph datasets. Our method demonstrates remarkable results, enhancing the accuracy of popular SGNN models by up to 23.7% and showing a reduction of 8.4% in standard deviation, enhancing model stability. |
Submitted to Neural Networks |
Graph Neural Networks for Job Shop Scheduling Problems: A Survey | 2024-11-27 | Job shop scheduling problems (JSSPs) represent a critical and challenging class of combinatorial optimization problems. Recent years have witnessed a rapid increase in the application of graph neural networks (GNNs) to solve JSSPs, albeit lacking a systematic survey of the relevant literature. This paper aims to thoroughly review prevailing GNN methods for different types of JSSPs and the closely related flow-shop scheduling problems (FSPs), especially those leveraging deep reinforcement learning (DRL). We begin by presenting the graph representations of various JSSPs, followed by an introduction to the most commonly used GNN architectures. We then review current GNN-based methods for each problem type, highlighting key technical elements such as graph representations, GNN architectures, GNN tasks, and training algorithms. Finally, we summarize and analyze the advantages and limitations of GNNs in solving JSSPs and provide potential future research opportunities. We hope this survey can motivate and inspire innovative approaches for more powerful GNN-based approaches in tackling JSSPs and other scheduling problems. |
Accepted by Computers & Operations Research |
Heterophilic Graph Neural Networks Optimization with Causal Message-passing | 2024-11-27 | In this work, we discover that causal inference provides a promising approach to capture heterophilic message-passing in Graph Neural Networks (GNNs). By leveraging cause-effect analysis, we can discern heterophilic edges based on asymmetric node dependency. The learned causal structure offers more accurate relationships among nodes. To reduce the computational complexity, we introduce intervention-based causal inference in graph learning. We first simplify causal analysis on graphs by formulating it as a structural learning model and define the optimization problem within the Bayesian scheme. We then present an analysis of decomposing the optimization target into a consistency penalty and a structure modification based on cause-effect relations. We then estimate this target by conditional entropy and present insights into how conditional entropy quantifies the heterophily. Accordingly, we propose CausalMP, a causal message-passing discovery network for heterophilic graph learning, that iteratively learns the explicit causal structure of input graphs. We conduct extensive experiments in both heterophilic and homophilic graph settings. The results demonstrate that our model achieves superior link prediction performance. Training on causal structure can also enhance node representation in classification tasks across different base models. |
|
Causal and Local Correlations Based Network for Multivariate Time Series Classification | 2024-11-27 | Recently, time series classification has attracted the attention of a large number of researchers, and hundreds of methods have been proposed. However, these methods often ignore the spatial correlations among dimensions and the local correlations among features. To address this issue, the causal and local correlations based network (CaLoNet) is proposed in this study for multivariate time series classification. First, pairwise spatial correlations between dimensions are modeled using causality modeling to obtain the graph structure. Then, a relationship extraction network is used to fuse local correlations to obtain long-term dependency features. Finally, the graph structure and long-term dependency features are integrated into the graph neural network. Experiments on the UEA datasets show that CaLoNet can obtain competitive performance compared with state-of-the-art methods. |
Submitted on April 03, 2023; major revisions on March 25, 2024; minor revisions on July 9, 2024 |
Graph Neural Network for Cerebral Blood Flow Prediction With Clinical Datasets | 2024-11-27 | Accurate prediction of cerebral blood flow is essential for the diagnosis and treatment of cerebrovascular diseases. Traditional computational methods, however, often incur significant computational costs, limiting their practicality in real-time clinical applications. This paper proposes a graph neural network (GNN) to predict blood flow and pressure in previously unseen cerebral vascular network structures that were not included in training data. The GNN was developed using clinical datasets from patients with stenosis, featuring complex and abnormal vascular geometries. Additionally, the GNN model was trained on data incorporating a wide range of inflow conditions, vessel topologies, and network connectivities to enhance its generalization capability. The approach achieved Pearson's correlation coefficients of 0.727 for pressure and 0.824 for flow rate, with sufficient training data. These findings demonstrate the potential of the GNN for real-time cerebrovascular diagnostics, particularly in handling intricate and pathological vascular networks. |
4 pages, 3 figures |
Spatio-temporal Causal Learning for Streamflow Forecasting | 2024-11-26 | Streamflow plays an essential role in the sustainable planning and management of national water resources. Traditional hydrologic modeling approaches simulate streamflow by establishing connections across multiple physical processes, such as rainfall and runoff. These data, inherently connected both spatially and temporally, possess intrinsic causal relations that can be leveraged for robust and accurate forecasting. Recently, spatio-temporal graph neural networks (STGNNs) have been adopted, excelling in various domains, such as urban traffic management, weather forecasting, and pandemic control, and they also promise advances in streamflow management. However, learning causal relationships directly from vast observational data is theoretically and computationally challenging. In this study, we employ a river flow graph as prior knowledge to facilitate the learning of the causal structure and then use the learned causal graph to predict streamflow at targeted sites. The proposed model, Causal Streamflow Forecasting (CSF), is tested in a real-world study in the Brazos River basin in Texas. Our results demonstrate that our method outperforms regular spatio-temporal graph neural networks and achieves higher computational efficiency compared to traditional simulation methods. By effectively integrating river flow graphs with STGNNs, this research offers a novel approach to streamflow prediction, showcasing the potential of combining advanced neural network techniques with domain-specific knowledge for enhanced performance in hydrologic modeling. |
To be published at IEEE Big Data 2024 |
MADE: Graph Backdoor Defense with Masked Unlearning | 2024-11-26 | Graph Neural Networks (GNNs) have garnered significant attention from researchers due to their outstanding performance in handling graph-related tasks, such as social network analysis, protein design, and so on. Despite their widespread application, recent research has demonstrated that GNNs are vulnerable to backdoor attacks, implemented by injecting triggers into the training datasets. Trained on the poisoned data, GNNs will predict target labels when attaching trigger patterns to inputs. This vulnerability poses significant security risks for applications of GNNs in sensitive domains, such as drug discovery. While there has been extensive research into backdoor defenses for images, strategies to safeguard GNNs against such attacks remain underdeveloped. Furthermore, we point out that conventional backdoor defense methods designed for images cannot work well when directly implemented on graph data. In this paper, we first analyze the key difference between image backdoor and graph backdoor attacks. Then we tackle the graph defense problem by presenting a novel approach called MADE, which devises an adversarial mask generation mechanism that selectively preserves clean sub-graphs and further leverages masks on edge weights to eliminate the influence of triggers effectively. Extensive experiments across various graph classification tasks demonstrate the effectiveness of MADE in significantly reducing the attack success rate (ASR) while maintaining a high classification accuracy. |
15 pages, 10 figures |
GNN 101: Visual Learning of Graph Neural Networks in Your Web Browser | 2024-11-26 | Graph Neural Networks (GNNs) have achieved significant success across various applications. However, their complex structures and inner workings can be challenging for non-AI experts to understand. To address this issue, we present GNN 101, an educational visualization tool for interactive learning of GNNs. GNN 101 seamlessly integrates mathematical formulas with visualizations via multiple levels of abstraction, including a model overview, layer operations, and detailed animations for matrix calculations. Users can easily switch between two complementary views: a node-link view that offers an intuitive understanding of the graph data, and a matrix view that provides a space-efficient and comprehensive overview of all features and their transformations across layers. GNN 101 not only demystifies GNN computations in an engaging and intuitive way but also effectively illustrates what a GNN learns about graph nodes at each layer. To ensure broad educational access, GNN 101 is open-source and available directly in web browsers without requiring any installations. |
|
Instance-Aware Graph Prompt Learning | 2024-11-26 | Graph neural networks stand as the predominant technique for graph representation learning owing to their strong expressive power, yet the performance highly depends on the availability of high-quality labels in an end-to-end manner. Thus the pretraining and fine-tuning paradigm has been proposed to mitigate the label cost issue. Subsequently, the gap between the pretext tasks and downstream tasks has spurred the development of graph prompt learning which inserts a set of graph prompts into the original graph data with minimal parameters while preserving competitive performance. However, the current exploratory works are still limited since they all concentrate on learning fixed task-specific prompts which may not generalize well across the diverse instances that the task comprises. To tackle this challenge, we introduce Instance-Aware Graph Prompt Learning (IA-GPL) in this paper, aiming to generate distinct prompts tailored to different input instances. The process involves generating intermediate prompts for each instance using a lightweight architecture, quantizing these prompts through trainable codebook vectors, and employing the exponential moving average technique to ensure stable training. Extensive experiments conducted on multiple datasets and settings showcase the superior performance of IA-GPL compared to state-of-the-art baselines. |
|
CliquePH: Higher-Order Information for Graph Neural Networks through Persistent Homology on Clique Graphs | 2024-11-26 | Graph neural networks have become the default choice by practitioners for graph learning tasks such as graph classification and node classification. Nevertheless, popular graph neural network models still struggle to capture higher-order information, i.e., information that goes *beyond* pairwise interactions. Recent work has shown that persistent homology, a tool from topological data analysis, can enrich graph neural networks with topological information that they otherwise could not capture. Calculating such features is efficient for dimension 0 (connected components) and dimension 1 (cycles). However, when it comes to higher-order structures, it does not scale well, with a complexity of |
Published in Proceedings of the Third Learning on Graphs Conference (LoG 2024), PMLR 269 |
Orientation-Aware Graph Neural Networks for Protein Structure Representation Learning | 2024-11-26 | By folding to particular 3D structures, proteins play a key role in living beings. To learn meaningful representation from a protein structure for downstream tasks, not only the global backbone topology but the local fine-grained orientational relations between amino acids should also be considered. In this work, we propose the Orientation-Aware Graph Neural Networks (OAGNNs) to better sense the geometric characteristics in protein structure (e.g. inner-residue torsion angles, inter-residue orientations). Extending a single weight from a scalar to a 3D vector, we construct a rich set of geometric-meaningful operations to process both the classical and SO(3) representations of a given structure. To plug our designed perceptron unit into existing Graph Neural Networks, we further introduce an equivariant message passing paradigm, showing superior versatility in maintaining SO(3)-equivariance at the global scale. Experiments have shown that our OAGNNs have a remarkable ability to sense geometric orientational features compared to classical networks. OAGNNs have also achieved state-of-the-art performance on various computational biology applications related to protein 3D structures. |
|
A Graph Neural Network deep-dive into successful counterattacks | 2024-11-26 | A counterattack in soccer is a high speed, high intensity direct attack that can occur when a team transitions from a defensive state to an attacking state after regaining possession of the ball. The aim is to create a goal-scoring opportunity by covering a lot of ground with minimal passes before the opposing team can recover their defensive shape. The purpose of this research is to build gender-specific Graph Neural Networks to model the likelihood of a counterattack being successful and uncover what factors make them successful in professional soccer. These models are trained on a total of 20863 frames of synchronized on-ball event and spatiotemporal (broadcast) tracking data. This dataset is derived from 632 games of MLS (2022), NWSL (2022) and international soccer (2020-2022). With this data we demonstrate that gender-specific Graph Neural Networks outperform architecturally identical gender-ambiguous models in predicting the successful outcome of counterattacks. We show, using Permutation Feature Importance, that byline to byline speed, angle to the goal, angle to the ball and sideline to sideline speed are the node features with the highest impact on model performance. Additionally, we offer some illustrative examples on how to navigate the infinite solution search space to aid in identifying improvements for player decision making. This research is accompanied by an open-source repository containing all data and code, and it is also accompanied by an open-source Python package which simplifies converting spatiotemporal data into graphs. This package also facilitates testing, validation, training and prediction with this data. This should allow the reader to replicate and improve upon our research more easily. |
11 pages, 11 figures, first submitted (and accepted) at MIT Sloan Sports Analytics Conference 2023 |
Rewiring Techniques to Mitigate Oversquashing and Oversmoothing in GNNs: A Survey | 2024-11-26 | Graph Neural Networks (GNNs) are powerful tools for learning from graph-structured data, but their effectiveness is often constrained by two critical challenges: oversquashing, where the excessive compression of information from distant nodes results in significant information loss, and oversmoothing, where repeated message-passing iterations homogenize node representations, obscuring meaningful distinctions. These issues, intrinsically linked to the underlying graph structure, hinder information flow and constrain the expressiveness of GNNs. In this survey, we examine graph rewiring techniques, a class of methods designed to address these structural bottlenecks by modifying graph topology to enhance information diffusion. We provide a comprehensive review of state-of-the-art rewiring approaches, delving into their theoretical underpinnings, practical implementations, and performance trade-offs. |
|
Epidemiology-informed Graph Neural Network for Heterogeneity-aware Epidemic Forecasting | 2024-11-26 | Among various spatio-temporal prediction tasks, epidemic forecasting plays a critical role in public health management. Recent studies have demonstrated the strong potential of spatio-temporal graph neural networks (STGNNs) in extracting heterogeneous spatio-temporal patterns for epidemic forecasting. However, most of these methods bear an over-simplified assumption that two locations (e.g., cities) with similar observed features in previous time steps will develop similar infection numbers in the future. In fact, for any epidemic disease, there exists strong heterogeneity of its intrinsic evolution mechanisms across geolocation and time, which can eventually lead to diverged infection numbers in two "similar" locations. However, such mechanistic heterogeneity is non-trivial to be captured due to the existence of numerous influencing factors like medical resource accessibility, virus mutations, mobility patterns, etc., most of which are spatio-temporal yet unreachable or even unobservable. To address this challenge, we propose a Heterogeneous Epidemic-Aware Transmission Graph Neural Network (HeatGNN), a novel epidemic forecasting framework. By binding the epidemiology mechanistic model into a GNN, HeatGNN learns epidemiology-informed location embeddings of different locations that reflect their own transmission mechanisms over time. With the time-varying mechanistic affinity graphs computed with the epidemiology-informed location embeddings, a heterogeneous transmission graph network is designed to encode the mechanistic heterogeneity among locations, providing additional predictive signals to facilitate accurate forecasting. Experiments on three benchmark datasets have revealed that HeatGNN outperforms various strong baselines. Moreover, our efficiency analysis verifies the real-world practicality of HeatGNN on datasets of different sizes. |
14 pages, 6 figures, 3 tables |
Knowledge-aware Evolutionary Graph Neural Architecture Search | 2024-11-26 | Graph neural architecture search (GNAS) can customize high-performance graph neural network architectures for specific graph tasks or datasets. However, existing GNAS methods begin searching for architectures from a zero-knowledge state, ignoring the prior knowledge that may improve the search efficiency. The available knowledge base (e.g. NAS-Bench-Graph) contains many rich architectures and their multiple performance metrics, such as the accuracy (#Acc) and number of parameters (#Params). This study proposes exploiting such prior knowledge to accelerate the multi-objective evolutionary search on a new graph dataset, named knowledge-aware evolutionary GNAS (KEGNAS). KEGNAS employs the knowledge base to train a knowledge model and a deep multi-output Gaussian process (DMOGP) in one go, which generates and evaluates transfer architectures in only a few GPU seconds. The knowledge model first establishes a dataset-to-architecture mapping, which can quickly generate candidate transfer architectures for a new dataset. Subsequently, the DMOGP with architecture and dataset encodings is designed to predict multiple performance metrics for candidate transfer architectures on the new dataset. According to the predicted metrics, non-dominated candidate transfer architectures are selected to warm-start the multi-objective evolutionary algorithm for optimizing the #Acc and #Params on a new dataset. Empirical studies on NAS-Bench-Graph and five real-world datasets show that KEGNAS swiftly generates top-performance architectures, achieving 4.27% higher accuracy than advanced evolutionary baselines and 11.54% higher accuracy than advanced differentiable baselines. In addition, ablation studies demonstrate that the use of prior knowledge significantly improves the search performance. |
This work has been accepted by Knowledge-Based Systems |
MetaGraphLoc: A Graph-based Meta-learning Scheme for Indoor Localization via Sensor Fusion | 2024-11-26 | Accurate indoor localization remains challenging due to variations in wireless signal environments and limited data availability. This paper introduces MetaGraphLoc, a novel system leveraging sensor fusion, graph neural networks (GNNs), and meta-learning to overcome these limitations. MetaGraphLoc integrates received signal strength indicator measurements with inertial measurement unit data to enhance localization accuracy. Our proposed GNN architecture, featuring dynamic edge construction (DEC), captures the spatial relationships between access points and underlying data patterns. MetaGraphLoc employs a meta-learning framework to adapt the GNN model to new environments with minimal data collection, significantly reducing calibration efforts. Extensive evaluations demonstrate the effectiveness of MetaGraphLoc. Data fusion reduces localization error by 15.92%, underscoring its importance. The GNN with DEC outperforms traditional deep neural networks by up to 30.89%, considering accuracy. Furthermore, the meta-learning approach enables efficient adaptation to new environments, minimizing data collection requirements. These advancements position MetaGraphLoc as a promising solution for indoor localization, paving the way for improved navigation and location-based services in the ever-evolving Internet of Things networks. |
|
GrokFormer: Graph Fourier Kolmogorov-Arnold Transformers | 2024-11-26 | Graph Transformers (GTs) have demonstrated remarkable performance in incorporating various graph structure information, e.g., long-range structural dependency, into graph representation learning. However, self-attention -- the core module of GTs -- preserves only low-frequency signals on graph features, retaining only homophilic patterns that capture similar features among the connected nodes. Consequently, it has insufficient capacity in modeling complex node label patterns, such as the opposite of homophilic patterns -- heterophilic patterns. Some improved GTs deal with the problem by learning polynomial filters or performing self-attention over the first-order graph spectrum. However, these GTs either ignore rich information contained in the whole spectrum or neglect higher-order spectrum information, resulting in limited flexibility and frequency response in their spectral filters. To tackle these challenges, we propose a novel GT network, namely Graph Fourier Kolmogorov-Arnold Transformers (GrokFormer), to go beyond the self-attention in GTs. GrokFormer leverages learnable activation functions in order-$K$ graph spectrum through Fourier series modeling to i) learn eigenvalue-targeted filter functions producing learnable base that can capture a broad range of frequency signals flexibly, and ii) extract first- and higher-order graph spectral information adaptively. In doing so, GrokFormer can effectively capture intricate patterns hidden across different orders and levels of frequency signals, learning expressive, order-and-frequency-adaptive graph representations. Comprehensive experiments conducted on 10 node classification datasets across various domains, scales, and levels of graph heterophily, as well as 5 graph classification datasets, demonstrate that GrokFormer outperforms state-of-the-art GTs and other advanced graph neural networks. |
13 pages, 6 figures, 7 tables |
Generalization, Expressivity, and Universality of Graph Neural Networks on Attributed Graphs | 2024-11-26 | We analyze the universality and generalization of graph neural networks (GNNs) on attributed graphs, i.e., with node attributes. To this end, we propose pseudometrics over the space of all attributed graphs that describe the fine-grained expressivity of GNNs. Namely, GNNs are both Lipschitz continuous with respect to our pseudometrics and can separate attributed graphs that are distant in the metric. Moreover, we prove that the space of all attributed graphs is relatively compact with respect to our metrics. Based on these properties, we prove a universal approximation theorem for GNNs and generalization bounds for GNNs on any data distribution of attributed graphs. The proposed metrics compute the similarity between the structures of attributed graphs via a hierarchical optimal transport between computation trees. Our work extends and unites previous approaches which either derived theory only for graphs with no attributes, derived compact metrics under which GNNs are continuous but without separation power, or derived metrics under which GNNs are continuous and separate points but the space of graphs is not relatively compact, which prevents universal approximation and generalization analysis. |
|
GraphSubDetector: Time Series Subsequence Anomaly Detection via Density-Aware Adaptive Graph Neural Network | 2024-11-26 | Time series subsequence anomaly detection is an important task in a large variety of real-world applications ranging from health monitoring to AIOps, and is challenging due to the following reasons: 1) how to effectively learn complex dynamics and dependencies in time series; 2) diverse and complicated anomalous subsequences as well as the inherent variance and noise of normal patterns; 3) how to determine the proper subsequence length for effective detection, which is a required parameter for many existing algorithms. In this paper, we present a novel approach to subsequence anomaly detection, namely GraphSubDetector. First, it adaptively learns the appropriate subsequence length with a length selection mechanism that highlights the characteristics of both normal and anomalous patterns. Second, we propose a density-aware adaptive graph neural network (DAGNN), which can generate further robust representations against variance of normal data for anomaly detection by message passing between subsequences. The experimental results demonstrate the effectiveness of the proposed algorithm, which achieves superior performance on multiple time series anomaly benchmark datasets compared to state-of-the-art algorithms. |
|
Depth-PC: A Visual Servo Framework Integrated with Cross-Modality Fusion for Sim2Real Transfer | 2024-11-26 | Visual servo techniques guide robotic motion using visual information to accomplish manipulation tasks, requiring high precision and robustness against noise. Traditional methods often require prior knowledge and are susceptible to external disturbances. Learning-driven alternatives, while promising, frequently struggle with the scarcity of training data and fall short in generalization. To address these challenges, we propose a novel visual servo framework Depth-PC that leverages simulation training and exploits semantic and geometric information of keypoints from images, enabling zero-shot transfer to real-world servo tasks. Our framework focuses on the servo controller which intertwines keypoint feature queries and relative depth information. The fused features from these two modalities are then processed by a Graph Neural Network to establish geometric and semantic correspondence between keypoints and update the robot state. Through simulation and real-world experiments, our approach demonstrates superior convergence basin and accuracy compared to state-of-the-art methods, fulfilling the requirements for robotic servo tasks while enabling zero-shot application to real-world scenarios. In addition to the enhancements achieved with our proposed framework, we have also substantiated the efficacy of cross-modality feature fusion within the realm of servo tasks. |
|
ScaleNet: Scale Invariance Learning in Directed Graphs | 2024-11-26 | Graph Neural Networks (GNNs) have advanced relational data analysis but lack invariance learning techniques common in image classification. In node classification with GNNs, it is actually the ego-graph of the center node that is classified. This research extends the scale invariance concept to node classification by drawing an analogy to image processing: just as scale invariance is used in image classification to capture multi-scale features, we propose the concept of |
Scale invariance in node classification is demonstrated and applied in graph transformation to develop ScaleNet, which achieves state-of-the-art performance on both homophilic and heterophilic directed graphs |
X-MeshGraphNet: Scalable Multi-Scale Graph Neural Networks for Physics Simulation | 2024-11-26 | Graph Neural Networks (GNNs) have gained significant traction for simulating complex physical systems, with models like MeshGraphNet demonstrating strong performance on unstructured simulation meshes. However, these models face several limitations, including scalability issues, requirement for meshing at inference, and challenges in handling long-range interactions. In this work, we introduce X-MeshGraphNet, a scalable, multi-scale extension of MeshGraphNet designed to address these challenges. X-MeshGraphNet overcomes the scalability bottleneck by partitioning large graphs and incorporating halo regions that enable seamless message passing across partitions. This, combined with gradient aggregation, ensures that training across partitions is equivalent to processing the entire graph at once. To remove the dependency on simulation meshes, X-MeshGraphNet constructs custom graphs directly from CAD files by generating uniform point clouds on the surface or volume of the object and connecting k-nearest neighbors. Additionally, our model builds multi-scale graphs by iteratively combining coarse and fine-resolution point clouds, where each level refines the previous, allowing for efficient long-range interactions. Our experiments demonstrate that X-MeshGraphNet maintains the predictive accuracy of full-graph GNNs while significantly improving scalability and flexibility. This approach eliminates the need for time-consuming mesh generation at inference, offering a practical solution for real-time simulation across a wide range of applications. The code for reproducing the results presented in this paper is available through NVIDIA Modulus: github.com/NVIDIA/modulus/tree/main/examples/cfd/xaeronet. |
|
Contrastive Graph Condensation: Advancing Data Versatility through Self-Supervised Learning | 2024-11-26 | With the increasing computation of training graph neural networks (GNNs) on large-scale graphs, graph condensation (GC) has emerged as a promising solution to synthesize a compact, substitute graph of the large-scale original graph for efficient GNN training. However, existing GC methods predominantly employ classification as the surrogate task for optimization, thus excessively relying on node labels and constraining their utility in label-sparsity scenarios. More critically, this surrogate task tends to overfit class-specific information within the condensed graph, consequently restricting the generalization capabilities of GC for other downstream tasks. To address these challenges, we introduce Contrastive Graph Condensation (CTGC), which adopts a self-supervised surrogate task to extract critical, causal information from the original graph and enhance the cross-task generalizability of the condensed graph. Specifically, CTGC employs a dual-branch framework to disentangle the generation of the node attributes and graph structures, where a dedicated structural branch is designed to explicitly encode geometric information through nodes' positional embeddings. By implementing an alternating optimization scheme with contrastive loss terms, CTGC promotes the mutual enhancement of both branches and facilitates high-quality graph generation through the model inversion technique. Extensive experiments demonstrate that CTGC excels in handling various downstream tasks with a limited number of labels, consistently outperforming state-of-the-art GC methods. |
|
Limeade: Let integer molecular encoding aid | 2024-11-25 | Mixed-integer programming (MIP) is a well-established framework for computer-aided molecular design (CAMD). By precisely encoding the molecular space and score functions, e.g., a graph neural network, the molecular design problem is represented and solved as an optimization problem, the solution of which corresponds to a molecule with optimal score. However, both the extremely large search space and complicated scoring process limit the use of MIP-based CAMD to specific and tiny problems. Moreover, an optimal molecule may not be meaningful in practice if scores are imperfect. Instead of pursuing optimality, this paper exploits the ability of MIP in molecular generation and proposes Limeade as an end-to-end tool from real-world needs to feasible molecules. Beyond the basic constraints for structural feasibility, Limeade supports inclusion and exclusion of SMARTS patterns, automating the process of interpreting and formulating chemical requirements to mathematical constraints. |
32 pages, 2 figures |
TEG-DB: A Comprehensive Dataset and Benchmark of Textual-Edge Graphs | 2024-11-25 | Text-Attributed Graphs (TAGs) augment graph structures with natural language descriptions, facilitating detailed depictions of data and their interconnections across various real-world settings. However, existing TAG datasets predominantly feature textual information only at the nodes, with edges typically represented by mere binary or categorical attributes. This lack of rich textual edge annotations significantly limits the exploration of contextual relationships between entities, hindering deeper insights into graph-structured data. To address this gap, we introduce Textual-Edge Graphs Datasets and Benchmark (TEG-DB), a comprehensive and diverse collection of benchmark textual-edge datasets featuring rich textual descriptions on nodes and edges. The TEG-DB datasets are large-scale and encompass a wide range of domains, from citation networks to social networks. In addition, we conduct extensive benchmark experiments on TEG-DB to assess the extent to which current techniques, including pre-trained language models, graph neural networks, and their combinations, can utilize textual node and edge information. Our goal is to elicit advancements in textual-edge graph research, specifically in developing methodologies that exploit rich textual node and edge descriptions to enhance graph analysis and provide deeper insights into complex real-world networks. The entire TEG-DB project is publicly available as an open-source repository on GitHub at https://github.com/Zhuofeng-Li/TEG-Benchmark. |
Accepted by NeurIPS 2024 |
Graph neural networks with configuration cross-attention for tensor compilers | 2024-11-25 | With the recent popularity of neural networks comes the need for efficient serving of inference workloads. A neural network inference workload can be represented as a computational graph with nodes as operators transforming multidimensional tensors. The tensors can be transposed and/or tiled in a combinatorially large number of ways, some configurations leading to accelerated inference. We propose TGraph, a neural graph architecture that allows screening for fast configurations of the target computational graph, thus representing an artificial intelligence (AI) tensor compiler in contrast to the traditional heuristics-based compilers. The proposed solution improves mean Kendall's |
|
Graph Neural Networks-based Parameter Design towards Large-Scale Superconducting Quantum Circuits for Crosstalk Mitigation | 2024-11-25 | To demonstrate supremacy of quantum computing, increasingly large-scale superconducting quantum computing chips are being designed and fabricated, sparking the demand for electronic design automation in pursuit of better efficiency and effectiveness. However, the complexity of simulating quantum systems poses a significant challenge to computer-aided design of quantum chips. Harnessing the scalability of graph neural networks (GNNs), we here propose a parameter designing algorithm for large-scale superconducting quantum circuits. The algorithm depends on the so-called 'three-stair scaling' mechanism, which comprises two neural-network models: an evaluator supervisedly trained on small-scale circuits for applying to medium-scale circuits, and a designer unsupervisedly trained on medium-scale circuits for applying to large-scale ones. We demonstrate our algorithm in mitigating quantum crosstalk errors, which are commonly present and closely related to the graph structures and parameter assignments of superconducting quantum circuits. Parameters for both single- and two-qubit gates are considered simultaneously. Numerical results indicate that the well-trained designer achieves notable advantages not only in efficiency but also in effectiveness, especially for large-scale circuits. For example, in superconducting quantum circuits consisting of around 870 qubits, the trained designer requires only 27 seconds to complete the frequency designing task which necessitates 90 minutes for the traditional Snake algorithm. More importantly, the crosstalk errors using our algorithm are only 51% of those produced by the Snake algorithm. Overall, this study initially demonstrates the advantages of applying graph neural networks to design parameters in quantum processors, and provides insights for systems where large-scale numerical simulations are challenging in electronic design automation. |
|
A Data-Driven Approach to Dataflow-Aware Online Scheduling for Graph Neural Network Inference | 2024-11-25 | Graph Neural Networks (GNNs) have shown significant promise in various domains, such as recommendation systems, bioinformatics, and network analysis. However, the irregularity of graph data poses unique challenges for efficient computation, leading to the development of specialized GNN accelerator architectures that surpass traditional CPU and GPU performance. Despite this, the structural diversity of input graphs results in varying performance across different GNN accelerators, depending on their dataflows. This variability in performance due to differing dataflows and graph properties remains largely unexplored, limiting the adaptability of GNN accelerators. To address this, we propose a data-driven framework for dataflow-aware latency prediction in GNN inference. Our approach involves training regressors to predict the latency of executing specific graphs on particular dataflows, using simulations on synthetic graphs. Experimental results indicate that our regressors can predict the optimal dataflow for a given graph with up to 91.28% accuracy and a Mean Absolute Percentage Error (MAPE) of 3.78%. Additionally, we introduce an online scheduling algorithm that uses these regressors to enhance scheduling decisions. Our experiments demonstrate that this algorithm achieves up to |
Accepted for ASP-DAC 2025 |
CafkNet: GNN-Empowered Forward Kinematic Modeling for Cable-Driven Parallel Robots | 2024-11-25 | Cable-driven parallel robots (CDPRs) have gained significant attention due to their promising advantages. When deploying CDPRs in practice, the kinematic modeling is a key question. Unlike serial robots, CDPRs have a simple inverse kinematics problem but a complex forward kinematics (FK) issue. So, the development of accurate and efficient FK solvers has been a prominent research focus in CDPR applications. By observing the topology within CDPRs, in this paper, we propose a graph-based representation to model CDPRs and introduce CafkNet, a fast and general FK solving method, leveraging Graph Neural Network (GNN) to learn the topological structure and yield the real FK solutions with superior generality, high accuracy, and low time cost. CafkNet is extensively tested on 3D and 2D CDPRs in different configurations, both in simulators and real scenarios. The results demonstrate its ability to learn CDPRs' internal topology and accurately solve the FK problem. Then, the zero-shot generalization from one configuration to another is validated. Also, the sim2real gap can be bridged by CafkNet using both simulation and real-world data. To the best of our knowledge, it is the first study that employs the GNN to solve the FK problem for CDPRs. |
The 2024 IEEE International Conference on Robotics and Biomimetics (IEEE ROBIO 2024). Bangkok, Thailand, December 10-14 2024. Videos and codes are available at https://sites.google.com/view/cafknet/site |
Graph Adapter of EEG Foundation Models for Parameter Efficient Fine Tuning | 2024-11-25 | In diagnosing mental diseases from electroencephalography (EEG) data, neural network models such as Transformers have been employed to capture temporal dynamics. Additionally, it is crucial to learn the spatial relationships between EEG sensors, for which Graph Neural Networks (GNNs) are commonly used. However, fine-tuning large-scale complex neural network models simultaneously to capture both temporal and spatial features increases computational costs due to the more significant number of trainable parameters. It causes the limited availability of EEG datasets for downstream tasks, making it challenging to fine-tune large models effectively. We propose EEG-GraphAdapter (EGA), a parameter-efficient fine-tuning (PEFT) approach to address these challenges. EGA is integrated into pre-trained temporal backbone models as a GNN-based module and fine-tuned itself alone while keeping the backbone model parameters frozen. This enables the acquisition of spatial representations of EEG signals for downstream tasks, significantly reducing computational overhead and data requirements. Experimental evaluations on healthcare-related downstream tasks of Major Depressive Disorder and Abnormality Detection demonstrate that our EGA improves performance by up to 16.1% in the F1-score compared with the backbone BENDR model. |
Under review |
DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs | 2024-11-25 | Attention Graph Neural Networks (AT-GNNs), such as GAT and Graph Transformer, have demonstrated superior performance compared to other GNNs. However, existing GNN systems struggle to efficiently train AT-GNNs on GPUs due to their intricate computation patterns. The execution of AT-GNN operations without kernel fusion results in heavy data movement and significant kernel launch overhead, while fixed thread scheduling in existing GNN kernel fusion strategies leads to sub-optimal performance, redundant computation and unbalanced workload. To address these challenges, we propose a dynamic kernel fusion framework, DF-GNN, for the AT-GNN family. DF-GNN introduces a dynamic bi-level thread scheduling strategy, enabling flexible adjustments to thread scheduling while retaining the benefits of shared memory within the fused kernel. DF-GNN tailors specific thread scheduling for operations in AT-GNNs and considers the performance bottleneck shift caused by the presence of super nodes. Additionally, DF-GNN is integrated with the PyTorch framework for high programmability. Evaluations across diverse GNN models and multiple datasets reveal that DF-GNN surpasses existing GNN kernel optimization works like cuGraph and dgNN, with speedups up to |
|
Federated Hypergraph Learning: Hyperedge Completion with Local Differential Privacy | 2024-11-25 | As the volume and complexity increase, graph-structured data commonly need to be split and stored across distributed systems. To enable data mining on subgraphs within these distributed systems, federated graph learning has been proposed, allowing collaborative training of Graph Neural Networks (GNNs) across clients without sharing raw node features. However, when dealing with graph structures that involve high-order relationships between nodes, known as hypergraphs, existing federated graph learning methods are less effective. In this study, we introduce FedHGL, an innovative federated hypergraph learning algorithm. FedHGL is designed to collaboratively train a comprehensive hypergraph neural network across multiple clients, facilitating mining tasks on subgraphs of a hypergraph where relationships are not merely pairwise. To address the high-order information loss between subgraphs caused by distributed storage, we introduce a pre-propagation hyperedge completion operation before the federated training process. In this pre-propagation step, cross-client feature aggregation is performed and distributed at the central server to ensure that this information can be utilized by the clients. Furthermore, by incorporating local differential privacy (LDP) mechanisms, we ensure that the original node features are not disclosed during this aggregation process. Experimental results on seven real-world datasets confirm the effectiveness of our approach and demonstrate its performance advantages over traditional federated graph learning methods. |
|
Towards a General Recipe for Combinatorial Optimization with Multi-Filter GNNs | 2024-11-24 | Graph neural networks (GNNs) have achieved great success for a variety of tasks such as node classification, graph classification, and link prediction. However, the use of GNNs (and machine learning more generally) to solve combinatorial optimization (CO) problems is much less explored. Here, we introduce GCON, a novel GNN architecture that leverages a complex filter bank and localized attention mechanisms to solve CO problems on graphs. We show how our method differentiates itself from prior GNN-based CO solvers and how it can be effectively applied to the maximum cut, minimum dominating set, and maximum clique problems in an unsupervised learning setting. GCON is competitive across all tasks and consistently outperforms other specialized GNN-based approaches, and is on par with the powerful Gurobi solver on the max-cut problem. We provide an open-source implementation of our work at https://github.com/WenkelF/copt. |
In Proceedings of the Third Learning on Graphs Conference (LoG 2024, Oral); 20 pages, 2 figures |
Bias-Free Sentiment Analysis through Semantic Blinding and Graph Neural Networks | 2024-11-24 | This paper introduces the Semantic Propagation Graph Neural Network (SProp GNN), a machine learning sentiment analysis (SA) architecture that relies exclusively on syntactic structures and word-level emotional cues to predict emotions in text. By semantically blinding the model to information about specific words, it is robust to biases such as political or gender bias that have been plaguing previous machine learning-based SA systems. The SProp GNN shows performance superior to lexicon-based alternatives such as VADER and EmoAtlas on two different prediction tasks, and across two languages. Additionally, it approaches the accuracy of transformer-based models while significantly reducing bias in emotion prediction tasks. By offering improved explainability and reducing bias, the SProp GNN bridges the methodological gap between interpretable lexicon approaches and powerful, yet often opaque, deep learning models, offering a robust tool for fair and effective emotion analysis in understanding human behavior through text. |
|
TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning | 2024-11-23 | Recently, Temporal Graph Neural Networks (TGNNs) have demonstrated state-of-the-art performance in various high-impact applications, including fraud detection and content recommendation. Despite the success of TGNNs, they are prone to the prevalent noise found in real-world dynamic graphs like time-deprecated links and skewed interaction distribution. The noise causes two critical issues that significantly compromise the accuracy of TGNNs: (1) models are supervised by inferior interactions, and (2) noisy input induces high variance in the aggregated messages. However, current TGNN denoising techniques do not consider the diverse and dynamic noise pattern of each node. In addition, they also suffer from the excessive mini-batch generation overheads caused by traversing more neighbors. We believe the remedy for fast and accurate TGNNs lies in temporal adaptive sampling. In this work, we propose TASER, the first adaptive sampling method for TGNNs optimized for accuracy, efficiency, and scalability. TASER adapts its mini-batch selection based on training dynamics and temporal neighbor selection based on the contextual, structural, and temporal properties of past interactions. To alleviate the bottleneck in mini-batch generation, TASER implements a pure GPU-based temporal neighbor finder and a dedicated GPU feature cache. We evaluate the performance of TASER using two state-of-the-art backbone TGNNs. On five popular datasets, TASER outperforms the corresponding baselines by an average of 2.3% in Mean Reciprocal Rank (MRR) while achieving an average of 5.1x speedup in training time. |
IPDPS 2024 |
Adaptive Least Mean pth Power Graph Neural Networks | 2024-11-23 | In the presence of impulsive noise and missing observations, accurate online prediction of time-varying graph signals poses a crucial challenge in numerous application domains. We propose the Adaptive Least Mean |
|
A GAN Approach for Node Embedding in Heterogeneous Graphs Using Subgraph Sampling | 2024-11-23 | Graph neural networks (GNNs) face significant challenges with class imbalance, leading to biased inference results. To address this issue in heterogeneous graphs, we propose a novel framework that combines Graph Neural Network (GNN) and Generative Adversarial Network (GAN) to enhance classification for underrepresented node classes. The framework incorporates an advanced edge generation and selection module, enabling the simultaneous creation of synthetic nodes and edges through adversarial learning. Unlike previous methods, which predominantly focus on homogeneous graphs due to the difficulty of representing heterogeneous graph structures in matrix form, this approach is specifically designed for heterogeneous data. Existing solutions often rely on pre-trained models to incorporate synthetic nodes, which can lead to optimization inconsistencies and mismatches in data representation. Our framework avoids these pitfalls by generating data that aligns closely with the inherent graph topology and attributes, ensuring a more cohesive integration. Evaluations on multiple real-world datasets demonstrate the method's superiority over baseline models, particularly in tasks focused on identifying minority node classes, with notable improvements in performance metrics such as F-score and AUC-PRC score. These findings highlight the potential of this approach for addressing critical challenges in the field. |
|
TANGNN: a Concise, Scalable and Effective Graph Neural Networks with Top-m Attention Mechanism for Graph Representation Learning | 2024-11-23 | In the field of deep learning, Graph Neural Networks (GNNs) and Graph Transformer models, with their outstanding performance and flexible architectural designs, have become leading technologies for processing structured data, especially graph data. Traditional GNNs often face challenges in capturing information from distant vertices effectively. In contrast, Graph Transformer models are particularly adept at managing long-distance node relationships. Despite these advantages, Graph Transformer models still encounter issues with computational and storage efficiency when scaled to large graph datasets. To address these challenges, we propose an innovative Graph Neural Network (GNN) architecture that integrates a Top-m attention mechanism aggregation component and a neighborhood aggregation component, effectively enhancing the model's ability to aggregate relevant information from both local and extended neighborhoods at each layer. This method not only improves computational efficiency but also enriches the node features, facilitating a deeper analysis of complex graph structures. Additionally, to assess the effectiveness of our proposed model, we have applied it to citation sentiment prediction, a novel task previously unexplored in the GNN field. Accordingly, we constructed a dedicated citation network, ArXivNet. In this dataset, we specifically annotated the sentiment polarity of the citations (positive, neutral, negative) to enable in-depth sentiment analysis. Our approach has shown superior performance across a variety of tasks including vertex classification, link prediction, sentiment prediction, graph regression, and visualization. It outperforms existing methods in terms of effectiveness, as demonstrated by experimental results on multiple datasets. |
The code and ArXivNet dataset are available at https://github.com/ejwww/TANGNN |
Enriching GNNs with Text Contextual Representations for Detecting Disinformation Campaigns on Social Media | 2024-11-23 | Disinformation on social media poses both societal and technical challenges, requiring robust detection systems. While previous studies have integrated textual information into propagation networks, they have yet to fully leverage the advancements in Transformer-based language models for high-quality contextual text representations. This work addresses this gap by incorporating Transformer-based textual features into Graph Neural Networks (GNNs) for fake news detection. We demonstrate that contextual text representations enhance GNN performance, achieving 33.8% relative improvement in Macro F1 over models without textual features and 9.3% over static text representations. We further investigate the impact of different feature sources and the effects of noisy data augmentation. We expect our methodology to open avenues for further research, and we made code publicly available. |
Work still in progress. Accepted as Extended Abstract Poster at LoG Conference 2024 |
GeoScatt-GNN: A Geometric Scattering Transform-Based Graph Neural Network Model for Ames Mutagenicity Prediction | 2024-11-22 | This paper tackles the pressing challenge of mutagenicity prediction by introducing three ground-breaking approaches. First, it showcases the superior performance of 2D scattering coefficients extracted from molecular images, compared to traditional molecular descriptors. Second, it presents a hybrid approach that combines geometric graph scattering (GGS), Graph Isomorphism Networks (GIN), and machine learning models, achieving strong results in mutagenicity prediction. Third, it introduces a novel graph neural network architecture, MOLG3-SAGE, which integrates GGS node features into a fully connected graph structure, delivering outstanding predictive accuracy. Experimental results on the ZINC dataset demonstrate significant improvements, emphasizing the effectiveness of blending 2D and geometric scattering techniques with graph neural networks. This study illustrates the potential of GNNs and GGS for mutagenicity prediction, with broad implications for drug discovery and chemical safety assessment. |
|
Lie-Equivariant Quantum Graph Neural Networks | 2024-11-22 | Discovering new phenomena at the Large Hadron Collider (LHC) involves the identification of rare signals over conventional backgrounds. Thus binary classification tasks are ubiquitous in analyses of the vast amounts of LHC data. We develop a Lie-Equivariant Quantum Graph Neural Network (Lie-EQGNN), a quantum model that is not only data efficient, but also has symmetry-preserving properties. Since Lorentz group equivariance has been shown to be beneficial for jet tagging, we build a Lorentz-equivariant quantum GNN for quark-gluon jet discrimination and show that its performance is on par with its classical state-of-the-art counterpart LorentzNet, making it a viable alternative to the conventional computing paradigm. |
10 pages, 5 figures, accepted to the Machine Learning with New Compute Paradigms (MLNCP) Workshop at NeurIPS 2024 |
Generalizable data-driven turbulence closure modeling on unstructured grids with differentiable physics | 2024-11-22 | Differentiable physical simulators are proving to be valuable tools for developing data-driven models in computational fluid dynamics (CFD). These simulators enable end-to-end training of machine learning (ML) models embedded within CFD solvers. This paradigm enables novel algorithms which combine the generalization power and low cost of physics-based simulations with the flexibility and automation of deep learning methods. In this study, we introduce a framework for embedding deep learning models within a generic finite element solver to solve the Navier-Stokes equations, specifically applying this approach to learn a subgrid scale closure with a graph neural network (GNN). We validate our method for flow over a backwards-facing step and test its performance on novel geometries, demonstrating the ability to generalize to novel geometries without sacrificing stability. Additionally, we show that our GNN-based closure model may be learned in a data-limited scenario by interpreting closure modeling as a solver-constrained optimization. Our end-to-end learning paradigm demonstrates a viable pathway for physically consistent and generalizable data-driven closure modeling across complex geometries. |
|
Financial Fraud Detection using Jump-Attentive Graph Neural Networks | 2024-11-22 | As the availability of financial services online continues to grow, the incidence of fraud has surged correspondingly. Fraudsters continually seek new and innovative ways to circumvent the detection algorithms in place. Traditionally, fraud detection relied on rule-based methods, where rules were manually created based on transaction data features. However, these techniques soon became ineffective due to their reliance on manual rule creation and their inability to detect complex data patterns. Today, a significant portion of the financial services sector employs various machine learning algorithms, such as XGBoost, Random Forest, and neural networks, to model transaction data. While these techniques have proven more efficient than rule-based methods, they still fail to capture interactions between different transactions and their interrelationships. Recently, graph-based techniques have been adopted for financial fraud detection, leveraging graph topology to aggregate neighborhood information of transaction data using Graph Neural Networks (GNNs). Despite showing improvements over previous methods, these techniques still struggle to keep pace with the evolving camouflaging tactics of fraudsters and suffer from information loss due to over-smoothing. In this paper, we propose a novel algorithm that employs an efficient neighborhood sampling method, effective for camouflage detection and preserving crucial feature information from non-similar nodes. Additionally, we introduce a novel GNN architecture that utilizes attention mechanisms and preserves holistic neighborhood information to prevent information loss. We test our algorithm on financial data to show that our method outperforms other state-of-the-art graph algorithms. |
International Conference on Machine Learning and Applications 2024 |
What Do GNNs Actually Learn? Towards Understanding their Representations | 2024-11-22 | In recent years, graph neural networks (GNNs) have achieved great success in the field of graph representation learning. Although prior work has shed light on the expressiveness of those models (i.e., whether they can distinguish pairs of non-isomorphic graphs), it is still not clear what structural information is encoded into the node representations that are learned by those models. In this paper, we address this gap by studying the node representations learned by four standard GNN models. We find that some models produce identical representations for all nodes, while the representations learned by other models are linked to some notion of walks of specific length that start from the nodes. We establish Lipschitz bounds for these models with respect to the number of (normalized) walks. Additionally, we investigate the influence of node features on the learned representations. We find that if the initial representations of all nodes point in the same direction, the representations learned at the |
|
Machine Learning for Practical Quantum Error Mitigation | 2024-11-22 | Quantum computers progress toward outperforming classical supercomputers, but quantum errors remain their primary obstacle. The key to overcoming errors on near-term devices has emerged through the field of quantum error mitigation, enabling improved accuracy at the cost of additional run time. Here, through experiments on state-of-the-art quantum computers using up to 100 qubits, we demonstrate that without sacrificing accuracy machine learning for quantum error mitigation (ML-QEM) drastically reduces the cost of mitigation. We benchmark ML-QEM using a variety of machine learning models -- linear regression, random forests, multi-layer perceptrons, and graph neural networks -- on diverse classes of quantum circuits, over increasingly complex device-noise profiles, under interpolation and extrapolation, and in both numerics and experiments. These tests employ the popular digital zero-noise extrapolation method as an added reference. Finally, we propose a path toward scalable mitigation by using ML-QEM to mimic traditional mitigation methods with superior runtime efficiency. Our results show that classical machine learning can extend the reach and practicality of quantum error mitigation by reducing its overheads and highlight its broader potential for practical quantum computations. |
11 pages, 7 figures (main text) + 9 pages, 4 figures (supplementary information) |
Can GNNs Learn Link Heuristics? A Concise Review and Evaluation of Link Prediction Methods | 2024-11-22 | This paper explores the ability of Graph Neural Networks (GNNs) in learning various forms of information for link prediction, alongside a brief review of existing link prediction methods. Our analysis reveals that GNNs cannot effectively learn structural information related to the number of common neighbors between two nodes, primarily due to the nature of set-based pooling of the neighborhood aggregation scheme. Also, our extensive experiments indicate that trainable node embeddings can improve the performance of GNN-based link prediction models. Importantly, we observe that the denser the graph, the greater the improvement. We attribute this to the characteristics of node embeddings, where the link state of each link sample could be encoded into the embeddings of nodes that are involved in the neighborhood aggregation of the two nodes in that link sample. In denser graphs, every node could have more opportunities to attend the neighborhood aggregation of other nodes and encode states of more link samples to its embedding, thus learning better node embeddings for link prediction. Lastly, we demonstrate that the insights gained from our research carry important implications in identifying the limitations of existing link prediction methods, which could guide the future development of more robust algorithms. |
|
Enhancing Link Prediction with Fuzzy Graph Attention Networks and Dynamic Negative Sampling | 2024-11-22 | Link prediction is crucial for understanding complex networks, but traditional Graph Neural Networks (GNNs) often rely on random negative sampling, leading to suboptimal performance. This paper introduces Fuzzy Graph Attention Networks (FGAT), a novel approach integrating fuzzy rough sets for dynamic negative sampling and enhanced node feature aggregation. Fuzzy Negative Sampling (FNS) systematically selects high-quality negative edges based on fuzzy similarities, improving training efficiency. The FGAT layer incorporates fuzzy rough set principles, enabling robust and discriminative node representations. Experiments on two research collaboration networks demonstrate FGAT's superior link prediction accuracy, outperforming state-of-the-art baselines by leveraging the power of fuzzy rough sets for effective negative sampling and node feature learning. |
5 pages |