ICML 2024 Paper Summaries

This repository contains a collection of summaries for papers presented at the International Conference on Machine Learning (ICML) 2024. Each summary includes the following elements:

  • Title: The title of the paper.
  • Authors: The authors who contributed to the paper.
  • Affiliations: The institutions or organizations with which the authors are associated.
  • TL;DR: A concise summary generated by AI, providing an overview of the paper's key points and contributions.
  • Keywords: Key themes, technical methods, and application areas identified for each paper.

Note: Most of the content, other than the titles and authors, was generated using AI tools. While efforts have been made to ensure accuracy, errors may remain.

Future Work: The repository is currently organized as a flat list; more detailed organization and content refinement are planned. Stay tuned for updates as the structure and content of these summaries are enhanced.


Papers List

  • Consistent Submodular Maximization (Poster)

    • Authors: Paul Duetting, Federico Fusco, Silvio Lattanzi, Ashkan Norouzi-Fard, Morteza Zadimoghaddam
    • Affiliations: Google Research, Sapienza University of Rome, Rome, Italy
    • TL;DR: This paper investigates the problem of maximizing monotone submodular functions under cardinality constraints in a dynamic environment, focusing on maintaining a stable solution with bounded changes. The authors propose algorithms that balance consistency and approximation quality, demonstrating their effectiveness through experimental analysis (see the sketch after this entry).
    • Keywords: submodular maximization, consistency constraints, approximation algorithms, data mining, machine learning, data summarization, stability of solutions, dynamic environments, algorithms with trade-offs between consistency and approximation quality
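
For orientation, a minimal sketch of the classic greedy algorithm for the static version of the problem (cardinality-constrained monotone submodular maximization). The coverage function and ground set are toy stand-ins; the paper's consistent algorithms, which bound how much the solution may change across updates, are substantially more involved.

```python
def greedy_submodular(f, ground_set, k):
    """Pick k elements, each maximizing the marginal gain of f (classic greedy)."""
    solution = set()
    for _ in range(k):
        best = max(
            (e for e in ground_set if e not in solution),
            key=lambda e: f(solution | {e}) - f(solution),
        )
        solution.add(best)
    return solution

# Toy coverage function: f(S) = number of distinct items covered by S.
universe = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
coverage = lambda S: len(set().union(*(universe[e] for e in S)))
print(greedy_submodular(coverage, universe.keys(), 2))  # e.g. {'a', 'b'}
```
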
  • Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality (Poster)

    • Authors: Tri Dao, Albert Gu
    • Affiliations: Department of Computer Science, Princeton University, Machine Learning Department, Carnegie Mellon University
    • TL;DR: This paper establishes theoretical connections between Transformers and structured state-space models (SSMs), leading to the development of a new architecture, Mamba-2, which is significantly faster and competitive in language modeling. The authors aim to enhance the efficiency of SSMs by leveraging optimizations originally designed for Transformers.
    • Keywords: Transformers, State-Space Models, Language Modeling, Structured State-Space Models (SSMs), Linear Attention (LA), Structured Masked Attention (SMA), Efficiency issues in Transformers, Scaling in sequence length, Mamba-2 architecture, Theoretical connections between SSMs and attention, Semiseparable matrices, Dual forms, Tensor contractions
  • MusicRL: Aligning Music Generation to Human Preferences (Poster)

  • Relaxing the Accurate Imputation Assumption in Doubly Robust Learning for Debiased Collaborative Filtering (Spotlight Poster)

    • Authors: Haoxuan Li, Chunyuan Zheng, Shuyi Wang, Kunhan Wu, Eric Wang, Peng Wu, Zhi Geng, Xu Chen, Xiao-Hua Zhou
    • Affiliations: Peking University, University of Pennsylvania, Renmin University of China, Beijing Technology and Business University, Carnegie Mellon University, Zhejiang University
    • TL;DR: This paper proposes novel doubly robust estimators for debiasing collaborative filtering in recommender systems, addressing the challenges of inaccurate pseudo-labels and sampling selection bias. The proposed methods demonstrate improved performance over state-of-the-art approaches on various datasets (see the sketch after this entry).
    • Keywords: Recommender systems, Debiasing methods, Doubly robust estimators, Propensity reconstruction learning, Attention mechanism, Collaborative filtering, Sampling selection bias, Inaccurate pseudo-labels, Unbiased learning, Novel estimators, Improved performance on datasets, KuaiRec (dataset), Semi-synthetic datasets, Real-world datasets
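
For orientation, a minimal sketch of the textbook doubly robust (DR) estimator that debiased collaborative filtering builds on; the paper's propensity-reconstruction estimators refine this basic form. All quantities below are synthetic: `e` are prediction errors, `e_hat` imputed errors, `o` observation indicators, and `p` propensities.

```python
import numpy as np

def dr_estimate(e, e_hat, o, p):
    """DR = mean(imputed error + observed correction weighted by 1/propensity)."""
    return np.mean(e_hat + o * (e - e_hat) / p)

rng = np.random.default_rng(0)
n = 1000
p = rng.uniform(0.1, 0.9, n)        # true propensities of being observed
o = rng.binomial(1, p)              # which entries are observed
e = rng.normal(1.0, 0.2, n)         # true prediction errors (mean 1.0)
e_hat = e + rng.normal(0, 0.3, n)   # noisy imputed errors
print(dr_estimate(e, e_hat, o, p))  # close to 1.0 despite imputation noise
```
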
  • Kernel-Based Evaluation of Conditional Biological Sequence Models (Poster)

    • Authors: Pierre Glaser, Steffan Paul, Alissa M. Hummer, Charlotte Deane, Debora Marks, Alan Amin
    • Affiliations: Department of Statistics, University of Oxford, Oxford, UK, Systems Biology, Harvard Medical School, Boston, USA, Harvard Medical School, Broad Institute, Boston, USA, Systems Biology, Harvard Medical School, Boston, USA; Department of Statistics, University of Oxford, Oxford, UK, Courant Institute, New York University, New York, USA, Gatsby Computational Neuroscience Unit, London, UK
    • TL;DR: This study introduces kernel-based tools for evaluating conditional sequence models, focusing on a new metric called Augmented Conditional Maximum Mean Discrepancy (ACMMD) to assess model fit and reliability. The approach is demonstrated through the analysis of the ProteinMPNN model, revealing its limitations in fitting data across various protein families and allowing for hyperparameter tuning to improve performance.
    • Keywords: conditional sequence models, computational biology, protein design, Augmented Conditional Maximum Mean Discrepancy (ACMMD), model evaluation, genomics, model accuracy, model reliability, high-dimensional discrete-valued sequences, new evaluation metrics, hyperparameter tuning
  • Fair Off-Policy Learning from Observational Data (Poster)

    • Authors: Dennis Frauen, Valentyn Melnychuk, Stefan Feuerriegel
    • Affiliations: LMU Munich; Munich Center for Machine Learning
    • TL;DR: This paper presents a novel framework for fair off-policy learning from observational data, addressing the challenges of ensuring fairness despite potentially discriminatory historical policies. The proposed neural network-based approach, FairPol, aims to learn optimal decision rules while providing theoretical guarantees and demonstrating effectiveness through extensive experiments.
    • Keywords: Fairness in algorithmic decision-making, Off-policy learning, Neural network-based framework, Fair representation learning, Algorithmic decision-making, Social applications, Systematic discrimination, Bias in observational data, Optimal policies under fairness notions, Generalization bounds, Simulated data, Real-world data
  • Nash Learning from Human Feedback (Spotlight Poster)

  • Position: A Call for Embodied AI (Poster)

    • Authors: Giuseppe Paolo, Jonas Gonzalez-Billandon, Balázs Kégl
    • Affiliations: Noah’s Ark Lab, Huawei Technologies France, Paris, France, London Research Center, London, UK
    • TL;DR: The paper proposes Embodied AI (E-AI) as a crucial step towards achieving Artificial General Intelligence (AGI), emphasizing the need for a theoretical framework that incorporates perception, action, memory, and learning. It highlights the limitations of current AI technologies, particularly Large Language Models, and calls for research focused on creating E-AI agents capable of effective interaction with humans and other intelligent entities.
    • Keywords: Embodied AI, Artificial General Intelligence (AGI), Large Language Models (LLMs), Cognitive architectures, active inference, Robotics, neuroscience, natural language processing, Static learning, alignment issues, confabulation, Theoretical framework for E-AI, guidelines for future research, AI communication, collaboration, coexistence
  • Calibration Bottleneck: Over-compressed Representations are Less Calibratable (Poster)

    • Authors: Deng-Bao Wang, Min-Ling Zhang
    • Affiliations: School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Lab. of Computer Network and Information Integration (Southeast University), MOE, China
    • TL;DR: This study investigates the calibratability of deep neural networks, revealing that over-compression in representation layers hinders calibration. It proposes a new training method, progressively layer-peeled training (PLP), which enhances model calibration and maintains competitive predictive performance (see the sketch after this entry).
    • Keywords: model calibratability, uncertainty calibration, weight decay regularizer, temperature scaling (TS), histogram binning (HB), progressively layer-peeled training (PLP), deep neural networks (DNNs), high-dimensional prediction tasks, safety-critical decision-making, miscalibration of model confidence, over-compression of representation layers, improved model calibration, competitive predictive performance
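
Since the entry mentions temperature scaling (TS) as a post-hoc calibrator, here is a hedged sketch of TS on synthetic logits: a single temperature is fitted by minimizing the negative log-likelihood on held-out data. PLP itself is a training-time method and is not shown.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nll(T, logits, labels):
    """Negative log-likelihood of softmax(logits / T)."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)               # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=500)
logits = rng.normal(size=(500, 10))
logits[np.arange(500), labels] += 1.0                  # weak predictive signal
logits *= 8.0                                          # but very confident logits
res = minimize_scalar(nll, bounds=(0.05, 20.0), args=(logits, labels),
                      method="bounded")
print("fitted temperature:", round(res.x, 2))          # well above 1: softening
```
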
  • MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts (Poster)

    • Authors: Jianan Zhou, Zhiguang Cao, Yaoxin Wu, Wen Song, Yining Ma, Jie Zhang, Xu Chi
    • Affiliations: School of Computing and Information Systems, Singapore Management University, Singapore, Institute of Marine Science and Technology, Shandong University, China, Department of Information Systems, Eindhoven University of Technology, The Netherlands, Singapore Institute of Manufacturing Technology (SIMTech), Agency for Science, Technology and Research (A*STAR), Singapore, College of Computing and Data Science, Nanyang Technological University, Singapore
    • TL;DR: This paper presents MVMoE, a multi-task vehicle routing solver that utilizes a mixture-of-experts approach to enhance model capacity and achieve zero-shot generalization across various VRP variants. The proposed method demonstrates significant improvements in performance and efficiency compared to traditional neural solvers.
    • Keywords: Vehicle Routing Problems (VRPs), Neural Solvers, Multi-Task Learning, Mixture-of-Experts (MoE), Hierarchical Gating, Reinforcement Learning (RL), Logistics, Transportation, Manufacturing, NP-hard problems, Zero-shot generalization, Out-of-distribution data, Unified neural solver, Enhanced model capacity, Empirical performance improvements
  • Automated Statistical Model Discovery with Language Models (Poster)

    • Authors: Michael Li, Emily Fox, Noah Goodman
    • Affiliations: Department of Computer Science, Stanford University; Department of Statistics, Stanford University; Chan Zuckerberg Biohub – San Francisco, Department of Computer Science, Stanford University; Department of Psychology, Stanford University, Department of Computer Science, Stanford University; Department of Statistics, Stanford University; Chan Zuckerberg Biohub – San Francisco; Department of Psychology, Stanford University
    • TL;DR: This study presents a method for automated statistical model discovery using large language models, which iteratively propose and critique models. The approach successfully identifies models comparable to those designed by human experts while ensuring interpretability and flexibility in model selection.
    • Keywords: Automated statistical model discovery, Large language models, Probabilistic programs, Box’s Loop, Probabilistic modeling, Searching over a vast space of models, domain-specific constraints, Models on par with human expert designed models, interpretable extensions of classic models
  • DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design (Poster)

    • Authors: Samuel Garcin, James Doran, Shangmin Guo, Christopher Lucas, Stefano V. Albrecht
    • Affiliations: Huawei, School of Informatics, University of Edinburgh
    • TL;DR: This study investigates the impact of environment sampling on the zero-shot generalisation ability of reinforcement learning agents and introduces data-regularised environment design (DRED) to improve performance by minimizing overfitting and distributional shifts. The findings suggest that prioritizing levels based on value loss can enhance the agent's ability to generalize to new environments effectively.
    • Keywords: Zero-shot generalisation, Reinforcement learning, Deep actor-critic architectures, Adaptive sampling strategies, Unsupervised environment design (UED), Autonomous agents, Environment settings, Generalisation to new environments, Overfitting, Distributional shift, Data-regularised environment design (DRED), Generative model for level generation, Mutual information, Training levels
  • Discovering Environments with XRM (Oral)

    • Authors: Mohammad Pezeshki, Diane Bouchacourt, Mark Ibrahim, Nicolas Ballas, Pascal Vincent, David Lopez-Paz
    • Affiliations: Mila at Université de Montréal; CIFAR, FAIR at Meta
    • TL;DR: The paper introduces CROSS-RISK MINIMIZATION (XRM) as a method for automatic environment discovery to enhance out-of-distribution generalization in AI systems. XRM effectively trains twin networks to learn from training data while addressing the challenges of costly environment annotations and biases, achieving improved worst-group-accuracy.
    • Keywords: Out-of-distribution (OOD) generalization, automatic environment discovery, CROSS-RISK MINIMIZATION (XRM), empirical risk minimization (ERM), group distributionally robust optimization (GroupDRO), AI systems, healthcare, finance, self-driving vehicles, Costly environment annotations, human annotator biases, spurious correlations, underrepresented groups in training data, Hyper-parameter tuning, oracle worst-group-accuracy
  • Model-based Reinforcement Learning for Confounded POMDPs (Poster)

    • Authors: Mao Hong, Zhengling Qi, Yanxun Xu
    • Affiliations: Department of Decision Sciences, George Washington University, Washington, DC, United States, Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, United States
    • TL;DR: This study presents a model-based offline reinforcement learning algorithm for confounded POMDPs, establishing a novel identification result for action effects and developing a two-stage estimation procedure for off-policy evaluation. The findings demonstrate the efficiency of the proposed method under certain conditions, contributing to optimal policy learning in partially observable environments.
    • Keywords: model-based reinforcement learning, confounded partially observable Markov decision processes (POMDPs), offline reinforcement learning (RL), nonparametric two-stage estimation, off-policy evaluation (OPE), conservative policy optimization, autonomous driving, healthcare, partial observability, confounding bias, offline data distribution, optimal policy learning, finite-sample upper bound on suboptimality
  • Navigating Complexity: Toward Lossless Graph Condensation via Expanding Window Matching (Poster)

    • Authors: Yuchen Zhang, Tianle Zhang, Kai Wang, Ziyao Guo, Yuxuan Liang, Xavier Bresson, Wei Jin, Yang You
    • Affiliations: Emory University, Hong Kong University of Science and Technology (Guangzhou), National University of Singapore
    • TL;DR: This paper presents a novel approach to lossless graph condensation that enhances the performance of Graph Neural Networks by utilizing diverse supervision signals and an expanding window matching technique. The proposed method addresses the limitations of existing techniques, particularly in large-scale graph datasets, and demonstrates superior results through extensive experiments.
    • Keywords: Graph condensation, Graph Neural Networks (GNNs), Trajectory matching, Curriculum learning, Expanding window matching, Large-scale graph datasets, Graph-related applications, Lossless condensation, Performance gap between condensed and original graphs, New method for lossless graph condensation, Knowledge extraction from expert trajectories, Citeseer
  • Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance (Poster)

    • Authors: Chiraag Kaushik, Ran Liu, Chi-Heng Lin, Amrit Khera, Matthew Jin, Wenrui Ma, Vidya Muthukumar, Eva Dyer
    • Affiliations: Georgia Institute of Technology, Georgia, the USA, Stanford University, California, the USA, Georgia Institute of Technology, Georgia, the USA; Samsung Research
    • TL;DR: This study introduces the concept of spectral imbalance as a source of class disparities in classification models, highlighting that even balanced datasets can exhibit significant performance gaps across classes. The authors develop a theoretical framework to analyze these disparities and propose methods to mitigate the issue through data augmentation and encoder evaluation.
    • Keywords: class bias, spectral imbalance, classification models, Gaussian mixture model, theoretical framework, data augmentation strategies, machine learning, pretrained models, class disparities, performance gaps across classes, class-dependent generalization, new framework for studying class-dependent generalization, insights into pretrained features
  • Position: Towards Unified Alignment Between Agents, Humans, and Environment (Poster)

    • Authors: Zonghan Yang, An Liu, Zijun Liu, Kaiming Liu, Fangzhou Xiong, Yile Wang, Zeyuan Yang, Qingyuan Hu, XinRui Chen, Zhenhe Zhang, Fuwen Luo, Zhicheng Guo, Peng Li, Yang Liu
    • Affiliations: Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua University, Beijing, China, Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China, Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua University, Beijing, China; Jiangsu Collaborative Innovation Center for Language Competence, Jiangsu, China
    • TL;DR: This study introduces the principles of Unified Alignment for Agents (UA2), advocating for the alignment of agents with human intentions, environmental dynamics, and self-constraints. The research demonstrates the importance of these principles through proof-of-concept studies and performance benchmarking in realistic environments, highlighting the need for improved general problem-solving abilities in autonomous agents.
    • Keywords: Autonomous agents, Unified Alignment for Agents (UA2), Foundation models, Large Language Models (LLMs), Large Multimodal Models (LMMs), Web task automation, open-ended world exploration, interactive coding, robotic tasks, Limited efficacy in complex environments, neglected factors in agent benchmarks, Principles of UA2, proof-of-concept studies, performance benchmarking, WebShop, AI alignment, AI safety
  • Effects of Exponential Gaussian Distribution on (Double Sampling) Randomized Smoothing (Poster)

    • Authors: Youwei Shu, Xi Xiao, Derui Wang, Yuxin Cao, Siji Chen, Minhui Xue, Linyi Li, Bo Li
    • Affiliations: Shenzhen International Graduate School, Tsinghua University, University of Illinois Urbana-Champaign; University of Chicago, University of Illinois Urbana-Champaign; Simon Fraser University; University of Chicago, CSIRO’s Data61
    • TL;DR: This study investigates the effects of Exponential Standard Gaussian and Exponential General Gaussian distributions on Randomized Smoothing and Double Sampling Randomized Smoothing, revealing that ESG distributions provide consistent certification while EGG significantly enhances certified accuracy, particularly in high-dimensional settings. The findings suggest potential improvements in robustness certification against adversarial examples (see the sketch after this entry).
    • Keywords: Randomized Smoothing, Adversarial Examples, Robustness Certification, Exponential Standard Gaussian (ESG), Exponential General Gaussian (EGG), Double Sampling Randomized Smoothing (DSRS), Adversarial Attacks, Curse of Dimensionality, Certified Radius, Improved Certified Accuracy, ImageNet, ℓp Adversaries, Neyman-Pearson Lemma
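
A toy Monte Carlo sketch of standard Gaussian randomized smoothing, the baseline whose noise family the paper generalizes to ESG/EGG distributions: classify many noisy copies of the input and return the majority vote. The base classifier here is an illustrative stand-in.

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma=0.5, n=1000, seed=0):
    """Majority vote of the base classifier under Gaussian input noise."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n,) + x.shape)
    votes = np.bincount([base_classifier(x + eps) for eps in noise])
    return int(votes.argmax())

# Stand-in base classifier on 2-D inputs: the sign of the first coordinate.
clf = lambda z: int(z[0] > 0)
print(smoothed_predict(clf, np.array([0.3, -1.0])))  # stable prediction: 1
```
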
  • InfoNet: Neural Estimation of Mutual Information without Test-Time Optimization (Oral)

    • Authors: Zhengyang Hu, Song Kang, Qunsong Zeng, Kaibin Huang, Yanchao Yang
    • Affiliations: Department of Electrical and Electronic Engineering, the University of Hong Kong; Musketeers Foundation Institute of Data Science, the University of Hong Kong, School of Information Science and Technology, University of Science and Technology of China; Work done as an intern at HKU, Department of Electrical and Electronic Engineering, the University of Hong Kong
    • TL;DR: The study introduces InfoNet, a neural network designed for efficient mutual information estimation without the need for test-time optimization. It demonstrates improved performance and generalization across various data distributions, providing a balance between efficiency and accuracy.
    • Keywords: Mutual Information Estimation, Real-Time Applications, Neural Networks, Attention Mechanism, Intelligent Behavior, Data Streams, Efficiency in Estimation, Test-Time Optimization, InfoNet, Efficiency-Accuracy Trade-off
  • Learning Associative Memories with Gradient Descent (Poster)

    • Authors: Vivien Cabannes, Berfin Simsek, Alberto Bietti
    • Affiliations: Flatiron, Meta AI
    • TL;DR: This study investigates the training dynamics of an associative memory model using gradient descent and cross-entropy loss, revealing insights into classification margins and the effects of token frequency imbalance. The findings highlight oscillatory behaviors during training and the implications of overparameterization and underparameterization in learning.
    • Keywords: associative memory, training dynamics, deep learning, gradient descent, cross-entropy loss, softmax, imbalance in token frequencies, memory interferences, suboptimal memorization schemes, logarithmic growth of classification margins, oscillatory transitory regimes, token embeddings, linear layer, high-dimensional embedding vectors, Transformer models
  • WARM: On the Benefits of Weight Averaged Reward Models (Poster)

    • Authors: Alexandre Rame, Nino Vieillard, Léonard Hussenot, Robert Dadashi, Geoffrey Cideron, Olivier Bachem, Johan Ferret
    • Affiliations: Google DeepMind
    • TL;DR: This study introduces Weight Averaged Reward Models (WARM) to address reward hacking in large language models by mitigating distribution shifts and preference inconsistencies. The proposed method enhances the quality and alignment of LLM predictions, achieving a significant win rate over traditional single reward model approaches (see the sketch after this entry).
    • Keywords: Large Language Models, Human Preferences, Reinforcement Learning, Reinforcement Learning from Human Feedback (RLHF), Weight Averaged Reward Models (WARM), Conversational Assistants, Summarization Tasks, Reward Hacking, Distribution Shifts, Preference Inconsistencies, Improved Quality and Alignment of LLM Predictions, Efficiency in Reward Modeling, Reward Models (RMs), Proxy RM, Policy Drift
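
The core WARM operation is a uniform average of the weights of several fine-tuned reward models. A minimal PyTorch-style sketch, with toy linear models standing in for real reward models:

```python
import torch

def average_state_dicts(state_dicts):
    """Uniformly average matching tensors across fine-tuned checkpoints."""
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
            for k in state_dicts[0]}

# Toy "reward models": in practice these are fine-tunes of the same LLM head.
models = [torch.nn.Linear(8, 1) for _ in range(3)]
warm = torch.nn.Linear(8, 1)
warm.load_state_dict(average_state_dicts([m.state_dict() for m in models]))
```
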
  • TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge (Poster)

    • Authors: Young Kwon, Rui Li, Stylianos Venieris, Jagmohan Chauhan, Nicholas Lane, Cecilia Mascolo
    • Affiliations: Department of Computer Science and Technology, University of Cambridge, United Kingdom, Samsung AI Center, Cambridge, United Kingdom, Department of Computer Science and Technology, University of Cambridge, United Kingdom; Samsung AI Center, Cambridge, United Kingdom, School of Electronics and Computer Science, University of Southampton, United Kingdom
    • TL;DR: The study presents TinyTrain, an innovative on-device training approach that addresses the challenges of data scarcity and resource limitations in edge devices by selectively updating parts of the model. TinyTrain significantly improves training efficiency and accuracy while maintaining a minimal memory footprint, outperforming existing methods.
    • Keywords: On-device training, User personalisation, Privacy, Task-adaptive sparse-update method, Fine-tuning, IoT devices, Microcontroller units (MCUs), Edge devices, Data scarcity, Limited memory and compute resources, Long training time, Accuracy loss, TinyTrain method, Reduced computation and memory footprint, Improved training efficiency, Deep neural networks (DNNs), Multiply-accumulate (MAC) count
  • Batch and match: black-box variational inference with a score-based divergence (Spotlight Poster)

    • Authors: Diana Cai, Chirag Modi, Loucas Pillaud-Vivien, Charles Margossian, Robert Gower, David Blei, Lawrence Saul
    • Affiliations: Center for Computational Mathematics, Flatiron Institute; CERMICS Laboratory, Ecole des Ponts ParisTech, Department of Statistics, Department of Computer Science, Columbia University, Center for Computational Mathematics, Flatiron Institute, Center for Computational Mathematics, Flatiron Institute; Center for Computational Astrophysics, Flatiron Institute
    • TL;DR: This paper introduces a novel approach to black-box variational inference called batch and match (BaM), which utilizes a score-based divergence to improve convergence speed and stability compared to traditional methods. The authors demonstrate that BaM converges exponentially quickly to the target mean and covariance, outperforming existing implementations in terms of gradient evaluations.
    • Keywords: black-box variational inference, probabilistic modeling, score-based divergence, Gaussian variational families, stochastic evidence lower bound (ELBO), hierarchical models, deep generative models, high variance of gradient estimates, sensitivity to hyperparameters, convergence issues in high-dimensional problems, batch and match (BaM) method, closed-form proximal update, exponential convergence to target mean and covariance
  • Fool Your (Vision and) Language Model with Embarrassingly Simple Permutations (Poster)

    • Authors: Yongshuo Zong, Tingyang Yu, Ruchika Chavhan, Bingchen Zhao, Timothy Hospedales
    • Affiliations: University of Edinburgh, EPFL
    • TL;DR: This study investigates the vulnerability of large language and vision-language models to adversarial permutations in multiple-choice question answering, revealing significant accuracy degradation. The findings highlight the need for a deeper understanding of model robustness before deploying these systems in real-world applications (see the sketch after this entry).
    • Keywords: robustness of language models, vulnerability in multiple-choice question answering (MCQA), permutation sensitivity, adversarial permutation, education, recruitment exams, model brittleness, accuracy degradation due to answer permutation, empirical demonstration of vulnerabilities, performance metrics analysis, MMLU dataset, large language models (LLMs), vision-language models (VLLMs), multiple-choice question answering (MCQA)
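
The paper's probe is easy to reproduce in miniature: permute the answer options of a multiple-choice question and check whether the model's choice tracks content or position. In this sketch, `ask_model` is a hypothetical stand-in for a real (vision-)language-model call that returns the index of the chosen option.

```python
import itertools

def permutation_accuracy(ask_model, question, options, answer):
    """Accuracy over all orderings of the options; 1.0 iff permutation-robust."""
    perms = list(itertools.permutations(options))
    correct = sum(perm[ask_model(question, list(perm))] == answer
                  for perm in perms)
    return correct / len(perms)

# A deliberately position-biased "model" that always picks option A:
always_a = lambda question, options: 0
print(permutation_accuracy(always_a, "2+2=?", ["4", "3", "5"], "4"))  # 0.33
```
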
  • Hybrid² Neural ODE Causal Modeling and an Application to Glycemic Response (Oral)

    • Authors: Junyi Zou, Matthew Levine, Dessi Zaharieva, Ramesh Johari, Emily Fox
    • Affiliations: Department of Management Science and Engineering, Stanford University, Broad Institute of MIT and Harvard, Institute for Computational and Mathematical Engineering, Stanford University, Department of Pediatrics, Stanford University, Department of Statistics and Department of Computer Science, Stanford University; Chan Zuckerberg Biohub – San Francisco
    • TL;DR: This study presents a hybrid model that combines mechanistic ODE-based dynamics with neural network components to improve causal modeling and predictive performance in glucose dynamics post-exercise for individuals with type 1 diabetes. The proposed approach effectively addresses the challenges of causal grounding in hybrid models while achieving state-of-the-art results.
    • Keywords: Hybrid models, Causal modeling, Interpretable models, Ordinary differential equations (ODEs), Neural networks, Type 1 diabetes management, Glucose dynamics modeling, Causal grounding loss in hybrid models, Learning from small datasets, Observational data challenges, Hybrid loss combining causal and predictive loss, State-of-the-art predictive performance, Continuous glucose monitoring (CGM), Counterfactual reasoning
  • Sparse Inducing Points in Deep Gaussian Processes: Enhancing Modeling with Denoising Diffusion Variational Inference (Oral)

    • Authors: Jian Xu, Delu Zeng, John Paisley
    • Affiliations: South China University of Technology, Guangzhou, China, Columbia University, New York, USA
    • TL;DR: This paper introduces Denoising Diffusion Variational Inference (DDVI) as a method to improve posterior inference of inducing points in Deep Gaussian Processes (DGPs), addressing biases in traditional variational methods. The proposed approach combines score matching and stochastic differential equations to enhance model efficiency and reduce computational complexity.
    • Keywords: Deep Gaussian Processes, Bayesian deep learning, Denoising Diffusion Variational Inference (DDVI), Stochastic Differential Equation (SDE), Variational Inference, Posterior distribution approximation, computational complexity, bias in variational inference, Novel explicit variational lower bound for marginal likelihood function, Inducing points, KL divergence, score matching
  • Planning, Fast and Slow: Online Reinforcement Learning with Action-Free Offline Data via Multiscale Planners (Poster)

    • Authors: Chengjie Wu, Hao Hu, Yiqin Yang, Ning Zhang, Chongjie Zhang
    • Affiliations: Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China, Institute of Automation, Chinese Academy of Sciences, China, Department of Computer Science & Engineering, Washington University in St. Louis, MO, USA
    • TL;DR: This paper investigates the use of passive data to enhance online reinforcement learning, proposing a novel algorithm called Multiscale State-Centric Planners (MSCP) that effectively addresses challenges in long-horizon tasks. Empirical evaluations show that MSCP significantly outperforms existing methods by leveraging passive observations for actionable insights.
    • Keywords: reinforcement learning, passive RL, Multiscale State-Centric Planners (MSCP), online RL, offline RL, video data analysis, robotics manipulation, scientific discovery, distributional shift, extrapolation error, limited dataset coverage, long-horizon tasks, improved learning efficiency, dense training signals
  • LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery (Poster)

    • Authors: Pingchuan Ma, Johnson Tsun-Hsuan Wang, Minghao Guo, Zhiqing Sun, Josh Tenenbaum, Daniela Rus, Chuang Gan, Wojciech Matusik
    • Affiliations: MIT BCS; Center for Brains, Minds and Machines, UMass Amherst; MIT-IBM Watson AI Lab, CMU LTI, MIT CSAIL
    • TL;DR: This study proposes a bilevel optimization framework called the Scientific Generative Agent (SGA) that combines the reasoning capabilities of large language models with the computational power of simulations to enhance scientific discovery. The framework demonstrates efficacy in discovering constitutive laws and designing molecules, revealing innovative solutions that challenge conventional human expectations.
    • Keywords: scientific discovery, large language models, simulations, bilevel optimization framework, Scientific Generative Agent (SGA), physics, chemistry, pharmacology, challenges in simulating observational feedback, grounding language with scientific discovery, novel solutions in constitutive law discovery, molecular design
  • Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference (Poster)

    • Authors: Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Angelopoulos, Tianle Li, Dacheng Li, Banghua Zhu, Hao Zhang, Michael Jordan, Joseph E Gonzalez, Ion Stoica
    • Affiliations: UCSD, Stanford, UC Berkeley
    • TL;DR: The paper introduces Chatbot Arena, an open platform designed to evaluate large language models (LLMs) based on human preferences through a crowdsourced pairwise comparison approach. It highlights the limitations of traditional static benchmarks and emphasizes the need for a dynamic evaluation method that reflects real-world usage.
    • Keywords: Large Language Models, Human Preference Evaluation, Pairwise Comparison, Crowdsourcing, Model Evaluation, Benchmarking, Alignment with Human Preferences, Limitations of Static Benchmarks, Open Evaluation Platform, Credibility of Evaluation Methods, LLMs (Large Language Models), Human Preference
  • Active Preference Learning for Large Language Models (Poster)

    • Authors: William Muldrew, Peter Hayes, Mingtian Zhang, David Barber
    • Affiliations: Centre for Artificial Intelligence, University College London, London, UK
    • TL;DR: This study develops an active learning strategy for Direct Preference Optimization (DPO) to enhance the efficiency of fine-tuning large language models using preference data. The proposed method improves both the learning rate and final performance compared to traditional reinforcement learning techniques.
    • Keywords: Large Language Models, Human-AI Alignment, Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), Fine-tuning Language Models, Preference Learning, Complexity of RLHF, Stability in Fine-tuning, Active Learning Strategy, Improved Learning Rate, Performance on Preference Data
  • On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization (Poster)

    • Authors: Motahareh Sohrabi, Juan Ramirez, Tianyue Zhang, Simon Lacoste-Julien, Jose Gallego-Posada
    • Affiliations: Mila—Quebec AI Institute and DIRO, Université de Montréal; Canada CIFAR AI Chair, Mila—Quebec AI Institute and DIRO, Université de Montréal
    • TL;DR: This paper introduces the νPI algorithm for updating Lagrange multipliers in constrained optimization, addressing the instability of traditional gradient descent-ascent methods. The proposed method demonstrates reliable stabilization of multiplier dynamics and generalizes existing momentum techniques, showing empirical success in various applications (see the sketch after this entry).
    • Keywords: Constrained optimization, Neural networks, νPI algorithm, Lagrangian min-max formulations, PI controllers, Fairness, Sparsity, Active learning, Reinforcement learning, Model quantization, Unstable oscillatory dynamics, Nonconvex optimization, Dual variable convergence, Stabilization of multiplier dynamics, Generalization of momentum methods, Lagrange multipliers, Gradient descent-ascent
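
A hedged sketch of the general idea, PI-controlled multiplier updates, on a one-dimensional problem (minimize x² subject to x ≥ 1): the multiplier integrates the constraint violation, and a proportional term damps the usual gradient-ascent oscillations. The gains are illustrative, not the paper's νPI tuning.

```python
x, lam = 0.0, 0.0
kp, ki, lr = 0.5, 1.0, 0.1
for _ in range(500):
    g = 1.0 - x                         # constraint violation of x >= 1
    lam = max(0.0, lam + ki * g)        # integral term (projected to >= 0)
    lam_eff = max(0.0, lam + kp * g)    # proportional correction
    x -= lr * (2 * x - lam_eff)         # gradient step on the Lagrangian
print(round(x, 3), round(lam, 3))       # converges to the KKT point x=1, lam=2
```
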
  • CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay (Poster)

    • Authors: Natasha Butt, Blazej Manczak, Auke Wiggers, Corrado Rainone, David Zhang, Michaël Defferrard, Taco Cohen
    • Affiliations: University of Amsterdam; Qualcomm AI Research, Qualcomm AI Research; Qualcomm Technologies, Inc., Qualcomm AI Research
    • TL;DR: This paper introduces CodeIt, a novel method for self-improvement in language models that addresses the challenges of program synthesis by utilizing hindsight relabeling and prioritized experience replay. The approach achieves state-of-the-art performance on the Abstraction and Reasoning Corpus, solving 15% of the evaluation tasks and demonstrating effective inter-task generalization.
    • Keywords: language models, self-improvement, programming-by-examples, Code Iteration, hindsight relabeling, prioritized experience replay, Abstraction and Reasoning Corpus (ARC), general intelligence benchmarks, data sparsity in program synthesis, generalization between tasks, state-of-the-art performance on ARC, neuro-symbolic approach, ARC dataset
  • RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback (Poster)

    • Authors: Yufei Wang, Zhanyi Sun, Jesse Zhang, Zhou Xian, Erdem Biyik, David Held, Zackory Erickson
    • Affiliations: Department of Computer Science, University of Southern California, Robotics Institute, Carnegie Mellon University
    • TL;DR: This paper introduces RL-VLM-F, a method that automates the generation of reward functions for reinforcement learning agents using text descriptions and visual observations, significantly reducing the need for human effort in reward engineering. The approach demonstrates effectiveness across various domains, outperforming previous methods that relied on large pretrained models.
    • Keywords: Reinforcement Learning, Reward Engineering, Vision Language Models, Preference Learning, Classic Control, Object Manipulation, Designing Reward Functions, High-Dimensional Environments, Automated Reward Generation, Effective Policies
  • Incorporating Information into Shapley Values: Reweighting via a Maximum Entropy Approach (Poster)

    • Authors: Darya Biparva, Donatello Materassi
    • Affiliations: Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, USA
    • TL;DR: The paper reconciles the computation of Shapley values with causal inference methods by applying a maximum entropy perspective, proposing two variations of Shapley values that incorporate prior information about the model. The findings suggest that blindly applying Occam's razor to Shapley values does not yield satisfactory explanations (see the sketch after this entry).
    • Keywords: Shapley values, Explainable AI (XAI), Maximum entropy approach, Additive feature attribution algorithms, Causal inference, Feature attribution, Interdependencies among features, Non-linear interactions, Variations of Shapley values based on entropy maximization, Occam’s razor
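
For reference, the exact uniform-weight Shapley value computed by enumeration; the paper's contribution, reweighting the coalition distribution via maximum entropy to encode prior information, is deliberately not included in this baseline.

```python
from itertools import combinations
from math import factorial

def shapley(value, players):
    """Exact Shapley values of a set function by enumerating coalitions."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for r in range(n):
            for S in combinations(others, r):
                w = factorial(r) * factorial(n - r - 1) / factorial(n)
                phi[p] += w * (value(set(S) | {p}) - value(set(S)))
    return phi

# Toy game: each player contributes 1, plus an interaction between 0 and 1.
v = lambda S: len(S) + (2.0 if {0, 1} <= S else 0.0)
print(shapley(v, [0, 1, 2]))  # {0: 2.0, 1: 2.0, 2: 1.0}
```
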
  • Exploiting Code Symmetries for Learning Program Semantics (Spotlight Poster)

    • Authors: Kexin Pei, Weichen Li, Qirui Jin, Shuyang Liu, Scott Geng, Lorenzo Cavallaro, Junfeng Yang, Suman Jana
    • Affiliations: University of Michigan, University College London, University of Washington, Columbia University; The University of Chicago, Huazhong University of Science and Technology, Columbia University
    • TL;DR: This paper presents SYMC, a novel approach that incorporates code symmetries into the architecture of Large Language Models to enhance their ability to learn program semantics for automated program analysis. The results demonstrate that SYMC outperforms existing state-of-the-art models, including GPT-4, in various program analysis tasks without requiring pre-training.
    • Keywords: code semantics, Large Language Models (LLMs), program analysis, self-attention, group-theoretic framework, software engineering, security tasks, robustness of code LLMs, generalization to new code, SYMC model, improved generalization, performance on program analysis tasks, code symmetries, permutation group, program dependence graph
  • From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation (Poster)

    • Authors: Kun Su, Xiulong Liu, Eli Shlizerman
    • Affiliations: Department of ECE, University of Washington, Seattle, United States, Department of ECE, University of Washington, Seattle, United States; Department of Applied Math, University of Washington, Seattle, United States
    • TL;DR: This study introduces a unified framework called Vision to Audio and Beyond (VAB) that bridges audio-visual representation learning and vision-to-audio generation by leveraging latent spaces for contextual learning. The model demonstrates efficiency in generating high-quality audio from video and acquiring semantic audio-visual features, leading to competitive results in various audio-visual tasks.
    • Keywords: audio-visual representation learning, vision-to-audio generation, latent space modeling, masked audio token prediction, iterative-decoding, audio-visual retrieval, audio-visual classification, high-dimensional data, joint event comprehension, extensive training computations, unified model for audio-visual tasks, high-quality audio generation from video
  • Identification and Estimation for Nonignorable Missing Data: A Data Fusion Approach (Poster)

    • Authors: Zixiao Wang, AmirEmad Ghassami, Ilya Shpitser
    • Affiliations: Department of Biostatistics Johns Hopkins University, Baltimore, MD, Department of Mathematics and Statistics, Boston University, Boston, MA, Department of Computer Science, Johns Hopkins University, Baltimore, MD
    • TL;DR: This study introduces a data fusion approach to identify and estimate parameters in settings with nonignorable missing data (MNAR) by augmenting MNAR datasets with auxiliary datasets that are missing at random (MAR). The authors derive inverse probability weighted estimators and demonstrate their effectiveness through simulations and applications (see the sketch after this entry).
    • Keywords: Nonignorable missing data, Data fusion, Missing not at random (MNAR), Inverse probability weighted (IPW) estimators, Healthcare, Economics, Social sciences, Missing data, Missingness mechanisms, Identification of parameters, Estimation strategies, Missing at random (MAR), Outcome-selection model, Pattern-mixture model
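
A minimal sketch of the basic inverse probability weighted (IPW) mean that the paper's fusion estimators extend. The data and propensities below are synthetic, and the MNAR/MAR data-fusion step, the paper's actual contribution, is not shown.

```python
import numpy as np

def ipw_mean(y, r, p):
    """Estimate E[Y] from partially observed outcomes: mean(r * y / p)."""
    return np.mean(r * y / p)

rng = np.random.default_rng(0)
n = 5000
y = rng.normal(0.0, 1.0, n)
p = 1.0 / (1.0 + np.exp(-y))    # observation probability depends on y
r = rng.binomial(1, p)          # observation indicators
print(np.mean(y[r == 1]))       # naive mean of observed values: biased upward
print(ipw_mean(y, r, p))        # IPW estimate: close to the true mean 0
```
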
  • Peeking with PEAK: Sequential, Nonparametric Composite Hypothesis Tests for Means of Multiple Data Streams (Poster)

    • Authors: Brian Cho, Kyra Gan, Nathan Kallus
    • Affiliations: Department of ORIE, Cornell Tech, New York, NY, USA
    • TL;DR: This paper introduces PEAK, a novel nonparametric sequential testing method for composite hypotheses across multiple data streams, which significantly reduces the number of samples needed for stopping while maintaining strong statistical power. The method demonstrates practical benefits in both synthetic and real-world applications, particularly in optimizing decision-making in sequential experiments (see the sketch after this entry).
    • Keywords: Sequential testing, Nonparametric hypothesis testing, Testing-by-betting framework, Expectation-based averaged capital (PEAK), Pure-exploration bandit problems, Digital interventions, Composite hypotheses, Type-I error control, Early stopping, Novel betting scheme, Non-asymptotic α-level test, Reduction in sample size, HeartSteps dataset
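
A toy single-stream testing-by-betting sketch showing the wealth-process mechanic that PEAK generalizes to composite hypotheses and multiple streams: wealth multiplies by 1 + b·(x − μ₀) each round, and stopping once it exceeds 1/α gives a valid level-α test. The fixed betting fraction below is illustrative.

```python
import numpy as np

def betting_test(stream, mu0=0.5, bet=0.5, alpha=0.05):
    """Reject H0: mean <= mu0 once the bettor's wealth exceeds 1/alpha."""
    wealth = 1.0
    for t, x in enumerate(stream, start=1):
        wealth *= 1.0 + bet * (x - mu0)   # x in [0, 1] keeps wealth positive
        if wealth >= 1.0 / alpha:
            return t                       # stopping time of the rejection
    return None                            # never rejected

rng = np.random.default_rng(0)
print(betting_test(rng.uniform(0.3, 1.0, 2000)))  # true mean 0.65 > 0.5
```
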
  • Towards Compositionality in Concept Learning (Poster)

    • Authors: Adam Stein, Aaditya Naik, Yinjun Wu, Mayur Naik, Eric Wong
    • Affiliations: Department of Computer and Information Science, University of Pennsylvania, Pennsylvania, USA, School of Computer Science, Peking University, Beijing, China
    • TL;DR: This study proposes Compositional Concept Extraction (CCE) to discover compositional concept representations in foundation models, addressing the limitations of existing unsupervised methods. The evaluation demonstrates that CCE yields more compositional concepts and enhances accuracy in downstream classification tasks.
    • Keywords: Concept-based interpretability, Compositionality in concept learning, Compositional Concept Extraction (CCE), PCA, KMeans, Image data, Text data, Non-compositional concept extraction, Black-box nature of foundation models, Improved compositional concept representations, Better accuracy on classification tasks, CUB dataset, CLIP model
  • A Minimaximalist Approach to Reinforcement Learning from Human Feedback (Poster)

    • Authors: Gokul Swamy, Christoph Dann, Rahul Kidambi, Steven Wu, Alekh Agarwal
    • Affiliations: Google Research, Carnegie Mellon University; Google Research, Carnegie Mellon University
    • TL;DR: This paper introduces Self-Play Preference Optimization (SPO), a novel algorithm for reinforcement learning from human feedback that avoids the need for a reward model and adversarial training. The approach demonstrates significant efficiency and robustness in learning from human preferences, particularly in the presence of non-Markovian and stochastic conditions.
    • Keywords: Reinforcement Learning from Human Feedback (RLHF), Preference-based Reinforcement Learning (PbRL), Self-Play Preference Optimization (SPO), Minimax Winner (MW), Robotics, Recommendation Systems, Retrieval Systems, Large Language Models (LLMs), Non-Markovian preferences, Intransitive preferences, Stochastic preferences, Compounding errors in offline approaches, Efficient learning methods, Robustness to noisy preferences
  • Efficient PAC Learnability of Dynamical Systems Over Multilayer Networks (Poster)

    • Authors: Zirou Qiu, Abhijin Adiga, Madhav Marathe, S. S. Ravi, Daniel Rosenkrantz, Richard Stearns, Anil Vullikanti
    • Affiliations: University of Virginia, Charlottesville, VA, USA; Biocomplexity Institute and Initiative, University of Virginia, Charlottesville, VA, USA, Department of Computer Science, University at Albany – SUNY, Albany, NY, USA, Biocomplexity Institute and Initiative, University of Virginia, Charlottesville, VA, USA, Biocomplexity Institute and Initiative, University of Virginia, Charlottesville, VA, USA; Department of Computer Science, University at Albany – SUNY, Albany, NY, USA
    • TL;DR: This study investigates the learnability of dynamical systems over multilayer networks, presenting an efficient PAC learning algorithm that requires few training examples to infer unknown systems. The findings provide a tight analysis of model complexity and establish theoretical foundations for future research in multilayer dynamical systems.
    • Keywords: networked dynamical systems, multilayer networks, learnability, PAC learning algorithm, Natarajan dimension, cascading phenomena, contagion propagation, learning unknown interaction functions, model complexity, efficient learning guarantees, theoretical foundations for multilayer dynamical systems, interaction functions, threshold interaction functions, social contagions
  • Fast Decision Boundary based Out-of-Distribution Detector (Poster)

    • Authors: Litian Liu, Yao Qin
    • Affiliations: MIT, UC Santa Barbara
    • TL;DR: This paper presents a computationally-efficient Out-of-Distribution (OOD) detection method that leverages feature distances to decision boundaries without relying on auxiliary models. The proposed approach demonstrates improved efficiency and effectiveness compared to state-of-the-art methods, making it suitable for real-time applications (see the sketch after this entry).
    • Keywords: Out-of-Distribution (OOD) detection, AI safety, Feature distance to decision boundaries, closed-form estimation, Autonomous driving, AI systems deployment, Computational overhead in OOD detection, reliance on auxiliary models, Hyperparameter-free OOD detector, efficiency-effectiveness trade-off improvement, CIFAR-10, SVHN
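
A hedged sketch of the central quantity: for a linear classification head (W, b) on features z, the distance from z to the boundary between the predicted class i and another class j has the closed form |(wᵢ − wⱼ)·z + (bᵢ − bⱼ)| / ‖wᵢ − wⱼ‖, and small distances suggest OOD inputs. The scoring rule below is a toy version; the paper's detector differs in its details.

```python
import numpy as np

def boundary_distance_score(z, W, b):
    """Distance from features z to the nearest pairwise decision boundary."""
    i = int(np.argmax(W @ z + b))
    return min(abs((W[i] - W[j]) @ z + (b[i] - b[j])) / np.linalg.norm(W[i] - W[j])
               for j in range(len(b)) if j != i)

rng = np.random.default_rng(0)
W, b = rng.normal(size=(10, 64)), rng.normal(size=10)
z = rng.normal(size=64)
print(boundary_distance_score(z, W, b))  # lower score => more OOD-like
```
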
  • Diffuse, Sample, Project: Plug-And-Play Controllable Graph Generation (Poster)

    • Authors: Kartik Sharma, Srijan Kumar, Rakshit Trivedi
    • Affiliations: Georgia Institute of Technology, Atlanta, GA, USA, Massachusetts Institute of Technology, Cambridge, MA, USA
    • TL;DR: The study introduces PRODIGY, a novel approach for controlled graph generation using diffusion models, enabling precise control over graph properties while satisfying hard constraints. The method achieves up to 100% constraint satisfaction for various graph types, marking a significant advancement in interpretable graph generation.
    • Keywords: graph generation, controlled graph generation, diffusion models, PRODIGY (Projected Diffusion), drug discovery, network optimization, social network analysis, controlling properties of generated graphs, handling hard constraints, up to 100% constraint satisfaction, precise control in graph generation
  • Multi-group Learning for Hierarchical Groups (Poster)

    • Authors: Samuel Deng, Daniel Hsu
    • Affiliations: Department of Computer Science, Columbia University
    • TL;DR: This study extends multi-group learning to hierarchical groups, presenting an algorithm that outputs an interpretable decision tree predictor with near-optimal sample complexity. Empirical evaluations demonstrate its effectiveness in achieving generalization across real datasets with hierarchical structures.
    • Keywords: multi-group learning, hierarchical structure, decision tree predictor, boosting-based algorithm, medical imaging, facial recognition, object recognition, natural language processing, subgroup performance, fairness in predictions, near-optimal sample complexity, group-wise error rates, agnostic PAC learning, hierarchical group structure
  • Encodings for Prediction-based Neural Architecture Search (Poster)

    • Authors: Yash Akhauri, Mohamed Abdelfattah
    • Affiliations: Cornell University, New York, USA
    • TL;DR: This paper investigates various encoding methods for Neural Architecture Search (NAS) and introduces FLAN, a predictor that significantly reduces the cost of training accuracy predictors while enhancing sample efficiency. The study categorizes encodings into structural, learned, and score-based types, demonstrating their impact on NAS optimization.
    • Keywords: Neural Architecture Search (NAS), Predictor-based methods, Accuracy predictors, Zero-cost proxies, Unsupervised pretraining, Structural encodings, Learned encodings, Neural network design, Optimization, Computational cost, Sample efficiency, FLAN: Flow Attention for NAS, Unified encodings, Cost reduction for training NAS accuracy predictors, NASBench-101, NASBench-201, NASBench-301, Network Design Spaces (NDS), TransNASBench-101
  • Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models (Poster)

    • Authors: Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang
    • Affiliations: University of Illinois Urbana-Champaign; Lapis Labs, University of Illinois Urbana-Champaign
    • TL;DR: This paper introduces Language Agent Tree Search (LATS), a framework that enhances the capabilities of language models in reasoning, acting, and planning by integrating Monte Carlo Tree Search. The experimental results demonstrate LATS's effectiveness in various decision-making tasks, achieving state-of-the-art performance in programming and competitive results in web navigation.
    • Keywords: Language models, autonomous agents, decision-making, Monte Carlo Tree Search, in-context learning, Programming, interactive question-answering, web navigation, mathematics, Limitations of simple acting processes, need for deliberate decision-making, Language Agent Tree Search (LATS), improved reasoning performance, state-of-the-art accuracy on HumanEval, HumanEval, WebShop
  • Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation (Poster)

    • Authors: Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Wang
    • Affiliations: Department of Computer Science, University of California, Santa Barbara, Language Technologies Institute, Carnegie Mellon University, Cheriton School of Computer Science, University of Waterloo
    • TL;DR: This study investigates how pre-trained language models derive reasoning capabilities through the aggregation of indirect reasoning paths, particularly in the context of knowledge graphs and chain-of-thought reasoning. The findings suggest that enhancing training with random walk paths can significantly improve multi-step reasoning performance in real-world applications.
    • Keywords: reasoning ability, pre-trained language models, next-token prediction, random walk paths, knowledge graphs, chain-of-thought reasoning, multi-step reasoning, logical reasoning, improved reasoning performance, aggregation of reasoning paths, multiple KG and CoT datasets, large language models (LLMs), reasoning graphs
  • On the Asymptotic Distribution of the Minimum Empirical Risk (Poster)

    • Authors: Jacob Westerhout, TrungTin Nguyen, Xin Guo, Hien Nguyen
    • Affiliations: School of Computing, Engineering, and Mathematical Sciences, La Trobe University, Bundoora, VIC 3086, Australia; Institute of Mathematics for Industry, Kyushu University, Nishi Ward, Fukuoka 819-0395, Japan, School of Mathematics and Physics, The University of Queensland, St Lucia, QLD 4072, Australia
    • TL;DR: This paper characterizes the asymptotic distribution of the minimum empirical risk (MER) under various conditions, improving upon previous assumptions. The findings enable the construction of consistent confidence sets and hypothesis tests, with applications illustrated in neural network contexts.
    • Keywords: Empirical Risk Minimization (ERM), Minimum Empirical Risk (MER), Statistical Inference, Asymptotic Distribution, Bootstrap, Penalized Model Selection, Neural Networks, Non-independent and identically distributed data, Discontinuous loss functions, Non-Euclidean spaces, Asymptotic distributions for MERs, Consistent confidence sets, Consistent hypothesis tests
  • APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference (Oral)

    • Authors: Bowen Zhao, Hannaneh Hajishirzi, Qingqing Cao
    • Affiliations: University of Washington; Allen Institute for Artificial Intelligence, Apple, work done at the University of Washington
    • TL;DR: The study introduces APT, a method that adaptively prunes and tunes parameters in large language models to enhance both training and inference efficiency. APT achieves up to 98% task performance retention while significantly reducing memory usage and speeding up fine-tuning processes.
    • Keywords: Efficient training, Inference efficiency, Language models, Adaptive pruning, Tuning, Parameter-efficient fine-tuning (PEFT), Structured pruning, High training and inference costs, Memory consumption, APT method, Performance maintenance, Speedup in fine-tuning, Reduction in memory footprint, RoBERTa, T5, LLaMA, LoRA, Transformer
  • Chain of Code: Reasoning with a Language Model-Augmented Code Emulator (Oral)

    • Authors: Chengshu Li, Jacky Liang, Andy Zeng, Xinyun Chen, Karol Hausman, Dorsa Sadigh, Sergey Levine, Li Fei-Fei, Fei Xia, Brian Ichter
    • Affiliations: Department of Computer Science, Stanford University, California, USA, Google DeepMind, California, USA, Department of Computer Science, Stanford University; Google DeepMind, California, USA, Google DeepMind, California, USA; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California, USA
    • TL;DR: This paper introduces Chain of Code (CoC), a method that enhances language model reasoning by integrating code-writing and emulation techniques. The results show that CoC significantly outperforms previous methods, broadening the scope of reasoning tasks that language models can effectively address.
    • Keywords: Language Models, Code Reasoning, Chain of Thought, Code Emulation, Undefined behaviors in code execution, Complex reasoning tasks, Chain of Code (CoC), LM-augmented code emulator
  • Sample as you Infer: Predictive Coding with Langevin Dynamics (Poster)

    • Authors: Umais Zahid, Qinghai Guo, Zafeirios Fountas
    • Affiliations: Huawei Technologies Co., Ltd., Shenzhen, Guangdong, China, Huawei Technologies R&D, London, UK; Huawei Technologies Co., Ltd., Shenzhen, Guangdong, China, Huawei Technologies R&D, London, UK
    • TL;DR: This paper introduces Langevin Predictive Coding (LPC), a novel algorithm for deep generative model learning that enhances predictive coding with Gaussian noise and Langevin sampling techniques. The results show that LPC achieves superior sample quality and faster convergence compared to traditional variational autoencoders (VAEs) while maintaining competitive performance across key metrics (see the sketch after this entry).
    • Keywords: predictive coding, deep generative models, Langevin sampling, variational lower bound, Riemannian Langevin methods, adaptive SGD, robustness to sampling step size, training unsupervised deep generative models, Langevin Predictive Coding (LPC), superior sample quality, faster convergence, benchmark datasets, Gaussian noise, evidence lower bound (ELBO), hierarchical latent Gaussian generative models, Bayesian brain hypothesis, cognitive sciences
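
The sampling primitive behind LPC is the unadjusted Langevin step z ← z + (ε/2)∇log p(z|x) + √ε·N(0, I). A minimal sketch on a standard Gaussian target, where the score is simply −z; LPC embeds this step inside predictive-coding inference rather than using a toy target.

```python
import numpy as np

def langevin_step(z, score, eps, rng):
    """One unadjusted Langevin update toward the target density."""
    return z + 0.5 * eps * score(z) + np.sqrt(eps) * rng.normal(size=z.shape)

rng = np.random.default_rng(1)
z = np.full(16, 5.0)                           # start far from the mode
for _ in range(5000):
    z = langevin_step(z, lambda u: -u, eps=0.01, rng=rng)
print(round(z.mean(), 2), round(z.std(), 2))   # near 0 and 1 at stationarity
```
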
  • Ensemble Pruning for Out-of-distribution Generalization (Poster)

    • Authors: Fengchun Qiao, Xi Peng
    • Affiliations: DeepREAL Lab, Department of Computer and Information Sciences, University of Delaware, DE, USA
    • TL;DR: This paper proposes a novel optimization framework for ensemble pruning that enhances out-of-distribution generalization by selecting complementary models with high predictive diversity. The approach is model-agnostic and demonstrates superior performance in both multi- and single-source out-of-distribution scenarios.
    • Keywords: Ensemble learning, Out-of-distribution generalization, Ensemble pruning, Combinatorial optimization, Redundant models, Predictive diversity, Distribution shifts, Optimization framework, Model-agnostic approach, Deep neural networks, Topology graph
  • Online Matrix Completion: A Collaborative Approach with Hott Items (Poster)

    • Authors: Dheeraj Baby, Soumyabrata Pal
    • Affiliations: Adobe, Bangalore, India, Dept. of Computer Science, UC Santa Barbara, California, United States
    • TL;DR: This study addresses the low rank matrix completion problem in an online setting, proposing two efficient algorithms that leverage user collaboration to improve recommendation accuracy. The algorithms achieve near-optimal regret guarantees, significantly enhancing the performance of recommendation systems.
    • Keywords: online matrix completion, collaborative filtering, recommendation systems, PHASEDCLUSTERELIM, DETERMINANTELIM, low rank matrix completion, user preference learning, exploration vs. exploitation, noisy rewards, sub-optimal item elimination, near-optimal per-user regret, regret guarantees
  • TENG: Time-Evolving Natural Gradient for Solving PDEs With Deep Neural Nets Toward Machine Precision (Poster)

    • Authors: Zhuo Chen, Jacob McCarran, Esteban Vizcaino, Marin Soljačić, Di Luo
    • Affiliations: Department of Physics, Massachusetts Institute of Technology; NSF AI Institute for Artificial Intelligence and Fundamental Interactions; Department of Physics, Harvard University, Department of Physics, Massachusetts Institute of Technology; NSF AI Institute for Artificial Intelligence and Fundamental Interactions
    • TL;DR: This paper introduces the Time-Evolving Natural Gradient (TENG) method for solving partial differential equations (PDEs) using neural networks, achieving high accuracy and machine precision. The effectiveness of TENG is demonstrated through its superior performance on various PDEs, including the heat equation, Allen-Cahn equation, and Burgers’ equation.
    • Keywords: Partial Differential Equations (PDEs), Neural Networks, Time-Evolving Natural Gradient (TENG), TENG-Euler, TENG-Heun, Natural Gradient Optimization, Computational Mathematics, Data-Driven Discovery, Initial Value Problems, Accuracy Challenges in PDE Solutions, High Accuracy in Neural-Network-Based PDE Solutions, Step-by-Step Optimizations, Machine Precision, Time-Dependent Variational Principles, Optimization-Based Time Integration
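    • Code sketch: a toy rendering of the step-by-step idea behind TENG-Euler: at each time step, re-fit the parameters so the parameterized field matches the explicit-Euler-evolved solution. The linear feature model and least-squares solve below are stand-ins for the paper's neural networks and natural-gradient updates.

      ```python
      import numpy as np

      # Toy 1D heat equation u_t = u_xx on [0, 1], with u parameterized as
      # u(x) = features(x) @ theta (a hypothetical stand-in for a neural net).
      x = np.linspace(0.0, 1.0, 64)
      features = np.stack([np.sin(k * np.pi * x) for k in range(1, 9)], axis=1)
      lap_features = np.stack([-(k * np.pi) ** 2 * np.sin(k * np.pi * x)
                               for k in range(1, 9)], axis=1)

      theta = np.zeros(8); theta[0] = 1.0   # initial condition u0(x) = sin(pi x)
      dt = 1e-4
      for _ in range(100):
          u = features @ theta
          target = u + dt * (lap_features @ theta)   # explicit-Euler-evolved field
          # TENG-style step: re-fit theta so u_theta matches the evolved field
          theta, *_ = np.linalg.lstsq(features, target, rcond=None)
      ```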
  • Extracting Training Data From Document-Based VQA Models (Poster)

    • Authors: Francesco Pinto, Nathalie Rauschmayr, Florian Tramer, Phil Torr, Federico Tombari
    • Affiliations: Department of Engineering Science, University of Oxford, UK; Google, Zurich, Switzerland, Department of Engineering Science, University of Oxford, UK, Google, Zurich, Switzerland, ETH Zurich, Zurich, Switzerland
    • TL;DR: This study investigates the memorization behavior of Vision-Language Models in Document-Based Visual Question Answering, revealing that these models memorize training data and can leak sensitive information even when it is absent from the input provided at inference time. The authors propose a mitigation strategy to prevent the extractability of Personal Identifiable Information (PII).
    • Keywords: Document-Based Visual Question Answering, Vision-Language Models, Visual Question Answering, Memorization of sensitive information, privacy risk, Mitigation strategy for extractable PII, DocVQA dataset, Personal Identifiable Information (PII), extractability
  • NeWRF: A Deep Learning Framework for Wireless Radiation Field Reconstruction and Channel Prediction (Poster)

    • Authors: Haofan Lu, Christopher Vattheuer, Baharan Mirzasoleiman, Omid Abari
    • Affiliations: Department of Computer Science, University of California Los Angeles (UCLA), Los Angeles, United States
    • TL;DR: The study introduces NeWRF, a deep learning framework designed to predict wireless channels, significantly reducing the time and cost associated with traditional site surveys. The framework effectively utilizes sparse channel measurements to accurately predict wireless signal quality at unvisited locations, addressing common issues in wireless network deployments.
    • Keywords: wireless channel prediction, deep learning, Neural Radiance Fields (NeRF), wireless network deployments, site surveys, dead spots, dropped signals, measurement density, NeWRF framework, accurate channel prediction
  • Image Hijacks: Adversarial Images can Control Generative Models at Runtime (Poster)

    • Authors: Luke Bailey, Euan Ong, Stuart Russell, Scott Emmons
    • Affiliations: University of California, Berkeley, Harvard University, Cambridge University
    • TL;DR: This study introduces the concept of image hijacks, which are adversarial images that can manipulate the behavior of vision-language models (VLMs) at inference time. The authors demonstrate that these hijacks can achieve over 80% success in various attacks, raising significant concerns about the security of foundation models against input-based adversarial attacks.
    • Keywords: image hijacks, adversarial attacks, vision-language models (VLMs), Behaviour Matching algorithm, Prompt Matching method, security of foundation models, input-based attacks, adversarial robustness, adversarial images controlling VLMs, automated attacks, high success rates, VLMs, LLaVA, CLIP, LLaMA-2, large language models (LLMs), adversarial machine learning
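    • Code sketch: a generic projected-gradient loop of the kind used to craft image hijacks; `model` and `target_loss` are placeholders for a frozen VLM and a behaviour-matching objective (e.g., NLL of a target string), not the paper's actual API.

      ```python
      import torch

      def image_hijack(model, image, target_loss, steps=500, eps=8 / 255, lr=1e-2):
          """Optimize a small perturbation of `image` (a (3, H, W) tensor in [0, 1])
          so the frozen model's loss on a chosen target behaviour drops."""
          delta = torch.zeros_like(image, requires_grad=True)
          opt = torch.optim.Adam([delta], lr=lr)
          for _ in range(steps):
              loss = target_loss(model(image + delta))  # e.g. NLL of the target string
              opt.zero_grad(); loss.backward(); opt.step()
              with torch.no_grad():                     # keep the perturbation small/valid
                  delta.clamp_(-eps, eps)
                  delta.copy_((image + delta).clamp(0, 1) - image)
          return (image + delta).detach()
      ```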
  • Benchmarking Deletion Metrics with the Principled Explanations (Poster)

    • Authors: Yipei Wang, Xiaoqian Wang
    • Affiliations: Elmore Family School of Electrical and Computer Engineering, Purdue University, IN, USA
    • TL;DR: This paper introduces the TRAjectory importanCE (TRACE) framework for evaluating attribution-based explanation methods using insertion/deletion metrics. It demonstrates that TRACE provides optimal results and addresses critical issues such as the out-of-distribution problem in model predictions.
    • Keywords: Explainable Artificial Intelligence (XAI), Attribution Methods, Insertion/Deletion Metrics, TRAjectory importanCE (TRACE) framework, Out-of-Distribution (OOD) issue, Black-box nature of DNNs, Benchmarking insertion/deletion metrics, Evaluation metrics for attribution methods
  • How Language Model Hallucinations Can Snowball (Poster)

    • Authors: Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah Smith
    • Affiliations: Center for Data Science, New York University, Paul G. Allen School of Computer Science and Engineering, University of Washington; Allen Institute for AI, Princeton University; Princeton Language and Intelligence, Paul G. Allen School of Computer Science and Engineering, University of Washington
    • TL;DR: This study investigates how language models can generate hallucinations that they recognize as incorrect, demonstrating that early mistakes can lead to further inaccuracies. The findings highlight the need for better understanding and mitigation of hallucination phenomena in language models.
    • Keywords: Language Model Hallucinations, Knowledge Gaps in Language Models, Question-Answering Datasets, Zero-Shot Chain-of-Thought Prompting, Information-Seeking, Problem-Solving, Hallucination of Incorrect Statements, Early Mistakes Leading to More Mistakes, Identification of Incorrect Claims by Language Models, GPT-3.5, GPT-4, LLaMA2-70B-chat
  • MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data (Poster)

    • Authors: Paul Scotti, Mihir Tripathy, Cesar Kadir Torrico Villanueva, Reese Kneeland, Tong Chen, Ashutosh Narang, Charan Santhirasegaran, Jonathan Xu, Thomas Naselaris, Kenneth Norman, Tanishq Abraham
    • Affiliations: Princeton Neuroscience Institute, Stability AI; Medical AI Research Center (MedARC), University of Minnesota, Medical AI Research Center (MedARC), University of Waterloo, Stability AI; Medical AI Research Center (MedARC); Princeton Neuroscience Institute, The University of Sydney
    • TL;DR: The study presents a novel approach for reconstructing visual perception from fMRI data using only 1 hour of training data by leveraging shared-subject models and functional alignment. This method significantly enhances out-of-subject generalization and achieves state-of-the-art results in image retrieval and reconstruction metrics.
    • Keywords: visual perception reconstruction, fMRI data analysis, shared-subject models, functional alignment procedure, CLIP, Stable Diffusion XL, neuroscience, brain imaging, image reconstruction, data sparsity, high-quality reconstruction with limited data, improved out-of-subject generalization, state-of-the-art image retrieval and reconstruction metrics, Natural Scenes Dataset, Stable Diffusion, fMRI, latent space, pixel space
  • PARCv2: Physics-aware Recurrent Convolutional Neural Networks for Spatiotemporal Dynamics Modeling (Poster)

    • Authors: Phong Nguyen, Xinlun Cheng, Shahab Azarfar, Pradeep Seshadri, Yen Nguyen, Munho Kim, Sanghun Choi, H. Udaykumar, Stephen Baek
    • Affiliations: School of Data Science, University of Virginia, United States; Department of Astronomy, University of Virginia, United States, School of Mechanical Engineering, Kyungpook National University, Republic of Korea, School of Data Science, University of Virginia, United States, Department of Mechanical Engineering, University of Iowa, United States, School of Data Science, University of Virginia, United States; Department of Mechanical and Aerospace Engineering, University of Virginia, United States
    • TL;DR: The study presents PARCv2, an advanced model for simulating unsteady and advection-dominant physics problems using a physics-aware recurrent convolutional approach. It demonstrates improved capabilities in modeling complex dynamics, particularly in fluid dynamics and energetic materials, compared to existing models.
    • Keywords: Physics-aware deep learning, spatiotemporal dynamics, Recurrent convolutions, differentiator-integrator architecture, differential operators, Fluid dynamics, energetic materials, Unsteady dynamics, fast transients, advection-dominated systems, nonlinear field evolution, PARCv2 model, hybrid integral solver, long-time predictions, Partial differential equations (PDEs), advection-reaction-diffusion equations
  • Prediction Accuracy of Learning in Games: Follow-the-Regularized-Leader meets Heisenberg (Poster)

    • Authors: Yi Feng, Georgios Piliouras, Xiao Wang
    • Affiliations: Google DeepMind, London, United Kingdom, Shanghai University of Finance and Economics, Shanghai, China; Key Laboratory of Interdisciplinary Research of Computation and Economics, China, Shanghai University of Finance and Economics, Shanghai, China
    • TL;DR: This study investigates the accuracy of predictions in deterministic learning dynamics of zero-sum games, focusing on observer uncertainty and covariance evolution. It establishes a Heisenberg-type inequality for the Follow-the-Regularized-Leader algorithm and demonstrates that symplectic discretization improves prediction accuracy in learning dynamics.
    • Keywords: prediction accuracy, learning dynamics, zero-sum games, Follow-the-Regularized-Leader (FTRL), Euler discretization, Symplectic discretization, machine learning, game theory, observer uncertainty, prediction challenges in learning dynamics, growth rates of covariance information, Heisenberg-type inequality for FTRL
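    • Code sketch: a toy comparison of explicit Euler versus symplectic Euler on the bilinear zero-sum dynamics dx/dt = −y, dy/dt = x, illustrating why symplectic discretization tracks the conserved continuous-time orbit more faithfully; this illustrates the discretization point only, not the paper's FTRL setting.

      ```python
      import numpy as np

      def euler(x, y, eta):
          """Explicit Euler on dx/dt = -y, dy/dt = x: the orbit radius grows each step."""
          return x - eta * y, y + eta * x

      def symplectic_euler(x, y, eta):
          """Symplectic Euler: update x first, then use the new x for y; stays bounded."""
          x_new = x - eta * y
          return x_new, y + eta * x_new

      (x1, y1), (x2, y2) = (1.0, 1.0), (1.0, 1.0)
      for _ in range(1000):
          x1, y1 = euler(x1, y1, 0.05)
          x2, y2 = symplectic_euler(x2, y2, 0.05)
      print(np.hypot(x1, y1), np.hypot(x2, y2))  # Euler spirals far from the true orbit
      ```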
  • Scalable AI Safety via Doubly-Efficient Debate (Oral)

    • Authors: Jonah Brown-Cohen, Geoffrey Irving, Georgios Piliouras
    • Affiliations: Google DeepMind, London, UK
    • TL;DR: This paper presents a new set of debate protocols aimed at improving AI safety by allowing honest strategies to verify the alignment of stochastic AI systems using polynomial simulation steps. The findings suggest that these protocols can effectively manage the complexities of human oversight in high-stakes AI applications, such as legal document drafting.
    • Keywords: AI Safety, AI Alignment, Debate Protocols, Natural Language Processing, Legal Document Drafting, Human Oversight Limitations, Complexity of Tasks, New Debate Protocols, Efficient Human Judgement Utilization, Large Language Models (LLMs), Stochastic AI Systems
  • Unsupervised Concept Discovery Mitigates Spurious Correlations (Poster)

    • Authors: Md Rifat Arefin, Yan Zhang, Aristide Baratin, Francesco Locatello, Irina Rish, Dianbo Liu, Kenji Kawaguchi
    • Affiliations: National University of Singapore, Mila, University of Montreal, Canada, Samsung - SAIT AI Lab, Montreal, Canada, Institute of Science and Technology Austria
    • TL;DR: This paper introduces CoBalT, a novel method that leverages unsupervised object-centric learning to mitigate spurious correlations in deep learning models without requiring human-labeled subgroup annotations. The approach demonstrates competitive performance on benchmark datasets, addressing the challenge of model brittleness and unintended biases.
    • Keywords: spurious correlations, unsupervised learning, object-centric learning, concept balancing, vector quantization, contrastive learning, image classification, representation learning, model brittleness, unintended biases, CoBalT (concept balancing technique), robust classification, benchmark datasets for sub-population shifts
  • GeoMFormer: A General Architecture for Geometric Molecular Representation Learning (Poster)

    • Authors: Tianlang Chen, Shengjie Luo, Di He, Shuxin Zheng, Tie-Yan Liu, Liwei Wang
    • Affiliations: National Key Laboratory of General Artificial Intelligence; School of Intelligence Science and Technology, Peking University, School of EECS, Peking University, National Key Laboratory of General Artificial Intelligence; School of Intelligence Science and Technology; Center for Machine Learning Research, Peking University, Microsoft Research AI4Science
    • TL;DR: This study introduces GeoMFormer, a novel Transformer-based architecture designed for geometric molecular representation learning that effectively captures both invariant and equivariant features. Extensive experiments demonstrate its strong performance across various molecular modeling tasks, addressing the need for a flexible framework in this domain.
    • Keywords: Molecular modeling, Geometric representation learning, Transformer-based models, Cross-attention modules, Quantum mechanics, Molecular systems, Invariance and equivariance in molecular representation, GeoMFormer architecture, Strong performance on invariant and equivariant tasks
  • The Fundamental Limits of Least-Privilege Learning (Poster)

    • Authors: Theresa Stadler, Bogdan Kulynych, Michael Gastpar, Nicolas Papernot, Carmela Troncoso
    • Affiliations: EPFL, Lausanne, Switzerland, University of Toronto & Vector Institute, Toronto, Canada, Lausanne University Hospital & University of Lausanne, Switzerland
    • TL;DR: This study formalizes the least-privilege principle in machine learning, demonstrating a fundamental trade-off between the utility of feature representations for a task and the potential leakage of sensitive information. The findings indicate that achieving high utility while completely preventing inference of unrelated attributes is not feasible.
    • Keywords: least-privilege learning, data misuse prevention, feature representations, feature mappings, machine learning as a service (MLaaS), information leakage, unintended inferences, data misuse, formalisation of least-privilege principle, trade-off between utility and leakage, Conditional Entropy Bottleneck
  • Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs (Oral)

    • Authors: Yeonhong Park, Jake Hyun, SangLyul Cho, Bonggeun Sim, Jae W. Lee
    • Affiliations: Seoul National University
    • TL;DR: This paper introduces any-precision LLM, a method for the low-cost deployment of multiple, different-sized Large Language Models (LLMs) by leveraging a lightweight quantization technique. The proposed solution significantly reduces deployment costs while maintaining model quality and inference throughput.
    • Keywords: Large Language Models, any-precision LLM, quantization, post-training quantization, deployment of LLMs, high deployment costs, memory costs, training multiple model versions, lightweight method for any-precision quantization, efficient serving of LLMs
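    • Code sketch: a hypothetical bit-slice illustration of the any-precision idea: quantize weights once at the highest precision, then read lower-precision models out of the same memory by dropping low-order bits. The uniform quantizer below is an assumption; the paper builds on a more sophisticated post-training scheme.

      ```python
      import numpy as np

      def quantize(w, bits=8, w_min=-1.0, w_max=1.0):
          """Uniform quantization of weights to integer codes at `bits` precision."""
          levels = 2 ** bits - 1
          codes = np.round((np.clip(w, w_min, w_max) - w_min) / (w_max - w_min) * levels)
          return codes.astype(np.uint8)

      def dequantize(codes, bits, parent_bits=8, w_min=-1.0, w_max=1.0):
          """Recover a lower-precision model from the 8-bit codes by dropping low bits."""
          truncated = codes >> (parent_bits - bits)        # keep the top `bits` bits
          return truncated / (2 ** bits - 1) * (w_max - w_min) + w_min

      w = np.random.uniform(-1, 1, size=5)
      codes8 = quantize(w)               # stored once, at the highest supported precision
      w4 = dequantize(codes8, bits=4)    # a 4-bit model is a bit-slice of the same memory
      w3 = dequantize(codes8, bits=3)    # likewise for 3 bits: no extra model to store
      ```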
  • Towards Modular LLMs by Building and Reusing a Library of LoRAs (Poster)

    • Authors: Oleksiy Ostapenko, Zhan Su, Edoardo Ponti, Laurent Charlin, Nicolas Le Roux, Lucas Caccia, Alessandro Sordoni
    • Affiliations: Microsoft Research, Microsoft Research; Mila — Quebec AI Institute; Université de Montréal; Canada CIFAR AI Chair, University of Edinburgh, Mila — Quebec AI Institute; University of Copenhagen, Mila — Quebec AI Institute; HEC Montréal; Canada CIFAR AI Chair, Microsoft Research; Mila — Quebec AI Institute; Université de Montréal
    • TL;DR: This study focuses on building and reusing a library of parameter-efficient adapters for large language models (LLMs) to enhance their performance on new tasks through techniques like model-based clustering and a novel routing mechanism. The findings demonstrate that these methods lead to superior generalization compared to traditional joint training approaches.
    • Keywords: Modular LLMs, Parameter-efficient adapters, Zero-shot generalization, LoRA (Low-Rank Adaptation), Model-based clustering (MBC), Arrow routing mechanism, Large language models (LLMs), Multi-task learning, Improving LLM performance on new tasks, Task generalization, Dynamic selection of relevant adapters, Superior generalization to new tasks
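    • Code sketch: a rough, hypothetical rendering of Arrow-style routing: each LoRA adapter is summarized by the top right-singular vector of its weight update, and inputs are routed to the adapters whose prototypes best align with the hidden state. Shapes and scoring details are assumptions, not the paper's exact mechanism.

      ```python
      import torch

      def arrow_prototypes(loras):
          """One prototype per adapter: the top right-singular vector of its low-rank
          update B @ A (A: (r, d), B: (d, r))."""
          protos = []
          for A, B in loras:
              _, _, Vt = torch.linalg.svd(B @ A)
              protos.append(Vt[0])          # input direction the adapter acts on most
          return torch.stack(protos)

      def route(h, protos, top_k=2):
          """Score adapters by alignment between hidden state h and their prototypes."""
          scores = torch.abs(protos @ h)    # sign-free cosine-style similarity
          return torch.topk(torch.softmax(scores, dim=0), top_k)

      d, r = 16, 4
      loras = [(torch.randn(r, d), torch.randn(d, r)) for _ in range(8)]  # toy library
      weights, idx = route(torch.randn(d), arrow_prototypes(loras))       # pick 2 adapters
      ```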
  • Sequential Neural Score Estimation: Likelihood-Free Inference with Conditional Score Based Diffusion Models (Spotlight Poster)

    • Authors: Louis Sharrock, Jack Simons, Song Liu, Mark Beaumont
    • Affiliations: Department of Mathematics and Statistics, Lancaster University, UK; School of Mathematics, University of Bristol, UK, School of Mathematics, University of Bristol, UK
    • TL;DR: This paper introduces Sequential Neural Posterior Score Estimation (SNPSE), a novel score-based method for Bayesian inference in simulator-based models, which effectively reduces simulation costs by leveraging conditional score-based diffusion models. The method demonstrates comparable or superior performance to existing state-of-the-art techniques like Sequential Neural Posterior Estimation (SNPE) across various numerical examples.
    • Keywords: Bayesian inference, likelihood-free inference, simulator-based models, Sequential Neural Posterior Score Estimation (SNPSE), score-based methods, conditional score-based diffusion models, Neuroscience, evolutionary biology, ecology, epidemiology, climate science, cosmology, high-energy physics, econometrics, Absence of a tractable likelihood function, inference from data, New methods for Bayesian inference, sequential training procedures
  • Early Time Classification with Accumulated Accuracy Gap Control (Poster)

    • Authors: Liran Ringel, Regev Cohen, Daniel Freedman, Michael Elad, Yaniv Romano
    • Affiliations: Department of Computer Science, Technion—Israel Institute of Technology, Haifa, Israel; Department of Electrical and Computer Engineering, Technion—Israel Institute of Technology, Haifa, Israel, Verily AI, Israel
    • TL;DR: This paper introduces a statistical framework for early time series classification that allows for accurate predictions without processing the entire input stream. The proposed method significantly reduces the number of timesteps required for classification while maintaining rigorous accuracy gap control.
    • Keywords: Early time series classification, predictive inference, Statistical framework, calibrated stopping rule, Learn-then-Test calibration, Reading comprehension, real-time song identification, computational tomography, Accuracy gap control, early halt times, Early stopping mechanism, reduction of timesteps for classification, Sequential classifier, i.i.d. instances
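    • Code sketch: a simplified stopping rule and calibration loop in the spirit of the paper: halt when the classifier's confidence clears a threshold, and pick the threshold on held-out streams so the empirical accuracy gap stays below a target level. The paper's Learn-then-Test calibration adds a rigorous multiple-testing procedure omitted here.

      ```python
      import numpy as np

      def halt_time(prob_stream, lam):
          """Stop at the first timestep whose max class probability clears `lam`."""
          for t, p in enumerate(prob_stream):
              if p.max() >= lam:
                  return t, p.argmax()
          return len(prob_stream) - 1, prob_stream[-1].argmax()

      def calibrate(cal_streams, cal_labels, alpha=0.05, grid=np.linspace(0.5, 1.0, 51)):
          """Pick the smallest threshold whose empirical accuracy gap stays below alpha."""
          full = np.mean([s[-1].argmax() == y for s, y in zip(cal_streams, cal_labels)])
          for lam in grid:
              early = np.mean([halt_time(s, lam)[1] == y
                               for s, y in zip(cal_streams, cal_labels)])
              if full - early <= alpha:     # early-halt accuracy close enough to full
                  return lam
          return 1.0                        # fall back to reading the whole stream
      ```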
  • Sequential Disentanglement by Extracting Static Information From A Single Sequence Element (Poster)

    • Authors: Nimrod Berman, Ilan Naiman, Idan Arbiv, Gal Fadlon, Omri Azencot
    • Affiliations: Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, Israel
    • TL;DR: This study presents a novel architecture for unsupervised sequential disentanglement that effectively mitigates information leakage by conditioning on a single sample. The proposed method outperforms existing approaches on various benchmarks in generation and prediction tasks.
    • Keywords: Unsupervised sequential disentanglement, Representation learning, Variational autoencoders (VAEs), Dynamic extensions, Time series analysis, Video processing, Audio processing, Information leakage, Non-disentangled representation, Novel architecture for disentanglement, Variational framework, Static factors, Dynamic factors, Latent codes
  • Robust Universal Adversarial Perturbations (Poster)

    • Authors: Changming Xu, Gagandeep Singh
    • Affiliations: Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, USA, VMware, California, USA; Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, USA
    • TL;DR: This study introduces a method for generating Universal Adversarial Perturbations (UAPs) that are robust against real-world transformations, significantly improving their effectiveness in practical attack scenarios. The proposed UAPs demonstrate up to 23% greater robustness compared to existing state-of-the-art methods.
    • Keywords: Universal Adversarial Perturbations, Adversarial Machine Learning, Iterative algorithm, probabilistic robustness bounds, Image classification, audio classification, Non-robustness of adversarial perturbations, real-world transformations, Robust UAPs, improved robustness against transformations, CIFAR-10, ILSVRC 2012, Deep Neural Networks (DNNs), adversarial perturbations
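    • Code sketch: a generic expectation-over-transformations loop for training a universal perturbation that survives randomly sampled real-world-style transforms; `model`, `loader`, and `transforms` are placeholders, and the paper additionally derives probabilistic robustness bounds not shown here.

      ```python
      import random
      import torch

      def robust_uap(model, loader, transforms, eps=0.03, lr=1e-2, epochs=10):
          """Train one perturbation for all inputs, applied under random transforms,
          so it keeps working after e.g. rotation, scaling, or compression."""
          delta = torch.zeros(3, 32, 32, requires_grad=True)   # CIFAR-10-sized canvas
          opt = torch.optim.Adam([delta], lr=lr)
          loss_fn = torch.nn.CrossEntropyLoss()
          for _ in range(epochs):
              for x, y in loader:
                  t = random.choice(transforms)            # sample a transformation
                  loss = -loss_fn(model(t(x + delta)), y)  # maximize classification error
                  opt.zero_grad(); loss.backward(); opt.step()
                  with torch.no_grad():
                      delta.clamp_(-eps, eps)              # enforce the L-infinity budget
          return delta.detach()
      ```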
  • Model Assessment and Selection under Temporal Distribution Shift (Poster)

    • Authors: Elise Han, Chengpiao Huang, Kaizheng Wang
    • Affiliations: Department of Industrial Engineering and Operations Research, Columbia University, New York, NY, United States; Data Science Institute, Columbia University, New York, NY, United States, Department of Industrial Engineering and Operations Research, Columbia University, New York, NY, United States, Department of Computer Science, Columbia University, New York, NY, United States
    • TL;DR: This paper presents adaptive methods for model assessment and selection in environments with temporal distribution shifts, utilizing a rolling window approach to estimate generalization errors. The proposed techniques enable effective comparison of models and demonstrate adaptability to changing data distributions.
    • Keywords: model assessment, model selection, temporal distribution shift, rolling window approach, pairwise comparisons, single-elimination tournament, non-stationarity in data, model performance degradation, distribution shift, adaptive model selection, generalization error estimation
  • Minimizing $f$-Divergences by Interpolating Velocity Fields (Poster)

    • Authors: Song Liu, Jiahao Yu, Jack Simons, Mingxuan Yi, Mark Beaumont
    • Affiliations: University of Bristol, Bristol, UK
    • TL;DR: This study presents a method for minimizing f-divergences between particle and target distributions by estimating velocity fields through interpolation techniques, addressing issues of overfitting in previous density ratio estimations. The proposed approach demonstrates effectiveness in applications such as domain adaptation and missing data imputation.
    • Keywords: machine learning, statistical divergence minimization, particle distribution, Wasserstein Gradient Flow, Stein Variational Gradient Descent, density ratio estimation, interpolation techniques, domain adaptation, data imputation, generative modeling, overfitting, statistical discrepancy, unnormalized target density functions, consistent estimators, improved velocity field estimation, f-divergence, Kullback-Leibler divergence, probability flow ODE
  • Low-Cost High-Power Membership Inference Attacks (Oral)

    • Authors: Sajjad Zarifzadeh, Philippe Liu, Reza Shokri
    • Affiliations: National University of Singapore (NUS), CS Department
    • TL;DR: This paper presents a novel statistical test for robust membership inference attacks (RMIA) that operates with low computational overhead and superior test power compared to prior methods. The findings indicate that RMIA effectively enhances the differentiation between member and non-member data points, laying the groundwork for practical data privacy risk assessment in machine learning.
    • Keywords: Membership inference attacks, data privacy risk assessment, Robust membership inference attacks (RMIA), likelihood ratio tests, Machine learning, Information leakage, computational cost of attacks, performance instability, Novel statistical test for MIA, enhanced differentiation between member and non-member data points
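    • Code sketch: a simplified pairwise likelihood-ratio test in the spirit of RMIA, comparing how much the target model inflates the probability of a candidate point x, relative to reference models, against the same ratio for random population points z; the numbers and threshold are illustrative assumptions.

      ```python
      import numpy as np

      def rmia_score(p_target_x, p_refs_x, p_target_z, p_refs_z, gamma=1.0):
          """Fraction of population points z that candidate x 'dominates' in
          target-vs-reference likelihood ratio."""
          ratio_x = p_target_x / np.mean(p_refs_x)          # how much training inflated x
          ratio_z = p_target_z / np.mean(p_refs_z, axis=0)  # the same ratio for each z
          return np.mean(ratio_x / ratio_z > gamma)

      # toy numbers: model probabilities of x and of two population points z
      score = rmia_score(p_target_x=0.9,
                         p_refs_x=np.array([0.40, 0.50, 0.45]),
                         p_target_z=np.array([0.30, 0.60]),
                         p_refs_z=np.array([[0.25, 0.50], [0.35, 0.55], [0.30, 0.50]]))
      is_member = score > 0.8   # flag x as a training member above a chosen threshold
      ```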
  • Bridging Mini-Batch and Asymptotic Analysis in Contrastive Learning: From InfoNCE to Kernel-Based Losses (Poster)

    • Authors: Panagiotis Koromilas, Giorgos Bouritsas, Theodoros Giannakopoulos, Mihalis Nicolaou, Yannis Panagakis
    • Affiliations: Department of Informatics and Telecommunications, National and Kapodistrian University of Athens; Archimedes AI/Athena Research Center, NCSR "Demokritos", The Cyprus Institute
    • TL;DR: This study analyzes various contrastive learning losses, demonstrating that they share the same optimal solutions under certain conditions, leading to the introduction of a new objective called Decoupled Hyperspherical Energy Loss (DHEL). The empirical results indicate that DHEL enhances performance and robustness across different batch sizes and hyperparameters in computer vision tasks.
    • Keywords: Contrastive Learning, Representation Learning, InfoNCE, Kernel Contrastive Learning (KCL), Decoupled Hyperspherical Energy Loss (DHEL), Computer Vision, Memory issues with large batches, Sensitivity to temperature hyperparameter, Dimensionality collapse, Hard-negative sampling strategies, Improved downstream performance, Robustness across batch sizes and hyperparameters, Reduced dimensionality collapse, Hyperspherical Energy Minimisation (HEM)
  • Emergence of In-Context Reinforcement Learning from Noise Distillation (Poster)

    • Authors: Ilya Zisman, Vladislav Kurenkov, Alexander Nikulin, Viacheslav Sinii, Sergey Kolesnikov
    • Affiliations: AIRI, Moscow, Russia; Skoltech, Moscow, Russia, Tinkoff, Moscow, Russia, Tinkoff, Moscow, Russia; Innopolis University, Kazan, Russia, AIRI, Moscow, Russia; MIPT, Moscow, Russia, AIRI, Moscow, Russia; Innopolis University, Kazan, Russia
    • TL;DR: This study introduces ADε, a novel data acquisition method that enables in-context reinforcement learning through synthetic noise injection, addressing the challenges of sample inefficiency and generalization in RL. Experimental results show that this approach allows RL agents to outperform the best suboptimal policies in learning datasets by a significant margin.
    • Keywords: In-Context Reinforcement Learning, Noise Distillation, Reinforcement Learning, Transformers, Noise Injection Curriculum, Sample Inefficiency, Generalization to New Tasks, ADε (new data acquisition approach), Learning Histories Generation, Meta-RL, Curriculum Learning
  • Subgoal-based Demonstration Learning for Formal Theorem Proving (Poster)

    • Authors: Xueliang Zhao, Wenda Li, Lingpeng Kong
    • Affiliations: The University of Hong Kong, University of Edinburgh
    • TL;DR: This paper introduces a subgoal-based demonstration learning framework to enhance the efficiency of proof search in large language models for formal theorem proving. The proposed methods significantly improve proof accuracy and sampling efficiency, demonstrating the potential of LLMs in automated theorem proving.
    • Keywords: formal theorem proving, large language models (LLMs), subgoal-based demonstration learning, reinforcement learning, diffusion models, software verification, research-level mathematics, proof search efficiency, proof accuracy, increased proof accuracy, improved sampling efficiency, miniF2F benchmark
  • Vector Quantization Pretraining for EEG Time Series with Random Projection and Phase Alignment (Poster)

    • Authors: Haokun Gui, Xiucheng Li, Xinyang Chen
    • Affiliations: School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China
    • TL;DR: This study introduces VQ-MTM, a self-supervised learning model for EEG time series analysis that utilizes random projection and phase alignment to enhance seizure detection and classification. The model demonstrates significant performance improvements over existing methods across multiple datasets.
    • Keywords: EEG time series analysis, self-supervised learning, Vector Quantization, BERT-style modeling, random-projection quantization, phase-aligning module, Time-Phase-Shift Equivariance of Fourier Transform, Neurological disorder diagnosis, seizure detection and classification, Data sparsity, labor-intensive diagnosis, rare disorder types, VQ-MTM model, improved performance in seizure detection and classification, Five real-world datasets, large-scale datasets
  • Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data (Poster)

    • Authors: Giannis Daras, Alexandros Dimakis, Constantinos Daskalakis
    • Affiliations: Department of Computer Science, University of Texas at Austin; Archimedes AI, Department of Electrical Engineering and Computer Science, MIT, Department of Electrical and Computer Engineering, University of Texas at Austin
    • TL;DR: This paper presents a novel framework for training diffusion models using only corrupted data, addressing the memorization issue prevalent in existing models. The proposed method effectively samples from the uncorrupted distribution and reduces memorization while maintaining competitive performance.
    • Keywords: Ambient diffusion, diffusion models, corrupted data, Tweedie’s formula, consistency loss function, Image generation, MRI, black-hole imaging, Memorization of training examples, copyright and privacy concerns, training with corrupted data, Exact framework for learning diffusion models, optimal denoisers, Stable Diffusion XL
  • Simple linear attention language models balance the recall-throughput tradeoff (Spotlight Poster)

    • Authors: Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, James Zou, Atri Rudra, Christopher Re
    • Affiliations: University at Buffalo, Stanford University
    • TL;DR: This study investigates the efficiency of language models, particularly focusing on the recall-throughput tradeoff. The proposed BASED architecture combines linear and sliding window attention, achieving significant improvements in recall and throughput compared to existing models.
    • Keywords: language models, recall, efficiency, attention, linear attention, sliding window attention, BASED architecture, language generation, recall-intensive tasks, memory consumption, recall-throughput tradeoff, improved throughput, enhanced recall, KV-cache, sub-quadratic models, perplexity, Pareto frontier
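    • Code sketch: a minimal hybrid of causal global linear attention (with a simplified Taylor-style feature map) and exact sliding-window attention, the two components BASED combines; this toy version ignores batching, heads, and the paper's IO-aware kernels, and simply sums the two outputs rather than composing them across layers.

      ```python
      import torch

      def linear_attention(q, k, v):
          """Causal linear attention with a simplified Taylor-style feature map;
          cumulative sums give O(n) recurrent state instead of a full KV-cache."""
          feat = lambda x: torch.cat([torch.ones_like(x[..., :1]), x, 0.5 * x**2], dim=-1)
          q, k = feat(q), feat(k)
          kv = torch.cumsum(k.unsqueeze(-1) * v.unsqueeze(-2), dim=0)  # running sum of k v^T
          z = torch.cumsum(k, dim=0)                                   # running normalizer
          out = (q.unsqueeze(-2) @ kv).squeeze(-2)
          return out / (q * z).sum(-1, keepdim=True).clamp_min(1e-6)

      def sliding_window_attention(q, k, v, w=64):
          """Exact softmax attention over only the last w tokens (precise local recall)."""
          out = torch.empty_like(v)
          for t in range(q.shape[0]):
              s = max(0, t - w + 1)
              att = torch.softmax(q[t] @ k[s:t + 1].T / q.shape[-1] ** 0.5, dim=-1)
              out[t] = att @ v[s:t + 1]
          return out

      n, d = 128, 16
      q, k, v = (torch.randn(n, d) for _ in range(3))
      y = linear_attention(q, k, v) + sliding_window_attention(q, k, v)  # toy combination
      ```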
  • $\texttt{MoE-RBench}$: Towards Building Reliable Language Models with Sparse Mixture-of-Experts (Poster)

    • Authors: Guanjie Chen, Xinyu Zhao, Tianlong Chen, Yu Cheng
    • Affiliations: The Chinese University of Hong Kong, The University of North Carolina at Chapel Hill, The University of North Carolina at Chapel Hill; MIT; Harvard University, Shanghai Artificial Intelligence Laboratory; Shanghai Jiao Tong University
    • TL;DR: This study introduces MoE-RBench, a framework for assessing the reliability of Sparse Mixture-of-Experts models, highlighting their performance in safety, adversarial resilience, and robustness. The findings suggest that with proper training settings, MoE models can outperform dense language models in reliability, addressing critical issues in downstream applications.
    • Keywords: Mixture-of-Experts (MoE), Large Language Models (LLMs), reliability assessment, Sparse Mixture-of-Experts (SMoE), conditional computation, language modeling, translation, vision, multimodality, reliability issues, harmful content generation, false information, performance drops, domain transfer instability, comprehensive assessment of MoE reliability, insights into adapting pre-trained MoE models, Transformer, adversarial attacks, out-of-distribution robustness, safety, hallucination, AI safety
  • Human-like Category Learning by Injecting Ecological Priors from Large Language Models into Neural Networks (Poster)

    • Authors: Akshay Kumar Jagadish, Julian Coda-Forno, Mirko Thalmann, Eric Schulz, Marcel Binz
    • Affiliations: Computational Principles of Intelligence Lab, Max Planck Institute for Biological Cybernetics, Tübingen, Germany; Institute for Human-Centered AI, Helmholtz Computational Health Center, Munich, Germany
    • TL;DR: This study demonstrates that large language models can generate ecologically valid category learning tasks, and introduces a new model called ecologically rational meta-learned inference (ERMI) that explains human category learning better than existing cognitive models. The findings suggest that human cognition can be understood through the lens of ecological rationality, with implications for developing more effective cognitive models.
    • Keywords: Ecological rationality, Category learning, Human cognition, Large language models, Meta-learning, Ecologically rational meta-learned inference (ERMI), Cognitive science, Classification tasks, Defining ecologically valid tasks, Building rational models for cognitive tasks, ERMI model, Better explanation of human data, State-of-the-art performance on OpenML-CC18, OpenML-CC18
  • Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding (Poster)

    • Authors: Guangyi Liu, Yu Wang, Zeyu Feng, Qiyu Wu, Liping Tang, Yuan Gao, Zhen Li, Shuguang Cui, Julian McAuley, Zichao Yang, Eric Xing, Zhiting Hu
    • Affiliations: MBZUAI; CMU, University of Tokyo, CMU, CUHK-Shenzhen, MBZUAI, UC San Diego, Stanford University
    • TL;DR: This paper introduces Generalized Encoding-Decoding Diffusion Probabilistic Models (EDDPMs) that unify the capabilities of generation, reconstruction, and representation across various data types. Extensive experiments demonstrate EDDPMs' flexibility and superior performance compared to existing models in handling diverse tasks.
    • Keywords: deep generative models, encoding-decoding, diffusion models, Generalized Encoding-Decoding Diffusion Probabilistic Models (EDDPMs), Gaussian noising-denoising, variational autoencoders (VAEs), generative adversarial networks (GANs), text synthesis, protein sequence generation, image generation, limitations of existing models, need for flexible data handling, integration of core capabilities, enhanced performance across data types
  • ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy (Poster)

    • Authors: Kirill Vishniakov, Zhiqiang Shen, Zhuang Liu
    • Affiliations: Meta AI Research, MBZUAI
    • TL;DR: This study conducts a comparative analysis of ConvNet and Vision Transformer models across supervised and CLIP training paradigms, revealing significant differences in model behaviors despite similar ImageNet accuracies. The findings emphasize the need for nuanced evaluation metrics beyond traditional accuracy measures to better inform model selection for specific applications.
    • Keywords: computer vision, model selection, model evaluation, ConvNet, Vision Transformer, CLIP, image classification, model robustness, model overfitting, performance evaluation, transferability, comparative analysis, model characteristics, ImageNet
  • Principled Preferential Bayesian Optimization (Oral)

    • Authors: Wenjie Xu, Wenbin Wang, Yuning Jiang, Bratislav Svetozarevic, Colin Jones
    • Affiliations: Automatic Control Laboratory, EPFL, Lausanne, Switzerland; Urban Energy Systems Laboratory, Empa, Zurich, Switzerland, Automatic Control Laboratory, EPFL, Lausanne, Switzerland, Urban Energy Systems Laboratory, Empa, Zurich, Switzerland; The Institute for Artificial Intelligence Research and Development of Serbia, Serbia
    • TL;DR: This study presents a novel approach to preferential Bayesian optimization that utilizes preference feedback to optimize black-box functions, introducing an optimistic algorithm with theoretical guarantees on cumulative regret and convergence. Experimental results demonstrate that the proposed method outperforms existing heuristics in various applications.
    • Keywords: Preferential Bayesian Optimization, Black-box Function Optimization, Likelihood Ratio, Gaussian Processes, Surrogate Modeling, Visual Design Optimization, Thermal Comfort Optimization, Robotic Gait Optimization, Preference Feedback, Cumulative Regret, Global Convergence, Optimistic Algorithm, Information-Theoretic Bound, Estimated Best Solution
  • CogBench: a large language model walks into a psychology lab (Poster)

    • Authors: Julian Coda-Forno, Marcel Binz, Jane Wang, Eric Schulz
    • Affiliations: Google DeepMind, London, UK, Computational Principles of Intelligence Lab, Max Planck Institute for Biological Cybernetics, Tübingen, Germany; Institute for Human-Centered AI, Helmholtz Computational Health Center, Munich, Germany
    • TL;DR: This study introduces CogBench, a benchmark for evaluating large language models (LLMs) using behavioral metrics from cognitive psychology experiments. Key findings reveal that larger models and those trained with reinforcement learning from human feedback exhibit more human-like behavior and improved performance, while open-source models are less risk-prone than proprietary ones.
    • Keywords: Large Language Models, Cognitive Psychology, Behavioral Metrics, Statistical Multilevel Modeling, Reinforcement Learning from Human Feedback (RLHF), Prompt-Engineering Techniques, AI Evaluation, Human Behavior Alignment, Evaluation Challenges of LLMs, Opacity of Model Behavior, Introduction of CogBench, Insights on Model Size and RLHF Impact
  • Learning a Diffusion Model Policy from Rewards via Q-Score Matching (Poster)

    • Authors: Michael Psenka, Alejandro Escontrela, Pieter Abbeel, Yi Ma
    • Affiliations: Department of Electrical Engineering and Computer Science, University of California, Berkeley
    • TL;DR: This paper introduces a novel policy update method called Q-score matching for reinforcement learning using diffusion models, addressing the challenges of optimizing policies in continuous action spaces. The proposed method demonstrates the ability to learn effective policies through off-policy reinforcement learning, showing promise in simulated environments.
    • Keywords: reinforcement learning, diffusion models, Q-score matching, behavior cloning, off-policy reinforcement learning, robotics, continuous action spaces, optimization of policies, sampling challenges in continuous spaces, new policy update method, geometric perspective on policy optimization
  • Averaging $n$-step Returns Reduces Variance in Reinforcement Learning (Poster)

    • Authors: Brett Daley, Martha White, Marlos C. Machado
    • Affiliations: Department of Computing Science, University of Alberta; Alberta Machine Intelligence Institute; Canada CIFAR AI Chair
    • TL;DR: This study demonstrates that averaging n-step returns into compound returns reduces variance in reinforcement learning, leading to improved sample efficiency. The findings suggest that these methods can enhance the performance of deep RL agents like DQN and PPO.
    • Keywords: reinforcement learning, multistep returns, sample efficiency, n-step returns, λ-returns, temporal-difference learning, deep reinforcement learning, value function learning, variance in multistep returns, sample complexity, compound returns, variance reduction, two-bootstrap returns
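    • Code sketch: averaging two n-step returns into a compound return, the paper's central object; the horizons and weights below are arbitrary choices for illustration.

      ```python
      def n_step_return(rewards, values, t, n, gamma=0.99):
          """Standard n-step return: n discounted rewards plus a bootstrapped tail value."""
          g, T = 0.0, len(rewards)
          for i in range(min(n, T - t)):
              g += gamma**i * rewards[t + i]
          if t + n < T:
              g += gamma**n * values[t + n]
          return g

      def compound_return(rewards, values, t, ns=(2, 8), weights=(0.5, 0.5), gamma=0.99):
          """Average several n-step returns: same bias profile, strictly lower variance
          when the component returns are not perfectly correlated (the paper's key result)."""
          return sum(w * n_step_return(rewards, values, t, n, gamma)
                     for n, w in zip(ns, weights))

      rewards, values = [1.0] * 20, [0.0] * 20
      g = compound_return(rewards, values, t=0)   # a two-bootstrap compound return
      ```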
  • SiT: Symmetry-invariant Transformers for Generalisation in Reinforcement Learning (Poster)

    • Authors: Matthias Weissenbacher, Rishabh Agarwal, Yoshinobu Kawahara
    • Affiliations: Google DeepMind, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan; Graduate School of Information Science and Technology, Osaka University, Japan
    • TL;DR: This paper introduces the Symmetry-Invariant Transformer (SiT), a novel architecture designed to enhance generalization in reinforcement learning by leveraging local and global symmetries. The results demonstrate SiT's superior performance over traditional Vision Transformers on various benchmarks, highlighting its sample efficiency and adaptability to new environments.
    • Keywords: reinforcement learning, generalization, Symmetry-Invariant Transformer (SiT), Graph Symmetric Attention (GSA), self-attention, image-based reinforcement learning, out-of-distribution generalization, data augmentation challenges, sample inefficiency, improved generalization, sample efficiency, MiniGrid, Procgen, Atari 100k, CIFAR10, Vision Transformers (ViTs), local and global symmetries, equivariance, invariance
  • Linguistic Calibration of Long-Form Generations (Poster)

    • Authors: Neil Band, Xuechen Li, Tengyu Ma, Tatsunori Hashimoto
    • Affiliations: Department of Computer Science, Stanford University
    • TL;DR: This study introduces a framework for linguistic calibration of long-form generations in language models, enabling them to convey confidence levels in their claims. The findings demonstrate that the calibrated model significantly improves user decision-making by providing probabilistic predictions alongside generated content.
    • Keywords: Linguistic calibration, long-form generations, language models, Reinforcement learning, supervised finetuning, Decision-making, biomedical questions, scientific questions, Hallucination, knowledge gaps in language models, Calibrated long-form generations, improved user decision-making, Llama 2 7B, AI Safety, AI Alignment
  • Optimally Improving Cooperative Learning in a Social Setting (Poster)

    • Authors: Shahrzad Haddadan, Cheng Xin, Jie Gao
    • Affiliations: Rutgers Business School, Piscataway, NJ, USA, Department of Computer Science, Rutgers University, Piscataway, NJ, USA
    • TL;DR: This study investigates how to optimally correct a few classifiers in a network of agents to enhance overall prediction accuracy, presenting a polynomial time algorithm for aggregate optimization and demonstrating the NP-hard nature of egalitarian optimization. The findings have significant implications for cooperative learning in various applications, including cybersecurity and social networks.
    • Keywords: Cooperative learning, Networked agents, Classification task, Polynomial time algorithm, Approximation algorithms, Cybersecurity, Online social networks, Erroneous classifiers, Accuracy maximization, NP-hard optimization, Optimization algorithms, Performance guarantees, Synthetic data, Real data
  • Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference (Poster)

    • Authors: Piotr Nawrot, Adrian Łańcucki, Marcin Chochowski, David Tarjan, Edoardo Ponti
    • Affiliations: NVIDIA; University of Wrocław, NVIDIA; University of Edinburgh, NVIDIA
    • TL;DR: This study introduces Dynamic Memory Compression (DMC) to enhance the efficiency of large language models during inference by reducing the memory load of key-value caches. The method achieves significant throughput improvements while maintaining performance, allowing for longer contexts and larger batches within existing memory constraints.
    • Keywords: Large Language Models, Transformers, Memory Efficiency, Dynamic Memory Compression (DMC), Grouped Query Attention (GQA), Auto-regressive Inference, Conversational AI, Inefficiency in memory usage, excessive memory load during generation, Increased throughput, cache compression, performance preservation, Llama 2, NVIDIA H100 GPU, Key-Value Cache, Attention Mechanisms
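    • Code sketch: a hypothetical per-token cache decision illustrating the DMC idea: at each step either append the new key/value pair or fold it into the most recent cache slot, so the cache grows sub-linearly. The random gate and plain averaging are stand-ins for the learned, weighted accumulation in the paper; this is not NVIDIA's implementation.

      ```python
      import torch

      def dmc_cache_update(cache_k, cache_v, k_t, v_t, merge_gate):
          """Either append the new key/value or merge it into the last slot."""
          if merge_gate > 0.5 and len(cache_k) > 0:  # merge: the cache stops growing
              cache_k[-1] = 0.5 * (cache_k[-1] + k_t)
              cache_v[-1] = 0.5 * (cache_v[-1] + v_t)
          else:                                      # append: standard KV-cache behavior
              cache_k.append(k_t)
              cache_v.append(v_t)
          return cache_k, cache_v

      cache_k, cache_v = [], []
      for t in range(16):
          k_t, v_t = torch.randn(8), torch.randn(8)
          gate = torch.rand(1).item()                # learned per head/layer in the paper
          cache_k, cache_v = dmc_cache_update(cache_k, cache_v, k_t, v_t, gate)
      print(len(cache_k))                            # fewer slots than the 16 tokens seen
      ```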
  • On Positivity Condition for Causal Inference (Poster)

    • Authors: Inwoo Hwang, Yesong Choe, Yeahoon Kwon, Sanghack Lee
    • Affiliations: Graduate School of Data Science, Seoul National University, Seoul, South Korea, Artificial Intelligence Institute, Seoul National University, Seoul, South Korea; Graduate School of Data Science, Seoul National University, Seoul, South Korea, Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
    • TL;DR: The study investigates the positivity condition necessary for causal effect identification in observational studies, particularly in scenarios where strict positivity may not hold. It explores various methodologies to derive identification formulas without relying on strict positivity, ultimately proposing a positivity-aware identification algorithm.
    • Keywords: Causal inference, observational studies, causal effect identification, do-calculus, Q-decomposition, adjustment criterion, backdoor criterion, g-computation, Strict positivity, unmeasured confounders, non-identifiability, Positivity-aware identification algorithm, identification formulas, Causal diagrams, causal graph, semi-Markovian model
  • Stable Differentiable Causal Discovery (Poster)

    • Authors: Achille Nazaret, Justin Hong, Elham Azizi, David Blei
    • Affiliations: Department of Computer Science, Columbia University, New York, USA; Irving Institute for Cancer Dynamics, Columbia University, New York, USA, Department of Computer Science, Columbia University, New York, USA; Department of Statistics, Columbia University, New York, USA, Department of Computer Science, Columbia University, New York, USA; Irving Institute for Cancer Dynamics, Columbia University, New York, USA; Department of Biomedical Engineering, Columbia University, New York, USA
    • TL;DR: This paper introduces Stable Differentiable Causal Discovery (SDCD), a new method for inferring causal relationships represented as directed acyclic graphs (DAGs), which addresses the numerical instability and scalability issues of existing methods. SDCD demonstrates improved convergence speed and accuracy, making it applicable to datasets with thousands of variables.
    • Keywords: Causal Discovery, Directed Acyclic Graphs (DAGs), Differentiable Causal Discovery (DCD), Stable Differentiable Causal Discovery (SDCD), Spectral Acyclicity Constraint, Biology, Climate Science, Economics, NP-hard problem, numerical instability, scalability issues, New method (SDCD), improved convergence speed and accuracy, scalable to thousands of variables, Observational data, interventional data
  • Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective (Poster)

    • Authors: Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E Turner, Alireza Makhzani
    • Affiliations: University of Toronto, Canada, Vector Institute, Canada; University of Toronto, Canada, Cambridge University, United Kingdom, Vector Institute, Canada
    • TL;DR: This study investigates the effects of removing the square root from adaptive gradient methods, revealing that square-root-free methods can close the generalization gap to SGD on convolutional architectures while maintaining performance on transformers. The findings suggest new insights into adaptive methods and their role in modern training strategies.
    • Keywords: adaptive gradient methods, second-order methods, deep learning, Adam, SGD, convolutional architectures, diagonal and non-diagonal adaptive methods, transformers, convolutional neural networks (CNNs), generalization gap, performance discrepancies between adaptive methods and SGD, square-root-free adaptive methods, preconditioner invariance, improved training performance
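    • Code sketch: one Adam-like update with a switch that removes the square root from the preconditioner, turning the g/√v scaling into the second-order-style g/v scaling studied in the paper; bias correction is omitted for brevity, and the learning rate would need retuning since the two variants scale differently.

      ```python
      import numpy as np

      def adaptive_update(w, g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
                          square_root=True):
          """One preconditioned step; square_root=False divides by v itself."""
          m = b1 * m + (1 - b1) * g        # first-moment (momentum) estimate
          v = b2 * v + (1 - b2) * g * g    # second-moment estimate / preconditioner
          denom = np.sqrt(v) + eps if square_root else v + eps
          return w - lr * m / denom, m, v

      w, m, v = np.ones(3), np.zeros(3), np.zeros(3)
      for _ in range(10):
          g = 2 * w                        # gradient of a toy quadratic loss ||w||^2
          w, m, v = adaptive_update(w, g, m, v, square_root=False)
      ```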
  • Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers (Poster)

    • Authors: Katherine Crowson, Stefan Baumann, Alex Birch, Tanishq Abraham, Daniel Kaplan, Enrico Shippole
    • Affiliations: Independent Researcher, Florida, United States, CompVis @ LMU Munich, Germany, Birchlabs, England, United Kingdom, Stability AI, United States, realiz.ai, New York, United States
    • TL;DR: The paper introduces the Hourglass Diffusion Transformer (HDiT), a novel image-generative model that scales efficiently with pixel count, enabling high-resolution training directly in pixel space. HDiT achieves competitive performance with existing models and sets a new state of the art for diffusion models on FFHQ-1024².
    • Keywords: image generation, high-resolution synthesis, Hourglass Diffusion Transformer (HDiT), Transformer architecture, convolutional U-Nets, image editing, video and audio generation, fine detail representation, training complexity, state-of-the-art performance on FFHQ-1024², competitive performance on ImageNet-256², ImageNet, FFHQ, diffusion models, latent diffusion models (LDMs), CNN-transformer-hybrid
  • Towards Optimal Adversarial Robust Q-learning with Bellman Infinity-error (Oral)

    • Authors: Haoran Li, Zicheng Zhang, Wang Luo, Congying Han, Yudong Hu, Tiande Guo, Shichen Liao
    • Affiliations: School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
    • TL;DR: This study investigates the existence of an optimal robust policy (ORP) in deep reinforcement learning under adversarial conditions, introducing a consistency assumption of policy (CAP) and demonstrating that the Bellman optimal policy serves as the ORP. The proposed Consistent Adversarial Robust Deep Q-Network (CAR-DQN) effectively minimizes Bellman Infinity-error, showcasing superior performance across various benchmarks.
    • Keywords: Adversarial robustness, Deep reinforcement learning (DRL), Bellman optimal policy, Consistent Adversarial Robust Deep Q-Network (CAR-DQN), L∞-norm, Optimal robust policy (ORP), state-adversarial robustness, adversarial attacks, Existence of deterministic and stationary ORP, minimization of Bellman Infinity-error, Markov decision process (MDP), state-adversarial paradigm, Bellman optimality equations
  • A Unified Framework for Learning with Nonlinear Model Classes from Arbitrary Linear Samples (Poster)

    • Authors: Ben Adcock, Juan Cardenas, Nick Dexter
    • Affiliations: Ann and H. J. Smead Department of Aerospace Engineering Sciences, University of Colorado Boulder, Boulder, Colorado, USA, Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada, Department of Scientific Computing, Florida State University, Tallahassee, Florida, USA
    • TL;DR: This paper introduces a unified framework for learning with nonlinear model classes from arbitrary linear samples, establishing learning guarantees that relate training data to model class for effective generalization. The framework accommodates various learning problems, including compressed sensing and active learning, providing a comprehensive approach to analyzing learning challenges.
    • Keywords: Learning from training data, Nonlinear model classes, Random linear measurements, Learning guarantees, Compressed sensing, Active learning, Regression, Generalization bounds, Framework for learning, Variation of model class, Hilbert spaces, Finite-dimensional subspaces
  • Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices (Poster)

    • Authors: Nathaniel Cohen, Vladimir Kulikov, Matan Kleiner, Inbar Huberman-Spiegelglas, Tomer Michaeli
    • Affiliations: Mines Paris – PSL Research University, Paris, France; Technion – Israel Institute of Technology, Haifa, Israel, Technion – Israel Institute of Technology, Haifa, Israel
    • TL;DR: This paper presents Slicedit, a novel zero-shot method for text-based video editing that utilizes pretrained text-to-image diffusion models to enhance temporal consistency in videos with complex nonrigid motion. The method effectively edits videos while preserving the original structure and motion, demonstrating significant advantages over existing techniques.
    • Keywords: video editing, text-to-image diffusion models, spatiotemporal slices, image synthesis, temporal consistency, nonrigid motion, occlusions, Slicedit method, enhanced temporal consistency
  • The Expressive Power of Path-Based Graph Neural Networks (Poster)

    • Authors: Caterina Graziani, Tamara Drucks, Fabian Jogl, Monica Bianchini, franco scarselli, Thomas Gärtner
    • Affiliations: Department of Information Engineering and Mathematics, University of Siena, Siena, Italy, RUML, TU Wien, Vienna, Austria, RUML, TU Wien, Vienna, Austria; CAIML, TU Wien, Vienna, Austria
    • TL;DR: This study introduces PATH-WL, a novel class of path-based graph neural networks that enhances expressive power by utilizing paths and shortest path distance information. The findings demonstrate that PATH-WL can count cycles and distinguish a broader range of graph classes compared to existing methods, establishing a new hierarchy of expressive graph neural networks.
    • Keywords: graph neural networks, expressive power, path-based methods, PATH-WL, color refinement algorithms, message passing, strongly regular graphs, graph isomorphism, limitations of 1-WL, counting cycles, distinguishing non-isomorphic graphs, new hierarchy of expressive GNNs, empirical results on graph classes, Weisfeiler-Leman (1-WL), k-WL
  • Noise-Aware Algorithm for Heterogeneous Differentially Private Federated Learning (Poster)

    • Authors: Saber Malekmohammadi, Yaoliang Yu, YANG CAO
    • Affiliations: School of Computer Science, University of Waterloo, Waterloo, Canada; Vector Institute, Toronto, Canada, Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
    • TL;DR: This paper presents Robust-HDP, an algorithm designed to enhance utility and convergence speed in heterogeneous differentially private federated learning systems, particularly in scenarios with untrusted servers. The method effectively estimates noise levels in clients' model updates, addressing challenges related to varying privacy requirements and dataset sizes.
    • Keywords: Federated Learning, Differential Privacy, Robust-HDP, Local Differential Privacy (LDP), Central Differential Privacy (CDP), Robust PCA (RPCA), Heterogeneity in privacy requirements, noise level variation in model updates, untrusted server challenges, Improved utility and convergence speed, noise level estimation in model updates, Differential Privacy (DP), privacy parameter (ϵ), DPSGD algorithm
  • Environment Design for Inverse Reinforcement Learning (Oral)

    • Authors: Thomas Kleine Buening, Victor Villin, Christos Dimitrakakis
    • Affiliations: The Alan Turing Institute, London, UK, Université de Neuchâtel, Neuchâtel, Switzerland
    • TL;DR: This study proposes a framework for adaptive environment design in Inverse Reinforcement Learning to improve sample-efficiency and robustness in learning reward functions from expert demonstrations. The authors demonstrate that intelligently selecting environments can enhance the learning process, addressing challenges related to overfitting and changes in environment dynamics.
    • Keywords: Inverse Reinforcement Learning, Adaptive Environment Design, Bayesian IRL, Maximum Entropy IRL, Autonomous Decision-Making, Robotics, Low Sample-Efficiency, Overfitting to Environment Dynamics, Improved Sample-Efficiency, Robustness of Learned Rewards
  • A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity (Oral)

    • Authors: Andrew Lee, Xiaoyan Bai, Itamar Pres, Martin Wattenberg, Jonathan K. Kummerfeld, Rada Mihalcea
    • Affiliations: University of Michigan, Ann Arbor, U.S.A., Harvard University, Cambridge, Massachusetts, University of Sydney, Sydney, Australia
    • TL;DR: This study investigates the mechanisms by which direct preference optimization (DPO) aligns pre-trained language models to reduce toxicity. It reveals that while toxic capabilities are not removed, they are bypassed, and provides insights into how these models can be un-aligned.
    • Keywords: alignment algorithms, toxicity reduction, pre-trained language models, direct preference optimization (DPO), reinforcement learning with human preferences (RLHF), proximal policy optimization (PPO), singular value decomposition (SVD), natural language processing, toxicity, undesirable behaviors in language models, mechanisms of alignment, un-aligning models, understanding model behavior, pairwise preference dataset, large language models, GPT2-medium, Llama2-7b
  • A Dynamic Algorithm for Weighted Submodular Cover Problem (Oral)

    • Authors: Kiarash Banihashem, Samira Goudarzi, MohammadTaghi Hajiaghayi, Peyman Jabbarzade, Morteza Monemizadeh
    • Affiliations: Department of Mathematics and Computer Science, TU Eindhoven, the Netherlands, Department of Computer Science, University of Maryland, MD, USA
    • TL;DR: This study introduces a dynamic algorithm for the weighted submodular cover problem, focusing on maintaining an approximately optimal solution amidst element insertions and deletions. The proposed randomized algorithm achieves a (1 − O(ϵ), O(ϵ⁻¹))-bicriteria approximation with low query complexity per update.
    • Keywords: submodular cover problem, dynamic setting, optimization, randomized algorithm, bicriteria approximation, data summarization, active learning, network inference, video analysis, facility location, maintaining approximately optimal solution, low query complexity, updates to ground set, (1 − O(ϵ), O(ϵ⁻¹))-bicriteria approximation, monotone submodular function, ground set
  • A Tale of Tails: Model Collapse as a Change of Scaling Laws (Poster)

    • Authors: Elvis Dohmatob, Yunzhen Feng, Pu Yang, Francois Charton, Julia Kempe
    • Affiliations: Meta FAIR, Center for Data Science, New York University; Courant Institute, New York University, School of Mathematical Sciences, Peking University
    • TL;DR: This study investigates how the introduction of synthetic data into training datasets affects the scaling laws of AI models, potentially leading to model collapse. The authors develop a theoretical framework to analyze various decay phenomena and validate their findings through experiments with a transformer model.
    • Keywords: synthetic data, model collapse, scaling laws, transformer, large language model, text generation, arithmetic tasks, loss of scaling, un-learning of skills, contamination of datasets, theoretical framework of model collapse, decay phenomena, LAION-5B, Llama2, generative AI, neural scaling laws, AIGC (AI Generated Content)
  • Time-Series Forecasting for Out-of-Distribution Generalization Using Invariant Learning (Poster)

    • Authors: Haoxin Liu, Harshavardhan Kamarthi, Lingkai Kong, Zhiyuan Zhao, Chao Zhang, B. Aditya Prakash
    • Affiliations: School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, USA
    • TL;DR: This paper proposes FOIL, a model-agnostic framework that enhances time-series forecasting models' out-of-distribution generalization capabilities through invariant learning. The approach addresses challenges posed by unobserved variables and environment inference, achieving performance improvements of up to 85% across various forecasting models.
    • Keywords: Time-series forecasting, Out-of-distribution (OOD) generalization, Invariant learning, Surrogate loss, Multi-head network, Public health, Finance, Urban computing, Unobserved variables, Temporal distribution shifts, FOIL framework, Improved performance of TSF models
  • From Coarse to Fine: Enable Comprehensive Graph Self-supervised Learning with Multi-granular Semantic Ensemble (Oral)

    • Authors: Qianlong Wen, Mingxuan Ju, Zhongyu Ouyang, Chuxu Zhang, Yanfang Ye
    • Affiliations: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA, School of Computer Science, Brandeis University, Waltham, MA, USA, Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA; Snap Inc., Bellevue, WA, USA
    • TL;DR: This study introduces the Multi-granularity Graph Semantic Ensemble (MGSE) framework to enhance self-supervised learning in graph models by capturing multi-granular knowledge. Experimental results demonstrate that MGSE can improve the performance of existing graph SSL frameworks by up to 9.2%.
    • Keywords: Self-supervised learning (SSL), Graph learning, Multi-granularity Graph Semantic Ensemble, Knowledge Distillation, Drug discovery, Protein analysis, Social network analysis, Data sparsity, Generalization across downstream tasks, Performance improvement in graph SSL frameworks, Graph neural networks (GNNs), Teacher-student model
  • Sparser, Better, Deeper, Stronger: Improving Static Sparse Training with Exact Orthogonal Initialization (Poster)

    • Authors: Aleksandra I. Nowak, Łukasz Gniecki, Filip Szatkowski, Jacek Tabor
    • Affiliations: Warsaw University of Technology, Jagiellonian University, Faculty of Mathematics and Computer Science; Jagiellonian University, Doctoral School of Exact and Natural Sciences; IDEAS NCBR, Jagiellonian University, Faculty of Mathematics and Computer Science
    • TL;DR: This study introduces Exact Orthogonal Initialization (EOI) as a novel method for static sparse training, demonstrating its effectiveness in training highly sparse neural networks without residual connections or normalization techniques. The findings highlight the importance of weight initialization when optimizing sparse models from scratch; a toy Givens-rotation construction follows this entry.
    • Keywords: static sparse training, neural network compression, Exact Orthogonal Initialization (EOI), Givens rotations, sparse initialization, optimization challenges, superior effectiveness and efficiency of EOI, training highly sparse networks, MLP (Multi-Layer Perceptron), CNN (Convolutional Neural Networks)
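
As a rough illustration of the Givens-rotation idea, not the authors' exact construction, composing a small number of random Givens rotations yields a matrix that is exactly orthogonal yet sparse:

```python
import numpy as np

def random_givens_orthogonal(n, num_rotations, seed=0):
    """Build an exactly orthogonal n x n matrix as a product of random
    Givens rotations; few rotations keep it sparse.
    """
    rng = np.random.default_rng(seed)
    Q = np.eye(n)
    for _ in range(num_rotations):
        i, j = rng.choice(n, size=2, replace=False)   # plane to rotate in
        theta = rng.uniform(0.0, 2.0 * np.pi)
        c, s = np.cos(theta), np.sin(theta)
        G = np.eye(n)
        G[i, i], G[j, j] = c, c
        G[i, j], G[j, i] = -s, s
        Q = G @ Q                                     # touches only rows i and j
    return Q

Q = random_givens_orthogonal(8, num_rotations=6)
print("orthogonality error:", np.abs(Q.T @ Q - np.eye(8)).max())
print("density:", np.mean(np.abs(Q) > 1e-12))
```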
  • TIC-TAC: A Framework For Improved Covariance Estimation In Deep Heteroscedastic Regression (Poster)

    • Authors: Megh Shukla, Mathieu Salzmann, Alexandre Alahi
    • Affiliations: Ecole Polytechnique Fédérale de Lausanne (EPFL), Ecole Polytechnique Fédérale de Lausanne (EPFL); Swiss Data Science Center (SDSC)
    • TL;DR: This study presents a framework called TIC-TAC for improving covariance estimation in deep heteroscedastic regression, addressing the challenges of sub-optimal convergence. The proposed methods, TIC and TAC, enhance the accuracy of covariance predictions and facilitate better optimization outcomes.
    • Keywords: deep heteroscedastic regression, covariance estimation, Taylor Induced Covariance (TIC), Task Agnostic Correlations (TAC), sub-optimal convergence, challenges in covariance estimation, improved convergence of negative log-likelihood, accurate covariance learning, synthetic datasets, real-world datasets
  • On The Fairness Impacts of Hardware Selection in Machine Learning (Poster)

    • Authors: Sree Harsha Nelaturu, Nishaanth Kanna, Cuong Tran, Sara Hooker, Ferdinando Fioretto
    • Affiliations: Cohere For AI, University of Virginia; Cohere For AI, Cohere For AI Community; Saarland University, Cohere For AI Community, Dyania Health; University of Virginia
    • TL;DR: This study investigates how hardware selection impacts fairness and performance in machine learning models, revealing that different hardware can exacerbate disparities among demographic groups. The authors propose a theoretical framework to quantify these disparities and suggest strategies to mitigate hardware-induced performance imbalances.
    • Keywords: fairness in machine learning, hardware selection, machine learning as-a-service, model training and deployment, performance disparities, ethical application of ML, gradient flow variations, loss surface differences, strategy for mitigating performance imbalances, theoretical framework for quantifying disparities
  • Practical Hamiltonian Monte Carlo on Riemannian Manifolds via Relativity Theory (Poster)

    • Authors: Kai Xu, Hong Ge
    • Affiliations: MIT-IBM Watson AI Lab, Cambridge MA, United States, University of Cambridge, Cambridge, United Kingdom
    • TL;DR: This paper presents a method to enhance the stability of Hamiltonian Monte Carlo sampling on Riemannian manifolds by introducing position-dependent velocity norms, which effectively mitigate numerical errors in high curvature regions. The proposed approach generalizes existing techniques and offers a more robust algorithm for sampling from relativistic momentum distributions.
    • Keywords: Hamiltonian Monte Carlo, Riemannian Manifolds, Numerical Stability, Hamiltonian dynamics, momentum distributions, numerical integration, Statistical physics, neuroscience, bioinformatics, social science, machine learning, Integration instability, high curvature regions, numerical errors, Position-dependent velocity norms, tractable algorithms for relativistic momentum distributions
  • Challenges in Training PINNs: A Loss Landscape Perspective (Oral)

    • Authors: Pratik Rathore, Weimu Lei, Zachary Frangella, Lu Lu, Madeleine Udell
    • Affiliations: Department of Statistics and Data Science, Yale University, New Haven, CT, USA, ICME, Stanford University, Stanford, CA, USA; Department of Management Science & Engineering, Stanford University, Stanford, CA, USA, ICME, Stanford University, Stanford, CA, USA, Department of Management Science & Engineering, Stanford University, Stanford, CA, USA, Department of Electrical Engineering, Stanford University, Stanford, CA, USA; ICME, Stanford University, Stanford, CA, USA
    • TL;DR: This study investigates the challenges of training Physics-Informed Neural Networks (PINNs) by analyzing the loss landscape and proposes a novel second-order optimizer, NysNewton-CG (NNCG), which enhances performance compared to traditional methods. The findings highlight the importance of combining first- and second-order optimization techniques to effectively minimize the PINN loss function and improve solutions to complex partial differential equations.
    • Keywords: Physics-Informed Neural Networks (PINNs), optimization challenges in training neural networks, Adam optimizer, L-BFGS, Adam+L-BFGS, NysNewton-CG (NNCG), Solving Partial Differential Equations (PDEs), Ill-conditioning in loss landscape, difficulties in minimizing PINN loss function, Improved optimization strategies for training PINNs, insights into loss landscape
  • High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise (Oral)

    • Authors: Eduard Gorbunov, Abdurakhmon Sadiev, Marina Danilova, Samuel Horváth, Gauthier Gidel, Pavel Dvurechenskii, Alexander Gasnikov, Peter Richtarik
    • Affiliations: King Abdullah University of Science and Technology, KSA, University Innopolis, Russia; Ivannikov Institute for System Programming RAS, Russia; Skolkovo Institute of Science and Technology, Russia, Université de Montréal and Mila, Canada; Canada CIFAR AI Chair, Weierstrass Institute for Applied Analysis and Stochastics, Germany, Mohamed bin Zayed University of Artificial Intelligence, UAE, Moscow Institute of Physics and Technology, Russia
    • TL;DR: This paper presents new stochastic methods for composite and distributed optimization that use gradient clipping to achieve high-probability convergence results, addressing limitations of existing methods under heavy-tailed noise. The findings advance both the theoretical understanding and the practical application of optimization techniques in machine learning; a minimal clipped-SGD sketch follows this entry.
    • Keywords: high-probability convergence, stochastic optimization, composite optimization, distributed optimization, Prox-SGD, Parallel SGD, gradient clipping, machine learning, federated learning, heavy-tailed noise, convergence issues, optimization under weak assumptions, new stochastic methods, high-probability convergence results, methods for variational inequalities, variational inequalities, strongly convex problems
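
A minimal sketch of the gradient-clipping device behind these guarantees, run on a toy quadratic with heavy-tailed (Cauchy) gradient noise; the setup is illustrative, not the paper's methods:

```python
import numpy as np

def clipped_sgd(grad_fn, x0, steps=1000, lr=0.05, clip_level=1.0, seed=0):
    """SGD where each stochastic gradient is clipped to a fixed norm,
    the device used to obtain high-probability bounds under heavy tails."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_fn(x, rng)
        norm = np.linalg.norm(g)
        if norm > clip_level:
            g = g * (clip_level / norm)   # shrink magnitude, keep direction
        x = x - lr * g
    return x

def noisy_grad(x, rng):
    # Gradient of ||x||^2 plus standard Cauchy noise (infinite variance).
    return 2.0 * x + rng.standard_cauchy(size=x.shape)

print(clipped_sgd(noisy_grad, x0=np.full(5, 5.0)))  # ends near the origin
```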
  • Adversarial Attacks on Combinatorial Multi-Armed Bandits (Poster)

    • Authors: Rishab Balasubramanian, Jiawei Li, Tadepalli Prasad, Huazheng Wang, Qingyun Wu, Haoyu Zhao
    • Affiliations: Princeton University, Pennsylvania State University, Oregon State University, University of Illinois Urbana-Champaign
    • TL;DR: This study investigates reward poisoning attacks on Combinatorial Multi-armed Bandits (CMAB) and establishes a sufficient and necessary condition for attackability, revealing that it depends on whether the bandit instance is known to the adversary. The findings indicate that adversarial attacks on CMAB are complex and that no universal attack strategy exists when the environment is unknown.
    • Keywords: Adversarial attacks, Combinatorial Multi-armed Bandits (CMAB), Reward poisoning attacks, Attack algorithm, Online advertising, Recommendation, Ranking, Influence maximization, Attackability of CMAB, Vulnerability and robustness to poisoning attacks, Sufficient and necessary condition for attackability, Validation through experiments, Multi-armed bandits (MAB), Semi-bandit feedback
  • Safe Exploration in Dose Finding Clinical Trials with Heterogeneous Participants (Poster)

    • Authors: Isabel Chien, Wessel Bruinsma, Javier Gonzalez, Richard E Turner
    • Affiliations: University of Cambridge, Cambridge, UK, Microsoft Research AI for Science, Microsoft Research
    • TL;DR: The study presents SAFE-T, an adaptive dose-finding procedure that prioritizes participant safety and efficacy while accommodating heterogeneous populations. It demonstrates improved performance over traditional methods in identifying optimal drug doses, addressing ethical concerns in clinical trials.
    • Keywords: dose-finding clinical trials, participant heterogeneity, adaptive trial methods, Safe Allocation for Exploration of Treatments (SAFE-T), non-parametric multi-output Gaussian process models, Bayesian optimization, drug development, clinical trials, participant safety, participant benefit, toxicity, efficacy, ethical concerns, societal health inequalities, accurate final dose recommendations, theoretical guarantees for safety constraints
  • Parameterized Physics-informed Neural Networks for Parameterized PDEs (Oral)

    • Authors: Woojin Cho, Minju Jo, Haksoo Lim, Kookjin Lee, Dongeun Lee, Sanghyun Hong, Noseong Park
    • Affiliations: Oregon State University, Yonsei University; Arizona State University, Arizona State University, Yonsei University, LG CNS, Texas A&M University-Commerce, KAIST
    • TL;DR: This paper introduces parameterized physics-informed neural networks (P2INNs) to efficiently model solutions of parameterized PDEs, significantly improving accuracy and parameter efficiency compared to traditional PINNs. The findings demonstrate P2INNs' effectiveness in overcoming known limitations of existing methods in scientific machine learning applications.
    • Keywords: Physics-informed neural networks, Scientific machine learning, Parameterized physics-informed neural networks (P2INNs), Neural networks, Design optimization, Uncertainty quantification, Repetitive and time-consuming training of PINNs, Evaluation of PDE solutions in parameter space, Improved accuracy and parameter efficiency of P2INNs, Overcoming failure modes of PINNs, Partial differential equations (PDEs), Latent representation, Governing physical laws
  • Position: The Causal Revolution Needs Scientific Pragmatism (Poster)

    • Authors: Joshua Loftus
    • Affiliations: Department of Statistics, London School of Economics, London, UK
    • TL;DR: The paper argues for the adoption of scientific pragmatism to facilitate the progress of causal models in empirical sciences, which are currently hindered by conflicting academic perspectives. It emphasizes the importance of using causal models as tools for hypothetical reasoning to unlock their potential benefits.
    • Keywords: Causal models, Scientific pragmatism, Knowledge generation, Predictive models, Empirical sciences, Health sciences, Social sciences, Economics, Computer science, Statistics, Stalled progress in causal methods, Scientific perfectionism, System-centric inductive biases, Structural causal models (SCMs), Directed acyclic graphs (DAGs)
  • Agnostic Interactive Imitation Learning: New Theory and Practical Algorithms (Poster)

    • Authors: Yichen Li, Chicheng Zhang
    • Affiliations: Department of Computer Science, University of Arizona, Tucson, AZ, USA
    • TL;DR: This study introduces new algorithms for interactive imitation learning that allow learners to query experts for action annotations, aiming to learn competitive policies with minimal expert input. The proposed methods, MFTPL-P and BOOTSTRAP-DAGGER, demonstrate significant improvements over existing imitation learning approaches in continuous control tasks.
    • Keywords: Interactive Imitation Learning, Agnostic Learning, MFTPL-P (Mixed Follow the Perturbed Leader with Poisson perturbations), BOOTSTRAP-DAGGER, Continuous Control Tasks, Covariate Shift, Data Collection Methods, Oracle-efficient algorithms, Finite-sample guarantees
  • PruNeRF: Segment-Centric Dataset Pruning via 3D Spatial Consistency (Poster)

    • Authors: Yeonsung Jung, Heecheol Yun, Joonhyung Park, Jin-Hwa Kim, Eunho Yang
    • Affiliations: Graduate School of AI, Korea Advanced Institute of Science and Technology (KAIST), Republic of Korea; AITRICS, Republic of Korea, Graduate School of AI, Korea Advanced Institute of Science and Technology (KAIST), Republic of Korea, NAVER AI Lab, Republic of Korea; AI Institute of Seoul National University, Republic of Korea
    • TL;DR: This paper introduces PruNeRF, a segment-centric dataset pruning framework that effectively identifies and removes distractors in Neural Radiance Fields (NeRF) training images. The proposed method demonstrates improved robustness against distractors, addressing a significant challenge in 3D scene learning.
    • Keywords: Neural Radiance Fields (NeRF), 3D scene learning, dataset pruning, Influence Functions, depth-based reprojection, segmentation, 3D scene synthesis, image processing, Vulnerability to distractors, dataset curation challenges, inconsistency in supervision, PruNeRF framework, improved robustness against distractors
  • Bridging discrete and continuous state spaces: Exploring the Ehrenfest process in time-continuous diffusion models (Poster)

    • Authors: Ludwig Winkler, Lorenz Richter, Manfred Opper
    • Affiliations: Technical University of Berlin, Zuse Institute Berlin; dida Datenschmiede GmbH, Technical University of Berlin; University of Birmingham; University of Potsdam
    • TL;DR: This study explores the relationship between time-continuous Markov jump processes on discrete state spaces and their continuous-state diffusion counterparts, particularly focusing on the Ehrenfest process. The authors propose a new algorithm for training the time-reversal of these processes, demonstrating its effectiveness through numerical experiments.
    • Keywords: generative modeling, stochastic processes, time-reversal, Markov jump processes, Ornstein-Uhlenbeck process, stochastic differential equations (SDEs), denoising score matching, discrete data, text, images, graph structures, biological data, bridging discrete and continuous state spaces, improved convergence, new algorithms for training time-reversal of Markov jump processes, Ehrenfest process, score functions, rate functions
  • Conformal Validity Guarantees Exist for Any Data Distribution (and How to Find Them) (Poster)

    • Authors: Drew Prinster, Samuel Stanton, Anqi Liu, Suchi Saria
    • Affiliations: Department of Computer Science, Johns Hopkins University, Baltimore, MD, U.S.A., Prescient Design, Genentech, New York City, NY, U.S.A.
    • TL;DR: This paper demonstrates that conformal prediction can be extended to any joint data distribution, addressing the limitations of previous methods that assumed exchangeability. The authors propose concrete algorithms for practical AI/ML applications, particularly under covariate shifts induced by active learning and black-box optimization; a basic split-conformal sketch follows this entry.
    • Keywords: uncertainty quantification, conformal prediction, AI/ML risk management, weighted exchangeability, black-box optimization, active learning, data distribution shifts, uncertainty estimation challenges, algorithms for covariate shifts, empirical evaluation methods, feedback-loop shifts, quasi-exchangeability
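
For orientation, here is the basic split-conformal computation under exchangeability, the recipe the paper generalizes by reweighting calibration points (data and names are illustrative):

```python
import numpy as np

def conformal_half_width(residuals, alpha=0.1):
    """Split conformal prediction: turn calibration residuals into an
    interval half-width that covers a fresh point with prob. >= 1 - alpha
    (valid when calibration and test points are exchangeable).
    """
    n = len(residuals)
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample correction
    return np.quantile(residuals, q_level, method="higher")

rng = np.random.default_rng(0)
residuals = np.abs(rng.normal(size=500))   # |y - model(x)| on a calibration set
print(f"predict y_hat(x) +/- {conformal_half_width(residuals):.3f}")
```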
  • Coprocessor Actor Critic: A Model-Based Reinforcement Learning Approach For Adaptive Brain Stimulation (Poster)

    • Authors: Michelle Pan, Mariah Schrum, Vivek Myers, Erdem Biyik, Anca Dragan
    • Affiliations: Department of Computer Science, University of Southern California, Department of Electrical Engineering and Computer Sciences, UC Berkeley
    • TL;DR: This study presents the Coprocessor Actor Critic, a model-based reinforcement learning approach for adaptive brain stimulation aimed at treating neurological conditions like Parkinson's disease and post-stroke motor deficits. The proposed method demonstrates improved sample efficiency and task success compared to traditional methods, highlighting its potential for personalized rehabilitation strategies.
    • Keywords: adaptive brain stimulation, neurological conditions, brain-computer interface, model-based reinforcement learning (MBRL), coprocessor policy learning, rehabilitation, stroke recovery, motor control, patient heterogeneity, closed-loop coprocessor policies, motor deficits, improved sample efficiency, task success, individualized coprocessor policies
  • Learning to Continually Learn with the Bayesian Principle (Poster)

    • Authors: Soochan Lee, Hyeonseong Jeon, Jaehyeon Son, Gunhee Kim
    • Affiliations: Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
    • TL;DR: This study proposes a novel meta-continual learning framework that leverages sequential Bayesian updates to mitigate catastrophic forgetting in neural networks while maintaining their representational power. The approach demonstrates improved performance and scalability across various domains; a minimal sequential-Bayes sketch follows this entry.
    • Keywords: continual learning, meta-learning, sequential Bayesian update, stochastic gradient descent, catastrophic forgetting, NP-hard problem, meta-continual learning framework, improved performance, scalability
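
A minimal sketch of the kind of exact, order-independent sequential Bayesian update the framework builds on, shown here for linear regression with known noise (the paper applies such updates on top of meta-learned components; the setup below is illustrative):

```python
import numpy as np

class SequentialBayesLinear:
    """Exact sequential Bayesian linear regression: the posterior after
    each batch becomes the prior for the next, so the final posterior is
    independent of task order -- no catastrophic forgetting."""

    def __init__(self, dim, prior_prec=1.0, noise_prec=10.0):
        self.P = prior_prec * np.eye(dim)   # posterior precision
        self.b = np.zeros(dim)              # precision-weighted mean
        self.beta = noise_prec

    def update(self, X, y):
        self.P += self.beta * X.T @ X
        self.b += self.beta * X.T @ y

    def mean(self):
        return np.linalg.solve(self.P, self.b)

rng = np.random.default_rng(0)
w_true = rng.normal(size=4)
model = SequentialBayesLinear(dim=4)
for _ in range(5):                          # five sequential "tasks"
    X = rng.normal(size=(30, 4))
    model.update(X, X @ w_true + 0.1 * rng.normal(size=30))
print(np.round(model.mean() - w_true, 3))   # close to zero
```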
  • Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data (Poster)

    • Authors: Xuran Meng, Difan Zou, Yuan Cao
    • Affiliations: Department of Statistics and Actuarial Science, University of Hong Kong, Hong Kong; Department of Mathematics, University of Hong Kong, Hong Kong, Department of Computer Science, University of Hong Kong, Hong Kong, Department of Statistics and Actuarial Science, University of Hong Kong, Hong Kong
    • TL;DR: This study investigates the benign overfitting phenomenon in two-layer ReLU convolutional neural networks for XOR-type classification tasks, demonstrating that these networks can achieve near Bayes-optimal accuracy even in the presence of label-flipping noise. The findings establish conditions under which over-parameterized models can effectively learn complex patterns that linear models cannot.
    • Keywords: benign overfitting, deep learning, over-parameterization, two-layer ReLU convolutional neural networks (CNNs), gradient descent, binary classification, XOR-type classification tasks, overfitting, label-flipping noise, sample complexity, signal-to-noise ratio, near Bayes-optimal accuracy, upper and lower bounds of test error
  • Repoformer: Selective Retrieval for Repository-Level Code Completion (Oral)

    • Authors: Di Wu, Wasi Ahmad, Dejiao Zhang, Murali Krishna Ramanathan, Xiaofei Ma
    • Affiliations: University of California Los Angeles, AWS AI Labs
    • TL;DR: This paper proposes a selective retrieval framework for repository-level code completion that enhances efficiency and performance by avoiding unnecessary retrievals. The framework achieves state-of-the-art results and demonstrates significant speed improvements while maintaining output quality.
    • Keywords: repository-level code completion, retrieval-augmented generation (RAG), self-supervised learning, selective retrieval, inefficiency in retrieval, performance degradation from irrelevant information, state-of-the-art performance, 70% inference speedup, RepoEval, CrossCodeEval, CrossCodeLongEval, code language models (code LMs), cross-file contexts
  • Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization (Spotlight Poster)

    • Authors: Mudit Gaur, Amrit Singh Bedi, Di Wang, Vaneet Aggarwal
    • Affiliations: School of IE and School of ECE, Purdue University, West Lafayette, IN, U.S.A, Department of Computer Science, University of Central Florida, Department of Statistics, Purdue University, West Lafayette, IN, U.S.A, Department of Computer Science, KAUST
    • TL;DR: This study provides a comprehensive theoretical analysis of Actor-Critic algorithms, addressing practical aspects such as multi-layer neural network parametrization and Markovian sampling. The authors establish global convergence sample complexity bounds, highlighting the importance of aligning theoretical models with real-world applications.
    • Keywords: Actor-Critic algorithms, reinforcement learning, Multi-layer neural network parametrization, Markovian sampling, Games, network scheduling, robotics, autonomous driving, video streaming, Gap between theoretical analysis and practical implementations, global convergence, Global convergence sample complexity bounds, weak gradient domination property, MMCLG criteria
  • SqueezeLLM: Dense-and-Sparse Quantization (Poster)

    • Authors: Sehoon Kim, Coleman Hooper, Amir Gholaminejad, Zhen Dong, Xiuyu Li, Sheng Shen, Michael Mahoney, Kurt Keutzer
    • Affiliations: UC Berkeley; ICSI; LBNL, UC Berkeley
    • TL;DR: This study introduces SqueezeLLM, a post-training quantization framework that enables lossless compression of generative large language models to precisions as low as 3 bits, significantly improving performance and reducing memory bandwidth bottlenecks. The framework achieves up to 2.3× speedup on an A6000 GPU compared to the FP16 baseline; a toy dense-and-sparse sketch follows this entry.
    • Keywords: Generative Large Language Models, Model Deployment, Quantization, Sensitivity-based Non-uniform Quantization, Dense-and-Sparse Decomposition, Memory Bandwidth Bottleneck, Resource Requirements for Inference, Lossless Compression, 3-bit Quantization, Performance Improvement, LLaMA Models, A6000 GPU
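
A toy sketch of the dense-and-sparse idea: keep the few largest-magnitude weights exact in a sparse matrix, and quantize the remaining dense part with a non-uniform codebook found by 1-D k-means (the paper additionally weights the clustering by sensitivity; everything below is illustrative):

```python
import numpy as np

def dense_and_sparse_quantize(W, bits=3, outlier_frac=0.005, iters=20, seed=0):
    """Split W into exact sparse outliers + a k-means-quantized dense part."""
    rng = np.random.default_rng(seed)
    W = W.astype(float)
    cutoff = np.quantile(np.abs(W), 1.0 - outlier_frac)
    sparse = np.where(np.abs(W) > cutoff, W, 0.0)     # exact outliers
    dense = W - sparse
    vals = dense.ravel()
    centers = rng.choice(vals, size=2 ** bits, replace=False)
    for _ in range(iters):                            # 1-D k-means codebook
        assign = np.argmin(np.abs(vals[:, None] - centers[None, :]), axis=1)
        for k in range(len(centers)):
            if np.any(assign == k):
                centers[k] = vals[assign == k].mean()
    assign = np.argmin(np.abs(vals[:, None] - centers[None, :]), axis=1)
    dense_q = centers[assign].reshape(W.shape)
    return dense_q, sparse

W = np.random.default_rng(1).normal(size=(64, 64))
W[0, 0] = 25.0                                        # one outlier weight
dense_q, sparse = dense_and_sparse_quantize(W)
print("max reconstruction error:", np.abs(W - (dense_q + sparse)).max())
```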
  • Disparate Impact on Group Accuracy of Linearization for Private Inference (Poster)

    • Authors: Saswat Das, Marco Romanelli, Ferdinando Fioretto
    • Affiliations: New York University, New York, NY, USA, University of Virginia, Charlottesville, VA, USA
    • TL;DR: This study investigates the impact of linearizing ReLU activations in neural networks for private inference, revealing that while it reduces computational costs, it disproportionately harms the accuracy of minority groups. The authors propose a fine-tuning strategy to mitigate these fairness issues.
    • Keywords: Private Inference, Fairness in Machine Learning, Linearization of Non-linear Activations, ReLU Activations, Machine Learning, Facial Recognition, Computational Challenge of Cryptographic Computations, Disparate Impact on Accuracy for Minority Groups, Mitigation Strategies for Accuracy Loss, Mathematical Interpretation of Decision Boundaries, UTKFaces, ResNet18
  • Enabling Uncertainty Estimation in Iterative Neural Networks (Poster)

    • Authors: Nikita Durasov, Doruk Oner, Jonathan Donier, Hieu Le, Pascal Fua
    • Affiliations: Computer Vision Laboratory, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, Neural Concept SA, Lausanne, Switzerland
    • TL;DR: This paper presents a method for uncertainty estimation in iterative neural networks by leveraging the convergence rate of successive outputs as a proxy for accuracy. The approach achieves state-of-the-art results with lower computational costs compared to traditional methods like ensembles, demonstrating practical applications in road detection and aerodynamic property estimation.
    • Keywords: uncertainty estimation, iterative neural networks, road detection, aerodynamic properties estimation, convergence rate, accuracy of predictions, state-of-the-art uncertainty estimates, computational cost reduction
  • SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals (Poster)

    • Authors: Rahul Thapa, Bryan He, Magnus Ruud Kjaer, Hyatt Moore, Gauri Ganjoo, Emmanuel Mignot, James Zou
    • Affiliations: Department of Psychiatry and Behavioral Sciences, Stanford University, Department of Health Technology, Technical University of Denmark, Department of Biomedical Data Science, Stanford University; Department of Computer Science, Stanford University, Department of Psychiatry and Behavioral Sciences, Stanford University, Department of Biomedical Data Science, Stanford University, Department of Computer Science, Stanford University
    • TL;DR: This study introduces SleepFM, a multi-modal foundation model for sleep analysis that leverages a large polysomnography dataset to improve sleep stage classification and sleep disordered breathing detection through a novel contrastive learning approach. The model outperforms traditional methods, demonstrating the importance of integrating diverse physiological signals for comprehensive sleep health assessment.
    • Keywords: multi-modal representation learning, sleep analysis, contrastive learning, logistic regression, convolutional neural networks (CNN), sleep disorder detection, sleep stage classification, manual visual inspection of sleep data, reliance on labeled data, integration of diverse physiological signals, SleepFM model, improved performance metrics (AUROC, AUPRC), polysomnography dataset, 14,000 participants, 100,000 hours of recordings, Brain Activity Signals (BAS), sleep disordered breathing (SDB)
  • An LLM Compiler for Parallel Function Calling (Poster)

    • Authors: Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael Mahoney, Kurt Keutzer, Amir Gholaminejad
    • Affiliations: UC Berkeley; ICSI; LBNL, UC Berkeley; ICSI, UC Berkeley
    • TL;DR: This paper introduces LLMCompiler, a system designed to execute function calls in parallel, addressing the inefficiencies of sequential function calling in Large Language Models. The results demonstrate significant improvements in latency, cost, and accuracy compared to existing methods like ReAct; a minimal parallel-execution sketch follows this entry.
    • Keywords: Large Language Models, Function Calling, Parallel Execution, LLMCompiler, Function Calling Planner, Task Fetching Unit, Executor, High latency, Cost inefficiency, Inaccurate behavior in sequential function calling, Latency speedup, Cost savings, Accuracy improvement, ReAct, Tool Calling, Classical Compilers
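
A minimal sketch of dependency-aware parallel tool execution in the spirit of LLMCompiler, with hypothetical tools and a hand-written plan standing in for the planner's output:

```python
import concurrent.futures as cf
import time

def search(q):                     # hypothetical tool: pretend network call
    time.sleep(0.5)
    return f"results({q})"

def summarize(*docs):              # hypothetical tool
    return " + ".join(docs)

# A plan as the planner might emit it: task -> (function, dependencies, args).
plan = {
    "t1": (search, [], ("ICML 2024",)),
    "t2": (search, [], ("state space models",)),
    "t3": (summarize, ["t1", "t2"], ()),
}

def execute(plan):
    results, pending = {}, dict(plan)
    with cf.ThreadPoolExecutor() as pool:
        while pending:
            # Tasks whose dependencies are all resolved run in parallel.
            ready = {t: s for t, s in pending.items()
                     if all(d in results for d in s[1])}
            futures = {pool.submit(fn, *(results[d] for d in deps), *args): t
                       for t, (fn, deps, args) in ready.items()}
            for fut in cf.as_completed(futures):
                results[futures[fut]] = fut.result()
            for t in ready:
                del pending[t]
    return results

start = time.time()
print(execute(plan)["t3"], f"in {time.time() - start:.2f}s (vs ~1.0s sequentially)")
```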
  • Boundary Exploration for Bayesian Optimization With Unknown Physical Constraints (Poster)

    • Authors: Yunsheng Tian, Ane Zuniga, Xinwei Zhang, Johannes P. Dürholt, Payel Das, Jie Chen, Wojciech Matusik, Mina Konakovic Lukovic
    • Affiliations: MIT CSAIL, USA, Evonik Operations GmbH, Germany, MIT-IBM Watson AI Lab, IBM Research, USA
    • TL;DR: This paper presents BE-CBO, a novel Bayesian optimization method designed to efficiently explore the boundary between feasible and infeasible designs in the presence of unknown physical constraints. The method demonstrates superior performance compared to existing techniques through extensive experiments on both synthetic and real-world benchmarks.
    • Keywords: Bayesian optimization, black-box functions, unknown constraints, BE-CBO (Boundary Exploration for Constrained Bayesian Optimization), ensemble of neural networks, Gaussian Processes, Engineering design, materials science, formulation development, Unknown physical constraints, optimization of feasible and infeasible regions, New optimization method (BE-CBO), improved performance in boundary exploration
  • Q-Star Meets Scalable Posterior Sampling: Bridging Theory and Practice via HyperAgent (Poster)

    • Authors: Yingru Li, Jiawei Xu, Lei Han, Zhi-Quan Luo
    • Affiliations: The Chinese University of Hong Kong, Shenzhen, Tencent AI and Robotics X, The Chinese University of Hong Kong, Shenzhen; Shenzhen Research Institute of Big Data
    • TL;DR: This paper introduces HyperAgent, a novel reinforcement learning algorithm that efficiently approximates posteriors related to optimal action-value functions, demonstrating significant performance improvements in large-scale benchmarks. The algorithm achieves logarithmic computational complexity and sublinear regret, addressing key challenges in practical RL deployment.
    • Keywords: reinforcement learning, hypermodel framework, HyperAgent, Q⋆ function, DDQN, Deep Sea exploration, Atari benchmarks, large state spaces, computational complexity, data efficiency, exploration vs. exploitation, efficient incremental approximation, sublinear regret, logarithmic per-step computational complexity
  • Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach (Poster)

    • Authors: Anton Plaksin, Vitaly Kalev
    • Affiliations: Yandex, Moscow, Russia, IMM UB RAS, Yekaterinburg, Russia
    • TL;DR: This paper introduces a framework for Robust Reinforcement Learning using positional differential game theory, demonstrating that a shared Q-function can effectively address uncertainties in agent policies. The proposed Isaacs Deep Q-Network algorithms outperform existing RRL and Multi-Agent RL methods in various environments.
    • Keywords: Robust Reinforcement Learning, Positional Differential Games, Deep Q-Learning, Q-function, Minimax and Maximin Bellman equations, Real-world applications, Control systems, Uncertainty, Disturbances, Non-stationarity, Robust policies, Isaacs Deep Q-Network algorithms, Centralized Q-learning approach, Adversarial agent
  • Leveraging Attractor Dynamics in Spatial Navigation for Better Language Parsing (Spotlight Poster)

    • Authors: Xiaolong Zou, Xingxing Cao, Xiaojiao Yang, Bo Hong
    • Affiliations: Qiyuan Lab, Beijing, China
    • TL;DR: This study investigates the shared computational mechanisms of the hippocampal formation in spatial navigation and language comprehension by developing a prefrontal-hippocampal-entorhinal model (PHE-trinity). The findings suggest that attractor dynamics can enhance learning efficiency and generalization in language parsing tasks, revealing insights into the dynamic mechanisms underlying these cognitive functions.
    • Keywords: spatial navigation, language comprehension, hippocampal formation, modular continuous attractor neural network, grid network, language command parsing, relationship between spatial navigation and language processing, systematic generalization, efficient learning, dynamic mechanism for syntactic structure representation, SCAN dataset
  • Infinite-Horizon Distributionally Robust Regret-Optimal Control (Poster)

    • Authors: Taylan Kargin, Joudi Hajar, Vikrant Malik, Babak Hassibi
    • Affiliations: California Institute of Technology
    • TL;DR: This study focuses on infinite-horizon distributionally robust control of linear systems with quadratic costs, aiming to minimize the worst-case expected regret compared to a non-causal policy. The authors develop an efficient algorithm for optimal control and a method for constructing a near-optimal state-space controller, avoiding the computational burden of traditional approaches.
    • Keywords: distributionally robust control, regret-optimal control, linear systems, Wasserstein-2 ambiguity set, frequency-domain algorithm, convex optimization, uncertainty in control systems, performance decline due to disturbances, near-optimal state-space controller, worst-case expected regret minimization, H∞-norm, causal policy, non-causal policy, bounded energy disturbances
  • Irregular Multivariate Time Series Forecasting: A Transformable Patching Graph Neural Networks Approach (Poster)

    • Authors: Weijia Zhang, Chenlong Yin, Hao Liu, Xiaofang Zhou, Hui Xiong
    • Affiliations: The Hong Kong University of Science and Technology; The Hong Kong University of Science and Technology (Guangzhou), The Hong Kong University of Science and Technology (Guangzhou), The Hong Kong University of Science and Technology
    • TL;DR: This study introduces Transformable Patching Graph Neural Networks (T-PATCHGNN) for forecasting Irregular Multivariate Time Series (IMTS), addressing challenges related to irregular sampling and inter-time series correlations. The proposed method demonstrates superior performance on a comprehensive IMTS forecasting benchmark across various scientific domains.
    • Keywords: Irregular Multivariate Time Series (IMTS) forecasting, time series analysis, Transformable Patching Graph Neural Networks (T-PATCHGNN), time-adaptive graph neural networks, Healthcare, biomechanics, climate science, astronomy, Irregular sampling intervals, missing data, intra-time series dependency modeling, inter-time series correlation modeling, New forecasting method (T-PATCHGNN), improved modeling of dynamic inter-time series correlation
  • Unbiased Multi-Label Learning from Crowdsourced Annotations (Poster)

    • Authors: Mingxuan Xia, Zenan Huang, Runze Wu, Gengyu Lyu, Junbo Zhao, Gang Chen, Haobo Wang
    • Affiliations: Fuxi AI Lab, NetEase Inc., Hangzhou, China, Faculty of Information Technology, Beijing University of Technology, Beijing, China, State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China, School of Software Technology, Zhejiang University, Ningbo, China; State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China
    • TL;DR: This study introduces an unbiased risk estimator for Crowdsourced Multi-Label Learning (CMLL) that addresses the challenges of unreliable labels from annotators. The proposed method enhances performance by leveraging label correlations and provides a theoretical foundation for generalization error bounds.
    • Keywords: Crowdsourced Multi-Label Learning (CMLL), Multi-Label Learning (MLL), Unbiased risk estimator, Decoupled autoencoder framework, Image recognition, Document classification, Protein function prediction, Unreliable labels from annotators, Data annotation challenges, Generalization error bound, Improved performance through label correlation exploitation, Crowdsourced transition matrices, True label inferring, Crowdsourced Multi-Label Inference (CMLI)
  • A Geometric Explanation of the Likelihood OOD Detection Paradox (Poster)

    • Authors: Hamidreza Kamkari, Brendan Ross, Jesse Cresswell, Anthony Caterini, Rahul G. Krishnan, Gabriel Loaiza-Ganem
    • Affiliations: Layer 6 AI, Layer 6 AI; University of Toronto; Vector Institute, University of Toronto; Vector Institute
    • TL;DR: This study investigates the paradox of likelihood-based deep generative models assigning higher likelihoods to out-of-distribution data from simpler sources while failing to generate such data. The authors propose a new method for OOD detection that combines likelihoods with local intrinsic dimension estimates, achieving state-of-the-art results.
    • Keywords: Out-of-distribution (OOD) detection, deep generative models (DGMs), Likelihood-based models, normalizing flows (NFs), score-based diffusion models, Autonomous driving, medical diagnostics, finance, medical imaging, OOD detection reliability, paradox of high likelihoods for OOD data, low probability mass, Method for OOD detection using likelihoods and local intrinsic dimension (LID) estimates
  • Position: Evolving AI Collectives Enhance Human Diversity and Enable Self-Regulation (Poster)

    • Authors: Shiyang Lai, Yujin Potter, Junsol Kim, Richard Zhuang, Dawn Song, James Evans
    • Affiliations: Department of Electrical Engineering and Computer Sciences, UC Berkeley, Department of Sociology & Knowledge Lab, University of Chicago; Santa Fe Institute, Department of Sociology & Knowledge Lab, University of Chicago
    • TL;DR: This study explores the formation of free-formed AI collectives that can enhance human diversity and self-regulation in online environments. It highlights the potential for these decentralized AI systems to reduce toxic behavior and suggests avenues for further research on AI moderation and ethical considerations.
    • Keywords: AI collectives, human diversity, self-regulation, Large language models (LLMs), decentralized AI subjectivities, Online environments, AI moderation, Toxic behavior online, anti-social behavior, Emergent AI collectives, cross-moderation opportunities
  • Accelerating Heterogeneous Federated Learning with Closed-form Classifiers (Poster)

    • Authors: Eros Fanì, Raffaello Camoriano, Barbara Caputo, Marco Ciccone
    • Affiliations: Department of Computing and Control Engineering, Polytechnic University of Turin, Italy, Department of Computing and Control Engineering, Polytechnic University of Turin, Italy; Istituto Italiano di Tecnologia, Genoa, Italy; CINI Consortium, Rome, Italy, Department of Computing and Control Engineering, Polytechnic University of Turin, Italy; Istituto Italiano di Tecnologia, Genoa, Italy
    • TL;DR: The study introduces Federated Recursive Ridge Regression (FED3R) to address challenges in Federated Learning caused by statistical heterogeneity, demonstrating significant improvements in convergence speed and resource efficiency. The method is particularly effective in cross-device scenarios and can be fine-tuned with existing FL algorithms; a minimal closed-form aggregation sketch follows this entry.
    • Keywords: Federated Learning, Statistical Heterogeneity, Ridge Regression, Closed-form Classifiers, Cross-device Scenarios, Client Drift, Biased Local Solutions, Convergence Speed, Data Distribution Heterogeneity, Federated Recursive Ridge Regression (FED3R), FED3R with Fine-Tuning (FED3R+FT)
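
A minimal sketch of the closed-form aggregation idea: each client shares only the sufficient statistics of ridge regression over its (frozen) features, which the server sums and solves once, so the result does not depend on how data are partitioned across clients (names and setup are illustrative):

```python
import numpy as np

def closed_form_federated_classifier(client_data, num_classes, lam=1e-2):
    """Server-side ridge solution from per-client sufficient statistics
    G = F^T F and c = F^T Y; invariant to the data partition."""
    d = client_data[0][0].shape[1]
    G = np.zeros((d, d))
    c = np.zeros((d, num_classes))
    for F, y in client_data:              # one (features, labels) pair per client
        Y = np.eye(num_classes)[y]        # one-hot targets
        G += F.T @ F
        c += F.T @ Y
    return np.linalg.solve(G + lam * np.eye(d), c)

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 16)), rng.integers(0, 3, size=50))
           for _ in range(4)]
W = closed_form_federated_classifier(clients, num_classes=3)
print(W.shape)                            # (16, 3) classifier head
```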
  • Learning Scale-Aware Spatio-temporal Implicit Representation for Event-based Motion Deblurring (Poster)

    • Authors: Wei Yu, Jianing Li, Shengping Zhang, Xiangyang Ji
    • Affiliations: School of Computer Science and Technology, Harbin Institute of Technology, Weihai, China, School of Computer Science, Peking University, Beijing, China, Department of Automation, Tsinghua University, Beijing, China
    • TL;DR: This study introduces a Scale-Aware Spatio-temporal Network (SASNet) for effectively restoring blurred images using event streams at arbitrary scales, addressing the challenges posed by unknown spatial and temporal scales. The proposed method outperforms existing techniques, particularly in high-speed motion scenarios, and is validated on a newly created high-resolution dataset.
    • Keywords: Event-based motion deblurring, computational photography, Scale-Aware Spatio-temporal Network (SASNet), Spatial Implicit Representation Module (SIRM), Temporal Implicit Representation Module (TIRM), Image restoration, robotics, autonomous vehicles, Unknown scales of images and events, motion blur, Improved deblurring performance, High-resolution Hybrid Deblur (H2D) dataset, GoPro dataset, Event Vision Sensors (EVS), CMOS Image Sensors (CIS)
  • Why Do You Grok? A Theoretical Analysis on Grokking Modular Addition (Poster)

    • Authors: Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, Danica J Sutherland
    • Affiliations: Computer Science Department, University of British Columbia; Alberta Machine Intelligence Institute, Toyota Technological Institute at Chicago, School of Mathematical Sciences, Peking University, Toyota Technological Institute at Chicago; Computer Science Department, University of British Columbia
    • TL;DR: This study provides a theoretical explanation for the "grokking" phenomenon in neural networks, demonstrating that models can generalize after initially overfitting, particularly in the context of modular addition. The findings suggest a transition from kernel-like behavior to more complex dynamics in gradient descent as a key factor in achieving generalization; a sketch of the modular-addition setup follows this entry.
    • Keywords: grokking, generalization patterns, overfitting, gradient descent, one-hidden-layer quadratic networks, modular addition, algorithmic tasks, model generalization, population error, representation learning, theoretical explanation of grokking, empirical evidence of network behavior, kernel regime, rich regime, permutation-equivariant model
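
For context, a sketch of the modular-addition setup in which grokking is typically observed: fit a network on a random fraction of all (a, b) pairs and track test accuracy long after the train set is memorized (parameters below are illustrative):

```python
import numpy as np

def modular_addition_data(p=23, train_frac=0.4, seed=0):
    """Train/test split of the (a + b) mod p task over all p*p input pairs."""
    rng = np.random.default_rng(seed)
    pairs = np.array([(a, b) for a in range(p) for b in range(p)])
    labels = (pairs[:, 0] + pairs[:, 1]) % p
    idx = rng.permutation(len(pairs))
    n_train = int(train_frac * len(pairs))
    return ((pairs[idx[:n_train]], labels[idx[:n_train]]),
            (pairs[idx[n_train:]], labels[idx[n_train:]]))

(train_X, train_y), (test_X, test_y) = modular_addition_data()
print(len(train_X), "train pairs,", len(test_X), "test pairs")
```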
  • An Intrinsic Vector Heat Network (Poster)

    • Authors: Alexander Gao, Maurice Chu, Mubbasir Kapadia, Ming Lin, Hsueh-Ti Derek Liu
    • Affiliations: Roblox Core AI, Department of Computer Science, University of Maryland, College Park, USA, Roblox Research; Department of Computer Science, University of Maryland, College Park, USA
    • TL;DR: This paper presents a novel neural network architecture for learning tangent vector fields on manifold surfaces in 3D, utilizing a trainable vector heat diffusion module to maintain essential invariances. The proposed method is validated on triangle meshes and demonstrates effectiveness in quadrilateral mesh generation.
    • Keywords: Tangent vector fields, Neural network architecture, Manifold surfaces, Vector heat diffusion module, Vector-valued neurons, Quadrilateral mesh generation, Scientific computation, Robotic navigation, Invariance to rigid motion, Isometric deformation, Choice of local tangent bases, Novel architecture for learning tangent vector fields, Empirical validation of invariant properties, Triangle meshes, Riemannian manifolds, Scalar-valued architectures, Geometric deep learning
  • Simplicity Bias via Global Convergence of Sharpness Minimization (Poster)

    • Authors: Khashayar Gatmiry, Zhiyuan Li, Sashank J. Reddi, Stefanie Jegelka
    • Affiliations: Google Research, Toyota Technological Institute at Chicago, Massachusetts Institute of Technology
    • TL;DR: This study investigates the simplicity bias in neural networks trained with label noise SGD, demonstrating that it converges to a model replicating a single linear feature across all neurons, resulting in a low-rank feature matrix. The findings highlight a connection between sharpness minimization and the geometry of the loss landscape, contributing to the understanding of generalization in neural networks.
    • Keywords: simplicity bias, generalization ability of neural networks, stochastic gradient descent (SGD), label noise SGD, sharpness minimization, low complexity, high-dimensional training data, loss landscape, convergence to low-rank feature matrix, local geodesic convexity of Hessian trace, two-layer neural networks, rank one feature matrix
  • Mean Field Langevin Actor-Critic: Faster Convergence and Global Optimality beyond Lazy Learning (Poster)

    • Authors: Kakei Yamamoto, Kazusato Oko, Zhuoran Yang, Taiji Suzuki
    • Affiliations: The University of Tokyo, Tokyo, Japan; Center for Advanced Intelligence Project, RIKEN, Yale University, New Haven, CT, MIT, Cambridge, MA
    • TL;DR: This study introduces the mean-field Langevin TD learning and policy gradient methods to enhance feature learning in deep reinforcement learning, demonstrating linear convergence to the globally optimal policy and accurate value function identification. The findings contribute to a deeper understanding of neural reinforcement learning beyond traditional lazy training approaches.
    • Keywords: deep reinforcement learning, optimal policy determination, mean-field Langevin TD learning (MFLTD), mean-field Langevin policy gradient (MFLPG), policy gradient, temporal-difference (TD) learning, Wasserstein gradient flows, nonconvexity of expected total reward, bias of semi-gradient optimization, challenges in neural network optimization, linear convergence towards globally optimal policy, accurate identification of true value function, actor-critic method, Kullback-Leibler divergence
  • Revisiting Context Aggregation for Image Matting (Poster)

    • Authors: Qinglin Liu, Xiaoqian Lv, Quanling Meng, Zonglin Li, Xiangyuan Lan, Shuo Yang, Shengping Zhang, Liqiang Nie
    • Affiliations: School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China, Department of Computer Science, The University of Hong Kong, Hong Kong, China, School of Computer Science and Technology, Harbin Institute of Technology, Weihai, China, School of Computer Science and Technology, Harbin Institute of Technology, Weihai, China; Peng Cheng Laboratory, Shenzhen, China
    • TL;DR: This study revisits context aggregation mechanisms in image matting, revealing that a basic encoder-decoder network can achieve superior performance without complex context aggregation modules. The proposed AEMatter network, utilizing a Hybrid-Transformer backbone and large image training strategy, significantly outperforms existing state-of-the-art methods.
    • Keywords: Image Matting, Context Aggregation, Encoder-Decoder Network, Hybrid-Transformer, Appearance-Enhanced Axis-Wise Learning (AEAL), Image Editing, Film Post-Production, Context Scale Shift, Matting Performance Degradation, AEMatter Network, Large Image Training Strategy, Five Popular Matting Datasets
  • Robustness of Nonlinear Representation Learning (Oral)

    • Authors: Simon Buchholz, Bernhard Schölkopf
    • Affiliations: Max Planck Institute for Intelligent Systems, Tübingen, Germany; Tübingen AI Center, Tübingen, Germany; ELLIS Institute, Tübingen, Germany, Max Planck Institute for Intelligent Systems, Tübingen, Germany; Tübingen AI Center, Tübingen, Germany
    • TL;DR: This study investigates the robustness of nonlinear representation learning in slightly misspecified settings, focusing on approximate identifiability in Independent Component Analysis (ICA) with nearly isometric mixing functions. The findings suggest that the mixing matrix and independent components can be approximately recovered, which has significant implications for unsupervised representation learning in real-world data.
    • Keywords: Unsupervised representation learning, Robustness, Causal Representation Learning, Independent Component Analysis (ICA), Nonlinear representation learning, Misspecification, Identifiability, Latent variable identification, Approximate identifiability results, Recovery of mixing matrix, Local isometry, Mixing functions
  • Mind the Boundary: Coreset Selection via Reconstructing the Decision Boundary (Poster)

    • Authors: Shuo Yang, Zhe Cao, Sheng Guo, Ruiheng Zhang, Ping Luo, Shengping Zhang, Liqiang Nie
    • Affiliations: Harbin Institute of Technology, China, MYBank, Ant Group, China, Beijing Institute of Technology, China, The University of Hong Kong, Hong Kong; Shanghai AI Lab, China, The University of Hong Kong, Hong Kong
    • TL;DR: This study introduces a novel coreset selection method that reconstructs the decision boundary of deep neural networks, achieving a 50% data pruning rate on the ImageNet-1K dataset with minimal accuracy loss. The findings highlight the method's effectiveness and its potential for cross-architecture transferability in model training.
    • Keywords: Coreset selection, Generalization capability, Deep neural networks, Geometry-based methods, Coreset construction, Image recognition, Data sparsity, Computational cost, Generalization error, 50% data pruning rate, Cross-architecture transferability, ImageNet-1K
  • Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling (Poster)

    • Authors: Mingze Wang, Zeping Min, Lei Wu
    • Affiliations: School of Mathematical Sciences, Peking University, Beijing, China; Center for Machine Learning Research, Peking University, Beijing, China, School of Mathematical Sciences, Peking University, Beijing, China
    • TL;DR: This study introduces a novel algorithm, Progressive Rescaling Gradient Descent (PRGD), which maximizes the margin of linearly separable data at an exponential rate, significantly outperforming existing gradient-based algorithms that achieve only polynomial rates. The findings suggest that PRGD can enhance generalization performance on both linearly separable and non-separable datasets; a simplified rescaling sketch follows this entry.
    • Keywords: margin maximization, gradient-based algorithms, linearly separable data, Progressive Rescaling Gradient Descent (PRGD), gradient descent (GD), normalized gradient descent (NGD), deep learning, machine learning, margin maximization bias, inefficiency in GD/NGD, exponential rate of margin maximization, theoretical findings, generalization performance enhancement, ℓ2-margin, centripetal velocity
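
A simplified take on the progressive-rescaling idea: normalized gradient steps on the logistic loss, interleaved with periodic multiplicative inflation of the weight norm (the schedule and constants are illustrative and differ from the paper's PRGD):

```python
import numpy as np

def prgd_sketch(X, y, steps=200, lr=1.0, rescale_every=20, growth=2.0):
    """Normalized GD on the mean logistic loss, with periodic norm rescaling."""
    w = np.zeros(X.shape[1])
    for t in range(1, steps + 1):
        margins = y * (X @ w)
        # Gradient of mean log(1 + exp(-margin)); clip exponent for stability.
        coef = y / (1.0 + np.exp(np.minimum(margins, 50.0)))
        g = -(X * coef[:, None]).mean(axis=0)
        gn = np.linalg.norm(g)
        if gn > 1e-12:
            w -= lr * g / gn              # normalized gradient step
        if t % rescale_every == 0:
            w *= growth                   # progressive norm rescaling
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) + 2.0       # linearly separable toy data
y = np.ones(100)
w = prgd_sketch(X, y)
print("min normalized margin:", (y * (X @ w)).min() / np.linalg.norm(w))
```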
  • Conformal Prediction with Learned Features (Poster)

    • Authors: Shayan Kiyani, George J. Pappas, Hamed Hassani
    • Affiliations: Electrical and Systems Engineering Department, University of Pennsylvania, USA
    • TL;DR: This paper introduces Partition Learning Conformal Prediction (PLCP), a framework designed to enhance the conditional validity of prediction sets by learning uncertainty-guided features from calibration data. Experimental results demonstrate that PLCP outperforms existing methods in terms of coverage and length across various datasets.
    • Keywords: conformal prediction, conditional guarantees, Partition Learning Conformal Prediction (PLCP), alternating gradient descent, healthcare, prediction sets, nontrivial prediction sets, full conditional coverage, marginal coverage, improved conditional validity, theoretical analysis, superior performance in coverage and length, real-world datasets, synthetic datasets
  • Symmetry Induces Structure and Constraint of Learning (Poster)

    • Authors: Liu Ziyin
    • Affiliations: MIT; NTT Research
    • TL;DR: This study explores the impact of loss function symmetries on the learning behavior of machine learning models, demonstrating that mirror reflection symmetries impose constraints on model parameters. The findings reveal that these constraints can lead to phenomena such as sparsity and low rankness, providing insights into the design of algorithms that enforce hard constraints in a differentiable manner.
    • Keywords: symmetry in neural networks, learning behavior, loss function, stochastic gradient descent (SGD), gradient descent (GD), constraints on model parameters, loss of plasticity, collapse phenomena, mirror reflection symmetry, constrained symmetric solutions, sparsity, low rankness, homogeneous ensembling
  • Position: Standardization of Behavioral Use Clauses is Necessary for the Adoption of Responsible Licensing of AI (Poster)

    • Authors: Daniel McDuff, Tim Korjakow, Scott Cambo, Jesse Benjamin, Jenny Lee, Yacine Jernite, Carlos Muñoz Ferrandis, Aaron Gokaslan, Alek Tarkowski, Joseph Lindley, A. Feder Cooper, Danish Contractor
    • Affiliations: Open Future Foundation, Hugging Face, The Center for Generative AI, Law, and Policy Research, Cornell University, Responsible AI Licenses; University of Washington, USA, Responsible AI Licenses; Lancaster University, Responsible AI Licenses; Technical University of Berlin, Responsible AI Licenses; Alinia AI
    • TL;DR: The paper advocates for the standardization of behavioral use clauses in responsible AI licenses to mitigate risks associated with AI technology while allowing for necessary customization in specific contexts. It highlights the significant adoption of these licenses and the need for clarity to avoid user confusion and dilution of impact.
    • Keywords: Responsible AI, Licensing, Behavioral Use Clauses, Mixed-methods methodology, Clustering of license clauses, Qualitative interviews, Quantitative analysis, AI asset management, Software and model repositories, Negligent or malicious uses of AI, Accountability challenges in decentralized systems, Standardization of responsible AI licenses, Customization of behavioral restrictions, Responsible AI Licenses
  • Watermark Stealing in Large Language Models (Poster)

    • Authors: Nikola Jovanović, Robin Staab, Martin Vechev
    • Affiliations: Department of Computer Science, ETH Zurich
    • TL;DR: This study identifies watermark stealing as a significant vulnerability in current LLM watermarking schemes, demonstrating that attackers can effectively spoof and scrub watermarks for under $50 with an average success rate exceeding 80%. The findings challenge the perceived robustness of these schemes and highlight the urgent need for more secure watermarking methods.
    • Keywords: LLM watermarking, AI-generated content detection, Watermark stealing, spoofing attacks, scrubbing attacks, Vulnerability of watermarking schemes, adversarial attacks, Automated watermark stealing algorithm, comprehensive study of spoofing and scrubbing, Large Language Models (LLMs), watermarking
  • Adaptively Perturbed Mirror Descent for Learning in Games (Poster)

    • Authors: Kenshi Abe, Kaito Ariu, Mitsuki Sakamoto, Atsushi Iwasaki
    • Affiliations: CyberAgent, Tokyo, Japan; University of Electro-Communications, Tokyo, Japan, CyberAgent, Tokyo, Japan, University of Electro-Communications, Tokyo, Japan
    • TL;DR: This study introduces Adaptively Perturbed Mirror Descent (APMD), a novel technique for achieving last-iterate convergence to Nash equilibria in monotone games, even in the presence of noise. The method significantly accelerates convergence by adaptively adjusting the perturbation magnitude based on how close the strategy profile is to an equilibrium; a toy perturbed-mirror-descent sketch follows this entry.
    • Keywords: Learning in games, Nash equilibrium, Payoff perturbation, Mirror Descent (MD), Adaptively Perturbed MD (APMD), Optimistic learning algorithms, Monotone games, Cournot competition, Zero-sum games, Noise in feedback, Convergence challenges, Strategy profile dynamics, Last-iterate convergence, Accelerated convergence, Perturbation adjustment
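
A toy instance of payoff-perturbed mirror descent (multiplicative weights) on matching pennies, loosely following the APMD idea of perturbing payoffs toward a periodically updated anchor strategy; the constants and anchor schedule are illustrative:

```python
import numpy as np

def perturbed_md_matching_pennies(T=5000, eta=0.05, mu=0.1, anchor_every=500):
    """Entropic mirror descent with a KL perturbation toward anchors: plain
    mirror descent cycles on this game, while the perturbed version settles
    at the Nash equilibrium (0.5, 0.5)."""
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])             # row player's payoffs
    x, y = np.array([0.9, 0.1]), np.array([0.2, 0.8])
    ax, ay = x.copy(), y.copy()                          # anchor strategies
    for t in range(1, T + 1):
        gx = A @ y - mu * (np.log(x) - np.log(ax))       # perturbed gradients
        gy = -A.T @ x - mu * (np.log(y) - np.log(ay))
        x = x * np.exp(eta * gx); x /= x.sum()           # entropic MD steps
        y = y * np.exp(eta * gy); y /= y.sum()
        if t % anchor_every == 0:                        # slowly move anchors
            ax, ay = x.copy(), y.copy()
    return x, y

print(perturbed_md_matching_pennies())   # both near [0.5, 0.5]
```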
  • AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers (Poster)

    • Authors: Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Aakriti Jain, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek
    • Affiliations: Fraunhofer Heinrich-Hertz-Institute, 10587 Berlin, Germany; Technische Universität Berlin, 10587 Berlin, Germany; BIFOLD – Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
    • TL;DR: This study introduces AttnLRP, an extension of Layer-wise Relevance Propagation that effectively attributes both input and latent representations in transformer models, addressing challenges of biased predictions and hallucinations. The proposed method demonstrates superior faithfulness and computational efficiency compared to existing techniques, facilitating a deeper understanding of model behavior.
    • Keywords: Explainability, Large Language Models, Transformer Models, Layer-wise Relevance Propagation, Attention Mechanism, Text Generation, Image Generation, Biased Predictions, Hallucinations, Model Behavior Understanding, Faithful Attributions, Concept-Based Explanations, LLaMa 2, Mixtral 8x7b, Flan-T5, Vision Transformer Architectures, Attention Layers, Feed-Forward Network (FFN)
  • A Dynamical Model of Neural Scaling Laws (Poster)

    • Authors: Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan
    • Affiliations: Department of Physics, Harvard University, SEAS, Harvard University; Kempner Institute, Harvard University
    • TL;DR: This study introduces a solvable model to analyze neural scaling laws, revealing that performance improves predictably with training time, dataset size, and model size. The findings highlight an asymmetric compute-optimal scaling rule and the dynamics of training convergence, providing insights into the optimal trade-offs in deep learning architectures.
    • Keywords: neural scaling laws, deep learning, gradient descent, random feature model, language models, vision models, performance scaling, training time, model size, compute-optimal scaling law, asymmetric scaling rule
  • Bayesian Uncertainty for Gradient Aggregation in Multi-Task Learning (Poster)

    • Authors: Idan Achituve, Idit Diamant, Arnon Netzer, Gal Chechik, Ethan Fetaya
    • Affiliations: Department of Computer Science, Bar-Ilan University, Israel, Faculty of Engineering, Bar-Ilan University, Israel, Faculty of Engineering, Bar-Ilan University, Israel; Sony Semiconductor Israel, Sony Semiconductor Israel
    • TL;DR: This study introduces a novel gradient aggregation method for multi-task learning using Bayesian inference to account for uncertainty in gradient dimensions. The approach improves performance by effectively managing the sensitivity of gradients across multiple tasks. An illustrative aggregation sketch follows below.
    • Keywords: Multi-task learning (MTL), Bayesian inference, Gradient aggregation, probability distribution over parameters, Autonomous vehicles, real-time inference tasks, Sensitivity in gradient dimensions, performance degradation in MTL, Novel gradient aggregation procedure, uncertainty estimates for gradients, State-of-the-art performance
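    • Sketch: an illustrative stand-in, not the paper's exact estimator: combine per-task gradients coordinate-wise while down-weighting coordinates on which the tasks disagree, using the empirical variance across tasks as a crude uncertainty estimate. All values are toy assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    task_grads = rng.standard_normal((4, 10))   # 4 tasks, 10 shared parameters
    task_grads[:, :5] += 2.0                    # tasks agree on the first half

    mean = task_grads.mean(axis=0)              # average task gradient
    var = task_grads.var(axis=0)                # cross-task disagreement
    update = mean / (1.0 + var)                 # shrink uncertain coordinates

    print(np.round(update, 2))                  # agreed-upon dimensions dominate
    ```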
  • Optimal Coresets for Low-Dimensional Geometric Median (Poster)

    • Authors: Peyman Afshani, Chris Schwiegelshohn
    • Affiliations: Department of Computer Science, Aarhus University, Denmark
    • TL;DR: This study investigates coresets for approximating the cost of median queries in low-dimensional spaces, providing matching upper and lower bounds on the number of points in the coreset. The findings highlight the efficiency of coresets in big data analysis and their implications for geometric median problems. A generic sensitivity-sampling sketch follows below.
    • Keywords: coresets, geometric median, big data analysis, approximating cost with respect to median queries, high-dimensional data, coreset construction, bounds on coreset size, Euclidean norm, ε-coreset
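    • Sketch: a generic sensitivity-sampling template for the geometric median, not the paper's optimal construction: sample points with probability proportional to a crude sensitivity proxy and reweight by inverse probability so that coreset costs are unbiased. Data, proxy, and coreset size are assumed.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((10_000, 2))

    dist = np.linalg.norm(X - X.mean(axis=0), axis=1)
    sens = dist / dist.sum() + 1.0 / len(X)     # crude sensitivity proxy
    p = sens / sens.sum()
    idx = rng.choice(len(X), size=200, replace=True, p=p)
    S, w = X[idx], 1.0 / (len(idx) * p[idx])    # reweighted coreset

    def cost(points, q, weights=None):
        d = np.linalg.norm(points - q, axis=1)
        return d.sum() if weights is None else (weights * d).sum()

    q = np.array([0.5, -0.5])                   # an arbitrary median query
    print(cost(X, q), cost(S, q, w))            # the two costs should be close
    ```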
  • FrameQuant: Flexible Low-Bit Quantization for Transformers (Poster)

    • Authors: Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang, Vikas Singh
    • Affiliations: Google Research, University of Wisconsin-Madison; Google Research, University of Wisconsin-Madison
    • TL;DR: This paper introduces FrameQuant, a novel Post-Training Quantization scheme that enables quantization of Transformer models to two bits with minimal accuracy loss, addressing the challenges of high resource demands in deploying large models. The method leverages Fusion Frames to enhance robustness against quantization errors, promising significant efficiency gains.
    • Keywords: Post-Training Quantization, Transformer Models, Efficiency in Model Deployment, Fusion Frames, Low-Bit Quantization, Natural Language Processing, Vision Transformers, Image Classification, High compute and memory/storage footprint, Model deployment challenges, Resource-constrained devices, Two-bit quantization scheme, Robustness to quantization error, Transformers, Large Language Models (LLMs), Vision Transformers (VITs)
  • Learning to Play Atari in a World of Tokens (Poster)

    • Authors: Pranav Agarwal, Sheldon Andrews, Samira Ebrahimi Kahou
    • Affiliations: University of Calgary, Mila, Canada CIFAR AI Chair, École de Technologie Supérieure, Roblox, École de Technologie Supérieure, Mila
    • TL;DR: This study introduces DART, a sample-efficient method for model-based reinforcement learning that utilizes discrete representations to improve world modeling and learning behavior. DART outperforms previous state-of-the-art methods on the Atari 100k benchmark, achieving a median human-normalized score of 0.790 and surpassing human performance in 9 out of 26 games.
    • Keywords: Model-based reinforcement learning, Transformers, Discrete abstract representations, Transformer-decoder, Transformer-encoder, Atari games, Reinforcement learning environments, Sample inefficiency, Partial observability, Compounding error problems, DART (Discrete Abstract Representations for Transformer-based Learning), Improved sample efficiency, Atari 100k
  • Probabilistic Generating Circuits - Demystified (Oral)

    • Authors: Sanyam Agarwal, Markus Bläser
    • Affiliations: Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
    • TL;DR: This paper investigates probabilistic generating circuits (PGCs) and demonstrates that their power stems from allowing negative weights rather than their representation method. It also shows that PGCs for categorical variables with larger image sizes do not support tractable marginalization unless NP = P, while PCs with negative weights can effectively model these variables.
    • Keywords: probabilistic modeling, tractable probabilistic inference, probabilistic generating circuits (PGCs), probabilistic circuits (PCs), determinantal point processes (DPPs), probability generating polynomial, intractable probabilistic inference, marginalization challenges, transformation of PGCs to PCs with negative weights, tractable marginalization for categorical variables, negative weights, set-multilinear polynomials
  • The Non-linear $F$-Design and Applications to Interactive Learning (Poster)

    • Authors: Alekh Agarwal, Jian Qian, Alexander Rakhlin, Tong Zhang
    • Affiliations: UIUC, MIT, Google
    • TL;DR: This paper introduces the F-design, a generalization of G-optimal design for non-linear function classes, and demonstrates its effectiveness in various interactive machine learning tasks. The authors provide algorithms for constructing designs with a bounded F-condition number, yielding state-of-the-art results in data collection and exploration.
    • Keywords: Non-linear experimental design, Interactive machine learning, F-design, G-design, Algorithms for design construction, Data collection, Confidence band construction, Contextual bandits, Model-free reinforcement learning, Active learning, Data collection efficiency, Sample-constrained scenarios, F-condition number, State-of-the-art results in interactive learning, Eluder dimension
  • BeigeMaps: Behavioral Eigenmaps for Reinforcement Learning from Images (Poster)

    • Authors: Sandesh Adhikary, Anqi Li, Byron Boots
    • Affiliations: Computer Science and Engineering, University of Washington, Seattle, WA (USA); NVIDIA, Computer Science and Engineering, University of Washington, Seattle, WA (USA)
    • TL;DR: The study proposes Behavioral Eigenmaps (BeigeMaps) as a new representation learning method for reinforcement learning agents that addresses the challenges of high-dimensional image observations. The findings demonstrate that BeigeMaps can improve policy performance in prior behavioral distance-based RL algorithms.
    • Keywords: Reinforcement Learning, Representation Learning, Behavioral Distances, Bisimulation Metric, Isometric Mapping, Image Observations, Control Policies, High-dimensional Data, Poor Sample Efficiency, Weak Generalization Capability, Behavioral Eigenmaps (BeigeMaps), Improved Policy Performance
  • Gaussian Processes on Cellular Complexes (Poster)

    • Authors: Mathieu Alain, So Takao, Brooks Paige, Marc Deisenroth
    • Affiliations: Centre for Artificial Intelligence, University College London, London, UK, Centre for Artificial Intelligence, University College London, London, UK; Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
    • TL;DR: This paper proposes Gaussian processes on cellular complexes to model polyadic relations and capture interactions between higher-order cells, addressing the limitations of traditional graphs in uncertainty quantification. The authors introduce a novel cellular Matérn kernel that generalizes existing methods and allows for directed predictions on various cell types.
    • Keywords: Gaussian processes, cellular complexes, polyadic relations, Matérn kernel, Machine learning, signal processing, Uncertainty quantification, limitations of graphs, Novel kernels for cellular complexes, Graph neural networks (GNNs), graph kernel machines, Topological inductive biases, higher-order cells
  • How to Escape Sharp Minima with Random Perturbations (Poster)

    • Authors: Kwangjun Ahn, Ali Jadbabaie, Suvrit Sra
    • Affiliations: MIT, MIT; Microsoft Research; TU Munich, MIT; TU Munich
    • TL;DR: This study formalizes the concept of flat minima in optimization and explores efficient algorithms for finding them, highlighting the significance of flat minima in improving prediction performance in machine learning applications. The findings support the effectiveness of methods like sharpness-aware minimization in achieving flatter minima. A toy smoothing illustration follows below.
    • Keywords: flat minima, optimization algorithms, machine learning, gradient-based algorithm, sharpness-aware minimization (SAM), image classification, language processing, local/global minima, prediction performance, approximate flat minima, efficient algorithms, Hessian, empirical risk
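    • Sketch: a toy illustration of why random perturbations favor flat minima: averaging the loss over perturbations replaces f with the smoothed loss E[f(w + xi)], under which a sharp global minimum can lose to a flatter, slightly shallower one. The double-well loss and perturbation radius are illustrative assumptions, not the paper's algorithm verbatim.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def f(w):
        sharp = -1.01 * np.exp(-(w + 1) ** 2 / 0.005)   # narrow well at w = -1
        flat = -1.00 * np.exp(-(w - 2) ** 2 / 2.0)      # wide well at w = +2
        return sharp + flat

    grid = np.linspace(-4, 6, 2001)
    smoothed = np.array([np.mean(f(w + rng.uniform(-0.5, 0.5, 4000)))
                         for w in grid])                # Monte Carlo smoothing

    print(round(grid[np.argmin(f(grid))], 1))           # -1.0: sharp well wins
    print(round(grid[np.argmin(smoothed)], 1))          # lands near 2: flat well
    ```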
  • Not all distributional shifts are equal: Fine-grained robust conformal inference (Poster)

    • Authors: Jiahao Ai, Zhimei Ren
    • Affiliations: School of Mathematical Sciences, Peking University, Beijing, China, Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, USA
    • TL;DR: This study presents a fine-grained framework for uncertainty quantification in predictive models that addresses distributional shifts by distinguishing between shifts in covariate distributions and conditional relationships. The proposed method demonstrates improved robustness and efficiency in generating valid prediction intervals compared to existing approaches.
    • Keywords: uncertainty quantification, distributional shifts, predictive models, conformal inference, distributionally robust learning, sensitivity analysis, individual treatment effects, performance drop under distributional shifts, non-exchangeability of training and test data, valid prediction intervals, improved efficiency in prediction sets, f-divergence, covariate distribution shift
  • Learning Mixtures of Gaussian Processes through Random Projection (Poster)

    • Authors: Emmanuel Akeweje, Mimi Zhang
    • Affiliations: School of Computer Science and Statistics, Trinity College Dublin, Ireland; I-Form Advanced Manufacturing Research Centre, Science Foundation Ireland, Ireland
    • TL;DR: This study presents an ensemble clustering framework to identify latent cluster labels in functional data generated from Gaussian process mixtures, significantly reducing computational complexity compared to existing methods. The proposed approach allows for independent learning of each Gaussian process component after uncovering hidden cluster labels, with theoretical guarantees on identifiability and learnability. A minimal projection-and-consensus sketch follows below.
    • Keywords: Gaussian processes, ensemble clustering, functional data, Gaussian mixture model (GMM), univariate GMM, parameter estimation, Cluster analysis, statistical analysis of functional data, Computational complexity, identifiability and learnability of Gaussian process mixtures, Consensus clustering, theoretical guarantees, Synthetic datasets, real datasets
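    • Sketch: a minimal version of the project-then-cluster idea: project curves onto random directions, cluster each 1-D projection with a univariate GMM, and combine the labelings by consensus (co-association) clustering. The data generation and all settings are illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.cluster import SpectralClustering

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 50)
    curves = np.vstack([np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal((100, 50)),
                        np.cos(2 * np.pi * t) + 0.3 * rng.standard_normal((100, 50))])

    n, k, n_proj = len(curves), 2, 20
    coassoc = np.zeros((n, n))
    for _ in range(n_proj):
        u = rng.standard_normal(50)
        u /= np.linalg.norm(u)
        proj = curves @ u                              # 1-D random projection
        labels = GaussianMixture(k, random_state=0).fit_predict(proj[:, None])
        coassoc += labels[:, None] == labels[None, :]  # co-association counts

    final = SpectralClustering(k, affinity="precomputed", random_state=0)
    labels = final.fit_predict(coassoc / n_proj)       # consensus partition
    print(np.bincount(labels))                          # roughly two 100-curve groups
    ```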
  • LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions (Poster)

    • Authors: Victor Agostinelli III, Sanghyun Hong, Lizhong Chen
    • Affiliations: Oregon State University, OR USA
    • TL;DR: The study introduces LeaPformer, a novel approach to enhance linear transformers for autoregressive and simultaneous tasks by utilizing learned proportions instead of static positional representations. It demonstrates superior performance in quality-throughput trade-offs on various benchmarks, including language modeling and speech-to-text translation.
    • Keywords: Linear Transformers, Autoregressive Tasks, Simultaneous Tasks, Position-based re-weighting functions, Dynamic proportions, Natural Language Processing, Speech-to-Text Translation, Quadratic complexity of attention mechanisms, Sequence length dependency, LeaPformer, Quality-throughput trade-off, Long-Range Arena benchmark, Wikitext-103, Transformers, Efficient Attention Mechanisms
  • Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise (Poster)

    • Authors: Kwangjun Ahn, Zhiyu Zhang, Yunbum Kook, Yan Dai
    • Affiliations: Tsinghua University, Georgia Tech, Harvard University, MIT; Microsoft Research
    • TL;DR: This study provides a theoretical perspective on the Adam optimizer by framing it within the online learning of updates, revealing its correspondence to the Follow-the-Regularized-Leader (FTRL) framework. The findings emphasize the significance of Adam's algorithmic components, particularly momentum and discounting factors, in enhancing optimization performance. The standard Adam update is sketched below for reference.
    • Keywords: Adam optimizer, online learning, optimization algorithms, Follow-the-Regularized-Leader (FTRL), stochastic gradient descent (SGD), Deep learning, Transformer-based neural networks, Theoretical understanding of algorithmic components, convergence rates, Insights into the benefits of momentum and discounting factors in optimization
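    • Sketch: the standard Adam update, included to make concrete the components the paper analyzes: the momentum buffer m is a discounted gradient sum and v a discounted second-moment estimate; the paper shows the resulting update matches a discounted FTRL iterate. The toy quadratic objective is an assumption for illustration.

    ```python
    import numpy as np

    def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        m = b1 * m + (1 - b1) * g                # discounted first moment
        v = b2 * v + (1 - b2) * g * g            # discounted second moment
        m_hat = m / (1 - b1 ** t)                # bias corrections
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v

    # Toy usage: minimize f(w) = ||w||^2 / 2, whose gradient is w itself.
    w, m, v = np.ones(3), np.zeros(3), np.zeros(3)
    for t in range(1, 2001):
        w, m, v = adam_step(w, w, m, v, t, lr=0.01)
    print(np.round(w, 4))                        # ends near the minimizer at 0
    ```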
  • Physics of Language Models: Part 3.1, Knowledge Storage and Extraction (Spotlight Poster)

    • Authors: Zeyuan Allen-Zhu, Yuanzhi Li
    • Affiliations: Meta / FAIR Labs, USA, MBZUAI, UAE
    • TL;DR: This paper investigates how large language models (LLMs) memorize and extract knowledge, revealing that effective knowledge extraction requires data augmentation during pretraining. The authors recommend rewriting pretraining data and incorporating more instruction-finetuning data to enhance knowledge extraction capabilities.
    • Keywords: Knowledge extraction, Language models, Memorization, Transformer-based models, Linear probing, Knowledge extraction challenges, Memorization vs. extraction, Recommendations for LLM pretraining, Knowledge augmentation, Controlled biography dataset, Large language models (LLMs), Factual knowledge
  • Revisiting Inexact Fixed-Point Iterations for Min-Max Problems: Stochasticity and Structured Nonconvexity (Poster)

    • Authors: Ahmet Alacaoglu, Donghwan Kim, Stephen Wright
    • Affiliations: KAIST, Republic of Korea, University of Wisconsin–Madison, USA; University of British Columbia, Canada, University of Wisconsin–Madison, USA
    • TL;DR: This study addresses constrained, L-smooth, potentially stochastic nonconvex-nonconcave min-max problems, providing optimal complexity guarantees under cohypomonotonicity and weak MVI conditions. The authors present improved convergence analyses and refined methods for inexact iterations, relevant to applications in reinforcement learning and adversarial machine learning.
    • Keywords: min-max problems, nonconvex-nonconcave optimization, stochastic optimization, inexact fixed-point iterations, Halpern iteration, Krasnosel’skiĭ-Mann iteration, multilevel Monte Carlo estimator, reinforcement learning, generative adversarial networks (GANs), adversarial machine learning, nonconvexity, cohypomonotonicity, weak Minty Variational Inequality (MVI), optimal complexity guarantees, convergence analysis improvements, ρ-cohypomonotonicity, Lipschitz continuity, indicator function
  • Byzantine-Robust Federated Learning: Impact of Client Subsampling and Local Updates (Poster)

    • Authors: Youssef Allouah, Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Rafael Pinot, Geovani Rizk, Sasha Voitovych
    • Affiliations: Sorbonne Université, LPSM, University of Toronto, EPFL
    • TL;DR: This study investigates the robustness of federated learning against Byzantine clients by analyzing the effects of client subsampling and local updates. The findings reveal that careful management of these factors is crucial for optimal convergence and performance in federated learning systems.
    • Keywords: Federated Learning, Byzantine Robustness, FedAvg, FedRo, Image Classification, Adversarial Clients, Byzantine Clients, Client Subsampling, Local Updates, Robust Aggregation Rule, Convergence Conditions, FEMNIST, CIFAR-10
  • The Privacy Power of Correlated Noise in Decentralized Learning (Poster)

    • Authors: Youssef Allouah, Anastasiia Koloskova, Aymane Firdoussi, Martin Jaggi, Rachid Guerraoui
    • Affiliations: EPFL, Switzerland
    • TL;DR: This paper introduces DECOR, a decentralized learning method that utilizes correlated noise to enhance privacy while maintaining utility in model training. It demonstrates that DECOR achieves optimal privacy-utility trade-offs under a new relaxation of local differential privacy, SecLDP, addressing privacy concerns in decentralized settings.
    • Keywords: Decentralized learning, Privacy, Decentralized SGD, Differential privacy (DP), Local differential privacy (LDP), SecLDP, Privacy concerns, Data exposure, Network sparsity, DECOR method, Privacy accountant, Correlated noise, Gaussian noise
  • Nonlinear Filtering with Brenier Optimal Transport Maps (Poster)

    • Authors: Mohammad Al-Jarrah, Niyizhen Jin, Bamdad Hosseini, Amirhossein Taghvaei
    • Affiliations: Department of Aeronautics & Astronautics, University of Washington, Seattle, Department of Applied Mathematics, University of Washington, Seattle
    • TL;DR: This paper presents a novel method for nonlinear filtering using Brenier optimal transport maps to compute the conditional distribution of a stochastic dynamical system's state from noisy observations. The proposed method addresses limitations of conventional particle filters, particularly in high-dimensional and non-Gaussian scenarios, demonstrating improved sample efficiency and scalability.
    • Keywords: Nonlinear filtering, Stochastic dynamical systems, Brenier optimal transport (OT) maps, Sequential importance resampling (SIR) particle filters, Ensemble Kalman filter, Neural networks, Degenerate likelihoods, High-dimensional states, Weight degeneracy issue, Sample efficiency, High-dimensional scalability, Multi-modal distributions, Hidden Markov process, Conditional probability kernels
  • Hyperbolic Optimizer as a Dynamical System (Poster)

    • Authors: Nico Alvarado, Hans Lobel
    • Affiliations: Department of Computer Science, Pontificia Universidad Católica de Chile; National Center of Artificial Intelligence, Chile; Millennium Institute Foundational Research on Data, Chile; Department of Transport and Logistics Engineering, Pontificia Universidad Católica de Chile, Department of Computer Science, Pontificia Universidad Católica de Chile; National Center of Artificial Intelligence, Chile; Millennium Institute Foundational Research on Data, Chile
    • TL;DR: This study introduces a hyperbolic optimizer based on ADMM tailored for hyperbolic geometry, linking it to a non-linear ordinary differential equation. The research emphasizes the importance of stability analysis for effective implementation in real-world applications, particularly in hyperbolic neural networks.
    • Keywords: hyperbolic geometry, dynamical systems, optimization algorithms, Riemannian optimization, ADMM (Alternating Direction Method of Multipliers), non-linear ordinary differential equations, hyperbolic neural networks, hierarchical data representation, challenges of classical optimizers in non-Euclidean spaces, new hyperbolic optimizer, stability analysis through ODE linearization, hyperbolic spaces, Lyapunov stability, Poincaré ball model
  • Robust and Conjugate Gaussian Process Regression (Spotlight Poster)

    • Authors: Matias Altamirano, Francois-Xavier Briol, Jeremias Knoblauch
    • Affiliations: Department of Statistical Science, University College London, London, United Kingdom
    • TL;DR: This paper presents a method for robust and conjugate Gaussian process regression (RCGP) that maintains closed-form conditioning while addressing the challenges posed by outliers in data. The proposed RCGP method demonstrates strong empirical performance across various applications, including Bayesian optimisation and sparse variational Gaussian processes.
    • Keywords: Gaussian Process Regression, Robustness in Inference, Generalised Bayesian Inference, Conjugate Gaussian Process, Bayesian Optimisation, Sparse Variational Gaussian Processes, Outliers in Data, Non-robust Inferences, Provably Robust and Conjugate Gaussian Process (RCGP)
  • Position: Stop Making Unscientific AGI Performance Claims (Poster)

    • Authors: Patrick Altmeyer, Andrew Demetriou, Antony Bartlett, Cynthia C. S. Liem
    • Affiliations: Department of Intelligent Systems, Delft University of Technology, Delft, the Netherlands
    • TL;DR: The paper argues against the interpretation of spurious correlations in large language models as evidence of Artificial General Intelligence (AGI), emphasizing the need for caution in AI research communication. It highlights the tendency of humans to anthropomorphize AI capabilities and calls for adherence to principles of academic integrity.
    • Keywords: Artificial General Intelligence (AGI), Large Language Models (LLMs), Random projections, matrix decompositions, deep autoencoders, transformers, Misinterpretation of AI capabilities, spurious correlations in model representations, Academic integrity, methodological setup
  • Triple Changes Estimator for Targeted Policies (Spotlight Poster)

    • Authors: Sina Akbari, Negar Kiyavash
    • Affiliations: EPFL, Switzerland
    • TL;DR: This study extends the triple difference estimator within the changes-in-changes framework to provide a more robust method for estimating causal effects, particularly in the context of Medicaid expansion's impact on children's preventive care. The findings highlight the importance of addressing the parallel trends assumption to avoid biased estimates in observational studies.
    • Keywords: difference-in-differences (DiD), triple difference estimator, changes-in-changes (CiC), triple changes estimator, Medicaid expansion, children's preventive care, parallel trends assumption, bias in estimates, extension of the triple difference estimator, identification assumptions
  • Online conformal prediction with decaying step sizes (Poster)

    • Authors: Anastasios Angelopoulos, Rina Barber, Stephen Bates
    • Affiliations: Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge MA USA, Department of Electrical Engineering and Computer Science, University of California at Berkeley, Berkeley CA USA, Department of Statistics, University of Chicago, Chicago IL USA
    • TL;DR: This paper presents a method for online conformal prediction that utilizes decaying step sizes to provide robust coverage guarantees for both arbitrary and i.i.d. data sequences. The proposed method improves practical properties by ensuring that coverage remains close to the desired level at every time point, particularly in stable distributions. A minimal quantile-tracking sketch follows below.
    • Keywords: online conformal prediction, uncertainty quantification, decaying step sizes, conformal score function, time-series forecasting, medicine, robotics, finance, epidemiology, coverage probability, prediction sets, historical fraction of miscovered labels, simultaneous best-case and worst-case guarantees, long-run coverage guarantee, convergence guarantee
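    • Sketch: a minimal quantile-tracking version of online conformal prediction with a decaying step size, in the spirit of the paper. The AR(1) data stream, the absolute-residual score, and the schedule eta_t = c / t^0.6 are illustrative assumptions, not the paper's exact prescription.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    alpha, q, c = 0.1, 1.0, 1.0                 # target miscoverage, initial radius
    y_prev, covered = 0.0, []

    for t in range(1, 10_001):
        y = 0.8 * y_prev + rng.standard_normal()    # observation stream
        pred = 0.8 * y_prev                          # point forecast
        covered.append(abs(y - pred) <= q)           # interval [pred - q, pred + q]
        err = 1.0 - covered[-1]                      # 1 if miscovered, else 0
        q += (c / t ** 0.6) * (err - alpha)          # decaying-step-size update
        y_prev = y

    print("empirical coverage:", np.mean(covered))   # close to 0.9
    ```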
  • Distinguishing the Knowable from the Unknowable with Language Models (Poster)

    • Authors: Gustaf Ahdritz, Tian Qin, Nikhil Vyas, Boaz Barak, Benjamin Edelman
    • Affiliations: Harvard University
    • TL;DR: This study investigates the distinction between epistemic and aleatoric uncertainty in large language models (LLMs) by using larger models as proxies for ground truth. The authors propose methods to identify these uncertainties at the token level, demonstrating that LLMs contain internal representations that can enhance model confidence indicators.
    • Keywords: epistemic uncertainty, aleatoric uncertainty, large language models (LLMs), linear probes, embeddings, unsupervised methods, In-Context Learning Test (ICLT), identifying uncertainty types in language model outputs, improved indicators of model confidence, token-level uncertainty classification
  • Scalable Online Exploration via Coverability (Poster)

    • Authors: Philip Amortila, Dylan Foster, Akshay Krishnamurthy
    • Affiliations: University of Illinois, Urbana-Champaign, Microsoft Research
    • TL;DR: This paper introduces L1-Coverage as a new exploration objective in reinforcement learning, aimed at improving exploration efficiency in high-dimensional environments. The authors demonstrate that L1-Coverage facilitates effective policy optimization and enables computationally efficient algorithms for both model-based and model-free reinforcement learning.
    • Keywords: exploration in reinforcement learning, high-dimensional domains, L1-Coverage, policy optimization, policy gradient, Q-learning, sample complexity, computational efficiency, exploration challenges, efficient model-based and model-free algorithms, exploration objectives, MDP (Markov Decision Process), coverability
  • Stationarity without mean reversion in improper Gaussian processes (Poster)

    • Authors: Luca Ambrogioni
    • Affiliations: Donders Institute for Brain, Cognition and Behaviour, Radboud University
    • TL;DR: This paper introduces a method to use improper Gaussian process priors with infinite variance to create stationary processes that do not exhibit mean reversion, addressing issues in traditional GP regression. The authors present a family of non-positive kernels that maintain desirable properties of standard stationary kernels while avoiding pathological behaviors.
    • Keywords: Gaussian Processes, Stationarity, Non-Mean Reversion, Improper Gaussian Processes, Non-Positive Kernels, Machine Learning, Spatial Statistics, Statistical Signal Processing, Mean Reversion, Pathological Behavior in GP Regression, Non-Reverting Covariance Functions, Analytical Posterior Distributions, Synthetic Data, Real Data, Covariance Function, Kernel, Squared Exponential, Matérn Class
  • Causal Action Influence Aware Counterfactual Data Augmentation (Poster)

    • Authors: Núria Armengol Urpí, Marco Bagatella, Marin Vlastelica, Georg Martius
    • Affiliations: Department of Computer Science, University of Tübingen, Tübingen, Germany, Department of Computer Science, ETH Zurich, Zurich, Switzerland; Max Planck Institute for Intelligent Systems, Tübingen, Germany, Max Planck Institute for Intelligent Systems, Tübingen, Germany
    • TL;DR: The study introduces CAIAC, a data augmentation method that enhances offline learning for robots by utilizing causal action influence to create synthetic transitions from existing datasets. This approach significantly improves the robustness of learning algorithms against distributional shifts and causal confusion.
    • Keywords: Offline learning, data augmentation, causal reasoning, Causal action influence (CAI), counterfactual reasoning, Robotics, complex behavior teaching, Causal confusion, distributional shift, spurious correlations, CAIAC method for data augmentation, improved robustness of offline learning algorithms
  • No Dimensional Sampling Coresets for Classification (Spotlight Poster)

    • Authors: Meysam Alishahi, Jeff Phillips
    • Affiliations: Kahlert School of Computing, University of Utah, Salt Lake City, Utah, USA; Kahlert School of Computing, University of Utah, Salt Lake City, Utah, USA; visiting ScaDS.AI, University of Leipzig and MPI for Math in the Sciences, Leipzig, Germany
    • TL;DR: This paper refines and generalizes the concept of coresets for classification problems using a sensitivity sampling framework, introducing the first no dimensional coresets that do not depend on the dimension of the data. The findings provide sample complexity bounds and ensure approximation guarantees for various loss functions, enhancing the efficiency of machine learning models.
    • Keywords: Coresets, Classification Problems, Sensitivity Sampling, Rademacher Complexity, Machine Learning, Approximation Guarantees, Data Size Reduction, Performance on New Data, No Dimensional Coresets, Sample Complexity Bounds
  • Robust Graph Matching when Nodes are Corrupt (Poster)

    • Authors: Taha Ameen Ur Rahman, Bruce Hajek
    • Affiliations: Department of Electrical and Computer Engineering; Coordinated Science Laboratory, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
    • TL;DR: This study introduces two models for matching correlated graphs in the presence of corrupt nodes, demonstrating that while detection of corrupt nodes is impossible in an adversarial setting, robust estimators can still match a positive fraction of nodes when only one network is compromised. The findings highlight the importance of considering node corruption in graph matching algorithms.
    • Keywords: Graph matching, Node corruption, k-core estimator, maximum overlap estimator, Social networks, Biological networks, Computer vision, Natural language processing, Graph isomorphism, Necessary conditions for estimators, Robust algorithms, Correlated graphs, Edge-correlated networks
  • Learning the Target Network in Function Space (Poster)

    • Authors: Kavosh Asadi, Yao Liu, Shoham Sabach, Ming Yin, Rasool Fakoor
    • Affiliations: Amazon, Technion; Amazon, Princeton University
    • TL;DR: The study introduces Lookahead-Replicate (LR), a novel algorithm for learning the value function in reinforcement learning that maintains equivalence in function space rather than parameter space. Empirical results show that LR significantly enhances deep reinforcement learning performance on the Atari benchmark.
    • Keywords: reinforcement learning, value function approximation, Lookahead-Replicate (LR), Bellman operator, learning accurate value function, function approximation in large state spaces, convergent behavior in learning, improved deep RL performance, Atari benchmark
  • How Free is Parameter-Free Stochastic Optimization? (Spotlight Poster)

    • Authors: Amit Attia, Tomer Koren
    • Affiliations: Blavatnik School of Computer Science, Tel Aviv University; Google Research Tel Aviv
    • TL;DR: This study investigates the existence of fully parameter-free stochastic optimization methods that achieve competitive convergence rates without requiring significant knowledge of problem parameters. It presents a hyperparameter search technique that outperforms existing state-of-the-art algorithms in both non-convex and convex settings.
    • Keywords: parameter-free stochastic optimization, stochastic gradient descent, hyperparameter search technique, adaptive methods, parameter-free methods, machine learning, statistical learning problems, tuning of algorithmic parameters, unknown problem parameters, fully parameter-free methods, convergence rates, lower bounds
  • Adaptive Hierarchical Certification for Segmentation using Randomized Smoothing (Poster)

    • Authors: Alaa Anani, Tobias Lorenz, Bernt Schiele, Mario Fritz
    • Affiliations: Max Planck Institute for Informatics, CISPA Helmholtz Center for Information Security; Max Planck Institute for Informatics, CISPA Helmholtz Center for Information Security
    • TL;DR: This paper presents a novel adaptive hierarchical certification method for semantic segmentation that reduces abstain rates by certifying unstable components at coarser semantic levels while maintaining theoretical guarantees. The proposed method demonstrates improved certified accuracy and lower abstain rates compared to existing state-of-the-art techniques.
    • Keywords: Certification for machine learning, Semantic segmentation, Adversarial robustness, Randomized smoothing, Adaptive hierarchical certification, Autonomous driving, Medical imaging, Video surveillance, Object detection, High abstain rates, Model uncertainty, Non-robustness to adversarial perturbations, Certified Information Gain (CIG), Lower abstain rate, Cityscapes, PASCAL-Context, ACDC, COCO-Stuff, Safety-critical domains
  • Random features models: a way to study the success of naive imputation (Poster)

    • Authors: Alexis Ayme, Claire Boyer, Aymeric Dieuleveut, Erwan Scornet
    • Affiliations: Sorbonne Université, CNRS, Laboratoire de Probabilités, Statistique et Modélisation (LPSM), F-75005 Paris, France; Institut Universitaire de France (IUF), Sorbonne Université, CNRS, Laboratoire de Probabilités, Statistique et Modélisation (LPSM), F-75005 Paris, France, CMAP, UMR7641, Ecole Polytechnique, IP Paris, 91128 Palaiseau, France
    • TL;DR: This study investigates the effectiveness of naive imputation methods for handling missing data, demonstrating that the bias introduced is negligible for high-dimensional linear predictors, though it remains relevant in low dimensions. The authors establish theoretical bounds for stochastic gradient predictors applied to zero-imputed data, suggesting favorable outcomes even under more complex missing data scenarios. A small zero-imputation experiment is sketched below.
    • Keywords: missing data, naive imputation, predictive performance, random features model, stochastic gradient descent (SGD), machine learning, data analysis, missing completely at random (MCAR), bias in imputation, finite-sample bounds, consistency of imputation strategies
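    • Sketch: an assumed setup, not the paper's experiment: zero imputation under MCAR missingness, followed by averaged SGD on the imputed design, still yields a usable linear predictor in moderately high dimension.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, p_obs = 5000, 100, 0.7
    w_true = rng.standard_normal(d) / np.sqrt(d)
    X = rng.standard_normal((n, d))
    y = X @ w_true + 0.1 * rng.standard_normal(n)
    mask = rng.random((n, d)) < p_obs            # MCAR observation pattern
    X_imp = np.where(mask, X, 0.0)               # naive zero imputation

    w, w_avg = np.zeros(d), np.zeros(d)
    for i in range(n):                           # one pass of SGD
        xi = X_imp[i]
        w -= 0.01 * (xi @ w - y[i]) * xi
        w_avg += (w - w_avg) / (i + 1)           # Polyak averaging

    mse = np.mean((X_imp @ w_avg - y) ** 2)
    print("MSE:", round(mse, 2), "vs Var(y):", round(y.var(), 2))  # well below
    ```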
  • Delaunay Graph: Addressing Over-Squashing and Over-Smoothing Using Delaunay Triangulation (Poster)

    • Authors: Hugo Attali, Davide Buscaldi, Nathalie Pernelle
    • Affiliations: LIPN, Universite Sorbonne Nord
    • TL;DR: This study introduces a novel approach using Delaunay Triangulation to construct graphs from features, addressing the issues of over-squashing and over-smoothing in Graph Neural Networks. The proposed method consistently outperforms existing graph rewiring techniques, enhancing information propagation efficiency. A minimal construction sketch follows below.
    • Keywords: Graph Neural Networks (GNNs), Information Propagation, Delaunay Triangulation, Message-Passing Paradigm, Graph Rewiring, Chemistry, Information Retrieval, Social Network Analysis, Knowledge Graphs, Over-Squashing, Over-Smoothing, Heterophilic Case, Bottlenecks in Graphs, Novel Graph Construction Method, Improved Information Propagation
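    • Sketch: a minimal version of the core idea: discard the given edges and rebuild a graph from node features via Delaunay triangulation, whose edges become the rewired edge set fed to a GNN. The toy linear 2-D reduction below stands in for whatever low-dimensional embedding is actually used; it and the random features are assumptions.

    ```python
    import numpy as np
    from scipy.spatial import Delaunay

    rng = np.random.default_rng(0)
    features = rng.standard_normal((200, 16))        # node features
    xy = features @ rng.standard_normal((16, 2))     # toy 2-D reduction

    tri = Delaunay(xy)
    edges = set()
    for simplex in tri.simplices:                    # each triangle gives 3 edges
        for i in range(3):
            a, b = sorted((simplex[i], simplex[(i + 1) % 3]))
            edges.add((a, b))

    edge_index = np.array(sorted(edges)).T           # 2 x |E|, GNN-style format
    print("nodes:", len(xy), "delaunay edges:", edge_index.shape[1])
    ```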
  • An amortized approach to non-linear mixed-effects modeling based on neural posterior estimation (Poster)

    • Authors: Jonas Arruda, Yannik Schälte, Clemens Peiter, Olga Teplytska, Ulrich Jaehde, Jan Hasenauer
    • Affiliations: Pharmaceutical Institute, University of Bonn, 53121 Bonn, Germany, Life and Medical Sciences Institute, University of Bonn, 53115 Bonn, Germany, Life and Medical Sciences Institute, University of Bonn, 53115 Bonn, Germany; Computational Health Center, Helmholtz Zentrum München, 85764 Neuherberg, Germany
    • TL;DR: This study presents a novel machine learning-based approach using neural density estimation to efficiently estimate parameters in non-linear mixed-effects models for heterogeneous populations. The method demonstrates significant flexibility and scalability in applications within cell biology and pharmacology compared to traditional techniques.
    • Keywords: Non-linear mixed-effects models, Heterogeneous populations, Neural density estimation, Conditional normalizing flows, Cell biology, Pharmacology, Computational challenges in parameter inference, Individual-level likelihood formulation, Amortized parameter estimation, Efficient inference of population parameters
  • Memory Consolidation Enables Long-Context Video Understanding (Spotlight Poster)

    • Authors: Ivana Balazevic, Yuge Shi, Pinelopi Papalampidi, Rahma Chaabouni, Skanda Koppula, Olivier Henaff
    • Affiliations: Google DeepMind
    • TL;DR: This study introduces the Memory-Consolidated Vision Transformer (MC-ViT), which effectively extends the temporal context of video understanding by utilizing a memory bank of past activations without modifying the underlying architecture. MC-ViT achieves state-of-the-art performance on long-context video tasks while using significantly fewer parameters than existing models.
    • Keywords: long-context video understanding, memory consolidation, transformer architectures, memory bank, fine-tuning, video processing, artificial vision systems, short temporal contexts, quadratic complexity, scaling issues, Memory-Consolidated Vision Transformer (MC-ViT), state-of-the-art performance, EgoSchema, Perception Test, Diving48, redundancy reduction, streaming setting
  • Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages (Poster)

    • Authors: Hilal Asi, Vitaly Feldman, Jelani Nelson, Huy Nguyen, Kunal Talwar, Samson Zhou
    • Affiliations: Northeastern, Apple, Texas A&M University, UC Berkeley
    • TL;DR: This study addresses private vector mean estimation in the shuffle model, proposing a multi-message protocol that achieves optimal error with a specific message complexity. It also analyzes the robustness of the protocol against malicious users and establishes the necessity of multiple messages for optimal performance.
    • Keywords: private vector mean estimation, shuffle model, federated learning, multi-message protocol, single-message protocol, differential privacy, federated analytics, machine learning model training, statistical analysis, privacy error, optimal error, robustness to malicious users, optimal message complexity, mean squared error analysis, local differential privacy, Encode, Shuffle, Analyze (ESA) model
  • Practical Performance Guarantees for Pipelined DNN Inference (Spotlight Poster)

    • Authors: Aaron Archer, Matthew Fahrbach, Kuikui Liu, Prakash Prabhu
    • Affiliations: MIT, Google
    • TL;DR: This study focuses on optimizing pipeline parallelism for DNN inference by partitioning model graphs and minimizing the bottleneck stage's running time. The authors present effective algorithms and demonstrate that their novel MIP relaxations significantly improve lower bounds, closing the optimality gap by a factor of nearly 10.
    • Keywords: pipeline parallelism, deep neural network (DNN) inference, mixed-integer programming (MIP), model graph partitioning, bottleneck stage, communication overhead, running time balance, improved lower bounds, practical algorithms, diverse testbed of 369 production models
  • Fast Algorithms for Hypergraph PageRank with Applications to Semi-Supervised Learning (Poster)

    • Authors: Konstantinos Ameranis, Adela DePavia, Lorenzo Orecchia, Erasmo Tani
    • Affiliations: Computational and Applied Mathematics, University of Chicago, Chicago, USA, Department of Computer Science, University of Chicago, Chicago, USA
    • TL;DR: This paper presents scalable algorithms for hypergraph PageRank and hypergraph Laplacian systems to enhance semi-supervised learning by effectively capturing higher-order relationships. The proposed methods demonstrate significant speed improvements, enabling broader applications of hypergraph models in large-scale settings.
    • Keywords: semi-supervised learning, hypergraph models, hypergraph PageRank, hypergraph Laplacian systems, higher-order relations, group membership, scalability of hypergraph models, scalable algorithms, speed-ups on hypergraph tasks, benchmark instances of semi-supervised learning, hypergraph primitives, Dirichlet energy, Laplacian system
  • Bipartite Matching in Massive Graphs: A Tight Analysis of EDCS (Poster)

    • Authors: Amir Azarmehr, Soheil Behnezhad, Mohammad Roghani
    • Affiliations: Khoury College of Computer Science, Northeastern University, Boston, USA, Department of Management Science and Engineering, Stanford University, Stanford, USA
    • TL;DR: This paper analyzes the edge-degree constrained subgraph (EDCS) for maximum matching in massive graphs, revealing that the optimal sparsity parameter β = 6 achieves an approximation ratio of 0.677, surpassing the previously believed limit of 2/3. The findings suggest that increasing β is not necessary for improving approximation ratios. A toy EDCS construction follows below.
    • Keywords: Maximum matching, Graph sparsification, Edge-degree constrained subgraph (EDCS), Data mining, Resource allocation, Online advertisement, Balanced clustering, Handling massive graphs, Memory limitations in processing, Approximation ratio analysis, Improved matching size
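    • Sketch: a toy construction of an EDCS by local fixing: an edge inside H may have endpoint degrees summing to at most β, and an edge outside H must have them summing to at least β - 1; per the paper's analysis, β = 6 already yields a ~0.677-approximate maximum matching. The random bipartite instance is an illustrative assumption.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n_left, n_right, beta = 50, 50, 6
    edges = [(u, v) for u in range(n_left) for v in range(n_right)
             if rng.random() < 0.1]

    H = set()
    deg_l = [0] * n_left
    deg_r = [0] * n_right
    changed = True
    while changed:                                   # local fixing terminates
        changed = False
        for u, v in edges:
            s = deg_l[u] + deg_r[v]
            if (u, v) in H and s > beta:             # overfull edge: drop it
                H.remove((u, v)); deg_l[u] -= 1; deg_r[v] -= 1; changed = True
            elif (u, v) not in H and s < beta - 1:   # underfull non-edge: add it
                H.add((u, v)); deg_l[u] += 1; deg_r[v] += 1; changed = True

    print(f"EDCS keeps {len(H)} of {len(edges)} edges")
    ```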
  • A Rate-Distortion View of Uncertainty Quantification (Poster)

    • Authors: Ifigeneia Apostolopoulou, Benjamin Eysenbach, Frank Nielsen, Artur Dubrawski
    • Affiliations: Sony Computer Science Laboratories Inc., Tokyo, Japan, Machine Learning Department, AutonLab, Carnegie Mellon University, Computer Science Department, Princeton University
    • TL;DR: This paper introduces the Distance Aware Bottleneck (DAB) method to enhance deep neural networks' ability to quantify uncertainty by measuring the distance of new examples from a learned codebook of training data. The proposed method achieves superior out-of-distribution detection and misclassification prediction compared to existing techniques, all while providing deterministic uncertainty estimates with a single forward pass.
    • Keywords: Uncertainty Quantification, Deep Learning, Distance Aware Bottleneck (DAB), Gaussian Processes, Deterministic Uncertainty Methods (DUMs), Out-Of-Distribution (OOD) Detection, Reliable uncertainty estimation, model confidence, misclassification prediction, Improved OOD detection, deterministic uncertainty estimates, CIFAR-10, CIFAR-100, SVHN, Information Bottleneck, feature collapse
  • Constrained Ensemble Exploration for Unsupervised Skill Discovery (Poster)

    • Authors: Chenjia Bai, Rushuai Yang, Qiaosheng Zhang, Kang Xu, Yi Chen, Ting Xiao, Xuelong Li
    • Affiliations: Tencent, Hong Kong University of Science and Technology, Shanghai Artificial Intelligence Laboratory, Shanghai Artificial Intelligence Laboratory; The Institute of Artificial Intelligence (TeleAI), China Telecom, Shanghai Artificial Intelligence Laboratory; Shenzhen Research Institute of Northwestern Polytechnical University, East China University of Science and Technology; The Institute of Artificial Intelligence (TeleAI), China Telecom
    • TL;DR: This paper presents a novel unsupervised reinforcement learning framework called Constrained Ensemble exploration for Skill Discovery (CeSD), which utilizes an ensemble of skills to enhance state coverage and learn distinguishable behaviors without relying on extrinsic rewards. The proposed method demonstrates superior performance in various downstream tasks compared to existing approaches.
    • Keywords: Unsupervised Reinforcement Learning, Skill Discovery, Ensemble of Skills, State-Distribution Constraints, Value Functions, Game AI, Self-Driving Cars, Robotic Manipulation, Locomotion Tasks, Static Skills, Poor State Coverage, High-Dimensional Environments, Novel Framework (CeSD), Well-Explored Ensemble Skills, Mutual Information, Empowerment, Feature Clustering
  • Simulation of Graph Algorithms with Looped Transformers (Poster)

    • Authors: Artur Back de Luca, Kimon Fountoulakis
    • Affiliations: David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
    • TL;DR: This study investigates the capability of looped transformer networks to simulate various graph algorithms, including Dijkstra’s, BFS, and DFS, demonstrating that the architecture can handle graphs of varying sizes without increasing the number of parameters. The findings indicate a limit to simulation due to finite precision, while also achieving Turing Completeness with constant width. An illustration of the underlying matrix primitive follows below.
    • Keywords: graph algorithms, neural networks, algorithmic reasoning, looped transformer, attention heads, adjacency matrix, simulation of algorithms, finite precision, Turing Completeness, multitask model, Dijkstra’s algorithm, Breadth-first search (BFS), Depth-first search (DFS), Kosaraju’s algorithm, strongly connected components (SCC)
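    • Sketch: a small illustration of the kind of primitive the simulation results build on (not the transformer itself): one BFS expansion is a product of the adjacency matrix with the current frontier vector, and looping that step yields the full traversal. The 5-node graph is an illustrative example.

    ```python
    import numpy as np

    A = np.array([[0, 1, 0, 0, 0],
                  [1, 0, 1, 1, 0],
                  [0, 1, 0, 0, 1],
                  [0, 1, 0, 0, 0],
                  [0, 0, 1, 0, 0]])

    visited = np.zeros(5, dtype=bool)
    visited[0] = True
    frontier = visited.copy()
    depth = np.full(5, -1)
    depth[0] = 0
    d = 0
    while frontier.any():                            # the "loop" in looped
        d += 1
        frontier = ((A @ frontier.astype(int)) > 0) & ~visited  # one-hop expand
        depth[frontier] = d
        visited |= frontier

    print(depth)                                     # BFS distances: [0 1 2 2 3]
    ```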
  • Diffusion Models Demand Contrastive Guidance for Adversarial Purification to Advance (Poster)

    • Authors: Mingyuan Bai, Wei Huang, Li Tenghui, Andong Wang, Junbin Gao, Cesar F Caiafa, Qibin Zhao
    • Affiliations: School of Automation, Guangdong University of Technology, Guangzhou, 510006, CHINA, Deep Learning Theory Team, Center of Advanced Intelligence Project, RIKEN, Tokyo, 1030027, JAPAN, Discipline of Business Analytics, The University of Sydney Business School, The University of Sydney, Darlington, NSW, 2006, AUSTRALIA, Key Laboratory of Intelligent Detection and the Internet of Things in Manufacturing, Ministry of Education, Guangzhou, 510006, CHINA, Instituto Argentino de Radioastronomía, CONICET CCT La Plata/CIC-PBA/UNLP, V. Elisa, 1894, ARGENTINA, Tensor Learning Team, Center of Advanced Intelligence Project, RIKEN, Tokyo, 1030027, JAPAN
    • TL;DR: This study proposes a method to enhance adversarial purification using diffusion models guided by contrastive loss, demonstrating significant improvements over existing adversarial training methods. Extensive experiments show that the proposed approach effectively removes adversarial attacks while preserving semantic information across various datasets.
    • Keywords: Adversarial defense, Adversarial purification, Diffusion models, Contrastive guidance, Image classification, Deep learning, Adversarial attacks, Misclassification of DNNs, Improved adversarial purification methods, Theoretical derivation of noise levels, CIFAR-10, CIFAR-100, German Traffic Sign Recognition Benchmark, ImageNet, ResNet, WideResNet, Generative models
  • On the Complexity of Finite-Sum Smooth Optimization under the Polyak–Łojasiewicz Condition (Spotlight Poster)

    • Authors: Yunyan Bai, Yuxing Liu, Luo Luo
    • Affiliations: School of Data Science, Fudan University, Shanghai, China, School of Data Science, Fudan University, Shanghai, China; Shanghai Key Laboratory for Contemporary Applied Mathematics, Shanghai, China
    • TL;DR: This paper investigates the complexity of finite-sum smooth optimization under the Polyak–Łojasiewicz condition, demonstrating that any gradient method requires a significant number of incremental first-order oracle calls to achieve an ε-suboptimal solution. It also provides lower bounds for communication and time costs in distributed settings and proposes a decentralized first-order method that approaches these bounds.
    • Keywords: Finite-sum optimization, Polyak–Łojasiewicz condition, Incremental first-order oracle (IFO) methods, Gradient descent (GD), Deep neural networks, Reinforcement learning, Optimal control, Matrix recovery, Nonconvex optimization, Complexity of finding stationary points, Lower bounds for communication rounds, time cost, and local first-order oracle calls; Decentralized first-order method
  • On the Identifiability of Switching Dynamical Systems (Poster)

    • Authors: Carles Balsells-Rodas, Yixin Wang, Yingzhen Li
    • Affiliations: University of Michigan, Imperial College London
    • TL;DR: This study investigates the identifiability of Switching Dynamical Systems, establishing identification conditions for Markov Switching Models and demonstrating their practical applications in segmenting high-dimensional time series and causal discovery in climate data. The findings contribute to the theoretical understanding of sequential latent variable models and propose estimation algorithms for identifiable systems.
    • Keywords: Identifiability, Latent Variable Models, Switching Dynamical Systems, Markov Switching Models, Non-linear Gaussians, Affine Transformations, High-dimensional Time Series, Causal Discovery, Climate Data, Identifiability Analysis, Sequential Generative Models, Estimation Algorithms, Identification Conditions, State-space Models, Hidden Markov Models, Autoregressive Connections
  • Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better (Poster)

    • Authors: Vicente Balmaseda, Ying Xu, Yixin Cao, Nate Veldt
    • Affiliations: Department of Computing, Hong Kong Polytechnic University, Hong Kong, China, Department of Computer Science and Engineering, Texas A&M University, College Station, Texas, USA
    • TL;DR: This paper presents improved approximation algorithms for the NP-hard cluster deletion problem, enhancing previous guarantees from 4 to 3. The authors also introduce a new combinatorial approach that is more scalable for practical applications.
    • Keywords: Cluster deletion, graph clustering, correlation clustering, Approximation algorithms, linear programming, Computational biology, social network analysis, NP-hard problems, edge deletion to form cliques, Improved approximation guarantees, combinatorial algorithms
  • How Learning by Reconstruction Produces Uninformative Features For Perception (Poster)

    • Authors: Randall Balestriero, Yann LeCun
    • Affiliations: NYU, Brown University
    • TL;DR: The study investigates the misalignment between learning to reconstruct data and learning for perception, demonstrating that reconstruction can lead to uninformative features. It finds that using denoising strategies can improve the alignment and performance on perception tasks.
    • Keywords: representation learning, perception, reconstruction, deep Autoencoder, denoising, image recognition, supervised learning, misalignment between reconstruction and perception, uninformative features, long training schedules, improved accuracy through noise strategies, detection of non-beneficial noise strategies, TinyImagenet
  • On Mechanistic Knowledge Localization in Text-to-Image Generative Models (Poster)

    • Authors: Samyadeep Basu, Keivan Rezaei, Priyatham Kattakinda, Vlad Morariu, Nanxuan Zhao, Ryan A Rossi, Varun Manjunatha, Soheil Feizi
    • Affiliations: University of Maryland, Adobe Research
    • TL;DR: This study introduces mechanistic localization in text-to-image generative models to identify layers controlling visual attributes, facilitating efficient model editing. The authors present methods LOCOGEN and LOCOEDIT, addressing challenges in knowledge localization and model editing in recent models like SD-XL and DeepFloyd.
    • Keywords: text-to-image generative models, mechanistic localization, model editing, causal tracing, LOCOGEN, LOCOEDIT, image generation, challenges in model editing, knowledge localization, efficient model editing methods, neuron-level model editing, LAION-5B, MS-COCO, UNet, CLIP text-encoder, Stable-Diffusion
  • Analyzing $D^\alpha$ seeding for $k$-means (Poster)

    • Authors: Etienne Bamas, Sai Ganesh Nagarajan, Ola Svensson
    • Affiliations: ETH AI Center, Zurich, Switzerland, EPFL, Zuse Institute Berlin
    • TL;DR: This paper analyzes the Dα seeding algorithm for k-means clustering, demonstrating that for any α > 2, it provides improved approximation guarantees for the standard k-means cost. The findings indicate that the choice of α significantly influences the clustering performance, particularly in relation to the distribution of data points. A minimal seeding sketch follows below.
    • Keywords: clustering, k-means, Dα seeding, k-means++, Dα seeding algorithm, (k,α)-clustering cost, approximation guarantees, O(log k) approximation, experimental validation, Gaussian distributions, mixing weights
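    • Sketch: a minimal Dα seeding implementation: like k-means++, but each new center is sampled with probability proportional to its distance to the current centers raised to the power α (α = 2 recovers k-means++). The Gaussian toy data and α = 4 are illustrative assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def d_alpha_seeding(X, k, alpha):
        centers = [X[rng.integers(len(X))]]
        for _ in range(k - 1):
            d2 = np.min(((X[:, None] - np.array(centers)[None]) ** 2).sum(-1),
                        axis=1)                      # squared distance to centers
            p = d2 ** (alpha / 2.0)                  # distance^alpha (d2 is squared)
            centers.append(X[rng.choice(len(X), p=p / p.sum())])
        return np.array(centers)

    X = np.concatenate([rng.normal(c, 0.5, (100, 2)) for c in (-5, 0, 5)])
    centers = d_alpha_seeding(X, k=3, alpha=4.0)     # alpha > 2, per the paper
    print(np.round(centers, 1))                      # one center per cluster
    ```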
  • Vocabulary for Universal Approximation: A Linguistic Perspective of Mapping Compositions (Spotlight Poster)

    • Authors: Yongqiang Cai
    • Affiliations: School of Mathematical Sciences, Laboratory of Mathematics and Complex Systems, MOE, Beijing Normal University, Beijing, 100875, China
    • TL;DR: This study investigates the universal approximation capabilities of deep neural networks by demonstrating the existence of a finite vocabulary of mappings that can approximate any continuous function on a compact domain. The findings suggest a novel compositional model for regular languages and highlight the potential of sequence modeling in addressing non-sequential problems.
    • Keywords: deep learning, sequence modeling, universal approximation, composite functions, linear mappings, nonlinear mappings, residual networks, neural ODE, natural language processing, image recognition, reinforcement learning, transforming non-sequential problems into sequential ones, handling long-term dependencies, finite vocabulary for universal approximation, approximation power of mapping compositions, RNNs, Transformers, BERT, GPT, ResNets
  • Neural Networks Learn Statistics of Increasing Complexity (Poster)

    • Authors: Nora Belrose, Quintin Pope, Lucia Quirke, Alex Mallen, Xiaoli Fern
    • Affiliations: EleutherAI, Oregon State University
    • TL;DR: This study investigates the distributional simplicity bias (DSB) in neural networks, demonstrating that they initially learn low-order statistics of data distributions before higher-order correlations. The findings reveal that early-training networks can effectively generalize to maximum-entropy distributions that match the low-order statistics of their training data, but this ability diminishes later in training.
    • Keywords: distributional simplicity bias, neural networks, low-order statistics, Taylor expansion, optimal transport methods, image classification, synthetic data generation, generalization behavior, reliance on statistics of different orders, criteria for model sensitivity to statistics, empirical evidence for DSB, maximum-entropy distributions, token n-gram frequencies, embedding vectors
  • Relational DNN Verification With Cross Executional Bound Refinement (Poster)

    • Authors: Debangshu Banerjee, Gagandeep Singh
    • Affiliations: Department of Computer Science, University of Illinois Urbana-Champaign, USA, Department of Computer Science, University of Illinois Urbana-Champaign, USA; VMware Research, USA
    • TL;DR: This paper presents RACoon, a scalable relational verifier for deep neural networks that improves the precision of verifying relational properties by utilizing cross-execution dependencies across all layers. The study addresses the limitations of existing DNN verification techniques, particularly in handling universal adversarial perturbations and other relational properties.
    • Keywords: relational properties, deep neural networks (DNNs), verification, scalable relational verifier, cross-execution dependencies, MILP (Mixed Integer Linear Program), autonomous driving, medical diagnosis, robustness against universal adversarial perturbations (UAP), imprecise relational properties, computational expense of verification, RACoon (the developed verifier), improved precision over SOTA baselines, universal adversarial perturbations (UAP), worst-case hamming distance
  • Neural Diffusion Models (Poster)

    • Authors: Grigory Bartosh, Dmitry Vetrov, Christian Andersson Naesseth
    • Affiliations: University of Amsterdam, Constructor University, Bremen
    • TL;DR: This paper introduces Neural Diffusion Models (NDMs), a framework that allows for non-linear, time-dependent transformations in generative modeling, significantly improving the performance of diffusion models. NDMs achieve state-of-the-art results on various image generation benchmarks, including ImageNet and CelebA-HQ, by optimizing the reverse process and utilizing learnable transformations.
    • Keywords: generative models, diffusion models, Neural Diffusion Models (NDMs), non-linear transformations, variational bound, stochastic differential equations (SDE), ordinary differential equations (ODE), image generation, data augmentation, unsupervised learning, limitations of fixed forward processes, optimization of reverse processes, state-of-the-art results, simulation-free training, learnable transformations, MNIST, CIFAR-10, ImageNet, CelebA-HQ
  • Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies (Poster)

    • Authors: Brian Bartoldson, James Diffenderfer, Konstantinos Parasyris, Bhavya Kailkhura
    • Affiliations: Lawrence Livermore National Laboratory
    • TL;DR: This study investigates the challenges of achieving adversarial robustness in image classifiers, particularly on the CIFAR-10 dataset, revealing that while scaling improves performance, robustness plateaus around 90%. The findings suggest that current adversarial attack formulations need rethinking to produce valid images consistent with their original labels.
    • Keywords: Adversarial robustness, Image classification, Adversarial training, Scaling laws, Image recognition, Adversarial machine learning, Robustness to ℓ∞-norm bounded perturbations, Limitations of current adversarial defenses, New scaling laws for adversarial robustness, Improved compute-efficient models, CIFAR-10, AutoAttack
  • The Role of Learning Algorithms in Collective Action (Poster)

    • Authors: Omri Ben-Dov, Jake Fawkes, Samira Samadi, Amartya Sanyal
    • Affiliations: Department of Statistics, University of Oxford; Max Planck Institute for Intelligent Systems, Tübingen, Germany; Tübingen AI Center, Tübingen, Germany
    • TL;DR: This study investigates how the choice of learning algorithms affects the success of collective action in machine learning, revealing that small collectives perform better with Distributionally Robust Optimization (DRO), while larger collectives may underperform contrary to expectations. The findings emphasize the importance of considering algorithm properties in collective action strategies.
    • Keywords: Collective action in machine learning, impact of learning algorithms, Distributionally Robust Optimization (DRO), Stochastic Gradient Descent (SGD), Performance on minority sub-populations, group error optimization, Higher success of small collectives with DRO, simplicity bias in learning algorithms, Bayes classifiers, empirical risk minimization (ERM)
  • By Tying Embeddings You Are Assuming the Distributional Hypothesis (Spotlight Poster)

    • Authors: Francesco Bertolotti, Walter Cazzola
    • Affiliations: Department of Computer Science, Università degli Studi di Milano, Milan, Italy
    • TL;DR: This study investigates the impact of tied input-output embeddings in language models, revealing that such embeddings reflect the distributional hypothesis, under which semantically similar words share similar representations. The findings suggest that weight tying is effective when the distributional hypothesis holds for the data, providing insights into the organization of embeddings in foundational language models (a minimal weight-tying sketch follows this entry).
    • Keywords: tied input-output embeddings, distributional hypothesis, semantic space, Natural Language Processing (NLP), semantic organization of embeddings, effectiveness of weight tying, foundational language models, semantic equivalence
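    A minimal sketch of the weight-tying construction the paper analyzes, in PyTorch; the GRU body and the dimensions are illustrative assumptions, and the shared embedding matrix is the only point of the example.

    ```python
    import torch
    import torch.nn as nn

    class TiedLM(nn.Module):
        """Toy language model whose input and output embeddings share one matrix."""
        def __init__(self, vocab_size: int, d_model: int):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.body = nn.GRU(d_model, d_model, batch_first=True)
            self.out = nn.Linear(d_model, vocab_size, bias=False)
            self.out.weight = self.embed.weight  # weight tying: one matrix, two roles

        def forward(self, tokens):
            h, _ = self.body(self.embed(tokens))
            return self.out(h)  # logits = similarity of states to token embeddings
    ```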
  • Refining Minimax Regret for Unsupervised Environment Design (Poster)

    • Authors: Michael Beukman, Samuel Coward, Michael Matthews, Mattie Fellows, Minqi Jiang, Michael Dennis, Jakob Foerster
    • Affiliations: UC Berkeley, University College London, University of Oxford
    • TL;DR: This study introduces Bayesian level-perfect minimax regret (BLP) as a refinement of the minimax regret objective in unsupervised environment design, addressing the issue of learning stagnation caused by training on high-regret levels. The proposed ReMiDi algorithm enables continued learning and improvement beyond the limitations of traditional minimax regret policies.
    • Keywords: Unsupervised Environment Design, Reinforcement Learning, Minimax Regret, Bayesian Level-Perfect MMR, ReMiDi Algorithm, Regret Stagnation, Learning Stagnation, BLP Policies, Robustness Guarantees
  • VNN: Verification-Friendly Neural Networks with Hard Robustness Guarantees (Poster)

    • Authors: Anahita Baninajjar, Ahmed Rezine, Amir Aminifar
    • Affiliations: Department of Computer and Information Science, Linköping University, Linköping, Sweden, Department of Electrical and Information Technology, Lund University, Lund, Sweden
    • TL;DR: This study introduces Verification-Friendly Neural Networks (VNNs) that balance prediction performance with formal verification capabilities, addressing scalability challenges in verifying Deep Neural Networks (DNNs). The proposed framework demonstrates significant efficiency improvements, allowing for verification of up to 76 times more samples while maintaining accuracy in safety-critical applications.
    • Keywords: Verification-Friendly Neural Networks, Deep Neural Networks, Formal Verification, Post-training optimization, Sparsity enforcement, Safety-critical applications, Medical domain (epileptic seizure detection, cardiac arrhythmia detection), Lack of formal correctness guarantees, Scalability challenges in verification, Over-approximation in verification frameworks, Robustness establishment for VNNs, Efficiency in verification time, Increased sample verification capability, MNIST, CHB-MIT, MIT-BIH, Adversarial examples, Verification frameworks
  • Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation (Poster)

    • Authors: Luca Beurer-Kellner, Marc Fischer, Martin Vechev
    • Affiliations: Department of Computer Science, ETH Zürich, Switzerland
    • TL;DR: This paper presents DOMINO, a novel decoding algorithm for large language models that enforces syntactic constraints during text generation with minimal performance overhead. The method significantly improves task accuracy and can achieve nearly double the speed of traditional unconstrained decoding approaches (a generic constrained-decoding sketch follows this entry).
    • Keywords: Large Language Models, Constrained Generation, Constrained Decoding, Subword Alignment, Speculative Decoding, Performance Overhead, Task Accuracy, Token Misalignment, DOMINO Algorithm, Speedup in Decoding
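    DOMINO's subword alignment and speculative decoding are beyond a short snippet, but the baseline idea of constrained decoding, masking out tokens that would violate the constraint at each step, can be sketched as below. `allowed_token_ids` is a hypothetical constraint oracle (e.g., derived from a grammar), and the HuggingFace-style `model(...).logits` interface and greedy loop are assumptions for illustration.

    ```python
    import torch

    def constrained_greedy_decode(model, input_ids, allowed_token_ids, max_new_tokens=32):
        """Greedy decoding that keeps only constraint-satisfying tokens per step.
        Assumes `model(input_ids).logits` has shape (1, seq_len, vocab_size)."""
        for _ in range(max_new_tokens):
            logits = model(input_ids).logits[0, -1]           # next-token scores
            mask = torch.full_like(logits, float("-inf"))
            legal = allowed_token_ids(input_ids[0].tolist())  # ids legal after prefix
            mask[legal] = 0.0
            next_id = (logits + mask).argmax()                # best legal token
            input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)
        return input_ids
    ```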
  • Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features (Poster)

    • Authors: Aleksandr Beznosikov, David Dobre, Gauthier Gidel
    • Affiliations: Université de Montréal and Mila, Canada; Canada CIFAR AI Chair, Innopolis University, Russia; Skolkovo Institute of Science and Technology, Russia; Moscow Institute of Physics and Technology, Russia; Yandex, Russia
    • TL;DR: This paper presents two new variants of the Frank-Wolfe method for stochastic finite-sum minimization, achieving the best known convergence rates for both convex and non-convex functions while avoiding the need for large batches or full gradients. Experiments confirm that the methods are faster in practice, in line with their improved theoretical rates (a vanilla Frank-Wolfe sketch follows this entry).
    • Keywords: Constrained optimization, Stochastic optimization, Frank-Wolfe method, Stochastic finite-sum minimization, Machine learning, Empirical risk minimization, Large datasets, Expensive full gradient computation, Projection-free methods, New variants of Frank-Wolfe algorithms, Improved convergence guarantees, Linear minimization oracle (LMO), Conditional Gradient algorithm
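    For orientation, a sketch of the vanilla Frank-Wolfe loop over the probability simplex, where the linear minimization oracle (LMO) is simply a vertex; the paper's variants replace the exact gradient with variance-reduced stochastic estimates, which this baseline does not show.

    ```python
    import numpy as np

    def frank_wolfe_simplex(grad, x0, steps=100):
        """Vanilla Frank-Wolfe on the simplex: the LMO picks the vertex minimizing
        the linearized objective, then a convex step keeps the iterate feasible."""
        x = x0.copy()
        for t in range(steps):
            g = grad(x)
            s = np.zeros_like(x)
            s[np.argmin(g)] = 1.0            # LMO over the simplex: a vertex
            gamma = 2.0 / (t + 2.0)          # standard open-loop step size
            x = (1 - gamma) * x + gamma * s  # convex combination stays feasible
        return x

    # Example: minimize ||x - b||^2 over the simplex (gradient is 2(x - b))
    b = np.array([0.1, 0.7, 0.2])
    x_star = frank_wolfe_simplex(lambda x: 2 * (x - b), np.ones(3) / 3)
    ```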
  • Monotone Individual Fairness (Poster)

    • Authors: Yahav Bechavod
    • Affiliations: Department of Computer and Information Sciences, University of Pennsylvania
    • TL;DR: This paper addresses the challenge of ensuring individual fairness in online learning by extending existing frameworks to allow for more flexible auditing schemes. The authors present new algorithms that improve both predictive accuracy and fairness, achieving better bounds on regret and fairness violations compared to previous methods.
    • Keywords: individual fairness, online learning, predictive accuracy, monotone aggregation functions, oracle-efficient algorithms, lending, hiring, education, healthcare, fairness violations, regret in online learning, improved bounds for oracle-efficient algorithms, computational efficiency, Lipschitz condition, adversarial online learning
  • Position: Scaling Simulation is Neither Necessary Nor Sufficient for In-the-Wild Robot Manipulation (Poster)

    • Authors: Homanga Bharadhwaj
    • Affiliations: The Robotics Institute, School of Computer Science, Carnegie Mellon University
    • TL;DR: This paper critiques the reliance on scaling robotic simulations for achieving effective real-world manipulation, arguing that such scaling is neither necessary nor sufficient for developing robots that can generalize across diverse tasks while adhering to human preferences. The authors emphasize the unique challenges of real-world environments and the need for more principled approaches in robotic manipulation research.
    • Keywords: robotic manipulation, real-world deployment, human preferences, scaling simulation, zero-shot generalization, dynamic environments, high-dimensional state-space
  • CoLoRA: Continuous low-rank adaptation for reduced implicit neural modeling of parameterized partial differential equations (Poster)

    • Authors: Jules Berman, Benjamin Peherstorfer
    • Affiliations: Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
    • TL;DR: This study presents Continuous Low-Rank Adaptation (CoLoRA) for efficiently modeling the evolution of solution fields in parameterized partial differential equations, allowing for rapid predictions under varying physics parameters and initial conditions. CoLoRA significantly outperforms classical methods in speed and accuracy, particularly in data-scarce scenarios.
    • Keywords: Reduced modeling, Continuous Low-Rank Adaptation (CoLoRA), Parameterized Partial Differential Equations (PDEs), Low-Rank Adaptation (LoRA), Galerkin-optimal approximations, Science and engineering, Fluid mechanics, Heat transfer, Data scarcity, High computational cost of numerical methods, Kolmogorov barrier, Nonlinear reduced modeling, Efficient approximation of solution fields
  • Best of Both Worlds Guarantees for Smoothed Online Quadratic Optimization (Poster)

    • Authors: Neelkamal Bhuyan, Debankur Mukherjee, Adam Wierman
    • Affiliations: H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA, Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
    • TL;DR: This study investigates the smoothed online quadratic optimization (SOQO) problem, presenting a new online optimal algorithm called Lazy Adaptive Interpolation (LAI) that performs well in both adversarial and stochastic settings. The findings highlight a trade-off between adversarial and stochastic performance, leading to a best-of-both-worlds algorithm that balances robust adversarial performance with near-optimal stochastic outcomes.
    • Keywords: Smoothed Online Quadratic Optimization (SOQO), Adversarial and Stochastic Settings, Lazy Adaptive Interpolation (LAI), Dynamic Interpolation Algorithm, Smart Grid Management, Adaptive Control, Data Center Management, Quadratic Hitting Cost, ℓ2-norm Switching Cost, Stochastic Analysis, Online Optimal Algorithm, Stochastic-Adversarial Trade-off
  • Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning (Poster)

    • Authors: Matteo Bettini, Ryan Kortvelesy, Amanda Prorok
    • Affiliations: Department of Computer Science, University of Cambridge, Cambridge, UK
    • TL;DR: This study introduces Diversity Control (DiCo), a method for precisely controlling behavioral diversity in Multi-Agent Reinforcement Learning (MARL) without altering the learning objective. The method is theoretically validated and empirically demonstrated to enhance performance and sample efficiency in MARL tasks.
    • Keywords: Multi-Agent Reinforcement Learning (MARL), Behavioral Diversity, Diversity Control (DiCo), Actor-Critic Algorithms, Cooperative and Competitive Tasks in MARL, Controlling diversity to an exact value, Sample Efficiency, Novel method for diversity control, Theoretical proofs of diversity achievement, Policy Parameter Sharing, Homogeneous and Heterogeneous Policies
  • Total Variation Distance Meets Probabilistic Inference (Poster)

    • Authors: Arnab Bhattacharyya, Sutanu Gayen, Kuldeep S. Meel, Dimitrios Myrisiotis, A. Pavan, N. Vinodchandran
    • Affiliations: Department of Computer Science, University of Toronto, Canada, School of Computing, University of Nebraska - Lincoln, USA, Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, India, School of Computing, National University of Singapore, Singapore, Department of Computer Science, Iowa State University, USA, CNRS@CREATE LTD., Singapore
    • TL;DR: This paper establishes a novel connection between total variation distance estimation and probabilistic inference, leading to a fully polynomial randomized approximation scheme for estimating TV distances in Bayesian networks with small treewidth. The findings provide efficient algorithms for statistical distance estimation, enhancing the understanding of high-dimensional distributions (the estimated quantity is sketched after this entry).
    • Keywords: Total Variation Distance, Probabilistic Inference, Graphical Models, Fully Polynomial Randomized Approximation Scheme (FPRAS), Relative Approximation, Bayesian Networks, High-Dimensional Distributions, Estimation of TV Distances, Approximation Schemes, New notion of partial couplings, Efficient algorithms for TV distance estimation, Bayes nets, Treewidth
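    For reference, the quantity being estimated; the paper's contribution is computing it for Bayes nets, whose supports are exponentially large, so this explicit-support sketch shows only the definition.

    ```python
    import numpy as np

    def tv_distance(p, q):
        """Total variation distance between discrete distributions on the same
        support: TV(P, Q) = (1/2) * sum_x |P(x) - Q(x)|."""
        return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())
    ```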
  • Naive Bayes Classifiers over Missing Data: Decision and Poisoning (Poster)

    • Authors: Song Bian, Xiating Ouyang, ZHIWEI FAN, Paris Koutris
    • Affiliations: Department of Computer Sciences, University of Wisconsin-Madison, Madison WI, USA
    • TL;DR: This study investigates the certifiable robustness of Naive Bayes Classifiers on dirty datasets with missing values, demonstrating efficient algorithms for determining robustness and addressing data poisoning attacks. The findings suggest that understanding data quality can reduce the need for exhaustive data cleaning processes.
    • Keywords: certifiable robustness, machine learning classifiers, dirty datasets, Naive Bayes Classifiers (NBC), polynomial time algorithms, missing values, data cleaning, data poisoning attacks, efficient algorithms, robustness decision-making
  • Biharmonic Distance of Graphs and its Higher-Order Variants: Theoretical Properties with Applications to Centrality and Clustering (Poster)

    • Authors: Mitchell Black, Lucy Lin, Weng-Keen Wong, Amir Nayyeri
    • Affiliations: School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, USA
    • TL;DR: This study investigates the biharmonic distance as a variant of effective resistance in graphs, demonstrating its usefulness for measuring the importance of edges to a graph's global topology. The authors introduce clustering algorithms based on this distance and explore its applications in edge centrality and graph clustering (a computational sketch follows this entry).
    • Keywords: biharmonic distance, effective resistance, graph connectivity, clustering algorithms, k-harmonic distance, centrality, graph clustering, measuring distance between vertices, global topology of graphs, theoretical results connecting biharmonic distance to graph connectivity measures
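    A small sketch of the quantity itself, using dense linear algebra purely for illustration: the biharmonic distance replaces the Laplacian pseudoinverse appearing in effective resistance with its square.

    ```python
    import numpy as np

    def biharmonic_distance(L, u, v):
        """d_B(u, v)^2 = (e_u - e_v)^T (L^+)^2 (e_u - e_v), computed via the
        Moore-Penrose pseudoinverse of the graph Laplacian L."""
        Lp = np.linalg.pinv(L)
        d = Lp[u] - Lp[v]              # equals (e_u - e_v)^T L^+ for symmetric L
        return float(np.sqrt(d @ d))   # norm of L^+ (e_u - e_v)

    # Path graph on 3 vertices: L = D - A
    L = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
    print(biharmonic_distance(L, 0, 2))
    ```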
  • Position: Explain to Question not to Justify (Poster)

    • Authors: Przemyslaw Biecek, Wojciech Samek
    • Affiliations: Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Germany; Department of Electrical Engineering and Computer Science, Technical University of Berlin, Germany; BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany, MI2.AI, University of Warsaw, Poland; MI2.AI, Warsaw University of Technology, Poland
    • TL;DR: This paper discusses the current crisis in Explainable Artificial Intelligence (XAI) due to conflicting goals and highlights the under-explored area of model/validation-oriented explanations (RED XAI) as a promising research avenue. The authors argue for the need for more methods to enhance explainability and ensure the safety of AI systems.
    • Keywords: Explainable Artificial Intelligence (XAI), RED XAI, BLUE XAI, AI safety, model validation, Divergent goals in XAI, flawed AI models, need for explainability methods, New challenges in RED XAI, methods for questioning models, Interpretability, trust, fairness, causal reasoning
  • Improving fine-grained understanding in image-text pre-training (Poster)

    • Authors: Ioana Bica, Anastasija Ilic, Matthias Bauer, Goker Erdogan, Matko Bošnjak, Christos Kaplanis, Alexey Gritsenko, Matthias Minderer, Charles Blundell, Razvan Pascanu, Jovana Mitrovic
    • Affiliations: Google DeepMind, London, UK, Google DeepMind, Zurich, Switzerland
    • TL;DR: This paper introduces SPARse fine-grained Contrastive alignment (SPARC) to enhance multimodal representations from image-text pairs, addressing the challenge of fine-grained visual information loss. The method shows improved performance on both coarse-grained and fine-grained tasks, including classification and object detection.
    • Keywords: fine-grained multimodal representations, image-text pre-training, SPARse fine-grained Contrastive alignment (SPARC), sparse similarity metric, fine-grained sequence-wise loss, image classification, retrieval, object detection, segmentation, discarding fine-grained visual information, poor performance on localization, counting, understanding spatial relationships, improved performance on fine-grained tasks, learning representations that encode global and local information
  • Generalization in Kernel Regression Under Realistic Assumptions (Spotlight Poster)

    • Authors: Daniel Barzilai, Ohad Shamir
    • Affiliations: Weizmann Institute of Science
    • TL;DR: This paper provides a unified theory to upper bound the excess risk of kernel regression under realistic assumptions, demonstrating that many kernels exhibit self-regularization properties that lead to good generalization even in high-dimensional settings. The findings highlight the implications for understanding generalization in over-parameterized models, particularly neural networks (a kernel ridge regression sketch follows this entry).
    • Keywords: kernel regression, generalization, over-parameterization, self-regularization, eigenvalue perturbation bounds, neural networks, high-dimensional data, bias-variance tradeoff, overfitting, noise fitting, upper bounds for excess risk, convergence rates for regularized regression, Neural Tangent Kernel (NTK), Gaussian Process Kernel (GPK)
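    The estimator whose excess risk the paper bounds is standard kernel ridge regression; below is a minimal sketch from precomputed Gram matrices, with names and the default regularization chosen for illustration.

    ```python
    import numpy as np

    def kernel_ridge_predict(K_train, y_train, K_test_train, lam=1e-3):
        """Predict via f(x) = k(x, X) (K + lam * n * I)^{-1} y; the interpolating
        (ridgeless, lam -> 0) regime is also covered by the paper's analysis."""
        n = K_train.shape[0]
        alpha = np.linalg.solve(K_train + lam * n * np.eye(n), y_train)
        return K_test_train @ alpha
    ```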
  • Standardized Interpretable Fairness Measures for Continuous Risk Scores (Poster)

    • Authors: Ann-Kristin Becker, Oana Dumitrasc, Klaus Broelemann
    • Affiliations: SCHUFA Holding AG, Wiesbaden, Germany
    • TL;DR: This paper proposes a standardized method for measuring fairness in continuous risk scores using the Wasserstein distance, which effectively quantifies group disparities and outperforms traditional ROC-based measures. The findings highlight the importance of monitoring biases across different models and populations to ensure non-discriminatory decision-making processes (a sketch of the disparity computation follows this entry).
    • Keywords: fairness measures, group disparities, continuous risk scores, Wasserstein distance, standardized fairness measures, finance, education, social media, medicine, algorithmic fairness, discrimination based on protected characteristics, biases in risk scoring, novel approach to quantifying group disparities, monitoring bias over time, ROC-based fairness measures
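    A minimal sketch of the core computation under an assumed setup: group disparity measured as the 1-Wasserstein distance between the score distributions of two groups. The synthetic scores below are purely illustrative, not the paper's data.

    ```python
    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(0)
    scores_group_a = rng.beta(2.0, 5.0, size=1000)  # hypothetical risk scores
    scores_group_b = rng.beta(2.5, 5.0, size=1000)

    # 0 means the two groups receive identically distributed scores.
    disparity = wasserstein_distance(scores_group_a, scores_group_b)
    print(f"group disparity: {disparity:.4f}")
    ```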
  • Why do Variational Autoencoders Really Promote Disentanglement? (Poster)

    • Authors: Pratik Bhowal, Achint Soni, Sirisha Rambhatla
    • Affiliations: NVIDIA, India, Department of Management Science and Engineering, University of Waterloo, Ontario, Canada, Department of Computer Science, University of Waterloo, Ontario, Canada
    • TL;DR: This study investigates why Variational Autoencoders (VAEs) effectively promote disentangled representation learning (DRL), focusing on the orthogonality properties of the decoder. The findings provide theoretical insights and experimental corroboration, emphasizing the importance of understanding DRL capabilities in real-world VAEs.
    • Keywords: Disentangled Representation Learning (DRL), Variational Autoencoders (VAEs), Computer Vision, Text-based Media Generation, Understanding disentanglement in real-world VAEs, latent space rotation, Theoretical establishment of orthogonality properties of the decoder, dSprites dataset
  • Scale-Free Image Keypoints Using Differentiable Persistent Homology (Poster)

    • Authors: Giovanni Barbarani, Francesco Vaccarino, Gabriele Trivigno, Marco Guerra, Gabriele Berton, Carlo Masone
    • Affiliations: Institut Fourier, Université Grenoble Alpes, France; Department of Control and Computer Engineering, Politecnico di Torino, Italy; Department of Mathematical Sciences "Giuseppe Luigi Lagrange", Politecnico di Torino, Italy
    • TL;DR: This paper presents MorseDet, a novel keypoint detection model that utilizes Morse theory and persistent homology to address scale dependency and flexibility issues in existing methods. The proposed approach demonstrates competitive performance in keypoint repeatability, marking a significant advancement in topology-based learning for feature detection.
    • Keywords: keypoint detection, computer vision, topological learning, Morse theory, persistent homology, differentiable loss function, robotics, image retrieval, visual localization, SLAM, 3D reconstruction, place recognition, scale dependency, lack of flexibility in existing methods, MorseDet (topology-based learning model), competitive performance in keypoint repeatability
  • Dynamic Survival Analysis with Controlled Latent States (Poster)

    • Authors: Linus Bleistein, Van NGUYEN, Adeline Fermanian, Agathe Guilloux
    • Affiliations: Inria Paris, F-75015 Paris, France; Centre de Recherche des Cordeliers, INSERM, Université de Paris, Sorbonne Université, F-75006 Paris, France; LOPF, Califrais’ Machine Learning Lab, Paris, France; Laboratoire de Probabilités, Statistique et Modélisation, LPSM, Univ. Paris Cité, F-75005 Paris, France; LaMME, UEVE and UMR 8071, Paris Saclay University, F-91042 Evry, France
    • TL;DR: This study introduces a novel approach for learning individual-specific intensities of counting processes using controlled differential equations and neural estimators. The proposed models, including a signature-based estimator called CoxSig, demonstrate strong performance across various real-world datasets in fields such as finance and healthcare.
    • Keywords: survival analysis, time-to-event data, individual-specific intensities, controlled differential equations, neural controlled differential equations, signature-based estimator (CoxSig), finance, predictive maintenance, food supply chain management, healthcare, predicting event occurrence times, modeling individual-specific intensity, handling time-dependent data, theoretical learning guarantees, flexible models for intensity prediction, counting processes, Cox models, Hawkes processes, deep architectures
  • Stability Evaluation through Distributional Perturbation Analysis (Poster)

    • Authors: Jose Blanchet, Peng Cui, Jiajin Li, Jiashuo Liu
    • Affiliations: Department of Management Science and Engineering, Stanford University; Department of Computer Science and Technology, Tsinghua University
    • TL;DR: This paper proposes a stability evaluation criterion based on distributional perturbations to assess the reliability of learning models in out-of-sample environments. The authors demonstrate the practical utility of their criterion through empirical studies, addressing challenges such as data corruptions and sub-population shifts.
    • Keywords: stability evaluation, distributional perturbations, optimal transport (OT) discrepancy, strong duality theorem, healthcare, economics, self-driving, out-of-sample performance, data corruptions, sub-population shifts, stability evaluation criterion, tractable convex formulations, computational methods
  • Multi-Patch Prediction: Adapting Language Models for Time Series Representation Learning (Poster)

    • Authors: Yuxuan Bian, Xuan Ju, Jiangtong Li, Zhijian Xu, Dawei Cheng, Qiang Xu
    • Affiliations: The Chinese University of Hong Kong; Tongji University
    • TL;DR: This study introduces aLLM4TS, a framework that adapts Large Language Models for time-series representation learning by reconceptualizing forecasting as a self-supervised, multi-patch prediction task. The framework demonstrates superior performance in capturing temporal dynamics and enhancing transferability across various downstream tasks.
    • Keywords: time-series representation learning, self-supervised learning, multi-patch prediction, Large Language Models (LLMs), patch-wise decoding, causal continual pre-training, time-series analysis, forecasting, classification, anomaly detection, challenges in generating comprehensive time-series representations, temporal dynamics capture, aLLM4TS framework, enhanced transferability of temporal representations
  • Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling (Poster)

    • Authors: Denis Blessing, Xiaogang Jia, Johannes Esslinger, Francisco Vargas, Gerhard Neumann
    • Affiliations: Autonomous Learning Robots, Karlsruhe Institute of Technology, Karlsruhe, Germany; FZI Research Center for Information Technology, Karlsruhe, Germany; University of Cambridge, Cambridge, United Kingdom
    • TL;DR: This study introduces a comprehensive benchmark for evaluating variational methods for sampling, addressing the challenges of disparate performance measures and mode collapse. The findings highlight the strengths and weaknesses of existing methods, providing valuable insights for future developments in sampling techniques.
    • Keywords: Variational Inference, Monte Carlo methods, sampling methods, Annealed Importance Sampling (AIS), Sequential Monte Carlo (SMC), integral probability metrics (IPMs), maximum mean discrepancy, Wasserstein distance, Bayesian statistics, natural sciences, Intractable probability distributions, mode collapse, lack of standardized evaluation protocols, Benchmark for evaluating sampling methods, novel metrics for quantifying mode collapse
  • Shifted Interpolation for Differential Privacy (Poster)

    • Authors: Jinho Bok, Weijie Su, Jason Altschuler
    • Affiliations: Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, PA, USA
    • TL;DR: This paper presents a refined analysis of differential privacy in the context of noisy gradient descent algorithms, establishing the "privacy amplification by iteration" phenomenon within the f-differential privacy framework. The findings lead to tighter privacy bounds and the first exact privacy analysis for strongly convex optimization settings.
    • Keywords: differential privacy, private optimization, machine learning, noisy gradient descent, f-differential privacy, shifted interpolated processes, privacy leakage, tight characterizations, convex optimization, privacy amplification by iteration, exact privacy analysis, generalizations beyond divergence-based relaxations, convex losses, strongly convex optimization, stochastic gradient descent
  • Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features (Poster)

    • Authors: Simone Bombari, Marco Mondelli
    • Affiliations: Institute of Science and Technology, Austria
    • TL;DR: This study investigates the concept of word sensitivity in attention layers compared to fully connected architectures, demonstrating that attention layers exhibit higher word sensitivity, which enhances their generalization capabilities in NLP tasks. The findings suggest that the softmax function in attention layers plays a crucial role in this property, leading to better performance in capturing contextual meaning.
    • Keywords: word sensitivity, attention layers, transformers, natural language processing, random features, softmax, BERT-Base, NLP tasks, contextual meaning, semantic changes, generalization bounds, high word sensitivity, low word sensitivity in random features, IMDB review dataset
  • Position: Machine Learning-powered Assessments of the EU Digital Services Act Aid Quantify Policy Impacts on Online Harms (Poster)

    • Authors: Eleonora Bonel, Luca Nannini, Davide Bassi, Michele Maggini
    • Affiliations: École d’Affaires Publique, Sciences Po, Paris, France, Centro Singular de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS), Santiago de Compostela, Spain; Minsait, Indra Sistemas SA, Madrid, Spain, Centro Singular de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS), Santiago de Compostela, Spain
    • TL;DR: The paper evaluates the impact of the EU Digital Services Act on mitigating online harms, particularly disinformation, while highlighting the potential of machine learning techniques to assess regulatory effectiveness. It calls for improved data access and methodological approaches to better understand the DSA's real-world implications.
    • Keywords: Machine Learning, Digital Services Act, Online Harms, Disinformation, Large Language Models, Generative Modelling, Online Platforms, Policy Evaluation, Disinformation Spread, Algorithmic Accountability, Content Moderation, Quantifying Policy Impacts, Data-Driven Approaches
  • Improving Neural Additive Models with Bayesian Principles (Poster)

    • Authors: Kouroche Bouchiat, Alexander Immer, Hugo Yèche, Gunnar Ratsch, Vincent Fortuin
    • Affiliations: ETH Zürich, Zürich, Switzerland; Max Planck Institute for Intelligent Systems, Tübingen, Germany; Helmholtz AI, Munich, Germany; TU Munich, Munich, Germany; Munich Center for Machine Learning, Munich, Germany
    • TL;DR: This study proposes a Bayesian variant of Neural Additive Models (LA-NAMs) that enhances model transparency and uncertainty estimation while enabling feature selection and interaction ranking. The proposed method demonstrates improved performance on tabular datasets and real-world medical tasks compared to existing models.
    • Keywords: Neural Additive Models, Bayesian Principles, Transparency in Deep Learning, Laplace Approximation, Empirical Bayes Procedure, Tabular Datasets, Medical Tasks, Lack of Calibrated Uncertainties, Overconfidence in Models, Feature Selection, Improved Uncertainty Estimation, Feature Interaction Selection, Generalized Additive Models, Deep Neural Networks
  • Applying language models to algebraic topology: generating simplicial cycles using multi-labeling in Wu's formula (Poster)

    • Authors: Kirill Brilliantov, Fedor Pavutnitskiy, Dmitrii A. Pasechniuk, German Magai
    • Affiliations: Beijing Institute of Mathematical Sciences and Applications, Beijing; Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi; Ivannikov Institute for System Programming of the Russian Academy of Sciences, Moscow; HSE University, Moscow; Noeon Research, Tokyo; ETH Zürich
    • TL;DR: This paper explores the use of machine learning to generate simplicial cycles in algebraic topology, specifically through the lens of Wu's formula. The authors propose a framework that reformulates the problem as sampling from algorithmic datasets, aiming to enhance understanding of the group-theoretic structure of homotopy groups.
    • Keywords: algebraic topology, homotopy groups, machine learning, language modeling, multi-labeling, simplicial groups, mathematical research, generating simplicial cycles, sampling from intersections of normal subgroups, proof-of-concept framework, algorithmic datasets, Dyck languages, Wu’s formula, simplicial cycles
  • Private Gradient Descent for Linear Regression: Tighter Error Bounds and Instance-Specific Uncertainty Estimation (Poster)

    • Authors: Gavin Brown, Krishnamurthy Dvijotham, Georgina Evans, Daogao Liu, Adam Smith, Abhradeep Guha Thakurta
    • Affiliations: Department of Computer Science, Boston University; Google DeepMind; Paul G. Allen School of Computer Science and Engineering, University of Washington
    • TL;DR: This paper presents an improved analysis of differentially private gradient descent for linear regression, demonstrating that a sample complexity growing only linearly with the data dimension suffices. The authors derive tighter error bounds and establish methods for constructing adaptive confidence intervals for the empirical optimizer (a noisy gradient descent sketch follows this entry).
    • Keywords: differential privacy, linear regression, gradient descent, noisy gradient descent (DP-GD), Gaussian distribution, machine learning, statistical learning, sample complexity, parameter estimation, privacy distortion, tighter error bounds, confidence intervals, empirical optimizer, Gaussian process, ordinary least squares estimator
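    A minimal sketch of the algorithm analyzed, noisy (differentially private) gradient descent for least squares: clip per-example gradients, average, add Gaussian noise. All constants below are illustrative; calibrating the noise to a target privacy budget is exactly what the paper's analysis sharpens.

    ```python
    import numpy as np

    def dp_gd_least_squares(X, y, steps=200, lr=0.1, clip=1.0, sigma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(steps):
            residuals = X @ w - y                             # shape (n,)
            grads = residuals[:, None] * X                    # per-example gradients
            norms = np.linalg.norm(grads, axis=1).clip(1e-12)
            grads *= np.minimum(1.0, clip / norms)[:, None]   # clip to norm <= clip
            noisy = grads.mean(axis=0) + rng.normal(0.0, sigma * clip / n, size=d)
            w -= lr * noisy                                   # noisy gradient step
        return w
    ```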
  • Semantically-correlated memories in a dense associative model (Poster)

    • Authors: Thomas F Burns
    • Affiliations: Institute for Computational and Experimental Research in Mathematics, Brown University, USA; SciAI Center, Cornell University, USA; Neural Coding and Brain Computing Unit, OIST Graduate University, Japan
    • TL;DR: This study introduces the Correlated Dense Associative Memory (CDAM) model, which integrates auto- and hetero-association for continuous-valued memory patterns. The model demonstrates effectiveness in real-world applications, including image retrieval and simulating finite automata, while revealing distinct dynamical modes of memory association.
    • Keywords: associative memory, continuous-valued memory patterns, Correlated Dense Associative Memory (CDAM), anti-Hebbian learning rules, image retrieval, neuroscience experiments, finite automata simulation, multi-scale representations, dynamic attractors
  • Differentially Private Bias-Term Fine-tuning of Foundation Models (Poster)

    • Authors: Zhiqi Bu, Yu-Xiang Wang, Sheng Zha, George Karypis
    • Affiliations: Amazon AI; University of California, San Diego
    • TL;DR: This study introduces differentially private bias-term fine-tuning (DP-BiTFiT) for large pre-trained models, achieving state-of-the-art accuracy while significantly improving computational efficiency. The method allows for effective fine-tuning on sensitive data without the extensive overhead typically associated with differential privacy (a bias-only freezing sketch follows this entry).
    • Keywords: Differential Privacy, Fine-tuning, Large Pre-trained Models, Differentially Private Bias-Term Fine-tuning (DP-BiTFiT), Parameter Efficient Fine-tuning, Language Tasks, Vision Tasks, Privacy Constraints, Computational Overhead, High Accuracy, Efficiency in Fine-tuning
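    The non-private half of the recipe is easy to sketch: freeze everything except bias terms, so a DP optimizer only needs per-sample gradients for a tiny set of low-dimensional parameters. The model below is an illustrative stand-in for a large pre-trained network, and the DP training step itself (e.g., via a library such as Opacus) is omitted.

    ```python
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 2))
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith("bias")   # train biases only

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable} / {total}")        # biases are a tiny fraction
    ```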
  • Fully-Dynamic Approximate Decision Trees With Worst-Case Update Time Guarantees (Poster)

    • Authors: Marco Bressan, Mauro Sozio
    • Affiliations: Department of Computer Science, University of Milan, Italy, Institut Polytechnique de Paris, Telecom Paris
    • TL;DR: This study presents a novel algorithm for maintaining decision trees in a fully-dynamic setting, ensuring both high-quality trees and efficient update times. The algorithm achieves optimal update time guarantees while addressing the challenges posed by adversarial sequences of data insertions and deletions.
    • Keywords: Fully-dynamic decision trees, Machine learning, Gini gain, Decision tree algorithms (ID3, C4.5), Data mining, Classification, Regression, Maintaining decision trees under adversarial updates, Dynamic dataset updates, Algorithms with worst-case update time guarantees, Quality maintenance of decision trees
  • Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples (Poster)

    • Authors: Dake Bu, Wei Huang, Taiji Suzuki, Ji Cheng, Qingfu Zhang, Zhiqiang Xu, Hau-San Wong
    • Affiliations: Mohamed bin Zayed University of Artificial Intelligence, Masdar, United Arab Emirates, Department of Computer Science, City University of Hong Kong, Hong Kong SAR, Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan; Department of Computer Science, City University of Hong Kong, Hong Kong SAR, Department of Mathematical Informatics, the University of Tokyo, Tokyo, Japan
    • TL;DR: This study provides a unified explanation for the success of uncertainty-based and diversity-based query criteria in Neural Network-based Active Learning (NAL) by demonstrating that both prioritize samples containing yet-to-be-learned features. The findings indicate that these strategies lead to lower test errors with smaller labeled datasets compared to passive learning approaches.
    • Keywords: Neural Network-based Active Learning (NAL), Deep Active Learning (DAL), Uncertainty-based query criteria, Diversity-based query criteria, 2-layer Neural Networks, Data selection, Sample labeling, Inadequate learning of yet-to-be-learned features, Large test error in passive learning, Unified explanation for query criteria success, Small test error with small labeled set, Feature-noise data model, Easy-to-learn features, Hard-to-learn features
  • Tackling Prevalent Conditions in Unsupervised Combinatorial Optimization: Cardinality, Minimum, Covering, and More (Poster)

    • Authors: Fanchen Bu, Hyeonsoo Jo, Soo Yong Lee, Sungsoo Ahn, Kijung Shin
    • Affiliations: Kim Jaechul Graduate School of Artificial Intelligence, KAIST, Seoul, Republic of Korea; School of Electrical Engineering, KAIST, Daejeon, Republic of Korea; Graduate School of Artificial Intelligence, POSTECH, Pohang, Republic of Korea; Department of Computer Science and Engineering, POSTECH, Pohang, Republic of Korea; AI4CO Open-Source Community
    • TL;DR: This study addresses challenges in unsupervised combinatorial optimization by deriving nontrivial objectives and derandomization methods under common conditions. The authors validate their approach through extensive experiments, demonstrating improvements in optimization quality and speed.
    • Keywords: Combinatorial optimization, Unsupervised learning for combinatorial optimization, Probabilistic method, Derandomization, Discrete optimization, Cardinality constraints, Derandomization challenges, Nontrivial objectives, UCOM2 (Unsupervised Combinatorial Optimization Under Commonly-involved Conditions)
  • Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads (Poster)

    • Authors: Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason Lee, Deming Chen, Tri Dao
    • Affiliations: Princeton University, University of Connecticut, University of Illinois Urbana-Champaign, Carnegie Mellon University, Together AI
    • TL;DR: The paper introduces MEDUSA, a framework that enhances the inference speed of Large Language Models by adding extra decoding heads for parallel token prediction. Experiments show that MEDUSA-1 achieves over 2.2× speedup without quality loss, while MEDUSA-2 further improves the speedup to 2.3-2.8× (a sketch of the extra decoding heads follows this entry).
    • Keywords: Large Language Models, Inference Acceleration, Auto-regressive decoding, Tree-based attention mechanism, Speculative decoding, Natural Language Processing, Inference latency, Memory-bandwidth bottleneck, MEDUSA framework, Fine-tuning procedures, Self-distillation, Acceptance scheme
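    A sketch of the structural idea only (the head design and dimensions are illustrative assumptions, and the tree-based verification step is not shown): each extra head predicts the token a fixed offset ahead from the same final hidden state, so several future tokens are proposed in one forward pass.

    ```python
    import torch
    import torch.nn as nn

    class ExtraDecodingHeads(nn.Module):
        """MEDUSA-style heads: head k proposes the token at position t + k + 1."""
        def __init__(self, d_model: int, vocab_size: int, num_heads: int = 4):
            super().__init__()
            self.heads = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_model), nn.SiLU(),
                              nn.Linear(d_model, vocab_size))
                for _ in range(num_heads)
            )

        def forward(self, last_hidden: torch.Tensor):
            # last_hidden: (batch, d_model) state of the final position
            return [head(last_hidden) for head in self.heads]
    ```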
  • Langevin Policy for Safe Reinforcement Learning (Poster)

    • Authors: Fenghao Lei, Long Yang, Shiting Wen, Zhixiong Huang, Zhiwang Zhang, Chaoyi Pang
    • Affiliations: School of Artificial Intelligence, Peking University, Beijing, China, School of Computer and Data Engineering, NingboTech University, Ningbo, China, College of Computer Science and Technology, Zhejiang University, Hangzhou, China
    • TL;DR: This paper introduces the Langevin policy for safe reinforcement learning and proposes the Langevin Actor-Critic (LAC) method to enhance policy inference. The results demonstrate the effectiveness of LAC in achieving safe policies in various RL tasks, particularly in environments like MuJoCo and Safety Gym.
    • Keywords: Safe Reinforcement Learning, Policy Optimization, Langevin Policy, Langevin Actor-Critic (LAC), Monte Carlo Sampling, Autonomous Driving, Robot Control, Safety in Reinforcement Learning, Exploration vs. Exploitation Trade-off, New Policy Learning Approach, Empirical Results on MuJoCo and Safety Gym, MuJoCo, Safety Gym
  • Sample-specific Masks for Visual Reprogramming-based Prompting (Spotlight Poster)

    • Authors: Chengyi Cai, Zesheng Ye, Lei Feng, Jianzhong Qi, Feng Liu
    • Affiliations: School of Computing and Information Systems, The University of Melbourne; Information Systems Technology and Design Pillar, Singapore University of Technology and Design
    • TL;DR: This paper introduces a new framework for visual reprogramming called sample-specific multi-channel masks (SMM), which generates individual masks for each sample to improve generalization and reduce approximation error. The proposed method demonstrates performance gains over existing state-of-the-art visual reprogramming techniques on various tasks.
    • Keywords: Visual Reprogramming (VR), Model Re-purposing, Sample-specific multi-channel masks (SMM), ConvNet, Patch-wise interpolation, Image classification, Medical data prediction, Generalization capability, Approximation error, Sample-level adaptation, Reduction of approximation error, Performance gain on ResNet and ViT, ImageNet, OxfordPets dataset
  • How Spurious Features are Memorized: Precise Analysis for Random and NTK Features (Poster)

    • Authors: Simone Bombari, Marco Mondelli
    • Affiliations: Institute of Science and Technology, Austria
    • TL;DR: This study provides a theoretical framework to understand how deep learning models memorize spurious features that are uncorrelated with the learning task. It characterizes the memorization process through model stability and feature alignment, demonstrating that increased generalization capability reduces the memorization of spurious features.
    • Keywords: deep learning, spurious features, overfitting, random features (RF), neural tangent kernel (NTK) regression, memorization of spurious features, generalization error, feature alignment characterization, stability of models, MNIST, CIFAR-10
  • Random matrix theory improved Fréchet mean of symmetric positive definite matrices (Poster)

    • Authors: Florent Bouchard, Ammar Mian, Malik TIOMOKO, Guillaume GINOLHAC, Frederic Pascal
    • Affiliations: Université Savoie Mont Blanc, LISTIC, Université Paris Saclay, CNRS, CentraleSupélec, L2S, Huawei Paris Research Center
    • TL;DR: This study introduces a random matrix theory-based method for estimating Fréchet means of symmetric positive definite matrices, which is particularly effective in scenarios with low sample support and high dimensionality. Experimental results demonstrate significant improvements over existing state-of-the-art methods in various machine learning applications (a classical baseline is sketched after this entry).
    • Keywords: covariance matrices, symmetric positive definite matrices, Fréchet means, machine learning, random matrix theory, iterative algorithms, Riemannian gradient, EEG, remote sensing, deep learning networks, metric learning, domain adaptation, low sample support, high intra-class variability, numerical unfeasibility, singular matrices, improved estimation of Fréchet means, regularization techniques, synthetic datasets, real-world EEG datasets, hyperspectral datasets, Karcher mean, nearest centroid, Bures-Wasserstein geometry, log-Euclidean geometry
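    For context, one classical baseline the paper improves on fits in a few lines: the Fréchet mean under the log-Euclidean metric (one of the geometries named above). This is a baseline sketch, not the authors' RMT-corrected estimator.

    ```python
    import numpy as np
    from scipy.linalg import expm, logm

    def log_euclidean_mean(spd_matrices):
        """Fréchet mean of SPD matrices under the log-Euclidean metric:
        exp of the arithmetic mean of matrix logarithms."""
        logs = [logm(S) for S in spd_matrices]
        return expm(sum(logs) / len(logs))

    A = np.array([[2.0, 0.5], [0.5, 1.0]])   # toy SPD inputs
    B = np.array([[1.0, 0.2], [0.2, 3.0]])
    M = log_euclidean_mean([A, B])
    ```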
  • Accelerated Algorithms for Constrained Nonconvex-Nonconcave Min-Max Optimization and Comonotone Inclusion (Poster)

    • Authors: Yang Cai, Argyris Oikonomou, Weiqiang Zheng
    • Affiliations: Department of Computer Science, Yale University, New Haven, USA
    • TL;DR: This paper presents accelerated algorithms for constrained comonotone min-max optimization, achieving an optimal convergence rate of O(1/T) and ensuring point convergence to the solution set. The study extends existing algorithms to address limitations in previous research, contributing to the understanding of nonconvex-nonconcave optimization problems.
    • Keywords: nonconvex-nonconcave min-max optimization, comonotone optimization, Extra Anchored Gradient (EAG), Fast Extra Gradient (FEG), game theory, optimization, online learning, finding first-order stationary points, convergence rates, optimal convergence rate of O(1/T), point convergence guarantees, GANs (Generative Adversarial Networks), robust optimization, reinforcement learning
  • On dimensionality of feature vectors in MPNNs (Poster)

    • Authors: César Bravo, Alexander Kozachinskiy, Cristobal Rojas
    • Affiliations: Centro Nacional de Inteligencia Artificial, Chile; Instituto Milenio Fundamentos de los Datos, Chile; Instituto de Ingeniería Matemática y Computacional, Universidad Católica de Chile
    • TL;DR: This paper demonstrates that for message-passing graph neural networks (MPNNs), feature vectors of dimension d = 1 are sufficient to ensure equivalence to the Weisfeiler-Leman (WL) isomorphism test, regardless of graph size. This finding addresses the gap between theoretical guarantees and practical architectures in graph representation learning.
    • Keywords: message-passing graph neural networks (MPNNs), expressive power, graph isomorphism, Weisfeiler-Leman (WL) test, non-polynomial analytic activation functions, graph data representation, deep learning, distinguishing non-isomorphic graphs, dimensionality of feature vectors, reduced dimensionality of feature vectors to d = 1 for MPNNs
  • Can a Few Decide for Many? The Metric Distortion of Sortition (Poster)

    • Authors: Ioannis Caragiannis, Evi Micha, Jannik Peters
    • Affiliations: Department of Computer Science, Aarhus University, Aarhus, Denmark, Research Group Efficient Algorithms, Faculty IV – Electrical Engineering and Computer Science, TU Berlin, Berlin, Germany, Computer Science, Harvard University, Cambridge, USA
    • TL;DR: The study investigates whether randomly selected sortition panels accurately reflect the opinions of the entire population. It finds that uniform selection can achieve almost optimal decision alignment with minimal panel sizes, and introduces Fair Greedy Capture as a method that maintains these guarantees while ensuring fairness.
    • Keywords: sortition, citizens' assemblies, representation, fairness, metric distortion, uniform selection, Fair Greedy Capture, societal issues, climate change, AI threats, representation of the population, decision alignment with population preferences, almost optimal distortion, constant ex-post distortion
  • Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation (Poster)

    • Authors: Lincan Cai, Shuang Li, Wenxuan Ma, Jingxuan Kang, Binhui Xie, Zixun Sun, Chengwei Zhu
    • Affiliations: Beijing Institute of Technology, Interactive Entertainment Group, Tencent, University of Illinois Urbana-Champaign
    • TL;DR: This paper presents PaRe, an end-to-end method for enhancing cross-modal fine-tuning by generating intermediate modalities to bridge the modality gap and address data scarcity. The proposed approach demonstrates superior performance across multiple benchmarks compared to existing methods.
    • Keywords: Cross-modal fine-tuning, Multimodal perception, Gating mechanism, Patch Replacement scheme, Protein sequence analysis, Cosmic ray data analysis, Modality gap, Data scarcity, Enhanced stability and transferability of fine-tuning, Intermediate modality generation, Large-scale pretrained models, Multimodal large language models
  • How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers (Spotlight Poster)

    • Authors: Gon Buzaglo, Itamar Harel, Mor Shpigel Nacson, Alon Brutzkus, Nati Srebro, Daniel Soudry
    • Affiliations: Technion - Israel Institute of Technology, Haifa, Israel, Toyota Technological Institute at Chicago, Chicago IL, USA
    • TL;DR: This study investigates why over-parameterized Neural Networks generalize well when trained to zero loss, demonstrating that random Neural Networks sampled from a uniform prior can generalize effectively if there exists a narrow teacher network that aligns with the labels. The findings reveal that this sampling method induces a bias towards simpler functions, allowing for efficient learning with reduced sample complexity.
    • Keywords: Neural Networks, Generalization, Over-parameterization, Stochastic Gradient Descent (SGD), Uniform Sampling, Generalization of Neural Networks, Implicit Bias, Generalization Guarantees, Sample Complexity, Teacher Neural Network, Interpolating Neural Networks
  • Successor Features for Efficient Multi-Subject Controlled Text Generation (Poster)

    • Authors: Meng Cao, Mehdi Fatemi, Jackie Chi Kit Cheung, Samira Shabanian
    • Affiliations: School of Computer Science, McGill University; Mila – Québec AI Institute; Canada CIFAR AI Chair; Wand X (work done during an internship at Microsoft Research)
    • TL;DR: This study introduces SF-GEN, a novel framework for controllable text generation that utilizes successor features to enhance the efficiency of large language models in generating text with specific attributes. The method demonstrates comparable performance to state-of-the-art approaches while being more efficient in handling multiple control subjects.
    • Keywords: controllable text generation, large language models (LLMs), reinforcement learning, action-value function, successor features, natural language generation (NLG), controlling generated text for safety, factuality, non-toxicity, efficiency in multi-subject control, SF-GEN method, memory-efficient and computationally efficient training and decoding
  • Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design (Poster)

    • Authors: Andrew Campbell, Jason Yim, Regina Barzilay, Tom Rainforth, Tommi Jaakkola
    • Affiliations: Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Massachusetts, USA, Department of Statistics, University of Oxford, UK
    • TL;DR: This study introduces Discrete Flow Models (DFMs), a novel flow-based model for discrete data that enhances the capabilities of generative models to handle multimodal data, specifically in protein co-design. The approach achieves state-of-the-art performance by enabling flexible generation of protein structures and sequences.
    • Keywords: multimodal generative models, discrete and continuous data, Discrete Flow Models (DFMs), Continuous Time Markov Chains (CTMCs), denoising neural network, protein co-design, combining discrete and continuous data, sampling flexibility, multimodal problems, improved performance over existing diffusion-based approaches, state-of-the-art co-design performance
  • Feasibility Consistent Representation Learning for Safe Reinforcement Learning (Poster)

    • Authors: Zhepeng Cen, Yihang Yao, Zuxin Liu, Ding Zhao
    • Affiliations: Carnegie Mellon University
    • TL;DR: This study introduces Feasibility Consistent Safe Reinforcement Learning (FCSRL), a framework that integrates representation learning with feasibility-oriented objectives to enhance safety constraint estimation in reinforcement learning. The proposed method demonstrates improved performance in safety-aware embedding and policy learning across various tasks compared to existing baselines.
    • Keywords: Safe Reinforcement Learning, Representation Learning, Self-supervised Learning, Feasibility Score, Healthcare, Finance, Autonomous Systems, Safety Constraints, Cost Estimation, Sparse Cost Signals, Feasibility Consistent Safe Reinforcement Learning (FCSRL), Safety-aware Embedding
  • Simple Ingredients for Offline Reinforcement Learning (Poster)

    • Authors: Edoardo Cetin, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric, Yann Ollivier, Ahmed Touati
    • Affiliations: Meta, FAIR at Meta, Paris, France; Sakana AI, Tokyo, Japan
    • TL;DR: This study investigates the challenges of offline reinforcement learning when using heterogeneous datasets, revealing that performance declines when combining data from different tasks. The authors introduce a new testbed (MOOD) and demonstrate that simple methods like AWAC and IQL can achieve better results with increased policy size, outperforming existing state-of-the-art algorithms.
    • Keywords: Offline reinforcement learning, heterogeneous data, AWAC, IQL, TD3+BC, policy constrained methods, Robotics, control tasks, Performance deterioration with diverse data, data sparsity, extrapolation issues, Introduction of MOOD testbed, improved performance with increased policy size, DeepMind Control suite, D4RL benchmark, Actor-critic algorithms, value function, expectile regression, Gumbel regression
  • Auditing Private Prediction (Poster)

    • Authors: Karan Chadha, Matthew Jagielski, Nicolas Papernot, Christopher A. Choquette Choo, Milad Nasr
    • Affiliations: Google DeepMind; Stanford University
    • TL;DR: This paper introduces a novel framework for auditing private prediction algorithms in machine learning, focusing on the privacy leakage associated with different adversarial capabilities. The findings indicate that easier-to-poison algorithms exhibit significantly higher privacy leakage, and that adversaries with limited query control experience lower leakage compared to those with full control.
    • Keywords: Differential Privacy, Private Prediction, Renyi Differential Privacy, Auditing Techniques, Machine Learning, Inference Privacy, Privacy Leakage, Adversarial Attacks, Query Control, Auditing Framework, Privacy Analysis Improvements, PATE, CaPC, PromptPATE, Private-kNN
  • Limited Preference Aided Imitation Learning from Imperfect Demonstrations (Poster)

    • Authors: Xingchen Cao, Fan-Ming Luo, Junyin Ye, Tian Xu, Zhilong Zhang, Yang Yu
    • Affiliations: National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu, China; School of Artificial Intelligence, Nanjing University, China; Polixir Technologies, Nanjing, Jiangsu, China
    • TL;DR: This paper introduces Preference Aided Imitation Learning (PAIL), a novel algorithm that enhances imitation learning by leveraging limited human preferences to improve policy performance beyond imperfect demonstrations. Empirical results demonstrate that PAIL significantly outperforms existing methods, achieving a 73.2% improvement in performance (the underlying preference model is sketched after this entry).
    • Keywords: Imitation Learning, Policy Learning, Preference Aided Imitation Learning (PAIL), Bradley-Terry model, Robotics, Tokamak fusion devices, Imperfect demonstrations, Performance bottleneck, Improved policy performance, Reweighting demonstrations, Human preferences, Expert data
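    The preference side of PAIL rests on the Bradley-Terry model named above; a one-line sketch of how a pair of scalar scores turns into a preference probability (the scores here are hypothetical trajectory returns):

    ```python
    import numpy as np

    def bradley_terry_prob(r_i, r_j):
        """P(i preferred over j) = exp(r_i) / (exp(r_i) + exp(r_j))."""
        return 1.0 / (1.0 + np.exp(-(r_i - r_j)))

    print(bradley_terry_prob(1.5, 0.5))  # ~0.73
    ```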
  • Online Learning under Budget and ROI Constraints via Weak Adaptivity (Poster)

    • Authors: Matteo Castiglioni, Andrea Celli, Christian Kroer
    • Affiliations: Department of Computing Sciences, Bocconi University, Milan, Italy, IEOR Department, Columbia University, New York, NY, DEIB, Politecnico di Milano, Milan, Italy
    • TL;DR: This paper addresses online learning problems where decision makers must maximize rewards while adhering to budget and ROI constraints, proposing a dual-balancing framework that circumvents traditional assumptions about feasibility. The findings include no-regret guarantees under both stochastic and adversarial conditions, with practical applications in online ad auctions.
    • Keywords: online learning, budget constraints, ROI constraints, primal-dual algorithms, weakly adaptive regret minimizers, online ad auctions, decision-making under constraints, adversarial inputs, non-packing constraints, dual-balancing framework, no-regret guarantees, Slater parameters, adversarial bandits, knapsack problem
  • AI Alignment with Changing and Influenceable Reward Functions (Poster)

    • Authors: Micah Carroll, Davis Foote, Anand Siththaranjan, Stuart Russell, Anca Dragan
    • Affiliations: UC Berkeley
    • TL;DR: The paper addresses the challenge of aligning AI systems with human preferences that change over time and can be influenced by AI interactions. It introduces Dynamic Reward Markov Decision Processes (DR-MDPs) to model these dynamics and highlights the risks of static-preference assumptions in existing alignment techniques.
    • Keywords: AI alignment, changing preferences, influenceable preferences, Dynamic Reward Markov Decision Processes (DR-MDPs), AI systems, human-AI interaction, Static-preference assumption, preference change, AI influence on preferences, New framework for modeling preference changes in AI alignment
  • Inferring Dynamic Networks from Marginals with Iterative Proportional Fitting (Poster)

    • Authors: Serina Chang, Frederic Koehler, Zhaonan Qu, Jure Leskovec, Johan Ugander
    • Affiliations: Department of Economics, Stanford University; Department of Management Science & Engineering, Stanford University; Department of Statistics and Data Science Institute, University of Chicago; Department of Computer Science, Stanford University
    • TL;DR: This study addresses the challenge of inferring dynamic networks from a time-aggregated adjacency matrix and time-varying marginals using iterative proportional fitting (IPF). The authors establish a generative model that provides a statistical foundation for IPF, introduce a new algorithm that guarantees convergence on sparse data, and demonstrate the practical value of these contributions through experiments (see the IPF sketch below).
    • Keywords: dynamic networks, network inference, time-aggregated adjacency matrix, iterative proportional fitting (IPF), Sinkhorn’s algorithm, epidemic response, transportation planning, mobility networks, data constraints, estimation of dynamic networks, convergence issues in sparse data, generative network model, maximum likelihood estimates, structure-dependent error bounds
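
    For readers unfamiliar with IPF, here is a minimal NumPy sketch of the classical procedure (equivalently, Sinkhorn scaling) that the paper analyzes; the authors' convergence-guaranteed variant for sparse data is more involved:

    ```python
    import numpy as np

    def ipf(seed: np.ndarray, row_marginals: np.ndarray,
            col_marginals: np.ndarray, n_iters: int = 100) -> np.ndarray:
        """Alternately rescale rows and columns of a nonnegative seed matrix
        until its marginals match the targets (which must share a total sum)."""
        X = seed.astype(float).copy()
        for _ in range(n_iters):
            X *= (row_marginals / np.clip(X.sum(axis=1), 1e-12, None))[:, None]
            X *= (col_marginals / np.clip(X.sum(axis=0), 1e-12, None))[None, :]
        return X

    # time-aggregated adjacency as the seed; one call per time step's marginals
    X = ipf(np.ones((3, 3)), np.array([2., 1., 3.]), np.array([1., 2., 3.]))
    ```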
  • Feature Importance Disparities for Data Bias Investigations (Poster)

    • Authors: Peter Chang, Leor Fishman, Seth Neel
    • Affiliations: SAFR AI Lab, Harvard Business School & Kempner Institute, Boston, MA, Harvard College, Cambridge, MA
    • TL;DR: This paper presents a method for identifying feature importance disparities (FID) in training data to assist in data bias investigations, highlighting subgroups where specific features have disproportionate influence. The findings suggest that these disparities can indicate serious bias issues, providing a new approach to understanding and addressing bias in machine learning models.
    • Keywords: Data bias investigations, Feature importance disparity, Machine learning, Fairness in classifiers, Bias in training data, Subgroup bias, Method for identifying feature importance disparities, 4 datasets used in experiments, Feature importance disparity (FID)
  • On the Implicit Bias of Adam (Poster)

    • Authors: Matias Cattaneo, Jason Klusowski, Boris Shigida
    • Affiliations: Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA
    • TL;DR: This study investigates the implicit regularization effects of adaptive optimization algorithms like RMSProp and Adam through backward error analysis, revealing that their behavior depends on hyperparameters and training stages. The findings suggest that these algorithms can influence generalization by either penalizing or impeding the reduction of the one-norm of the loss gradients (see the modified-loss formula below).
    • Keywords: Implicit regularization, Gradient descent, Ordinary differential equations (ODEs), RMSProp, Adam, Implicit bias in optimization algorithms, Generalization, Modified loss functions, Backward error analysis
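
    For context, the classical backward-error-analysis result for plain gradient descent with step size $h$ (Barrett & Dherin, 2021) states that the iterates approximately follow the gradient flow of a modified loss with a squared-gradient-norm penalty, shown below; the paper derives analogous correction terms for RMSProp and Adam whose form and sign depend on the hyperparameters and involve the one-norm of the gradient instead:

    ```latex
    \tilde{L}(\theta) \;=\; L(\theta) \;+\; \frac{h}{4}\,\bigl\lVert \nabla L(\theta) \bigr\rVert_2^2
    ```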
  • Scribble-Supervised Semantic Segmentation with Prototype-based Feature Augmentation (Poster)

    • Authors: Guiyang Chan, Pengcheng Zhang, Hai Dong, Shunhui Ji, Bainian Chen
    • Affiliations: School of Computing Technologies, RMIT University, Melbourne, Australia, Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing, China; College of Computer Science and Software Engineering, Hohai University, Nanjing, China
    • TL;DR: This study introduces a prototype-based feature augmentation method for scribble-supervised semantic segmentation, addressing limitations in existing methods by leveraging feature prototypes. The proposed approach achieves state-of-the-art performance on the PASCAL VOC 2012 dataset, demonstrating its effectiveness in reducing annotation costs while enhancing segmentation accuracy.
    • Keywords: Scribble-supervised semantic segmentation, weakly supervised learning, Prototype-based feature augmentation, Image semantic segmentation, High annotation costs, feature propagation limitations, State-of-the-art performance on PASCAL VOC 2012 dataset, PASCAL VOC 2012
  • Predictive Dynamic Fusion (Poster)

    • Authors: Bing Cao, Yinan Xia, Yi Ding, Changqing Zhang, Qinghua Hu
    • Affiliations: College of Intelligence and Computing, Tianjin University, Tianjin, China; Tianjin Key Lab of Machine Learning, Tianjin, China
    • TL;DR: This study introduces a Predictive Dynamic Fusion (PDF) framework for multimodal learning that addresses the challenges of dynamic data quality in open environments. The proposed method theoretically guarantees improved generalization and reliability in multimodal fusion, validated through extensive experiments.
    • Keywords: Multimodal fusion, Dynamic fusion, Predictive Dynamic Fusion (PDF), Collaborative Belief (Co-Belief), Relative calibration, Autonomous driving, Clinical diagnosis, Sentiment analysis, Unreliability and instability in dynamic multimodal fusion, Modality imbalance, High noise, Theoretical guarantees for multimodal fusion, Reduction of generalization error
  • MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion (Poster)

    • Authors: Di Chang, Yichun Shi, Quankai Gao, Hongyi Xu, Jessica Fu, Guoxian Song, Qing Yan, Yizhe Zhu, Xiao Yang, Mohammad Soleymani
    • Affiliations: University of Southern California; ByteDance Inc.
    • TL;DR: The study presents MagicPose, a diffusion-based model for retargeting human poses and facial expressions while preserving identity. It introduces a two-stage training strategy that enables robust control over generated images and generalizes well to unseen identities without additional fine-tuning.
    • Keywords: human pose retargeting, facial expression retargeting, human motion transfer, diffusion models, appearance-control block, appearance-disentangled pose control, image stylization, digital human synthesis, data generation for training perception models, generalization to unseen identities, interpolation of invisible body parts, dependency on image warping, robust appearance control, zero-shot retargeting, plug-in module for Stable Diffusion, GANs (Generative Adversarial Networks), image diffusion models
  • How Interpretable Are Interpretable Graph Neural Networks? (Poster)

    • Authors: Yongqiang Chen, Yatao Bian, Bo Han, James Cheng
    • Affiliations: The Chinese University of Hong Kong; Hong Kong Baptist University; Tencent AI Lab
    • TL;DR: This study presents a theoretical framework for interpretable subgraph learning in graph neural networks, highlighting the limitations of existing methods in approximating the subgraph multilinear extension (SubMT). The proposed Graph Multilinear neT (GMT) architecture significantly improves interpretability and generalizability, outperforming state-of-the-art models by up to 10% across various benchmarks.
    • Keywords: Interpretable Graph Neural Networks, Explainability, Attention-based mechanism, Subgraph multilinear extension (SubMT), Graph Multilinear neT (GMT), Scientific applications, Graph-structured data, Approximation failure, Interpretability of extracted subgraphs, Out-of-Distribution (OOD) generalization, Improved interpretability and generalizability, New XGNN architecture (GMT), Graph classification benchmarks, XGNNs (eXplainable Graph Neural Networks), GNNs (Graph Neural Networks), Causal subgraph, Sampling probability
  • Doubly Robust Causal Effect Estimation under Networked Interference via Targeted Learning (Oral)

    • Authors: Weilin Chen, Ruichu Cai, Zeqin Yang, Jie Qiao, Yuguang Yan, Zijian Li, Zhifeng Hao
    • Affiliations: School of Computer Science, Guangdong University of Technology, Guangzhou, China; Pazhou Laboratory (Huangpu), Guangzhou, China; College of Science, Shantou University, Shantou, China; Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
    • TL;DR: This study proposes a novel doubly robust causal effect estimator for networked interference by adapting targeted learning techniques, addressing the challenges of model misspecification. The proposed estimator demonstrates a faster convergence rate and effectiveness through extensive experiments on real-world networks.
    • Keywords: Causal effect estimation, Networked interference, Doubly robust estimator, Targeted learning, Neural networks, Epidemiology, Human ecology, Advertisement, Misspecification problems, Confounding bias, Violation of SUTVA, End-to-end causal effect estimator, Faster convergence rate, Stable Unit Treatment Value Assumption (SUTVA), Spillover effects, Main effects, Total effects
  • RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content (Poster)

    • Authors: Zhuowen Yuan, Zidi Xiong, Yi Zeng, Ning Yu, Ruoxi Jia, Dawn Song, Bo Li
    • Affiliations: Salesforce Research, University of California Berkeley, University of Illinois Urbana-Champaign, Virginia Tech, University of Chicago; University of Illinois Urbana-Champaign
    • TL;DR: This paper presents RigorLLM, a novel framework for moderating harmful content in Large Language Models (LLMs) that effectively addresses biases and vulnerabilities to adversarial attacks. Experimental results show that RigorLLM outperforms existing content moderation solutions and sets a new standard for AI safety in LLMs.
    • Keywords: Large Language Models, Content Moderation, Energy-based training, Langevin dynamics, Minimax optimization, KNN (K-Nearest Neighbors), AI Safety, AI Alignment, Biases in LLMs, Harmful content generation, Jailbreaking attacks, RigorLLM framework, Robust content moderation solutions, OpenAI API, Perspective API, Nemo Guardrails, LlamaGuard, Resilience to adversarial attacks
  • Probabilistic Forecasting with Stochastic Interpolants and Föllmer Processes (Poster)

    • Authors: Yifan Chen, Mark Goldstein, Mengjian Hua, Michael Albergo, Nicholas Boffi, Eric Vanden-Eijnden
    • Affiliations: Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
    • TL;DR: This paper presents a framework for probabilistic forecasting of dynamical systems using generative modeling and stochastic interpolants, enabling the sampling of future states from the current state. The approach is validated on complex forecasting problems, demonstrating effective handling of stochastic dynamics and measurement noise.
    • Keywords: probabilistic forecasting, dynamical systems, generative modeling, stochastic interpolants, stochastic differential equations (SDEs), square loss regression, climate modeling, fluid dynamics, video prediction, time series data extrapolation, forecasting future states, handling stochastic dynamics, measurement noise, generative modeling approach, adjustment of drift and diffusion coefficients, Föllmer process, KTH dataset, CLEVRER dataset
  • Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective (Poster)

    • Authors: Yang Chen, Cong Fang, Zhouchen Lin, Bing Liu
    • Affiliations: National Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; Institute for Artificial Intelligence, Peking University; Pazhou Laboratory (Huangpu), Guangzhou, China; Department of Computer Science, University of Illinois Chicago
    • TL;DR: This study introduces a mathematical framework that formalizes relational learning as hypergraph recovery to enhance the understanding of pre-training in Foundation Models. The findings suggest that this approach can effectively capture the complex relationships between entities, offering insights into the capabilities and generalization of pre-trained models.
    • Keywords: Relational learning, Foundation Models, Hypergraph recovery, Minimax near-optimal analysis, Multimodal learning, Entity alignment, Understanding relationships between entities, Data efficiency in pre-training, Novel mathematical framework for relational learning
  • Feature Attribution with Necessity and Sufficiency via Dual-stage Perturbation Test for Causal Explanation (Poster)

    • Authors: Xuexin Chen, Ruichu Cai, Zhengting Huang, Yuxuan Zhu, Julien Horwood, Zhifeng Hao, Zijian Li, Jose Miguel Hernandez-Lobato
    • Affiliations: Shantou University, Shantou 515063, China; School of Computer Science, Guangdong University of Technology, Guangzhou, China; Pazhou Laboratory (Huangpu), Guangzhou, China; Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, United Kingdom; Mohamed bin Zayed University of Artificial Intelligence, Masdar City, Abu Dhabi
    • TL;DR: This study introduces Feature Attribution with Necessity and Sufficiency (FANS) to enhance the discriminative power of feature attribution methods in machine learning by addressing the limitations of standard perturbation tests. The proposed method demonstrates improved performance over existing attribution methods across six benchmarks.
    • Keywords: Explainability, Feature Attribution, Feature Attribution Methods (FAMs), Dual-stage Perturbation Test, Counterfactual Reasoning, Machine Learning, Difficulty in distinguishing contributions of different features, Perturbation test limitations, Feature Attribution with Necessity and Sufficiency (FANS), Improved discriminative power of FAMs, Probability of being Necessity and Sufficiency (PNS), Shapley Values
  • Offline Transition Modeling via Contrastive Energy Learning (Poster)

    • Authors: Ruifeng Chen, Chengxing Jia, Zefang Huang, Tian-Shuo Liu, Xu-Hui Liu, Yang Yu
    • Affiliations: National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China
    • TL;DR: This study introduces Energy-based Transition Models (ETM) to effectively capture complex transition dynamics in offline decision-making tasks, demonstrating improved evaluation accuracy and generalization to out-of-distribution data. The proposed models significantly outperform existing off-policy evaluation methods and enhance reinforcement learning performance in benchmark tasks.
    • Keywords: transition modeling, sequential decision-making, offline settings, Energy-based Transition Models (ETM), Forward Transition Model (FTM), autoregressive dynamics models, Trajectory Transformer (TT), reinforcement learning, offline policy evaluation, complex transition dynamics, discontinuity, model errors, prediction uncertainty, improved evaluation accuracy, better generalization to out-of-distribution transition data, DOPE benchmark, D4RL Gym-Mujoco tasks
  • High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization (Poster)

    • Authors: Yihang Chen, Fanghui Liu, Taiji Suzuki, Volkan Cevher
    • Affiliations: Laboratory for Information and Inference Systems, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland, Department of Computer Science, University of Warwick, United Kingdom, Department of Mathematical Informatics, The University of Tokyo, Japan; Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
    • TL;DR: This study investigates kernel ridge regression under covariate shifts, focusing on the impact of importance re-weighting on bias and variance. The findings reveal that the re-weighting strategy can reduce variance while the bias may vary significantly depending on the regularization scale, highlighting the need for refined analyses in high-capacity models.
    • Keywords: kernel ridge regression, covariate shift, importance weighting, bias-variance decomposition, high-dimensional data, model overfitting, data-dependent regularization, asymptotic expansion of kernel functions, Radon-Nikodym derivative, spectral decay
  • CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding (Poster)

    • Authors: Kaiyuan Chen, Xingzhuo Guo, Yu Zhang, Jianmin Wang, Mingsheng Long
    • Affiliations: School of Software, BNRist, Tsinghua University
    • TL;DR: This study introduces Cognitive Diffusion Probabilistic Models (CogDPM) that leverage Predictive Coding theory to enhance prediction skills in real-world forecasting tasks. The results demonstrate that CogDPM outperforms existing models by effectively estimating data predictability through a precision weighting mechanism.
    • Keywords: Predictive Coding, Cognitive Diffusion Probabilistic Models, Diffusion probabilistic models, hierarchical sampling, Weather forecasting, real-world prediction tasks, Enhancement of prediction skills, precision weighting mechanism, Precision estimation method, improved forecasting capabilities, United Kingdom precipitation dataset, ERA surface wind dataset
  • Towards AutoAI: Optimizing a Machine Learning System with Black-box and Differentiable Components (Poster)

    • Authors: Zhiliang Chen, Chuan-Sheng Foo, Bryan Kian Hsiang Low
    • Affiliations: Department of Computer Science, National University of Singapore, Singapore; Institute for Infocomm Research, A*STAR, Singapore; Centre for Frontier AI Research, A*STAR, Singapore
    • TL;DR: This study introduces A-BAD-BO, an algorithm designed to optimize complex machine learning systems that include both differentiable and black-box components. The results indicate that A-BAD-BO achieves better system optimality and is more sample-efficient compared to traditional gradient-driven methods.
    • Keywords: AutoAI, machine learning optimization, complex systems, Bayesian optimization (BO), A-BAD-BO algorithm, healthcare systems, self-driving cars, large language models (LLMs), high-dimensional parameter optimization, lack of analytical form in components, improved system optimality, sample efficiency, differentiable ML components, black-box components
  • InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models (Poster)

    • Authors: Lichang Chen, Jiuhai Chen, Tom Goldstein, Heng Huang, Tianyi Zhou
    • Affiliations: Department of Computer Science, University of Maryland, College Park
    • TL;DR: This paper presents InstructZero, a method for optimizing instructions for black-box large language models by using soft prompts and Bayesian optimization. The approach demonstrates improved performance in generating effective instructions across various tasks compared to existing methods.
    • Keywords: Instruction Optimization, Large Language Models, Bayesian Optimization, Soft Prompt Optimization, Zero-shot Learning, Few-shot Learning, Instruction Sensitivity, Combinatorial Optimization, Improved Instruction Generation, Enhanced Zero-shot Performance, Black-box LLMs, Open-source LLMs, Prompt Engineering
  • Accelerated Policy Gradient for s-rectangular Robust MDPs with Large State Spaces (Poster)

    • Authors: Ziyi Chen, Heng Huang
    • Affiliations: Department of Computer Science, University of Maryland College Park
    • TL;DR: This study presents an accelerated policy gradient algorithm for s-rectangular robust Markov decision processes, achieving improved iteration complexity in both deterministic and stochastic settings. The proposed methods enhance robustness against environmental perturbations and address the challenges of simulation-to-reality gaps in reinforcement learning applications.
    • Keywords: Robust Markov Decision Processes, Reinforcement Learning, Policy Gradient Methods, Accelerated Policy Gradient Algorithm, Entropy Regularization, Robotics, Energy Flow Control, Production Scheduling, Flight Control, Simulation-to-Reality Gap, Environmental Perturbation, NP-Hard Problems, Scalable Policy Gradient Methods, Iteration Complexity Reduction, s-Rectangularity, (s, a)-Rectangularity
  • Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy (Poster)

    • Authors: Wei-Ning Chen, Berivan Isik, Peter Kairouz, Albert No, Sewoong Oh, Zheng Xu
    • Affiliations: Stanford University, Yonsei University, Google; University of Washington
    • TL;DR: This paper addresses the challenges of L2 mean estimation under central differential privacy and communication constraints in federated learning. The authors introduce a novel privacy accounting method that significantly improves communication efficiency while maintaining strong privacy guarantees, achieving at least a 100x improvement in compression for DP-SGD across various tasks (see the Gaussian-mechanism sketch below).
    • Keywords: L2 mean estimation, differential privacy, federated learning, Gaussian mechanism, sparsification, matrix factorization, Federated learning (FL), Communication constraints, privacy protection, model update sensitivity, Improved communication-privacy trade-offs, novel privacy accounting method, DP-SGD, DP-FTRL, mean square errors (MSEs)
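
    As a baseline for what the paper improves on, here is a minimal sketch of the uncompressed Gaussian mechanism for L2 mean estimation (clip, sum, add calibrated noise); the classical sigma calibration below is only valid for epsilon <= 1, and the paper's contribution is doing this under aggressive sparsification with tighter privacy accounting:

    ```python
    import numpy as np

    def dp_mean(updates: np.ndarray, clip_norm: float, epsilon: float,
                delta: float, rng=np.random.default_rng(0)) -> np.ndarray:
        """Clip each client vector to an L2 ball, sum, add Gaussian noise
        scaled to the sum's L2 sensitivity (= clip_norm), then average."""
        norms = np.linalg.norm(updates, axis=1, keepdims=True)
        clipped = updates * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
        sigma = clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
        noisy_sum = clipped.sum(axis=0) + rng.normal(0.0, sigma, clipped.shape[1])
        return noisy_sum / len(updates)

    mean_est = dp_mean(np.random.randn(100, 10), clip_norm=1.0,
                       epsilon=0.5, delta=1e-5)
    ```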
  • Bagged Deep Image Prior for Recovering Images in the Presence of Speckle Noise (Poster)

    • Authors: Xi Chen, Zhewen Hou, Christopher Metzler, Arian Maleki, Shirin Jalali
    • Affiliations: Department of Computer Science, University of Maryland, College Park, MD, USA, Department of Electrical and Computer Engineering, Rutgers University, New Brunswick, NJ, USA, Department of Statistics, Columbia University, NY, USA
    • TL;DR: This study presents a novel method called Bagged Deep Image Prior for recovering images affected by speckle noise, establishing a theoretical upper bound on the Mean Squared Error (MSE) of the maximum likelihood estimator. The proposed method integrates projected gradient descent with the Newton-Schulz algorithm, achieving state-of-the-art performance in image recovery (see the Newton-Schulz sketch below).
    • Keywords: Image recovery, Speckle noise, Maximum likelihood estimator, Deep Image Prior, Projected gradient descent, Newton-Schulz algorithm, Coherent imaging systems, Speckle noise, Ill-conditioned measurement matrix, Bagged Deep Image Prior, Theoretical upper bound on Mean Squared Error (MSE)
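
    The Newton-Schulz iteration named above approximates a matrix inverse using only matrix multiplications, which is what makes it attractive inside a gradient loop; a minimal sketch (how the paper couples it with projected gradient descent and the deep image prior is not shown):

    ```python
    import numpy as np

    def newton_schulz_inverse(A: np.ndarray, n_iters: int = 30) -> np.ndarray:
        """Newton-Schulz iteration X <- X (2I - A X), which converges
        quadratically to A^{-1} whenever ||I - A X_0|| < 1."""
        n = A.shape[0]
        # classical initialization guaranteeing convergence for nonsingular A
        X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
        I = np.eye(n)
        for _ in range(n_iters):
            X = X @ (2.0 * I - A @ X)
        return X

    A = np.random.randn(5, 5) + 5.0 * np.eye(5)
    err = np.linalg.norm(newton_schulz_inverse(A) @ A - np.eye(5))  # ~0
    ```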
  • Performative Prediction with Bandit Feedback: Learning through Reparameterization (Poster)

    • Authors: Yatong Chen, Wei Tang, Chien-Ju Ho, Yang Liu
    • Affiliations: Department of Computer Science and Engineering, University of California, Santa Cruz, California, United States, Data Science Institute, Columbia University; Department of Decisions, Operations, and Technology, the Chinese University of Hong Kong, Department of Computer Science and Engineering, Washington University in St. Louis
    • TL;DR: This paper introduces a reparameterization framework for performative prediction that addresses the challenges of non-convex objectives and provides a two-level optimization procedure. The approach allows for the transformation of the performative risk into a convex form, achieving sublinear regret guarantees in relation to the number of performative samples.
    • Keywords: Performative prediction, social prediction, data distribution changes, Reparameterization framework, zeroth-order optimization, Education, recommendation systems, criminal prediction, Non-convex objective, performative risk, Convex transformation of objectives, provable regret guarantees, Performative risk, empirical risk minimization (ERM)
  • EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism (Poster)

    • Authors: Yanxi Chen, Xuchen Pan, Yaliang Li, Bolin Ding, Jingren Zhou
    • Affiliations: Alibaba Group
    • TL;DR: The study introduces EE-LLM, a framework designed to enhance the training and inference of early-exit large language models using 3D parallelism. It demonstrates significant improvements in training efficiency and inference speed without sacrificing output quality, addressing the challenges of high costs and latency in LLM deployment.
    • Keywords: early-exit large language models, large-scale training and inference, 3D parallelism, pipeline parallelism, KV caching, natural language processing, generative models, excessive costs, carbon emissions, inference latency, training efficiency, inference speedup, algorithmic innovations, Megatron-LM
  • MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective (Poster)

    • Authors: Yizhuo Chen, Chun-Fu (Richard) Chen, Hsiang Hsu, Shaohan Hu, Marco Pistoia, Tarek Abdelzaher
    • Affiliations: Global Technology Applied Research, JPMorgan Chase, USA; Department of Computer Science, University of Illinois Urbana-Champaign, USA
    • TL;DR: This study proposes a formal information-theoretic framework for utility-preserving data transformation that selectively suppresses sensitive attributes while maintaining the utility of other attributes. The method demonstrates effectiveness across various datasets and tasks, addressing significant privacy concerns in data handling.
    • Keywords: Data privacy protection, Utility-preserving data transformation, Information-theoretic definition, Data-driven learnable data transformation framework, Facial images, Voice audio clips, Human activity motion sensor signals, Privacy concerns, Sensitive attribute suppression, Data availability degradation, Effective and generalizable data transformation method, ImageNet, UCI HAR dataset, Common Crawl, Differential Privacy, Attribute inference attacks, Membership inference attacks
  • MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models (Poster)

    • Authors: Justin Chih-Yao Chen, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal
    • Affiliations: UNC Chapel Hill
    • TL;DR: The study introduces MAGDi, a method for structured distillation of reasoning interactions from multiple Large Language Models into smaller models, significantly improving their reasoning capabilities and efficiency. Experiments demonstrate that MAGDi outperforms existing distillation methods while enhancing generalizability and scalability.
    • Keywords: Multi-agent interactions, Reasoning improvement, Language models, Structured distillation, Graph encoder, Next-token prediction, Contrastive loss, Commonsense reasoning, Math reasoning, Long generations, Expensive multi-agent interactions, Lack of a final model for inference, Improved reasoning capabilities, Higher efficiency, Enhanced generalizability, Large Language Models (LLMs), Multi-agent interaction graphs (MAG), Self-consistency
  • Robust Classification via a Single Diffusion Model (Poster)

    • Authors: Huanran Chen, Yinpeng Dong, Zhengyi Wang, Xiao Yang, Chengqi Duan, Hang Su, Jun Zhu
    • Affiliations: Dept. of Comp. Sci. and Tech., Institute for AI, Tsinghua-Bosch Joint ML Center, THBI Lab, BNRist Center, Tsinghua University, Beijing, 100084, China; RealAI; Zhongguancun Laboratory, Beijing, 100080, China; School of Computer Science, Beijing Institute of Technology
    • TL;DR: This study introduces the Robust Diffusion Classifier (RDC), leveraging pre-trained diffusion models to enhance adversarial robustness in image classification. RDC demonstrates superior performance against various adaptive attacks, achieving a robust accuracy of 75.67% on CIFAR-10 and surpassing previous state-of-the-art models (see the diffusion-classifier sketch below).
    • Keywords: Adversarial robustness, Generative classifiers, Diffusion models, Robust Diffusion Classifier (RDC), Bayes’ theorem, Image classification, Adversarial training, Vulnerability to adversarial examples, Limitations of existing methods, Improved robust accuracy, Generalizability against unseen threats, CIFAR-10
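
    A hedged sketch of the generic "diffusion classifier" idea the paper builds on: score each class y by the conditional denoising loss, which (via the diffusion ELBO and Bayes' theorem) serves as a proxy for log p(x | y). The `eps_model(x_t, t, y)` interface is an assumption, and RDC itself adds further components (such as likelihood maximization on the input) not shown here:

    ```python
    import torch

    @torch.no_grad()
    def diffusion_class_scores(eps_model, x0, num_classes, timesteps, alphas_bar):
        """Per-class scores: a lower conditional denoising loss means a higher
        approximate log-likelihood p(x0 | y); argmax over classes classifies."""
        scores = []
        for y in range(num_classes):
            losses = []
            for t in timesteps:
                a_bar = alphas_bar[t]
                eps = torch.randn_like(x0)
                x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
                y_vec = torch.full((x0.shape[0],), y, device=x0.device)
                t_vec = torch.full((x0.shape[0],), t, device=x0.device)
                pred = eps_model(x_t, t_vec, y_vec)
                losses.append(((pred - eps) ** 2).flatten(1).sum(dim=1))
            scores.append(-torch.stack(losses).mean(dim=0))
        return torch.stack(scores, dim=1)  # shape (batch, num_classes)
    ```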
  • Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes (Poster)

    • Authors: Yingyi Chen, Qinghua Tao, Francesco Tonin, Johan Suykens
    • Affiliations: LIONS, EPFL, Switzerland, ESAT-STADIUS, KU Leuven, Belgium
    • TL;DR: This study introduces Kernel-Eigen Pair Sparse Variational Gaussian Processes (KEP-SVGP) to address the asymmetry in self-attention kernels and improve uncertainty estimation in Transformers. The proposed method reduces computational complexity while providing reliable predictions across various benchmarks.
    • Keywords: Transformers, Uncertainty Estimation, Bayesian Inference, Gaussian Processes (GPs), Sparse Variational Gaussian Processes (SVGP), Kernel Singular Value Decomposition (KSVD), Safety-Critical Applications, Feature Learning, Overconfident Predictions, High Complexity in GP Posteriors, Asymmetry in Attention Kernels, Kernel-Eigen Pair Sparse Variational Gaussian Processes (KEP-SVGP), Evidence Lower Bound for Optimization, Self-Attention, Asymmetric Kernels, Inducing Points
  • DiJiang: Efficient Large Language Models through Compact Kernelization (Oral)

    • Authors: Hanting Chen, Zhicheng Liu, Xutao Wang, Yuchuan Tian, Yunhe Wang
    • Affiliations: Peking University, Huawei Noah’s Ark Lab
    • TL;DR: This paper introduces DiJiang, a novel approach for transforming pre-trained Transformers into linear-complexity models with minimal retraining costs. The method achieves performance comparable to existing models while significantly reducing training costs and improving inference speeds (see the linear-attention sketch below).
    • Keywords: Efficient Transformers, Large Language Models, Frequency Domain Kernelization, Linear Attention, Discrete Cosine Transform (DCT), Quasi-Monte Carlo method, Natural Language Processing (NLP), Speech Recognition, Machine Translation, Document Generation, Computational load reduction, Extensive retraining challenges, Resource constraints, Comparable performance to original Transformers, Reduced training costs, Faster inference speeds, LLaMA2-7B, DiJiang-7B, Transformer, Linear Transformers, Attention Mechanism
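
    The generic linear-attention factorization that DiJiang's frequency-domain kernelization instantiates: replace the softmax with a feature map phi so attention computes phi(Q) (phi(K)^T V) and never forms the n x n matrix. The ELU-based feature map below is one common choice from the linear-attention literature, not the paper's DCT construction:

    ```python
    import torch
    import torch.nn.functional as F

    def linear_attention(Q, K, V):
        """O(n d^2) attention, linear in sequence length n."""
        phi = lambda x: F.elu(x) + 1.0          # positive feature map
        Qf, Kf = phi(Q), phi(K)                 # (n, d_k)
        KV = Kf.transpose(-2, -1) @ V           # (d_k, d_v); no n x n matrix
        Z = Qf @ Kf.sum(dim=-2, keepdim=True).transpose(-2, -1)  # normalizer
        return (Qf @ KV) / (Z + 1e-6)

    out = linear_attention(torch.randn(128, 64), torch.randn(128, 64),
                           torch.randn(128, 64))
    ```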
  • Subequivariant Reinforcement Learning in 3D Multi-Entity Physical Environments (Poster)

    • Authors: Runfa Chen, Ling Wang, Yu Du, Tianrui Xue, Fuchun Sun, Jianwei Zhang, Wenbing Huang
    • Affiliations: Dept. of Info. Eng., Xi’an Research Institute of High-Tech, Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, Tsinghua University, Gaoling School of Artificial Intelligence, Renmin University of China; Beijing Key Laboratory of Big Data Management and Analysis Methods, TAMS, Dept. of Informatics, University of Hamburg, Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, Tsinghua University; THU-Bosch JCML Center, School of Elec. Eng., Naval University of Engineering, College of Arts & Sci., New York University
    • TL;DR: This paper introduces Subequivariant Hierarchical Neural Networks (SHNN) to address the complexities of learning policies in multi-entity systems within 3D environments by dynamically decoupling the global state space into local views. The proposed method shows significant advancements over existing approaches and introduces a new benchmark for multi-entity reinforcement learning.
    • Keywords: multi-entity systems, reinforcement learning, 3D environments, Subequivariant Hierarchical Neural Networks (SHNN), subequivariant message passing, exponential complexity, global state space expansion, task assignment, Multi-entity Benchmark (MEBEN), advancements in policy learning, E(3) subequivariance, local reference frames (LRF)
  • Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models (Poster)

    • Authors: Zixiang Chen, Yihe Deng, Huizhuo Yuan, Kaixuan Ji, Quanquan Gu
    • Affiliations: Department of Computer Science, University of California, Los Angeles, CA 90095, USA
    • TL;DR: This study introduces Self-Play Fine-Tuning (SPIN) as a method to enhance weak Large Language Models (LLMs) without additional human-annotated data, demonstrating significant performance improvements across various benchmarks. The findings suggest that self-play can effectively elevate LLM capabilities to human-level performance.
    • Keywords: Large Language Models, Self-Play Fine-Tuning, Supervised Fine-Tuning (SFT), Self-Play Fine-Tuning (SPIN), Reinforcement Learning from Human Feedback (RLHF), Need for additional human-annotated data, enhancing weak models, Improved LLM performance, alignment with target data distribution, HuggingFace Open LLM Leaderboard, MT-Bench, Big-Bench
  • GRATH: Gradual Self-Truthifying for Large Language Models (Poster)

    • Authors: Weixin Chen, Dawn Song, Bo Li
    • Affiliations: UIUC; UChicago, UC Berkeley
    • TL;DR: The study introduces GRATH, a novel post-processing method aimed at enhancing the truthfulness of large language models (LLMs) by utilizing out-of-domain question prompts for training. The results demonstrate significant improvements in truthfulness metrics on the TruthfulQA benchmark, surpassing even larger models (see the DPO objective below).
    • Keywords: truthfulness, large language models (LLMs), hallucination, GRAdual self-truTHifying (GRATH), direct preference optimization (DPO), real-world applications, safety-critical applications, generating truthful content, hallucination phenomenon, state-of-the-art performance on TruthfulQA, improvement in model truthfulness, TruthfulQA, ARC-Challenge
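
    The direct preference optimization (DPO) objective that GRATH applies iteratively to truthful/untruthful answer pairs, in its standard form (y_w is the preferred answer, y_l the rejected one, and pi_ref a frozen reference model):

    ```latex
    \mathcal{L}_{\mathrm{DPO}}(\theta)
      = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[
          \log \sigma\!\left(
            \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
            - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
          \right)
        \right]
    ```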
  • Recovering Labels from Local Updates in Federated Learning (Poster)

    • Authors: Huancheng Chen, Haris Vikalo
    • Affiliations: University of Texas at Austin, Texas, USA
    • TL;DR: This paper introduces a novel label recovery method, Recovering Labels from Local Updates (RLU), which reconstructs labels from local updates in federated learning with high accuracy even under multiple local training epochs and heterogeneous data. The proposed method outperforms existing techniques and enhances the quality of reconstructed images in gradient inversion attacks (see the label-leakage sketch below).
    • Keywords: Federated Learning, Privacy in Machine Learning, Gradient Inversion, Label Recovery, Least-Square Problem, Healthcare, Finance, Privacy Attacks, Data Reconstruction, Heterogeneous Data, Recovering Labels from Local Updates (RLU), Improved Data Reconstruction Quality
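
    A single-sample, single-step illustration of the leakage that RLU generalizes (the classic iDLG observation): with softmax cross-entropy, the output-layer bias gradient equals softmax(logits) - onehot(label), so it is negative exactly at the true label. RLU's actual contribution, recovering label counts after multiple local epochs on heterogeneous data, goes well beyond this sketch:

    ```python
    import torch
    import torch.nn.functional as F

    head = torch.nn.Linear(32, 10)                 # output layer of a classifier
    x, label = torch.randn(1, 32), torch.tensor([7])
    F.cross_entropy(head(x), label).backward()

    # d(loss)/d(bias_i) = softmax_i - onehot_i: negative only at the true class
    recovered = int(torch.argmin(head.bias.grad))  # == 7
    ```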
  • Diffusion Model-Augmented Behavioral Cloning (Poster)

    • Authors: Shang-Fu Chen, Hsiang-Chun Wang, Ming-Hao Hsu, Chun-Mao Lai, Shao-Hua Sun
    • Affiliations: National Taiwan University, Taipei, Taiwan
    • TL;DR: This study proposes a novel imitation learning framework called Diffusion Model-Augmented Behavioral Cloning (DBC) that combines conditional and joint probability modeling to enhance generalization in behavioral cloning. The results demonstrate that DBC outperforms existing methods in various continuous control tasks, addressing limitations in traditional approaches.
    • Keywords: Imitation learning, Behavioral cloning, Diffusion models, Generative models, Continuous control tasks, Navigation, Robot arm manipulation, Dexterous manipulation, Locomotion, Generalization issues, Manifold overfitting, Diffusion Model-Augmented Behavioral Cloning (DBC), Policy optimization, Conditional probability, Joint probability
  • Locally Differentially Private Decentralized Stochastic Bilevel Optimization with Guaranteed Convergence Accuracy (Poster)

    • Authors: Ziqin Chen, Yongqiang Wang
    • Affiliations: Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, 29634, United States
    • TL;DR: This study presents a new decentralized stochastic bilevel-optimization algorithm that achieves both differential privacy and accurate convergence, addressing significant privacy concerns in decentralized optimization. The proposed method characterizes convergence rates under various conditions and quantifies the impact of differential privacy on these rates.
    • Keywords: Decentralized bilevel optimization, Differential privacy, Stochastic bilevel optimization algorithms, Machine learning, Meta-learning, Hyperparameter optimization, Imitation learning, Neural architecture search, Privacy concerns in decentralized optimization, Information exchange challenges, New decentralized stochastic bilevel-optimization algorithm, Convergence rate characterization
  • A General Framework for Learning from Weak Supervision (Poster)

    • Authors: Hao Chen, Jindong Wang, Lei Feng, Xiang Li, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj
    • Affiliations: Peking University, RIKEN AIP; The University of Tokyo, Microsoft Research, Carnegie Mellon University; Mohamed bin Zayed University of AI, Singapore University of Technology and Design, Microsoft Research; William & Mary, Carnegie Mellon University
    • TL;DR: This paper presents a general framework for learning from weak supervision (GLWS) that utilizes an Expectation-Maximization formulation to effectively handle various weak supervision sources. The proposed method significantly enhances scalability and demonstrates superior performance across multiple weak supervision scenarios.
    • Keywords: weakly supervised learning, scalability, practical deployment, Expectation-Maximization (EM), Non-deterministic Finite Automaton (NFA), forward-backward algorithm, applicability to various scenarios, diverse weak supervision, complexity of existing algorithms, general framework for learning from weak supervision (GLWS), improved performance across weak supervision scenarios
  • In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (Poster)

    • Authors: Shiqi Chen, Miao Xiong, Junteng Liu, Zhengxuan Wu, Teng Xiao, Siyang Gao, Junxian He
    • Affiliations: HKUST, City University of Hong Kong, Shanghai Jiao Tong University, National University of Singapore, Stanford University, Penn State University
    • TL;DR: This study investigates the mechanisms behind hallucinations in large language models by analyzing inner representations and proposes an entropy-based metric to improve decoding accuracy. The approach demonstrates significant effectiveness in mitigating hallucinations, achieving notable improvements on various benchmarks.
    • Keywords: Large Language Models, Hallucination Mitigation, Entropy-based metric, Constrained decoding, Knowledge-seeking tasks, Factuality in language models, Hallucinations, Factual errors in LLMs, Improved understanding of hallucinations, Enhanced decoding approach, COUNTERFACT dataset, TruthfulQA, Inner representations, Context activations
  • Positional Knowledge is All You Need: Position-induced Transformer (PiT) for Operator Learning (Poster)

    • Authors: Junfeng CHEN, Kailiang Wu
    • Affiliations: Department of Mathematics, Southern University of Science and Technology, Shenzhen 518055, China; Shenzhen International Center for Mathematics, Southern University of Science and Technology, Shenzhen 518055, China; National Center for Applied Mathematics Shenzhen (NCAMS), Shenzhen 518055, China
    • TL;DR: This paper introduces the Position-induced Transformer (PiT), a novel approach for operator learning in Partial Differential Equations (PDEs) that utilizes a position-attention mechanism to improve efficiency and performance over traditional self-attention methods. The findings demonstrate PiT's superior capabilities in handling complex operator learning tasks and its enhanced discretization convergence compared to existing models.
    • Keywords: Operator learning, Partial Differential Equations (PDEs), Position-induced Transformer (PiT), position-attention mechanism, self-attention, Surrogate modeling, complex systems, High computational demands, limited interpretability, challenges in solving complex nonlinear systems, Enhanced discretization convergence, superior performance over state-of-the-art neural operators, Transformers, numerical methods for PDEs, Fourier neural operator
  • CaRiNG: Learning Temporal Causal Representation under Non-Invertible Generation Process (Poster)

    • Authors: Guangyi Chen, Yifan Shen, Zhenhao Chen, Xiangchen Song, Yuewen Sun, Weiran Yao, Xiao Liu, Kun Zhang
    • Affiliations: Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE; Carnegie Mellon University, Pittsburgh, US; Salesforce, San Francisco, US
    • TL;DR: This study introduces CaRiNG, a method for learning causal representations from sequential data under non-invertible generation processes, addressing challenges in identifying latent causal variables. The approach demonstrates improved temporal understanding and reasoning, validated through experiments on synthetic datasets.
    • Keywords: Temporal causal representation, Sequential data analysis, Independent Component Analysis (ICA), Nonlinear ICA, Video analysis, Time series data, Visual perception, Non-invertible generation process, Information loss, Causal dynamics identification, Identifiability theory, CaRiNG method for causal representation, Synthetic datasets
  • Identifiability Matters: Revealing the Hidden Recoverable Condition in Unbiased Learning to Rank (Poster)

    • Authors: Mouxiang Chen, Chenghao Liu, Zemin Liu, Zhuo Li, Jianling Sun
    • Affiliations: Salesforce Research Asia, State Street Technology (Zhejiang) Ltd., Zhejiang University, National University of Singapore
    • TL;DR: This study investigates the conditions under which true relevance can be recovered from biased click data in Unbiased Learning to Rank (ULTR) and introduces methods to restore the connectivity of the identifiability graph. The findings highlight that a disconnected identifiability graph can lead to suboptimal ranking performance, and the proposed methods effectively mitigate data bias.
    • Keywords: Unbiased Learning to Rank, click data, ranking models, examination hypothesis, joint optimization, information retrieval systems, data bias, position bias, recoverability of relevance, node intervention, node merging, identifiability graph, simulated dataset, LTR benchmark datasets
  • Efficient Pareto Manifold Learning with Low-Rank Structure (Spotlight Poster)

    • Authors: Weiyu CHEN, James Kwok
    • Affiliations: Department of Computer Science and Engineering, The Hong Kong University of Science and Technology
    • TL;DR: This study presents a novel approach to multi-task learning that integrates a main network with low-rank matrices to efficiently learn the Pareto manifold, significantly reducing the number of parameters and enhancing performance on large task datasets. The proposed method outperforms existing state-of-the-art algorithms, particularly in scenarios with numerous tasks.
    • Keywords: Multi-task learning, Multi-objective optimization, Pareto front, Low-rank matrices, Orthogonal regularization, Scalability issues, Balancing tasks, Parameter reduction, Efficient learning of Pareto manifold, Improved performance on large task datasets, Pareto Manifold Learning (PaMaL), Hypernetwork
  • Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting (Poster)

    • Authors: Anthony Chen, Huanrui Yang, Yulu Gan, Denis Gudovskiy, Zhen Dong, Haofan Wang, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Shanghang Zhang
    • Affiliations: School of Computer Science, Peking University, Carnegie Mellon University, University of California, Berkeley, Panasonic Holdings Corporation
    • TL;DR: This study introduces the Split-Ensemble method to enhance uncertainty estimation for deep learning models without requiring additional OOD data or incurring extra computational costs. The proposed method demonstrates significant improvements in accuracy and OOD detection across multiple datasets.
    • Keywords: Uncertainty estimation, Out-of-distribution (OOD) detection, Deep learning, Split-Ensemble method, Subtask-splitting ensemble training, Image classification, OOD detection, Uncalibrated uncertainty, High computational costs of ensembles, Improved accuracy, Generalizable uncertainty estimation, CIFAR-10, CIFAR-100, Tiny-ImageNet
  • Compact Optimality Verification for Optimization Proxies (Poster)

    • Authors: Wenbo Chen, Haoruo Zhao, Mathieu Tanneau, Pascal Van Hentenryck
    • Affiliations: H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, USA; NSF Artificial Intelligence Research Institute for Advances in Optimization (AI4OPT), USA
    • TL;DR: This paper presents a compact formulation for verifying the optimality of optimization proxies, addressing both convex and non-convex problems, and introduces a gradient-based heuristic that significantly improves computational efficiency. The proposed methods are validated through applications in DC Optimal Power Flow and knapsack problems, demonstrating their practical benefits.
    • Keywords: optimization proxies, parametric optimization problems, optimality verification, gradient-based primal heuristic, Projected Gradient Attack (PGA), Mixed-Integer Programming (MIP), DC Optimal Power Flow (DC-OPF), knapsack problems, power systems operations, supply chain management, manufacturing, worst-case optimality gap, feasibility, constraint violations, compact formulation for optimality verification, computational benefits, quality guarantees for optimization proxies
  • LLaGA: Large Language and Graph Assistant (Poster)

    • Authors: Runjin Chen, Tong Zhao, Ajay Jaiswal, Neil Shah, Zhangyang “Atlas” Wang
    • Affiliations: Snap Inc., The University of Texas at Austin
    • TL;DR: This paper introduces the Large Language and Graph Assistant (LLaGA), a model that integrates large language model capabilities to effectively handle graph-structured data. LLaGA demonstrates superior performance across various datasets and tasks, surpassing existing state-of-the-art graph models in both supervised and zero-shot scenarios.
    • Keywords: Graph Neural Networks, Large Language Models, Message Passing, Aggregation Techniques, Graph-structured Data Analysis, Social Networks, Biological Networks, Recommendation Systems, Translating Graph Structures to Language, Performance on Graph Tasks, Large Language and Graph Assistant (LLaGA), Versatile Projector, GNNs (Graph Neural Networks), LLMs (Large Language Models), Explainability, Generalizability, Interpretability
  • Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness (Poster)

    • Authors: Honghao Chen, Yurong Zhang, Xiaokun Feng, Xiangxiang Chu, Kaiqi Huang
    • Affiliations: Meituan, Shanghai Jiao Tong University, CRISE, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
    • TL;DR: This study evaluates the robustness of large kernel convolutional networks compared to traditional small kernel CNNs and vision transformers, revealing that pure CNNs can achieve robustness levels comparable to or exceeding those of ViTs. The findings provide new insights into the factors contributing to this robustness, including occlusion invariance and kernel attention patterns.
    • Keywords: robustness, deep learning, large kernel convolutional networks, convolutional neural networks (CNNs), vision transformers (ViTs), image classification, object detection, semantic segmentation, self-supervised learning, robustness evaluation, occlusion invariance, adversarial attacks, exceptional robustness of pure CNNs, insights into kernel attention patterns and frequency characteristics, robustness benchmark datasets
  • Stacking Deep Set Networks and Pooling by Quantiles (Poster)

    • Authors: Zhuojun Chen, Xinghua Zhu, Dongzhe Su, Justin CHUANG
    • Affiliations: ASTRI, Hong Kong, China
    • TL;DR: This paper introduces Stacked Deep Sets and Quantile Pooling as a novel approach for learning from set data, combining the strengths of max and average pooling. The proposed methods demonstrate improved performance and efficiency in handling complex data distributions compared to traditional pooling techniques (see the quantile-pooling sketch below).
    • Keywords: Deep Learning, Set Data Processing, Stacked Deep Sets, Quantile Pooling, Max Pooling, Average Pooling, Information Loss in Pooling, Complex Data Distributions, Enhanced Pooling Methods, Improved Learning Efficiency, Permutation-Invariant Functions
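
    A minimal sketch of quantile pooling as a permutation-invariant set aggregator, where a single knob q interpolates between median-like (robust, average-flavored) and max-like behavior; how the paper stacks this into Deep Sets blocks is not shown:

    ```python
    import torch

    def quantile_pool(x: torch.Tensor, q: float = 0.5, dim: int = 1) -> torch.Tensor:
        """Pool element embeddings along the set dimension by a quantile.
        q = 0.5 is the median; q -> 1.0 approaches max pooling."""
        return torch.quantile(x, q, dim=dim)

    sets = torch.randn(8, 20, 64)          # 8 sets of 20 elements, 64-dim each
    pooled = quantile_pool(sets, q=0.9)    # (8, 64), order-independent
    ```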
  • Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations (Spotlight Poster)

    • Authors: Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown
    • Affiliations: Columbia University; NYU Shanghai, New York University, UC Berkeley
    • TL;DR: This study evaluates whether large language models can explain their outputs effectively through counterfactual simulatability, revealing that current explanations often mislead users and have low precision. The findings suggest that simply optimizing for human approval may not be sufficient for improving model explanations.
    • Keywords: Explainability, Large Language Models, Counterfactual Simulatability, Multi-hop Factual Reasoning, Reward Modeling, Natural Language Processing, Low precision of explanations, Misleading mental models, Evaluation metrics for explanations
  • DRCT: Diffusion Reconstruction Contrastive Training towards Universal Detection of Diffusion Generated Images (Spotlight Poster)

    • Authors: Baoying Chen, Jishen Zeng, Jianquan Yang, Rui Yang
    • Affiliations: School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China, Alibaba Group
    • TL;DR: This study introduces Diffusion Reconstruction Contrastive Training (DRCT) to enhance the generalizability of image detection methods for images generated by various diffusion models. The proposed framework significantly improves detection accuracy by over 10% in cross-set tests and includes the creation of a large dataset for evaluation.
    • Keywords: Diffusion models, Image detection, Diffusion Reconstruction Contrastive Learning (DRCT), Contrastive training, Digital content generation, Image generation, Generalizability of image detectors, Hard sample classification, Improved accuracy in detecting generated images, Development of a million-scale dataset (DRCT-2M), DRCT-2M, MSCOCO
  • Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation (Poster)

    • Authors: Yu Chen, XiangCheng Zhang, Siwei Wang, Longbo Huang
    • Affiliations: Tsinghua University, Beijing, China, Microsoft Research Asia
    • TL;DR: This paper presents a framework for Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL) that incorporates static Lipschitz Risk Measures and general function approximation, addressing the challenges of risk management in RL. The authors introduce two meta-algorithms and derive a novel regret upper bound, contributing to the development of statistically efficient algorithms in this field (see the CVaR definition below).
    • Keywords: Risk-Sensitive Reinforcement Learning, Distributional Reinforcement Learning, Static Lipschitz Risk Measures, Least Squares Regression, Maximum Likelihood Estimation, Financial investment, Medical treatment, Autonomous driving, Risk management, Sample complexity, Cumulative reward distribution, RS-DisRL-M, RS-DisRL-V, Regret upper bound, Markov Decision Process (MDP), Coherent risk, Conditional Value-at-Risk (CVaR), Entropy risk measures (ERM)
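
    As one concrete member of the Lipschitz risk-measure family above, CVaR at level alpha is the expected return over the worst alpha-fraction of outcomes; the Rockafellar-Uryasev variational form for a return Z is shown below (the conditional-expectation form assumes a continuous return distribution):

    ```latex
    \mathrm{CVaR}_\alpha(Z)
      \;=\; \sup_{c \in \mathbb{R}} \Bigl\{\, c - \tfrac{1}{\alpha}\,
            \mathbb{E}\bigl[(c - Z)_+\bigr] \Bigr\}
      \;=\; \mathbb{E}\bigl[\, Z \mid Z \le \mathrm{VaR}_\alpha(Z) \,\bigr]
    ```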
  • Enhancing Implicit Shape Generators Using Topological Regularizations (Poster)

    • Authors: Liyan Chen, Yan Zheng, Yang Li, Lohit A. Jagarapu, Haoxiang Li, Hao Kang, Gang Hua, Qixing Huang
    • Affiliations: Tsinghua Shenzhen International Graduate School, Info Building 1108A, Shenzhen, China, Wormpex AI Research, 500 108th Ave NE, Ste 1740, Bellevue WA, Department of Computer Science, The University of Texas at Austin, Austin TX
    • TL;DR: This paper presents a method to enhance implicit 3D shape generators by using topological regularization losses to address artifacts in synthetic models. The approach focuses on aligning the persistent diagram distributions of training and synthetic shapes while ensuring smoothness among adjacent synthetic shapes, leading to improved generalization performance.
    • Keywords: 3D shape generative models, topological generalization, persistent diagram (PD), generative model, topological regularization losses, visual computing, synthetic 3D model generation, topological artifacts, data sparsity in training, alignment of PD distributions, improved generalization behavior, distribution alignment loss, smoothness regularization, ShapeNet
  • What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks (Poster)

    • Authors: Xingwu Chen, Difan Zou
    • Affiliations: Department of Computer Science, The University of Hong Kong, Department of Computer Science and Institute of Data Science, The University of Hong Kong
    • TL;DR: This study investigates how the depth of transformer architectures influences their capabilities in memorization, reasoning, and generalization through a novel set of sequence learning tasks. The findings indicate that at least two attention layers are necessary for reasoning and generalization, while three layers may be required for contextual generalization.
    • Keywords: Transformer architecture, sequence learning tasks, Attention layers, Natural language processing, deep learning, Memorization, reasoning, generalization, contextual generalization, Evaluation of transformer capabilities, stacking attention layers
  • Diffusive Gibbs Sampling (Poster)

    • Authors: Wenlin Chen, Mingtian Zhang, Brooks Paige, Jose Miguel Hernandez-Lobato, David Barber
    • Affiliations: University of Cambridge, Cambridge, UK; Max Planck Institute for Intelligent Systems, Tübingen, Germany; University College London, London, UK
    • TL;DR: This paper introduces Diffusive Gibbs Sampling (DiGS), a novel sampling method that addresses the inadequate mixing of conventional MCMC methods on multi-modal distributions. DiGS demonstrates superior performance in sampling tasks across various applications, including Bayesian inference and molecular dynamics (see the DiGS sketch below).
    • Keywords: multi-modal distributions, sampling methods, Diffusive Gibbs Sampling (DiGS), Gaussian convolution, Metropolis-within-Gibbs, Bayesian inference, molecular dynamics, mixtures of Gaussians, Bayesian neural networks, inadequate mixing of MCMC methods, disconnected modes, improved sampling performance, better mixing properties, Markov Chain Monte Carlo (MCMC), Langevin SDE, Metropolis-adjusted Langevin Algorithm (MALA), Hamiltonian Monte Carlo (HMC)
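    • Sketch: a minimal toy rendering of the noise-then-denoise alternation on a 1D bimodal target; the noise scale, the inner Metropolis kernel, and all step counts are our own choices, not the paper's.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)

      def log_target(x):
          # Unnormalized bimodal target: equal mixture of N(-4, 1) and N(4, 1).
          return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

      sigma = 2.0  # Gaussian convolution scale (untuned)
      x, samples = 0.0, []
      for _ in range(5000):
          # Step 1: sample the noisy auxiliary variable y | x ~ N(x, sigma^2).
          y = x + sigma * rng.normal()
          # Step 2: sample x | y ~ p(x) N(y; x, sigma^2) with a few Metropolis
          # steps started from y (the denoising move that bridges modes).
          x = y
          for _ in range(5):
              prop = x + 0.5 * rng.normal()
              log_acc = (log_target(prop) - 0.5 * ((y - prop) / sigma) ** 2) \
                      - (log_target(x) - 0.5 * ((y - x) / sigma) ** 2)
              if np.log(rng.uniform()) < log_acc:
                  x = prop
          samples.append(x)

      samples = np.array(samples)
      print("mass near each mode:", (samples > 0).mean(), (samples < 0).mean())
      ```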
  • FedMBridge: Bridgeable Multimodal Federated Learning (Oral)

    • Authors: Jiayi Chen, Aidong Zhang
    • Affiliations: Department of Computer Science, University of Virginia, Charlottesville, VA 22903, USA
    • TL;DR: The study introduces FedMBridge, a novel approach to Multimodal Federated Learning that addresses the challenges of statistical and architecture heterogeneity among clients. The proposed method enhances knowledge sharing and communication efficiency, demonstrating effectiveness in various simulations.
    • Keywords: Multimodal Federated Learning, Personalized Federated Learning, Topology-aware hypernetwork, blockwise model aggregation, Statistical heterogeneity, architecture heterogeneity, knowledge sharing among clients, FedMBridge, communication-efficient information sharing
  • GaussianPro: 3D Gaussian Splatting with Progressive Propagation (Poster)

    • Authors: Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, Xuejin Chen
    • Affiliations: Nanjing University, The University of Hong Kong, Texas A&M University, ShanghaiTech University, MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, The University of Adelaide
    • TL;DR: This paper introduces GaussianPro, a novel method for 3D Gaussian Splatting that addresses the challenges of low-quality renderings in texture-less areas by applying a progressive propagation strategy. The method significantly improves rendering quality, achieving a 1.15dB increase in PSNR compared to existing techniques on the Waymo dataset.
    • Keywords: 3D Gaussian Splatting, neural rendering, novel view synthesis, Structure-from-Motion (SfM), multi-view stereo (MVS), patch matching, virtual reality, autonomous driving, 3D content generation, data sparsity, low-quality renderings, optimization challenges, GaussianPro method, improved PSNR, Waymo dataset
  • RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation (Spotlight Poster)

    • Authors: Zelei Cheng, Xian Wu, Jiahao Yu, Sabrina Yang, Gang Wang, Xinyu Xing
    • Affiliations: Presentation High School, San Jose, California, USA, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA, Department of Computer Science, Northwestern University, Evanston, Illinois, USA
    • TL;DR: This paper introduces RICE, a novel refining scheme for deep reinforcement learning that utilizes explanation methods to overcome training bottlenecks, particularly in environments with sparse rewards. The results show that RICE significantly improves agent performance compared to existing methods.
    • Keywords: Deep Reinforcement Learning (DRL), Training Bottlenecks, Explanation Methods, RICE (Refining scheme), StateMask, Autonomous Vehicles, Cybersecurity, Simulated Games, Sparse Rewards, Local Optima, Enhanced Agent Performance, Tighter Sub-optimality Bound
  • Kernel Semi-Implicit Variational Inference (Poster)

    • Authors: Ziheng Cheng, Longlin Yu, Tianyu Xie, Shiyue Zhang, Cheng Zhang
    • Affiliations: School of Mathematical Sciences, Peking University, China, School of Mathematical Sciences, Peking University, China; Center for Statistical Science, Peking University, China
    • TL;DR: This paper introduces kernel semi-implicit variational inference (KSIVI), which improves upon traditional variational inference methods by eliminating the need for lower-level optimization through kernel tricks. The proposed method demonstrates effectiveness and efficiency in Bayesian inference tasks while providing novel convergence guarantees.
    • Keywords: Semi-implicit variational inference, Variational inference, Kernel tricks, Score matching, Stochastic gradient descent, Bayesian inference, Intractable densities, Biases in training, Kernel SIVI (KSIVI), Convergence guarantees, Reproducing kernel Hilbert space (RKHS), Kernel Stein discrepancy (KSD)
  • Hard Tasks First: Multi-Task Reinforcement Learning Through Task Scheduling (Poster)

    • Authors: MYUNG-SIK CHO, Jong Eui Park, Suyoung Lee, Youngchul Sung
    • Affiliations: School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Korea
    • TL;DR: This study presents a novel Scheduled Multi-Task Training (SMT) algorithm that prioritizes challenging tasks in multi-task reinforcement learning to improve learning efficiency and mitigate negative transfer. The method demonstrates significant performance improvements on the Meta-World benchmark, showcasing its effectiveness in handling varying task difficulties.
    • Keywords: Multi-task reinforcement learning, task scheduling, Scheduled Multi-Task Training (SMT), dynamic task prioritization, Robotics, control tasks, Varying task difficulties, negative transfer, simplicity bias, Improved learning efficiency, enhanced adaptability and robustness, Meta-World benchmark, Deep reinforcement learning, deep neural networks (DNNs)
  • Kernel Debiased Plug-in Estimation: Simultaneous, Automated Debiasing without Influence Functions for Many Target Parameters (Poster)

    • Authors: Brian Cho, Yaroslav Mukhin, Kyra Gan, Ivana Malenica
    • Affiliations: Department of ORIE, Cornell Tech, NY, USA, Department of Statistics, Harvard University, Cambridge, MA, USA, Massachusetts Institute of Technology, Cambridge, MA, USA
    • TL;DR: This paper introduces kernel debiased plug-in estimation (KDPE), a method that simultaneously debiases multiple target parameters in nonparametric models without requiring influence functions. The proposed method enhances efficiency and computational tractability in statistical estimation.
    • Keywords: nonparametric models, plug-in bias, targeted maximum likelihood estimation, kernel debiased plug-in estimation (KDPE), regularized likelihood maximization, causal inference, statistical estimation, plug-in bias, bias-variance trade-off, computational challenges, efficient RAL estimators, simultaneous debiasing of target parameters, influence function (IF), efficient influence function (EIF), reproducing kernel Hilbert spaces
  • How Flawed Is ECE? An Analysis via Logit Smoothing (Poster)

    • Authors: Muthu Chidambaram, Holden Lee, Colin McSwiggen, Semon Rezchikov
    • Affiliations: Johns Hopkins University, New York University, Department of Computer Science, Duke University, Princeton University
    • TL;DR: This study analyzes the Expected Calibration Error (ECE) and its discontinuities, proposing a new metric called Logit-Smoothed ECE (LS-ECE) to address these issues. Initial experiments suggest that LS-ECE closely tracks binned ECE, indicating potential practical solutions to ECE's theoretical shortcomings.
    • Keywords: calibration, expected calibration error (ECE), Logit-Smoothed ECE (LS-ECE), image classification, medical diagnosis, self-driving, model calibration issues, overconfidence in predictions, analysis of ECE discontinuities, Polish spaces, predictive uncertainty
  • KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation (Poster)

    • Authors: Minsik Cho, Mohammad Rastegari, Devang Naik
    • Affiliations: Meta, USA (work done while at Apple); Apple, USA
    • TL;DR: This paper introduces KV-Runahead, a novel parallelization technique designed to enhance the prompt phase of Large Language Model inference by efficiently populating the key-value cache, thereby minimizing the time-to-first-token (TTFT). Experimental results show that KV-Runahead achieves significant speedups compared to existing parallelization methods.
    • Keywords: Large Language Models, Causal Inference, Parallelization, Key-Value Cache (KV-cache), Natural Language Processing, Time-to-first-token (TTFT), Time Per Output Token (TPOT), KV-Runahead, Context-level load-balancing, Llama 7B, Falcon 7B, Generative Pre-trained Transformer (GPT), Causal Attention
  • Listwise Reward Estimation for Offline Preference-based Reinforcement Learning (Poster)

    • Authors: Heewoong Choi, Sangwon Jung, Hongjoon Ahn, Taesup Moon
    • Affiliations: Department of Electrical and Computer Engineering, Seoul National University; ASRI/INMC/IPAI/AIIS, Seoul National University, Department of Electrical and Computer Engineering, Seoul National University
    • TL;DR: This paper introduces Listwise Reward Estimation (LiRE), a novel method for offline Preference-based Reinforcement Learning that utilizes second-order preference information to improve reward estimation. The proposed approach outperforms existing methods even with limited feedback, demonstrating robustness against feedback noise.
    • Keywords: Preference-based Reinforcement Learning (PbRL), Reward Estimation, Listwise Reward Estimation (LiRE), Bradley-Terry model, Robotics, Game AI, Autonomous Driving, Designing precise reward functions, Aligning with human intent, Novel reward estimation approach, Improved performance with modest feedback budgets, New offline PbRL dataset, Second-order preference, Ranked List of Trajectories (RLT)
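    • Sketch: fitting a linear reward with the Bradley–Terry likelihood over every ordered pair in a ranked list of trajectories; the synthetic features and plain gradient ascent are placeholders, not the paper's setup.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      true_theta = rng.normal(size=4)
      traj = rng.normal(size=(6, 4))             # six synthetic trajectory features
      rlt = traj[np.argsort(traj @ true_theta)]  # Ranked List of Trajectories (low to high)

      theta = np.zeros(4)
      for _ in range(500):
          grad = np.zeros_like(theta)
          for i in range(len(rlt)):              # each pair (i, j): j ranked above i
              for j in range(i + 1, len(rlt)):
                  p = 1.0 / (1.0 + np.exp(rlt[i] @ theta - rlt[j] @ theta))  # P(j > i)
                  grad += (1.0 - p) * (rlt[j] - rlt[i])  # log-likelihood ascent
          theta += 0.1 * grad / len(rlt) ** 2

      print("list order recovered:", bool(np.all(np.diff(rlt @ theta) > 0)))
      ```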
  • Neurodegenerative Brain Network Classification via Adaptive Diffusion with Temporal Regularization (Poster)

    • Authors: Hyuna Cho, Jaeyoon Sim, Guorong Wu, Won Hwa Kim
    • Affiliations: Pohang University of Science and Technology (POSTECH), South Korea, University of North Carolina at Chapel Hill, USA
    • TL;DR: This study introduces the Adaptive Graph diffusion network with Temporal regularization (AGT) to classify neurodegenerative brain networks, addressing the challenges of disease progression and the complex structures of brain connectomes. The proposed method demonstrates superior interpretability and performance on benchmark datasets for Alzheimer's and Parkinson's diseases.
    • Keywords: neurodegenerative diseases, brain connectomes, disease progression, Adaptive Graph diffusion network, node-wise convolution, temporal regularization, neuroimaging, brain network classification, progressive dynamics of diseases, cross-sectional studies, high-dimensional and sparse graphs, homophily and heterophily, interpretable results at node-level and group-level, validation on neurodegenerative disease benchmarks, Alzheimer’s Disease Neuroimaging Initiative (ADNI), Parkinson’s Progression Markers Initiative (PPMI), graph neural networks (GNNs), oversmoothing
  • RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences (Spotlight Poster)

    • Authors: Jie Cheng, Gang Xiong, Xingyuan Dai, Qinghai Miao, Yisheng Lv, Fei-Yue Wang
    • Affiliations: School of Artificial Intelligence, the University of Chinese Academy of Sciences, State Key Laboratory of Multimodal Artificial Intelligence Systems, CASIA; School of Artificial Intelligence, the University of Chinese Academy of Sciences
    • TL;DR: This paper presents RIME, a robust preference-based reinforcement learning algorithm designed to effectively learn from noisy preferences, addressing the challenges of feedback inefficiency and robustness in reward learning. The proposed method significantly enhances the performance of state-of-the-art PbRL techniques in robotic manipulation and locomotion tasks.
    • Keywords: Preference-based Reinforcement Learning (PbRL), Robustness in RL, Denoising discriminator, Sample selection-based discriminator, Robotic manipulation, Locomotion tasks, Noisy preferences, Lack of robustness, Feedback inefficiency, RIME algorithm, Warm start for reward model, Kullback–Leibler (KL) divergence, Human-in-the-loop paradigm
  • Scalable Wasserstein Gradient Flow for Generative Modeling through Unbalanced Optimal Transport (Poster)

    • Authors: Jaemoo Choi, Jaewoong Choi, Myungjoo Kang
    • Affiliations: Korea Institute for Advanced Study, Seoul National University, Seoul National University; Korea Institute for Advanced Study
    • TL;DR: This paper introduces a scalable generative model called Semi-dual JKO (S-JKO) that leverages the semi-dual form of the JKO scheme to reduce training complexity from quadratic to linear. The model significantly outperforms existing Wasserstein Gradient Flow-based generative models, achieving competitive FID scores on benchmark datasets.
    • Keywords: Generative modeling, Wasserstein Gradient Flow, JKO scheme, Unbalanced Optimal Transport, Image generation, High-resolution image datasets, Scalability challenges, Training complexity, Semi-dual JKO (S-JKO), Reduced training complexity, CIFAR-10, CelebA-HQ-256
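    • Background: one JKO step in standard notation, where F is the target energy and h the step size; the paper's contribution is a semi-dual, scalable treatment of this step:

      ```latex
      \rho_{k+1} \;=\; \operatorname*{arg\,min}_{\rho}\; F(\rho) \;+\; \frac{1}{2h}\, W_2^2(\rho, \rho_k),
      ```

      so the iterates discretize the Wasserstein gradient flow of F as h → 0.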
  • PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning (Poster)

    • Authors: Hyeong Kyu Choi, Sharon Li
    • Affiliations: Department of Computer Sciences, University of Wisconsin–Madison, United States
    • TL;DR: This study introduces Persona In-Context Learning (PICLe), a framework for eliciting specific personality traits from Large Language Models (LLMs) using a novel example selection criterion based on likelihood ratios. The effectiveness of PICLe is demonstrated through extensive comparisons with baseline methods across multiple contemporary LLMs.
    • Keywords: Persona elicitation, Large Language Models, Persona In-Context Learning (PICLe), Bayesian inference, In-context learning (ICL), Eliciting diverse personas, behavioral preferences of LLMs, Likelihood-ratio-based selection mechanism, effective persona elicitation, Llama-2, Vicuna, GPT-J
  • BadPart: Unified Black-box Adversarial Patch Attacks against Pixel-wise Regression Tasks (Poster)

    • Authors: Zhiyuan Cheng, Zhaoyi Liu, Tengda Guo, Shiwei Feng, Dongfang Liu, Mingjie Tang, Xiangyu Zhang
    • Affiliations: Department of Computer Engineering, Rochester Institute of Technology, Rochester, USA, Department of Computer Science, Purdue University, West Lafayette, USA, College of Computer Science, Sichuan University, Chengdu, China
    • TL;DR: This study introduces BADPART, a unified black-box adversarial patch attack framework targeting pixel-wise regression tasks, demonstrating significant vulnerabilities in models like monocular depth estimation and optical flow estimation. The framework outperforms existing methods and poses a serious threat to the security of applications in autonomous driving and augmented reality.
    • Keywords: Adversarial Patch Attacks, Pixel-wise Regression Tasks, Query-based Black-box Attacks, Adversarial Patch Optimization, Probabilistic Square Sampling, Score-based Gradient Estimation, Monocular Depth Estimation (MDE), Optical Flow Estimation (OFE), Autonomous Driving, Augmented Reality, Video Composition, Adversarial Robustness, Black-box Vulnerabilities, Scalability Issues, BADPART Prototype, Attack Performance, Efficiency Improvements
  • MS-TIP: Imputation Aware Pedestrian Trajectory Prediction (Poster)

    • Authors: Pranav Singh Chib, Achintya Nath, Paritosh Kabra, Ishu Gupta, Pravendra Singh
    • Affiliations: Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, India
    • TL;DR: This study introduces MS-TIP, a novel approach for pedestrian trajectory prediction that effectively handles missing observations while predicting future trajectories. The method leverages advanced techniques like transformers and multi-scale hypergraphs to enhance prediction accuracy and model complex interactions among pedestrians.
    • Keywords: pedestrian trajectory prediction, imputation of missing observations, transformers, diagonal masked self-attention, multi-scale hypergraphs, self-driving cars, human motion pattern analysis, missing values in observed sequences, inaccuracies in predicting final goals, recursive prediction errors, MultiScale hypergraph for Trajectory Imputation and Prediction (MS-TIP), accurate future trajectory inference
  • Creative Text-to-Audio Generation via Synthesizer Programming (Poster)

    • Authors: Manuel Cherep, Nikhil Singh, Jessica Shand
    • Affiliations: Media Lab, Massachusetts Institute of Technology, Cambridge MA, USA
    • TL;DR: This study presents CTAG, a novel text-to-audio generation method that utilizes a virtual modular synthesizer to create high-quality, abstract audio representations from text prompts. The approach allows for easy parameter manipulation, offering a complementary tool to existing neural audio synthesis methods while emphasizing creative sound design over acoustic realism.
    • Keywords: text-to-audio generation, sound design, neural audio synthesis, virtual modular synthesizer, procedural sound design, music, film, video games, advertising, product design, difficulty in tweaking neural audio synthesis results, large latent spaces, CTAG method, high-quality audio renderings, abstract sound representation
  • Online bipartite matching with imperfect advice (Poster)

    • Authors: Davin Choo, Themis Gouleakis, Chun Kai Ling, Arnab Bhattacharyya
    • Affiliations: Industrial Engineering and Operations Research, Columbia University, School of Computing, National University of Singapore
    • TL;DR: This study investigates online bipartite matching with imperfect advice, demonstrating that while traditional algorithms achieve a competitive ratio of 1 - 1/e, no learning-augmented method can be both 1-consistent and better than 1/2-robust under adversarial conditions. The authors propose a new algorithm that leverages external advice to achieve competitive ratios that interpolate between advice-free methods and the optimal ratio of 1, depending on advice quality.
    • Keywords: online bipartite matching, learning-augmented algorithms, RANKING algorithm, distribution testing, internet advertising, two-sided markets, competitive ratio, adversarial arrival model, random arrival model, algorithms utilizing external advice, competitive ratio improvement
  • Leveraging (Biased) Information: Multi-armed Bandits with Offline Data (Spotlight Poster)

    • Authors: Wang Chi Cheung, Lixing Lyu
    • Affiliations: Institute of Operations Research and Analytics, National University of Singapore, Singapore, Department of Industrial Systems Engineering and Management, National University of Singapore, Singapore
    • TL;DR: This study explores the use of offline data to enhance online learning in stochastic multi-armed bandits, proposing a new policy called MIN-UCB that outperforms traditional UCB when the offline and online distributions are closely aligned. The findings indicate that leveraging offline data can significantly reduce exploration and improve cumulative rewards in decision-making processes.
    • Keywords: multi-armed bandits, online learning, offline data, UCB policy, MIN-UCB, distributional shift, exploration vs. exploitation, regret bounds, adaptive policy
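    • Sketch: a stylized min-of-two-indices bandit in the spirit of MIN-UCB; the pooled bonus and the bias tolerance V are assumptions of this toy, not the paper's exact construction.

      ```python
      import numpy as np

      rng = np.random.default_rng(1)
      true_means = np.array([0.50, 0.60, 0.65])
      off_means = np.array([0.48, 0.61, 0.66])  # offline estimates (possibly biased)
      n_off = np.array([200, 200, 200])         # offline sample counts
      V = 0.05                                  # assumed bound on offline bias

      K, T = 3, 5000
      n, s = np.zeros(K), np.zeros(K)
      for t in range(1, T + 1):
          idx = np.full(K, np.inf)              # unpulled arms get pulled first
          for a in range(K):
              if n[a] == 0:
                  continue
              ucb_online = s[a] / n[a] + np.sqrt(2 * np.log(t) / n[a])
              mu_pool = (s[a] + off_means[a] * n_off[a]) / (n[a] + n_off[a])
              ucb_pooled = mu_pool + np.sqrt(2 * np.log(t) / (n[a] + n_off[a])) + V
              idx[a] = min(ucb_online, ucb_pooled)  # act on the tighter bound
          a = int(np.argmax(idx))
          n[a] += 1
          s[a] += float(rng.random() < true_means[a])

      print("pulls per arm:", n.astype(int))
      ```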
  • Enhancing Trajectory Prediction through Self-Supervised Waypoint Distortion Prediction (Poster)

    • Authors: Pranav Singh Chib, Pravendra Singh
    • Affiliations: Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, India
    • TL;DR: This study introduces a self-supervised approach called SSWDP to enhance trajectory prediction by effectively modeling the distortion in observed trajectories. Experimental results demonstrate significant improvements in prediction accuracy across multiple datasets, highlighting the method's effectiveness in complex environments.
    • Keywords: Trajectory prediction, Self-supervised learning, Self-Supervised Waypoint Distortion Prediction (SSWDP), Generative Adversarial Networks (GANs), Conditional Variational Autoencoders (CVAE), Transformers, Autonomous driving, Robotics, Surveillance systems, Drones, Uncertainty in future trajectories, Complex spatio-temporal representations, Interaction between agents, Improvement in representation learning, Significant performance improvements in ADE/FDE metrics, NBA dataset, TrajNet++, ETH-UCY dataset, ADE (Average Displacement Error), FDE (Final Displacement Error)
  • Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning (Poster)

    • Authors: Chia-Cheng Chiang, Li-Cheng Lan, Wei-Fang Sun, Chien Feng, Cho-Jui Hsieh, Chun-Yi Lee
    • Affiliations: ELSA Lab, Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan, NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, CA, USA, Department of Computer Science, University of California, Los Angeles, CA, USA
    • TL;DR: This study introduces Transition Discriminator-based IL (TDIL) to enhance single-demonstration imitation learning by addressing the challenge of sparse reward signals through a denser surrogate reward function. The proposed method outperforms existing approaches and achieves expert-level performance in various environments.
    • Keywords: single-demonstration imitation learning, imitation learning, Transition Discriminator-based IL (TDIL), inverse reinforcement learning (IRL), autonomous robots, surgical robots, vehicle control, sparse reward signals, reward sparsity, high-dimensional environments, denser surrogate reward function, transition discriminator, MuJoCo benchmarks, Adroit Door robotic environment
  • A connection between Tempering and Entropic Mirror Descent (Poster)

    • Authors: Nicolas Chopin, Francesca R Crucinio, Anna Korba
    • Affiliations: ENSAE, CREST, Institut Polytechnique de Paris
    • TL;DR: This paper establishes a connection between tempering in Sequential Monte Carlo and entropic mirror descent, demonstrating that tempering can be viewed as a descent scheme of the KL divergence with respect to Fisher-Rao geometry. The findings include convergence rates for tempering iterates and the development of adaptive tempering rules that enhance existing benchmarks.
    • Keywords: Tempering, Entropic Mirror Descent, Sequential Monte Carlo (SMC), Kullback-Leibler (KL) divergence, Fisher-Rao geometry, Wasserstein-2 geometry, Computational statistics, Machine learning, Sampling from target probability distributions, Optimization of probability distributions, Convergence rates for tempering iterates, Adaptive tempering rules
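    • Background: the correspondence in one line (our notation): an entropic mirror descent step of size γ on μ ↦ KL(μ‖π), started from a prior π₀, geometrically interpolates toward π, which is exactly a tempering move:

      ```latex
      \mu_{k+1} \;\propto\; \mu_k^{\,1-\gamma}\,\pi^{\gamma},
      \qquad
      \mu_0 = \pi_0 \;\Longrightarrow\; \mu_k \;\propto\; \pi_0^{\,1-\lambda_k}\,\pi^{\lambda_k},
      \quad \lambda_k = 1 - (1-\gamma)^{k}.
      ```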
  • Prompt-tuning Latent Diffusion Models for Inverse Problems (Poster)

    • Authors: Hyungjin Chung, Jong Chul YE, Peyman Milanfar, Mauricio Delbracio
    • Affiliations: KAIST, Daejeon, Korea, Google Research, Mountain View, US
    • TL;DR: This study introduces a novel method called P2L for solving imaging inverse problems by optimizing text prompts in conjunction with latent diffusion models. The approach significantly reduces image artifacts and outperforms existing methods in tasks like super-resolution, deblurring, and inpainting.
    • Keywords: imaging inverse problems, latent diffusion models, prompt tuning, reverse diffusion, alternating minimization, super-resolution, deblurring, inpainting, suboptimal performance, image artifacts, P2L method, optimization framework
  • How Private are DP-SGD Implementations? (Oral)

    • Authors: Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang
    • Affiliations: Google Research
    • TL;DR: This paper investigates the privacy guarantees of Differentially Private Stochastic Gradient Descent (DP-SGD) under different batch sampling methods, specifically shuffling and Poisson subsampling. The findings reveal a significant gap in privacy analysis between these methods, urging caution in reporting privacy parameters for DP-SGD implementations.
    • Keywords: Differential Privacy, Stochastic Gradient Descent, Differentially Private Stochastic Gradient Descent (DP-SGD), Adaptive Batch Linear Queries (ABLQ), Machine Learning, Neural Networks, Privacy guarantees, privacy analysis, batch sampling, Comparison of privacy guarantees between shuffling and Poisson subsampling, TensorFlow Privacy, PyTorch Opacus, JAX Privacy
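    • Sketch: the two batch samplers whose privacy analyses the paper separates, in miniature (the accounting itself is not reproduced here).

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      n, batch_size = 10, 3
      q = batch_size / n  # expected sampling rate for Poisson subsampling

      def poisson_batches(n_steps):
          # Each example joins each batch independently with probability q, so
          # batch sizes are random; this is what most DP accountants assume.
          return [np.flatnonzero(rng.random(n) < q) for _ in range(n_steps)]

      def shuffle_batches():
          # Shuffle once per epoch, then take fixed-size contiguous batches;
          # this is what many DP-SGD implementations actually run.
          perm = rng.permutation(n)
          return [perm[i:i + batch_size] for i in range(0, n, batch_size)]

      print("Poisson:", [b.tolist() for b in poisson_batches(3)])
      print("Shuffle:", [b.tolist() for b in shuffle_batches()])
      ```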
  • A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts (Poster)

    • Authors: Mohammed Nowaz Rabbani Chowdhury, Meng Wang, Kaoutar El Maghraoui, Naigang Wang, Pin-Yu Chen, Christopher Carothers
    • Affiliations: IBM Research, Yorktown Heights, NY, USA, Department of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Institute, NY, USA, Department of Computer Science, Rensselaer Polytechnic Institute, NY, USA
    • TL;DR: This paper presents a novel method for pruning experts in fine-tuned sparse mixture-of-experts (MoE) models, demonstrating that pruning the experts whose router weights changed least in l2 norm during fine-tuning preserves test accuracy while significantly reducing model size and computational requirements. The method is validated on large vision MoE models using benchmark datasets like CIFAR-10, CIFAR-100, and ImageNet.
    • Keywords: Sparse Mixture-of-Experts (MoE), Model Pruning, Trainable Routers, L2 Norm Analysis, Large Vision Models, Resource-Constrained Environments, High Memory Requirements, Inference Computation Costs, Efficient Expert Pruning Technique, Preservation of Test Accuracy, CIFAR-10, CIFAR-100, ImageNet, Transformer Architecture, Feed-Forward Networks (FFN)
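    • Sketch: a stylized version of the criterion in the TL;DR: score each expert by the l2 norm of its router-row change during fine-tuning and prune the least-changed ones; the shapes, the injected shift, and the keep count are our own choices.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      d_model, n_experts = 16, 8

      router_pre = rng.normal(size=(n_experts, d_model))  # pretrained router rows
      router_ft = router_pre + 0.05 * rng.normal(size=(n_experts, d_model))
      router_ft[2] += 1.0  # pretend fine-tuning moved expert 2 substantially

      # Rank experts by how far fine-tuning moved their router rows.
      change = np.linalg.norm(router_ft - router_pre, axis=1)
      kept = np.sort(np.argsort(change)[-4:])  # keep the four most-changed experts

      print("router-change norms:", np.round(change, 2))
      print("experts kept:", kept.tolist())
      ```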
  • $\mathtt{VITS}$: Variational Inference Thompson Sampling for contextual bandits (Poster)

    • Authors: Pierre Clavier, Tom Huix, Alain Oliviero Durmus
    • Affiliations: CMAP, CNRS, École Polytechnique, Institut Polytechnique de Paris, 91120 Palaiseau, France, CMAP, CNRS, École Polytechnique, Institut Polytechnique de Paris, 91120 Palaiseau, France; Inria Paris, 75015 Paris, France; Centre de Recherche des Cordeliers, INSERM, Université de Paris, Sorbonne Université, 75006 Paris, France
    • TL;DR: This paper introduces Variational Inference Thompson Sampling (VITS), a new algorithm for contextual bandits that efficiently approximates posterior distributions and achieves a sub-linear regret bound. The effectiveness of VITS is demonstrated through experiments on both synthetic and real-world datasets.
    • Keywords: Contextual Bandits, Thompson Sampling, Variational Inference, Gaussian Variational Inference, Recommender Systems, Mobile Health, Finance, Intractable posterior distribution, Exploration vs. Exploitation trade-off, Variational Inference TS (VITS), Sub-linear regret bound, Synthetic datasets, Real-world datasets, Multi-Armed Bandit (MAB), Posterior sampling
  • Studying K-FAC Heuristics by Viewing Adam through a Second-Order Lens (Poster)

    • Authors: Ross Clarke, Jose Miguel Hernandez-Lobato
    • Affiliations: University of Cambridge
    • TL;DR: This study investigates the effectiveness of K-FAC's heuristics when applied to the Adam optimizer, introducing AdamQLR, a damped learning rate strategy. The findings suggest that an untuned AdamQLR can achieve performance comparable to tuned benchmarks, highlighting the variable effectiveness of K-FAC's adaptive heuristics.
    • Keywords: optimisation for deep learning, first-order methods, second-order methods, Adam, K-FAC, quasi-Newton methods, SGD, regression tasks, classification tasks, computational efficiency, stability issues in optimisation, AdamQLR, damped automatic learning rate strategy, adaptive heuristics
  • Improving Token-Based World Models with Parallel Observation Prediction (Poster)

    • Authors: Lior Cohen, Kaixin Wang, Bingyi Kang, Shie Mannor
    • Affiliations: ByteDance, Technion – Israel Institute of Technology
    • TL;DR: This study introduces a novel mechanism called Parallel Observation Prediction (POP) to enhance token-based world models, significantly improving imagination speed and sample efficiency in reinforcement learning. The proposed agent, REM, achieves superhuman performance in multiple Atari games while training in under 12 hours.
    • Keywords: token-based world models, reinforcement learning, Parallel Observation Prediction (POP), Retentive Network (RetNet), visual environments, Atari games, sample efficiency, data demands, bottleneck in imagination, REM (Retentive Environment Model), faster imagination, superhuman performance, Atari 100K benchmark, Transformers, discrete tokens, tokenizer
  • Statistical Inference Under Constrained Selection Bias (Poster)

    • Authors: Santiago Cortes-Gomez, Mateo Dulce Rubio, Carlos Miguel Patiño, Bryan Wilder
    • Affiliations: Department of Machine Learning, Carnegie Mellon University, Department of Statistics, Carnegie Mellon University, Factored AI
    • TL;DR: The study proposes a framework for statistical inference that accounts for selection bias under user-specified constraints, aiming to provide high-probability bounds on estimands for target distributions. The method leverages domain knowledge to partially identify estimands and demonstrates effectiveness through various simulated and real-world applications.
    • Keywords: Statistical inference, selection bias, distribution shifts, Public health, policy analysis, epidemiological studies, confounding, High-probability bounds, estimands
  • Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation (Poster)

    • Authors: Juntao Dai, Yaodong Yang, Qian Zheng, Gang Pan
    • Affiliations: College of Computer Science and Technology, Zhejiang University, Hangzhou, China; The State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China, Center for AI Safety and Governance, Peking University, Beijing, China
    • TL;DR: This study introduces a novel method for estimating constraints in Safe Reinforcement Learning, addressing the limitations of existing approaches that rely on infinite-horizon assumptions. The proposed Gradient-based Estimation method leads to the development of the Constrained Gradient-based Policy Optimization algorithm, which effectively ensures safe and efficient policy updates in finite-horizon scenarios.
    • Keywords: Safe Reinforcement Learning, Policy Optimization, Gradient-based Estimation, Advantage-based Estimation, Autonomous Driving, Service Robots, Safety-violation updates, Finite-horizon constraints, Constrained Gradient-based Policy Optimization (CGPO), Estimation of constraint changes, Constrained Markov Decision Process (CMDP), Markov Decision Process (MDP)
  • A Near-Linear Time Approximation Algorithm for Beyond-Worst-Case Graph Clustering (Poster)

    • Authors: Vincent Cohen-Addad, Tommaso d'Orsi, Aida Mousavifar
    • Affiliations: Google Research, Google Research; BIDSA, Bocconi
    • TL;DR: This paper presents a near-linear time approximation algorithm for the Balanced Cut problem in semi-random graphs, achieving an O(α) approximation. The findings suggest that the algorithm extends to related problems such as Sparsest Cut and hierarchical clustering.
    • Keywords: Graph clustering, Graph partitioning, Beyond-worst-case complexity, Near-linear time algorithm, Polynomial time algorithm, Semidefinite programming, Data mining, Unsupervised machine learning, Balanced Cut problem, Sparsest Cut problem, Complexity of graph partitioning, O(α) approximation, O(1)-approximation to Dasgupta’s objective function, Semi-random graph model, Bipartite graph, Stochastic block model
  • Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Features Model (Poster)

    • Authors: Hien Dang, Tho Tran Huu, Tan Nguyen, Nhat Ho
    • Affiliations: FPT Software AI Center, Vietnam, Department of Statistics and Data Sciences, University of Texas at Austin, USA; FPT Software AI Center, Vietnam, Department of Mathematics, National University of Singapore, Singapore, Department of Statistics and Data Sciences, University of Texas at Austin, USA
    • TL;DR: This study generalizes the Neural Collapse phenomenon to class-imbalanced datasets, demonstrating that while within-class features still collapse, class-means converge to a structure of orthogonal vectors influenced by the number of training samples. The findings highlight the alignment of classifier weights with scaled and centered class-means, addressing challenges in training deep neural networks under imbalanced conditions.
    • Keywords: Neural Collapse, Class-Imbalanced Learning, Deep Learning, Cross-entropy loss, Unconstrained ReLU features model, Classification tasks, Class imbalance, Training loss minimization, Drop in accuracy for minority classes, Generalization of Neural Collapse to imbalanced datasets, Class-mean convergence structure, Equiangular Tight Frame (ETF), Class-means
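    • Background: in the balanced setting that this paper generalizes, Neural Collapse aligns the K centered class-means with a simplex Equiangular Tight Frame (standard form, our notation, with U a partial orthogonal matrix):

      ```latex
      \mathbf{M} \;=\; \sqrt{\tfrac{K}{K-1}}\;\mathbf{U}\left(\mathbf{I}_K - \tfrac{1}{K}\,\mathbf{1}_K\mathbf{1}_K^{\top}\right),
      \qquad \mathbf{U}^{\top}\mathbf{U} = \mathbf{I}_K.
      ```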
  • Position: Beyond Personhood: Agency, Accountability, and the Limits of Anthropomorphic Ethical Analysis (Oral)

    • Authors: Jessica Dai
    • Affiliations: Department of Computer Science, University of California Berkeley
    • TL;DR: This paper explores the concept of agency in AI, contrasting mechanistic and volitional views, and argues that AI should not be considered an ethical agent but rather as a product of political processes. The findings highlight the limitations of viewing AI through a human-like ethical lens and emphasize the need for a broader understanding of accountability in AI systems.
    • Keywords: Agency, Ethical AI, Political Processes, Mechanistic Agency, Volitional Agency, Ethical considerations in AI, Accountability, Ethical characteristics of AI, Human-like behavior
  • Generalization Bounds for Causal Regression: Insights, Guarantees and Sensitivity Analysis (Poster)

    • Authors: Daniel Csillag, Claudio Struchiner, Guilherme Goedert
    • Affiliations: School of Applied Mathematics, Fundação Getúlio Vargas, Rio de Janeiro, Brazil
    • TL;DR: This paper proposes a theory based on generalization bounds for causal machine learning, providing guarantees on model loss even in the presence of hidden confounding and violations of positivity. The authors introduce a novel change-of-measure inequality that allows for tight bounds on causal loss, demonstrating its effectiveness on both semi-synthetic and real data.
    • Keywords: Causal machine learning, Generalization bounds, Change-of-measure inequality, Pearson χ2 divergence, Economics, Medicine, Education research, Hidden confounding, Violations of positivity, Treatment propensities, Model loss bounds, Causal loss estimation, Semi-synthetic data, Real data, Outcome regression, Individual treatment effect estimation
  • Weighted distance nearest neighbor condensing (Poster)

    • Authors: Lee-Ad Gottlieb, Timor Sharabi, Roi Weiss
    • Affiliations: Department of Computer Science, Ariel University, Ariel, Israel
    • TL;DR: This paper introduces the weighted distance nearest neighbor condensing problem, which assigns weights to points in a condensed set to improve classification accuracy. The study demonstrates that this approach can achieve significantly better condensing than traditional methods while maintaining similar generalization bounds.
    • Keywords: Nearest neighbor condensing, Weighted distance nearest neighbor, Weighted distance function, Condensing heuristic, Classification, Sample compression, Nearest neighbor rule disadvantages, Improved condensing, Generalization bounds
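    • Sketch: the weighted-distance rule in miniature: each condensed prototype carries a weight that rescales its distances, so a weight below one enlarges its region of influence; the weights here are illustrative, not learned.

      ```python
      import numpy as np

      # Condensed set: three prototypes with labels and per-prototype weights.
      protos = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
      labels = np.array([0, 1, 1])
      weights = np.array([1.0, 0.7, 0.7])

      def predict(x):
          d = weights * np.linalg.norm(protos - x, axis=1)  # weighted distances
          return labels[np.argmin(d)]

      # Euclidean-closer to prototype 0, but prototype 1's small weight wins.
      print(predict(np.array([1.4, 0.0])))  # -> 1
      ```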
  • Dynamic Correlation Clustering in Sublinear Update Time (Spotlight Poster)

    • Authors: Vincent Cohen-Addad, Silvio Lattanzi, Andreas Maggiori, Nikos Parotsidis
    • Affiliations: Google Research, Columbia University
    • TL;DR: This paper presents an algorithm for dynamic correlation clustering in node streams, achieving an O(1)-approximation with O(polylog n) amortized update time. The study addresses the challenge of continuously partitioning nodes to minimize negative edges within clusters and positive edges between clusters.
    • Keywords: correlation clustering, dynamic node streams, O(1)-approximation algorithm, O(polylog n) amortized update time, clustering, machine learning, data analysis, minimizing negative edges within clusters, minimizing positive edges crossing clusters, new algorithm for dynamic correlation clustering
  • A2Q+: Improving Accumulator-Aware Weight Quantization (Poster)

    • Authors: Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig, Yaman Umuroglu
    • Affiliations: AMD SW Technology Team, San Diego, California, USA, AMD Research and Advanced Development, Dublin, Ireland
    • TL;DR: This study introduces A2Q+, an improved method for accumulator-aware weight quantization that alleviates constraints and enhances model accuracy while avoiding numerical overflow. The results demonstrate that ResNet50 can maintain 95% of its baseline accuracy with 12-bit accumulation, achieving a 17% improvement in test top-1 accuracy over previous methods.
    • Keywords: Quantization, Neural Networks, Accumulator-aware quantization (A2Q), Weight normalization, Numerical overflow, Quantization error, A2Q+, Improved ℓ1-norm bound, Weight initialization strategy, ImageNet, ResNet50
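    • Sketch: the basic accumulator-overflow constraint behind A2Q-style methods: with unsigned N-bit activations, a dot product cannot overflow a signed P-bit accumulator if the weights' l1 norm stays below a simple limit (A2Q+'s tighter bound and its weight initialization are not reproduced here).

      ```python
      import numpy as np

      def l1_limit(acc_bits: int, act_bits: int) -> float:
          # Worst case |sum_i w_i * x_i| <= ||w||_1 * max|x|; unsigned inputs reach
          # 2**act_bits - 1 and a signed accumulator holds up to 2**(acc_bits-1) - 1.
          return (2 ** (acc_bits - 1) - 1) / (2 ** act_bits - 1)

      def clip_to_l1(w: np.ndarray, limit: float) -> np.ndarray:
          # Rescale the weight vector if its l1 norm exceeds the accumulator limit.
          norm = np.abs(w).sum()
          return w if norm <= limit else w * (limit / norm)

      w = np.array([0.9, -1.2, 0.4, 2.0])      # ||w||_1 = 4.5
      lim = l1_limit(acc_bits=10, act_bits=8)  # ~2.004, so w must shrink
      print(round(lim, 3), clip_to_l1(w, lim))
      ```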
  • Multi-View Stochastic Block Models (Poster)

    • Authors: Vincent Cohen-Addad, Tommaso d'Orsi, Silvio Lattanzi, Rajai Nasser
    • Affiliations: Google Research, Google Research; BIDSA, Bocconi
    • TL;DR: This paper introduces a new family of models called multi-view stochastic block models for graph clustering, focusing on leveraging multiple data sources to improve clustering outcomes. The authors present efficient algorithms that outperform previous methods and provide theoretical insights into the model's limitations.
    • Keywords: graph clustering, multi-view clustering, multi-view stochastic block models, efficient algorithms, data mining, social sciences, statistics, social network analysis, clustering structure recovery, partial information from multiple graphs, new efficient algorithm, information-theoretic lower bound, stochastic block model
  • Harmonizing Generalization and Personalization in Federated Prompt Learning (Poster)

    • Authors: Tianyu Cui, Hongxia Li, Jingya Wang, Ye Shi
    • Affiliations: ShanghaiTech University
    • TL;DR: This study introduces Federated Prompt Learning with CLIP Generalization and low-rank Personalization (FedPGP) to balance personalization and generalization in federated learning. The proposed method demonstrates superior performance in addressing data heterogeneity and improving generalization capabilities across various datasets.
    • Keywords: Federated Learning, Vision-Language Models, Personalization, Generalization, Prompt Tuning, CLIP, Low-Rank Adaptation, Contrastive Loss, Data Heterogeneity, Model Overfitting, Generalization to Unseen Domains, Federated Prompt Learning with CLIP Generalization and low-rank Personalization (FedPGP), Vision-Language Models (VLM), Federated Prompt Learning (FPL)
  • A decoder-only foundation model for time-series forecasting (Poster)

    • Authors: Abhimanyu Das, Weihao Kong, Rajat Sen, Yichen Zhou
    • Affiliations: Google Research
    • TL;DR: This paper presents TimesFM, a decoder-only foundation model for time-series forecasting that achieves near state-of-the-art zero-shot performance across various datasets. The model leverages a large-scale time-series corpus and demonstrates effectiveness in handling diverse forecasting scenarios without additional training.
    • Keywords: time-series forecasting, foundation model, decoder style attention model, input patching, retail, finance, manufacturing, healthcare, natural sciences, zero-shot forecasting, varying history lengths, prediction lengths, time granularities, TimesFM model, state-of-the-art zero-shot accuracy, real-world datasets, synthetic datasets
  • Conformal Prediction Sets Improve Human Decision Making (Poster)

    • Authors: Jesse Cresswell, yi sui, Bhargava Kumar, Noël Vouitsis
    • Affiliations: Layer 6 AI, TD Securities
    • TL;DR: This study investigates the effectiveness of conformal prediction sets in enhancing human decision-making accuracy. The findings indicate that providing calibrated prediction sets significantly improves task performance compared to fixed-size prediction sets.
    • Keywords: Conformal prediction, Human decision making, Prediction sets, Human-in-the-loop decision making, Human-AI teams, Uncertainty quantification, Lack of alternative predictions, Improved accuracy with conformal prediction sets
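    • Sketch: split conformal prediction sets with the standard 1 − p(true class) score; the Dirichlet "classifier" is a toy stand-in, and alpha = 0.1 targets 90% marginal coverage.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)

      def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
          n = len(cal_labels)
          scores = 1.0 - cal_probs[np.arange(n), cal_labels]  # nonconformity scores
          level = np.ceil((n + 1) * (1 - alpha)) / n          # finite-sample correction
          q = np.quantile(scores, level, method="higher")
          # Include every class whose score would fall at or below the threshold.
          return [np.flatnonzero(1.0 - p <= q) for p in test_probs]

      cal_probs = rng.dirichlet(np.ones(3) * 5, size=200)     # toy 3-class probabilities
      cal_labels = np.array([rng.choice(3, p=p) for p in cal_probs])
      test_probs = rng.dirichlet(np.ones(3) * 5, size=3)
      print([s.tolist() for s in conformal_sets(cal_probs, cal_labels, test_probs)])
      ```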
  • Test-Time Degradation Adaptation for Open-Set Image Restoration (Spotlight Poster)

    • Authors: Yuanbiao Gou, Haiyu Zhao, Boyun Li, Xinyan Xiao, Xi Peng
    • Affiliations: Baidu Inc., Beijing, China, College of Computer Science, Sichuan University, Chengdu, China
    • TL;DR: This study introduces a test-time degradation adaptation framework for open-set image restoration, addressing the challenge of handling unknown degradations not seen during pretraining. The proposed method demonstrates comparable or superior performance to task-specific methods through experiments on multiple degradations.
    • Keywords: open-set image restoration, image restoration, test-time adaptation, degradation-agnostic diffusion model, unknown degradations, distribution shifts, test-time degradation adaptation framework
  • New Bounds on the Cohesion of Complete-link and Other Linkage Methods for Agglomerative Clustering (Poster)

    • Authors: Sanjoy Dasgupta, Eduardo Laber
    • Affiliations: University of California San Diego, USA, Departamento de Informática, PUC-RIO, Brazil
    • TL;DR: This study presents new bounds on the maximum diameter of clustering produced by complete-linkage methods in metric spaces, demonstrating that complete-linkage is more effective than single-linkage for creating compact clusters. The findings improve existing bounds and provide insights into the cohesion of various linkage methods, including average-linkage.
    • Keywords: hierarchical clustering, agglomerative clustering, complete-linkage, average-linkage, minimax, exploratory analysis, computational resource reduction, clustering quality, maximum diameter, compact clusters, new bounds on clustering diameter, improved analysis
  • High-Order Contrastive Learning with Fine-grained Comparative Levels for Sparse Ordinal Tensor Completion (Poster)

    • Authors: Yu Dai, Junchen Shen, Zijie Zhai, Danlin Liu, Jingyang Chen, Yu Sun, Ping Li, Jie Zhang, Kai Zhang
    • Affiliations: Indeed Inc., Sunnyvale, United States, School of Computer Science and Technology, East China Normal University, Shanghai, China, Southwest Petroleum University, Chengdu, China, Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai, China
    • TL;DR: The study introduces High-Order Contrastive Tensor Completion (HOCTC), a novel approach to extend contrastive learning for sparse ordinal tensor regression, effectively capturing high-order interactions and improving representation learning. Experiments demonstrate its effectiveness in traffic monitoring and recommendation systems.
    • Keywords: Contrastive learning, Sparse ordinal tensor completion, High-Order Contrastive Tensor Completion (HOCTC), Attention-based strategy, Query-expansion, Traffic monitoring, Recommender systems, High-dimensional tensor interactions, Data sparsity, Negative sample growth, Efficient sampling scheme, Self-supervised signals for high-order representation learning
  • Asymptotically Optimal and Computationally Efficient Average Treatment Effect Estimation in A/B testing (Poster)

    • Authors: VIKAS DEEP, Achal Bassamboo, Sandeep Juneja
    • Affiliations: Kellogg School of Management, Northwestern University, Evanston, IL 60201, Ashoka University, Sonipat, Haryana, India
    • TL;DR: This study focuses on estimating the average treatment effect (ATE) in A/B testing by developing adaptive policies that minimize expected sample size while ensuring a confidence interval with specified coverage guarantees. The findings reveal that both the proposed asymptotically optimal and an alternative computationally efficient policy perform similarly across practical values, providing effective solutions for A/B testing scenarios.
    • Keywords: A/B testing, average treatment effect (ATE), confidence interval (CI), adaptive policy, max-min optimization, clinical trials, online platforms, estimating ATE, minimizing expected sample size, coverage guarantee, asymptotically optimal policies, computationally efficient methods
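    • Background: the estimand, the difference-in-means estimator, and the normal-approximation interval whose width the adaptive policies control (standard notation):

      ```latex
      \mathrm{ATE} \;=\; \mathbb{E}[Y(1)] - \mathbb{E}[Y(0)],
      \qquad
      \widehat{\mathrm{ATE}} \;=\; \bar{Y}_1 - \bar{Y}_0,
      \qquad
      \mathrm{CI} \;=\; \widehat{\mathrm{ATE}} \,\pm\, z_{1-\delta/2}\sqrt{\tfrac{\hat{\sigma}_1^2}{n_1} + \tfrac{\hat{\sigma}_0^2}{n_0}}.
      ```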
  • Provably Better Explanations with Optimized Aggregation of Feature Attributions (Poster)

    • Authors: Thomas Decker, Ananta Bhattarai, Jindong Gu, Volker Tresp, Florian Buettner
    • Affiliations: LMU Munich; Munich Center for Machine Learning (MCML), University of Oxford, LMU Munich; Siemens AG, Siemens AG; Technical University of Munich, Siemens AG; Goethe University Frankfurt; German Cancer Research Center (DKFZ)
    • TL;DR: This study proposes a novel approach to improve the quality of feature attributions in machine learning by combining multiple explanation methods through optimal convex combinations. The results demonstrate significant enhancements in robustness and faithfulness compared to individual attribution methods.
    • Keywords: Explainability, Feature Attribution, Machine Learning Transparency, Convex Combinations, Feature Attribution Techniques, Reliability of Feature Attributions, Sensitivity to Input Perturbations, Improved Robustness, Faithfulness of Explanations
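    • Sketch: the aggregation idea in miniature: search convex weights over two attribution maps and keep the combination that scores best under a quality proxy; the deletion-based proxy below is our stand-in for the paper's robustness and faithfulness criteria.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      w = rng.normal(size=5)   # toy linear model f(x) = w @ x
      x = rng.normal(size=5)

      attr_grad = w.copy()     # method 1: plain gradient
      attr_ixg = w * x         # method 2: input-times-gradient

      # Quality proxy: match the true output change when each feature is removed
      # (for a linear model this change is exactly w_i * x_i).
      deletion = np.array([w @ x - w @ np.where(np.arange(5) == i, 0.0, x)
                           for i in range(5)])

      def proxy(attr):
          return -np.abs(attr - deletion).sum()  # higher is better

      alphas = np.linspace(0.0, 1.0, 101)
      best = max(alphas, key=lambda a: proxy(a * attr_grad + (1 - a) * attr_ixg))
      print("best convex weight on the gradient map:", best)  # 0.0 here
      ```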
  • ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback (Poster)

    • Authors: Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Bingxiang He, Wei Zhu, Yuan Ni, Guotong Xie, Ruobing Xie, Yankai Lin, Zhiyuan Liu, Maosong Sun
    • Affiliations: NLP Group, DCST, IAI, BNRIST, Tsinghua University; ModelBest.Inc, Tencent, NLP Group, DCST, IAI, BNRIST, Tsinghua University, Jiangsu Collaborative Innovation Center for Language Ability, Renmin University of China, University of Illinois Urbana-Champaign, PingAn Technology
    • TL;DR: This study introduces ULTRAFEEDBACK, a large-scale AI feedback dataset aimed at enhancing the alignment of large language models with human preferences by automating feedback collection. The findings demonstrate that scaled AI feedback can effectively support the development of robust open-source chat language models.
    • Keywords: AI feedback, human feedback, language model alignment, reinforcement learning, best-of-n sampling, open-source chat language models, data scarcity, annotation biases, alignment with human preferences, ULTRAFEEDBACK dataset, scalable AI feedback, GPT-4, large language models (LLMs), user-assistant interactions
  • Learning Cognitive Maps from Transformer Representations for Efficient Planning in Partially Observed Environments (Poster)

    • Authors: Antoine Dedieu, Wolfgang Lehrach, Guangyao Zhou, Dileep George, Miguel Lazaro-Gredilla
    • Affiliations: Google DeepMind
    • TL;DR: This study introduces a transformer variant that learns cognitive maps from observations in partially observed environments, enabling efficient path planning. The proposed model retains high predictive performance while significantly improving the speed of solving shortest path problems compared to traditional methods.
    • Keywords: Partially Observed Environments (POEs), Cognitive Maps, Path Planning, Transformer with discrete bottlenecks (TDB), Next-token prediction, In-context learning (ICL), Navigation, Reinforcement Learning, Natural Language Processing, Path planning in POEs, Perceptual aliasing, Disambiguation of spatial positions, Compressed representation of observations and actions, Interpretable cognitive maps, Efficient path planning, Large Language Models (LLMs), Action-conditioned latent graph
  • Multi-View Clustering by Inter-cluster Connectivity Guided Reward (Poster)

    • Authors: Hao Dai, Yang Liu, Peng Su, Hecheng Cai, Shudong Huang, Jiancheng Lv
    • Affiliations: College of Computer Science, Sichuan University, Chengdu 610065, China; Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education, Chengdu 610065, China
    • TL;DR: This paper presents a novel graph-based multi-view clustering algorithm that infers the unknown number of clusters (k) using a graph consistency reward mechanism. The proposed method effectively produces clustering results without requiring a predefined cluster number, demonstrating superior performance on benchmark datasets compared to existing methods.
    • Keywords: Multi-view clustering, Graph-based clustering, Reinforcement learning, Graph consistency reward mechanism, Unknown number of clusters (k), Inter-cluster connectivity, Novel clustering algorithm, Effective clustering results without predefined k, Benchmark datasets
  • Boosting Offline Optimizers with Surrogate Sensitivity (Poster)

    • Authors: Cuong Dao, Phi Le Nguyen, Thao Nguyen Truong, Nghia Hoang
    • Affiliations: School of Electrical Engineering and Computer Science, Washington State University, Washington, USA, School of Information and Communications Technology, Hanoi University of Science and Technology, Hanoi, Vietnam, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
    • TL;DR: This study addresses the challenges of offline optimization in material engineering by developing a sensitivity measurement for surrogate models, which helps regulate their sensitivity and improve optimization performance. The findings suggest that conditioning offline optimizers with less sensitive surrogates can lead to more reliable predictions and better outcomes in material design.
    • Keywords: Offline optimization, material engineering, Surrogate modeling, sensitivity measurement, sensitivity-informed regularizer, Material design, computational optimization, Model sensitivity, unreliable predictions, out-of-distribution data, Optimizable sensitivity measurement, improved optimization performance, Black-box function, parameterization
  • Global Reinforcement Learning : Beyond Linear and Convex Rewards via Submodular Semi-gradient Methods (Poster)

    • Authors: Riccardo De Santi, Manish Prajapat, Andreas Krause
    • Affiliations: ETH Zurich; ETH AI Center
    • TL;DR: This paper introduces Global Reinforcement Learning (GRL) to address the limitations of classic RL in modeling complex interactions between states by defining rewards over trajectories. The authors propose a novel algorithmic scheme that efficiently converts GRL problems into classic RL problems, demonstrating its effectiveness through empirical results.
    • Keywords: Global Reinforcement Learning, interactions between states, submodular optimization, semi-gradient methods, experiment design, exploration, imitation learning, risk-averse RL, additive objectives, negative interactions, synergetic effects, curvature-dependent approximation guarantees, hardness of approximation results
  • Geometric Active Exploration in Markov Decision Processes: the Benefit of Abstraction (Poster)

    • Authors: Riccardo De Santi, Federico Arangath Joseph, Noah Liniger, Mirco Mutti, Andreas Krause
    • Affiliations: Technion, Haifa, Israel, Department of Computer Science, ETH Zurich, Zurich, Switzerland, Department of Computer Science, ETH Zurich, ETH AI Center, Zurich, Switzerland
    • TL;DR: This study proposes the Geometric Active Exploration (GAE) algorithm to enhance the efficiency of Active Exploration in Markov Decision Processes by leveraging geometric structures. The findings indicate that abstraction through MDP homomorphisms significantly improves sample efficiency in experimental design tasks within scientific discovery contexts.
    • Keywords: Active Exploration, Optimal Experimental Design, Reinforcement Learning, Convex Reinforcement Learning, MDP Homomorphisms, Scientific Discovery, Environmental Sensing, Scalability of Active Exploration, Uncertainty Minimization, Geometric Active Exploration (GAE) algorithm, Sample Efficiency
  • Prediction-powered Generalization of Causal Inferences (Poster)

    • Authors: Ilker Demirel, Ahmed Alaa, Anthony Philippakis, David Sontag
    • Affiliations: Department of Computational Precision Health, UC Berkeley and UCSF, MIT CSAIL, Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, MIT CSAIL; Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard
    • TL;DR: This study addresses the challenge of generalizing causal inferences from randomized controlled trials to target populations with different distributions of effect modifiers. The authors develop prediction-powered algorithms that enhance generalization by integrating trial data with observational study data, demonstrating improved performance even in the presence of unmeasured confounding.
    • Keywords: Causal inference, Generalization of trial results, Generalization algorithms, Prediction models, Randomized controlled trials, Observational studies, Limited external validity, Confounding bias, Estimation of causal effects, Prediction-powered estimators, Improved generalization methods, Randomized controlled trials (RCT), Average treatment effect (ATE), Effect modifiers
  • Predicting Lagrangian Multipliers for Mixed Integer Linear Programs (Poster)

    • Authors: Francesco Demelas, Joseph Roux, Mathieu Lacroix, Axel Parmentier
    • Affiliations: CERMICS, École des Ponts, France, Laboratoire d’Informatique de Paris-Nord, Université Sorbonne Paris Nord — CNRS, France
    • TL;DR: This study introduces a deep learning approach to predict Lagrangian Multipliers for Mixed Integer Linear Programs, sharply reducing the computational burden of traditional iterative methods. The proposed method significantly narrows the gap between the continuous relaxation and the best Lagrangian bound, providing a valuable warm start for further optimization techniques.
    • Keywords: Lagrangian Relaxation, Mixed Integer Linear Programs (MILPs), deep learning, graph neural network, probabilistic encoder, decoder, Multi-Commodity Network Design, Generalized Assignment, difficult constraints, optimization, computational burden, high-dimensional representations, warm-start for descent-based methods, Lagrangian Multipliers (LMs), Continuous Relaxation (CR)
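    • Background: dualizing the difficult constraints yields a bound for every multiplier vector λ (weak duality, standard notation); the learned model predicts a near-optimal λ so that descent methods start warm:

      ```latex
      \mathcal{L}(\lambda) \;=\; \min_{x \in X}\; c^{\top}x + \lambda^{\top}(b - Ax)
      \;\le\; \min_{x \in X,\; Ax \ge b}\; c^{\top}x,
      \qquad
      \lambda^{\star} \in \operatorname*{arg\,max}_{\lambda \ge 0}\; \mathcal{L}(\lambda).
      ```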
  • Exploring the Low-Pass Filtering Behavior in Image Super-Resolution (Poster)

    • Authors: Haoyu Deng, Zijing Xu, Yule Duan, Xiao Wu, Wen-Jie Shu, Liang-Jian Deng
    • Affiliations: University of Electronic Science and Technology of China
    • TL;DR: This paper investigates the behavior of deep neural networks in image super-resolution (ISR) by interpreting their functionality through signal processing theories, revealing that these networks can be decomposed into linear and non-linear systems. The authors introduce a new metric, Frequency Spectrum Distribution Similarity (FSDS), to quantify high-frequency information injected by the networks, addressing the interpretability issues associated with neural networks in ISR tasks.
    • Keywords: Image Super-Resolution (ISR), Deep Neural Networks, Hybrid Response Analysis (HyRA), Low-Pass Filtering, Image Processing, Black Box Nature of Neural Networks, Interpretation of Neural Network Behavior, Frequency Spectrum Distribution Similarity (FSDS), Sinc Phenomenon, Linear System, Non-Linear System, Signal Processing
  • Network Tight Community Detection (Poster)

    • Authors: Jiayi Deng, Xiaodong Yang, Jun Yu, Jun Liu, Zhaiming Shen, Danyang Huang, Huimin Cheng
    • Affiliations: School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China, Center for Applied Statistics and School of Statistics, Renmin University of China, Beijing, China, Department of Biostatistics, Boston University, Boston MA, USA, Department of Mathematics, University of Georgia, Athens GA, USA, Department of Statistics, Harvard University, Cambridge MA, USA
    • TL;DR: This study introduces a tight community detection (TCD) method that effectively identifies tight communities in networks while excluding noninformative scattered nodes. The proposed method demonstrates strong theoretical guarantees and scalability, addressing biases introduced by conventional community detection methods.
    • Keywords: community detection, network analysis, tight community detection (TCD), spectral clustering, biological networks, social networks, scattered nodes, community structure identification, identification accuracy, scalability for large networks, tight nodes
  • An Unsupervised Approach for Periodic Source Detection in Time Series (Poster)

    • Authors: Berken Utku Demirel, Christian Holz
    • Affiliations: Department of Computer Science, ETH Zurich, Switzerland
    • TL;DR: This study presents a novel unsupervised method for detecting periodic patterns in noisy time series data without requiring labeled data or complex augmentations. The proposed approach significantly outperforms existing methods, achieving 45–50% performance improvements across various tasks.
    • Keywords: Periodic pattern detection, Time series analysis, Health monitoring, Behavior analysis, Unsupervised learning, Self-supervised learning, Data representation learning, Time series data analysis, Noisy time series data, Lack of labeled data, Data augmentation challenges, Model collapse, Novel method for periodicity detection, Performance improvements over state-of-the-art methods
  • Going beyond Compositions, DDPMs Can Produce Zero-Shot Interpolations (Poster)

    • Authors: Justin Deschenaux, Igor Krawczuk, Grigorios Chrysos, Volkan Cevher
    • Affiliations: LIONS, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, Department of Computer Science, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, Department of Electrical and Computer Engineering, University of Wisconsin-Madison, USA
    • TL;DR: This study investigates the capabilities of Denoising Diffusion Probabilistic Models (DDPMs) to generate images in intermediate regions of the data distribution, demonstrating zero-shot interpolation between distinct attributes such as smiling and non-smiling faces. The findings suggest that DDPMs can effectively generalize beyond their training data, which has implications for fairness and bias mitigation in generative models.
    • Keywords: Denoising Diffusion Probabilistic Models (DDPMs), image generation, compositionality, Generating images in unexplored, intermediate regions of the distribution, zero-shot interpolation, latent factors, generative models, interpolation, fairness, bias mitigation
  • Collaborative Learning with Different Labeling Functions (Poster)

    • Authors: yuyang deng, Mingda Qiao
    • Affiliations: Pennsylvania State University, State College, PA, USA, University of California, Berkeley, Berkeley, CA, USA
    • TL;DR: This study explores a variant of Collaborative PAC Learning to develop accurate classifiers for multiple data distributions with different labeling functions while minimizing the overall sample complexity.
