From 8e41fd692b12293b3a8cbc56ac9a8886fc7172b6 Mon Sep 17 00:00:00 2001
From: Automated
Date: Tue, 7 May 2024 09:05:58 +0000
Subject: [PATCH] Latest data: Tue May 7 09:05:58 UTC 2024

---
 index.html | 332 ++++++++-----------------------------------------------------------------
 1 file changed, 46 insertions(+), 286 deletions(-)

diff --git a/index.html b/index.html
index 34628e90..b1acb13d 100644
--- a/index.html
+++ b/index.html
@@ -25,7 +25,7 @@

Tim's Arxiv FrontPage


-

Generated on 2024-05-06.


+

Generated on 2024-05-07.


This frontpage is generated by scraping new papers on Arxiv and using an embedding model to find papers matching topics I'm interested in. Currently, the false positive rate is fairly high. The repo is here. Forked and customized from this project


@@ -36,20 +36,20 @@

Artificial General Intelligen -

2024-05-03

+

2024-05-06

- The Cambridge RoboMaster: An Agile Multi-Robot Research Platform
+ Artificial Intelligence in the Autonomous Navigation of Endovascular Interventions: A Systematic Review

-

Compact robotic platforms with powerful compute and actuation capabilities are key enablers for practical, real-world deployments of multi-agent research. 0.831 This article introduces a tightly integrated hardware, control, and simulation software stack on a fleet of holonomic ground robot platforms designed with this motivation. Our robots, a fleet of customised DJI Robomaster S1 vehicles, offer a balance between small robots that do not possess sufficient compute or actuation capabilities and larger robots that are unsuitable for indoor multi-robot tests. They run a modular ROS2-based optimal estimation and control stack for full onboard autonomy, contain ad-hoc peer-to-peer communication infrastructure, and can zero-shot run multi-agent reinforcement learning (MARL) policies trained in our vectorized multi-agent simulation framework. We present an in-depth review of other platforms currently available, showcase new experimental validation of our system's capabilities, and introduce case studies that highlight the versatility and reliability of our system as a testbed for a wide range of research demonstrations. Our system as well as supplementary material is available online: https://proroklab.github.io/cambridge-robomaster

+

Purpose: Autonomous navigation of devices in endovascular interventions can decrease operation times, improve decision-making during surgery, and reduce operator radiation exposure while increasing access to treatment. This systematic review explores recent literature to assess the impact, challenges, and opportunities artificial intelligence (AI) has for autonomous endovascular intervention navigation. 0.831 Methods: PubMed and IEEEXplore databases were queried. Eligibility criteria included studies investigating the use of AI in enabling the autonomous navigation of catheters/guidewires in endovascular interventions. Following PRISMA, articles were assessed using QUADAS-2. PROSPERO: CRD42023392259. Results: Among 462 studies, fourteen met inclusion criteria. Reinforcement learning (9/14, 64%) and learning from demonstration (7/14, 50%) were used as data-driven models for autonomous navigation. Studies predominantly utilised physical phantoms (10/14, 71%) and in silico (4/14, 29%) models. Experiments within or around the blood vessels of the heart were reported by the majority of studies (10/14, 71%), while simple non-anatomical vessel platforms were used in three studies (3/14, 21%), and the porcine liver venous system in one study. We observed that risk of bias and poor generalisability were present across studies. No procedures were performed on patients in any of the studies reviewed. Studies lacked patient selection criteria, reference standards, and reproducibility, resulting in low clinical evidence levels. Conclusions: AI's potential in autonomous endovascular navigation is promising but remains at an experimental proof-of-concept stage, with a technology readiness level of 3. We highlight that reference standards with well-identified performance metrics are crucial to allow for comparisons of data-driven algorithms proposed in the years to come.

- + link

@@ -58,20 +58,20 @@

Artificial General Intelligen -

2024-05-03

+

2024-05-06

- Adversarial Botometer: Adversarial Analysis for Social Bot Detection
+ Functional Equivalence with NARS

-

Social bots play a significant role in many online social networks (OSN) as they imitate human behavior. This fact raises difficult questions about their capabilities and potential risks. Given the recent advances in Generative AI (GenAI), social bots are capable of producing highly realistic and complex content that mimics human creativity. 0.849 As the malicious social bots emerge to deceive people with their unrealistic content, identifying them and distinguishing the content they produce has become an actual challenge for numerous social platforms. Several approaches to this problem have already been proposed in the literature, but the proposed solutions have not been widely evaluated. To address this issue, we evaluate the behavior of a text-based bot detector in a competitive environment where some scenarios are proposed: First, the tug-of-war between a bot and a bot detector is examined. It is interesting to analyze which party is more likely to prevail and which circumstances influence these expectations. In this regard, we model the problem as a synthetic adversarial game in which a conversational bot and a bot detector are engaged in strategic online interactions. 0.827 Second, the bot detection model is evaluated under attack examples generated by a social bot; to this end, we poison the dataset with attack examples and evaluate the model performance under this condition. Finally, to investigate the impact of the dataset, a cross-domain analysis is performed. Through our comprehensive evaluation of different categories of social bots using two benchmark datasets, we were able to demonstrate some achievements that could be utilized in future works.

+

This study explores the concept of functional equivalence within the framework of the Non-Axiomatic Reasoning System (NARS), specifically through OpenNARS for Applications (ONA). Functional equivalence allows organisms to categorize and respond to varied stimuli based on their utility rather than perceptual similarity, thus enhancing cognitive efficiency and adaptability. In this study, ONA was modified to allow the derivation of functional equivalence. This paper provides practical examples of the capability of ONA to apply learned knowledge across different functional situations, demonstrating its utility in complex problem-solving and decision-making. An extended example is included, where training of ONA aimed to learn basic human-like language abilities, using a systematic procedure in relating spoken words, objects and written words. 0.822 The research carried out as part of this study extends the understanding of functional equivalence in AGI systems, and argues that it is necessary for the level of flexibility in learning and adapting required for human-level AGI. 0.846

- + link

@@ -80,20 +80,20 @@

Artificial General Intelligen -

2024-05-03

+

2024-05-06

- Comparative Analysis of Retrieval Systems in the Real World
+ Embedded Distributed Inference of Deep Neural Networks: A Systematic Review

-

This research paper presents a comprehensive analysis of integrating advanced language models with search and retrieval systems in the fields of information retrieval and natural language processing. 0.825The objective is to evaluate and compare various state-of-the-art methods based on their performance in terms of accuracy and efficiency.The analysis explores different combinations of technologies, including Azure Cognitive Search Retriever with GPT-4, Pinecone's Canopy framework, Langchain with Pinecone and different language models (OpenAI, Cohere), LlamaIndex with Weaviate Vector Store's hybrid search, Google's RAG implementation on Cloud VertexAI-Search, Amazon SageMaker's RAG, and a novel approach called KG-FID Retrieval.The motivation for this analysis arises from the increasing demand for robust and responsive question-answering systems in various domains. 0.823The RobustQA metric is used to evaluate the performance of these systems under diverse paraphrasing of questions.The report aims to provide insights into the strengths and weaknesses of each method, facilitating informed decisions in the deployment and development of AI-driven search and retrieval systems.

+

Embedded distributed inference of Neural Networks has emerged as a promising approach for deploying machine-learning models on resource-constrained devices in an efficient and scalable manner. The inference task is distributed across a network of embedded devices, with each device contributing to the overall computation by performing a portion of the workload. In some cases, more powerful devices such as edge or cloud servers can be part of the system and take responsibility for the most demanding layers of the network. As the demand for intelligent systems and the complexity of the deployed neural network models increase, this approach is becoming more relevant in a variety of applications such as robotics, autonomous vehicles, smart cities, Industry 4.0 and smart health. 0.834 We present a systematic review of papers published during the last six years which describe techniques and methods to distribute Neural Networks across these kinds of systems. We provide an overview of the current state-of-the-art by analysing more than 100 papers, present a new taxonomy to characterize them, and discuss trends and challenges in the field.

- + link

@@ -102,20 +102,20 @@

Artificial General Intelligen -

2024-05-03

+

2024-05-06

- Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo
+ The high dimensional psychological profile and cultural bias of ChatGPT

-

The rapid advancement in artificial intelligence and natural language processing has led to the development of large-scale datasets aimed at benchmarking the performance of machine learning models. 0.841Herein, we introduce 'RetChemQA,' a comprehensive benchmark dataset designed to evaluate the capabilities of such models in the domain of reticular chemistry.This dataset includes both single-hop and multi-hop question-answer pairs, encompassing approximately 45,000 Q&As for each type.The questions have been extracted from an extensive corpus of literature containing about 2,530 research papers from publishers including NAS, ACS, RSC, Elsevier, and Nature Publishing Group, among others.The dataset has been generated using OpenAI's GPT-4 Turbo, a cutting-edge model known for its exceptional language understanding and generation capabilities.In addition to the Q&A dataset, we also release a dataset of synthesis conditions extracted from the corpus of literature used in this study.The aim of RetChemQA is to provide a robust platform for the development and evaluation of advanced machine learning algorithms, particularly for the reticular chemistry community.The dataset is structured to reflect the complexities and nuances of real-world scientific discourse, thereby enabling nuanced performance assessments across a variety of tasks.The dataset is available at the following link: https://github.com/nakulrampal/RetChemQA

+

Given the rapid advancement of large-scale language models, artificial intelligence (AI) models, like ChatGPT, are playing an increasingly prominent role in human society. 0.848However, to ensure that artificial intelligence models benefit human society, we must first fully understand the similarities and differences between the human-like characteristics exhibited by artificial intelligence models and real humans, as well as the cultural stereotypes and biases that artificial intelligence models may exhibit in the process of interacting with humans. 0.832This study first measured ChatGPT in 84 dimensions of psychological characteristics, revealing differences between ChatGPT and human norms in most dimensions as well as in high-dimensional psychological representations.Additionally, through the measurement of ChatGPT in 13 dimensions of cultural values, it was revealed that ChatGPT's cultural value patterns are dissimilar to those of various countries/regions worldwide.Finally, an analysis of ChatGPT's performance in eight decision-making tasks involving interactions with humans from different countries/regions revealed that ChatGPT exhibits clear cultural stereotypes in most decision-making tasks and shows significant cultural bias in third-party punishment and ultimatum games.The findings indicate that, compared to humans, ChatGPT exhibits a distinct psychological profile and cultural value orientation, and it also shows cultural biases and stereotypes in interpersonal decision-making.Future research endeavors should emphasize enhanced technical oversight and augmented transparency in the database and algorithmic training procedures to foster more efficient cross-cultural communication and mitigate social disparities.

- + link

@@ -124,20 +124,20 @@

Artificial General Intelligen -

2024-05-03

+

2024-05-06

- Learning from Evolution: Improving Collective Decision-Making Mechanisms using Insights from Evolutionary Robotics
+ Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

-

Collective decision-making enables multi-robot systems to act autonomously in real-world environments.Existing collective decision-making mechanisms suffer from the so-called speed versus accuracy trade-off or rely on high complexity, e.g., by including global communication.Recent work has shown that more efficient collective decision-making mechanisms based on artificial neural networks can be generated using methods from evolutionary computation. 0.836A major drawback of these decision-making neural networks is their limited interpretability.Analyzing evolved decision-making mechanisms can help us improve the efficiency of hand-coded decision-making mechanisms while maintaining a higher interpretability.In this paper, we analyze evolved collective decision-making mechanisms in detail and hand-code two new decision-making mechanisms based on the insights gained.In benchmark experiments, we show that the newly implemented collective decision-making mechanisms are more efficient than the state-of-the-art collective decision-making mechanisms voter model and majority rule.

+

General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems. 0.878 Recently, the emergence of the Sora model has attracted significant attention due to its remarkable simulation capabilities, which exhibit an incipient comprehension of physical laws. In this survey, we embark on a comprehensive exploration of the latest advancements in world models. Our analysis navigates through the forefront of generative methodologies in video generation, where world models stand as pivotal constructs facilitating the synthesis of highly realistic visual content. Additionally, we scrutinize the burgeoning field of autonomous-driving world models, meticulously delineating their indispensable role in reshaping transportation and urban mobility. Furthermore, we delve into the intricacies inherent in world models deployed within autonomous agents, shedding light on their profound significance in enabling intelligent interactions within dynamic environmental contexts. 0.832 Finally, we examine challenges and limitations of world models, and discuss their potential future directions. We hope this survey can serve as a foundational reference for the research community and inspire continued innovation. This survey will be regularly updated at: https://github.com/GigaAI-research/General-World-Models-Survey.

- + link

@@ -146,20 +146,20 @@

Artificial General Intelligen -

2024-05-03

+

2024-05-06

- Mapping the Unseen: Unified Promptable Panoptic Mapping with Dynamic Labeling using Foundation Models
+ Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment

-

In the field of robotics and computer vision, efficient and accurate semantic mapping remains a significant challenge due to the growing demand for intelligent machines that can comprehend and interact with complex environments. 0.832Conventional panoptic mapping methods, however, are limited by predefined semantic classes, thus making them ineffective for handling novel or unforeseen objects.In response to this limitation, we introduce the Unified Promptable Panoptic Mapping (UPPM) method.UPPM utilizes recent advances in foundation models to enable real-time, on-demand label generation using natural language prompts.By incorporating a dynamic labeling strategy into traditional panoptic mapping techniques, UPPM provides significant improvements in adaptability and versatility while maintaining high performance levels in map reconstruction.We demonstrate our approach on real-world and simulated datasets.Results show that UPPM can accurately reconstruct scenes and segment objects while generating rich semantic labels through natural language interactions.A series of ablation experiments validated the advantages of foundation model-based labeling over fixed label sets.

+

Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks.We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs that achieve full accuracy recovery for fine-tuning tasks at up to 70% sparsity.We achieve this for the LLaMA-2 7B model by combining the SparseGPT one-shot pruning method and sparse pretraining of those models on a subset of the SlimPajama dataset mixed with a Python subset of The Stack dataset.We exhibit training acceleration due to sparsity on Cerebras CS-3 chips that closely matches theoretical scaling.In addition, we establish inference acceleration of up to 3x on CPUs by utilizing Neural Magic's DeepSparse engine and 1.7x on GPUs through Neural Magic's nm-vllm engine.The above gains are realized via sparsity alone, thus enabling further gains through additional use of quantization.Specifically, we show a total speedup on CPUs for sparse-quantized LLaMA models of up to 8.6x.We demonstrate these results across diverse, challenging tasks, including chat, instruction following, code generation, arithmetic reasoning, and summarization to prove their generality. 0.823This work paves the way for rapidly creating smaller and faster LLMs without sacrificing accuracy.

- + link

@@ -167,173 +167,21 @@

Artificial General Intelligen

- - -

Collective Intelligence

- - - -

2024-05-03

- - -
- - Learning from Evolution: Improving Collective Decision-Making Mechanisms using Insights from Evolutionary Robotics - -
-
-

-

Collective decision-making enables multi-robot systems to act autonomously in real-world environments. 0.826Existing collective decision-making mechanisms suffer from the so-called speed versus accuracy trade-off or rely on high complexity, e.g., by including global communication.Recent work has shown that more efficient collective decision-making mechanisms based on artificial neural networks can be generated using methods from evolutionary computation.A major drawback of these decision-making neural networks is their limited interpretability.Analyzing evolved decision-making mechanisms can help us improve the efficiency of hand-coded decision-making mechanisms while maintaining a higher interpretability.In this paper, we analyze evolved collective decision-making mechanisms in detail and hand-code two new decision-making mechanisms based on the insights gained.In benchmark experiments, we show that the newly implemented collective decision-making mechanisms are more efficient than the state-of-the-art collective decision-making mechanisms voter model and majority rule.

-

-

- - link - -

-
-
- - - - -

Complex Systems

- - - -

2024-05-03

- - -
- - Multitask Extension of Geometrically Aligned Transfer Encoder - -
-
-

-

Molecular datasets often suffer from a lack of data. It is well-known that gathering data is difficult due to the complexity of experimentation or simulation involved. 0.824 Here, we leverage mutual information across different tasks in molecular data to address this issue. We extend an algorithm that utilizes the geometric characteristics of the encoding space, known as the Geometrically Aligned Transfer Encoder (GATE), to a multi-task setup. Thus, we connect multiple molecular tasks by aligning the curved coordinates onto locally flat coordinates, ensuring the flow of information from source tasks to support performance on target data.

-

-

- - link - -

-
-
- - - -

2024-05-03

- - -
- - Geometric Fabrics: a Safe Guiding Medium for Policy Learning - -
-
-

-

Robotics policies are always subjected to complex, second order dynamics that entangle their actions with resulting states. 0.828 In reinforcement learning (RL) contexts, policies have the burden of deciphering these complicated interactions over massive amounts of experience and complex reward functions to learn how to accomplish tasks. Moreover, policies typically issue actions directly to controllers like Operational Space Control (OSC) or joint PD control, which induces straight-line motion towards these action targets in task or joint space. However, straight-line motion in these spaces for the most part does not capture the rich, nonlinear behavior our robots need to exhibit, shifting the burden of discovering these behaviors more completely to the agent. Unlike these simpler controllers, geometric fabrics capture a much richer and desirable set of behaviors via artificial, second order dynamics grounded in nonlinear geometry. These artificial dynamics shift the uncontrolled dynamics of a robot via an appropriate control law to form behavioral dynamics. Behavioral dynamics unlock a new action space and safe, guiding behavior over which RL policies are trained. Behavioral dynamics enable bang-bang-like RL policy actions that are still safe for real robots, simplify reward engineering, and help sequence real-world, high-performance policies. We describe the framework more generally and create a specific instantiation for the problem of dexterous, in-hand reorientation of a cube by a highly actuated robot hand.

-

-

- - link - -

-
-
- - - - -

Decision Making Under Uncertainty

- - - -

2024-05-03

- - -
- - A comparative study of conformal prediction methods for valid uncertainty quantification in machine learning - -
-
-

-

In the past decades, most work in the area of data analysis and machine learning was focused on optimizing predictive models and getting better results than what was possible with existing models. To what extent the metrics with which such improvements were measured were accurately capturing the intended goal, whether the numerical differences in the resulting values were significant, or whether uncertainty played a role in this study and if it should have been taken into account, was of secondary importance. Whereas probability theory, be it frequentist or Bayesian, used to be the gold standard in science before the advent of the supercomputer, it was quickly replaced in favor of black box models and sheer computing power because of their ability to handle large data sets. This evolution sadly happened at the expense of interpretability and trustworthiness. However, while people are still trying to improve the predictive power of their models, the community is starting to realize that for many applications it is not so much the exact prediction that is of importance, but rather the variability or uncertainty. 0.823 The work in this dissertation tries to further the quest for a world where everyone is aware of uncertainty, of how important it is and how to embrace it instead of fearing it. A specific, though general, framework that allows anyone to obtain accurate uncertainty estimates is singled out and analysed. Certain aspects and applications of the framework -- dubbed 'conformal prediction' -- are studied in detail. Whereas many approaches to uncertainty quantification make strong assumptions about the data, conformal prediction is, at the time of writing, the only framework that deserves the title 'distribution-free'. No parametric assumptions have to be made and the nonparametric results also hold without having to resort to the law of large numbers in the asymptotic regime.

-

-

- - link - -

-
-
- - - -

2024-05-03

- - -
- - Learning from Evolution: Improving Collective Decision-Making Mechanisms using Insights from Evolutionary Robotics - -
-
-

-

Collective decision-making enables multi-robot systems to act autonomously in real-world environments.Existing collective decision-making mechanisms suffer from the so-called speed versus accuracy trade-off or rely on high complexity, e.g., by including global communication. 0.822Recent work has shown that more efficient collective decision-making mechanisms based on artificial neural networks can be generated using methods from evolutionary computation.A major drawback of these decision-making neural networks is their limited interpretability.Analyzing evolved decision-making mechanisms can help us improve the efficiency of hand-coded decision-making mechanisms while maintaining a higher interpretability.In this paper, we analyze evolved collective decision-making mechanisms in detail and hand-code two new decision-making mechanisms based on the insights gained.In benchmark experiments, we show that the newly implemented collective decision-making mechanisms are more efficient than the state-of-the-art collective decision-making mechanisms voter model and majority rule.

-

-

- - link - -

-
-
- - - - -

Neural Ordinary Differential Equations

- - -

2024-05-03

+

2024-05-06

- An analysis and solution of ill-conditioning in physics-informed neural networks
+ Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs

-

Physics-informed neural networks (PINNs) have recently emerged as a novel and popular approach for solving forward and inverse problems involving partial differential equations (PDEs). 0.835However, achieving stable training and obtaining correct results remain a challenge in many cases, often attributed to the ill-conditioning of PINNs.Nonetheless, further analysis is still lacking, severely limiting the progress and applications of PINNs in complex engineering problems.Drawing inspiration from the ill-conditioning analysis in traditional numerical methods, we establish a connection between the ill-conditioning of PINNs and the ill-conditioning of the Jacobian matrix of the PDE system.Specifically, for any given PDE system, we construct its controlled system.This controlled system allows for adjustment of the condition number of the Jacobian matrix while retaining the same solution as the original system.Our numerical findings suggest that the ill-conditioning observed in PINNs predominantly stems from that of the Jacobian matrix.As the condition number of the Jacobian matrix decreases, the controlled systems exhibit faster convergence rates and higher accuracy.Building upon this understanding and the natural extension of controlled systems, we present a general approach to mitigate the ill-conditioning of PINNs, leading to successful simulations of the three-dimensional flow around the M6 wing at a Reynolds number of 5,000.To the best of our knowledge, this is the first time that PINNs have been successful in simulating such complex systems, offering a promising new technique for addressing industrial complexity problems.Our findings also offer valuable insights guiding the future development of PINNs.

+

Recent advancements in Large Language Models (LLMs) have led to the development of Video Large Multi-modal Models (Video-LMMs) that can handle a wide range of video understanding tasks. These models have the potential to be deployed in real-world applications such as robotics, AI assistants, medical imaging, and autonomous vehicles. 0.825 The widespread adoption of Video-LMMs in our daily lives underscores the importance of ensuring and evaluating their robust performance in mirroring human-like reasoning and interaction capabilities in complex, real-world contexts. However, existing benchmarks for Video-LMMs primarily focus on general video comprehension abilities and neglect assessing their reasoning capabilities over complex videos in the real-world context, and robustness of these models through the lens of user prompts as text queries. In this paper, we present the Complex Video Reasoning and Robustness Evaluation Suite (CVRR-ES), a novel benchmark that comprehensively assesses the performance of Video-LMMs across 11 diverse real-world video dimensions. We evaluate 9 recent models, including both open-source and closed-source variants, and find that most of the Video-LMMs, especially open-source ones, struggle with robustness and reasoning when dealing with complex videos. Based on our analysis, we develop a training-free Dual-Step Contextual Prompting (DSCP) technique to enhance the performance of existing Video-LMMs. Our findings provide valuable insights for building the next generation of human-centric AI systems with advanced robustness and reasoning capabilities. 0.837 Our dataset and code are publicly available at: https://mbzuai-oryx.github.io/CVRR-Evaluation-Suite/.

- - link - -

-
-
- - - -

2024-05-03

- - -
- - Parameter estimation in ODEs: assessing the potential of local and global solvers - -
-
-

-

We consider the problem of parameter estimation in dynamic systems described by ordinary differential equations. 0.837 A review of the existing literature emphasizes the need for deterministic global optimization methods due to the nonconvex nature of these problems. Recent works have focused on expanding the capabilities of specialized deterministic global optimization algorithms to handle more complex problems. Despite advancements, current deterministic methods are limited to problems with a maximum of around five state and five decision variables, prompting ongoing efforts to enhance their applicability to practical problems. Our study seeks to assess the effectiveness of state-of-the-art general-purpose global and local solvers in handling realistic-sized problems efficiently, and to evaluate their capabilities to cope with the nonconvex nature of the underlying estimation problems.

-

-

- + link

@@ -347,108 +195,20 @@

Reinforcement Learning

-

2024-05-03

- - -
- - Learning Optimal Deterministic Policies with Stochastic Policy Gradients - -
-
-

-

Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems. 0.83They learn stochastic parametric (hyper)policies by either exploring in the space of actions or in the space of parameters.Stochastic controllers, however, are often undesirable from a practical perspective because of their lack of robustness, safety, and traceability.In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.In this paper, we make a step towards the theoretical understanding of this practice.After introducing a novel framework for modeling this scenario, we study the global convergence to the best deterministic policy, under (weak) gradient domination assumptions.Then, we illustrate how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.Finally, we quantitatively compare action-based and parameter-based exploration, giving a formal guise to intuitive results.

-

-

- - link - -

-
-
- - - -

2024-05-03

- - -
- - Multi-Objective Recommendation via Multivariate Policy Learning - -
-
-

-

Real-world recommender systems often need to balance multiple objectives when deciding which recommendations to present to users. These include behavioural signals (e.g. clicks, shares, dwell time), as well as broader objectives (e.g. diversity, fairness). Scalarisation methods are commonly used to handle this balancing task, where a weighted average of per-objective reward signals determines the final score used for ranking. Naturally, exactly how these weights are computed is key to success for any online platform. We frame this as a decision-making task, where the scalarisation weights are actions taken to maximise an overall North Star reward (e.g. long-term user retention or growth). We extend existing policy learning methods to the continuous multivariate action domain, proposing to maximise a pessimistic lower bound on the North Star reward that the learnt policy will yield. 0.822 Typical lower bounds based on normal approximations suffer from insufficient coverage, and we propose an efficient and effective policy-dependent correction for this. We provide guidance to design stochastic data collection policies, as well as highly sensitive reward signals. Empirical observations from simulations, offline and online experiments highlight the efficacy of our deployed approach.

-

-

- - link - -

-
-
- - - -

2024-05-03

- - -
- - Model-based reinforcement learning for protein backbone design - -
-
-

-

Designing protein nanomaterials of predefined shape and characteristics has the potential to dramatically impact the medical industry. Machine learning (ML) has proven successful in protein design, reducing the need for expensive wet lab experiment rounds. However, challenges persist in efficiently exploring the protein fitness landscapes to identify optimal protein designs. In response, we propose the use of AlphaZero to generate protein backbones, meeting shape and structural scoring requirements. We extend an existing Monte Carlo tree search (MCTS) framework by incorporating a novel threshold-based reward and secondary objectives to improve design precision. This innovation considerably outperforms existing approaches, leading to protein backbones that better respect structural scores. The application of AlphaZero is novel in the context of protein backbone design and demonstrates promising performance. AlphaZero consistently surpasses baseline MCTS by more than 100% in top-down protein design tasks. Additionally, our application of AlphaZero with secondary objectives uncovers further promising outcomes, indicating the potential of model-based reinforcement learning (RL) in navigating the intricate and nuanced aspects of protein design. 0.825

-

-

- - link - -

-
-
- - - -

2024-05-03

- - -
- - Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach - -
-
-

-

Robust Reinforcement Learning (RRL) is a promising Reinforcement Learning (RL) paradigm aimed at training models that are robust to uncertainty or disturbances, making them more efficient for real-world applications. 0.873 Following this paradigm, uncertainty or disturbances are interpreted as actions of a second adversarial agent, and thus, the problem is reduced to seeking the agents' policies robust to any opponent's actions. This paper is the first to propose considering the RRL problems within the positional differential game theory, which helps us to obtain theoretically justified intuition to develop a centralized Q-learning approach. 0.824 Namely, we prove that under Isaacs's condition (sufficiently general for real-world dynamical systems), the same Q-function can be utilized as an approximate solution of both minimax and maximin Bellman equations. Based on these results, we present the Isaacs Deep Q-Network algorithms and demonstrate their superiority compared to other baseline RRL and Multi-Agent RL algorithms in various environments.

-

-

- - link - -

-
-
- - - -

2024-05-03

+

2024-05-06

- Simulating the economic impact of rationality through reinforcement learning and agent-based modelling + Artificial Intelligence in the Autonomous Navigation of Endovascular Interventions: A Systematic Review

-

Agent-based models (ABMs) are simulation models used in economics to overcome some of the limitations of traditional frameworks based on general equilibrium assumptions. However, agents within an ABM follow predetermined, not fully rational, behavioural rules which can be cumbersome to design and difficult to justify. Here we leverage multi-agent reinforcement learning (RL) to expand the capabilities of ABMs with the introduction of fully rational agents that learn their policy by interacting with the environment and maximising a reward function. 0.865 Specifically, we propose a 'Rational macro ABM' (R-MABM) framework by extending a paradigmatic macro ABM from the economic literature. We show that gradually substituting ABM firms in the model with RL agents, trained to maximise profits, allows for a thorough study of the impact of rationality on the economy. We find that RL agents spontaneously learn three distinct strategies for maximising profits, with the optimal strategy depending on the level of market competition and rationality. 0.833 We also find that RL agents with independent policies, and without the ability to communicate with each other, spontaneously learn to segregate into different strategic groups, thus increasing market power and overall profits. Finally, we find that a higher degree of rationality in the economy always improves the macroeconomic environment as measured by total output; depending on the specific rational policy, this can come at the cost of higher instability. Our R-MABM framework is general, allows for stable multi-agent learning, and represents a principled and robust direction to extend existing economic simulators.

+

Purpose: Autonomous navigation of devices in endovascular interventions can decrease operation times, improve decision-making during surgery, and reduce operator radiation exposure while increasing access to treatment. This systematic review explores recent literature to assess the impact, challenges, and opportunities artificial intelligence (AI) has for autonomous endovascular intervention navigation. Methods: PubMed and IEEEXplore databases were queried. Eligibility criteria included studies investigating the use of AI in enabling the autonomous navigation of catheters/guidewires in endovascular interventions. Following PRISMA, articles were assessed using QUADAS-2. PROSPERO: CRD42023392259. Results: Among 462 studies, fourteen met inclusion criteria. Reinforcement learning (9/14, 64%) and learning from demonstration (7/14, 50%) were used as data-driven models for autonomous navigation. 0.826 Studies predominantly utilised physical phantoms (10/14, 71%) and in silico (4/14, 29%) models. Experiments within or around the blood vessels of the heart were reported by the majority of studies (10/14, 71%), while simple non-anatomical vessel platforms were used in three studies (3/14, 21%), and the porcine liver venous system in one study. We observed that risk of bias and poor generalisability were present across studies. No procedures were performed on patients in any of the studies reviewed. Studies lacked patient selection criteria, reference standards, and reproducibility, resulting in low clinical evidence levels. Conclusions: AI's potential in autonomous endovascular navigation is promising, but it remains at an experimental proof-of-concept stage, with a technology readiness level of 3. We highlight that reference standards with well-identified performance metrics are crucial to allow for comparisons of data-driven algorithms proposed in the years to come.

- + link

@@ -457,20 +217,20 @@

Reinforcement Learning

-

2024-05-03

+

2024-05-06

- Towards Improving Learning from Demonstration Algorithms via MCMC Methods + Enhancing Q-Learning with Large Language Model Heuristics

-

Behavioral cloning, or more broadly, learning from demonstrations (LfD), is a promising direction for robot policy learning in complex scenarios. 0.823 Albeit being straightforward to implement and data-efficient, behavioral cloning has its own drawbacks, limiting its efficacy in real robot setups. In this work, we take one step towards improving learning from demonstration algorithms by leveraging implicit energy-based policy models. Results suggest that in selected complex robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used neural network-based explicit models, especially in the cases of approximating potentially discontinuous and multimodal functions.

+

Q-learning excels in learning from feedback within sequential decision-making tasks but requires extensive sampling for significant improvements. Although reward shaping is a powerful technique for enhancing learning efficiency, it can introduce biases that affect agent performance. 0.832 Furthermore, potential-based reward shaping is constrained as it does not allow for reward modifications based on actions or terminal states, potentially limiting its effectiveness in complex environments. Additionally, large language models (LLMs) can achieve zero-shot learning, but this is generally limited to simpler tasks. They also exhibit low inference speeds and occasionally produce hallucinations. To address these issues, we propose LLM-guided Q-learning, which employs LLMs as a heuristic to aid in learning the Q-function for reinforcement learning. 0.827 It combines the advantages of both technologies without introducing performance bias. Our theoretical analysis demonstrates that the LLM heuristic provides action-level guidance. Additionally, our architecture has the capability to convert the impact of hallucinations into exploration costs. Moreover, the converged Q function corresponds to the MDP optimal Q function. Experimental results demonstrate that our algorithm enables agents to avoid ineffective exploration, enhances sampling efficiency, and is well-suited for complex control tasks.

- + link

@@ -479,20 +239,20 @@

Reinforcement Learning

-

2024-05-03

+

2024-05-06

- Geometric Fabrics: a Safe Guiding Medium for Policy Learning + Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning

-

Robotics policies are always subjected to complex, second order dynamics that entangle their actions with resulting states. In reinforcement learning (RL) contexts, policies have the burden of deciphering these complicated interactions over massive amounts of experience and complex reward functions to learn how to accomplish tasks. 0.866 Moreover, policies typically issue actions directly to controllers like Operational Space Control (OSC) or joint PD control, which induces straightline motion towards these action targets in task or joint space. However, straightline motion in these spaces for the most part does not capture the rich, nonlinear behavior our robots need to exhibit, shifting the burden of discovering these behaviors more completely to the agent. Unlike these simpler controllers, geometric fabrics capture a much richer and desirable set of behaviors via artificial, second order dynamics grounded in nonlinear geometry. These artificial dynamics shift the uncontrolled dynamics of a robot via an appropriate control law to form behavioral dynamics. Behavioral dynamics unlock a new action space and safe, guiding behavior over which RL policies are trained. 0.82 Behavioral dynamics enable bang-bang-like RL policy actions that are still safe for real robots, simplify reward engineering, and help sequence real-world, high-performance policies. We describe the framework more generally and create a specific instantiation for the problem of dexterous, in-hand reorientation of a cube by a highly actuated robot hand.

+

Reinforcement learning (RL) presents a promising framework to learn policies through environment interaction, but often requires an infeasible amount of interaction data to solve complex tasks from sparse rewards. 0.873 One direction includes augmenting RL with offline data demonstrating desired tasks, but past work often requires a lot of high-quality demonstration data that is difficult to obtain, especially for domains such as robotics. Our approach consists of a reverse curriculum followed by a forward curriculum. Unique to our approach compared to past work is the ability to efficiently leverage more than one demonstration via a per-demonstration reverse curriculum generated via state resets. The result of our reverse curriculum is an initial policy that performs well on a narrow initial state distribution and helps overcome difficult exploration problems. A forward curriculum is then used to accelerate the training of the initial policy to perform well on the full initial state distribution of the task and improve demonstration and sample efficiency. We show how the combination of a reverse curriculum and forward curriculum in our method, RFCL, enables significant improvements in demonstration and sample efficiency compared against various state-of-the-art learning-from-demonstration baselines, even solving previously unsolvable tasks that require high precision and control.

- + link

@@ -502,24 +262,24 @@

Reinforcement Learning

-

Trajectory Optimization

+

Active Inference

-

2024-05-03

+

2024-05-06

- Characterized Diffusion and Spatial-Temporal Interaction Network for Trajectory Prediction in Autonomous Driving + Embedded Distributed Inference of Deep Neural Networks: A Systematic Review

-

Trajectory prediction is a cornerstone in autonomous driving (AD), playing a critical role in enabling vehicles to navigate safely and efficiently in dynamic environments. 0.856 To address this task, this paper presents a novel trajectory prediction model tailored for accuracy in the face of heterogeneous and uncertain traffic scenarios. 0.822 At the heart of this model lies the Characterized Diffusion Module, an innovative module designed to simulate traffic scenarios with inherent uncertainty. This module enriches the predictive process by infusing it with detailed semantic information, thereby enhancing trajectory prediction accuracy. Complementing this, our Spatio-Temporal (ST) Interaction Module captures the nuanced effects of traffic scenarios on vehicle dynamics across both spatial and temporal dimensions with remarkable effectiveness. Demonstrated through exhaustive evaluations, our model sets a new standard in trajectory prediction, achieving state-of-the-art (SOTA) results on the Next Generation Simulation (NGSIM), Highway Drone (HighD), and Macao Connected Autonomous Driving (MoCAD) datasets across both short and extended temporal spans. This performance underscores the model's unparalleled adaptability and efficacy in navigating complex traffic scenarios, including highways, urban streets, and intersections.

+

Embedded distributed inference of Neural Networks has emerged as a promising approach for deploying machine-learning models on resource-constrained devices in an efficient and scalable manner. The inference task is distributed across a network of embedded devices, with each device contributing to the overall computation by performing a portion of the workload. 0.823 In some cases, more powerful devices such as edge or cloud servers can be part of the system and take responsibility for the most demanding layers of the network. As the demand for intelligent systems and the complexity of the deployed neural network models increase, this approach is becoming more relevant in a variety of applications such as robotics, autonomous vehicles, smart cities, Industry 4.0 and smart health. We present a systematic review of papers published during the last six years which describe techniques and methods to distribute Neural Networks across these kinds of systems. We provide an overview of the current state-of-the-art by analysing more than 100 papers, present a new taxonomy to characterize them, and discuss trends and challenges in the field.

- + link