Xiao9905
  • Tsinghua University
  • Beijing, China

Organizations

@THUDM


Hi, welcome to my GitHub 👋

I am Xiao Liu, a fourth-year PhD student at Tsinghua University, where I started in 2021.

  • 🔭 Interested in Machine Learning, Natural Language Processing, and Foundation Models.

  • 🌱 Find my up-to-date publication list on Google Scholar! Some of the works I am proudest of as a lead author:

    Large Language Model (LLM) Training and Prompt Learning
    Foundational Agents For Real-world Challenging Missions
    • AgentBench (ICLR'24): the first systematic multi-dimensional benchmark to evaluate LLMs as agents in 8 distinct environments derived from real-world practical missions.
    • AutoWebGLM (KDD'24): a strong web-navigation agent built upon ChatGLM-3-6B that outperforms prompted GPT-4 on Mind2Web, WebArena, and AutoWebBench, a new dataset we constructed.
    • VisualAgentBench: a comprehensive framework to train and test Large Multimodal Models (LMMs) to serve as visual foundation agents.
    • WebRL: self-evolving online curriculum RL that trains open LLMs to outperform GPT-4-Turbo on web agent tasks by 160%.
    • AndroidLab: training and systematically benchmarking Android autonomous agents.
    • AutoGLM: autonomous foundation agents for GUIs, the first Phone Use and Web Browser Use agent family.
    Alignment and Scalable Oversight over LLMs and Diffusers
    • ImageReward (NeurIPS'23): the first general-purpose text-to-image human preference reward model (RM) for RLHF, outperforming CLIP/BLIP/Aesthetic by 30% in terms of human preference prediction.
    • BPO (Black-box Prompt Optimization, ACL'24): a novel direction for aligning LLMs via preference-aware prompt optimization. It improves the human-preference win rates of ChatGPT, Claude, and LLaMA by 20%+ without training them.
    • AlignBench (ACL'24): the first comprehensive benchmark for evaluating the Chinese alignment of LLMs, derived from real online scenarios of ChatGLM. Adopted by top Chinese LLMs (ChatGLM, Qwen, DeepSeek, Yi, Baichuan, Abab, etc.).
    Self-supervised Learning and Reasoning
  • 🤔 Dedicated to building the next generation of AI systems via both large pre-trained models and symbolic agent reasoning.

  • 💬 Feel free to drop me an email for:

    • Any form of collaboration
    • Any issue about my works or code
    • Interesting ideas to discuss or just chatting

Pinned

  1. THUDM/GLM-130B (Public)

    GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)

    Python · 7.7k stars · 608 forks

  2. THUDM/ChatGLM-6B (Public)

    ChatGLM-6B: An Open Bilingual Dialogue Language Model

    Python · 40.7k stars · 5.2k forks

  3. THUDM/P-tuning (Public)

    A novel method to tune language models. Code and datasets for the paper "GPT Understands, Too".

    Python · 924 stars · 111 forks

  4. THUDM/P-tuning-v2 (Public)

    An optimized deep prompt tuning strategy comparable to fine-tuning across scales and tasks.

    Python · 2k stars · 202 forks

  5. THUDM/AgentBench (Public)

    A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

    Python · 2.2k stars · 162 forks

  6. THUDM/ImageReward (Public)

    [NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation

    Python · 1.2k stars · 65 forks