📢 Search feature coming soon

23.08.13 (Sun)

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizi…
Med-Flamingo: a Multimodal Medical Few-shot Learner
Medicine, by its nature, is a multifaceted domain that requires the synthesis of information across various modalities. Medical generative vision-language models (VLMs) make a first step in this direction and promise many exciting clinical applications. However, existing models typically have to be…
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Despite the advancements of open-source large language models (LLMs) and their variants, e.g., LLaMA and Vicuna, they remain significantly limited in performing higher-level tasks, such as following human instructions to use external tools (APIs). This is because current instruction tuning largely f…
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
We introduce OpenFlamingo, a family of autoregressive vision-language models ranging from 3B to 9B parameters. OpenFlamingo is an ongoing effort to produce an open-source replication of DeepMind’s Flamingo models. On seven vision-language datasets, OpenFlamingo models average between 80 - 89% of cor…
The Hydra Effect: Emergent Self-repair in Language Model Computations
We investigate the internal structure of language model computations using causal analysis and demonstrate two motifs: (1) a form of adaptive computation where ablations of one attention layer of a language model cause another layer to compensate (which we term the Hydra effect) and (2) a counterbal…
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
The recent progress in large language models (LLMs), especially the invention of chain-of-thoughts (CoT) prompting, makes it possible to solve reasoning problems. However, even the strongest LLMs are still struggling with more complicated problems that require non-linear thinking and multi-step reas…
Learning to Model the World with Language
To interact with humans in the world, agents need to understand the diverse types of language that people use, relate them to the visual world, and act based on them. While current agents learn to execute simple language instructions from task rewards, we aim to build agents that leverage diverse la…
