📢 Search feature coming soon

23.08.13 (Sun)

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizi…
Med-Flamingo: a Multimodal Medical Few-shot Learner
Medicine, by its nature, is a multifaceted domain that requires the synthesis of information across various modalities. Medical generative vision-language models (VLMs) make a first step in this direction and promise many exciting clinical applications. However, existing models typically have to be…
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Despite the advancements of open-source large language models (LLMs) and their variants, e.g., LLaMA and Vicuna, they remain significantly limited in performing higher-level tasks, such as following human instructions to use external tools (APIs). This is because current instruction tuning largely f…
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
We introduce OpenFlamingo, a family of autoregressive vision-language models ranging from 3B to 9B parameters. OpenFlamingo is an ongoing effort to produce an open-source replication of DeepMind’s Flamingo models. On seven vision-language datasets, OpenFlamingo models average between 80 - 89% of cor…
The Hydra Effect: Emergent Self-repair in Language Model Computations
We investigate the internal structure of language model computations using causal analysis and demonstrate two motifs: (1) a form of adaptive computation where ablations of one attention layer of a language model cause another layer to compensate (which we term the Hydra effect) and (2) a counterbal…
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
The recent progress in large language models (LLMs), especially the invention of chain-of-thoughts (CoT) prompting, makes it possible to solve reasoning problems. However, even the strongest LLMs are still struggling with more complicated problems that require non-linear thinking and multi-step reas…
Learning to Model the World with Language
To interact with humans in the world, agents need to understand the diverse types of language that people use, relate them to the visual world, and act based on them. While current agents learn to execute simple language instructions from task rewards, we aim to build agents that leverage diverse la…
