Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models’ Alignment
Ensuring alignment, which refers to making models behave in accordance with human intentions [1,2], has become a critical task before deploying large language models (LLMs) in real-world applications. For instance, OpenAI devoted six months to iteratively aligning GPT-4 before its release [3]. Howev…