#training
References tagged "training"
Constitutional AI
Anthropic's approach to AI alignment using a constitution of explicit principles to guide model behavior — training the AI to follow principles rather than just maximize human approval.
RLHF
Reinforcement Learning from Human Feedback — training a model to produce outputs that human raters prefer, with known failure modes such as sycophancy and approval-seeking.
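
The preference-modeling step at the heart of RLHF can be sketched as a pairwise Bradley-Terry loss: a reward model is trained so that human-preferred responses score higher than rejected ones. A minimal illustration (the function name is hypothetical, and real implementations operate on model logits in batches, not scalar floats):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss for a reward model: -log sigmoid(r_chosen - r_rejected).
    The loss shrinks as the reward gap in favor of the human-preferred
    response grows, and blows up when the model ranks the rejected one higher."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# If the reward model already prefers the chosen response, loss is small;
# if it prefers the rejected one, loss is large.
assert preference_loss(3.0, 0.0) < preference_loss(0.0, 3.0)
```

Sycophancy arises because this objective rewards whatever humans *approve of*, not what is true or helpful — optimizing against the learned reward pushes the policy toward flattering, agreeable outputs.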