#training

References tagged "training"

Constitutional AI

Anthropic's approach to AI alignment, which uses a constitution of explicit principles to guide model behavior: the AI is trained to follow those principles rather than simply to maximize human approval.

RLHF

Reinforcement Learning from Human Feedback: training AI to produce outputs that humans prefer. Known failure modes include sycophancy and approval-seeking.
