Applied AI Safety and Steering

Foundations

Guardrails and Defense

Preventing harmful inputs and outputs

Prompt Injection

Evaluation and Robustness

Measuring safety

Benchmarks

Red Teaming

Steering and Control

Fine-tuning

Activation

Agents

[Optional] Practical intro to steering vectors (alignmentforum.org)

Deployment and Monitoring

Observability

Governance

Interpretability

Privacy-Preserving ML

Multi-Modal

Case Studies & Post-Mortems