Job Description

Model Behavior Architect, Alignment Finetuning at Anthropic

About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About The Role

As a Model Behavior Architect at Anthropic, you'll be at the forefront of shaping AI system behavior to ensure it aligns with human values. Working within the Alignment Finetuning team, you'll combine your expertise in model evaluation, prompt engineering, and ethical judgment to help create AI systems that respond with good judgment across diverse scenarios.

Responsibilities

Design and implement subtle prompting strategies and data generation pipelines that improve model responses
Identify and fix edge case behaviors through rigorous testing of your data generation pipelines
Interact with models to carefully identify where model behavior and judgment can be improved
Gather internal and external feedback on model behavior to document areas for improvement
Develop evaluations of language model behaviors across judgment-based domains like honesty, character, and ethics
Work collaboratively with researchers on related teams like Trust and Safety, Alignment Science, and Applied Finetuning

You May Be a Good Fit If You

Have extensive experience with prompt engineering and chaining for language models
Demonstrate strong skills in evaluating AI system outputs on subtle or fuzzy tasks
Have a background in philosophy, psychology, data science, or related fields
Care about AI safety and the ethical implications of both current and future AI behaviors
Are comfortable using basic Python and running basic scripts
Have a keen eye for identifying subtle issues in AI outputs
Understand how LLMs are trained and are familiar with concepts in reinforcement learning
Have experience finetuning large language models
Are happy to engage in test-driven development and to carefully analyze data and data pipelines

Strong Candidates May Also Have

Formal training in ethics or moral philosophy or moral psychology
Experience in data science with emphasis on data verification
Conceptual understanding of language model training and finetuning techniques
Previous experience developing evaluation frameworks for large language models
Background in AI safety research or similar fields
Experience with RLHF, constitutional AI, or other alignment techniques
Published work related to AI ethics or safety
Knowledge of model behavior benchmarking

Additional Information

Join us in our mission to ensure advanced AI systems behave reliably and ethically while staying aligned with human values.

Salary Range: $280,000 - $425,000 USD

Logistics

Education requirements: Bachelor’s degree in a related field or equivalent experience.

Location policy: Hybrid, with at least 25% in-office presence. Some roles may require more.

Visa sponsorship: Available, with efforts made to assist in visa acquisition upon offer.

We encourage you to apply even if you do not meet every qualification. We value diversity and inclusion, and we believe diverse perspectives enhance our work.

Why Join Us?

We believe the highest-impact AI research is collaborative and large-scale. We treat AI research as an empirical science, akin to physics or biology, and we foster open communication.

Our team members' notable research directions include GPT-3, interpretability, multimodal neurons, scaling laws, and AI safety.

Join Us!

Anthropic offers competitive compensation, benefits, equity options, generous leave, flexible hours, and a collaborative office environment.

Job Tags

Full time, Immediate start, Visa sponsorship, Flexible hours

Model Behavior Architect, Alignment Finetuning (Hiring Immediately) Job at Anthropic, San Francisco, CA
