AV Portfolio Jobs

Join some of the teams that are shaping the digital future

Red Team Harmful Manipulation Evaluation AI Trainer, $100–$120/hour

LinkedIn

Software Engineering, Data Science
United States
USD 100-120 / hour
Posted on Mar 25, 2026

Project Overview:

Join a growing community of professionals advancing the next wave of AI. As an AI Trainer, you’ll play a hands-on role by analyzing and providing feedback on data to improve LLM performance, helping ensure that the next generation of AI technology is accurate and trustworthy.

We are seeking a skilled Behavioral Science, Trust & Safety, or Human-Computer Interaction expert to work as a project consultant in our AI Labor Marketplace. This is not a full-time employment position — you will be engaged as an expert project consultant on a contract basis.

Location: U.S.-based experts only

Engagement: Part-time, project-based expert evaluation work

Work Type: Remote

Project Summary:

Contributors will design adversarial prompts targeting harmful manipulation scenarios, evaluate model responses, and apply structured annotations to assess risk. The work combines behavioral insight, analytical judgment, and structured evaluation, along with peer review responsibilities to support quality and consistency.

Consultant Engagement Terms:

This is a project-based consultant role. Consultants are paid on a per-project basis; hourly rates are estimates based on anticipated completion time. Consultants control their own schedule, provide their own tools, and may simultaneously provide services to other vendors or employers (subject to those vendors' policies).

Responsibilities:

  • Design realistic adversarial prompts reflecting manipulation and influence risks
  • Execute prompts against AI systems and capture outputs
  • Apply structured annotation rubrics to evaluate model behavior
  • Provide clear written justifications for evaluations
  • Review peer submissions for quality and consistency
  • Identify edge cases and nuanced failure modes
  • Incorporate feedback and maintain calibration over time

Expected Outcomes:

  • High-quality adversarial prompt sets
  • Consistent, well-reasoned annotations aligned with rubric standards
  • Constructive peer review feedback
  • Reliable contribution to overall dataset quality and evaluation goals
Qualifications:

  • Background in behavioral science, social psychology, trust & safety, HCI, disinformation research, or related field
  • 3–10+ years of relevant professional or research experience
  • Strong analytical writing and decision-making under ambiguity
  • Experience with AI evaluation, red teaming, or content policy preferred
  • Ability to apply structured guidelines consistently across tasks