Bilingual Japanese AI Evaluation Specialist.
TRACK
Evaluation & annotation
Aligned to the AuraOne specialist routing.
TYPE
Contractor
Remote-first specialist work, paid per accepted task.
LOCATION
Remote
Independent specialist contractor
Remote — US-eligible
About The Role
Bilingual Japanese AI Evaluation Specialist is a remote evaluation track for reviewing Japanese generalist evaluation prompts and responses against AuraOne's quality rubric. Reviewers compare paired outputs, label edge cases, and write structured feedback that the modeling team can use for retraining.
AI data reviewers help turn Japanese generalist evaluation outputs into auditable labels, rationales, and regression cases for AuraOne Human Data.
Review model outputs, label edge cases, and improve training quality across high-volume AI workflows.
Responsibilities
↳Evaluate Japanese generalist evaluation model outputs against a versioned rubric and assign severity tags.
↳Compare paired responses and pick the stronger answer with a written rationale.
↳Label hallucinations, instruction-following failures, and unsafe content with structured tags.
↳Capture ambiguous prompts and route them back to the program team for rubric updates.
↳Maintain reviewer-quality scores by calibrating against gold-standard examples each week.
↳Document recurring failure modes so the modeling team can target them in the next training run.
Requirements
↳Prior evaluation, annotation, or human-rater experience on Japanese generalist evaluation or adjacent content.
↳Comfort applying multi-page rubrics consistently across long batches.
↳Clear written reasoning that names the issue and the rubric clause being applied.
↳Strong attention to detail and the ability to flag when a prompt itself is the problem.
↳Reliable async availability for at least 10 hours per week.
EXAMPLE TASKS
↳Compare two Japanese generalist evaluation model responses to the same prompt and pick the stronger one with rationale.
↳Tag an unsafe response with the correct policy category and severity.
↳Audit a 50-row batch for rubric consistency and report drift to the program lead.
↳Propose a rubric clarification after spotting a recurring failure mode.
NICE TO HAVE
↳Background in linguistics, content moderation, or trust & safety review.
↳Experience with inter-rater agreement metrics and calibration cycles.
↳Domain expertise that lets you spot subject-matter errors automated checks miss.
Compensation
$49–$98 / hr
Expected schedule: contractor, remote specialist work with program-defined task volume and review pacing.
Skills Used In Matching
Model output evaluation, Rubric-based annotation, Severity tagging, Inter-rater calibration, Japanese generalist evaluation
How To Apply
AuraOne uses a shared specialist intake to confirm track fit, review readiness, and the best queue for your profile. Applications submitted from partner job boards carry the source, role, and category on the apply URL.