AI Benchmark Engineer – Research & Knowledge
Turing · Ghana
Job description
About the role
Turing is seeking a highly analytical AI Benchmark Engineer to design and evaluate multi‑agent benchmark tasks for frontier AI research. This remote contractor role blends deep research expertise with practical engineering to create rigorous evaluation datasets and tools.
Key responsibilities
- Build multi‑agent benchmark tasks that require reading, analyzing, and synthesizing large document collections.
- Curate real‑world research corpora—including academic papers, case studies, and technical reports—and design comprehensive analysis questions.
- Write structured ground‑truth oracles in JSON with verifiable answers, so that agents can answer correctly only by reading the source material.
- Design LLM judge prompts that evaluate agent outputs field‑by‑field against the oracle.
- Create decomposition guides that split research across parallel sub‑agents for document‑level and domain‑level processing, followed by synthesis.
Required profile
- 5+ years of research experience (academic or industry) in any scientific domain.
- Strong reading comprehension, with the ability to extract structured data from unstructured text.
- Close attention to detail when creating precise evaluation oracles.
Required skills
- Proficiency in Python scripting for data processing and evaluation.
- Strong experience with JSON schema design and output validation.
- Hands‑on experience writing Dockerfiles, building images, and debugging containers.
What we offer
- Fully remote work environment.
- Opportunity to contribute to cutting‑edge AI projects with leading LLM companies.
- Potential contract extension based on performance and project needs.
Published 2 hours ago
Expires 1 month from now