Academic Researcher – Remote (30+ hrs/week)
Crossing Hurdles · United States
Job description
About the role
We are launching a frontier model evaluation initiative to advance next‑generation large language model systems across technical and professional domains. The role involves designing and validating real‑world benchmark tasks, building executable Python problem sets, and evaluating model outputs.
Key responsibilities
- Design and validate challenging, real‑world benchmark tasks rooted in your academic or professional expertise.
- Build executable Python‑based problem sets with clear specifications and verifiable test cases for agentic workflows.
- Evaluate model outputs to identify reasoning, logic, and problem‑solving failures in complex scenarios.
- Develop structured gold‑standard solutions and rubrics to support consistent evaluation frameworks.
- Analyze system behavior to surface capability gaps and failure modes in advanced reasoning tasks.
- Collaborate with domain experts across STEM and quantitative disciplines to improve evaluation quality and rigor.
Required profile
- Current or retired professor, or PhD candidate, in STEM or professional disciplines such as computer science, mathematics, physics, engineering, statistics, economics, finance, law, or related fields.
- Strong academic background from a top‑tier university or equivalent research environment.
- Working proficiency in Python applied in research, coursework, or professional settings.
- Ability to design and implement executable problem‑solving tasks and computational workflows.
- Strong analytical thinking, with the ability to assess logical correctness and system behavior.
- Ability to work independently for at least 30 hours per week on weekdays.
Required skills
- Python
Published 4 hours ago
Expires 1 month from now