AI/ML Evaluation Engineer
3 days ago
Truelogic is a leading provider of nearshore staff augmentation services, headquartered in New York. For over two decades, we've been delivering top-tier technology solutions to companies of all sizes, from innovative startups to industry leaders, helping them achieve their digital transformation goals.
Our team of 600+ highly skilled tech professionals, based in Latin America, drives digital disruption by partnering with U.S. companies on their most impactful projects. Whether collaborating with Fortune 500 giants or scaling startups, we deliver results that make a difference.
By applying for this position, you're taking the first step in joining a dynamic team that values your expertise and aspirations. We aim to align your skills with opportunities that foster exceptional career growth and success while contributing to transformative projects that shape the future.
Our Client
A global technology organization with a balanced engineering–creative model, focused on solving complex challenges across emerging technologies, AI, and modern consumer behavior. With multidisciplinary teams and a worldwide footprint, the company delivers secure, high-performance, and accessible digital experiences at scale.
Job Summary
We're looking for an AI/ML Evaluation Engineer to drive the accuracy, reliability, and performance of next-generation AI systems. You'll build evaluation pipelines, metrics, datasets, and automation that ensure model outputs are consistent, safe, and aligned with real-world expectations. This role is fully technical and highly collaborative, working closely with AI engineers, QA, data scientists, and product leaders.
Responsibilities
Write Python and SQL scripts to evaluate outputs from large language models (LLMs).
Design and implement LLM-as-Judge evaluations with clear scoring rubrics (faithfulness, relevance, completeness, correctness).
Define and calculate metrics such as exact match, token-level F1, ROUGE, cosine similarity, and subjective rubric scores.
Build and maintain ground-truth datasets for benchmarking and regression testing.
Automate evaluation workflows and integrate them into CI/CD pipelines.
Analyze large unstructured datasets to identify inconsistencies, anomalies, biases, and missing values.
Diagnose failure modes such as hallucinations, irrelevant answers, and formatting issues.
Produce clear reports summarizing evaluation findings and quality trends.
Collaborate with AI engineers, QA, data scientists, and product managers to define quality standards and release criteria.
Document all processes, evaluation setups, specifications, and architecture diagrams.
Maintain reproducibility and traceability for all evaluation runs and datasets.
Requirements
Advanced Python skills, including writing, debugging, and automating scripts.
Strong SQL proficiency and experience manipulating large datasets.
Hands-on experience with Python libraries such as Pandas and NumPy.
Ability to clean, standardize, and analyze structured and unstructured data.
Experience inspecting datasets, visualizing distributions, and preparing data for analysis.
Solid understanding of large language models, prompt behavior, hallucinations, and grounding concepts.
Knowledge of retrieval-augmented generation (RAG) flows and embedding-based search.
Awareness of vector similarity concepts such as cosine similarity and dot product.
Experience with at least one LLM evaluation framework (RAGAS, TruLens, LangSmith, etc.) or ability to quickly learn one.
Ability to design or implement custom LLM-as-Judge evaluation systems.
Applied understanding of statistical concepts such as variance, confidence intervals, precision/recall, and correlation.
Ability to translate ambiguous quality expectations into measurable metrics.
Familiarity with cloud-run services and automation pipelines, preferably on Google Cloud Platform (GCP).
Ability to learn new infrastructure tools quickly.
Strong analytical and problem-solving abilities for open-ended technical challenges.
Excellent communication skills for collaborating with cross-functional teams and presenting technical findings.
Benefits
100% Remote Work: Enjoy the freedom to work from the location that helps you thrive. All it takes is a laptop and a reliable internet connection.
Highly Competitive USD Pay: Earn excellent, market-leading compensation in USD that goes beyond typical market offerings.
Paid Time Off: We value your well-being. Our paid time off policies ensure you have the chance to unwind and recharge when needed.
Work with Autonomy: Enjoy the freedom to manage your time as long as the work gets done. Focus on results, not the clock.
Work with Top American Companies: Grow your expertise working on innovative, high-impact projects with industry-leading U.S. companies.
A Culture That Values You: We prioritize well-being and work-life balance, offering engagement activities and fostering dynamic teams to ensure you thrive both personally and professionally.
Diverse, Global Network: Connect with over 600 professionals in 25+ countries, expand your network, and collaborate with a multicultural team from Latin America.
Team Up with Skilled Professionals: Join forces with senior talent. All of our team members are seasoned experts, ensuring you're working with the best in your field.