Location
Montreal
Posted
June 01, 2026
Commute
Local Area
Job Description
Elevate adaptive AI capabilities with APPIT Software Solutions as a Reinforcement Learning Engineer in Montreal, Canada. Build cutting-edge systems for optimization and autonomous decision-making using advanced reinforcement learning techniques.
In this role, you will design and implement algorithms that tackle enterprise optimization challenges. The position focuses on RLHF alignment for large language models and requires at least 5 years of machine learning experience, including 2 years in reinforcement learning. You will also develop simulation environments to train and evaluate RL agents, and collaborate with research teams to bring RL innovations into production.
Key Responsibilities:
• Design reinforcement learning algorithms for optimization
• Build RLHF and reward modeling pipelines
• Develop environments for RL agent training
• Implement multi-agent systems for coordination tasks
• Optimize RL training stability and ...