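Reinforcement learning is an area of machine learning concerned with how an agent should act in an environment so as to maximize its expected reward.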
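The shift runs from the original RLHF (reinforcement learning from human feedback), to RLVR (reinforcement learning with verifiable rewards), and then to the frontier of "natural-language rewards".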
Enhancing Math Reasoning in Small-sized LLMs via Preview Difficulty-Aware Interv...
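Active matter refers to systems composed of self-propelled entities that consume energy to produce motion and exhibit complex non-equilibrium dynamics that challenge traditional models. With the rapid development of machine learning, reinforcement learning (RL) has emerged as a means of tackling active...
Hierarchical reinforcement learning (HRL) has proven effective on long-horizon and sparse-reward tasks by decomposing complex decision processes, but because of inter-level instability, inefficient subgoal scheduling, response latency, and poor interpretability, its ... in real...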
Quantum Reinforcement Learning by Adaptive Non-local Observables
Deeply felt affect: the emergence of valence in deep active inference
A Survey of Reinforcement Learning for Large Reasoning Models
A Tutorial on Meta-Reinforcement Learning
A Survey of Reinforcement Learning for Optimization in Automation
Lifelong Reinforcement Learning with Similarity-Driven Weighting by Large Models
Are Reasoning Models More Prone to Hallucination?
https://www.nature.com/articles/s42256-025-00983-2
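In post-training, we go beyond standard supervised fine-tuning. We implement a sequential reinforcement learning pipeline: it starts with reasoning RL, continues with agentic RL, and ends with general RL.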
Epistemically-guided forward-backward exploration
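Reinforcement learning from human feedback (RLHF) has become an important technical and storytelling tool for deploying the latest machine learning systems. In this book, we hope to give readers with some quantitative background a concise introduction to the core methods. The book first...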
Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmark...
Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Random...
DIME: Diffusion-Based Maximum Entropy Reinforcement Learning
Reinforcement Learning for Reasoning in Large Language Models with One Training ...
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-trai...