Ma Yi

Final Education:
Final Degree: Ph.D.
Research Interests: Reinforcement Learning, Embodied Intelligence
Graduate Advisor:
Email:
Phone: 0351-7010566

Biography

Ma Yi, Ph.D., Lecturer. He studied under Prof. Hao Jianye and received his Ph.D. in 2024 from the Reinforcement Learning Lab of the College of Intelligence and Computing, Tianjin University. His research focuses on reinforcement learning, embodied intelligence, and applications of reinforcement learning. In recent years he has published more than 20 papers at top international conferences in artificial intelligence and data mining, including NeurIPS, ICML, ICLR, AAAI, IJCAI, KDD, and CIKM, and serves as a reviewer for various international conferences and journals. His honors include the Second Prize of the Huawei 2012 Laboratories Innovation Pioneer Award, dual-track champion of the NeurIPS 2022 autonomous driving competition, First Prize in the national wargaming competition of the Military Science and Technology Commission, the Tianjin University Outstanding Doctoral Dissertation award, and recognition as a NeurIPS Top Reviewer. His research has been piloted and deployed in multiple settings, including Alimama ad bidding, Huawei logistics transportation, AITO autonomous driving, and Alipay negotiation-based recommendation.

Projects Led or Participated In

[1] Key Technologies for General Game-Playing Intelligent Decision-Making, subtopic of a Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project, participant

[2] Research on Theories and Methods for Data- and Knowledge-Driven Intelligent Decision-Making, National Natural Science Foundation Major Research Plan cultivation project, participant

[3] Research on Multi-XXX Cooperative Strategies Based on Multi-Agent Reinforcement Learning, subtopic of a National Defense Science and Technology Innovation Special Zone project, participant

[4] XXXXX Technology for Contingency Scenarios, project with an aerospace research institute, student lead

[5] Intelligent Online Policy Scheduling Technology for XXXXX, project with an aerospace research institute, student lead


Publications

[1] Ma Y, Hao J, Liang H, Xiao C. Rethinking Decision Transformer via Hierarchical Reinforcement Learning. ICML 2024. (CCF Rank A conference)

[2] Ma Y, Tang H, Li D, Meng Z. Reining Generalization in Offline Reinforcement Learning via Representation Distinction. NeurIPS 2023. (CCF Rank A conference)

[3] Ma Y, Wang C, Chen C, Liu J, Meng Z, Zheng Y, Hao J. OSCAR: OOD State-Conservative Offline Reinforcement Learning for Sequential Decision Making. CAAI Artificial Intelligence Research, 2023.

[4] Ma Y, Hao X, Hao J, Lu J, Liu X, Tong X, Yuan M, Li Z, Tang J, Meng Z. A Hierarchical Reinforcement Learning Based Optimization Framework for Large-scale Dynamic Pickup and Delivery Problems. NeurIPS 2021. (CCF Rank A conference)

[5] Liang H (co-first author), Ma Y (co-first author), Cao Z, Liu T, Ni F, Li Z, Hao J. SplitNet: A Reinforcement Learning Based Sequence Splitting Method for the MinMax Multiple Travelling Salesman Problem. AAAI 2023. (CCF Rank A conference)

[6] Hao X (co-first author), Peng Z (co-first author), Ma Y (co-first author), Wang G, Jin J, Hao J, Chen S, Bai R, Xie M, Xu M, Zheng Z. Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising. ICML 2020. (CCF Rank A conference)

[7] Liu L (co-first author), Ma Y (co-first author), Zhu X, Yang Y, Hao X, Wang L, Peng J. Integrating Sequence and Network Information to Enhance Protein-Protein Interaction Prediction Using Graph Convolutional Networks. BIBM 2019. (CCF Rank B conference)

[8] Liu J, Hao J, Ma Y, Xia S. Imagine Big from Small: Unlock the Cognitive Generalization of Deep Reinforcement Learning from Simple Scenarios. ICML 2024. (CCF Rank A conference)

[9] Zhao K, Hao J, Ma Y, Liu J, Zheng Y, Meng Z. ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles. IJCAI 2024. (CCF Rank A conference)

[10] Yuan Y, Hao J, Ma Y, Dong Z, Liang H, Liu J, Feng Z, Zhao K, Zheng Y. Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback. ICLR 2024. (CAAI & Tsinghua Rank A conference)

[11] Liu J, Ma Y, Hao J, Hu Y, Zheng Y, Lv T, Fan C. A Trajectory Perspective on the Role of Data Sampling Techniques in Offline Reinforcement Learning. AAMAS 2024. (CCF Rank B conference)

[12] Liang H, Dong Z, Ma Y, Hao X, Zheng Y, Hao J. A Hierarchical Imitation Learning-based Decision Framework for Autonomous Driving. CIKM 2023. (CCF Rank B conference)

[13] Sang T, Tang H, Ma Y, Hao J, Zheng Y, Meng Z, Li B, Wang Z. PAnDR: Fast Adaptation to New Environments from Offline Experiences via Decoupling Policy and Environment Representations. IJCAI 2022. (CCF Rank A conference)

[14] Ni F, Hao J, Lu J, Tong X, Yuan M, Duan J, Ma Y, He K. A Multi-Graph Attributed Reinforcement Learning based Optimization Algorithm for Large-scale Hybrid Flow Shop Scheduling Problem. KDD 2021. (CCF Rank A conference)

[15] Wang H, Tang H, Hao J, Hao X, Fu Y, Ma Y. Large Scale Deep Reinforcement Learning in War-games. BIBM 2020. (CCF Rank B conference)

[16] Zhang P, Hao J, Wang W, Tang H, Ma Y, Duan Y, Zheng Y. KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge. IJCAI 2020. (CCF Rank A conference)