[1] LONG Nuoya, LI Zipeng, ZHANG Meng, et al. Research on Auto-scaling Strategy for Serverless Platforms Based on PPO[J]. Machinery & Electronics, 2026, 44(03): 102-110.

Research on Auto-scaling Strategy for Serverless Platforms Based on PPO

Machinery & Electronics [ISSN: 1001-2257 / CN: 52-1052/TH]

Volume:
44
Issue:
Issue 03, 2026
Pages:
102-110
Section:
Power Control
Publication date:
2026-03-25

Article Info

Title:
Research on Auto-scaling Strategy for Serverless Platforms Based on PPO
Article ID:
1001-2257(2026)03-0102-09
Author(s):
LONG Nuoya 1; LI Zipeng 1,2; ZHANG Meng 1; ZHENG Yuanwei 1; ZHANG Han 1; TONG Yong 3; WANG Xibin 4
(1. Guizhou Power Grid Co., Ltd., Guiyang 550002, China;
2. College of Computer Science and Technology, Guizhou University, Guiyang 550025, China;
3. China Unicom (Guizhou) Industrial Internet Co., Ltd., Guiyang 550003, China;
4. College of Data Science, Guizhou Institute of Technology, Guiyang 550025, China)
Keywords:
Serverless; automatic scaling; proximal policy optimization (PPO); Markov decision process (MDP)
CLC number:
TP393
Document code:
A
Abstract:
To improve the resource efficiency and service-quality stability of auto-scaling on Serverless platforms, an auto-scaling strategy based on Proximal Policy Optimization (PPO) is proposed. First, building on the Knative elastic scaling architecture, the auto-scaling problem is modeled as a Markov Decision Process (MDP): a state space covering multi-dimensional cluster resource states and load characteristics is constructed, a compound reward function combining throughput, response time, and resource utilization thresholds is designed, and a continuous action space is defined to match Knative's parameter configuration characteristics. Next, a PPO algorithm is designed on the Actor-Critic framework; policy gradient optimization and importance sampling provide stable training and address the insufficient control precision of traditional reinforcement learning methods in continuous action spaces. Finally, the strategy is implemented on the Knative platform, where real-time environment state data are collected to update model parameters and to dynamically adjust resource allocation and instance counts. Experimental results show that, compared with a Q-Learning-based auto-scaling strategy and the platform's default KPA strategy, the PPO-based strategy improves average throughput by 19.3% and 106.1%, reduces average response latency by 12 ms and 108 ms, and reduces P90 response latency by 50 ms and 223 ms, respectively. Under concurrent workloads, it provides a better level of service quality for Serverless cloud computing platforms.
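The abstract attributes training stability to policy gradient optimization combined with importance sampling. As an illustration only, the sketch below computes the clipped surrogate objective used in the most common PPO variant. The abstract does not state which PPO variant, clipping range, or advantage estimator the authors used, so the function name `ppo_clip_objective`, the default `clip_eps=0.2`, and the input layout are all assumptions rather than details from the paper.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    logp_new / logp_old: log-probabilities of the sampled actions under the
    current policy and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    clip_eps: clipping range; 0.2 is an assumed, commonly used default.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Importance sampling ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the two estimates.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, a ratio above `1 + clip_eps` earns no extra objective value, which limits how far a single update can move the policy. With a negative advantage the unclipped term is kept, so a harmful shift is still penalized in full.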


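Memo

Received: 2025-06-12
Author biographies: LONG Nuoya (1991-), male, from Guiyang, Guizhou; research interests include communication softswitch technology and voice platforms. ZHANG Meng (1983-), male, from Anshan, Liaoning; research interests include power communication networks and applications of power Internet of Things technology. ZHENG Yuanwei (1988-), male, from Panzhou, Guizhou; master's degree, senior engineer; research interests include power communication technology and optical fiber sensing technology; corresponding author, E-mail: z214292461@126.com. ZHANG Han (1984-), female, from Weng'an, Guizhou; master's degree, senior engineer; research interest is power grid information and communication. WANG Xibin (1985-), male, from Luoyang, Henan; Ph.D., professor; research interests are machine learning and service computing.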
Last Update: 2026-04-29