Devil King's Blog
Q-learning
4 Feb, 2021
// content
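Original article
The reward values can be arranged into a matrix, with one entry for each state-action pair.
Each exploration by the agent is called an episode: a run that starts from an arbitrary initial state and ends when the goal state is reached.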
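The two notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original article's example: the 4-state environment, the reward values, the discount factor `GAMMA = 0.8`, and the helper names `train` and `greedy_path` are all made up here. It uses the simplest deterministic update, Q(s, a) = R(s, a) + γ · max Q(s', ·), which corresponds to a learning rate of 1.

```python
import random

# Hypothetical 4-state environment: states 0..3, state 3 is the goal.
# R[s][a] is the reward for moving from state s to state a;
# None marks a move that is not allowed.
R = [
    [None, 0,    None, None],
    [0,    None, 0,    100],
    [None, 0,    None, 100],
    [None, None, None, None],  # goal row is never used: episodes stop at the goal
]
GOAL = 3
GAMMA = 0.8  # discount factor (assumed value)


def allowed(state):
    """Actions (target states) that are permitted from the given state."""
    return [a for a in range(len(R)) if R[state][a] is not None]


def train(episodes=500, seed=0):
    """Fill in the Q matrix by running random episodes."""
    rng = random.Random(seed)
    n = len(R)
    Q = [[0.0] * n for _ in range(n)]
    for _ in range(episodes):
        # One episode: start in any state, act until the goal is reached.
        state = rng.randrange(n)
        while state != GOAL:
            action = rng.choice(allowed(state))
            next_state = action  # here an action means "move to that state"
            # Deterministic Q-learning update (learning rate 1).
            Q[state][action] = R[state][action] + GAMMA * max(Q[next_state])
            state = next_state
    return Q


def greedy_path(Q, start):
    """Follow the highest-valued allowed action from start to the goal."""
    path = [start]
    while path[-1] != GOAL and len(path) <= len(Q):
        s = path[-1]
        path.append(max(allowed(s), key=lambda a: Q[s][a]))
    return path
```

Calling `Q = train()` and then `greedy_path(Q, 0)` walks from state 0 to the goal by always taking the action with the largest learned Q value. With a stochastic environment, a learning rate below 1 would normally be used instead of this direct assignment.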
// end
#
ML