gqlxj1987's Blog

Q-learning


Link to the original article

image-20210204112953710 image-20210204113757023

The reward values can be arranged into a matrix.
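As a rough illustration of how such a matrix might be built, here is a minimal sketch. The 4-state environment, the edge list, the `-1` marker for impossible moves, and the reward of 100 for reaching the goal are all assumptions made for this example; they are not taken from the original figures.

```python
# Hypothetical environment: states 0..3, with state 3 as the goal.
# Each pair (s, a) means "from state s the agent may move to state a".
EDGES = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2), (3, 3)]
GOAL = 3
N_STATES = 4


def build_reward_matrix(n_states, edges, goal, goal_reward=100):
    """Return R where R[s][a] is the reward for moving from s to a.

    -1 marks a move that is not allowed (an assumed convention),
    0 marks an allowed move, and goal_reward marks a move into the goal.
    """
    R = [[-1] * n_states for _ in range(n_states)]
    for s, a in edges:
        R[s][a] = goal_reward if a == goal else 0
    return R


R = build_reward_matrix(N_STATES, EDGES, GOAL)
for row in R:
    print(row)
```

Rows index the current state and columns index the action (here, the state moved to), so reading one row shows which moves are available from that state and what each one pays.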

image-20210204113837798

Each exploration by the agent is called an episode: a run from an arbitrary initial state until the goal state is reached.
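The idea of an episode can be sketched as a loop that starts in a random state and keeps acting until the goal is reached. The example below is only a sketch under stated assumptions: the reward matrix `R`, the discount factor `GAMMA = 0.8`, the purely random action choice, and the simplified update Q(s, a) = R(s, a) + γ · max Q(s', ·) (that is, a learning rate of 1 in a deterministic environment) are illustrative choices, not necessarily the ones used in the original article.

```python
import random

# Hypothetical reward matrix: -1 = move not allowed, 0 = allowed,
# 100 = move that enters the goal state (state 3).
R = [
    [-1, 0, -1, -1],
    [0, -1, 0, -1],
    [-1, 0, -1, 100],
    [-1, -1, 0, 100],
]
GOAL = 3
GAMMA = 0.8  # assumed discount factor


def run_episode(Q, R, goal, gamma, rng):
    """Run one episode: random start state, random moves until the goal."""
    state = rng.randrange(len(R))
    steps = 0
    while state != goal:
        # Only moves with a non-negative reward are allowed.
        actions = [a for a, r in enumerate(R[state]) if r >= 0]
        action = rng.choice(actions)
        next_state = action  # in this toy setup, the action is the next state
        # Simplified Q-learning update (learning rate fixed at 1).
        Q[state][action] = R[state][action] + gamma * max(Q[next_state])
        state = next_state
        steps += 1
    return steps


rng = random.Random(0)
Q = [[0.0] * len(R) for _ in R]
for _ in range(200):
    run_episode(Q, R, GOAL, GAMMA, rng)

for row in Q:
    print([round(q, 1) for q in row])
```

An episode that happens to start in the goal state ends immediately without any update. Over many episodes the entries along the path to the goal should settle near 100, 80, and 64 for this toy matrix, so picking the largest entry in each row gives the route to the goal.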

image-20210204114509095
