gqlxj1987's Blog
Q-learning
4 Feb, 2021
Original article link
The reward values can be arranged into a matrix, with one entry for each state and action.
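As a rough illustration of such a reward matrix, here is a minimal sketch. The environment is an assumption made for this example and does not come from the original article: six states numbered 0 to 5, state 5 as the goal, and a hypothetical set of edges. The names `N_STATES`, `GOAL`, `EDGES`, and `build_reward_matrix` are also hypothetical. An entry of -1 marks a move that is not allowed, 0 an allowed move, and 100 a move into the goal.

```python
# Hypothetical environment: 6 states (0..5), state 5 is the goal.
N_STATES = 6
GOAL = 5
# Undirected connections between states; an assumed layout for illustration only.
EDGES = [(0, 4), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5)]


def build_reward_matrix(n_states, edges, goal):
    """Return R where R[s][a] is the reward for moving from state s to state a."""
    # -1 marks a move that is not allowed (no connection between s and a).
    R = [[-1] * n_states for _ in range(n_states)]
    for s, a in edges:
        # Moving into the goal earns 100; any other allowed move earns 0.
        R[s][a] = 100 if a == goal else 0
        R[a][s] = 100 if s == goal else 0
    # Staying in the goal is rewarded too, which makes the goal absorbing.
    R[goal][goal] = 100
    return R


R = build_reward_matrix(N_STATES, EDGES, GOAL)
for row in R:
    print(row)
```

Here the action is identified with the next state, so the matrix is square. That is a simplification of this toy setup, not a general property of Q-learning.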
Each exploration by the agent is called an episode: a run from an arbitrary initial state until the goal state is reached.
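The episode idea can be sketched as a small training loop. Everything concrete below is an assumption for illustration rather than something stated in the original notes: the 6-state reward matrix `R`, the discount factor `GAMMA = 0.8`, purely random action selection, and the simplified update Q(s, a) = R(s, a) + γ · max Q(s', ·), which corresponds to a learning rate of 1 in a deterministic environment. The helper names `run_episode`, `train`, and `greedy_path` are hypothetical.

```python
import random

GOAL = 5
GAMMA = 0.8  # discount factor (assumed value)

# Assumed reward matrix: -1 = move not allowed, 0 = allowed, 100 = reaches the goal.
R = [
    [-1, -1, -1, -1, 0, -1],
    [-1, -1, -1, 0, -1, 100],
    [-1, -1, -1, 0, -1, -1],
    [-1, 0, 0, -1, 0, -1],
    [0, -1, -1, 0, -1, 100],
    [-1, 0, -1, -1, 0, 100],
]


def run_episode(Q, rng):
    """One episode: start in a random state and act randomly until the goal is reached."""
    state = rng.randrange(len(R))
    while state != GOAL:
        # Only moves with a non-negative reward are allowed from this state.
        actions = [a for a, r in enumerate(R[state]) if r >= 0]
        action = rng.choice(actions)
        # Simplified update (learning rate 1): Q(s, a) = R(s, a) + gamma * max Q(s', .)
        Q[state][action] = R[state][action] + GAMMA * max(Q[action])
        state = action  # in this toy setup the action is the next state


def train(n_episodes=500, seed=0):
    """Run many episodes and return the learned Q table."""
    rng = random.Random(seed)
    Q = [[0.0] * len(R) for _ in range(len(R))]
    for _ in range(n_episodes):
        run_episode(Q, rng)
    return Q


def greedy_path(Q, start):
    """Follow the highest-valued action from start; stop at the goal or after len(Q) steps."""
    path = [start]
    while path[-1] != GOAL and len(path) <= len(Q):
        row = Q[path[-1]]
        path.append(row.index(max(row)))
    return path


Q = train()
print(greedy_path(Q, 2))
```

Each call to `run_episode` is one episode in the sense above. The Q table persists across episodes, so later episodes build on what earlier ones learned.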