TY - GEN
T1 - Reinforcement learning in episodic non-stationary Markovian environments
AU - Choi, Samuel Ping Man
AU - Zhang, Nevin L.
AU - Yeung, Dit Yan
PY - 2004
Y1 - 2004
N2 - Reinforcement learning in non-stationary environments is generally regarded as a very difficult problem. Without any prior knowledge about the environment, this problem can be unsolvable in the worst case. In this paper, we attempt to partially address this grand challenge by formalizing a broad class of non-stationary Markovian environments whose state space, action space, transition function, and reward (or cost) function may change over time, but with some regularity. We call these environments episodic non-stationary Markovian environments (ENMEs); they form a fairly common class of non-stationary environments for characterizing many real-world decision problems. We begin with a special subclass of ENMEs called periodic non-stationary Markovian environments (PNMEs) and then extend this subclass to more general and realistic forms. Finally, we show how the episodic property can be exploited to make these problems solvable by combining conventional reinforcement learning algorithms with the state augmentation method.
UR - http://www.scopus.com/inward/record.url?scp=12744269934&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:12744269934
SN - 1932415335
SN - 9781932415339
T3 - Proceedings of the International Conference on Artificial Intelligence, IC-AI'04
SP - 752
EP - 758
BT - Proceedings of the International Conference on Artificial Intelligence, IC-AI'04 and Proceedings of the International Conference on Machine Learning; Models, Technologies and Applications, MLMTA'04
A2 - Arabnia, H.R.
A2 - Mun, Y.
T2 - Proceedings of the International Conference on Artificial Intelligence, IC-AI'04
Y2 - 21 June 2004 through 24 June 2004
ER -
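
Note on the abstract's technique: for the periodic subclass (PNME), the state augmentation method amounts to appending the current phase (time step modulo the period) to the observed state, which turns the time-varying process into a stationary MDP that a conventional algorithm such as Q-learning can solve. The sketch below is a minimal illustration under assumed conventions, not the paper's exact formulation; the environment interface (reset/step/n_actions) and the period parameter are hypothetical.

import random
from collections import defaultdict

def q_learning_pnme(env, period, episodes=5000,
                    alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning on a periodic non-stationary Markovian
    environment, made stationary by state augmentation.

    Assumed interface (illustrative only):
      env.reset() -> state
      env.step(a) -> (next_state, reward, done)
      env.n_actions -> number of discrete actions
    Dynamics are assumed to depend on time only through the phase
    t % period, so the augmented state (s, phase) is Markovian.
    """
    Q = defaultdict(float)  # Q[(augmented_state, action)] -> value
    for _ in range(episodes):
        s, t, done = env.reset(), 0, False
        while not done:
            aug = (s, t % period)  # state augmentation with the phase
            if random.random() < epsilon:
                a = random.randrange(env.n_actions)
            else:
                a = max(range(env.n_actions), key=lambda act: Q[aug, act])
            s2, r, done = env.step(a)
            aug2 = (s2, (t + 1) % period)
            target = r if done else r + gamma * max(
                Q[aug2, act] for act in range(env.n_actions))
            Q[aug, a] += alpha * (target - Q[aug, a])
            s, t = s2, t + 1
    return Q

A greedy policy is then read off the augmented table as argmax over actions of Q[(s, t % period), a]; the cost is a |S| x period blow-up of the effective state space, which is the trade-off the augmentation makes to regain stationarity.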