An environment model for nonstationary reinforcement learning

Samuel P.M. Choi, Dit-Yan Yeung, Nevin L. Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Citations (Scopus)

Abstract

Reinforcement learning in nonstationary environments is generally regarded as an important yet difficult problem. This paper partially addresses the problem by formalizing a subclass of nonstationary environments. The environment model, called the hidden-mode Markov decision process (HM-MDP), assumes that environmental changes are always confined to a small number of hidden modes. A mode essentially indexes a Markov decision process (MDP) and evolves over time according to a Markov chain. While the HM-MDP is a special case of partially observable Markov decision processes (POMDPs), modeling an HM-MDP environment via the more general POMDP model unnecessarily increases the problem complexity. A variant of the Baum-Welch algorithm is developed for model learning that requires less data and computation time.
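To make the model concrete, below is a minimal, hypothetical sketch of an HM-MDP as a generative environment, based only on the structure stated in the abstract (modes index MDPs; the mode evolves by a Markov chain). The class name, array shapes, and parameters are illustrative assumptions, not the paper's notation.

    import numpy as np

    class HiddenModeMDP:
        """Minimal HM-MDP sketch: each hidden mode indexes its own MDP,
        and the mode itself evolves according to a Markov chain."""

        def __init__(self, mode_transitions, state_transitions, rewards, rng=None):
            # mode_transitions: (M, M) Markov chain over the hidden modes
            # state_transitions: (M, A, S, S) per-mode dynamics P(s' | s, a; mode)
            # rewards: (M, S, A) per-mode reward function
            self.mode_transitions = np.asarray(mode_transitions)
            self.state_transitions = np.asarray(state_transitions)
            self.rewards = np.asarray(rewards)
            self.rng = rng or np.random.default_rng()
            self.n_modes = self.mode_transitions.shape[0]
            self.n_states = self.state_transitions.shape[-1]

        def step(self, mode, state, action):
            # The agent observes next_state and reward; the mode stays hidden.
            reward = self.rewards[mode, state, action]
            next_state = self.rng.choice(
                self.n_states, p=self.state_transitions[mode, action, state])
            next_mode = self.rng.choice(
                self.n_modes, p=self.mode_transitions[mode])
            return next_mode, next_state, reward

    # Example: two modes with opposite dynamics; modes switch rarely (5%).
    env = HiddenModeMDP(
        mode_transitions=[[0.95, 0.05], [0.05, 0.95]],
        state_transitions=[[[[0.9, 0.1], [0.1, 0.9]]],   # mode 0, action 0
                           [[[0.1, 0.9], [0.9, 0.1]]]],  # mode 1, action 0
        rewards=[[[1.0], [0.0]], [[0.0], [1.0]]],
    )
    mode, state, reward = env.step(mode=0, state=0, action=0)

Since the agent never observes the mode directly, a learner must infer it from the observed state-action-reward sequence; estimating the mode-transition and per-mode MDP parameters from such sequences is the model-learning task that the paper's Baum-Welch variant addresses.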

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 12 - Proceedings of the 1999 Conference, NIPS 1999
Pages: 307-313
Number of pages: 7
Publication status: Published - 2000
Event: 13th Annual Neural Information Processing Systems Conference, NIPS 1999 - Denver, CO, United States
Duration: 29 Nov 1999 - 4 Dec 1999

Publication series

Name: Advances in Neural Information Processing Systems
ISSN (Print): 1049-5258

Conference

Conference: 13th Annual Neural Information Processing Systems Conference, NIPS 1999
Country/Territory: United States
City: Denver, CO
Period: 29/11/99 - 4/12/99
