Natural gradient policy for average cost SMDP problem

Anh Vien Ngo, Tae Choong Chung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

Semi-Markov decision processes (SMDPs) are continuous-time generalizations of discrete-time Markov decision processes. A number of value- and policy-iteration algorithms have been developed for solving SMDP problems, but these methods require prior knowledge of the kernels and suffer from the curse of dimensionality. In this paper, we present a steepest-descent method over a family of parameterized policies that overcomes those limitations. The update rule is a stochastic policy gradient employing Amari's natural gradient, which moves the policy toward choosing a greedy optimal action. We then show considerable performance improvements of this method on a simple two-state SMDP problem and on the more complex SMDP of a call admission control problem.
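The abstract's update rule follows Amari's natural gradient: instead of ascending the vanilla gradient of the performance objective, the gradient is preconditioned by the (pseudo-)inverse of the Fisher information matrix of the parameterized policy. The sketch below is a generic illustration of that idea for a softmax policy over a single decision point, not the authors' SMDP algorithm; the reward vector, step size, and two-action setup are illustrative assumptions.

```python
import numpy as np

def softmax(theta):
    """Softmax policy pi_theta(a) over actions."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def natural_gradient_step(theta, rewards, alpha=0.1):
    """One natural-gradient ascent step on the expected reward
    J(theta) = sum_a pi_theta(a) * r(a)."""
    pi = softmax(theta)
    # Vanilla gradient: dJ/dtheta_i = pi_i * (r_i - E_pi[r])
    grad = pi * (rewards - pi @ rewards)
    # Fisher information of the softmax: F = diag(pi) - pi pi^T
    F = np.diag(pi) - np.outer(pi, pi)
    # Natural gradient F^+ grad (pseudo-inverse: F is singular,
    # since shifting all theta_i equally leaves the policy unchanged)
    nat_grad = np.linalg.pinv(F) @ grad
    return theta + alpha * nat_grad

# Toy problem: two actions, action 0 pays more on average
theta = np.zeros(2)
rewards = np.array([1.0, 0.0])
for _ in range(50):
    theta = natural_gradient_step(theta, rewards)
# The policy concentrates probability on the higher-reward action.
```

Unlike the vanilla gradient, whose magnitude shrinks as the policy becomes nearly deterministic, the natural-gradient direction here stays constant, so the policy is driven toward the greedy action without stalling.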

Original language: English
Title of host publication: Proceedings - 19th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2007
Pages: 11-18
Number of pages: 8
DOIs
Publication status: Published - 2007
Event: 19th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2007 - Patras, Greece
Duration: 29 Oct 2007 - 31 Oct 2007

Publication series

Name: Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
Volume: 1
ISSN (Print): 1082-3409

Conference

Conference: 19th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2007
Country/Territory: Greece
City: Patras
Period: 29/10/07 - 31/10/07
