Record Details

Incorporating and Learning Behavior Constraints for Sequential Decision Making

ScholarsArchive at Oregon State University

Title: Incorporating and Learning Behavior Constraints for Sequential Decision Making
Names: Pinto, Jervis (creator); Fern, Alan P. (advisor)
Date Issued: 2015-06-12 (ISO 8601)
Note: Graduation date: 2015
Abstract: Writing a program that performs well in a complex environment is a challenging task. In such problems, combining deterministic programming with reinforcement learning (RL) can be helpful. However, current systems either force developers to encode their knowledge in very specific forms (e.g., state-action features) or assume advanced RL expertise (e.g., ALISP).

This thesis explores techniques that make it easier for developers, who may not be RL experts, to encode their knowledge in the form of behavior constraints. We begin with the framework of adaptation-based programming (ABP) for writing self-optimizing programs. Next, we show how a certain type of conditional independence, called "influence information", arises naturally in ABP programs, and we propose two algorithms for learning reactive policies that can leverage this knowledge. Using influence information to simplify the credit assignment problem produces significant performance improvements.
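
To illustrate the flavor of ABP and influence information, here is a minimal Python sketch. The `AdaptiveChoice` helper and the epsilon-greedy bandit learner are assumptions made for this example; the actual ABP library and learning algorithms in the thesis differ. The point of the sketch is the influence declaration: each reward is credited only to the choice point that influences it, so unrelated choices are excluded from credit assignment.

    # Minimal ABP-style sketch with a hypothetical AdaptiveChoice helper.
    import random
    from collections import defaultdict

    class AdaptiveChoice:
        """A learnable choice point trained with a simple epsilon-greedy bandit."""
        def __init__(self, name, epsilon=0.1):
            self.name, self.epsilon = name, epsilon
            self.value = defaultdict(float)   # running value estimate per action
            self.count = defaultdict(int)

        def choose(self, actions):
            if random.random() < self.epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: self.value[a])
            self._last = action
            return action

        def reward(self, r):
            # Influence information: credit r only to this choice point,
            # instead of spreading it over every choice in the episode.
            a = self._last
            self.count[a] += 1
            self.value[a] += (r - self.value[a]) / self.count[a]

    # Developer code: an ordinary program with two independent choice points.
    pick_route = AdaptiveChoice("route")
    pick_speed = AdaptiveChoice("speed")

    for episode in range(1000):
        route = pick_route.choose(["north", "south"])
        speed = pick_speed.choose(["slow", "fast"])
        # Declared influences: route affects only the travel reward,
        # speed affects only the fuel reward.
        pick_route.reward(1.0 if route == "north" else 0.0)
        pick_speed.reward(1.0 if speed == "slow" else 0.5)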

Next, we turn our attention to problems in which a simulator allows us to replace reactive decision-making with time-bounded search, which often outperforms purely reactive decision-making but at significant computational cost. We propose a new type of behavior constraint in the form of partial policies, which restrict behavior to a subset of good actions. Using a partial policy to prune sub-optimal actions reduces the action branching factor, thereby speeding up search. We propose three algorithms for learning partial policies offline, based on reducing the learning problem to i.i.d. supervised learning, and we give a reduction-style analysis of each. We give concrete implementations using the popular framework of Monte-Carlo tree search. Experiments on challenging problems demonstrate that the learned partial policies yield large performance improvements in search-based decision-making.
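
The pruning idea can be sketched in a few lines of Python. For brevity this example uses flat Monte-Carlo evaluation rather than full Monte-Carlo tree search, and the simulator interface (`legal_actions`, `step`) and the offline-trained scorer `score_action` are assumptions for illustration, not the implementations from the thesis.

    # Sketch: a learned partial policy prunes the action set before search.
    import random

    def partial_policy(state, actions, score_action, keep=3):
        """Keep only the top-scoring actions, shrinking the branching factor."""
        return sorted(actions, key=lambda a: score_action(state, a),
                      reverse=True)[:keep]

    def flat_mc_decision(state, sim, score_action, rollouts=50, horizon=20):
        """Pick an action by Monte-Carlo evaluation over the pruned action set."""
        best_action, best_value = None, float("-inf")
        for action in partial_policy(state, sim.legal_actions(state), score_action):
            total = 0.0
            for _ in range(rollouts):
                s, ret = sim.step(state, action)
                for _ in range(horizon):  # random default rollout policy
                    s, r = sim.step(s, random.choice(sim.legal_actions(s)))
                    ret += r
                total += ret
            value = total / rollouts
            if value > best_value:
                best_action, best_value = action, value
        return best_action

Because each rollout budget is spent on only the `keep` surviving actions rather than the full action set, the same search time buys deeper and more reliable estimates for the actions that matter.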

Taken together, this thesis outlines a programming framework for injecting different forms of developer knowledge into reactive policy learning algorithms and search-based online planning algorithms. It represents a few small steps towards a programming paradigm that makes it easy to write programs that learn to perform well.
Genre: Thesis/Dissertation
Access Condition: http://creativecommons.org/licenses/by-nc-nd/3.0/us/
Topic: online sequential decision making
Identifier: http://hdl.handle.net/1957/56129
