I’m a third-year MS/PhD student at the University of Massachusetts. I am a member of the Autonomous Learning Lab (ALL) and am fortunate to be advised by Prof. Philip Thomas. My primary interest is continual learning, a branch of artificial intelligence that aims to teach machines new concepts over time. My research lies mostly at the intersection of reinforcement learning, optimization, and machine learning, and I enjoy drawing inspiration from neuroscience as well.



Recent

  • Our papers on (a) optimizing for the future in non-stationary MDPs and (b) evaluating the performance of RL algorithms got accepted at ICML’20!
  • Received an Outstanding Student Paper Honorable Mention at AAAI’20 for our paper on lifelong learning with a changing action set.
  • Our papers on (a) lifelong learning with a changing action set and (b) RL when all actions are not always available got accepted at AAAI’20!
  • Our paper on learning action representations got accepted at ICML’19!
  • I will be interning at Adobe Research with Sridhar Mahadevan and Georgios Theocharous during Summer 2019.



Publications

2020

Optimizing for the Future in Non-Stationary MDPs
Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip Thomas
Thirty-seventh International Conference on Machine Learning (ICML 2020)

Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process is stationary. However, in many real-world applications, this assumption is violated, and using existing algorithms may result in a performance lag. To proactively search for a good future policy, we present a policy gradient algorithm that maximizes a forecast of future performance. This forecast is obtained by fitting a curve to the counter-factual estimates of policy performance over time, without explicitly modeling the underlying non-stationarity. The resulting algorithm amounts to a non-uniform reweighting of past data, and we observe that minimizing performance over some of the data from past episodes can be beneficial when searching for a policy that maximizes future performance. We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques, on three simulated problems motivated by real-world applications.
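
A minimal sketch of the curve-fitting idea (illustrative, not the paper’s implementation; the function name is mine): fit a least-squares polynomial to past per-episode performance estimates and extrapolate one episode ahead. The forecast is then a fixed linear reweighting of the past estimates, matching the “non-uniform reweighting of past data” view above, and some of the weights can come out negative.

    import numpy as np

    def forecast_weights(k, degree=2):
        """Weights w such that w @ y is the degree-d least-squares
        forecast of performance at episode k+1, given estimates
        y[0..k-1] from episodes 1..k."""
        t = np.arange(1, k + 1)
        Phi = np.vander(t, degree + 1)                       # polynomial basis over past episodes
        phi_next = np.vander(np.array([k + 1]), degree + 1)  # basis at the next episode
        # OLS forecast: phi_next @ pinv(Phi) @ y  ==  w @ y
        return (phi_next @ np.linalg.pinv(Phi)).ravel()

    w = forecast_weights(k=20)
    y = np.random.randn(20).cumsum()   # stand-in performance estimates
    print(w.round(2))                  # recent episodes get large weight; some weights are negative
    print(w @ y)                       # forecast of next-episode performance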

Evaluating the Performance of Reinforcement Learning Algorithms
Scott Jordan, Yash Chandak, Daniel Cohen, Mengxue Zhang, Philip Thomas
Thirty-seventh International Conference on Machine Learning (ICML 2020)

Performance evaluations are critical for quantifying algorithmic advances in reinforcement learning. Recent reproducibility analyses have shown that reported performance results are often inconsistent and difficult to replicate. In this work, we argue that the inconsistency of performance stems from the use of flawed evaluation metrics. Taking a step towards ensuring that reported results are consistent, we propose a new comprehensive evaluation methodology for reinforcement learning algorithms that produces reliable measurements of performance both on a single environment and when aggregated across environments. We demonstrate this method by evaluating a broad class of reinforcement learning algorithms on standard benchmark tasks.
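
One ingredient of such a methodology, sketched under simplifying assumptions (per-environment min-max normalization of a single algorithm’s scores and a plain percentile bootstrap; the paper’s exact procedure differs): make environments comparable before aggregating, and report a confidence interval rather than a bare mean.

    import numpy as np

    def aggregate_with_ci(scores, n_boot=10000, alpha=0.05, seed=0):
        """scores: dict env -> equal-length array of per-trial performances.
        Returns the mean normalized score and a percentile-bootstrap
        confidence interval over trials."""
        rng = np.random.default_rng(seed)
        normed = []
        for s in scores.values():
            s = np.asarray(s, dtype=float)
            span = s.max() - s.min()
            # Min-max normalize within each environment so scales are comparable.
            normed.append((s - s.min()) / span if span > 0 else np.zeros_like(s))
        normed = np.stack(normed)                 # (n_envs, n_trials)
        n = normed.shape[1]
        boots = [normed[:, rng.integers(0, n, n)].mean() for _ in range(n_boot)]
        lo, hi = np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        return normed.mean(), (lo, hi)

    # Made-up numbers for two environments, five trials each.
    print(aggregate_with_ci({"env_a": [1, 3, 2, 5, 4], "env_b": [90, 70, 80, 60, 100]}))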

Lifelong Learning with a Changing Action Set
Yash Chandak, Georgios Theocharous, Chris Nota, Philip Thomas
(Oral) Thirty-fourth AAAI Conference on Artificial Intelligence (AAAI 2020)
Outstanding Student Paper Honorable Mention.

In many real-world sequential decision making problems, the number of available actions (decisions) can vary over time. While problems like catastrophic forgetting, changing transition dynamics, and changing reward functions have been well-studied in the lifelong learning literature, the setting where the action set changes remains unaddressed. In this paper, we present an algorithm that autonomously adapts to an action set whose size changes over time. To tackle this open problem, we break it into two problems that can be solved iteratively: inferring the underlying, unknown structure in the space of actions, and optimizing a policy that leverages this structure. We demonstrate the efficiency of this approach on large-scale real-world lifelong learning problems.
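
A toy sketch of how these two pieces fit together (the setup is hypothetical): if the policy acts in a learned action-representation space, then growing the action set only grows the embedding table, and the policy itself is reused.

    import numpy as np

    rng = np.random.default_rng(0)

    def nearest_action(e, action_embs):
        # Map a point in representation space to the closest available action.
        return int(np.argmin(((action_embs - e) ** 2).sum(axis=1)))

    # Five actions are known so far, each with an inferred 2-D embedding.
    action_embs = rng.normal(size=(5, 2))
    policy_out = rng.normal(size=2)      # point chosen by the (fixed) policy
    print(nearest_action(policy_out, action_embs))

    # Three new actions arrive: infer embeddings for them (random stand-ins
    # here) and append them; the policy is unchanged.
    action_embs = np.vstack([action_embs, rng.normal(size=(3, 2))])
    print(nearest_action(policy_out, action_embs))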

Reinforcement Learning When All Actions are Not Always Available
Yash Chandak, Georgios Theocharous, Blossom Metevier, Philip Thomas
Thirty-fourth AAAI Conference on Artificial Intelligence (AAAI 2020)

The Markov decision process (MDP) formulation used to model many real-world sequential decision making problems does not capture the setting where the set of available decisions (actions) at each time step is stochastic. Recently, the stochastic action set Markov decision process (SAS-MDP) formulation has been proposed, which captures the concept of a stochastic action set. In this paper we argue that existing RL algorithms for SAS-MDPs suffer from divergence issues, and present new algorithms for SAS-MDPs that incorporate variance reduction techniques unique to this setting, and provide conditions for their convergence. We conclude with experiments that demonstrate the practicality of our approaches using several tasks inspired by real-life use cases wherein the action set is stochastic.
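
For concreteness, here is the basic tabular update for this setting, where the bootstrap max ranges only over the actions that were actually available in the next state. This is the baseline SAS-style update the paper starts from, not its new variance-reduced algorithms.

    import numpy as np

    def sas_q_update(Q, s, a, r, s_next, avail_next, alpha=0.1, gamma=0.99):
        """One SAS Q-learning step: bootstrap only over available actions."""
        target = r + gamma * Q[s_next, avail_next].max()
        Q[s, a] += alpha * (target - Q[s, a])

    # Toy usage: 3 states, 4 actions; a random subset is available each step.
    rng = np.random.default_rng(0)
    Q = np.zeros((3, 4))
    avail = rng.choice(4, size=2, replace=False)   # actions available at s_next
    sas_q_update(Q, s=0, a=1, r=1.0, s_next=2, avail_next=avail)
    print(Q)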

2019

Learning Action Representations for Reinforcement Learning
Yash Chandak, Georgios Theocharous, James Kostas, Scott Jordan, Philip Thomas
Thirty-sixth International Conference on Machine Learning (ICML 2019)

Most model-free reinforcement learning methods leverage state representations (embeddings) for generalization, but either ignore structure in the space of actions or assume the structure is provided a priori. We show how a policy can be decomposed into a component that acts in a low-dimensional space of action representations and a component that transforms these representations into actual actions. These representations improve generalization over large, finite action sets by allowing the agent to infer the outcomes of actions similar to actions already taken. We provide an algorithm to both learn and use action representations and provide conditions for its convergence. The efficacy of the proposed method is demonstrated on large-scale real-world problems.
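
A minimal sketch of the decomposition (sizes and the parameterization are illustrative): an internal policy picks a point in a low-dimensional representation space, and a separate component, here a nearest-neighbor lookup, turns that point into a concrete action.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical sizes: 1000 discrete actions, 4-D representations, 8-D states.
    n_actions, d_e, d_s = 1000, 4, 8
    E = rng.normal(size=(n_actions, d_e))   # learned action representations
    W = rng.normal(size=(d_s, d_e)) * 0.1   # internal policy parameters

    def act(state, sigma=0.5):
        """Sample in representation space, then map to the nearest action."""
        mu = state @ W                           # internal policy mean
        e = mu + sigma * rng.normal(size=d_e)    # sample a representation
        a = int(np.argmin(((E - e) ** 2).sum(axis=1)))
        return a, e                              # gradients flow through e, not a

    print(act(rng.normal(size=d_s))[0])

Because nearby points in representation space map to actions with similar outcomes, experience with one action informs the policy about its neighbors, which is where the generalization over large action sets comes from.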

Improving Generalization over Large Action Sets
Yash Chandak, Georgios Theocharous, James Kostas, Scott Jordan, Philip Thomas
(Oral) 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2019)

Evaluating Reinforcement Learning Algorithms Using Cumulative Distributions of Performance
Scott Jordan, Yash Chandak, Mengxue Zhang, Daniel Cohen, Philip Thomas
4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2019)

Classical Policy Gradient: Preserving Bellman’s Principle of Optimality
Philip Thomas, Scott Jordan, Yash Chandak, Chris Nota, James Kostas
Technical Report.

We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.

2018

Reinforcement Learning with a Dynamic Action Set
Yash Chandak, Georgios Theocharous, James Kostas, Philip Thomas
Continual Learning workshop at the Thirty-second Conference on Neural Information Processing Systems (NeurIPS 2018)

Fusion Graph Convolutional Networks
Priyesh Vijayan, Yash Chandak, Mitesh Khapra, Balaraman Ravindran
14th International Workshop on Machine Learning with Graphs, 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2018).

Semi-supervised node classification in attributed graphs, i.e., graphs with node features, involves learning to classify unlabeled nodes given a partially labeled graph. Label predictions are made by jointly modeling a node's features and those of its neighborhood. State-of-the-art models for node classification on such attributed graphs use differentiable recursive functions that enable aggregation and filtering of neighborhood information from multiple hops. In this work, we analyze the representation capacity of these models to regulate information from multiple hops independently. From our analysis, we conclude that these models, despite being powerful, have limited representation capacity to capture multi-hop neighborhood information effectively. Further, we propose a mathematically motivated, yet simple, extension to existing graph convolutional networks (GCNs) with improved representation capacity. We extensively evaluate the proposed model, F-GCN, on eight popular datasets from different domains. F-GCN outperforms the state-of-the-art models for semi-supervised learning on six datasets while being extremely competitive on the other two.
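
A sketch of the fusion idea as described above (not the released code): compute a GCN-style representation at every hop and concatenate them, so each hop's information reaches the classifier directly instead of being filtered through deeper layers.

    import numpy as np

    def normalize_adj(A):
        """Symmetrically normalized adjacency with self-loops (GCN-style)."""
        A_hat = A + np.eye(A.shape[0])
        d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
        return d_inv_sqrt @ A_hat @ d_inv_sqrt

    def fusion_gcn_forward(A, X, weights):
        """Fuse the input features with every hop's representation."""
        A_norm = normalize_adj(A)
        H, hops = X, [X]
        for W in weights:
            H = np.maximum(A_norm @ H @ W, 0)   # one propagation step + ReLU
            hops.append(H)
        return np.concatenate(hops, axis=1)     # fused multi-hop representation

    # Tiny example: 4 nodes, 3 features, two hops of width 5.
    rng = np.random.default_rng(0)
    A = np.triu((rng.random((4, 4)) < 0.5).astype(float), 1)
    A = A + A.T
    X = rng.normal(size=(4, 3))
    weights = [rng.normal(size=(3, 5)) * 0.1, rng.normal(size=(5, 5)) * 0.1]
    print(fusion_gcn_forward(A, X, weights).shape)   # (4, 3 + 5 + 5)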

HOPF: Higher Order Propagation Framework for Deep Collective Classification
Priyesh Vijayan, Yash Chandak, Mitesh Khapra, Balaraman Ravindran
Eighth International Workshop on Statistical Relational AI at the 27th International Joint Conference on Artificial Intelligence (IJCAI 2018).

Given a graph where every node has certain attributes associated with it and some nodes have labels associated with them, Collective Classification (CC) is the task of assigning labels to every unlabeled node using information from the node as well as its neighbors. It is often the case that a node is not only influenced by its immediate neighbors but also by higher-order neighbors, multiple hops away. Recent state-of-the-art models for CC learn end-to-end differentiable variations of Weisfeiler-Lehman (WL) kernels to aggregate multi-hop neighborhood information. In this work, we propose a Higher Order Propagation Framework, HOPF, which provides an iterative inference mechanism for these powerful differentiable kernels. Such a combination of a classical iterative inference mechanism with recent differentiable kernels allows the framework to learn graph convolutional filters that simultaneously exploit the attribute and label information available in the neighborhood. Further, these iterative differentiable kernels can scale to larger hops beyond the memory limitations of existing differentiable kernels. We also show that existing WL kernel-based models suffer from the problem of Node Information Morphing, where the information of a node is morphed or overwhelmed by the information of its neighbors when considering multiple hops. To address this, we propose a specific instantiation of HOPF, called the NIP models, which preserves the node information at every propagation step. The iterative formulation of NIP models further helps in incorporating distant-hop information concisely as summaries of the inferred labels. We perform an extensive evaluation across 11 datasets from different domains and show that existing CC models do not provide consistent performance across datasets, while the proposed NIP model with iterative inference is more robust.
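
A sketch of the node-information-preserving idea (weights and sizes are illustrative): re-inject the node's own features at every propagation step so the neighborhood summary cannot morph them away.

    import numpy as np

    def nip_step(A_norm, X, H, W_node, W_neigh):
        """One propagation step that preserves the node's own features X."""
        return np.maximum(X @ W_node + A_norm @ H @ W_neigh, 0)

    # Minimal usage on a 3-node path graph with self-loops.
    rng = np.random.default_rng(0)
    A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]]) + np.eye(3)
    A_norm = A / A.sum(axis=1, keepdims=True)   # simple row normalization
    X = rng.normal(size=(3, 2))
    W_node, W_neigh = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
    H = X
    for _ in range(3):   # three hops; X re-enters at each one
        H = nip_step(A_norm, X, H, W_node, W_neigh)
    print(H)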

2015

On Optimizing Human-Machine Task Assignment
Andreas Veit, Michael Wilber, Rajan Vaish, Serge Belongie, James Davis, et al.
Third AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2015), works-in-progress track.

When crowdsourcing systems are used in combination with machine inference systems in the real world, they benefit the most when the machine system is deeply integrated with the crowd workers. However, if researchers wish to integrate the crowd with "off-the-shelf" machine classifiers, this deep integration is not always possible. This work explores two strategies to increase accuracy and decrease cost under this setting. First, we show that reordering tasks presented to the human can create a significant accuracy improvement. Further, we show that greedily choosing parameters to maximize machine accuracy is sub-optimal, and joint optimization of the combined system improves performance.



Service

  • Reviewer for ICML (2020, 2019)
  • Reviewer for NeurIPS (2020, 2019)