I am a member of the technical staff at Cohere. Previously, I was a postdoctoral researcher with Prof. Emma Brunskill at Stanford University. I received my PhD from the University of Massachusetts, where I was fortunate to be advised by Prof. Philip Thomas.


Research Interests

Click here for the full list of publications.

Formal Reasoning & Decision-Making with Foundation Models

Information Directed Tree Search: Reasoning and Planning with Language Agents
Yash Chandak, HyunJi Nam, Allen Nie, Jonathan Lee, Emma Brunskill
Bayesian Decision-making and Uncertainty Workshop at Neural Information Processing Systems (BDU@NeurIPS 2024)

Abstract: Formal reasoning tasks are challenging to solve, but they often provide rich feedback, unlike the scalar feedback in the classical RL setting. How do we combine LLMs and RL to obtain the best of both for long-horizon (formal) reasoning tasks such as theorem proving and code generation?
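
The sketch below is only a toy illustration, not the paper's algorithm: it shows how a tree-search node might rank candidate next steps by an information ratio that trades off estimated regret against how much would be learned. The value samples and the information-gain proxy are placeholders for the rich feedback (e.g., verifier or prover output) that a language agent would supply.

```python
import random

def information_ratio(expected_regret, info_gain, eps=1e-8):
    # IDS-style score: squared expected regret divided by information gained.
    return (expected_regret ** 2) / (info_gain + eps)

def select_child(children):
    # children: candidate next steps, each with Monte-Carlo value samples
    # (e.g., rollouts scored by a verifier) and optionally an info-gain guess.
    means = [sum(c["value_samples"]) / len(c["value_samples"]) for c in children]
    best_mean = max(means)
    scores = []
    for c, mean in zip(children, means):
        regret = best_mean - mean  # how much worse than the current best guess
        var = sum((v - mean) ** 2 for v in c["value_samples"]) / len(c["value_samples"])
        info = c.get("info_gain", var)  # proxy: higher variance => more to learn
        scores.append(information_ratio(regret, info))
    return min(range(len(children)), key=lambda i: scores[i])

if __name__ == "__main__":
    random.seed(0)
    # Toy node with three candidate proof steps / code edits.
    children = [{"value_samples": [random.random() for _ in range(5)]} for _ in range(3)]
    print("selected child:", select_child(children))
```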

Supervised Pretraining Can Learn In-Context Reinforcement Learning
Jonathan N. Lee, Annie Xie, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill.
(Spotlight) Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023) | Arxiv

Abstract: Can supervised pre-training provide in-context capabilities for solving decision-making problems? Perhaps surprisingly, yes: by drawing formal connections to posterior sampling, we show that in-context interaction with the same model can yield conservative behavior in the offline setting and optimistic exploration in the online setting.
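
As a rough sketch of the deployment idea (with a stub standing in for the pretrained transformer, so none of this is the authors' code): the model maps an in-context dataset of interactions to a distribution over actions predicted to be optimal; sampling from that distribution during online interaction behaves like posterior sampling, while committing to its mode is the conservative offline readout.

```python
import random

def model(context, state, n_actions=3):
    # Stub in place of a pretrained transformer: weight each action by the
    # average reward it has received in the in-context dataset so far.
    totals = [1.0] * n_actions
    counts = [1.0] * n_actions
    for (_, a, r) in context:
        totals[a] += r
        counts[a] += 1
    prefs = [t / c for t, c in zip(totals, counts)]
    z = sum(prefs)
    return [p / z for p in prefs]

def online_rollout(env_step, horizon=20):
    context = []
    for _ in range(horizon):
        probs = model(context, state=None)
        # Sampling an action => exploration; taking argmax instead would be
        # the conservative (offline-style) readout.
        action = random.choices(range(len(probs)), weights=probs)[0]
        reward = env_step(action)
        context.append((None, action, reward))
    return context

if __name__ == "__main__":
    random.seed(1)
    true_means = [0.2, 0.5, 0.8]  # toy 3-armed bandit
    env = lambda a: 1.0 if random.random() < true_means[a] else 0.0
    hist = online_rollout(env)
    print("actions taken:", [a for (_, a, _) in hist])
```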

Strategic Data Collection & Reward Design

Adaptive Instrument Design for Indirect Experiments
Yash Chandak, Shiv Shankar, Vasilis Syrgkanis, Emma Brunskill.
Twelfth International Conference on Learning Representations (ICLR 2024) | Arxiv

Abstract: In human-AI systems, the AI can only suggest, not prescribe, what a human should do (e.g., how a student should interact with LLMs to learn faster). In such cases, how should the AI interact strategically so that it can quickly estimate what would have happened had the human complied with its suggestions?
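
For intuition, here is a toy version of the indirect-experiment setup (my own illustrative example, with made-up numbers, not the paper's estimator): a binary instrument (whether the AI shows its suggestion) shifts compliance, and a simple instrumental-variable (Wald) estimate recovers the effect of actually complying. The paper is about adaptively designing which instruments to use so that such estimates become accurate with far fewer samples.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

z = rng.integers(0, 2, size=n)                    # instrument: suggestion shown or not
compliance = rng.random(n) < (0.2 + 0.5 * z)      # humans comply more often when suggested
d = compliance.astype(float)                      # treatment actually taken
u = rng.normal(size=n)                            # unobserved confounder
y = 2.0 * d + u + rng.normal(scale=0.5, size=n)   # outcome; true effect of complying = 2.0

# Wald / IV estimate: effect = Cov(y, z) / Cov(d, z)
effect_hat = np.cov(y, z)[0, 1] / np.cov(d, z)[0, 1]
print(f"estimated effect of complying: {effect_hat:.2f} (true 2.0)")
```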

Behavior Alignment via Reward Function Optimization
Dhawal Gupta*, Yash Chandak*, Scott Jordan, Philip Thomas, Bruno Castro da Silva. *Equal contribution
(Spotlight) Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023) | Arxiv

Abstract: How should we leverage side information to design reward functions that are dense yet aligned with the user's goal? We show that the classic approach of reward shaping has several limitations, and we propose a new bi-level reward-alignment procedure to address them.
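
The snippet below is a bare-bones caricature of the bi-level structure on a toy chain MDP (a grid search over a single shaping weight, whereas the actual method optimizes the reward parameters with implicit gradients): the inner loop trains a policy on the shaped reward, and the outer loop scores each candidate purely by the true sparse return, which is what keeps the dense signal aligned with the user's goal.

```python
import random

N = 10  # chain states 0..N-1; reaching state N-1 yields the true (sparse) reward

def inner_train(w, episodes=300, eps=0.1, alpha=0.5, gamma=0.95):
    # Inner loop: Q-learning on the shaped reward true_r + w * dense_bonus.
    Q = [[0.0, 0.0] for _ in range(N)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        for _ in range(2 * N):
            a = random.randrange(2) if random.random() < eps else int(Q[s][1] > Q[s][0])
            s2 = min(N - 1, s + 1) if a == 1 else max(0, s - 1)
            true_r = 1.0 if s2 == N - 1 else 0.0
            shaped_r = true_r + w * (s2 - s) / N  # dense bonus for moving right
            Q[s][a] += alpha * (shaped_r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if true_r > 0:
                break
    return Q

def true_return(Q):
    # Outer objective: evaluate the greedy policy on the true sparse reward only.
    s, steps = 0, 0
    while s != N - 1 and steps < 2 * N:
        s = min(N - 1, s + 1) if Q[s][1] > Q[s][0] else max(0, s - 1)
        steps += 1
    return 1.0 if s == N - 1 else 0.0

if __name__ == "__main__":
    random.seed(0)
    # Outer loop: pick the shaping weight whose induced policy does best on the true reward.
    best_w = max([0.0, 0.1, 0.5, 1.0], key=lambda w: true_return(inner_train(w)))
    print("selected shaping weight:", best_w)
```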