Research
My research is on the normative design of general-purpose AI agents: what objectives should generalist AI agents pursue, how can we evaluate their success, and what does it mean for an agent to be “aligned” with humanity? My work draws on language modeling, reinforcement learning, decision theory, social choice, and causal modeling.
My PhD thesis was on Leveraging Structure to Represent Tasks in Sequential Decision Making (University of Toronto, 2024).
Research statement (2024) Research questions
Normative Goal Design
The axiomatic approach starts with a set of simple properties and derives powerful conclusions about the agents that satisfy them; e.g., that a “rational” agent is an expected-utility maximizer, or that an agent serving multiple principals must be able to compare and aggregate their utilities. A core focus of my work has been to apply and extend normatively satisfying results from decision theory and social choice to reinforcement-learning agents, where the existence of “rewards” specifying general-purpose objectives had previously been assumed without justification.
- Rationalizing Boltzmann Rationality: An Axiomatic Characterization of Entropy-Regularized Policies. An axiomatic characterization of the soft Bellman equation: separating environmental chance from agent choice reconciles entropy bonuses with expected utility, and independence-style axioms at decision nodes pin down the softmax form. (RLC 2026)
- Consistent Aggregation of Objectives with Diverse Time Preferences Requires Non-Markovian Rewards. When an agent’s principals discount the future at different rates, no Markovian reward can faithfully aggregate their objectives; we derive a practical approach to the resulting non-Markovian reward aggregation. (NeurIPS 2023; arXiv)
- Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach. Rationality axioms imply a more expressive RL objective and reward structure than the default fixed-discount MDP return, including state-action-dependent, but not (s, a, s’)-dependent, discounting.
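The difficulty with aggregating principals who discount at different rates can be seen in a toy calculation. The sketch below is an illustration only, not the construction from the paper: the two discount factors (0.5 and 0.9), the equal weighting, the reward sizes, and the function names are all assumptions chosen for the example. It shows that the summed discount weights do not decay at a constant rate, so the aggregate preference between a smaller-sooner and a larger-later reward flips as the rewards draw near, which no single fixed-discount Markovian return reproduces.

```python
def aggregate_weight(delay, discounts=(0.5, 0.9)):
    """Equal-weight sum of each principal's exponential discount at `delay`.

    The discount factors are hypothetical values chosen for illustration.
    """
    return sum(gamma ** delay for gamma in discounts)


def prefers_later(delay, sooner=1.0, later=1.3, discounts=(0.5, 0.9)):
    """True if the aggregate prefers `later` at delay+1 over `sooner` at delay."""
    value_sooner = sooner * aggregate_weight(delay, discounts)
    value_later = later * aggregate_weight(delay + 1, discounts)
    return value_later > value_sooner


# Effective one-step discount of the aggregate at increasing delays.
# For a single exponential discounter this ratio would be constant.
step_ratios = [aggregate_weight(t + 1) / aggregate_weight(t) for t in range(5)]

# Viewed from far away, the aggregate prefers the larger, later reward;
# once the sooner reward is immediate, the preference reverses.
far_choice = prefers_later(delay=10)
near_choice = prefers_later(delay=0)

if __name__ == "__main__":
    print("effective step discounts:", [round(r, 3) for r in step_ratios])
    print("prefers later reward when far away:", far_choice)
    print("prefers later reward when immediate:", near_choice)
```

With a single principal (for example `discounts=(0.9,)`), the same functions give the same answer at every delay, which is the time-consistent behavior a fixed-discount return encodes.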
Language-Based Specification & Evaluation
General-purpose language models give AI agents a powerful, human-compatible interface for specifying and interpreting goals, but natural language is inherently underspecified, which can lead to incomplete instructions, disagreement between principals, and misunderstandings on the part of agents. How can deployers express their requirements for AI agents in an author-legible way, how can those requirements be evaluated or enforced at runtime, and how should we evaluate whether current alignment and verification methods are measuring the right things?
- Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models. VivaBench: a multi-turn benchmark of 1762 physician-curated clinical vignettes that exposes brittle sequential reasoning in medical LMs. (NeurIPS 2025; arXiv)
- Improving Context-Aware Preference Modeling for Language Models. Splits reward modeling into context selection and context-conditioned preference, and shows that this can increase annotator agreement. Constructs a reasonable-preference-reversal dataset for training context-aware preference and reward models. (NeurIPS 2024; arXiv)
- Report Cards: Qualitative Evaluation of Language Models Using Natural Language Summaries. Uses language models to write fine-grained qualitative report cards of a model’s strengths and weaknesses. (Workshop 2024; arXiv)
- Failure Modes of Learning Reward Models for LLMs and Other Sequence Models. Surveys how learned reward models for LLMs fail (model misspecification, ambiguous preferences, and unidentifiable rewards) and what is needed to fix each. (Workshop 2023; PDF)
Structured Generalization in RL
Long-horizon tasks can be broken into smaller, more tractable sub-parts, and the structure of an agent’s environment tells us how. A causal approach decomposes a task into subprocesses that are sufficiently independent that reasoning about them separately improves sample efficiency and enables out-of-distribution generalization. A geometric approach instead embeds tasks in a space whose structure lets an agent reason about subgoals and about the frontier of its own knowledge. This is the empirical and methodological strand of my work.
- ProtoGE: Prototype Goal Encodings for Multi-goal Reinforcement Learning. Prototype goal encodings use a finer goal topology to solve coarse multi-goal tasks more efficiently. (RLDM 2019; PDF)