PhD Student, Machine Learning
University of Toronto
Vector Institute
I am a final-year PhD student at the University of Toronto and Vector Institute, working with Jimmy Ba. My research focuses on the normative design of goals, rewards and abstractions for intelligent agents, including reinforcement learning agents and large language models.
My research has been funded by a Schwartz Reisman Graduate Fellowship, an NSERC CGS-D award, a Vector Research Grant, as well as OGS and UofT FAST scholarships.
I completed my master’s in computer science at Georgia Tech. Before that, I was a lawyer at Kirkland & Ellis in New York, where I worked on large corporate transactions. Before becoming a lawyer, I was a fairly successful online poker player.
I received my J.D. in 2014 from Harvard Law School, where I was a fellow at the Olin Center for Law, Economics, and Business. My undergrad was in finance and economics at the Schulich School of Business in Toronto.
My ultimate research interest lies in the normative design of general-purpose artificial agency: how should we design AIs that solve general tasks and contribute positively to society? I’m currently working toward a normatively justified framework for reasoning about ideal preferences and goals.
My current research statement:
Prior research statements:
Or check out my papers below. If we share research interests or you have an idea you’d like to collaborate on, I’d be excited to talk to you!
For a complete list, please see my Google Scholar.
From a set of intuitively appealing axioms, I show that Markovian aggregation of Markovian reward functions is not possible when the time preference for each objective may vary. It follows that optimal multi-objective agents must admit rewards that are non-Markovian with respect to the individual objectives. This work offers new insights into sequential, multi-objective agency and intertemporal choice, and has practical implications for the design of AI systems deployed to serve multiple generations of principals with varying time preferences.
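As a toy illustration (my notation here, not the paper’s axioms): given two reward functions $r_1, r_2$ with distinct discount factors $\gamma_1 \neq \gamma_2$, the aggregate utility

$$U(s_0, s_1, \dots) = \sum_{t=0}^{\infty} \gamma_1^t\, r_1(s_t) + \sum_{t=0}^{\infty} \gamma_2^t\, r_2(s_t)$$

cannot in general be rewritten as $\sum_t \gamma^t\, r(s_t)$ for any single Markovian reward $r$ and discount $\gamma$: the relative weight of the two objectives shifts with $t$, so the scalarized reward must depend on time (history) rather than the state alone.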
Can all “rational” preference structures be represented using the standard RL model (the MDP)? This paper presents a minimal axiomatic framework for rationality in sequential decision making and shows that the implied cardinal utility function is of a more general form than the discounted additive utility function of an MDP. In particular, the developed framework allows for a state-action dependent “discount” factor that is not constrained to be less than 1 (so long as there is eventual long run discounting).
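In my paraphrase, the implied utility takes the form

$$U(s_0, a_0, s_1, a_1, \dots) = \sum_{t=0}^{\infty} \Big( \prod_{k=0}^{t-1} \beta(s_k, a_k) \Big)\, r(s_t, a_t),$$

which reduces to the familiar MDP objective $\sum_t \gamma^t\, r(s_t, a_t)$ when $\beta(s, a) \equiv \gamma < 1$, but also permits $\beta(s, a) \geq 1$ at particular state-actions so long as there is eventual long-run discounting.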
We introduce context-specific preference datasets and conduct experiments investigating the potential of context-specific preference modeling.
We propose to use LMs to generate Report Cards, which are fine-grained qualitative evaluations of a model’s behaviors, including its strengths and weaknesses, with respect to specific topics or datasets.
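As a hypothetical sketch of the idea (my illustration, not the paper’s actual pipeline; `llm_complete` is a stand-in for any text- or chat-completion API):

```python
# Hypothetical Report Card generation: ask a strong LM to summarize a model's
# behavior on a topic, given sampled (question, answer) pairs from that model.
# `llm_complete` is a stand-in for any text- or chat-completion API.

def report_card(topic, qa_pairs, llm_complete):
    transcript = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in qa_pairs)
    prompt = (
        f"Below are a model's answers to questions about {topic}.\n\n"
        f"{transcript}\n\n"
        "Write a fine-grained report card for this model on this topic: "
        "list its specific strengths and weaknesses, citing the answers above."
    )
    return llm_complete(prompt)
```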
Can RL agents generalize to new tasks with unseen states? We extend our local causal model framework to model-based RL and show that this is possible, both theoretically and empirically. See the Twitter thread for a summary.
We propose a local causal model (LCM) framework that captures the benefits of decomposition in settings where the global causal model is densely connected. We use this framework to design a local Counterfactual Data Augmentation (CoDA) algorithm that expands the available training data with counterfactual samples, formed by stitching together locally independent subsamples from the environment. Empirically, we show that CoDA can more than double the sample efficiency and final performance of reinforcement learning agents in locally factored environments.
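Here is a minimal sketch of the stitching idea in a hypothetical two-component environment (the locality test below is a toy stand-in for the learned or hand-specified causal masks; this is not the paper’s implementation):

```python
import numpy as np

# Toy CoDA-style stitching in a hypothetical environment whose state is the
# concatenation of two components, s = (x, y). When the local causal mask of
# a transition is block-diagonal (x and y do not interact at that step), the
# components evolve independently and can be recombined across transitions.

def locally_independent(s, s_next, threshold=1.0):
    # Toy locality test: components far apart are assumed not to interact.
    # In practice this role is played by a learned or hand-specified mask.
    return abs(s[0] - s[1]) > threshold and abs(s_next[0] - s_next[1]) > threshold

def coda_stitch(t1, t2):
    """Stitch component 0 of t1 with component 1 of t2 into a counterfactual
    transition, valid only if both source transitions are locally factored."""
    (s1, s1n), (s2, s2n) = t1, t2
    if not (locally_independent(s1, s1n) and locally_independent(s2, s2n)):
        return None  # components may interact; stitching is not valid here
    s = np.array([s1[0], s2[1]])
    s_next = np.array([s1n[0], s2n[1]])
    return s, s_next

# Example: two real transitions yield a third, never-observed one.
t1 = (np.array([0.0, 5.0]), np.array([0.1, 5.2]))
t2 = (np.array([9.0, 3.0]), np.array([9.3, 3.1]))
print(coda_stitch(t1, t2))  # -> (array([0., 3.]), array([0.1, 3.1]))
```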
What goals should a multi-goal reinforcement learning agent pursue during training in long-horizon tasks? Our MEGA and OMEGA agents set achievable goals in sparsely explored areas of the goal space to maximize the entropy of the historical achieved goal distribution. This lets them learn to navigate mazes and manipulate blocks with a fraction of the samples used by prior approaches.
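Roughly, the goal-selection step can be sketched as choosing the minimum-density previously achieved goal under a density model fit to the achieved-goal history (a simplified sketch; the density estimator and hyperparameters below are my illustrative choices):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def select_low_density_goal(achieved_goals, bandwidth=0.1):
    """Pick the past achieved goal with the lowest density under a KDE fit
    to the history of achieved goals, i.e., a goal in a sparsely explored
    region. Choosing a previously *achieved* goal keeps the target achievable."""
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth)
    kde.fit(achieved_goals)
    log_density = kde.score_samples(achieved_goals)
    return achieved_goals[np.argmin(log_density)]

# Example: achieved goals cluster near the origin, so the outlier is chosen.
goals = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [2.0, 2.0]])
print(select_low_density_goal(goals))  # -> [2. 2.]
```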
We propose novel neural network architectures, guaranteed to satisfy the triangle inequality, for (asymmetric) metric learning and modeling graph distances.
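For intuition, the simplest member of this family (a sketch for illustration, not one of the architectures from the paper) is a metric of the form $d(x, y) = \lVert f(x) - f(y) \rVert_2$, which satisfies the triangle inequality for any embedding $f$:

```python
import torch
import torch.nn as nn

class EuclideanMetric(nn.Module):
    """d(x, y) = ||f(x) - f(y)||_2 for a learned embedding f. The triangle
    inequality holds for any f, since
    ||f(x) - f(z)|| <= ||f(x) - f(y)|| + ||f(y) - f(z)||.
    This construction is necessarily symmetric; modeling asymmetric distances
    requires replacing the Euclidean norm with a learned (asymmetric) norm."""

    def __init__(self, in_dim, embed_dim=32):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, embed_dim)
        )

    def forward(self, x, y):
        return torch.norm(self.f(x) - self.f(y), dim=-1)

d = EuclideanMetric(in_dim=8)
x, y = torch.randn(4, 8), torch.randn(4, 8)
print(d(x, y))  # nonnegative and symmetric, with the triangle inequality
```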
I was course instructor for the first virtual iteration of Introduction to Machine Learning (CSC 311) in Fall 2020, together with Roger Grosse, Chris Maddison, and Juhan Bae.
I have advised a number of students on research. If you are an aspiring researcher and find my work interesting, please reach out.
Blogging. I used to keep an academic ML/AI blog at r2rt.com. I will restart this sometime… hopefully soon? Before that, I used to keep an economics blog.
A random tax artifact. If you’re a hedge fund manager you may be interested in this triple tax arbitrage scheme I came up with.
When I have time, I enjoy connecting over video (or coffee if you’re in Toronto). I’m interested in discussing ideas related to:
You can reach me at: