Silviu Pitis

PhD Student, Machine Learning
University of Toronto
Vector Institute


I am a final-year PhD student at the University of Toronto and Vector Institute, working with Jimmy Ba. My research focuses on the normative design of goals, rewards and abstractions for intelligent agents, including reinforcement learning agents and large language models.

My research has been funded by a Schwartz Reisman Graduate Fellowship, an NSERC CGS-D award, a Vector Research Grant, as well as OGS and UofT FAST scholarships.

I completed my master’s in computer science at Georgia Tech. Before this, I was a lawyer at Kirkland & Ellis in New York, where I worked on big corporate transactions (e.g., this and this). Before becoming a lawyer I was a fairly successful online poker player.

I received my J.D. in 2014 from Harvard Law School, where I was a fellow at the Olin Center for Law, Economics, and Business. My undergrad was in finance and economics at the Schulich School of Business in Toronto.


My ultimate research interest lies in the normative design of general purpose artificial agency: how should we design AIs that solve general tasks and contribute positively to society? I’m currently working toward:

My current research statement:

Prior research statements:

Or check out my papers below. If we share research interests or you have an idea you’d like to collaborate on, I’d be excited to talk to you!

Selected Papers

For a complete list, please see my Google Scholar.

Axiomatic Design

Consistent Aggregation of Time-Differing Objectives Requires Non-Markovian Rewards

Silviu Pitis. [Under Review; Arxiv Soon]

From a set of intuitively appealing axioms, I show that Markovian aggregation of Markovian reward functions is not possible when the time preference for each objective may vary. It follows that optimal multi-objective agents must admit rewards that are non-Markovian with respect to the individual objectives. Our work offers new insights into sequential, multi-objective agency and intertemporal choice, and has practical implications for the design of AI systems deployed to serve multiple generations of principals with varying time preference.

Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach

Silviu Pitis. AAAI 2019. (Paper, Slides, Poster)

Can all “rational” preference structures be represented using the standard RL model (the MDP)? This paper presents a minimal axiomatic framework for rationality in sequential decision making and shows that the implied cardinal utility function is of a more general form than the discounted additive utility function of an MDP. In particular, the developed framework allows for a state-action dependent “discount” factor that is not constrained to be less than 1 (so long as there is eventual long run discounting).

Compositional Reasoning for Generalization

MoCoDA: Model-based Counterfactual Data Augmentation

Silviu Pitis, Elliot Creager, Ajay Mandlekar, Animesh Garg. NeurIPS 2022. (Arxiv, Website)

Can RL agents generalize to new tasks w/ unseen states? We extend our local causal model framework to model-based RL and show that this is possible, both theoretically and empirically. See the twitter thread for a summary.

Counterfactual Data Augmentation using Locally Factored Dynamics

Silviu Pitis, Elliot Creager, Animesh Garg. NeurIPS 2020. Object-Oriented Learning Workshop at ICML 2020 (Outstanding Paper). (Arxiv, Talk, Code, Poster, OOL Workshop)

We propose a local causal model (LCM) framework that captures the benefits of decomposition in settings where the global causal model is densely connected. We used our framework to design a local Counterfactual Data Augmentation (CoDA) algorithm that expands available training data with counterfactual samples by stitching together locally independent subsamples from the environment. Empirically, we showed that CoDA can more than double the sample efficiency and final performance of reinforcement learning agents in locally factored environments.

Leveraging Structure in Reinforcement Learning

Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning

Silviu Pitis*, Harris Chan*, Stephen Zhao, Bradly Stadie, Jimmy Ba. ICML 2020. Adaptive and Learning Agents Workshop at AAMAS 2020 (Best Paper). (Arxiv, Talk, Code)

What goals should a multi-goal reinforcement learning agent pursue during training in long-horizon tasks? Our MEGA and OMEGA agents set achievable goals in sparsely explored areas of the goal space to maximize the entropy of the historical achieved goal distribution. This lets them learn to navigate mazes and manipulate blocks with a fraction of the samples used by prior approaches.

An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality

Silviu Pitis*, Harris Chan*, Kiarash Jamali, Jimmy Ba. ICLR 2020. (Arxiv, OpenReview, Talk, Code)

We propose novel neural network architectures, guaranteed to satisfy the triangle inequality, for purposes of (asymmetric) metric learning and modeling graph distances.


I was course instructor for the first virtual iteration of Introduction to Machine Learning (CSC 311) in Fall 2020, together with Roger Grosse, Chris Maddison, and Juhan Bae.

I have advised a number of students on research. If you are an aspiring researcher and find my work interesting, please reach out.



When I have time, I enjoy connecting over video (or coffee if you’re in Toronto). I’m interested in discussing ideas related to:

You can reach me at: