Silviu Pitis

PhD Student, Machine Learning
University of Toronto
Vector Institute

Bio

I am a final-year PhD student at the University of Toronto and Vector Institute, working with Jimmy Ba. My research focuses on the normative design of goals, rewards and abstractions for intelligent agents, including reinforcement learning agents and large language models.

My research has been funded by a Schwartz Reisman Graduate Fellowship, an NSERC CGS-D award, a Vector Research Grant, as well as OGS and UofT FAST scholarships.

I completed my master’s in computer science at Georgia Tech. Before this, I was a lawyer at Kirkland & Ellis in New York, where I worked on large corporate transactions (e.g., this and this). Before becoming a lawyer, I was a fairly successful online poker player.

I received my J.D. in 2014 from Harvard Law School, where I was a fellow at the Olin Center for Law, Economics, and Business. My undergrad was in finance and economics at the Schulich School of Business in Toronto.

Research

My ultimate research interest lies in the normative design of general purpose artificial agency: how should we design AIs that solve general tasks and contribute positively to society? I’m currently working toward a normatively justified framework for reasoning about ideal preferences and goals.

My current research statement:

Research Statement (November 2024)

Prior research statements:

Or check out my papers below. If we share research interests or you have an idea you’d like to collaborate on, I’d be excited to talk to you!

Selected Papers

For a complete list, please see my Google Scholar.

Axiomatic Design

Consistent Aggregation of Objectives with Diverse Time Preferences Requires Non-Markovian Rewards

Silviu Pitis. NeurIPS 2023. (Arxiv)

From a set of intuitively appealing axioms, I show that Markovian aggregation of Markovian reward functions is not possible when the time preference for each objective may vary. It follows that optimal multi-objective agents must admit rewards that are non-Markovian with respect to the individual objectives. This work offers new insights into sequential, multi-objective agency and intertemporal choice, and has practical implications for the design of AI systems deployed to serve multiple generations of principals with varying time preferences.

Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach

Silviu Pitis. AAAI 2019. (Paper, Slides, Poster)

Can all “rational” preference structures be represented using the standard RL model (the MDP)? This paper presents a minimal axiomatic framework for rationality in sequential decision making and shows that the implied cardinal utility function is of a more general form than the discounted additive utility function of an MDP. In particular, the developed framework allows for a state-action dependent “discount” factor that is not constrained to be less than 1 (so long as there is eventual long run discounting).
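As a rough illustration of the difference described above (the symbols Q, V, R, γ and Γ are notation assumed here for the sketch, not quoted from the paper), the standard discounted recursion and a state-action dependent generalization can be written side by side:

```latex
% Standard MDP: a single constant discount factor \gamma \in [0, 1)
Q(s, a) = R(s, a) + \gamma \, \mathbb{E}_{s'}\left[ V(s') \right]

% Generalized form (sketch): the "discount" depends on (s, a) and may
% exceed 1 locally, provided there is eventual long-run discounting
Q(s, a) = R(s, a) + \Gamma(s, a) \, \mathbb{E}_{s'}\left[ V(s') \right]
```

Setting Γ(s, a) = γ for every state-action pair recovers the standard form.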

Language Modeling

Improving context-aware preference modeling for language models

Silviu Pitis, Ziang Xiao, Nicolas Le Roux, Alessandro Sordoni. NeurIPS 2024. (Arxiv)

We propose context-specific preference datasets and conduct experiments to investigate the potential of context-specific preference modeling.

Report Cards: Qualitative Evaluation of Language Models Using Natural Language Summaries

Blair Yang, Fuyang Cui, Keiran Paster, Jimmy Ba, Pashootan Vaezipoor, Silviu Pitis, Michael R Zhang. (Arxiv)

We propose to use LMs to generate Report Cards, which are fine-grained qualitative evaluations of a model’s behaviors, including its strengths and weaknesses, with respect to specific topics or datasets.

Compositional Reasoning for Generalization

MoCoDA: Model-based Counterfactual Data Augmentation

Silviu Pitis, Elliot Creager, Ajay Mandlekar, Animesh Garg. NeurIPS 2022. (Arxiv, Website)

Can RL agents generalize to new tasks with unseen states? We extend our local causal model framework to model-based RL and show that this is possible, both theoretically and empirically. See the Twitter thread for a summary.

Counterfactual Data Augmentation using Locally Factored Dynamics

Silviu Pitis, Elliot Creager, Animesh Garg. NeurIPS 2020. Object-Oriented Learning Workshop at ICML 2020 (Outstanding Paper). (Arxiv, Talk, Code, Poster, OOL Workshop)

We propose a local causal model (LCM) framework that captures the benefits of decomposition in settings where the global causal model is densely connected. We use our framework to design a local Counterfactual Data Augmentation (CoDA) algorithm that expands available training data with counterfactual samples by stitching together locally independent subsamples from the environment. Empirically, we show that CoDA can more than double the sample efficiency and final performance of reinforcement learning agents in locally factored environments.
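The stitching idea can be pictured with a toy sketch. This is a simplified illustration under stated assumptions, not the released CoDA code: the state holds two 1-D object positions, actions are omitted, and the distance-based independence test (`is_locally_independent`, with a hypothetical `radius` parameter) stands in for whatever local factorization the environment actually has.

```python
def is_locally_independent(state, radius=1.0):
    """Assumed toy rule: two objects do not interact when they are far apart."""
    a, b = state
    return abs(a - b) > radius


def stitch(t1, t2, radius=1.0):
    """Combine object 0 from transition t1 with object 1 from transition t2.

    Each transition is a (state, next_state) pair of 2-tuples. Returns a
    counterfactual transition, or None if any involved state has
    interacting objects (in which case swapping would be invalid).
    """
    (s1, n1), (s2, n2) = t1, t2
    if not (is_locally_independent(s1, radius) and is_locally_independent(s2, radius)):
        return None
    new_state = (s1[0], s2[1])
    new_next = (n1[0], n2[1])
    # The stitched state must itself be locally independent.
    if not is_locally_independent(new_state, radius):
        return None
    return (new_state, new_next)


t1 = ((0.0, 5.0), (0.5, 5.0))   # object 0 moves, object 1 is still
t2 = ((1.0, 8.0), (1.0, 7.5))   # object 1 moves, object 0 is still
t3 = ((4.0, 4.5), (4.2, 4.3))   # objects are close, so they may interact

augmented = stitch(t1, t2)      # a transition never observed in the data
rejected = stitch(t1, t3)       # None: t3 is not locally independent
```

Here `augmented` pairs the motion of object 0 from `t1` with the motion of object 1 from `t2`, which is the sense in which one pair of real transitions yields an extra training sample.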

Leveraging Structure in Reinforcement Learning

Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning

Silviu Pitis, Harris Chan, Stephen Zhao, Bradly Stadie, Jimmy Ba. ICML 2020. Adaptive and Learning Agents Workshop at AAMAS 2020 (Best Paper). (Arxiv, Talk, Code)

What goals should a multi-goal reinforcement learning agent pursue during training in long-horizon tasks? Our MEGA and OMEGA agents set achievable goals in sparsely explored areas of the goal space to maximize the entropy of the historical achieved goal distribution. This lets them learn to navigate mazes and manipulate blocks with a fraction of the samples used by prior approaches.
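A minimal sketch of the goal-selection idea, assuming a simple Gaussian kernel density estimate over previously achieved goals: picking the lowest-density candidate from the achieved-goal buffer keeps goals achievable while pushing toward sparsely explored regions. The function names, the `bandwidth` value and the candidate count are illustrative assumptions, not the paper's implementation.

```python
import math
import random


def kde_density(point, samples, bandwidth=0.5):
    """Unnormalized Gaussian kernel density of `point` under `samples`."""
    total = 0.0
    for s in samples:
        sq_dist = sum((p - q) ** 2 for p, q in zip(point, s))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / len(samples)


def select_min_density_goal(achieved_goals, num_candidates=10, rng=None):
    """Sample candidates from the achieved-goal buffer and return the one
    lying in the least densely visited region."""
    rng = rng or random.Random(0)
    k = min(num_candidates, len(achieved_goals))
    candidates = rng.sample(achieved_goals, k)
    return min(candidates, key=lambda g: kde_density(g, achieved_goals))


# Most achieved goals cluster near the origin; one lies on the frontier.
achieved = [(0.0, 0.0)] * 20 + [(0.1, 0.0)] * 20 + [(3.0, 3.0)]
goal = select_min_density_goal(achieved, num_candidates=len(achieved))
```

With every buffer entry considered as a candidate, the selected `goal` is the isolated frontier point rather than one of the well-visited goals near the origin.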

An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality

Silviu Pitis, Harris Chan, Kiarash Jamali, Jimmy Ba. ICLR 2020. (Arxiv, OpenReview, Talk, Code)

We propose novel neural network architectures, guaranteed to satisfy the triangle inequality, for purposes of (asymmetric) metric learning and modeling graph distances.

Teaching

I was a course instructor for the first virtual iteration of Introduction to Machine Learning (CSC 311) in Fall 2020, together with Roger Grosse, Chris Maddison, and Juhan Bae.

I have advised a number of students on research. If you are an aspiring researcher and find my work interesting, please reach out.

Misc

The great replacement. Many people ask why I quit law for AI research. A better question might be why I went into law in the first place. In any case, there are a few reasons, but one is that lawyers will be largely replaced by technology in the coming years. A lot of jobs will. This is inevitable.

Blogging. I used to keep an academic ML/AI blog at r2rt.com. I will restart this sometime… hopefully soon? Before that, I used to keep an economics blog.

A random tax artifact. If you’re a hedge fund manager you may be interested in this triple tax arbitrage scheme I came up with.

Connect

When I have time, I enjoy connecting over video (or coffee if you’re in Toronto). I’m interested in discussing ideas related to:

axiomatic alignment (normative social choice)
AI safety & governance
applications & capabilities (RL x LLMs)

You can reach me at:

silviu.pitis@gmail.com

@silviupitis