Research Questions | Silviu Pitis

A selection of questions I find interesting.

What objectives should a generalist agent pursue when the humans it serves are uncertain, inconsistent, and disagree with each other?
Alignment work often treats the empirical human as the target, but empirical humans are deeply flawed. Axiomatic decision theory and social choice can help us reason about “ideal” humans, but they remain largely absent from the standard alignment pipeline.

What does it mean to aggregate principals who disagree about facts, not just values?
Social choice mostly assumes agreement on the state of the world; subjective expected utility mostly assumes a single decision-maker. How should disagreement about facts propagate into the aggregate objective?

Relatedly, what causes the preferences people actually express, and how does that structure constrain generalization across people?
See my papers “Failure Modes of Learning Reward Models” and “Context-Aware Preference Modeling” (§2.1).

How should rewards and time preferences be structured for agents that operate over long horizons?
How much should an agent respect precedent, and when should past commitments give way to the preferences of future principals?
See my Consistent Aggregation paper (§5.2).

When is natural language enough to specify a goal, and when does it stop being enough?
Natural language is underspecified by construction, and many disagreements about LM behavior are really disagreements about context that nobody wrote down. Which goals can be pinned down by language given rich enough context, and which need a formal object underneath?
See my Canonical AI paper.

How do you evaluate an agent whose space of possible behaviors is essentially unbounded?
Benchmarks miss the long tail; adversarial testing finds failures but not their structure. How do you combine the two into evaluation rigorous enough to drive deployment decisions?

And more recently:

As AI shifts value creation from labor to capital, how do you rebuild taxation and redistribution around capital, and what fills the role that employment plays in leading a purposeful life?
See my blog post “Who Owns the AI Economy?”

Interesting Papers

A selection of papers I like (~chronological).

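Cardinal Welfare, Individualistic Ethics, and Interpersonal Comparisons of Utility (Harsanyi, 1955). If individual and social preferences all satisfy the von Neumann-Morgenstern (VNM) axioms and the social preference respects Pareto, then social utility can be represented as a weighted sum of individual utilities.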
Collective Choice and Social Welfare (Sen, 1970). The foundations of social choice theory, including some nice discussions about the value of axiomatic work.
Dynamic Consistency and Non-Expected Utility Models of Choice Under Uncertainty (Machina, 1989). Shows the tension between non-expected utility, dynamic consistency, and consequentialism.
Algorithms for Inverse Reinforcement Learning (Ng & Russell, 2000). The canonical “learn the reward from behavior” formulation.
Probability Theory: The Logic of Science (Jaynes, 2003). Nice discussion that derives (Bayesian) probability as a system of extended logic.
Thinking, Fast and Slow (Kahneman, 2011). An accessible synthesis of behavioral economics and Kahneman & Tversky’s work.
Horde (Sutton et al., 2011) and Universal Value Function Approximators (Schaul et al., 2015). On representing multiple goals, subgoals, and more general measurements using “general” value functions.
Learning the Preferences of Ignorant, Inconsistent Agents (Evans et al., 2016) and Occam’s Razor Is Insufficient to Infer the Preferences of Irrational Agents (Armstrong & Mindermann, 2018). Preference inference requires assumptions about how beliefs and planning jointly produce behavior.
Reward-Rational (Implicit) Choice (Jeon, Milli & Dragan, 2020). A nice framework that unifies reward inference from demonstrations, comparisons, corrections, and other feedback modalities.
On the Expressivity of Markov Reward (Abel et al., 2021) and On the Limitations of Markovian Rewards (Skalse & Abate, 2024). There exist tasks that no scalar Markov reward can capture, including multi-objective and risk-sensitive tasks; these results sharpen the case that the standard MDP formulation is too narrow for general-purpose agents.
Constitutional AI (Bai et al., 2022). Alignment via written principles.
Beyond Preferences in AI Alignment (Tan Zhi-Xuan et al., 2024). The case that the preference-maximization framing is itself a constraint on alignment.
AI Can Help Humans Find Common Ground in Democratic Deliberation (Tessler et al., 2024). The “Habermas Machine”: AI-facilitated deliberation that generates common-ground statements among diverse participants.