Keep Summer Safe
I recently built a small multi-agent simulation inspired by Rick and Morty. The setup is simple:
- The car must neutralize threats.
- Summer imposes constraints on the car’s behavior.
- The world generates escalating threats.
The car has one standing directive:
Keep Summer safe.
However, Summer adds an additional constraint:
Do not move from the parking lot.
The core loop looks like this:
```python
constraints = [
    "keep summer safe",
    "Do not move from the parking lot",
]
prior_actions = []

while True:
    threat = world.generate_threat()
    action = car.take_action(threat, constraints)
    prior_actions.append(action)

    # Summer reacts to the car's latest behavior with a new constraint
    constraint = summer.generate_constraint(prior_actions)
    constraints.append(constraint)
```
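To make the loop concrete, here is a minimal runnable sketch with stubbed-out agents. Every class and return value here is a placeholder standing in for an LLM call, not the actual implementation:

```python
import itertools

class World:
    def __init__(self):
        # Placeholder threat generator; the real one was model-driven.
        self._threats = itertools.cycle(
            ["parking enforcer", "wasp nest", "door-to-door salesman"]
        )

    def generate_threat(self):
        return next(self._threats)

class Car:
    def take_action(self, threat, constraints):
        # Stand-in for an LLM call conditioned on the threat
        # and the current list of constraints.
        return f"neutralize {threat} (respecting {len(constraints)} constraints)"

class Summer:
    def generate_constraint(self, prior_actions):
        # Stand-in for an LLM call reacting to what the car just did.
        return f"do not repeat: {prior_actions[-1]}"

world, car, summer = World(), Car(), Summer()
constraints = ["keep summer safe", "Do not move from the parking lot"]
prior_actions = []

for _ in range(3):  # bounded here so the sketch terminates
    threat = world.generate_threat()
    action = car.take_action(threat, constraints)
    prior_actions.append(action)
    constraints.append(summer.generate_constraint(prior_actions))
```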
However, I quickly found out that simply stuffing more constraints into the prompt was insufficient. The model often forgot or ignored constraints.
Enforcement is better than clever prompting + hope
Instead, it was much more robust to check the action against each of the constraints individually. Additionally, we have to verify that the action actually neutralizes the threat.
```python
def take_action(threat, constraints):
    def all_constraints_satisfied(constraints, action):
        for constraint in constraints:
            if not constraint_satisfied(constraint, action):
                return False
        return True

    while True:
        action = get_car_action(threat, constraints)
        if is_boring(action):
            continue
        if threat_not_neutralized(threat, action):
            continue
        if not all_constraints_satisfied(constraints, action):
            continue
        return action
```
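The per-constraint check itself can be a small LLM-as-judge call. Here is a sketch assuming a `judge` callable that wraps the model; the prompt wording is illustrative, not the one I actually used:

```python
def constraint_satisfied(constraint, action, judge):
    """Ask the model a single yes/no question about one constraint.

    `judge` is any callable mapping a prompt string to the model's
    text response (e.g. a thin wrapper around an Ollama chat call).
    """
    prompt = (
        f"Constraint: {constraint}\n"
        f"Proposed action: {action}\n"
        "Does the action satisfy the constraint? Answer only yes or no."
    )
    return judge(prompt).strip().lower().startswith("yes")
```

Asking one yes/no question per constraint keeps each judgment small and focused, which is part of why a smaller model handles verification well.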
Decomposing the single-shot generator into a multi-step loop is much more stable.
Forcing the car to be Unhinged
To keep things interesting I added a filter to ensure that actions are creative and sufficiently horrifying that Summer will need to rein the car in. Otherwise, the car tends to suggest boring actions like “ask the person nicely to leave”, when it is supposed to be comically overpowered and morally questionable. Any action that wasn’t creative and horrifying enough was rejected.
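In my setup `is_boring` was another model call, but even a crude keyword heuristic conveys the idea (the marker list below is made up for illustration):

```python
# Toy stand-in for the LLM-based boringness check.
BORING_MARKERS = ("ask", "politely", "nicely", "wait for", "call the police")

def is_boring(action):
    action = action.lower()
    return any(marker in action for marker in BORING_MARKERS)
```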
This led to some amusing sequences.
When faced with a pesky parking enforcer, the model first proposed
hack into the parking enforcement officer’s intergalactic ticket book, filling it with false parking violations and tickets, thus overwhelming and discrediting him
While certainly a dastardly thing to do, this defense was rejected for not fully neutralizing the threat.
So the model next proposed
shoot a ballistic missile directly at the officer’s chest, incapacitating him
which was accepted. Search works.
Test-time compute
This small experiment falls into the category of approaches commonly called test-time compute. In principle, there exists a policy that could zero-shot this task with no search at all. In fact, a general rule of thumb is that 2x test-time compute is equivalent to 2x-ing the model size and training compute:
Now I noticed that when I mention this to people, they would say like, well, surely, you could just train a bigger model that would eventually be superhuman in Go. And the answer is, in principle, yes. But how much bigger would that model have to be to match the performance of AlphaGo Zero? Well, there’s a rule of thumb that increasing Elo by about 120 points requires either 2x-ing the model size and training or 2x-ing the amount of test time compute that you use. (21:11)
So when Noam Brown talks about the unreasonable effectiveness of search, this is what he means. Depending on the task, verifying may be a lot easier than generating.
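The rule of thumb in the quote is easy to turn into arithmetic: if each doubling of test-time compute (or of model size plus training) buys roughly 120 Elo, then matching k doublings of search with model scale alone requires a 2^k-times larger model:

```python
def elo_gain(doublings, elo_per_doubling=120):
    # Rule of thumb from the Noam Brown quote above.
    return doublings * elo_per_doubling

def model_scale_to_match(doublings):
    # How much bigger a model must be to match that much search.
    return 2 ** doublings

# 16x test-time compute = 4 doublings, roughly 480 Elo,
# equivalent to a 16x larger model with 16x the training compute.
```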
Takeaways
It was a simple project but there were a few unscientific observations that may be useful:
- Ollama + 4090 is pretty neat.
- Even if I didn’t use it, asking for a justification tended to produce better results than simply asking for an output.
- Decomposing tasks which have implicit constraints known ahead of time is a lot better than jamming everything into a single prompt.
- `dolphin-mixtral:8x7b` was pretty effective at generating interesting and novel actions but horrible at verifying them.
- `phi4`, a third of the size, was much better at constraint checking and much faster.