How to tackle simulations too big to simulate

I was asked to elaborate on how I arrived at my solution for day 8 of Advent of Code 2023, so that's this post.

Let's start with a brief philosophical note: Advent of Code problems exist to be solved. That means there definitely is a reasonable solution. Working backwards from that, any solution that is unreasonable is wrong, but there definitely is one that will work.

Computationally intensive problems are the ones that are gated not by implementation difficulty, but by runtime. You can easily recognize them by the fact that the part 2 asks you to do the same thing but with a much larger number or input.

There are roughly two major categories of computationally intensive AoC problems: optimization searches and simulations.

The nice thing about simulation problems is that your goal is always the same: Not simulating it. Why? Because we know that simulating it takes an unreasonable amount of time, so it cannot be the solution, as per the wanting-to-be-solved-ness of AoC problems.

A practical example

My favourite go-to is 2021-06: Lanternfish, because of its simple nature. You have a population of lanternfish, each one represented by a number. The number says how many simulation steps until it spawns another, new, lanternfish. The initial population is given as a list of numbers, and every new fish is added as an 8. Lanternfish that spawned offspring roll back over to a 6. The example given on the page is this:

Initial state: 3,4,3,1,2
After  1 day:  2,3,2,0,1
After  2 days: 1,2,1,6,0,8
After  3 days: 0,1,0,5,6,7,8
After  4 days: 6,0,6,4,5,6,7,8,8
After  5 days: 5,6,5,3,4,5,6,7,7,8
After  6 days: 4,5,4,2,3,4,5,6,6,7
After  7 days: 3,4,3,1,2,3,4,5,5,6
After  8 days: 2,3,2,0,1,2,3,4,4,5
After  9 days: 1,2,1,6,0,1,2,3,3,4,8
After 10 days: 0,1,0,5,6,0,1,2,2,3,7,8
After 11 days: 6,0,6,4,5,6,0,1,1,2,6,7,8,8,8
After 12 days: 5,6,5,3,4,5,6,0,0,1,5,6,7,7,7,8,8
After 13 days: 4,5,4,2,3,4,5,6,6,0,4,5,6,6,6,7,7,8,8
After 14 days: 3,4,3,1,2,3,4,5,5,6,3,4,5,5,5,6,6,7,7,8
After 15 days: 2,3,2,0,1,2,3,4,4,5,2,3,4,4,4,5,5,6,6,7
After 16 days: 1,2,1,6,0,1,2,3,3,4,1,2,3,3,3,4,4,5,5,6,8
After 17 days: 0,1,0,5,6,0,1,2,2,3,0,1,2,2,2,3,3,4,4,5,7,8
After 18 days: 6,0,6,4,5,6,0,1,1,2,6,0,1,1,1,2,2,3,3,4,6,7,8,8,8,8

The true input list is about 300 fish long, and part 1 asks you to simulate 80 steps. The result is 6 digits long; reachable with a naive simulation approach:

fn simulation(mut fish: Vec<i32>, steps: usize) -> usize {
    for _ in 0..steps {
        let mut new_fish = 0;
        for n in 0..fish.len() {
            fish[n] -= 1;
            if fish[n] < 0 {
                new_fish += 1;
                fish[n] = 6;
            }
        }
        fish.extend(repeat(8).take(new_fish));
    }

    fish.len()
}

It finishes in about 70-80ms, not efficient, but a reasonable time for this take.

Part 2 then asks you to simulate 256 steps.

That may not seem like that many more; 3 digits of simulation steps is still fine, right?

Pasted image 20231212190907.png|Windows task manager readout saying my solution program is using up 564 megabytes of RAM

The memory usage keeps going up as I write this. It's at a gigabyte of RAM now, and still going. There's a good chance it will just... crash, soon. I've also been waiting for over a minute, so we know we're probably on the wrong path.

Back to the (it's 2gb now) drawing board!

Well, if you think about it, the lanternfish growth is exponential. Each additional step is so many more than the previous one. So simulating it is (it's 3gb now) unreasonable. We know there's a trick we have to find that lets us not simulate.

There's not really a systematic way to do this that guarantees you definitely arrive at the right idea, I don't think. However, we have some tools at our disposal that give us better odds of coming up with something.

Well, let's ask ourselves: (I haven't been checking, but it's 5.3gb now) Which part of the simulation is necessary? Our naive simulation implementation based on the example tracks more information than we need. For example, a list has an order. Does the order of our lanternfish matter? (it's 8gb now) Well, once you have this thought, it's pretty clear that: No, it doesn't. All we NEED to know about a fish is its number.

Even without gathering any additional information, once we abolish order, you realize you can simply keep track of the amount of fish with (it's 10gb now. I'm killing it) each number. For example, consider a naive implementation like this:

fn simplified_simulation(initial_fish: Vec<i32>, steps: usize) -> usize {
	// Turn list of fish into a map where the key is time until next spawn,
	// and the value, how many of that fish there are.
    let mut fish = HashMap::new();
    for f in initial_fish {
        *fish.entry(f).or_default() += 1;
    }

    for _ in 0..steps {
        let mut new_fish = HashMap::new();

        // Each spawning fish adds 1 time 8 fish.
        new_fish.insert(8, fish.get(&0).copied().unwrap_or(0));

        // All other fish have their time decreased by one.
        for (&key, &value) in &fish {
            let new_key = if key == 0 { 6 } else { key - 1 };
            *new_fish.entry(new_key).or_default() += value;
        }

        fish = new_fish;
    }

    fish.into_values().sum()
}

Even without messy Computer Magic[1], this runs in like 2 milliseconds for me. All we had to do was not actually simulate each fish.

Ok, but what about the ghosts

With that in mind, what can we do when it's less easy to figure out the idea, like in the main subject of this post? As I mentioned in the original post, I kinda just saw what to do. That's because I did a similar problem in the past.

But let's say I didn't guess right at the solution. What would I have done? Well, the next item on the list I presented is gathering information.

We know that we're interested in the time it takes to reach a node ending in Z. So... let's just print out every time a ghost does that.

let mut ghosts: Vec<_> = map.keys().filter(|k| k.ends_with('A')).collect();
let mut steps = std::iter::repeat(steps.bytes()).flatten();

for time in 1..100000 {
	let step = steps.next().unwrap();

	for (ghost_number, ghost) in ghosts.iter_mut().enumerate() {
		*ghost = &map[*ghost][usize::from(step == b'R')];
		if ghost.ends_with('Z') {
			println!("Ghost #{ghost_number} reached {} at {time}", *ghost);
		}
	}
}

Something like that. The specific code doesn't really matter; just to illustrate that you can vandalize your simulation code to extract information.

I ran it for a couple hundred steps, had no output, so I ran it for a hundred thousand steps, and got this:

Ghost #3 reached XQZ at 13201
Ghost #2 reached CHZ at 16271
Ghost #4 reached ZZZ at 18113
Ghost #5 reached KRZ at 18727
Ghost #0 reached LSZ at 20569
Ghost #1 reached KMZ at 22411
Ghost #3 reached XQZ at 26402
Ghost #2 reached CHZ at 32542
Ghost #4 reached ZZZ at 36226
Ghost #5 reached KRZ at 37454
Ghost #3 reached XQZ at 39603
Ghost #0 reached LSZ at 41138
Ghost #1 reached KMZ at 44822
Ghost #2 reached CHZ at 48813
Ghost #3 reached XQZ at 52804
Ghost #4 reached ZZZ at 54339
Ghost #5 reached KRZ at 56181
Ghost #0 reached LSZ at 61707
Ghost #2 reached CHZ at 65084
Ghost #3 reached XQZ at 66005
Ghost #1 reached KMZ at 67233
Ghost #4 reached ZZZ at 72452
Ghost #5 reached KRZ at 74908
Ghost #3 reached XQZ at 79206
Ghost #2 reached CHZ at 81355
Ghost #0 reached LSZ at 82276
Ghost #1 reached KMZ at 89644
Ghost #4 reached ZZZ at 90565
Ghost #3 reached XQZ at 92407
Ghost #5 reached KRZ at 93635
Ghost #2 reached CHZ at 97626

Let's look at JUST ghost number 0:

Ghost #0 reached LSZ at 20569
Ghost #0 reached LSZ at 41138
Ghost #0 reached LSZ at 61707
Ghost #0 reached LSZ at 82276

Two things spring out at me immediately. Ghost number 0 always arrives at the same Z node. And the numbers are suspicious too, aren't they? Out of curiosity, what's 20569 * 2...

It's 41138.

Isn't that interesting. Once you see a pattern like this emerge, you know you're onto something. Invoking, once again, the wanting-to-be-solved-ness of AoC problems: this cannot be a coincidence.

This is, in fact, how I originally figured out the solution to the similar problem I linked earlier. You see a pattern like this, you know you're on the right path. It's just a matter of figuring out how to combine those numbers to get your result.

This might be trivial or very hard, depending on how much maths you know, but even if you don't, no need to despair: AoC usually provides smaller examples where you know the answer; so you can tinker around with the numbers, maybe figure out how they correlate with the known answer. Maybe do some searching on the internet.

In the case of making periodic events line up, I happened to already know that it's the least common multiple of their periods; but I think it's possible to figure that out tinkering with numbers.

Conclusion

For simulation-type problems, you want to skip as much of the actual simulating as you can. For that, figure out which parts you don't need to simulate, or look for patterns you can exploit to extrapolate the full answer from smaller simulations.

I hope this helped!


  1. Please look at my messy Computer Magic version here and praise me for it. Thanks! ↩︎