There’s already been much ink spilled over Newcomb’s paradox, so I’ll keep it short.
The two-boxers argue that because the AI has already filled the boxes, nothing the Person does now can change what's inside them, so the best strategy is always to take both.
The one-boxers argue that taking both boxes almost always yields only $1k, while taking just the one box almost always yields $1m. If you want to reliably maximize the amount you collect, always one-box.
Put in a diagram:
AI guesses what Person will select
        |                         |
    One-box                   Two-box
  [$1m] [$1k]                [$0] [$1k]
        |                         |
Person chooses strategy
    ____|____                 ____|____
   |         |               |         |
One-box   Two-box         One-box   Two-box
   |         |               |         |
  $1m     $1m+$1k           $0        $1k
The tricky part is that we've allowed in a prescient AI, which tangles the actions and consequences back and forth through time, letting the Person's strategy decision influence the AI's money allocation. We're modelling the Person as an agent with free will, who can choose anything totally independently of the past, but then smuggling in a deterministic universe where an AI can predict the Person's choices with high accuracy. The argument that the strategy decision can't change what's in the boxes doesn't apply, because the AI is prescient and can change its allocation based on the future.
If we rearrange the order of events to remove the need for prescience, the decision becomes clearer:
1. Person decides to one-box or two-box
2. AI guesses what person decided
3. AI places money in boxes based on guess
This produces an isomorphic outcome map:
Person chooses strategy
        |                         |
    One-box                   Two-box
    [?] [?]                   [?] [?]
        |                         |
AI guesses what Person will select & allocates
    ____|____                 ____|____
   |         |               |         |
One-box   Two-box         One-box   Two-box
[$1m][$1k] [$0][$1k]     [$1m][$1k] [$0][$1k]
   |         |               |         |
  $1m       $0            $1m+$1k     $1k
We can then add in the probability that the AI has guessed correctly, P_c, to work out the expected value of each strategy.
Person chooses strategy
        |                         |
    One-box                   Two-box
    [?] [?]                   [?] [?]
        |                         |
AI guesses what Person will select & allocates
    ____|____                 ____|____
  (P_c)   (1-P_c)          (1-P_c)    (P_c)
   |         |               |         |
One-box   Two-box         One-box   Two-box
[$1m][$1k] [$0][$1k]     [$1m][$1k] [$0][$1k]
   |         |               |         |
  $1m       $0            $1m+$1k     $1k
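Before deriving anything, we can sanity-check this tree numerically. Here's a minimal Monte Carlo sketch in Python, using the standard Newcomb setup where the $1k box is always filled and the $1m box depends on the AI's guess (the function names and parameter choices here are mine, purely for illustration):

```python
import random

def play_round(strategy: str, p_c: float) -> int:
    """One round of the rearranged game: the Person commits to a
    strategy, then the AI guesses it (correctly with probability p_c)
    and allocates the money based on that guess."""
    if random.random() < p_c:
        guess = strategy
    else:
        guess = "two-box" if strategy == "one-box" else "one-box"
    big_box = 1_000_000 if guess == "one-box" else 0  # filled only on a one-box guess
    small_box = 1_000                                 # always filled
    return big_box if strategy == "one-box" else big_box + small_box

def average_payout(strategy: str, p_c: float, rounds: int = 100_000) -> float:
    return sum(play_round(strategy, p_c) for _ in range(rounds)) / rounds

for p_c in (0.5, 0.5005, 0.6, 0.9):
    print(f"P_c={p_c}: one-box ≈ {average_payout('one-box', p_c):,.0f}, "
          f"two-box ≈ {average_payout('two-box', p_c):,.0f}")
```

At P_c = 0.5 the AI's guess carries no information and two-boxing wins by exactly the $1k in the always-filled box; as P_c grows, one-boxing quickly overtakes it.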
We can then work out the expected value of one-boxing vs two-boxing:
One-boxing = (P_c * $1m) + ((1 - P_c) * $0)
Two-boxing = ((1 - P_c) * ($1m + $1k)) + (P_c * $1k)
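The same expected values in code, as a quick check (a sketch; A and B are the $1m and $1k prizes from the problem statement):

```python
A, B = 1_000_000, 1_000  # big-box prize and always-filled-box prize

def ev_one_box(p_c: float) -> float:
    # Pays A when the AI correctly guessed one-box (probability P_c), else $0.
    return p_c * A

def ev_two_box(p_c: float) -> float:
    # Pays A + B when the AI wrongly guessed one-box (probability 1 - P_c),
    # and only B when it correctly guessed two-box (probability P_c).
    return (1 - p_c) * (A + B) + p_c * B

print(ev_one_box(0.9), ev_two_box(0.9))  # ≈ 900000 vs ≈ 101000
```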
By setting these two strategies equal, we can work out the critical prediction accuracy, below which it's optimal to two-box and above which it's optimal to one-box. We can also substitute the symbols A and B for the $1m and $1k prizes to solve for any prize amounts.
P_c*A = (1 - P_c) * (A + B) + P_c * B
P_c * A = A + B - P_c * (A + B) + P_c * B
P_c * (A + (A + B) - B) = A + B
P_c * 2A = A + B
P_c = (A + B) / (2A)
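Or, as a one-line helper (the name p_critical is my own):

```python
def p_critical(a: float, b: float) -> float:
    """Prediction accuracy at which one-boxing and two-boxing
    break even: P_c = (A + B) / (2A)."""
    return (a + b) / (2 * a)

print(p_critical(1_000_000, 1_000))  # 0.5005
```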
For the $1m and $1k amounts, the AI need only be slightly better than 50/50 (P_c = 0.5005) to make one-boxing the ideal strategy - see WolframAlpha Calculation - which lines up with the one-boxers' argument that one-boxing is most likely to maximize the money received.
The key to the disentanglement is to recognize that we’re mixing models and that the prescient AI causes information to flow backwards in time, violating the assumption that the strategy chosen (i.e. one-box vs two-box) can’t change what’s inside the boxes.