How a Random Forest Spots a Pure Interaction
1. Introduction
We will build the simplest possible decision tree with two binary switches: Weekend and Sunny. The target is whether someone went to the park. In the tiny world we create, the park trip happens only when exactly one switch is on. That setting forces the tree to learn a pure interaction—no single switch is useful by itself.
2. What Exactly Is a Pure Interaction?
An interaction means the joint state of two (or more) features affects the target beyond the sum of their solo effects. A pure interaction goes further: each feature alone has zero predictive power, yet together they have substantial (often perfect) power. Think of a stairwell light controlled by two two-way switches: the light is on only when the switches are in different positions, so knowing the position of one switch alone tells you nothing about whether the light is on.
3. Entropy, the Impurity Meter
For a binary target the entropy of a node containing proportions pYes and pNo is:
H = - pYes · log₂(pYes) - pNo · log₂(pNo)
The lower the entropy, the purer the node. With the usual convention 0 · log₂(0) = 0, a node that contains a single class has H = 0. Information Gain for a split is:
Gain = Hparent - Σ (nchild / nparent) · Hchild
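The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original walkthrough: the function names entropy and information_gain are hypothetical, and nodes are assumed to be described by (yes, no) count pairs.

```python
from math import log2

def entropy(n_yes, n_no):
    """Binary entropy in bits for a node with the given class counts."""
    total = n_yes + n_no
    h = 0.0
    for count in (n_yes, n_no):
        if count > 0:  # convention: 0 * log2(0) = 0
            p = count / total
            h -= p * log2(p)
    return h

def information_gain(parent, children):
    """Gain = H(parent) - sum of size-weighted child entropies.

    parent and every child are (n_yes, n_no) count pairs.
    """
    n_parent = sum(parent)
    weighted = sum(sum(child) / n_parent * entropy(*child) for child in children)
    return entropy(*parent) - weighted

print(entropy(50, 50))   # a perfectly mixed node
print(entropy(25, 0))    # a pure node
print(information_gain((50, 50), [(25, 25), (25, 25)]))  # a split that changes nothing
```

Working from counts rather than proportions keeps the zero-count case explicit, which matters later when a child node is perfectly pure.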
4. A Synthetic Population of 100 (the XOR world)
We place 25 people in each of the four logical cases. The park rule is “Weekend XOR Sunny”.
Contingency Table (Park Yes / Park No counts in each cell)
              | SUNNY YES            | SUNNY NO             | TOTAL
──────────────┼──────────────────────┼──────────────────────┼──────
WEEKEND YES   | Park Yes 0, No 25    | Park Yes 25, No 0    | 50
WEEKEND NO    | Park Yes 25, No 0    | Park Yes 0, No 25    | 50
──────────────┼──────────────────────┼──────────────────────┼──────
TOTAL         | 50                   | 50                   | 100
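For readers who want to reproduce the population, here is one possible way to generate it. It is a sketch under the stated rule (park = Weekend XOR Sunny, 25 people per case); the variable names and the (weekend, sunny, park) tuple layout are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# 25 people in each of the four (weekend, sunny) cases.
# park is 1 only when exactly one of the two switches is on (XOR).
population = [
    (weekend, sunny, weekend ^ sunny)
    for weekend, sunny in product((1, 0), repeat=2)
    for _ in range(25)
]

# Count people per (weekend, sunny, park) combination.
cells = Counter(population)
for weekend, sunny in product((1, 0), repeat=2):
    park_yes = cells[(weekend, sunny, 1)]
    park_no = cells[(weekend, sunny, 0)]
    print(f"weekend={weekend} sunny={sunny}: park yes={park_yes}, park no={park_no}")
```

Each printed row corresponds to one cell of the contingency table, with 1 standing for Yes and 0 for No.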
5. Root‑Node Entropy Worked Out
# Overall class counts
Yes = 50, No = 50
pYes = 50 / 100 = 0.5
pNo = 0.5
H₀ = -(0.5 · log₂ 0.5) -(0.5 · log₂ 0.5)
= -(0.5 · -1) -(0.5 · -1)
= 1 bit
6. Zero Main‑Effect Splits – Full Maths
6.1 Split on Weekend
# Left child (Weekend = Yes)
Yes = 25, No = 25 → pYes = pNo = 0.5 → HL = 1 bit
# Right child (Weekend = No)
Yes = 25, No = 25 → HR = 1 bit
Gain = H₀ - [(50/100)·HL + (50/100)·HR]
= 1 - [(0.5·1) + (0.5·1)]
= 1 - 1 = 0 bits (no improvement)
6.2 Split on Sunny
Exactly the same numbers → Gain = 0 bits.
Therefore each feature alone is useless: no impurity reduction, no predictive power.
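The zero-gain result can be checked numerically. The sketch below rebuilds the 100-person XOR population and evaluates both single-feature splits; the helper names entropy and gain and the tuple layout (weekend, sunny, park) are illustrative assumptions.

```python
from math import log2

def entropy(labels):
    """Binary entropy in bits of a list of 0/1 labels (empty list gives 0)."""
    h = 0.0
    for cls in (0, 1):
        count = labels.count(cls)
        if count:
            p = count / len(labels)
            h -= p * log2(p)
    return h

def gain(rows, feature):
    """Information gain of splitting rows on feature index 0 (weekend) or 1 (sunny)."""
    g = entropy([r[2] for r in rows])
    for value in (0, 1):
        child = [r[2] for r in rows if r[feature] == value]
        g -= len(child) / len(rows) * entropy(child)
    return g

# 25 people per case, park = weekend XOR sunny
rows = [(w, s, w ^ s) for w in (0, 1) for s in (0, 1) for _ in range(25)]

print("gain on Weekend:", gain(rows, 0))
print("gain on Sunny:  ", gain(rows, 1))
```

Both calls return 0 bits because every child node is still a 25/25 mix of the two classes.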
7. The Pure Interaction Split – Full Maths
Suppose the tree tries Weekend first anyway (Gain = 0). Inside the branch Weekend = Yes (50 people) we test Sunny:
# Parent entropy (we already know)
Hparent = 1 bit
## Child 1 – Sunny = Yes
Yes = 0, No = 25 → pNo = 1 → H1 = 0 bits
## Child 2 – Sunny = No
Yes = 25, No = 0 → pYes = 1 → H2 = 0 bits
Gain = Hparent - [(25/50)·0 + (25/50)·0]
= 1 - 0 = 1 bit (the maximum possible!)
ASCII Tree Diagram
            [Root] Weekend?
             /          \
        Yes (50)       No (50)
           |
         Sunny?
         /    \
     Yes 25   No 25
     0Y 25N   25Y 0N
The Weekend = No branch splits the same way with the labels swapped, so after two levels every leaf is pure. Impurity falls only when both conditions are evaluated together, which is exactly the signature of a pure interaction.
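The two-level calculation can be verified with a short script. It is a sketch on the same XOR population (25 people per case); the helper names are hypothetical, and the root-level weighting at the end is an added observation rather than a figure from the text.

```python
from math import log2

def entropy(labels):
    """Binary entropy in bits of a list of 0/1 labels (empty list gives 0)."""
    h = 0.0
    for cls in (0, 1):
        count = labels.count(cls)
        if count:
            p = count / len(labels)
            h -= p * log2(p)
    return h

def gain(rows, feature):
    """Information gain of splitting rows on feature index 0 (weekend) or 1 (sunny)."""
    g = entropy([r[2] for r in rows])
    for value in (0, 1):
        child = [r[2] for r in rows if r[feature] == value]
        g -= len(child) / len(rows) * entropy(child)
    return g

rows = [(w, s, w ^ s) for w in (0, 1) for s in (0, 1) for _ in range(25)]

# Second-level split on Sunny inside each Weekend branch.
branch_gains = {}
for weekend in (1, 0):
    branch = [r for r in rows if r[0] == weekend]
    branch_gains[weekend] = gain(branch, 1)
    print(f"Weekend={weekend}: gain from Sunny inside the branch = {branch_gains[weekend]}")

# Each branch holds half the data, so measured at the root the two
# second-level splits together remove the full 1 bit of entropy.
total_reduction = sum(0.5 * g for g in branch_gains.values())
print("total entropy removed after two levels:", total_reduction)
```

The first split earns nothing and the second earns everything, which is why a greedy criterion that looks only one step ahead cannot see the pattern at the root.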
8. How the Forest Learns It
- Each bootstrap sample still contains all four Weekend/Sunny combinations in roughly equal numbers, so the XOR pattern survives resampling and many trees include the two‑step path.
- Because single features never help, the forest relies entirely on paths containing both tests.
- Averaging predictions across trees preserves the interaction signal and generalises.
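The first bullet can be illustrated without any machine-learning library. The sketch below draws 20 bootstrap samples from the XOR population and, for each one, measures the best single-feature gain at the root and the entropy left after splitting on both features. The sample count, the seed, and the helper names are arbitrary choices for illustration; this is not a full random forest.

```python
import random
from math import log2

def entropy(labels):
    """Binary entropy in bits of a list of 0/1 labels (empty list gives 0)."""
    h = 0.0
    for cls in (0, 1):
        count = labels.count(cls)
        if count:
            p = count / len(labels)
            h -= p * log2(p)
    return h

def gain(rows, feature):
    """Information gain of splitting rows on feature index 0 (weekend) or 1 (sunny)."""
    g = entropy([r[2] for r in rows])
    for value in (0, 1):
        child = [r[2] for r in rows if r[feature] == value]
        g -= len(child) / len(rows) * entropy(child)
    return g

random.seed(0)  # arbitrary seed, only for repeatability
population = [(w, s, w ^ s) for w in (0, 1) for s in (0, 1) for _ in range(25)]

root_gains, residuals = [], []
for _ in range(20):
    # Bootstrap: sample 100 people with replacement.
    sample = random.choices(population, k=len(population))
    root_gains.append(max(gain(sample, 0), gain(sample, 1)))
    # Entropy remaining once both Weekend and Sunny have been tested.
    residual = 0.0
    for w in (0, 1):
        for s in (0, 1):
            leaf = [r[2] for r in sample if r[0] == w and r[1] == s]
            residual += len(leaf) / len(sample) * entropy(leaf)
    residuals.append(residual)

print("largest root gain over 20 bootstraps:", max(root_gains))
print("largest residual entropy after both splits:", max(residuals))
```

Resampling makes the root gains slightly nonzero by chance, but they stay small, while the two-feature leaves are always pure because the XOR rule itself is untouched by resampling.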
9. SHAP & Permutation Proof
9.1 Shapley intuition
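SHAP distributes credit by averaging each feature's marginal contribution over all coalitions of the other features. Here a feature adds nothing when it joins the empty coalition and everything when it joins the other feature, so in the SHAP interaction decomposition both main effects are zero and all the credit lands in the Weekend × Sunny interaction term.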
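With only two features the coalitions can be enumerated by hand. The sketch below computes exact Shapley values for a model that implements the XOR rule, assuming the four cases are equally likely and that a missing feature is averaged over that uniform background. It splits the pairwise interaction equally between the two features, following the SHAP interaction-value convention. It does not call the shap library, and all function names are hypothetical.

```python
from itertools import product

def model(weekend, sunny):
    """A model assumed to have learned the rule perfectly: park = XOR."""
    return weekend ^ sunny

# Four equally likely (weekend, sunny) cases used to average out unknown features.
background = list(product((0, 1), repeat=2))

def value(x, known):
    """Expected model output when only the feature indices in `known` are fixed to x."""
    outputs = []
    for row in background:
        filled = [x[i] if i in known else row[i] for i in range(2)]
        outputs.append(model(*filled))
    return sum(outputs) / len(outputs)

def shap_decomposition(x):
    v_none = value(x, set())
    v_w = value(x, {0})
    v_s = value(x, {1})
    v_both = value(x, {0, 1})
    # Shapley value: average marginal contribution over both orderings.
    phi_w = 0.5 * (v_w - v_none) + 0.5 * (v_both - v_s)
    phi_s = 0.5 * (v_s - v_none) + 0.5 * (v_both - v_w)
    # Pairwise interaction, half of it attributed to each feature.
    interaction = 0.5 * (v_both - v_w - v_s + v_none)
    return {
        "phi_weekend": phi_w,
        "phi_sunny": phi_s,
        "main_weekend": phi_w - interaction,
        "main_sunny": phi_s - interaction,
        "interaction_each": interaction,
    }

for x in background:
    print(x, shap_decomposition(x))
```

For every one of the four cases the main effects come out as 0, and each feature's Shapley value (plus or minus 0.25) consists entirely of its share of the interaction.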
9.2 Permutation importance demo
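- Shuffle Weekend → error jumps from 0 to about 50% (chance level), because the XOR rule needs both inputs.
- Shuffle Sunny → the same jump to about 50%.
- Shuffle both together → error stays at about 50%, no worse than shuffling one.
The contrast with Section 6 is the proof: each feature has zero gain on its own, yet the model collapses when either one is scrambled, so all of its importance comes from the interaction.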
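The shuffling experiment can be simulated directly. This sketch assumes a model that has learned the XOR rule perfectly (standing in for a well-trained forest) and averages the error over 200 random shuffles; the seed, repeat count, and helper names are arbitrary illustrative choices.

```python
import random

random.seed(0)  # arbitrary seed, only for repeatability

rows = [(w, s, w ^ s) for w in (0, 1) for s in (0, 1) for _ in range(25)]
weekend = [r[0] for r in rows]
sunny = [r[1] for r in rows]
park = [r[2] for r in rows]

def error(w_col, s_col):
    """Misclassification rate of the XOR model on the given feature columns."""
    wrong = sum((w ^ s) != y for w, s, y in zip(w_col, s_col, park))
    return wrong / len(park)

def shuffled(column):
    """Return a shuffled copy, leaving the original column untouched."""
    copy = column[:]
    random.shuffle(copy)
    return copy

def mean_error(shuffle_weekend, shuffle_sunny, repeats=200):
    """Average error over repeated shuffles of the selected columns."""
    total = 0.0
    for _ in range(repeats):
        w = shuffled(weekend) if shuffle_weekend else weekend
        s = shuffled(sunny) if shuffle_sunny else sunny
        total += error(w, s)
    return total / repeats

print("baseline error:       ", mean_error(False, False))
print("shuffle Weekend only: ", mean_error(True, False))
print("shuffle Sunny only:   ", mean_error(False, True))
print("shuffle both:         ", mean_error(True, True))
```

In this setup the baseline error is 0 and every shuffled variant lands near 0.5. Scrambling a single feature is already enough to reduce the XOR model to chance, so shuffling both cannot make things worse.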