How a Random Forest Spots a Pure Interaction

1. Introduction

We will build the simplest possible decision tree for two binary features, which we treat as switches: Weekend and Sunny. The target is whether someone went to the park. In the tiny world we create, a park trip happens only when exactly one switch is on. That setting forces the tree to learn a pure interaction: neither switch is useful by itself.

2. What Exactly Is a Pure Interaction?

An interaction means the joint state of two (or more) features affects the target beyond the sum of their solo effects. A pure interaction goes further: each feature alone has zero predictive power, yet together they have substantial (often perfect) power. Think of a staircase light wired to two-way switches at the top and bottom, where flipping either switch toggles the light: knowing the position of one switch tells you nothing about whether the light is on, but knowing both positions tells you exactly.
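For the XOR world used throughout this article (25 people in each Weekend × Sunny combination, and a park trip exactly when one of the two is Yes), the definition can be written out as conditional probabilities. Knowing one feature leaves the odds at the base rate of 1/2, while knowing both settles the outcome:

```latex
\begin{aligned}
P(\text{Park}=\text{Yes} \mid \text{Weekend}=w) &= \tfrac{1}{2} \quad \text{for } w \in \{\text{Yes}, \text{No}\},\\
P(\text{Park}=\text{Yes} \mid \text{Sunny}=s) &= \tfrac{1}{2} \quad \text{for } s \in \{\text{Yes}, \text{No}\},\\
P(\text{Park}=\text{Yes} \mid \text{Weekend}=w,\ \text{Sunny}=s) &=
  \begin{cases} 1 & \text{if } w \neq s,\\ 0 & \text{if } w = s. \end{cases}
\end{aligned}
```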

3. Entropy, the Impurity Meter

For a binary target, the entropy of a node with class proportions pYes and pNo is:

H = - pYes · log₂(pYes) - pNo · log₂(pNo)

with the convention 0 · log₂(0) = 0, so a node holding only one class has H = 0.

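The lower the entropy, the purer the node: 0 bits means every example has the same class, and 1 bit is a 50/50 mix. The Information Gain of a split is the parent's entropy minus the size-weighted entropy of its children: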

Gain = Hparent - Σ (nchild / nparent) · Hchild
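The two formulas translate directly into a few lines of Python. This is a minimal sketch; the function names `entropy` and `information_gain` are illustrative, and each node is described by its (Yes, No) counts:

```python
from math import log2


def entropy(n_yes: int, n_no: int) -> float:
    """Binary entropy in bits of a node with n_yes / n_no examples.

    Empty classes are skipped, which implements the convention 0 * log2(0) = 0.
    """
    total = n_yes + n_no
    h = 0.0
    for count in (n_yes, n_no):
        if count > 0:
            p = count / total
            h -= p * log2(p)
    return h


def information_gain(parent: tuple[int, int], children: list[tuple[int, int]]) -> float:
    """Parent entropy minus the size-weighted average entropy of the children."""
    n_parent = sum(parent)
    weighted = sum(sum(child) / n_parent * entropy(*child) for child in children)
    return entropy(*parent) - weighted


print(entropy(50, 50))                                        # 1.0
print(information_gain((50, 50), [(25, 25), (25, 25)]))       # 0.0
print(information_gain((25, 25), [(0, 25), (25, 0)]))         # 1.0
```

The same two functions cover every hand calculation in sections 5 to 7: pass the class counts of the parent and of each child.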

4. A Synthetic Population of 100 (the XOR world)

We place 25 people in each of the four logical cases. The park rule is “Weekend XOR Sunny”.

Contingency Table (each cell: Park Yes / Park No)

                 SUNNY = YES   SUNNY = NO   | TOTAL
                 ───────────────────────────┼──────
WEEKEND = YES      0 / 25       25 / 0      |    50
WEEKEND = NO      25 / 0         0 / 25     |    50
                 ───────────────────────────┼──────
TOTAL             25 / 25       25 / 25     |   100
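The population above can be generated and re-counted in Python. The sketch below encodes Yes as 1 and No as 0 (an encoding chosen here for convenience) and uses `^`, Python's bitwise XOR, to apply the park rule; `make_xor_world` is an illustrative name:

```python
from collections import Counter


def make_xor_world(per_cell: int = 25) -> list[tuple[int, int, int]]:
    """Return rows (weekend, sunny, park) with 1 = Yes and 0 = No.

    park is 1 exactly when one of weekend / sunny is 1 (Weekend XOR Sunny).
    """
    rows = []
    for weekend in (1, 0):
        for sunny in (1, 0):
            park = weekend ^ sunny
            rows.extend([(weekend, sunny, park)] * per_cell)
    return rows


rows = make_xor_world()
counts = Counter(rows)  # counts[(weekend, sunny, park)] -> number of people
for weekend in (1, 0):
    for sunny in (1, 0):
        yes = counts[(weekend, sunny, 1)]
        no = counts[(weekend, sunny, 0)]
        print(f"Weekend={weekend} Sunny={sunny}: Park Yes {yes:2d} | Park No {no:2d}")
```

Each printed line corresponds to one cell of the contingency table.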

5. Root‑Node Entropy Worked Out

# Overall class counts
Yes = 50,  No = 50

pYes = 50 / 100 = 0.5
pNo  = 0.5

H₀ = -(0.5 · log₂ 0.5) - (0.5 · log₂ 0.5)
   = -(0.5 · (-1)) - (0.5 · (-1))
   = 0.5 + 0.5
   = 1 bit

6. Zero Main‑Effect Splits – Full Maths

6.1 Split on Weekend

# Left child (Weekend = Yes)
Yes = 25, No = 25 → pYes = pNo = 0.5 → HL = 1 bit

# Right child (Weekend = No)
Yes = 25, No = 25 → pYes = pNo = 0.5 → HR = 1 bit

Gain = H₀ - [(50/100)·HL + (50/100)·HR]
     = 1 - [(0.5·1) + (0.5·1)]
     = 1 - 1 = 0 bits (no improvement)

6.2 Split on Sunny

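By symmetry the Sunny split produces the same numbers: each Sunny branch holds 25 Yes and 25 No, so both children have 1 bit of entropy and Gain = 0 bits.

Therefore each feature alone is useless: splitting on it removes no impurity, and the park rate stays at 50% whichever value it takes.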
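The zero gains, and the 50/50 park rate behind them, can be checked on the generated population. A minimal sketch, with rows encoded as (weekend, sunny, park) using 1 for Yes and 0 for No; `split_gain` is an illustrative helper, not a library function:

```python
from collections import Counter
from math import log2


def entropy(labels: list[int]) -> float:
    """Entropy in bits; Counter holds only non-zero counts, so 0 * log2(0) never occurs."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def split_gain(rows, feature: int) -> float:
    """Information gain of splitting rows on column `feature` (0 = weekend, 1 = sunny)."""
    gain = entropy([r[2] for r in rows])
    for value in (0, 1):
        child = [r[2] for r in rows if r[feature] == value]
        if child:
            gain -= len(child) / len(rows) * entropy(child)
    return gain


# 25 people per (weekend, sunny) cell; park = weekend XOR sunny.
rows = [(w, s, w ^ s) for w in (0, 1) for s in (0, 1) for _ in range(25)]

for name, col in (("Weekend", 0), ("Sunny", 1)):
    for value in (0, 1):
        child = [r[2] for r in rows if r[col] == value]
        print(f"P(Park=Yes | {name}={value}) = {sum(child) / len(child)}")
    print(f"Gain({name}) = {split_gain(rows, col)}")
```

Every conditional park rate prints as 0.5 and both gains as 0.0, matching the hand calculation.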

7. The Pure Interaction Split – Full Maths

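A greedy tree has no look-ahead, so a learner that refused zero-gain splits would stop at the root and never find the interaction. Suppose instead that the tree is allowed to split anyway and picks Weekend first (Gain = 0). Inside the branch Weekend = Yes (50 people) we test Sunny: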

# Parent entropy (the Weekend = Yes node, HL from 6.1)
Hparent = 1 bit

## Child 1 – Sunny = Yes (both switches on)
Yes = 0, No = 25 → pNo = 1 → H1 = 0 bits

## Child 2 – Sunny = No (exactly one switch on)
Yes = 25, No = 0 → pYes = 1 → H2 = 0 bits

Gain = Hparent - [(25/50)·0 + (25/50)·0]
     = 1 - 0 = 1 bit (the maximum possible!)
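The same kind of check confirms the 1-bit gain inside each Weekend branch. This sketch repeats the illustrative `entropy` and `split_gain` helpers so it runs on its own, and applies them to the Weekend = Yes and Weekend = No subsets (1 = Yes, 0 = No):

```python
from collections import Counter
from math import log2


def entropy(labels: list[int]) -> float:
    """Entropy in bits of a list of 0/1 labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def split_gain(rows, feature: int) -> float:
    """Information gain of splitting rows on column `feature` (0 = weekend, 1 = sunny)."""
    gain = entropy([r[2] for r in rows])
    for value in (0, 1):
        child = [r[2] for r in rows if r[feature] == value]
        if child:
            gain -= len(child) / len(rows) * entropy(child)
    return gain


rows = [(w, s, w ^ s) for w in (0, 1) for s in (0, 1) for _ in range(25)]

for weekend in (1, 0):
    branch = [r for r in rows if r[0] == weekend]
    h = entropy([r[2] for r in branch])
    print(f"Weekend={weekend}: Hparent = {h}, Gain(Sunny) = {split_gain(branch, 1)}")
```

Both branches start at 1 bit and end in pure children, so each Sunny split gains the full 1 bit.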

ASCII Tree Diagram (leaf counts: Park Yes / Park No)

                 [Root] Weekend?
                /               \
          Yes (50)             No (50)
             |                    |
          Sunny?               Sunny?
          /    \               /    \
    Yes (25)  No (25)     Yes (25)  No (25)
    0Y / 25N  25Y / 0N    25Y / 0N  0Y / 25N

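Neither test reduces impurity at the root, yet the two tests together drive every leaf to 0 bits. The Weekend = No branch mirrors the Weekend = Yes branch (Sunny = Yes gives 25 Yes / 0 No, Sunny = No gives 0 Yes / 25 No), so it also gains 1 bit. Impurity falls only when both conditions are evaluated together, and that is exactly the signature of a pure interaction.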

8. How the Forest Learns It

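  • Bootstrap samples resample rows, not rules: every drawn row still obeys Weekend XOR Sunny, so a tree grown at least two levels deep tests both features on every root-to-leaf path. Sampling noise also tends to give the root split a small positive gain, which nudges greedy trees past the zero-gain root.
  • Because a single test never separates the classes, all of the forest's predictive power comes from paths that contain both tests.
  • Every tree encodes the same XOR rule, so averaging their votes keeps the interaction signal intact while smoothing over the quirks of individual bootstrap samples.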
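To see these points in action, the sketch below builds a toy forest from scratch: each tree is a greedy, entropy-based tree grown on a bootstrap sample, and the forest predicts by majority vote. It is an illustration under simplifying assumptions, not how scikit-learn or any other library implements random forests: trees are capped at depth 2, zero-gain splits are allowed, and the per-split random feature subsampling of a real random forest is omitted because there are only two features. All function names are hypothetical.

```python
import random
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy in bits of a list of 0/1 labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def build_tree(rows, features, depth=0, max_depth=2):
    """Hypothetical greedy tree on binary features; zero-gain splits are allowed.

    Returns a leaf label (int) or a tuple (feature, {0: subtree, 1: subtree}).
    """
    labels = [r[-1] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1 or not features:
        return majority

    def gain(feature):
        g = entropy(labels)
        for value in (0, 1):
            child = [r[-1] for r in rows if r[feature] == value]
            if child:
                g -= len(child) / len(rows) * entropy(child)
        return g

    best = max(features, key=gain)  # on a tie, max() keeps the first feature listed
    rest = [f for f in features if f != best]
    branches = {}
    for value in (0, 1):
        subset = [r for r in rows if r[best] == value]
        # An empty branch (value missing from the sample) predicts the parent's majority.
        branches[value] = build_tree(subset, rest, depth + 1, max_depth) if subset else majority
    return (best, branches)


def predict(tree, x):
    """Walk from the root to a leaf using x = (weekend, sunny, ...)."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[x[feature]]
    return tree


def fit_forest(rows, n_trees=25, seed=0):
    """Grow each tree on a bootstrap sample: len(rows) draws with replacement."""
    rng = random.Random(seed)
    return [build_tree([rng.choice(rows) for _ in rows], [0, 1]) for _ in range(n_trees)]


def forest_predict(forest, x):
    """Majority vote over the trees (an odd n_trees avoids ties)."""
    return Counter(predict(tree, x) for tree in forest).most_common(1)[0][0]


rows = [(w, s, w ^ s) for w in (0, 1) for s in (0, 1) for _ in range(25)]
forest = fit_forest(rows)
accuracy = sum(forest_predict(forest, r) == r[-1] for r in rows) / len(rows)
print(f"training accuracy: {accuracy}")
```

A bootstrap sample of 100 rows almost surely contains all four Weekend × Sunny combinations, so each tree in this sketch ends up testing both features on every path and the vote should classify all 100 people correctly.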

9. SHAP & Permutation Proof

9.1 Shapley intuition

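SHAP distributes credit by averaging each feature's marginal contribution over all coalitions of the other features. Here neither feature adds anything on its own, so when SHAP interaction values separate main effects from pairwise effects, both main effects are zero and all of the credit lands on the Weekend × Sunny interaction. Ordinary per-feature SHAP values still give each feature half of that credit, because they split every interaction effect evenly between its two partners.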
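The credit split can be computed exactly for a model that has learned the rule perfectly. The sketch below assumes such a model (the XOR rule itself stands in for the trained forest), defines the coalition value v(S) as the model's average output when the features in S are fixed and the others are drawn from the four equally sized cells of the population, and applies the two-feature Shapley formulas with exact fractions. Because Weekend and Sunny are independent in this world, fixing features this way gives the same averages as conditioning on them. The pairwise formula is the two-feature case of SHAP interaction values, with the pair's credit split evenly between the (Weekend, Sunny) and (Sunny, Weekend) entries; helper names are illustrative.

```python
from fractions import Fraction
from itertools import product


def model(weekend: int, sunny: int) -> int:
    # Stand-in for a forest that has learned Weekend XOR Sunny exactly (assumption).
    return weekend ^ sunny


# Background: one entry per (weekend, sunny) cell; every cell holds 25 people.
background = list(product((0, 1), repeat=2))


def value(x: tuple[int, int], coalition: set[int]) -> Fraction:
    """v(S): mean output with features in S fixed to x, the others taken from the background."""
    outputs = [model(*(x[i] if i in coalition else b[i] for i in range(2))) for b in background]
    return Fraction(sum(outputs), len(outputs))


def shap_two_features(x: tuple[int, int]) -> dict[str, Fraction]:
    """Exact SHAP values and SHAP interaction values for the two-feature game."""
    v_none, v_w, v_s, v_ws = value(x, set()), value(x, {0}), value(x, {1}), value(x, {0, 1})
    phi_w = ((v_w - v_none) + (v_ws - v_s)) / 2   # Shapley value of Weekend
    phi_s = ((v_s - v_none) + (v_ws - v_w)) / 2   # Shapley value of Sunny
    inter = (v_ws - v_w - v_s + v_none) / 2       # interaction value for (W,S), equal to (S,W)
    return {"phi_weekend": phi_w, "phi_sunny": phi_s, "interaction": inter,
            "main_weekend": phi_w - inter, "main_sunny": phi_s - inter}


for x in background:
    print(x, {k: str(v) for k, v in shap_two_features(x).items()})
```

For every person the main effects come out as exactly 0, and both per-feature SHAP values equal the pairwise interaction value (±1/4); together they account for the gap between the prediction and the 1/2 baseline.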

9.2 Permutation importance demo

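  1. Shuffle Weekend → error jumps from 0 to about 50%: with Weekend scrambled, Weekend XOR Sunny becomes a coin flip.
  2. Shuffle Sunny → the same jump, by symmetry.
  3. Shuffle both together → error again sits near 50%, no worse than shuffling either one alone.

Permutation importance therefore cannot show a zero main effect: it measures how much the model relies on a feature, and the model relies on each feature completely, through the interaction. The signature of a pure interaction is the combination of results: each feature is highly important inside the model, yet (as section 6 showed) useless on its own, and shuffling both costs no more than shuffling one because the two importances measure the same shared effect.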
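The three shuffles can be simulated directly. In this sketch the XOR rule again stands in for a forest that has learned it perfectly (an assumption), `permutation_importance_xor` is a hypothetical helper rather than scikit-learn's `permutation_importance`, and importance is measured as the rise in error rate averaged over 20 shuffles. For the joint case, the same row order is applied to both columns, so each (Weekend, Sunny) pair stays intact but is detached from its label:

```python
import random


def model(weekend: int, sunny: int) -> int:
    # Stand-in for a forest that has learned Weekend XOR Sunny exactly (assumption).
    return weekend ^ sunny


def error_rate(weekend: list[int], sunny: list[int], labels: list[int]) -> float:
    wrong = sum(model(w, s) != y for w, s, y in zip(weekend, sunny, labels))
    return wrong / len(labels)


def permutation_importance_xor(rows, shuffle: str, n_repeats: int = 20, seed: int = 0) -> float:
    """Mean rise in error rate after shuffling "weekend", "sunny", or "both" columns."""
    rng = random.Random(seed)
    weekend = [r[0] for r in rows]
    sunny = [r[1] for r in rows]
    labels = [r[2] for r in rows]
    baseline = error_rate(weekend, sunny, labels)
    rises = []
    for _ in range(n_repeats):
        order = list(range(len(rows)))
        rng.shuffle(order)
        # "both" reuses one row order for the two columns, keeping each pair intact.
        w = [weekend[i] for i in order] if shuffle in ("weekend", "both") else weekend
        s = [sunny[i] for i in order] if shuffle in ("sunny", "both") else sunny
        rises.append(error_rate(w, s, labels) - baseline)
    return sum(rises) / n_repeats


rows = [(w, s, w ^ s) for w in (0, 1) for s in (0, 1) for _ in range(25)]
importances = {name: permutation_importance_xor(rows, name) for name in ("weekend", "sunny", "both")}
for name, rise in importances.items():
    print(f"shuffle {name}: error rises by {rise:.3f}")
```

All three importances come out close to 0.5: scrambling either feature alone already reduces the model to a coin flip, and scrambling both does no further damage.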
