I’ve always been fascinated by casinos. Understanding games like roulette and blackjack was a real motivation for me when I was studying probability theory. Even though I now find it almost trivial, it was eye-opening when I first understood the “magic” of roulette: the odds of winning simply don’t weigh up against the multiplier you get when you win. From that moment on, I understood what “the house always wins” means in a very concrete way.
Needless to say, once I had discovered that, for a while I thought that people playing roulette must be among the dumbest people ever. Only later did I realize that (i) it really wasn’t that impressive to figure out a relatively simple piece of math, and (ii) people play these games for reasons other than picking the optimal bet.1
Anyway, I think I’ve been to a casino once or twice since understanding probability (and never before that), but I don’t recall ever having played. I guess I just didn’t fancy facing an unfair (i.e. negative expected value) bet. One possible exception is Blackjack, which, depending on your strategy, the payoffs, and the “entry price”, might actually be a fair bet, or even have positive expected value.
In this post, I want to mathematically analyze Blackjack using Dynamic Programming (DP). I’ll model the game as a Markov Decision Process (MDP), assuming that card probabilities remain constant (i.e. sampling with replacement).2 To keep things simple, I’ll also assume that the dealer follows a fixed strategy, described below.
1. Game Setup and Notation
Let the set of card values be \(C = \{2, 3, \dots, 9, 10, 11\}\), where 10 represents \(\{10, J, Q, K\}\) and 11 represents an Ace.
The probability of drawing card \(c\), denoted \(p(c)\), is: \[
p(c) = \begin{cases}
1/13 & \text{if } c \in \{2, \dots, 9, 11\} \\
4/13 & \text{if } c = 10
\end{cases}
\]
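As a sanity check, these probabilities sum to one, since nine of the values occur with probability \(1/13\) and the ten-valued group with \(4/13\): \[
\sum_{c \in C} p(c) = 9 \cdot \frac{1}{13} + \frac{4}{13} = 1.
\]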
The State Space
A state \(S\) is defined by the tuple \((x, u, d)\), where:
\(x\): The player’s current sum (\(4 \leq x \leq 21\); the minimum is 4 because the player starts with two cards, each worth at least 2).
\(u\): Indicator variable for a usable Ace (1 if the player has an Ace counted as 11, 0 otherwise).
\(d\): The dealer’s visible up-card (\(2 \leq d \leq 11\)).
Definitions: Soft and Hard Hands
A hand is Soft if it contains an Ace that is currently counted as 11 without the total exceeding 21.
A hand is Hard if either:
It contains no Aces.
It contains Aces, but they are all forced to count as 1 to prevent the total from exceeding 21.
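To make the definitions concrete, here is a tiny helper; this is a sketch of the rule rather than code used later in the post, and the name hand_value is mine:

```python
def hand_value(cards):
    """Best blackjack total for a list of card values (Ace = 11), plus softness."""
    total = sum(cards)
    aces = cards.count(11)          # Aces currently counted as 11
    while total > 21 and aces:      # demote Aces from 11 to 1 as needed
        total -= 10
        aces -= 1
    return total, aces > 0          # (total, is_soft)

assert hand_value([11, 6]) == (17, True)      # (A, 6) is a soft 17
assert hand_value([11, 6, 9]) == (16, False)  # (A, 6, 9) is a hard 16
```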
2. The Dealer’s Terminal Distribution
I said before that the dealer plays a fixed strategy: “Hit until 17.” We calculate the dealer’s outcome distribution independently, before solving for the player’s optimal policy.
The dealer’s play depends only on their current sum \(s\) and usable-ace status \(u\); for the player, however, only the up-card \(d\) is observable. We therefore need to compute \(\mathbb{P}(d_{final} = k \mid d)\), where \(k \in \{17, 18, 19, 20, 21, \text{Bust}\}\).
This is a sub-problem solvable via recursion or a system of linear equations. Let \(\phi(s, u)\) be the probability distribution vector of the dealer’s final result starting from state \((s, u)\) (i.e. sum and ace status).
The Dealer’s transition rules are as follows:
Stand Condition: If \(s \geq 17\) (assuming Dealer stands on Soft 17), the dealer stops. \[\phi(s, u) = \mathbf{e}_s \quad (\text{vector with 1 at index } s)\]
Hit Condition: If \(s < 17\), the dealer draws a card \(c\). \[\phi(s, u) = \sum_{c \in C} p(c) \cdot \phi(s', u')\]
Where the transition logic \((s, u) \xrightarrow{c} (s', u')\) is:
\(s_{temp} = s + c\).
If \(c = 11\) (drawing an Ace) and \(u = 0\): \(u' = 1\); otherwise \(u' = u\).
If \(s_{temp} > 21\) and \(u' = 1\): \(s' = s_{temp} - 10\) and \(u'\) is reset to 0 (the soft Ace becomes hard).
Else: \(s' = s_{temp}\).
Note that the Ace update comes first: if a hard hand draws an Ace that would bust when counted as 11, the same rule immediately demotes it to 1.
We calculate \(\phi(d, \text{has\_ace}(d))\) for every up-card \(d\), where \(\text{has\_ace}(d) = 1\) iff \(d = 11\), to get the probability that the dealer ends with total \(k\), denoted \(\mathbb{P}(D_k | d)\).
Implementation in Python
```python
import numpy as np
from functools import lru_cache

# Outcome Indices: [17, 18, 19, 20, 21, Bust]
# 0 -> 17
# 1 -> 18
# ...
# 4 -> 21
# 5 -> Bust

# Card probabilities (Infinite Deck)
# 2-9: 1/13, 10: 4/13, Ace: 1/13
CARD_PROBS = {c: 1/13 for c in range(2, 12)}  # 2 to 11
CARD_PROBS[10] = 4/13

@lru_cache(maxsize=None)
def get_dealer_dist(s, u):
    """
    Returns probability vector [P(17), ..., P(21), P(Bust)]
    s: current sum
    u: 1 if usable ace, 0 otherwise
    """
    # 1. Base Case: Bust
    if s > 21:
        # Vector: [0, 0, 0, 0, 0, 1]
        res = np.zeros(6)
        res[5] = 1.0
        return res

    # 2. Base Case: Stand (Dealer stands on >= 17, including soft 17)
    if s >= 17:
        # Map 17->0, 18->1, ..., 21->4
        res = np.zeros(6)
        res[s - 17] = 1.0
        return res

    # 3. Recursive Step (Hit)
    # The dealer must hit. We sum the probabilities of the next states.
    final_dist = np.zeros(6)
    for card, prob in CARD_PROBS.items():
        s_next = s + card
        u_next = u
        # Transition: if we drew an Ace, we now have a usable ace
        if card == 11 and u == 0:
            u_next = 1
        # Transition: soft Ace becomes hard
        # If sum > 21 and we have a usable ace, count the Ace as 1 (subtract 10)
        if s_next > 21 and u_next == 1:
            s_next -= 10
            u_next = 0
        final_dist += prob * get_dealer_dist(s_next, u_next)
    return final_dist

# Execution and Formatting
# 1. Save object for later use
# Possible Dealer Up-Cards: 2 through 11 (Ace)
cards = list(range(2, 12))
prob_vectors = []
for d in cards:
    # Initial state:
    #   Sum is the card value.
    #   Usable ace is True only if the up-card is 11 (Ace).
    has_ace = 1 if d == 11 else 0
    probs = get_dealer_dist(d, has_ace)
    prob_vectors.append(probs)
probabilities = np.array(prob_vectors)

# 2. Print table for display
def print_dealer_table():
    print(f"{'Up-Card':<8} | {'17':<7}{'18':<7}{'19':<7}{'20':<7}{'21':<7}{'Bust':<7}")
    print("-" * 60)
    for d in cards:
        has_ace = 1 if d == 11 else 0
        probs = get_dealer_dist(d, has_ace)
        # Formatting for display
        card_name = "Ace" if d == 11 else str(d)
        row_str = f"{card_name:<8} | " + " ".join([f"{p:.4f} " for p in probs])
        print(row_str)

print_dealer_table()
```
3. Value of Standing (\(V_{\text{stand}}\))
If the player stands on total \(x\) against up-card \(d\), the expected return is \[
V_{\text{stand}}(x, d) = \sum_{k} \mathbb{P}(D_k \mid d) \, R(x, k),
\] where the reward function \(R(x, k)\) is: \[
R(x, k) = \begin{cases}
1 & \text{if } k = \text{Bust} \\
1 & \text{if } x > k \\
-1 & \text{if } x < k \\
0 & \text{if } x = k
\end{cases}
\]
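In code, \(V_{\text{stand}}\) is just a dot product between this reward vector and the dealer’s terminal distribution. A minimal sketch, assuming the `probabilities` array from the dealer code above is in scope (the name `v_stand` is mine):

```python
import numpy as np

outcomes = np.array([17, 18, 19, 20, 21])  # dealer non-bust totals
x = 18                                     # suppose the player stands on 18
# R(x, k) against each non-bust total: +1 win, -1 loss, 0 push
rewards = np.where(x > outcomes, 1.0, np.where(x < outcomes, -1.0, 0.0))
# expected return per up-card: comparison term + win whenever the dealer busts
v_stand = probabilities[:, :5] @ rewards + probabilities[:, 5]
print(dict(zip(range(2, 12), v_stand.round(4))))
```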
4. Value of Hitting (\(V_{\text{hit}}\))
If the player hits, they draw a card \(c\), and we average the value of the resulting state over all possible cards: \[
V_{\text{hit}}(x, u, d) = \sum_{c \in C} p(c) \, V_{\text{next}}(x', u', d),
\] where the next-state logic handles busting: \[
V_{\text{next}}(x', u', d) = \begin{cases}
-1 & \text{if } x' > 21 \text{ (Bust)} \\
V(x', u', d) & \text{otherwise}
\end{cases}
\]
The update logic for \(x'\) and \(u'\) is the same as the dealer’s logic defined in Section 2.
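As a sketch, the one-step hit expectation can be written as below, assuming the `CARD_PROBS` dictionary from the dealer code; `hit_value` and `value` (a lookup of already-computed state values) are my own names:

```python
def hit_value(x, u, value):
    """Expected value of hitting from player state (x, u); busting pays -1."""
    ev = 0.0
    for c, p in CARD_PROBS.items():
        x2, u2 = x + c, (1 if (c == 11 and u == 0) else u)
        if x2 > 21 and u2 == 1:   # soft hand absorbs the hit: the Ace demotes to 1
            x2, u2 = x2 - 10, 0
        ev += p * (-1.0 if x2 > 21 else value(x2, u2))
    return ev
```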
5. State-Transition Matrix Formulation
Since the game is acyclic (the score always increases, except when a soft hand becomes hard; even then, the “hard” state is strictly closer to busting than the “soft” state, so no state can repeat), we can represent the hitting phase as a linear system or solve it via backward induction.
Let \(\mathbf{V}\) be a vector representing the value of every possible player state \((x, u)\) for a fixed dealer card \(d\), and order the states such that higher totals appear later.
The Bellman equation in vector form is: \[ \mathbf{V} = \max \left( \mathbf{R}_{\text{stand}}, \mathbf{T} \mathbf{V} + \mathbf{B} \right) \]
Where:
\(\mathbf{R}_{\text{stand}}\) is the vector of expected returns if we stand immediately in each state.
\(\mathbf{T}\) is the State-Transition Matrix for the “Hit” action. \(T_{ij}\) is the probability of moving from state \(i\) to state \(j\) upon drawing a card.
\(\mathbf{B}\) is the “Bust” vector, containing \(-1 \times (\text{Probability of busting from state } i)\).
Deriving Matrix \(\mathbf{T}\) is straightforward. For a state \(i = (x, u)\) and target state \(j = (y, v)\), the entry \(T_{ij}\) is the total probability of the cards that map \(i\) to \(j\) under the hit-transition logic above: \[
T_{ij} = \sum_{c \,:\, (x, u) \xrightarrow{c} (y, v)} p(c).
\] With these ingredients, the full procedure is:
Compute Dealer Distribution: Calculate \(\mathbb{P}(D_k | d)\) for all up-cards \(d\) using the recursive relations for the dealer’s fixed rules.
Compute Stand Values: For every player state \((x, u)\) and dealer card \(d\), compute \(V_{\text{stand}}(x, d)\).
Backward Induction: Iterate backwards from \(x=21\) down to 4.
For \(x=21\), \(V(21, \dots) = V_{\text{stand}}(21, \dots)\) (Hitting is guaranteed bust or suboptimal).
For \(x < 21\): \[ V(x, u, d) = \max \left( V_{\text{stand}}, \sum p(c)V(\text{next state}) \right) \]
Policy Extraction: The optimal policy \(\pi^*(x, u, d)\) is simply the argument of the maximum operator (Hit or Stand).
This DP formulation provides the exact expected value of every hand.
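Before the full implementation, here is a sketch of what materializing \(\mathbf{T}\) and \(\mathbf{B}\) explicitly could look like; the `idx` state ordering and all names are mine, and the implementation below solves the same equations with plain loops instead:

```python
import numpy as np

CARD_PROBS = {c: 1/13 for c in range(2, 12)}
CARD_PROBS[10] = 4/13

def idx(x, u):
    """Map state (sum, usable-ace flag) to an index: hard 4-21 first, then soft 12-21."""
    return (x - 4) if u == 0 else 18 + (x - 12)

n = 28                # 18 hard states + 10 soft states
T = np.zeros((n, n))  # T[i, j] = P(state i -> state j when hitting)
B = np.zeros(n)       # B[i] = -1 * P(bust when hitting from state i)

for u in (0, 1):
    for x in (range(4, 22) if u == 0 else range(12, 22)):
        for c, p in CARD_PROBS.items():
            x2, u2 = x + c, (1 if (c == 11 and u == 0) else u)
            if x2 > 21 and u2 == 1:       # soft hand goes hard
                x2, u2 = x2 - 10, 0
            if x2 > 21:
                B[idx(x, u)] -= p         # busting pays -1
            else:
                T[idx(x, u), idx(x2, u2)] += p

# For a fixed up-card d, with R_stand the length-n vector of stand values,
# iterating V = np.maximum(R_stand, T @ V + B) from V = R_stand.copy()
# reaches the fixed point of the Bellman equation, since the hit graph is acyclic.
```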
Python Implementation
```python
import numpy as np

# 1. Constants and Dealer Distribution
# Rows: Dealer Up-card 2-9, 10, A. Cols: 17, 18, 19, 20, 21, Bust
P_DEALER = probabilities  # computed in the dealer code above

# Card probabilities: 2-9 = 1/13, 10 = 4/13, A = 1/13
# Index 0 is card value 2, index 9 is card value 11 (Ace)
CARD_PROBS = np.array([1/13] * 8 + [4/13] + [1/13])
CARD_VALS = np.arange(2, 12)

# State Space Config
# Sums 0 to 31 (to handle bust calculation easily).
# 0: Hard, 1: Soft (Usable Ace)
# Shape: (Sum, Soft/Hard, DealerCard)
V = np.zeros((32, 2, 10))

# 2. Compute R_stand (vector of stand values)
# R_stand[player_sum, dealer_card_idx], for all player sums 4-21.
# Logic: +1 if P_sum > D_sum, -1 if P_sum < D_sum, 0 if tie, +1 if dealer busts.
R_stand = np.zeros((32, 10))  # Independent of Soft/Hard, only the sum matters
outcomes = np.array([17, 18, 19, 20, 21])  # Dealer non-bust totals

for s in range(4, 22):
    # Vectorized calculation against all dealer cards at once
    # 1. Win by Dealer Bust
    ev = 1.0 * P_DEALER[:, 5]
    # 2. Compare sum against dealer non-bust outcomes
    win = (s > outcomes).astype(float)
    lose = -1.0 * (s < outcomes).astype(float)
    # tie is 0, implicit
    # P_DEALER[:, :5] is (10, 5); (win + lose) is (5,)
    ev += P_DEALER[:, :5] @ (win + lose)
    R_stand[s] = ev

# Initialize Value function with Stand values (and -1 for bust states > 21)
V[:22, :, :] = R_stand[:22, np.newaxis, :]  # Broadcast to Hard/Soft
V[22:, :, :] = -1.0  # Bust states always return -1

# 3. Value Iteration (Solving the Bellman Equation)
# Iterate until the Value matrix stops changing.
# Since the hit graph is acyclic, this converges within a few passes.
for _ in range(10):
    V_prev = V.copy()
    # Expected value of hitting for all states 4..21, Hard (u=0) and Soft (u=1)
    for s in range(4, 22):
        for u in [0, 1]:
            ev_hit_vector = np.zeros(10)  # one entry per dealer card
            for c_idx, card in enumerate(CARD_VALS):
                prob = CARD_PROBS[c_idx]
                # Next State Logic
                s_next = s + card
                u_next = u
                # Transition: drawing an Ace on a hard hand -> Soft
                if u == 0 and card == 11:
                    u_next = 1
                # Transition: Soft busts -> becomes Hard (minus 10)
                if s_next > 21 and u_next == 1:
                    s_next -= 10
                    u_next = 0
                # Accumulate value; s_next is capped at 31 to avoid index
                # errors, and any state > 21 is worth -1 by initialization
                s_idx = min(s_next, 31)
                ev_hit_vector += prob * V_prev[s_idx, u_next, :]
            # Bellman Equation: V = max(R_stand, E[V_next])
            V[s, u, :] = np.maximum(R_stand[s], ev_hit_vector)

# 4. Extract Optimal Policy
# Compare V_stand vs V_hit one last time; action 1 = Hit, 0 = Stand
policy_hard = np.zeros((18, 10), dtype=int)  # sums 4 to 21
policy_soft = np.zeros((10, 10), dtype=int)  # sums 12 to 21

# Hard Hands (4 to 21)
for s in range(4, 22):
    # Recalculate hit value (same transition logic as above)
    ev_hit = np.zeros(10)
    for c_idx, card in enumerate(CARD_VALS):
        s_n, u_n = s + card, 0
        if card == 11:
            u_n = 1
        if s_n > 21 and u_n == 1:
            s_n -= 10
            u_n = 0
        ev_hit += CARD_PROBS[c_idx] * V[min(s_n, 31), u_n, :]
    # If Hit > Stand, Policy = 1 (Hit)
    policy_hard[s - 4] = (ev_hit > R_stand[s] + 1e-9).astype(int)

# Soft Hands (12 to 21)
for s in range(12, 22):
    ev_hit = np.zeros(10)
    for c_idx, card in enumerate(CARD_VALS):
        # Transition logic from a soft hand
        s_n, u_n = s + card, 1
        if s_n > 21:
            s_n -= 10
            u_n = 0
        ev_hit += CARD_PROBS[c_idx] * V[min(s_n, 31), u_n, :]
    policy_soft[s - 12] = (ev_hit > R_stand[s] + 1e-9).astype(int)

# 5. Display results
def print_policy(name, data, start_idx):
    print(f"\n--- {name} Strategy (H=Hit, .=Stand) ---")
    print("      | D: 2 3 4 5 6 7 8 9 T A")
    print("------+-----------------------")
    for i, row in enumerate(data):
        player_sum = start_idx + i
        row_str = " ".join(["H" if x else "." for x in row])
        print(f"{player_sum:>5} | {row_str}")

print_policy("Hard Hand", policy_hard, 4)
print_policy("Soft Hand", policy_soft, 12)
```
```
--- Hard Hand Strategy (H=Hit, .=Stand) ---
      | D: 2 3 4 5 6 7 8 9 T A
------+-----------------------
    4 | H H H H H H H H H H
    5 | H H H H H H H H H H
    6 | H H H H H H H H H H
    7 | H H H H H H H H H H
    8 | H H H H H H H H H H
    9 | H H H H H H H H H H
   10 | H H H H H H H H H H
   11 | H H H H H H H H H H
   12 | H H . . . H H H H H
   13 | . . . . . H H H H H
   14 | . . . . . H H H H H
   15 | . . . . . H H H H H
   16 | . . . . . H H H H H
   17 | . . . . . . . . . .
   18 | . . . . . . . . . .
   19 | . . . . . . . . . .
   20 | . . . . . . . . . .
   21 | . . . . . . . . . .

--- Soft Hand Strategy (H=Hit, .=Stand) ---
      | D: 2 3 4 5 6 7 8 9 T A
------+-----------------------
   12 | H H H H H H H H H H
   13 | H H H H H H H H H H
   14 | H H H H H H H H H H
   15 | H H H H H H H H H H
   16 | H H H H H H H H H H
   17 | H H H H H H H H H H
   18 | . . . . . . . H H H
   19 | . . . . . . . . . .
   20 | . . . . . . . . . .
   21 | . . . . . . . . . .
```
6. Rule of Thumb
The rule of thumb you should use when playing Blackjack in practice is this (transcribed as a small function after the list):
If you don’t have an ace:
And the dealer’s card is \(< 7\):
You should hit up until 11, and stand from 12.
And the dealer’s card is \(\geq 7\):
You should hit up until 16, and stand from 17.
If you have an ace:
You should hit up until 17, and stand from 18.
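As code, this is a direct reading of the list above (the name rule_of_thumb is mine; this is the simplification, not the exact DP policy):

```python
def rule_of_thumb(player_sum, has_usable_ace, dealer_card):
    """Simplified blackjack strategy; dealer_card is 2-11, with 11 for an Ace."""
    if has_usable_ace:
        return "hit" if player_sum <= 17 else "stand"
    if dealer_card < 7:
        return "hit" if player_sum <= 11 else "stand"
    return "hit" if player_sum <= 16 else "stand"
```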
Footnotes
To be fair, though, I still think a lot of people honestly believe that they’re facing a fair bet, or something of the like.↩︎