Smoking and Demand Elasticity

Ian McCarthy | Emory University

Some History

Timeline for Smoking in U.S.

Widespread smoking began in late 1800s
Lung cancer becoming more common after 1930s
First evidence of link in 1950s
Surgeon general’s report in 1964
Very important in causal inference! (Section 5.1.1 of Causal Inference Mixtape)

Why it matters

Extreme public health concerns
- Lung cancer prevalence
- Fetal and baby health
Economic questions
- Is it an information problem?
- Externalities (second-hand smoke)
- Moral hazard due to insurance

In our case

We want to focus on estimating demand for cigarettes. By this, I mean estimating price elasticity of demand.

We’ll show that standard OLS isn’t going to do this very well.

Demand Estimation

Basics of Demand Estimation

In its simplest form, demand estimation is about estimating the relationship between price and quantity demanded:

\[q_{it} = \alpha + \beta p_{it} + \epsilon_{it}\]

where \(q_{it}\) is quantity sold, \(p_{it}\) is price, and \(\epsilon_{it}\) is an error term.

Problems

But there are some problems when running this simple regression…namely, that price is endogenous. To see this, let’s look at both demand and supply.

\[\begin{align} q &= D(x, p, \epsilon) \\ p &= C(z, q, \eta) \end{align}\]

quantity demanded is a function of price, some “demand shifters” \(x\), and an error term \(\epsilon\)
price is a function of quantity supplied, some “cost shifters” \(z\), and an error term \(\eta\)

Endogeneity I

\[\begin{align} q &= D(x, p, \epsilon) \\ p &= C(z, q, \eta) \end{align}\]

Standard econometric endogeneity problem stems from relationship between \(\epsilon\) and \(\eta\)
If such a relationship exists, then OLS regression of \(q\) against \(x\) and \(p\) will yield a biased and inconsistent price elasticity estimate

Endogeneity II

\[\begin{align} q_{i} &= D(x_{i}, p_{1}, ... , p_{N}, \epsilon_{1}, ... \epsilon_{N}) \\ p_{i} &= C(z_{i}, q_{1}, ... , p_{N}, \eta_{1}, ... \eta_{N}) \end{align}\]

Even if \(\epsilon\) and \(\eta\) are uncorrelated for a given product, observed prices and quantities for a given product are affected by all other products.

Our approach?

We’ll largely ignore the interrelationships across products and focus on the traditional endogeneity problem
Inuitively, this problem arises because price is determined by supply and demand, and we’re only observing the demand side of the market
We need to isolate movements along the demand curve

Example with simulated data

Let’s simulate some data and see what happens when we run a simple OLS regression
We’ll generate data where both supply and demand are affected by an external variable (e.g., cost factors for supply and income levels for demand), but we’ll only observe the equilibrium price and quantity
This setup mirrors common real-world scenarios where the observed price and quantity are outcomes of both supply and demand curves intersecting.

Example with simulated data

We will assume linear relationships for simplicity:

Demand curve: \(Q_d = \alpha_d + \beta_d P + \gamma_d Y + \epsilon_d\)
Supply curve: \(Q_s = \alpha_s + \beta_s P + \gamma_s C + \epsilon_s\)

where

\(Q_d\) and \(Q_s\) are the quantity demanded and supplied, respectively.
\(P\) is the price.
\(Y\) is the income level affecting demand.
\(C\) is the cost factor affecting supply.
\(\alpha\), \(\beta\), and \(\gamma\) are parameters (\(\beta\) is the price elasticity of demand).
\(\epsilon_d\) and \(\epsilon_s\) are error terms.

Example with simulated data

Demand curve: \(Q_d = \alpha_d + \beta_d P + \gamma_d Y + \epsilon_d\)
Supply curve: \(Q_s = \alpha_s + \beta_s P + \gamma_s C + \epsilon_s\)

Equilibrium occurs when \(Q_d = Q_s\). We’ll simulate this scenario in R and perform an OLS regression of quantity on price, neglecting the effects of \(Y\) and \(C\), to show the bias in estimating price elasticity.

Results from OLS

R Code

set.seed(123) # For reproducibility

# Simulate external factors (income and cost)
n = 1000
Y = rnorm(n, mean = 50000, sd = 10000)
C = rnorm(n, mean = 20, sd = 5)

# Parameters for demand and supply curves
alpha_d = 100
beta_d = -2
gamma_d = 0.15
alpha_s = 50
beta_s = 1.5
gamma_s = -0.5

# Equilibrium price P such that Q_d = Q_s, solving for P here requires numerical methods
# For simplification, let's assume we observe an equilibrium P directly influenced by Y and C
P = 100 + gamma_d * Y - gamma_s * C + rnorm(n)

# Simulate demand and supply
epsilon_d = rnorm(n)
epsilon_s = rnorm(n)
Q_d = alpha_d + beta_d * P + gamma_d * Y + epsilon_d
Q_s = alpha_s + beta_s * P + gamma_s * C + epsilon_s

# Assuming equilibrium, we can set Q = Q_d = Q_s
Q = alpha_d + beta_d * P + gamma_d * Y + epsilon_d # Using the demand equation for simplicity

# OLS regression of Q on P
model = lm(Q ~ P, data = data.frame(Q, P))
modelsummary(model, gof_map=NA)

	(1)
(Intercept)	-9.031
	(0.478)
P	-1.000
	(0.000)