Smoking and Demand Elasticity

Ian McCarthy | Emory University

Some History

Timeline for Smoking in U.S.

  • Widespread smoking began in late 1800s
  • Lung cancer becoming more common after 1930s
  • First evidence of link in 1950s
  • Surgeon general’s report in 1964
  • Very important in causal inference! (Section 5.1.1 of Causal Inference Mixtape)

Why it matters

  1. Extreme public health concerns
    • Lung cancer prevalence
    • Fetal and baby health
  2. Economic questions
    • Is it an information problem?
    • Externalities (second-hand smoke)
    • Moral hazard due to insurance

In our case

  • We want to focus on estimating demand for cigarettes. By this, I mean estimating price elasticity of demand.
  • We’ll show that standard OLS isn’t going to do this very well.

Demand Estimation

Basics of Demand Estimation

In its simplest form, demand estimation is about estimating the relationship between price and quantity demanded:

\[q_{it} = \alpha + \beta p_{it} + \epsilon_{it}\]

where \(q_{it}\) is quantity sold, \(p_{it}\) is price, and \(\epsilon_{it}\) is an error term.


But there are some problems when running this simple regression…namely, that price is endogenous. To see this, let’s look at both demand and supply.

\[\begin{align} q &= D(x, p, \epsilon) \\ p &= C(z, q, \eta) \end{align}\]

  • quantity demanded is a function of price, some “demand shifters” \(x\), and an error term \(\epsilon\)
  • price is a function of quantity supplied, some “cost shifters” \(z\), and an error term \(\eta\)

Endogeneity I

\[\begin{align} q &= D(x, p, \epsilon) \\ p &= C(z, q, \eta) \end{align}\]

  • Standard econometric endogeneity problem stems from relationship between \(\epsilon\) and \(\eta\)
  • If such a relationship exists, then OLS regression of \(q\) against \(x\) and \(p\) will yield a biased and inconsistent price elasticity estimate

Endogeneity II

\[\begin{align} q_{i} &= D(x_{i}, p_{1}, ... , p_{N}, \epsilon_{1}, ... \epsilon_{N}) \\ p_{i} &= C(z_{i}, q_{1}, ... , p_{N}, \eta_{1}, ... \eta_{N}) \end{align}\]

Even if \(\epsilon\) and \(\eta\) are uncorrelated for a given product, observed prices and quantities for a given product are affected by all other products.

Our approach?

  • We’ll largely ignore the interrelationships across products and focus on the traditional endogeneity problem
  • Inuitively, this problem arises because price is determined by supply and demand, and we’re only observing the demand side of the market
  • We need to isolate movements along the demand curve

Example with simulated data

  • Let’s simulate some data and see what happens when we run a simple OLS regression
  • We’ll generate data where both supply and demand are affected by an external variable (e.g., cost factors for supply and income levels for demand), but we’ll only observe the equilibrium price and quantity
  • This setup mirrors common real-world scenarios where the observed price and quantity are outcomes of both supply and demand curves intersecting.

Example with simulated data

We will assume linear relationships for simplicity: - Demand curve: \(Q_d = \alpha_d + \beta_d P + \gamma_d Y + \epsilon_d\) - Supply curve: \(Q_s = \alpha_s + \beta_s P + \gamma_s C + \epsilon_s\)


  • \(Q_d\) and \(Q_s\) are the quantity demanded and supplied, respectively.
  • \(P\) is the price.
  • \(Y\) is the income level affecting demand.
  • \(C\) is the cost factor affecting supply.
  • \(\alpha\), \(\beta\), and \(\gamma\) are parameters (\(\beta\) is the price elasticity of demand).
  • \(\epsilon_d\) and \(\epsilon_s\) are error terms.

Equilibrium occurs when \(Q_d = Q_s\). We’ll simulate this scenario in R and perform an OLS regression of quantity on price, neglecting the effects of \(Y\) and \(C\), to show the bias in estimating price elasticity.

Results from OLS

R Code
set.seed(123) # For reproducibility

# Simulate external factors (income and cost)
n = 1000
Y = rnorm(n, mean = 50000, sd = 10000)
C = rnorm(n, mean = 20, sd = 5)

# Parameters for demand and supply curves
alpha_d = 100; beta_d = -2; gamma_d = 0.0001
alpha_s = 50; beta_s = 1.5; gamma_s = -0.5

# Equilibrium price P such that Q_d = Q_s, solving for P here requires numerical methods
# For simplification, let's assume we observe an equilibrium P directly influenced by Y and C
P = 100 + 0.0001 * Y - 0.5 * C + rnorm(n)

# Simulate demand and supply
epsilon_d = rnorm(n)
epsilon_s = rnorm(n)
Q_d = alpha_d + beta_d * P + gamma_d * Y + epsilon_d
Q_s = alpha_s + beta_s * P + gamma_s * C + epsilon_s

# Assuming equilibrium, we can set Q = Q_d = Q_s
Q = alpha_d + beta_d * P + gamma_d * Y + epsilon_d # Using the demand equation for simplicity

# OLS regression of Q on P
model = lm(Q ~ P, data = data.frame(Q, P))

lm(formula = Q ~ P, data = data.frame(Q, P))

    Min      1Q  Median      3Q     Max 
-4.1702 -0.8695 -0.0411  0.9244  4.6118 

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 95.01470    1.48253   64.09   <2e-16 ***
P           -1.89470    0.01562 -121.32   <2e-16 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.371 on 998 degrees of freedom
Multiple R-squared:  0.9365,    Adjusted R-squared:  0.9364 
F-statistic: 1.472e+04 on 1 and 998 DF,  p-value: < 2.2e-16