Even if \(\epsilon\) and \(\eta\) are uncorrelated for a given product, observed prices and quantities for a given product are affected by all other products.
Our approach?
We’ll largely ignore the interrelationships across products and focus on the traditional endogeneity problem
Intuitively, this problem arises because price is determined jointly by supply and demand, and we’re only modeling the demand side of the market
We need to isolate movements along the demand curve
Example with simulated data
Let’s simulate some data and see what happens when we run a simple OLS regression
We’ll generate data where both supply and demand are affected by an external variable (e.g., cost factors for supply and income levels for demand), but we’ll only observe the equilibrium price and quantity
This setup mirrors common real-world scenarios where the observed price and quantity are outcomes of both supply and demand curves intersecting.
We will assume linear relationships for simplicity:
- Demand curve: \(Q_d = \alpha_d + \beta_d P + \gamma_d Y + \epsilon_d\)
- Supply curve: \(Q_s = \alpha_s + \beta_s P + \gamma_s C + \epsilon_s\)
Where
\(Q_d\) and \(Q_s\) are the quantity demanded and supplied, respectively.
\(P\) is the price.
\(Y\) is the income level affecting demand.
\(C\) is the cost factor affecting supply.
\(\alpha\), \(\beta\), and \(\gamma\) are parameters (\(\beta_d\) is the price coefficient of demand; in this linear-in-levels form it is a slope rather than a unit-free elasticity).
\(\epsilon_d\) and \(\epsilon_s\) are error terms.
Equilibrium occurs when \(Q_d = Q_s\). We’ll simulate this scenario in R and run an OLS regression of quantity on price, omitting \(Y\) and \(C\), to show the bias in the estimated price coefficient \(\beta_d\).
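Because both curves are linear, the market-clearing price can be written in closed form. The short derivation below is a sketch that assumes \(\beta_s \neq \beta_d\); the symbol \(P^*\) for the equilibrium price is introduced here for illustration. Setting \(Q_d = Q_s\):

\[
\alpha_d + \beta_d P^* + \gamma_d Y + \epsilon_d = \alpha_s + \beta_s P^* + \gamma_s C + \epsilon_s
\]

\[
P^* = \frac{\alpha_d - \alpha_s + \gamma_d Y - \gamma_s C + \epsilon_d - \epsilon_s}{\beta_s - \beta_d}
\]

If we additionally assume that \(\epsilon_d\) is uncorrelated with \(Y\), \(C\), and \(\epsilon_s\), then

\[
\operatorname{Cov}(P^*, \epsilon_d) = \frac{\operatorname{Var}(\epsilon_d)}{\beta_s - \beta_d}
\]

which is nonzero whenever \(\operatorname{Var}(\epsilon_d) > 0\). The equilibrium price therefore moves with the demand error, which is the traditional endogeneity problem in a regression of quantity on price.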
Results from OLS
R Code
set.seed(123)  # For reproducibility

# Simulate external factors (income and cost)
n = 1000
Y = rnorm(n, mean = 50000, sd = 10000)
C = rnorm(n, mean = 20, sd = 5)

# Parameters for demand and supply curves
alpha_d = 100; beta_d = -2; gamma_d = 0.0001
alpha_s = 50; beta_s = 1.5; gamma_s = -0.5

# The equilibrium price P is the value at which Q_d = Q_s.
# For simplification, rather than solving for it, let's assume we observe
# an equilibrium P directly influenced by Y and C
P = 100 + 0.0001 * Y - 0.5 * C + rnorm(n)

# Simulate demand and supply
epsilon_d = rnorm(n)
epsilon_s = rnorm(n)
Q_d = alpha_d + beta_d * P + gamma_d * Y + epsilon_d
Q_s = alpha_s + beta_s * P + gamma_s * C + epsilon_s  # computed for reference; not used below

# Assuming equilibrium, we take the observed Q from the demand equation for simplicity
Q = alpha_d + beta_d * P + gamma_d * Y + epsilon_d

# OLS regression of Q on P (Y and C are deliberately left out)
model = lm(Q ~ P, data = data.frame(Q, P))
summary(model)
Call:
lm(formula = Q ~ P, data = data.frame(Q, P))
Residuals:
    Min      1Q  Median      3Q     Max
-4.1702 -0.8695 -0.0411  0.9244  4.6118
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  95.01470    1.48253   64.09   <2e-16 ***
P            -1.89470    0.01562 -121.32   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.371 on 998 degrees of freedom
Multiple R-squared: 0.9365, Adjusted R-squared: 0.9364
F-statistic: 1.472e+04 on 1 and 998 DF, p-value: < 2.2e-16
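The simulation above draws \(P\) from an assumed reduced form. As a complementary sketch, the code below instead computes the price that actually clears the linear market, so that the demand shock feeds into price. It reuses the parameter values from the simulation above; the names `P_eq`, `Q_eq`, `sim`, `ols_naive`, and `ols_ctrl`, and the seed, are illustrative choices and not part of the original example.

```r
set.seed(42)  # arbitrary seed, chosen only for reproducibility

n <- 1000
Y <- rnorm(n, mean = 50000, sd = 10000)  # income (demand shifter)
C <- rnorm(n, mean = 20, sd = 5)         # cost (supply shifter)

# Same structural parameters as in the simulation above
alpha_d <- 100; beta_d <- -2;  gamma_d <- 0.0001
alpha_s <- 50;  beta_s <- 1.5; gamma_s <- -0.5

epsilon_d <- rnorm(n)
epsilon_s <- rnorm(n)

# Closed-form equilibrium price: set Q_d = Q_s and solve the linear equation for P
P_eq <- (alpha_d - alpha_s + gamma_d * Y - gamma_s * C + epsilon_d - epsilon_s) /
  (beta_s - beta_d)

# Quantities implied by each curve at the equilibrium price
Q_d <- alpha_d + beta_d * P_eq + gamma_d * Y + epsilon_d
Q_s <- alpha_s + beta_s * P_eq + gamma_s * C + epsilon_s
stopifnot(isTRUE(all.equal(Q_d, Q_s)))  # the market clears
Q_eq <- Q_d

sim <- data.frame(Q = Q_eq, P = P_eq, Y = Y)

# OLS of quantity on price, first without and then with the income control
ols_naive <- lm(Q ~ P, data = sim)
ols_ctrl  <- lm(Q ~ P + Y, data = sim)

cov(P_eq, epsilon_d)   # sample covariance between price and the demand shock
coef(ols_naive)["P"]   # compare with the true beta_d = -2
coef(ols_ctrl)["P"]    # compare with the true beta_d = -2
```

Under this setup, the price coefficient should stay away from the true value of \(-2\) even after controlling for \(Y\), because \(P\) is still correlated with \(\epsilon_d\). That remaining gap is the simultaneity part of the problem, as opposed to the part that comes from omitting \(Y\).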