Q&A 7 How does Econometrics relate to Machine Learning?
7.1 Explanation
Both econometrics and machine learning are about using data to understand or predict outcomes, but they emphasize different goals:
- Econometrics (Economics + Statistics)
  - Focuses on causality and interpretation.
  - Example: Does education cause higher income? By how much?
  - Tools: regression, instrumental variables, time-series models.
  - Economists care about the estimated coefficients (β values), their statistical significance, and theory-based interpretation.
- Machine Learning
  - Focuses on prediction and patterns.
  - Example: Given past data, can we predict next year’s GDP growth rate?
  - Tools: decision trees, random forests, support vector machines, neural networks.
  - ML practitioners care about predictive accuracy on unseen data, not necessarily why variables are related.
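The contrast between causal interpretation and prediction can be sketched with a small simulation using numpy only. The data-generating process is entirely hypothetical: an unobserved "ability" term raises both schooling and income, and an instrument `z` shifts schooling without affecting income directly. Under those assumptions, the OLS slope mixes the effect of schooling with the effect of ability, the instrumental-variables (Wald) estimate recovers the planted effect of 2, and yet the biased OLS line still fits income well.

```python
import numpy as np

# Hypothetical data-generating process (an assumption for illustration):
# unobserved "ability" raises both schooling and income, so it confounds
# the education -> income relationship. z is an instrument that shifts
# schooling but affects income only through schooling.
rng = np.random.default_rng(0)
n = 5_000
true_beta = 2.0                      # planted causal effect of one extra year of schooling

ability = rng.normal(0, 1, n)
z = rng.normal(0, 1, n)              # hypothetical instrument (e.g. a distance-to-college shock)
education = 12 + 1.0 * z + 1.5 * ability + rng.normal(0, 1, n)
income = 10 + true_beta * education + 3.0 * ability + rng.normal(0, 1, n)

# OLS slope: cov(x, y) / var(x). Biased here because ability is omitted.
beta_ols = np.cov(education, income)[0, 1] / np.var(education, ddof=1)

# IV (Wald) estimator with a single instrument: cov(z, y) / cov(z, x).
beta_iv = np.cov(z, income)[0, 1] / np.cov(z, education)[0, 1]

# Prediction view: the OLS fitted line still explains most of the variation
# in income (high R^2), even though its slope is not the causal effect.
intercept = income.mean() - beta_ols * education.mean()
resid = income - (intercept + beta_ols * education)
r2 = 1 - resid.var() / income.var()

print(f"OLS beta: {beta_ols:.2f}   IV beta: {beta_iv:.2f}   true beta: {true_beta}")
print(f"OLS R^2: {r2:.2f}")
```

With these parameters the omitted-variable bias is cov(education, 3·ability) / var(education) = 4.5 / 4.25 ≈ 1.06, so OLS should land near 3 rather than 2. A prediction-focused analysis would be satisfied with the R²; the econometric concern is that the OLS slope overstates the return to a year of schooling.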
7.2 Example with GDP Growth
- Econometrics approach:
  - Regression model with GDP growth (%) as the dependent variable.
  - Independent variables: investment (% of GDP), inflation, population growth.
  - Question: How much does a 1 percentage-point increase in investment change GDP growth?
- Machine Learning approach:
  - Train a Random Forest model on the same inputs (investment, inflation, population growth).
  - Predict GDP growth for 2024 given 2000–2023 data.
  - Question: Can we predict next year’s GDP growth with low error?
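The econometrics half of this example can be sketched as an ordinary least squares regression with classical standard errors, using numpy only. The data are simulated with coefficients planted for illustration (0.25 on investment, -0.20 on inflation, -0.50 on population growth are hypothetical values, not estimates for any country), so we can check whether OLS recovers them from 24 annual observations.

```python
import numpy as np

# Hypothetical, planted coefficients (assumptions for illustration only).
rng = np.random.default_rng(1)
n = 24                                   # one observation per year, 2000-2023
investment = rng.normal(20, 3, n)        # % of GDP
inflation = rng.normal(7, 2, n)          # CPI inflation, %
pop_growth = rng.normal(2.5, 0.5, n)     # % per year
gdp_growth = (1.0 + 0.25 * investment - 0.20 * inflation
              - 0.50 * pop_growth + rng.normal(0, 1, n))

# Design matrix with an intercept column, then least-squares fit.
X = np.column_stack([np.ones(n), investment, inflation, pop_growth])
beta, *_ = np.linalg.lstsq(X, gdp_growth, rcond=None)

# Classical OLS standard errors: sigma^2 (X'X)^{-1}, with sigma^2 = RSS / (n - k).
resid = gdp_growth - X @ beta
k = X.shape[1]
sigma2 = resid @ resid / (n - k)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / se

names = ["const", "investment", "inflation", "pop_growth"]
for name, b, s, t in zip(names, beta, se, t_stats):
    print(f"{name:>11}: beta={b:6.3f}  se={s:.3f}  t={t:5.2f}")
```

The coefficient on investment answers the question above: holding inflation and population growth fixed, a one-percentage-point rise in investment is associated with a change of about `beta[1]` points in GDP growth. Reading that number as causal requires further assumptions (no omitted confounders, no reverse causality), which is where tools such as instrumental variables come in.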
7.3 Python Code (Econometrics vs ML)
import os
import pandas as pd
import numpy as np
# Ensure data folder exists
os.makedirs("data", exist_ok=True)
# Sample years and country
years = list(range(2000, 2024))
country = "KE" # Kenya (example)
# Generate synthetic factors (each series is drawn independently,
# so gdp_growth has no built-in relationship with the other columns)
np.random.seed(42)
gdp_growth = np.random.normal(5, 2, len(years))  # mean 5%, sd 2 points
investment = np.random.normal(20, 3, len(years))  # % of GDP
inflation = np.random.normal(7, 2, len(years))  # CPI inflation, %
population_growth = np.random.normal(2.5, 0.5, len(years))  # % per year
# Create DataFrame
df = pd.DataFrame({
"country": country,
"year": years,
"gdp_growth_rate": gdp_growth,
"investment": investment,
"inflation": inflation,
"population_growth": population_growth
})
# Save to CSV
out_file = "data/gdp_growth_with_factors.csv"
df.to_csv(out_file, index=False)
print(f"Saved -> {out_file}")
print(df.head())
Output:
Saved -> data/gdp_growth_with_factors.csv
country year gdp_growth_rate investment inflation population_growth
0 KE 2000 5.993428 18.366852 7.687237 2.482087
1 KE 2001 4.723471 20.332768 3.473920 3.282322
2 KE 2002 6.295377 16.547019 7.648168 1.190127
3 KE 2003 8.046060 21.127094 6.229835 2.910951
4 KE 2004 4.531693 18.198084 5.646156 2.543524
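The script above only builds the dataset. A minimal sketch of the machine-learning side, assuming scikit-learn is installed, trains scikit-learn's `RandomForestRegressor` on the same numeric columns. Because the table contains no 2024 input values, the sketch backtests instead: it trains on 2000–2019, scores 2020–2023 (keeping time order), and compares the forest against a naive "predict the training mean" baseline. The split years, `n_estimators=200`, and the baseline are choices made for illustration. The DataFrame is rebuilt with the same seed and draw order as the script rather than read from the CSV, so the block runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Rebuild the same synthetic table as the script above (same seed, same draw order).
np.random.seed(42)
years = list(range(2000, 2024))
n = len(years)
df = pd.DataFrame({
    "year": years,
    "gdp_growth_rate": np.random.normal(5, 2, n),
    "investment": np.random.normal(20, 3, n),
    "inflation": np.random.normal(7, 2, n),
    "population_growth": np.random.normal(2.5, 0.5, n),
})

features = ["investment", "inflation", "population_growth"]
train = df[df["year"] <= 2019]
test = df[df["year"] >= 2020]   # hold out the last four years, never shuffling time

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(train[features], train["gdp_growth_rate"])
pred = model.predict(test[features])

# Naive benchmark: predict the training-period mean for every test year.
y_test = test["gdp_growth_rate"].to_numpy()
baseline = np.full(len(test), train["gdp_growth_rate"].mean())

mae_rf = np.mean(np.abs(y_test - pred))
mae_base = np.mean(np.abs(y_test - baseline))
print(f"Random Forest MAE: {mae_rf:.2f} pp   mean-baseline MAE: {mae_base:.2f} pp")
```

Because `gdp_growth_rate` was drawn independently of the three factors, there is no real signal to learn, so the forest should not be expected to beat the mean baseline reliably. With real data, this baseline comparison is what tells you whether the model predicts anything at all.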