Q&A 7 How does Econometrics relate to Machine Learning?

7.1 Explanation

Both econometrics and machine learning are about using data to understand or predict outcomes, but they emphasize different goals:

  • Econometrics (Economics + Statistics)
    • Focuses on causality and interpretation.
    • Example: Does education cause higher income? By how much?
    • Tools: regression, instrumental variables, time-series models.
    • Economists care about coefficients (β values), significance, and theory-based interpretation.
  • Machine Learning
    • Focuses on prediction and patterns.
    • Example: Given past data, can we predict next year’s GDP growth rate?
    • Tools: decision trees, random forests, support vector machines, neural networks.
    • ML practitioners care about predictive accuracy, not necessarily why variables are related.

7.2 Example with GDP Growth

  • Econometrics approach:
    • Regression model with GDP growth (%) as the dependent variable.
    • Independent variables: investment, inflation, population growth.
    • Question: How much does a 1 percentage point increase in investment (as a share of GDP) change GDP growth, holding inflation and population growth fixed?
  • Machine Learning approach:
    • Train a Random Forest model on the same inputs (investment, inflation, population growth).
    • Predict GDP growth for 2024 given 2000–2023 data.
    • Question: Can we predict next year’s GDP growth with low error?
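
The two questions above can be written side by side. This is only a sketch in standard notation: the subscript t (year), the error term, the fitted function f-hat, and the choice of mean absolute error (MAE) as the accuracy measure are illustrative assumptions, not taken from the text.

```latex
% Econometrics: estimate the coefficients and interpret them
\text{growth}_t = \beta_0 + \beta_1\,\text{investment}_t + \beta_2\,\text{inflation}_t
                + \beta_3\,\text{popgrowth}_t + \varepsilon_t

% Machine learning: learn any function \hat f that predicts well on unseen years
\widehat{\text{growth}}_t = \hat f(\text{investment}_t,\ \text{inflation}_t,\ \text{popgrowth}_t),
\qquad
\text{MAE} = \frac{1}{n}\sum_{t \in \text{test}} \bigl|\,\text{growth}_t - \widehat{\text{growth}}_t\,\bigr|
```

In the first line the object of interest is β1: the change in GDP growth (in percentage points) associated with a one-unit increase in investment, holding the other variables fixed. Reading it as a causal effect needs further assumptions, such as investment being unrelated to the error term. In the second line the form of f-hat is not interpreted at all; only the size of the error on held-out data matters.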

7.3 Python Code (Econometrics vs ML)

import os
import pandas as pd
import numpy as np

# Ensure data folder exists
os.makedirs("data", exist_ok=True)

# Sample years and country
years = list(range(2000, 2024))
country = "KE"  # Kenya (example)

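# Generate synthetic series (fixed seed so the numbers are reproducible).
# Note: each series is drawn independently, so this toy dataset contains
# no true relationship between GDP growth and the three factors.
np.random.seed(42)
gdp_growth = np.random.normal(5, 2, len(years))          # mean 5%, std dev 2
investment = np.random.normal(20, 3, len(years))         # % of GDP
inflation = np.random.normal(7, 2, len(years))           # % CPI
population_growth = np.random.normal(2.5, 0.5, len(years))  # % annual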

# Create DataFrame
df = pd.DataFrame({
    "country": country,
    "year": years,
    "gdp_growth_rate": gdp_growth,
    "investment": investment,
    "inflation": inflation,
    "population_growth": population_growth
})

# Save to CSV
out_file = "data/gdp_growth_with_factors.csv"
df.to_csv(out_file, index=False)

print(f"Saved -> {out_file}")
print(df.head())

Output:

Saved -> data/gdp_growth_with_factors.csv
  country  year  gdp_growth_rate  investment  inflation  population_growth
0      KE  2000         5.993428   18.366852   7.687237           2.482087
1      KE  2001         4.723471   20.332768   3.473920           3.282322
2      KE  2002         6.295377   16.547019   7.648168           1.190127
3      KE  2003         8.046060   21.127094   6.229835           2.910951
4      KE  2004         4.531693   18.198084   5.646156           2.543524
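
The dataset above can be analyzed both ways. What follows is a minimal sketch, not a definitive implementation, and it rests on several assumptions that are not in the original: the OLS fit is done by hand with NumPy (`np.linalg.lstsq`) rather than with a statistics package; the Random Forest comes from scikit-learn, which must be installed; the last four years (2020–2023) are held out as a stand-in for "next year", since no 2024 inputs exist; and `n_estimators=200` and `random_state=0` are arbitrary choices. The data are rebuilt in memory with the same seed and draw order, so the block runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Rebuild the synthetic dataset (same seed and draw order as the code above).
years = list(range(2000, 2024))
np.random.seed(42)
gdp_growth = np.random.normal(5, 2, len(years))
investment = np.random.normal(20, 3, len(years))
inflation = np.random.normal(7, 2, len(years))
population_growth = np.random.normal(2.5, 0.5, len(years))
df = pd.DataFrame({
    "year": years,
    "gdp_growth_rate": gdp_growth,
    "investment": investment,
    "inflation": inflation,
    "population_growth": population_growth,
})

features = ["investment", "inflation", "population_growth"]
X = df[features].to_numpy()
y = df["gdp_growth_rate"].to_numpy()

# --- Econometrics: OLS coefficients, standard errors, t-statistics ---
X_ols = np.column_stack([np.ones(len(X)), X])      # add an intercept column
beta, *_ = np.linalg.lstsq(X_ols, y, rcond=None)   # least-squares estimates
resid = y - X_ols @ beta
dof = len(y) - X_ols.shape[1]                      # observations minus parameters
sigma2 = resid @ resid / dof                       # residual variance estimate
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X_ols.T @ X_ols)))
ols_table = pd.DataFrame(
    {"coef": beta, "std_err": se, "t": beta / se},
    index=["const"] + features,
)
print(ols_table.round(3))

# --- Machine learning: Random Forest judged on held-out years ---
# Assumption: train on 2000-2019, test on 2020-2023 (time-ordered split).
train = df[df["year"] <= 2019]
test = df[df["year"] >= 2020]

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(train[features], train["gdp_growth_rate"])
rf_pred = rf.predict(test[features])
rf_mae = mean_absolute_error(test["gdp_growth_rate"], rf_pred)

# Naive benchmark: always predict the training-period average.
baseline_pred = np.full(len(test), train["gdp_growth_rate"].mean())
baseline_mae = mean_absolute_error(test["gdp_growth_rate"], baseline_pred)

print(f"Random Forest MAE: {rf_mae:.3f}")
print(f"Mean-baseline MAE: {baseline_mae:.3f}")
```

The econometric half reports a table to read and interpret (coefficients and t-statistics); the ML half reports a single error number to compare against a benchmark. Because every column in this toy dataset was drawn independently, the true coefficients are zero, so any apparent effect or forecasting gain here is noise. With real data, the same two outputs would answer the "how much?" and "how accurately?" questions respectively.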

7.4 Learning Outcome

By the end of this Q&A you will be able to:

  • Differentiate econometrics (causality, explanation) from ML (prediction, accuracy).
  • See how the same GDP dataset can be analyzed with both approaches.
  • Recognize when to use econometrics (policy questions) vs ML (forecasting).

7.5 Takeaway

  • Econometrics: “Why is GDP growth happening? How strong is the relationship?”
  • Machine Learning: “Can we predict GDP growth accurately?”
  • The two are complementary: CDI emphasizes combining them to move from insight to prediction in real-world economic data analysis.