Date: Thursday, 10 July 2025, at 2:00 pm
Venue: Online via Teams
Speaker: Laura Battaglia (University of Oxford)
Title: Inference for Regression with Variables Generated by AI or Machine Learning
Abstract:
It has become common practice for researchers to use AI-powered information retrieval algorithms or other machine learning methods to estimate variables of economic interest, then use these estimates as covariates in a regression model. We show both theoretically and empirically that naively treating AI- and ML-generated variables as “data” leads to biased estimates and invalid inference. We propose two methods to correct bias and perform valid inference: (i) an explicit bias correction with bias-corrected confidence intervals, and (ii) joint maximum likelihood estimation of the regression model and the variables of interest. Through several applications, we demonstrate that the common approach generates substantial bias, while both corrections perform well.
Seminar organizer: Caterina Giannetti