by Giacomo Caterini - Working Papers N.2018/09

Statistics on the births, deaths and survival rates of firms are crucial pieces of information: they are inputs to the computation of GDP, to the identification of each sector’s contribution to the economy, and to the assessment of gross job creation and destruction rates. However, official statistics on firm demography become available only several months after the data are collected and stored. Furthermore, unprocessed and untimely administrative data can misrepresent the life-cycle stage of a firm. In this paper we implement an automated version of Eurostat’s algorithm for distinguishing genuine startups from the reactivation of pre-existing but apparently defunct firms. We illustrate the potential gains from combining machine learning, natural language processing and econometric tools to preprocess and analyze granular data, and we propose a machine learning method that predicts the reactivation of seemingly dead firms.
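To make the birth/reactivation distinction concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical input (the set of years in which one firm identifier is observed as active) and a dormancy window loosely modeled on the two-year reactivation convention of Eurostat-OECD business demography statistics; the exact boundary (`DORMANCY_LIMIT`) and the function name `classify_entries` are illustrative assumptions. It deliberately ignores the matching against other enterprises (mergers, break-ups, takeovers) that the full procedure also requires.

```python
# Assumption: a firm that resumes activity after fewer than DORMANCY_LIMIT
# inactive years is a reactivation, not a new birth. The window and the data
# layout are illustrative, loosely following the Eurostat-OECD convention.
DORMANCY_LIMIT = 2


def classify_entries(active_years: set[int]) -> dict[int, str]:
    """Label each year in which a firm (re)enters the active population.

    Hypothetical helper: `active_years` holds the years in which the firm
    is observed as active. Years of continuous activity get no label.
    """
    labels: dict[int, str] = {}
    previous = None
    for year in sorted(active_years):
        if previous is None:
            # First observed activity: treated as a birth (assumes the
            # panel covers the firm's full history).
            labels[year] = "birth"
        elif year - previous > 1:
            inactive_years = year - previous - 1
            if inactive_years < DORMANCY_LIMIT:
                labels[year] = "reactivation"  # short dormancy: same firm
            else:
                labels[year] = "birth"  # long dormancy: counted as new entry
        previous = year
    return labels


print(classify_entries({2010, 2011, 2013, 2017}))
```

For a firm active in 2010, 2011, 2013 and 2017, the 2013 re-entry follows one inactive year and is labeled a reactivation, while the 2017 re-entry follows three inactive years and is labeled a birth. A rule of this kind only works once activity can be observed over the full window, which is one reason a predictive model of reactivations is attractive when timely statistics are needed.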

Keywords: Business Demography; Classification; Text Mining.

JEL classification: C01, C52, C53, C55, C80, G33, L11, L25, L26, M13, R11.