Tuesday, August 14, 2018

In-depth look at income and wealth data (pt. 1.5 of 3): Non-traditional data and machine learning approaches

While this was originally meant to be a three-part series on income and wealth data, it would have been an oversight not to include some discussion of non-traditional data and machine learning approaches to collecting information on poverty. These data are particularly relevant in developing countries, where traditional sources of data - administrative data and survey data - are not collected as widely, regularly, or thoroughly. There are several reasons for this: nationally representative surveys are expensive to field; data collection in conflict-affected areas poses its own challenges (discussed in greater detail in a previous publication I worked on); and large proportions of the population are employed in the informal economy, meaning there is little by way of administrative tax records at the lower end of the income distribution.

Yet information on poverty is still needed in these countries to inform evidence-based policymaking by governments, international organizations, and non-profits. A brief article by researcher Joshua Blumenstock published a few years ago in Science, "Fighting Poverty with Data", discusses the frontier of research in this area, which aims to supplement the traditional sources of data on wealth and inequality with machine learning approaches. Blumenstock discusses, for example, the growing use of nightlight data to track economic productivity and growth, citing one paper that uses nightlight-based measures to study the impact of sanctions on North Korea. In fact, a paper that I reviewed earlier in the year on the impact of Chinese aid projects on local corruption used nightlight data as a proxy for local economic activity in areas around active and inactive Chinese aid sites.

More novel, and more interesting, is the machine learning research the author cites, which uses satellite imagery in conjunction with nightlight data to identify the visual features of relatively wealthier areas (which have brighter nightlight). This would allow researchers to leverage daytime satellite images to better track poverty in developing countries. The approach has limitations - for example, nightlight is not an ideal measure of activity at the lower end of the income distribution, where all is dark - but with further research it could be very useful in the developing country context.

Mobile phone data - which were discussed in part in the above article and in greater detail in this other Science piece, also by Blumenstock - are also promising. Using mobile phone logs, researchers extract statistics including the volume, intensity, and timing of phone calls, the structure of the individual's network of contacts, and mobility and migration information based on geospatial markers, and then whittle these down to the statistics that can be used to predict socioeconomic status. In the case cited in this article, the researchers paired consenting individuals' mobile phone data with survey data that they collected on individual income and wealth in order to train the model.

It should be noted that mobile phone data are subject to greater ethical and privacy concerns than publicly available data. While the research cited here aimed to obtain macro-level statistics to inform policymaking, it is clear that attempting to obtain a more granular understanding of specific demographics will be challenging. ICT access and use are far from universal and, often, those who are excluded from access are the most vulnerable. This is similar to the challenge with conflict data, where the data on those who are most vulnerable and most affected by conflict are the hardest to collect, and to collect accurately. This is not meant as a generalization, given that some of the poorest regions of the world have reasonably high mobile phone penetration; rather, it is a cautionary note for assessing whether data are representative of specific populations.

For example, drawing on a recent project that I've worked on: mobile phone penetration in sub-Saharan Africa is high despite low incomes. Yet while its neighbors in East Africa have experienced fast growth in mobile phone ownership and usage in the past five years, Ethiopia has fallen behind, largely due to government ownership of the nation's telecom monopoly, which has limited expansion and service. Further, the distributional data on mobile phone usage indicate that women are far less likely than men to own and use mobile phones - consistent with findings in many developing countries - so any data collected from these devices in a hypothetical scenario would be representative only of a specific demographic.

And yet, despite the challenges, non-traditional sources of data offer promise, particularly in geographic areas where recent traditional data on wealth and poverty are not available. Research in this interdisciplinary area will be interesting to watch in the near future.
