Tuesday, July 3, 2018

In-depth look at income and wealth data (pt. 1 of 3): Background

For some time now I have been interested in writing an in-depth post on income and wealth data in order to discuss how the study of inequality - in conjunction with the data and methods that enable this study - has progressed over time. While this was initially intended to be a single post, it quickly became evident that there was too much to discuss within too short a space. In this first post of a three-part series, then, I focus on providing the background for a more granular discussion of wealth and income in the next two parts.

Given the topic at hand, it is noteworthy that several articles on inequality and tax and redistribution policy were published in a recent special issue of the Journal of Public Economics honoring the late Tony Atkinson. For an introduction to that series of papers see here. My previous post on individual and household level inequality is based on a paper within this special issue. Additionally, a recent issue of the Quarterly Journal of Economics features an article that combines national accounts data with micro data to produce estimates of inequality in the U.S. that are consistent at the macro level.

In light of expanding research on inequality, its growing presence in policy debates in developed countries, and the evolution of both data and methods that enable its rigorous study, it is useful to take stock of the existing data sets and methods used by researchers to answer some of the most pressing questions in public economics today: those that deal with the distribution of wealth and income in our societies and the reasons for widening or stagnant inequality levels. We can also assess what types of questions we are now able to answer, how improved data collection and methods can make our answers to these and other - yet unasked - questions more accurate, and how future data collection can fill existing gaps in our knowledge.

To begin, the World Inequality Database - a database of global wealth and income inequality data co-founded by Tony Atkinson - provides a concise description of data and research in this field over the past twenty years. Two important trends:
  1. Most studies on inequality have until very recently focused on income rather than wealth. The key reason is the greater availability of micro data to study income, which is taxed and therefore observable in administrative data, as opposed to wealth, which in most developed countries is not taxed apart from an estate tax upon death. A secondary reason is that it was not evident until recently - likely for similar data reasons - that wealth concentration plays a large role in the inequality we see within developed countries. Piketty's Capital in the Twenty-First Century (2014) was not the first but perhaps the most prominent description of the growing role of capital in widening divisions between haves and have-nots.
  2. Current efforts are aimed at producing distributional national accounts that combine administrative micro data with national accounts macro data - ledgers of assets and liabilities at the national level - in order to reconcile inequality estimates based on micro data with the national accounts aggregates. This publication from the founders of the WID discusses the motivation and methodology for the creation of these "distributional national accounts." It notes the historical background: "[by] combining the macro and micro dimensions of economic measurement, we are of course following a very long tradition. In particular, it is worth recalling that Kuznets was both one of the founders of the U.S. national accounts and the author of the first national income series and also the first scholar to combine national income series and income tax data in order to estimate the evolution of the share of total income going to top fractiles in the U.S. over the 1913-1948 period (see Kuznets, 1953)." The article cited above from the QJE, Piketty, Saez, and Zucman (2018), presents "distributional national accounts" for the U.S., which they note are distinct from government statistical agencies' work in this area.

Discussion of the main data types and their roles in inequality studies

Administrative micro data

To preface a discussion of administrative tax data for wealth and income studies, I first provide context for the use of these data in social science research more broadly. Administrative data are collected for the purposes of registration, transaction and record keeping, and are often linked to public service delivery. They are typically collected by public sector agencies and are used in administrative systems in education, health, and taxation, among other departments of the public sector. It should be noted that these data are "found" data: unlike survey data, they are not collected for the purposes of research. The social sciences, and economics in particular, have shifted toward using administrative data over survey data sources in recent years for several reasons.

Specifically, as noted in this white paper to the National Science Foundation: "Administrative data are highly preferable to survey data along three key dimensions. First, since full population files are generally available, administrative records offer much larger sample sizes... Second, administrative files have an inherent longitudinal structure that enables researchers to follow individuals over time and address many critical policy questions, such as the long term effects of job loss (von Wachter, Song, and Manchester, 2009) or the degree of earnings mobility over the life cycle (Kopczuk, Saez, and Song, 2010). Third, administrative data provide much higher quality information than is typically available for survey sources, which suffer from high and rising rates of non-response, attrition, and under-reporting."

Access to these data is not without its challenges in many developed countries. Nordic countries have been leaders in enabling researchers to access de-identified administrative or "register" data, but other countries, such as the U.S., have been relatively slow to follow. Administrative data have come to play a central role in social science research, and in economics in particular (see the two charts on the number of publications in leading economics journals that employed administrative data in this presentation from researcher Raj Chetty, who also co-authored the white paper cited above). Access to these data therefore has important implications, which are outlined in an article published in the Economist last month on the topic.

Administrative tax data are widely used in income and wealth inequality studies. For example, wealth inequality is largely studied through one of two sources. The first is estate tax records, which are used to create distributions of wealth at death and then to extrapolate from those records the distribution of wealth among the living using the mortality multiplier method. The second is taxable capital income (it should be noted that only one-third of total capital income is reported on tax returns, which is why it is challenging to estimate wealth based on this quantity). Similarly, income inequality is studied through income tax records. Given the socioeconomic and demographic data contained in these records, we are able to answer (or attempt to answer) a wide range of social science research questions based on micro data. Yet the missing piece is information on movements in the economy at large over time (e.g. an increase in the fraction of retired individuals or declines in household size), which could have implications for inequality.
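
To make the mortality multiplier idea concrete, here is a minimal sketch in Python. It assumes the simplest version of the method: each estate observed at death stands in for the living people in the same age and sex group, so it is weighted by the inverse of that group's mortality rate. The function names, group labels, mortality rates, and wealth figures are all hypothetical and chosen only for illustration; actual studies typically adjust the rates further (for example, for the lower mortality of wealthier individuals).

```python
def apply_mortality_multiplier(estates, mortality_rates):
    """Pair each estate's wealth with a weight equal to 1 / mortality rate.

    If 0.5% of a group dies in a year, each observed decedent is assumed
    to represent 1 / 0.005 = 200 living people in that group.
    """
    weighted = []
    for estate in estates:
        rate = mortality_rates[(estate["age_group"], estate["sex"])]
        if not 0 < rate <= 1:
            raise ValueError(f"mortality rate must be in (0, 1], got {rate}")
        weighted.append((estate["wealth"], 1.0 / rate))
    return weighted


def total_living_wealth(weighted):
    """Estimate the total wealth of the living from weighted estates."""
    return sum(wealth * weight for wealth, weight in weighted)


# Hypothetical mortality rates by (age group, sex); not real data.
mortality_rates = {("45-64", "F"): 0.005, ("65+", "M"): 0.04}

# Hypothetical estate tax records; not real data.
estates = [
    {"age_group": "45-64", "sex": "F", "wealth": 200_000},
    {"age_group": "65+", "sex": "M", "wealth": 1_000_000},
]

weighted = apply_mortality_multiplier(estates, mortality_rates)
estimated_total = total_living_wealth(weighted)
print(estimated_total)
```

Note that the younger decedent receives a much larger weight than the older one, which is why estimates from this method can be sensitive to the mortality rates assumed for groups in which few deaths are observed.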

As noted by Piketty, Saez, and Zucman (2018), studies that use micro data exclusively are unable to answer questions such as: (1) what fraction of economic growth accrues to different parts of the income distribution, (2) what fraction of the increase in income inequality is due to changes in the share of labor vs. capital in national income as opposed to changes in the distribution within labor or capital earnings, and (3) how government redistribution affects inequality (i.e. using micro data series we are only able to observe pretax income, which does not allow us to observe changes in the income distribution between pre- and posttax income). To answer these questions, they argue, merging micro data with national accounts data at the macro level is valuable.
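
As an illustration of question (1), the sketch below computes the fraction of total income growth that accrues to each income group once group incomes add up to national income. This is only a back-of-the-envelope calculation under my own assumptions, not the authors' procedure: the group labels and income figures are hypothetical, and the groups are compared cross-sectionally (the people in the top 10 percent need not be the same in both periods).

```python
def growth_shares(base, end):
    """Return each group's share of the change in total income.

    `base` and `end` map group labels to total group income in the
    start and end periods, with groups summing to national income.
    """
    if base.keys() != end.keys():
        raise ValueError("both periods must contain the same groups")
    total_growth = sum(end.values()) - sum(base.values())
    if total_growth == 0:
        raise ValueError("total income did not change; shares are undefined")
    return {group: (end[group] - base[group]) / total_growth for group in base}


# Hypothetical group incomes (arbitrary units); not real data.
base_period = {"bottom 50%": 20.0, "middle 40%": 45.0, "top 10%": 35.0}
end_period = {"bottom 50%": 22.0, "middle 40%": 60.0, "top 10%": 58.0}

shares = growth_shares(base_period, end_period)
for group, share in shares.items():
    print(f"{group}: {share:.1%} of total growth")
```

The calculation only makes sense when the group totals sum to the macro aggregate, which is exactly what tax data alone do not guarantee.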

National accounts macro data

On the macro side, national accounts data aggregate the output, expenditure, and income activities of each sector of the economy. While income and consumption measures are important for evaluating standards of living, they offer only a static picture of well-being. Specifically, income and consumption reflect current well-being: how much a household or an economy is producing and consuming at present, but they do not provide much insight into a household or economy's long-term or future well-being (beyond making assumptions that current well-being and consumption are highly correlated over time). This is where national accounts data can be useful: data on a household or economy's ownership of marketable assets and the debts it has contracted can provide insight into long-term or future well-being, though such data may be cross-sectional rather than longitudinal.

For a valuable introduction to balance sheets and the national accounts data see here for a discussion from the French National Institute of Statistics and Economic Studies (INSEE). It should be noted that the definitions of "assets" and whether or not they provide "economic advantages" refer specifically to those items that have market values. This would exclude, as stated by INSEE, "items that one might expect to see in the accounts (human capital, natural heritage, natural State property, household durables, pension entitlements linked to the allocation system, etc.)" They note as a rule of thumb that only items that are featured in the capital and financial accounts are included as assets in order to maintain internal consistency. The capital account and financial account link the opening and closing balance sheets to one another: they specify what happened to the accumulation of capital based on capital consumption, assets sold and acquired, discoveries and inventions, and nominal holding gains as a result of price fluctuations.
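
The link between opening and closing balance sheets described above can be sketched as a simple accounting identity. The code below is a simplified illustration under my own assumptions: the field names are hypothetical groupings of the flows listed in the paragraph, the figures are made up, and real national accounts break these flows down by asset type and sector.

```python
from dataclasses import dataclass


@dataclass
class AccumulationFlows:
    """Flows during the period that change the stock of assets."""
    assets_acquired: float
    assets_sold: float
    capital_consumption: float    # depreciation of fixed capital
    other_volume_changes: float   # e.g. discoveries and inventions
    nominal_holding_gains: float  # gains or losses from price changes


def closing_balance(opening: float, flows: AccumulationFlows) -> float:
    """Closing stock = opening stock plus all accumulation flows."""
    return (
        opening
        + flows.assets_acquired
        - flows.assets_sold
        - flows.capital_consumption
        + flows.other_volume_changes
        + flows.nominal_holding_gains
    )


# Hypothetical figures (arbitrary units); not real data.
flows = AccumulationFlows(
    assets_acquired=120.0,
    assets_sold=30.0,
    capital_consumption=50.0,
    other_volume_changes=10.0,
    nominal_holding_gains=25.0,
)
closing = closing_balance(1000.0, flows)
print(closing)
```

Because every change in the stock must pass through one of these flows, an item that never appears in the capital and financial accounts cannot be tracked consistently on the balance sheet, which is the reasoning behind the rule of thumb above.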

These data, and specifically the national income measures in these data, may be relied upon to fill the gaps in our knowledge from the tax data. Specifically, there are gaps between reported income and national income that are not captured in micro studies: imputed rents of homeowners and taxes, on top of unreported and untaxed labor income in the form of tax-exempt fringe benefits. Piketty, Saez, and Zucman (2018) estimate that the fraction of national income reported on tax returns in the U.S. has declined from 70 percent in the late 1970s to roughly 60 percent today. This indicates that micro data alone may underestimate the level and growth of income in this country, and perhaps more so for certain parts of the income distribution than others, depending on which components of national income are missing from the tax data.
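
To give a rough sense of what filling such a gap involves, the sketch below allocates one income component that is missing from tax records (imputed rent, as an example) across tax units so that the micro total matches the macro total. This is a deliberately crude proportional allocation under my own assumptions, not the authors' method, which relies on more detailed imputations. The proxy variable (home value), the function name, and all figures are hypothetical.

```python
def allocate_component(proxies, macro_total):
    """Split a macro total across units in proportion to a proxy variable."""
    total_proxy = sum(proxies)
    if total_proxy <= 0:
        raise ValueError("proxy values must sum to a positive number")
    return [macro_total * proxy / total_proxy for proxy in proxies]


# Hypothetical tax units (arbitrary units); not real data.
reported_income = [50.0, 30.0, 20.0]  # income observed on tax returns
home_value = [0.0, 200.0, 300.0]      # assumed proxy for imputed rent
imputed_rent_total = 10.0             # assumed total from national accounts

imputed_rent = allocate_component(home_value, imputed_rent_total)
augmented_income = [r + i for r, i in zip(reported_income, imputed_rent)]

# Share of the augmented (macro-consistent) total that tax data captured.
coverage = sum(reported_income) / sum(augmented_income)
print(imputed_rent, round(coverage, 3))
```

In this toy example the unit with the highest reported income receives no imputed rent, so adding the missing component changes the ranking of incomes as well as the total. That is the sense in which the gap can matter more for some parts of the distribution than for others.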

For a more in-depth description of the methods and the process by which these two types of data are combined, I recommend the Piketty, Saez, and Zucman (2018) article. The authors effectively illustrate both the motivation and the methods for incorporating national income macro data into inequality studies. In the next part of this three-part series, I will discuss the data and research on wealth inequality specifically to provide greater detail on wealth estimates using estate tax data compared to those using capital income.

  1. Piketty, T., Saez, E., Zucman, G. (2018). Distributional National Accounts: Methods and Estimates for the United States. Quarterly Journal of Economics.
  2. Kleven, H., Luttmer, E. (2018). A Special Issue of the Journal of Public Economics: Honoring the Work of Sir Anthony B. Atkinson (1944-2017). Journal of Public Economics. 
  3. Blundell, R., Joyce, R., Keiller, A.N., Ziliak, J.P. (2017). Income inequality and the labour market in Britain and the US. Journal of Public Economics. 


  1. I just found this blog, because of a Menzie Chinn post discussing female economics bloggers and the supposed underrepresentation thereof. What I have read I like your writing. Do you have many research papers or a "Google Scholar" page where I can access your papers?? Thanks ahead of time.

  2. Hi Ted, thanks for your comment. I'm happy to hear from people who are reading my blog. I'm not an academic researcher and early in my career so I don't have a publication list of that type (but hopefully will one day). Thanks for reading and if you have any feedback or thoughts on past posts/ideas for future posts I always appreciate it.