Sunday, May 22, 2011

Visualizing Inequality

Stemming from a comment on my last post, I've put together some preliminary data on inequality. The first two figures concern interstate inequality. The charts are based on the IMF's World Economic Outlook dataset. I've charted four different measures of data variation: standard deviation, variance, skew, and range. I have divided some of the measures by powers of ten so that all can be charted on the same axis. The intent is to show the trend in the data. All measures of inequality are on the increase except for skew, which is relatively static, but positive. A positive skew means that the tail of the distribution is on the right. That is, there is a larger group of low-income countries and a smaller group of higher-income countries. The first figure shows the measures for GDP at purchasing power parity (PPP) for all 185 states in the dataset. The second figure shows the measures for per capita GDP at PPP. The statistics are not very fancy. If anyone has any suggestions for better measures, I'm all ears.
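
For readers who want to reproduce the four measures outside a spreadsheet, here is a minimal Python sketch. It is not the workflow used for the charts: the function name and the sample numbers are made up for illustration, and it uses population formulas, whereas spreadsheet functions often apply a sample correction, so the values may differ slightly from the charted ones.

```python
from statistics import mean, pstdev, pvariance

def dispersion_measures(values):
    """Return standard deviation, variance, skew, and range for one year of data."""
    n = len(values)
    mu = mean(values)
    sd = pstdev(values, mu)      # population standard deviation
    var = pvariance(values, mu)  # population variance
    # Population skew: third central moment divided by the cube of the std dev.
    skew = sum((x - mu) ** 3 for x in values) / n / sd ** 3
    return {
        "std_dev": sd,
        "variance": var,
        "skew": skew,
        "range": max(values) - min(values),
    }

# Hypothetical figures, NOT taken from the IMF dataset:
# four low values and one high value, so the tail is on the right.
sample = [1.0, 2.0, 2.0, 3.0, 12.0]
measures = dispersion_measures(sample)
print(measures)
```

Running this once per year of data and plotting each measure against the year gives the kind of trend lines shown in the figures. A positive "skew" value corresponds to the right-hand tail described above.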

Causality is a different question, but I would argue that, for many reasons, countries benefit from global competition and trade differently, leading to increases in interstate variation.  Within some countries, economic development lowers inequality, while in others it sharply increases it.
Figure 1: Measures of variation of GDP at PPP
Figure 2: Measures of variation of per capita GDP at PPP

Next are Gini coefficients for various groups. The data here comes from the UNU-WIDER World Income Inequality Database. The Gini coefficient is a measure of income inequality where 100 is perfectly unequal (one person gets all the income) and 0 is perfectly equal. This data is less reliable than the GDP figures and far more difficult for researchers to gather, so it is to be taken with a grain of salt. The groupings I used were key European countries and the U.S. in the first figure; Japan, Malaysia, and the Asian Tigers in the second; key countries in the Americas in the third; and late developers in Asia in the fourth. These charts are busier, but a few takeaways are immediately apparent. The U.S. is diverging from Europe, with a much higher level of inequality that approaches Latin American levels. Japan, South Korea, and Taiwan have converged toward European levels of inequality, while Singapore and Hong Kong are diverging toward higher inequality than their Asian cohort. The Latin American countries are converging in the 50-60 range, which is the highest of the groups I have plotted, and the U.S. is rapidly approaching this range. The developing Asian states have lower inequality than Latin America, while China has the most rapidly growing inequality in this group. India's inequality is relatively low among this cohort.
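
To make the scale concrete, here is a rough Python sketch of how a Gini coefficient can be computed from a list of individual incomes. The database publishes the coefficients directly, so this is only an illustration; the function name and the toy incomes are assumptions, and published figures are usually estimated from survey or grouped data rather than from a full list like this.

```python
def gini(incomes):
    """Gini coefficient on a 0-100 scale for a list of non-negative incomes."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one income and a positive total")
    # Rank-weighted formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where i is the rank (starting at 1) of each income in ascending order.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 100.0 * (2.0 * weighted / (n * total) - (n + 1.0) / n)

# Toy examples, not real data.
equal_society = gini([5, 5, 5, 5])     # everyone earns the same
one_takes_all = gini([0, 0, 0, 10])    # one person gets all the income
print(equal_society, one_takes_all)
```

With equal incomes the result is 0. With one person taking everything in a group of four, this formula gives 75 rather than 100, because for a finite group the maximum is 100 * (n - 1) / n; it approaches 100 as the population grows.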


  1. PART I

    First, by way of preemptive strike, statistics were not, nor are they now, "my thing." That said, I have the ability to interpret them with some, if minimal, facility. A second preemptive strike, just to clarify and ensure: it certainly isn't my intent to challenge you per se in any way, except on a “secular” and polite and courteous level. I just enjoy seeing the ideas you are "playing" with and engaging and entering into dialogue with you about them.

    Preliminaries dispensed with, here are a few thoughts that come to mind. First, out of curiosity, what statistics program are you using? I was trained (which is probably too strong a word and ought to be in quotation marks) in SPSS, which is derided by the "pros" but is similar to Excel. My quantitatively adept friends use R, which has a steep learning curve but provides a more than commensurate payoff, so far as I understand. It's not my business to tell you how to perform your project; I'm simply trying to make you aware of tools that may be of assistance. And, quite frankly and to be fair, if you are using Excel, these figures are clean and understandable. Also, have you considered box plots? You could, possibly, show visually the data and story you are trying to convey more clearly via them, although there I am not sure – I’m just raising the idea as a suggestion.



  2. I had posted a "PART II" - perhaps it got categorized as spam? (Sadly, I didn't save it. The net-net of it was, No, I can't think of any better measure than the Gini coefficient.)


  3. ADTS - This is the part II that I got as an email. Don't know why it disappeared on here. Quick answers to your comments: My stats training in my master's was very, very cursory and all I have at hand to use is Excel. I'll either need to self-teach, take a course later, or if I go for a PhD when I retire from the Marine Corps I'll get it then. Not sure. I'm going to work on presenting the data in a few more ways, as well.
    BREAK- Your original text:

    Second, it seems to me your argument is, in part, a temporal one (of course). Increases in globalization over time lead to increases in inequality over time lead to increases in security problems over time. (Although now that I think about it, my “of course” may be unwarranted.) You are trying to demonstrate the second element of the causal chain with these graphs. First, the notable aspect of the graphs, to me, is the relative lack of skewness – indeed, a decline in skewness – over time. This, so far as I can reason, suggests the distribution continues to resemble, somewhat, the classic bell-shaped curve, even if the “tails” are longer. Can one describe statistically (or visually) the shape of the bell curve at various points in time in terms of various points of the distribution (“fat” tails, etc.)? Bryan Jones and Frank Baumgartner’s “The Politics of Attention” does a nice job of describing a curve with thin shoulders and thick tails respectively. (The book is about public policy, so it’s probably not very relevant for you, but I reference it solely as a methodological example.) Although I can't quite articulate how or why, that fact seems meaningful.

    Second, I assume variance refers to r-squared, and that the scaling on the Y-axis refers to a 0-to-1 scale. I don’t know the significance, but the near-asymptotic rises in variance (Figure 1 and to a lesser extent, Figure 2) are highly provocative.

    Third, you call for better measures. On your intervening variable – inequality – I am, alas, incapable of providing a voice to your “all ears” solicitation. The Gini coefficient seems a fine measure of inequality. I’d simply be interested in seeing the same statistics for inequality within countries that you provide for GDP at PPP and GDP at PPP/per capita.

    Fourth, this is perhaps a lame comment, but perhaps you could give other measures of central tendency – i.e., mode and median? They’d be simple to provide, but informative, I would think.

    Fifth, in a similar vein, perhaps you could include scatter plots and best-fit lines (Ordinary Least Squares regression, I think) that show one time period compared to another? Presumably the scatterplot in 1980 will be more tightly clustered around the best-fit line and a later one the reverse.

    Sixth, a piddling thought: why 1970 or 1980? I imagine the question will be responded to with respect to simple necessity for a cutpoint, or the availability of data, but I’ll make the inquiry.

    Again, though, I’m hard-pressed to find any flaw or critique with respect to your inquiry – the Gini coefficient is satisfactory.