
Friday, June 19, 2009

Iranian Statistics Round-up - Updated 11:35AM June 21st

In the last couple of days, a dizzying number of different statistical arguments have been put forward regarding the legitimacy of the 2009 Iranian election.

In order to keep track of them all, I've made a summary of all the major quantitative arguments I've heard so far and my thoughts on them. Let me know by email or in the comments if I have missed any.


1) Professor Mebane's work

i) Second digit Benford Law test:

Previous research by the professor has found that testing the frequency of the second digit of vote returns is a more reliable measure of electoral fraud than the more traditional first-digit test, which can often produce false positives.

He did not find any statistically significant discrepancies using a second-digit Benford's law test on city-level election returns.

However, his second-digit Benford analysis was developed for precinct-level data, which has not been available. At such a high level of aggregation, sheer scale can overwhelm foul play. Because of this, conformity to Benford's law is not suggestive of authenticity in this case.
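For readers who want to try this kind of test themselves, here is a minimal Python sketch of a second-digit Benford comparison. It is not Professor Mebane's actual procedure, only the textbook second-digit distribution combined with a plain chi-square statistic. The function names and the sample counts are made up for illustration.

```python
import math
from collections import Counter


def second_digit_benford():
    """Expected frequency of each second digit 0-9 under Benford's law."""
    return [
        sum(math.log10(1 + 1 / (10 * first + d)) for first in range(1, 10))
        for d in range(10)
    ]


def second_digit(n):
    """Second digit of a vote count, or None if it has fewer than two digits."""
    s = str(abs(int(n)))
    return int(s[1]) if len(s) >= 2 else None


def chi_square_second_digit(vote_counts):
    """Chi-square statistic of observed second digits against Benford."""
    digits = [d for d in map(second_digit, vote_counts) if d is not None]
    n = len(digits)
    observed = Counter(digits)
    expected = second_digit_benford()
    return sum(
        (observed.get(d, 0) - n * expected[d]) ** 2 / (n * expected[d])
        for d in range(10)
    )


# Hypothetical vote counts, NOT real Iranian returns.
sample_counts = [7031, 1245, 982, 15320, 4410, 2207, 618, 3099, 12754, 861]
statistic = chi_square_second_digit(sample_counts)
```

With ten digit categories the statistic has 9 degrees of freedom, so a value above roughly 16.9 would be significant at the 5% level. A real analysis would need far more than ten returns for that approximation to mean anything.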

Update [4:35 pm]: Professor Mebane has obtained ballot-box-level data and indicated that serious discrepancies were found in the Karroubi and Rezaee returns. I will update when more information is available.

Update[9:23 pm]: A report incorporating Ballot-Box returns can be seen here. To quote:

The initially released polling station data show evidence of significant distortions in the vote counts especially for Karroubi and Rezaei. No significant distortions are apparent for Mousavi's vote counts. There is very marginal evidence of distortions in Ahmadinejad's vote counts. A key to interpreting these results is understanding why the vote counts for Karroubi and Rezaei are typically so small.

Is it (a) inherently low levels of support, (b) voters strategically abandoning the candidates, or (c) fraudulent counts? If there is good reason to believe either (a) or (b), then (c) is less likely.


ii) Election models:

Professor Mebane found that when he conditioned 2009 data on first- and second-round 2005 data using simple models, the results showed behavior that violated "natural election processes". He concluded that this constituted "moderately strong evidence for fraud".

I don't feel qualified to comment on his assertion, but he is considered an expert in this field, and so I attach weight to the fact that he is convinced. Still, I would like to see his methods applied to other elections as a "control".

2) The work of Dr. Roukema
 

Update 4:50PM 5/21: Updated to reflect comments from the author

His arguments are the following in order of "strength":

i) That reformist candidate Karroubi's vote returns have a large excess of 7's as leading digits, assuming Benford's law or empirical variations on Benford's law.

ii) That Ahmedinejad had an excess of 2's and a deficit of 1's, according to Benford's law but not according to some Benford variants.

iii) That none of the candidates have log-normally distributed returns except for Mousavi.
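The excess-of-7's argument in (i) comes down to comparing an observed count of leading digits with what Benford's law predicts. Here is a rough sketch, assuming plain first-digit Benford and an exact binomial tail, which is not necessarily the test Dr. Roukema ran. The counts in the usage lines are hypothetical placeholders, not figures from the paper.

```python
import math


def benford_first_digit(d):
    """Expected frequency of leading digit d (1-9) under Benford's law."""
    return math.log10(1 + 1 / d)


def leading_digit(n):
    """Leading digit of a positive vote count."""
    return int(str(abs(int(n)))[0])


def binomial_upper_tail(n, k, p):
    """Probability of k or more successes in n independent trials."""
    return sum(
        math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1)
    )


# Hypothetical numbers: 300 voting areas, 30 of them with totals starting in 7.
n_areas = 300
n_sevens = 30
p_seven = benford_first_digit(7)  # about 0.058
tail_probability = binomial_upper_tail(n_areas, n_sevens, p_seven)
```

A small tail probability only says the count is unusual under Benford's law itself. Whether Benford's law is the right baseline for candidate vote totals is exactly the point in dispute.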



To me, (i) is interesting and merits further study, though even the author wouldn't characterize it as conclusive.

This is particularly interesting because the largest discrepancy between Iranian opinion polling and the actual results was the massive collapse in support of Karroubi and Rezaee. Every opinion poll showed their combined support in the double digits, when they ended up obtaining less than 3% of the vote between them.

The major criticism of (ii) is that research has shown that candidate vote returns often do not conform to Benford's law.


Histogram showing the distribution of vote totals in voting areas is approximately log-normal, pulled from original paper.

As for (iii): statistically, I don't see why total returns at the district level being log-normally distributed would imply that every candidate would have log-normally distributed votes. On the contrary, log-normal distributions are not closed under addition (the sum of log-normal variables is not itself log-normal), so at least one of the candidates' returns would need to deviate from a log-normal distribution in order to maintain the observed total vote's log-normality (at least if you ignore the fact that different candidates' vote returns are not independent).

Nate Silver and Professor Andrew Gelman's commentary on all three points is invaluable. See here, here, and here.
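The point about addition can be checked with a quick simulation. This is only a sketch under simplifying assumptions that are mine, not the paper's: two independent "candidates" with identical log-normal returns and a deliberately large spread (sigma = 3) so the effect is easy to see. If a variable is log-normal, its logarithm is normal and has zero skewness, so the skewness of the log is a simple diagnostic.

```python
import math
import random


def skewness(values):
    """Sample skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def simulate(n=100_000, sigma=3.0, seed=0):
    """Skewness of log(X) and of log(X + Y) for independent log-normal X, Y."""
    rng = random.Random(seed)
    log_single, log_total = [], []
    for _ in range(n):
        x = rng.lognormvariate(0.0, sigma)  # hypothetical candidate 1
        y = rng.lognormvariate(0.0, sigma)  # hypothetical candidate 2
        log_single.append(math.log(x))
        log_total.append(math.log(x + y))
    return skewness(log_single), skewness(log_total)


skew_single, skew_total = simulate()
```

The log of a single candidate's returns should show skewness close to zero, while the log of the sum is noticeably skewed to the right, so the total is not log-normal. With a smaller spread the deviation shrinks, and real candidate returns are not independent, so this shows the principle rather than a test of the Iranian data.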

Update 11:35 AM June 21st:
Dr. Roukema has updated his paper, which can be seen here. He finds evidence that irregularities might have been concentrated in high-population areas.


"Among the six biggest cities, why should those three cities that voted more for Ahmadinejad be exactly the same ones where there are 70xx votes for Karroubi?" - Dr.Roukema


To quote the abstract:

Three of the six most populous voting areas have vote totals for K [Karroubi] that start with 7. All three of these have greater proportions of votes for A [Ahmedinejad] than the other three voting areas. Interpreting this as an overestimate of the true vote assumed to be 50% to match other data, while retaining constant total vote numbers and increasing votes for the other three candidates in proportion to the average voting percentages, would imply that the difference between A’s and M’s [Mousavi] vote totals would drop by about one million votes.




3) Samuel Wang's look at Tehran Opinion polls

Professor Samuel Wang asserts that while the public opinion polls conducted for Iran at large were all over the place and of suspect quality (polls ranged from Ahmedinejad +16 to Mousavi +32), polls for Tehran specifically might be more reliable.

He then looks at Tehran-specific polls and finds that the discrepancy between Ahmedinejad's poll-forecasted winning margin in Tehran and his official margin there was large enough to be statistically significant. He concludes, "For now, my interpretation is that the official returns in Tehran are unbelievable."
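As a rough illustration of what "statistically significant" means for a margin, the sampling error of a poll's margin can be approximated from the two candidates' shares and the sample size. This is a standard multinomial approximation for a single simple random sample, not Professor Wang's actual method, and every number below is a hypothetical placeholder.

```python
import math


def margin_standard_error(p1, p2, n):
    """Standard error of (p1 - p2) when both shares come from one poll of size n."""
    return math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)


def margin_z_score(poll_p1, poll_p2, n, official_margin):
    """How many standard errors the poll margin sits from the official margin."""
    se = margin_standard_error(poll_p1, poll_p2, n)
    return ((poll_p1 - poll_p2) - official_margin) / se


# Hypothetical poll: candidate 1 at 45%, candidate 2 at 35%, 500 respondents,
# compared with a hypothetical official margin of -5 points for candidate 1.
z = margin_z_score(0.45, 0.35, 500, -0.05)
```

A |z| well above 2 would suggest that sampling noise alone is unlikely to explain the gap. It says nothing about non-sampling problems such as undecided voters or unrepresentative samples, which are likely to matter more with polls of this quality.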

Some thoughts:

i) I'm not sure what area the polls referred to when they polled "Tehran"; the translation wasn't clear. If they were polling "Tehran, Tehran", then the results would have been within the margin of error.

ii) Focusing on margins obscures the fact that in terms of actual candidate vote-share, the polls seem to have been massively off. Mousavi+Ahmedinejad was estimated to be around 60%, but ended up at 97%. This could have been due to an abnormal number of undecided voters or some other factor, but it should be explored.

4) Nate Silver's Analysis

Nate has had some interesting qualitative analysis and statistical commentary on other research, but his best piece so far has been this, where he shows that Ahmedinejad did not do very well in rural areas in the first-round results of 2005, while doing much better there in 2009. He posits that such a radical change in the rural-urban divide in so little time is suspicious.


Before: First Round Iranian Elections in 2005


After: First Round Iranian Elections in 2009

I have to check whether this result holds up when I replace the Ahmedinejad variable with a "conservative" variable showing the combined support of the 4 conservative candidates in 2005. But this has been one of the most convincing points in favor of fraud I've heard so far.

5) Miscellaneous criticisms


i) Mousavi lost his home region:

I don't find this suspicious. Some polls conducted by Western organizations showed that Ahmedinejad had much higher support than Mousavi among Azeris, Mousavi's ethnic group. This could be related to unconfirmed reports that Ahmedinejad was a popular administrator in Azerbaijan for 8 years earlier in his career.

ii) Ahmedinejad won Tehran, which should have gone for Mousavi:

Some polls showed Ahmedinejad winning Tehran by around the margin that he did. Not only that, but he was formerly elected Mayor of Tehran. I don't find this necessarily suspicious by itself either.

iii) Counting was done implausibly quickly:

Paper ballots can be counted very quickly. It doesn't take very long to call a 63% lead.

iv) There are numerical discrepancies in the voting data.

To summarize the arguments I've previously made here:

There are about 100,000 missing votes, because Valid votes+Invalid votes is 100,000 less than "Total votes".

Also, the percentage of spoiled ballots in a district is highly correlated with the district's reformist candidate vote share, while being negatively correlated with Ahmedinejad vote-share.


Percent of ballots declared invalid vs Candidate vote-share

One simple explanation would be that new voters are more likely to make mistakes and produce spoiled ballots. But this would imply that the surge in turnout mainly went to reformist candidates. If this were the case, I don't see how Ahmedinejad could have won.
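The relationship between invalid-ballot share and candidate vote share described above could be checked with an ordinary Pearson correlation across districts. Here is a minimal sketch. The district figures are invented purely to show the calculation and are not taken from the official returns.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented per-district shares (fractions), for illustration only.
invalid_share = [0.008, 0.012, 0.010, 0.015, 0.020]
reformist_share = [0.25, 0.33, 0.30, 0.38, 0.45]
incumbent_share = [0.73, 0.65, 0.68, 0.60, 0.53]

r_reformist = pearson(invalid_share, reformist_share)
r_incumbent = pearson(invalid_share, incumbent_share)
```

Because the vote shares in each district sum to roughly 100%, a positive correlation with one side almost mechanically implies a negative correlation with the other. The two findings above are better read as one pattern than as two independent pieces of evidence.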

v) Ahmedinejad's share of the vote stayed too constant to be real while results were being announced, as shown in the following popular graph:



This has been thoroughly debunked by multiple sources, see here and here.

1 comments:

Pythagoras said...

Check out my blog. You missed the one published in the Post.
http://the-bean-stalk.blogspot.com/2009/06/irans-election-statistical-bibliography.html