People are going to get a heck of a replay of the 2000 election shock if the appearance of impropriety in these election processes isn't ended. If you go check out this blog site, it has links to various sources exposing corruptible practices in the voting process. If you think voting is something everyone should do, then understand what voting really is in this age.
Date: 2004-02-12 12:18
Subject: New Hampshire Democratic Primary: The State of Things
Security: Public

A week and a half ago, I made an entry in this blog (two entries down if you want to read it) examining the New Hampshire Democratic Primary Vote in terms of how the votes were tabulated - by Diebold computers, by ES&S computers, or by hand. I had previously attempted to interest the Dean campaign in doing this study, since they presumably had greater resources and expertise at their disposal than I. When I got no response from them, I did a quick analysis myself to see if there was something there. I loaded the results into a database and grouped them by counting technology. I found that Kerry’s margin over Dean was approximately ten times greater for Diebold-counted than for hand-counted votes. The results also showed some difference in margin between votes counted by Diebold and those counted by ES&S (about two to one), though I didn’t attach significance to this. For simplicity I looked only at the top five candidates (the New Hampshire ballot had 22 Democrats) and calculated the percentages from that base. I found that Kerry won by 14.7% with Diebold counting, 7.7% with ES&S counting, and 1.4% with hand counts. I posted these results to start a discussion and suggested a recount or at least further investigation could be in order. I made no accusations. I deliberately did not promulgate the material widely, pending further evaluation. For example, I did not trade links with other blogs or try to get the material into many prominent forums. I did, however, make an entry into the comments portion of Dean’s blog to get the attention of the campaign. I didn’t know what Dean’s deadline for a challenge was, but it surely wasn’t long. Also, Kerry was riding the New Hampshire momentum to further victories. Time was of the essence, but I also wanted to be cautious, and it was tricky to try to balance these two demands.

I seem to have succeeded in attracting the attention of the right people. Posters to the thread (assuming all self-identifications are correct) included Anthony Stevens, Assistant Secretary of State of New Hampshire; Andy Stephenson, Democratic candidate for Secretary of State in Washington; and Professor Jonathan Wand of Stanford University, an expert in the statistical analysis of elections.

Wand’s contributions were especially useful. He undertook two analyses of the election using Multinomial Outlier Analysis, a state-of-the-art technique for statistical analysis of this sort. The first looked at the results corrected by location. One argument that posters, including Mr. Stevens, brought up in the discussion is that the more urban part of New Hampshire is proximate to Massachusetts, while the more rural part is proximate to Vermont. It was assumed that more urban areas are more likely to use machine-counted votes, which is correct. Wand’s first paper corrected for this, and found that it accounted for the discrepancy between the votes tabulated by the two different kinds of machine. Although he did not call it out explicitly in his conclusions, it also showed in many counties an apparent correlation between votes for Kerry and votes counted by (any) computerized system, even correcting for location. However, Wand’s first analysis, like my own, was something of a rush job.

One of Wand’s assumptions is that vote tampering that spans machine types is unlikely. It is not clear that this is correct. According to Lynn Landes, a single company, LHS Associates, does all election-specific programming of vote-tabulation machines in New Hampshire and some other states. She cites as sources Assistant Secretary Stevens and John Silvestro, CEO of LHS. If this is correct, then concerns about the integrity of computerized voting systems probably should be addressed to secondary programming firms such as LHS, as well as to the machine manufacturers. I have emailed the New Hampshire Secretary’s office seeking confirmation of this.

Professor Wand then submitted a second, more thorough analysis that looked at multiple demographic factors and at the previous Democratic primary (Gore vs. Bradley). He also used this analysis as an opportunity to introduce further refinement of Multinomial Outlier Analysis itself, of which he is one of the originators. This analysis shows that demographically similar towns have similar margins between Dean and Kerry regardless of voting technique used. In order to make valid comparisons, Wand had to eliminate both the large cities and very small towns from consideration. The former use only computerized voting, and the latter only hand counts, so no direct comparisons within these groups are possible. This is a methodological concern I had brought up earlier. However, I don’t think Wand’s analysis is flawed on this basis. That would only be possible if there were a strong preference in the cities for Dean over Kerry (corrected for location and relative to the state average, not necessarily an absolute preference), and there is no evidence of this that I know of.
Therefore, I accept Wand’s conclusion: the preference for Kerry over Dean is independent of voting technology used. I think this has been a fruitful discussion, and I hope election results this season will continue to be subject to this kind of scrutiny.

There has been some concern that bringing this issue up could cause the circulation of a rumor that will damage Kerry unfairly (actually, even if there had been vote-tampering, I do not consider Kerry by any means the only suspect). This is a valid concern and is why I was relatively circumspect in promoting this. However, I am unambiguously the source of this story, so anyone who accuses Kerry unjustly can be referred here.

By the way, the results of Wand’s first analysis are posted here:

The second analysis is available as a (draft) paper, which you can find here:

Source code (written in R) and data are also available further down the page.



Date: 2004-02-05 11:56
Subject: Methodology and Code of New Hampshire Analysis
Security: Public

OK, here's where I outline exactly how I did the analysis of the New
Hampshire Democratic primary, including the code. If you don't know what
I'm talking about, see the previous entry in this blog.

The short version is that I
downloaded the HTML tables of precinct results from the Secretary
of State website and loaded them into Access using its HTML import
facility (yes, yes, I know, but Access is what I had at hand, and I was
in a hurry). I then entered the voting machine data. Finally, I derived
answers using straightforward SQL code (Access has an SQL interface,
and that's pretty much what I used). SQL has built-in functions to
derive aggregates (such as totals) over groups defined by a common
value (like vote counting technique).

I got the voting total per precinct from this URL (click on the county name at the bottom to get the totals for precincts in that county).

I used the following page to determine which precincts were using which counting techniques.

Here are the gory details:

First of all, HTML tables are basically creatures of layout, while SQL tables have a logical structure. The HTML tables on the SOS site are broken horizontally for readability, which results in Bateman and Hamm being in the same column (because they are vertically aligned), same for Moseley-Braun and Kerry, etc. (Look at the HTML source to see what I mean.) That won't do, so I inserted end and begin table markup to divide these into separate tables. I also deleted the totals, as they would distort my own. Then I imported them into Access. The table structure I used was as follows:

Table votetally
ID autonumber,
municipality text,
VotingTechUsed text,
Bateman number,
Moseley-Braun number,
..etc, for the rest of the candidates.

The ID field is called a "primary key" and is the standard database device to uniquely identify rows in a table. Autonumber means the database generates these values automatically.

This works, but the result is poorly normalized (technical term, look up "database" "normal form" if you're curious). You get multiple rows for each municipality, each having the votes for some of the candidates and zeroes for the others. To clean this up, I coalesced the votes into another table. I also took this opportunity to reduce the field I was considering to the top 5. It wasn't just Kucinich and Sharpton. The New Hampshire ballot had 22 Democrats on it, and I wanted a readable result. So I created a new table, substantially the same, but reduced to the top five:

Table Big5ivecoalesced
ID autonumber,
municipality text,
VotingTechUsed text,
Clarkvotes number,
Deanvotes number,
Edwardsvotes number,
Kerryvotes number,
Liebermanvotes number,
PopThreshold number

PopThreshold I use later when I examine the vote in terms of town population. I filled this table with a concise version of the data from the first as follows:

Insert into Big5ivecoalesced(municipality, Clarkvotes, Deanvotes, Edwardsvotes, Kerryvotes, Liebermanvotes)
Select municipality, sum(Clark), sum(Dean), sum(Edwards), sum(Kerry), sum(Lieberman)
From votetally
Group by municipality;

This doesn't really sum the votes. It just adds the real votes to the superfluous zeroes generated by the dummy rows. It does, however, give me the real vote totals per city of these candidates. I should note here that some cities have more than one "precinct". I treated these as though they were separate municipalities, which is also how they were listed by the SOS.
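The coalescing step can be tried on a toy table. What follows is only a sketch: it uses Python's built-in sqlite3 module rather than Access, cuts the candidates down to two, and the town names and vote counts are made up for illustration.

```python
import sqlite3

# Toy stand-in for the imported votetally table: each municipality appears on
# several rows, and each row carries real votes for some candidates and zeroes
# for the rest. Town names and vote counts are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE votetally ("
    "ID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "municipality TEXT, Dean INTEGER, Kerry INTEGER)"
)
conn.executemany(
    "INSERT INTO votetally (municipality, Dean, Kerry) VALUES (?, ?, ?)",
    [
        ("Town A", 120, 0),  # row from the table fragment holding Dean's column
        ("Town A", 0, 150),  # row from the table fragment holding Kerry's column
        ("Town B", 80, 0),
        ("Town B", 0, 70),
    ],
)

# Same shape as the INSERT ... SELECT above: one output row per municipality,
# with the dummy zeroes absorbed by SUM.
coalesced = conn.execute(
    "SELECT municipality, SUM(Dean), SUM(Kerry) "
    "FROM votetally GROUP BY municipality ORDER BY municipality"
).fetchall()

for row in coalesced:
    print(row)
```

Each town comes back as a single row with both candidates filled in, which is all the coalescing is meant to achieve.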

Next I put in the VotingTechUsed values. You could save typing by defining a default value for this column of 'hand' in the table design window. I put these in by going into the table in the datasheet view.

Now the fun.

Here's how to calculate the percentages:

SELECT [VotingTechUsed], sum([Kerryvotes]) AS Kerry, ((sum([Kerryvotes])/(sum([Kerryvotes])+sum([Deanvotes])+sum([Edwardsvotes])+sum([Clarkvotes])+sum([Liebermanvotes]))) * 100) AS Kperc,
sum([Deanvotes]) AS Dean, ((sum([Deanvotes])/(sum([Kerryvotes])+sum([Deanvotes])+sum([Edwardsvotes])+sum([Clarkvotes])+sum([Liebermanvotes]))) * 100) AS Dperc,
sum([Edwardsvotes]) AS Edwards, ((sum([Edwardsvotes])/(sum([Kerryvotes])+sum([Deanvotes])+sum([Edwardsvotes])+sum([Clarkvotes])+sum([Liebermanvotes]))) * 100) AS Eperc,
sum([Clarkvotes]) AS Clark, ((sum([Clarkvotes])/(sum([Kerryvotes])+sum([Deanvotes])+sum([Edwardsvotes])+sum([Clarkvotes])+sum([Liebermanvotes]))) * 100) AS Cperc,
sum([Liebermanvotes]) AS Lieberman, ((sum([Liebermanvotes])/(sum([Kerryvotes])+sum([Deanvotes])+sum([Edwardsvotes])+sum([Clarkvotes])+sum([Liebermanvotes]))) * 100) AS Lperc
FROM Big5ivecoalesced
GROUP BY [VotingTechUsed];

Looks scary, but the key is the GROUP BY. This divides the data into groups based on the value of VotingTechUsed. The aggregate functions (all sum in this case) are applied to each such group, i.e., each group of records with the same VotingTechUsed value. Each percentage divides what the particular candidate got by what they all got (again within the groups) to get a decimal ratio. It then multiplies this by 100 to convert it to a percentage.
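For anyone who wants to watch GROUP BY at work without Access, here is a minimal sketch using Python's built-in sqlite3 module. It is not the query I ran: it keeps only two candidates, and the towns and vote counts are invented. One dialect difference matters. SQLite truncates integer division, so the sketch multiplies by 100.0 first, whereas the / operator in Access already returns a fraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Big5ivecoalesced ("
    "municipality TEXT, VotingTechUsed TEXT, "
    "Kerryvotes INTEGER, Deanvotes INTEGER)"
)
# Invented rows: two towns per counting method.
conn.executemany(
    "INSERT INTO Big5ivecoalesced VALUES (?, ?, ?, ?)",
    [
        ("Town A", "Diebold", 300, 200),
        ("Town B", "Diebold", 300, 200),
        ("Town C", "hand", 110, 90),
        ("Town D", "hand", 100, 100),
    ],
)

# One result row per VotingTechUsed value; each SUM runs within its group.
# 100.0 forces floating-point arithmetic in SQLite.
shares = conn.execute(
    "SELECT VotingTechUsed, "
    "       100.0 * SUM(Kerryvotes) / (SUM(Kerryvotes) + SUM(Deanvotes)) AS Kperc, "
    "       100.0 * SUM(Deanvotes) / (SUM(Kerryvotes) + SUM(Deanvotes)) AS Dperc "
    "FROM Big5ivecoalesced "
    "GROUP BY VotingTechUsed "
    "ORDER BY VotingTechUsed"
).fetchall()

for tech, kperc, dperc in shares:
    print(tech, kperc, dperc)
```

The four input rows collapse to two output rows, one per counting method, and the percentages within each row add up to 100.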

The query that I used to calculate the percentage by which Kerry beat Dean is as follows:

SELECT [VotingTechUsed], sum([Kerryvotes]) AS Kerry,
( ((sum([Kerryvotes])/(sum([Deanvotes])))-1) * 100) AS Kerrymargin, sum([Deanvotes]) AS Dean
FROM Big5ivecoalesced
GROUP BY [VotingTechUsed];
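The Kerrymargin expression is a ratio, not a difference of vote shares: it asks by what percent Kerry's count exceeds Dean's. A small Python sketch of the same arithmetic follows. The function name ratio_margin is mine, and the inputs are the grouped totals from the results table in the 2004-02-03 entry.

```python
def ratio_margin(kerry_votes: int, dean_votes: int) -> float:
    """Percent by which Kerry's count exceeds Dean's: (K / D - 1) * 100."""
    if dean_votes == 0:
        raise ValueError("Dean's vote count must be nonzero")
    return (kerry_votes / dean_votes - 1) * 100


# (Kerry, Dean) totals per counting method, copied from the results table.
reported_totals = {
    "Diebold": (59421, 37589),
    "ES&S": (5952, 4415),
    "Hand": (19004, 18148),
}

margins = {tech: ratio_margin(k, d) for tech, (k, d) in reported_totals.items()}
for tech, margin in margins.items():
    print(f"{tech}: {margin:.1f}%")
```

Because the denominator is Dean's count rather than the total vote, this figure is always larger than the gap between the two candidates' percentage shares.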

To eliminate the towns with more than 20000 voters, I first went to

This is a service that provides demographic information on voters. A little material is on their site for free, including the number of voters in each town.

For every town that had a population of over 20,000 voters, I set the population threshold to 20,000. The others I set to zero (all had a population greater than zero, presumably). I may revisit this with a more granular analysis. Here is the code:

SELECT [VotingTechUsed],
sum([Kerryvotes]) AS Kerry, (sum([Kerryvotes])/(sum([Kerryvotes])+sum([Deanvotes])+sum([Edwardsvotes])+sum([Clarkvotes])+sum([Liebermanvotes]))) AS Kperc,
sum([Deanvotes]) AS Dean, (sum([Deanvotes])/(sum([Kerryvotes])+sum([Deanvotes])+sum([Edwardsvotes])+sum([Clarkvotes])+sum([Liebermanvotes]))) AS Dperc,
sum([Edwardsvotes]) AS Edwards, (sum([Edwardsvotes])/(sum([Kerryvotes])+sum([Deanvotes])+sum([Edwardsvotes])+sum([Clarkvotes])+sum([Liebermanvotes]))) AS Eperc,
sum([Clarkvotes]) AS Clark, (sum([Clarkvotes])/(sum([Kerryvotes])+sum([Deanvotes])+sum([Edwardsvotes])+sum([Clarkvotes])+sum([Liebermanvotes]))) AS Cperc,
sum([Liebermanvotes]) AS Lieberman, (sum([Liebermanvotes])/(sum([Kerryvotes])+sum([Deanvotes])+sum([Edwardsvotes])+sum([Clarkvotes])+sum([Liebermanvotes]))) AS Lperc
FROM Big5ivecoalesced
WHERE PopThreshold < 20000
GROUP BY [VotingTechUsed];

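The only thing new here is the WHERE clause near the bottom, which eliminates from consideration records that do not meet this criterion. (This version also leaves out the multiplication by 100, so Kperc and the rest come back as fractions rather than percentages.)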
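To see what the WHERE clause does to the groups, here is another toy sketch using Python's built-in sqlite3 module. The towns and counts are invented, and PopThreshold is set to 20000 for the one large town and 0 for the rest, the same flagging scheme described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Big5ivecoalesced ("
    "municipality TEXT, VotingTechUsed TEXT, "
    "Kerryvotes INTEGER, Deanvotes INTEGER, PopThreshold INTEGER)"
)
# Invented rows: one flagged large town and two unflagged small ones.
conn.executemany(
    "INSERT INTO Big5ivecoalesced VALUES (?, ?, ?, ?, ?)",
    [
        ("Big City", "Diebold", 5000, 3000, 20000),
        ("Town A", "Diebold", 300, 250, 0),
        ("Town B", "hand", 100, 95, 0),
    ],
)

query = (
    "SELECT VotingTechUsed, SUM(Kerryvotes), SUM(Deanvotes) "
    "FROM Big5ivecoalesced {where} "
    "GROUP BY VotingTechUsed ORDER BY VotingTechUsed"
)

# WHERE is applied to individual rows before they are grouped and summed.
all_towns = conn.execute(query.format(where="")).fetchall()
small_towns = conn.execute(
    query.format(where="WHERE PopThreshold < 20000")
).fetchall()

print("all towns:  ", all_towns)
print("small towns:", small_towns)
```

Only the group that contained the flagged town changes. The hand-count group is identical in both results because none of its rows were filtered out.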

Code in this entry, such as it is, is released under the GNU General Public License (GPL).



Date: 2004-02-03 12:39
Subject: Kerry Beat Dean in New Hampshire by Only 1.5% When Computers Weren’t Doing the Counting
Security: Public

Kerry Beat Dean in New Hampshire by Only 1.5% When Computers Were Not Doing the Counting

In the New Hampshire Democratic Primary, exit polls, which are seldom far wrong, indicated a very close race. The final vote was not close. A close race would have constituted a win for Dean, given expectations. There is serious reason to be dubious of computerized vote counting systems (see Verified Voting or Black Box Voting for details). Such systems were used in New Hampshire, especially those of Diebold, the company that has attracted the most controversy, so I decided to analyze the New Hampshire Democratic primary vote in terms of who was doing the tabulation. According to the New Hampshire Secretary of State’s office there are three possibilities:

Some ballots are counted by Diebold machines.

Some ballots are counted by ES&S machines.

Some ballots are counted by hand.

Let me note that both the Diebold and the ES&S ballots have a paper trail in this case. These are optical-scan systems, where the voter marks a paper ballot that is subsequently counted by computer. There is, then, the possibility of a recount, but only if the issue is forced, since the election was not considered close enough to mandate an automatic recount. Given the problems demonstrated with Diebold systems and the serious allegations made against ES&S, perhaps such a recount should be pursued. In any case, here are the vote totals and percentages for the big five candidates, grouped by vote tallying method (percentages are percentages of the big five vote, i.e., they do not include the minor candidates).

VotingTechUsed   Kerry   Kperc   Dean    Dperc   Edwards  Eperc   Clark   Cperc   Lieberman  Lperc
Diebold          59421   40.1%   37589   25.4%   18334    12.4%   19119   12.9%   13549      9.2%
ES&S             5952    37.6%   4415    27.9%   1877     11.8%   2076    13.1%   1516       9.6%
Hand             19004   34.9%   18148   33.3%   6276     11.5%   7217    13.2%   3846       7.1%
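Anyone who wants to check the percentage columns can recompute them from the vote counts in the table. The Python sketch below does this and also reports the gap between Kerry's and Dean's shares in percentage points. The helper name big_five_shares is just for illustration, and the counts are copied from the table.

```python
# Vote counts per counting method, copied from the table above.
reported_totals = {
    "Diebold": {"Kerry": 59421, "Dean": 37589, "Edwards": 18334,
                "Clark": 19119, "Lieberman": 13549},
    "ES&S": {"Kerry": 5952, "Dean": 4415, "Edwards": 1877,
             "Clark": 2076, "Lieberman": 1516},
    "Hand": {"Kerry": 19004, "Dean": 18148, "Edwards": 6276,
             "Clark": 7217, "Lieberman": 3846},
}


def big_five_shares(counts: dict) -> dict:
    """Each candidate's percentage of the big-five vote within one group."""
    total = sum(counts.values())
    return {name: 100 * votes / total for name, votes in counts.items()}


shares = {tech: big_five_shares(counts) for tech, counts in reported_totals.items()}

# Kerry's share minus Dean's share, in percentage points.
point_gaps = {tech: s["Kerry"] - s["Dean"] for tech, s in shares.items()}

for tech in reported_totals:
    print(f"{tech}: Kerry {shares[tech]['Kerry']:.1f}%, "
          f"Dean {shares[tech]['Dean']:.1f}%, gap {point_gaps[tech]:.2f} points")
```

The gap in points is a different measure from a ratio of one candidate's votes to the other's, so the two should not be compared directly.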

To bring the matter into sharper focus, here are the percentages by which Kerry’s vote exceeded Dean’s, grouped by tallying method.

VotingTechUsed   % Margin
Diebold          58.1%
ES&S             35.0%
Hand             4.7%

Given that Kerry won by all accounts, does this matter? Yes it does. Had Dean gotten close to winning, as low as he had been the week before, he would have gotten the momentum to remain competitive, but instead New Hampshire seems to have doomed him. This may therefore go down as the pivotal election of this primary. Also, the election is not winner-take-all; delegates are assigned proportionally.

Is there any other explanation for the discrepancy? Well, the computerized systems are mostly used in the larger towns in New Hampshire. Can this be attributed to a rural preference for Dean? If the sample is limited to towns with fewer than 20,000 voters, the results are but slightly different.

VotingTechUsed   Kerry   Kperc   Dean    Dperc   Edwards  Eperc   Clark   Cperc   Lieberman  Lperc
Diebold          43428   39.4%   29456   26.8%   13283    12.1%   14632   13.3%   9289       8.44%
ES&S             5952    37.6%   4415    27.9%   1877     11.9%   2076    13.1%   1516       9.57%
Hand             19004   34.9%   18148   33.3%   6276     11.5%   7217    13.2%   3846       7.05%
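The ES&S and hand-count rows are the same in both tables, so only Diebold-counted towns were removed by the 20,000-voter filter. Subtracting the filtered Diebold row from the unfiltered one gives the implied totals for those larger towns. The Python sketch below does only that subtraction, on the assumption that both tables were built from the same underlying data.

```python
# Diebold rows copied from the unfiltered and the filtered tables.
diebold_all = {"Kerry": 59421, "Dean": 37589, "Edwards": 18334,
               "Clark": 19119, "Lieberman": 13549}
diebold_small = {"Kerry": 43428, "Dean": 29456, "Edwards": 13283,
                 "Clark": 14632, "Lieberman": 9289}

# Implied totals for the towns the filter removed (assumes the two rows
# differ only by those towns).
large_towns = {name: diebold_all[name] - diebold_small[name] for name in diebold_all}
large_total = sum(large_towns.values())

for name, votes in large_towns.items():
    print(f"{name}: {votes} ({100 * votes / large_total:.1f}% of the big five)")
```

This is derived arithmetic, not a separate query against the database, so it inherits any errors in the two tables.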

A dramatic rural preference for Dean would be odd, given that his primary demographic is youth, but odd or not, such is not present in the figures, at least not to the extent necessary to explain the data.

The Dean campaign has cause for a recount, in my opinion. Whether they have a legal case, I don’t know. I think it would be better if a suit demanding a recount were brought by a third party, however, rather than the Dean campaign, even though they are the (possibly) offended party.

At the very least, the possibility should be investigated. Someone with access to lawyers should inquire whether the ballots are still available for recount and how long they should remain available, according to law.
