In an article by BBM (Bar-Hillel, Bar-Natan and McKay) in the last issue of CHANCE, two different kinds of claims are raised against WRR. The first kind is directed against the lists of appellations and dates. These claims are dealt with in great detail, and refuted, in documents posted on the web, to which the reader is referred. It is shown there that in every single case their claims that the rules were broken are unjustified. Conversely, it is demonstrated there that the "success" of the "cooked" list in War and Peace was produced entirely by breaking the rules.

Given the limited space, in this letter we concentrate on their second kind of claim: allegedly beneficial choices.

BBM define a "fortunate" choice as one that improves the ranking in the permutation race, and an "unfortunate" choice as one that hurts the ranking.
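The definition can be written as a simple comparison rule. The sketch below assumes, as in a permutation race, that a smaller rank is a better result, and it treats equal ranks as "neutral"; that tie rule is our assumption, since the definition does not cover ties. The function name `classify_choice` is hypothetical.

```python
def classify_choice(original_rank: int, variant_rank: int) -> str:
    """Classify the original choice against one alternative choice.

    Smaller rank = better result. Equal ranks are treated as "neutral"
    (an assumption: the definition does not say how ties are handled).
    """
    if original_rank < variant_rank:
        return "fortunate"    # the alternative would have hurt the ranking
    if original_rank > variant_rank:
        return "unfortunate"  # the alternative would have improved the ranking
    return "neutral"


# Hypothetical ranks: the original choice ranks 10th, an alternative 25th.
print(classify_choice(10, 25))  # fortunate
```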
The main claim of BBM in the section "Choices, Choices" is: "Wonder of wonders, however, it turns out that almost always (though not quite always) the allegedly blind choices paid off: Just about anything that could have been done differently from how it was actually done would have been detrimental to the list’s ranking in the race. In particular, all the choices listed in the present section were fortunate for WRR. Had any of them been different, the ranking of the lists in the permutation race would have gone down" (p. 18). We first check the validity of this statement against their own data.

Their Data:
These data are found in a report by McKay dated 3 April 1997, describing 20 variations on the experiments of WRR. For 19 of the variations he calculated the raw values of P1 and P2 as well as their ranks in a permutation race (for the remaining variation he calculated them for P1 only). He did the calculations for both List1 and List2. In Table 1 we classify his ranking results according to whether the choice was "fortunate" (the original choice gave the smallest rank), "unfortunate" (a variation gave a smaller rank), or neutral.

Table 1. McKay's ranking results classified by kind of choice

Ranking               "Fortunate"   "Unfortunate"   Neutral
List1, rank of P1          9             10             1
List1, rank of P2         13              6             0
List2, rank of P1          9             10             1
List2, rank of P2         14              4             1
Total                     45             30             3

Thus, out of 78 choices, 45 (about 58%) were "fortunate" and 30 were "unfortunate".
We fail to see how these data can be reconciled with BBM's claim "that almost always (though not quite always)" WRR's choices were "fortunate".
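As an arithmetic check, the totals in Table 1 can be recomputed from its rows. The sketch below also verifies that each P1 row covers 20 variations and each P2 row 19, as described for McKay's report; the names `TABLE_1` and `tally` are ours.

```python
# Counts transcribed from Table 1: (fortunate, unfortunate, neutral)
TABLE_1 = {
    "List1, rank of P1": (9, 10, 1),
    "List1, rank of P2": (13, 6, 0),
    "List2, rank of P1": (9, 10, 1),
    "List2, rank of P2": (14, 4, 1),
}


def tally(table):
    """Return the column totals (fortunate, unfortunate, neutral) and the grand total."""
    totals = tuple(sum(col) for col in zip(*table.values()))
    return totals, sum(totals)


# P1 was computed for all 20 variations, P2 for 19 of them.
for row, counts in TABLE_1.items():
    expected = 20 if "P1" in row else 19
    assert sum(counts) == expected, row

(fortunate, unfortunate, neutral), n = tally(TABLE_1)
print(fortunate, unfortunate, neutral, n)       # 45 30 3 78
print(f"share fortunate: {fortunate / n:.1%}")  # share fortunate: 57.7%
```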
BBM make two further statements on p. 18:
1. "...use of combination of date forms (and also using both forms of the 15th and 16th of the month) is superior to any single date form."
2. "Moreover, the triplet of date forms used by WRR is superior to any of the other 14 choices."
However, this is simply not so. For List1, the rank of P2 using "the triplet of date forms used by WRR" is 36 out of 1,000,000 permutations. Using only the single date form b'alef Tishrey, the rank becomes smaller (that is, better): 8 out of 1,000,000. Using the pair of date forms b'alef Tishrey and alef b'Tishrey, the rank becomes smaller still: 1 out of 1,000,000. [The calculations were done with the same programs and seed as in the original experiment.]
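BBM's "other 14 choices" presumably counts the non-empty subsets of four date forms: 2^4 - 1 = 15, of which WRR's triplet is one. The sketch below enumerates those subsets and then compares the three List1 P2 ranks reported above (smaller is better). Only b'alef Tishrey and alef b'Tishrey are named in the text; `form_3` and `form_4` are hypothetical placeholders, and the sketch does not assume which three forms make up WRR's triplet.

```python
from itertools import combinations

# Four date forms; only the first two names appear in the text.
# "form_3" and "form_4" are hypothetical placeholders.
DATE_FORMS = ["b'alef Tishrey", "alef b'Tishrey", "form_3", "form_4"]

# Every non-empty subset of the four forms is a possible choice.
subsets = [c for k in range(1, len(DATE_FORMS) + 1)
           for c in combinations(DATE_FORMS, k)]
print(len(subsets))  # 15, i.e. WRR's triplet plus 14 alternatives

# List1 P2 ranks (out of 1,000,000 permutations) as reported in the text.
reported_ranks = {
    "WRR triplet": 36,
    "b'alef Tishrey only": 8,
    "b'alef Tishrey + alef b'Tishrey": 1,
}
best = min(reported_ranks, key=reported_ranks.get)
print(best)  # b'alef Tishrey + alef b'Tishrey
```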

Puzzling Logic:
So far we have examined some of BBM's results as they stand, setting aside the question of whether they are relevant at all. At this stage, however, the reader should be aware of two simple facts:
a. The measurement parameters and the forms of writing the dates were fixed before the first experiment (List1) and were published before the researchers were asked to perform the second experiment (List2). Any evaluation of the choices must therefore be made with regard to the first experiment only.
b. In the first experiment (as well as in the second), the only indicators of success were the raw values of the measures P1 and P2.

The only criterion for judging whether a choice was "fortunate", i.e. improved the result, is therefore the raw values of the measures P1 and P2. Any analysis of the choices that aims to uncover a possible process of optimization must be carried out in terms of these raw values. It is thus extremely strange that BBM based their analysis on an irrelevant test: the permutation test, which was suggested two years after all the choices had been published.
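The difference between a raw value and a permutation-race rank can be made concrete. In a generic permutation test, the statistic for the true pairing is compared with the statistics for randomly permuted pairings, and the rank counts how many pairings score at least as well. The sketch below is such a generic computation, not WRR's or McKay's program; `permutation_rank` and `statistic` are hypothetical names, and "smaller is better" is assumed, as for P1 and P2. The raw value is `statistic(xs, ys)` alone, while the rank also depends on every permuted pairing.

```python
import random
from typing import Callable, Sequence


def permutation_rank(
    xs: Sequence[float],
    ys: Sequence[float],
    statistic: Callable[[Sequence[float], Sequence[float]], float],
    n_perm: int,
    seed: int = 0,
) -> int:
    """Rank of the true pairing among n_perm pairings (the true one included).

    Rank 1 means no random permutation scored as well as or better
    (smaller) than the true pairing. Ties count against the true pairing.
    """
    rng = random.Random(seed)
    observed = statistic(xs, ys)  # the raw value for the true pairing
    rank = 1
    for _ in range(n_perm - 1):
        shuffled = list(ys)
        rng.shuffle(shuffled)  # break the pairing between xs and ys
        if statistic(xs, shuffled) <= observed:
            rank += 1
    return rank


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Toy statistic: total absolute difference (smaller = closer pairing)."""
    return sum(abs(p - q) for p, q in zip(a, b))


# Hypothetical toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.1, 2.9, 4.2, 5.0]
r = permutation_rank(xs, ys, dist, n_perm=1000)
print(r, r / 1000)  # rank and the corresponding permutation significance level
```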
BBM themselves sense the lack of logic behind their claim, for they write:
“Some might claim that it is not “fair” that the choices were tested with respect to their effect on the permutation race rank, because this statistic had not yet been developed when the choices were made.” This is an understatement.
Mathematician Robert J. Aumann has already criticized such an analysis by Maya Bar-Hillel: "For this to make sense, clearly the statistic to be calculated in connection with each choice should be the one with which WRR were working at the time that the choice in question was made. Here is problem no. 1 with Maya's tests: she does NOT do this.
The statistic Maya uses are the rank order out of ten million random permutations. But the entire test – dates, spelling, appellations, date forms, everything, was fixed before Diaconis suggested the permutations. Using the permutations here is an inadmissible anachronism..." (from a letter by Robert J. Aumann dated 17 January 1997, which gives an excellent analysis of 13 choices checked by M. Bar-Hillel. The reader is urged to read this analysis in full, as well as Document 4; both are posted on the same web site.)
Examining the choices by their raw values shows, for instance, that choosing three of the four date forms listed by BBM (p. 19) was "unfortunate" for both lists compared with taking all four forms.

The Basic Logical Error:
However, the logic of BBM is flawed on a more fundamental level.
As a basis for their analysis, BBM state: "It is possible to set up a null hypothesis of blind choice, according to which the proportion of fortuitous choices is expected to be no higher than 50%."
This null hypothesis contains a clear logical error: it holds ONLY if one assumes in advance that WRR's research hypothesis (the existence of the ELS phenomenon) is false. It ignores the fact that in certain cases WRR could expect beforehand that a choice would improve the results. For example, IF WRR's research hypothesis is correct, it is no surprise that the results improve when the correct dates are used rather than wrong ones. BBM, of course, rule out this possibility, but it is a logical error to build that assumption (the absence of the ELS phenomenon) into their analysis. This mistake alone invalidates a large part of their analysis.
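BBM's "blind choice" null can be read as a coin-toss model in which each of n independent choices is "fortunate" with probability p = 0.5. The sketch below computes the chance of at least k fortunate choices for a given p; the binomial model, the independence of choices, and the name `prob_at_least` are our assumptions. Keeping p as a parameter reflects the point above: if WRR's hypothesis is true, a choice such as correct versus wrong dates should be fortunate with p above 0.5, and the 50% baseline no longer applies.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more
    "fortunate" outcomes among n independent choices."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Under blind choice (p = 0.5), 2 fortunate out of 2 has probability 0.25.
print(prob_at_least(2, 2))  # 0.25
# If each choice is expected to help (p = 0.9, an illustrative value),
# the same outcome is unremarkable (about 0.81).
print(prob_at_least(2, 2, p=0.9))
```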

A similar argument applies to a more technical point. On p. 18, BBM list alternatives to the proximity measure used by WRR. All but one of them use the first power of the Euclidean distance instead of the second power used by WRR. They therefore have a "defocusing" effect, and one can well expect in advance that this change will weaken the results. (One of these alternatives, using only the Euclidean distance between the two nearest letters of the pair, is even worse, because it completely disregards the geometry of the meetings of the ELSs. Such a "test" cannot but fail, so it is no wonder that it does.)
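To see why the power of the distance matters, suppose, purely for illustration, that each meeting of two ELSs contributes a weight 1/d^k, where d is the distance; this is not WRR's actual proximity formula. The sketch compares the share of the total weight carried by the closest meeting for k = 1 and k = 2, using made-up distances. With the second power, close meetings dominate; with the first power, distant meetings dilute the measure.

```python
def weights(distances, k):
    """Hypothetical per-meeting weights 1/d**k (not WRR's exact formula)."""
    return [1.0 / d**k for d in distances]


def closest_share(distances, k):
    """Fraction of the total weight contributed by the closest meeting."""
    w = weights(distances, k)
    return max(w) / sum(w)


distances = [1.0, 5.0, 10.0]  # illustrative distances, not real data
for k in (1, 2):
    print(k, round(closest_share(distances, k), 3))
# 1 0.769
# 2 0.952
```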
When all the logical errors mentioned above are corrected, an entirely different picture emerges: a fair balance between "fortunate" and "unfortunate" choices. For details, the reader is referred to Document 4.

Doron Witztum, Eliyahu Rips.