DRAFT.
This edition: Elul 25 5758 (September 16 1998). First edition: Iyyar 14 5758 (May 10 1998).
By Doron Witztum and Yosef Beremez [
In the experiment described in [1] significance was measured using a randomization
test. This test was first developed for use on the second sample of famous rabbinical
personalities (see [1] for details). The purpose of the test was to determine whether the
Overall Measures of Proximity for the sample -
During the past year this test came under criticism from Dr. B. D. McKay [2]. Dr. McKay
claims that the test incorporates a methodological error. We will discuss his assertion
and show empirically - using a
I. Dr. McKay's claim: Dr. McKay criticized the significance test described in [1], claiming that the test
incorporates a methodological error
Actually, problems of this sort have already been addressed by our randomized pairing test. Suppose that the success of the convergences of a particular appellation was due entirely to its "charisma". If this were the case, this charismatic appellation should succeed equally well with other dates. The permuted samples, in which random pairings replace the correct ones, should then succeed to about the same degree. Thus the randomization test should serve to cancel the effects of the charisma of any particular appellation. Dr. McKay, however, claims that residual effects can still have a significant effect on the results.
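The cancellation argument can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the labels, the `charisma` bonus and the `score` function are hypothetical stand-ins, not the proximity measure of [1]. It shows only that a bonus attached to an appellation regardless of its partner shifts the true pairing and the random pairings by the same amount.

```python
import random

# Hypothetical toy data: labels and bonus values are illustrative only.
appellations = ["A1", "A2", "A3", "A4"]
dates = ["D1", "D2", "D3", "D4"]
charisma = {"A1": 5.0, "A2": 0.0, "A3": 0.0, "A4": 0.0}

def score(appellation, date):
    """Toy pair score: a per-appellation bonus that ignores the date."""
    return charisma[appellation]

def total_score(pairing):
    """Sum the toy score over a list of (appellation, date) pairs."""
    return sum(score(a, d) for a, d in pairing)

def random_pairing(rng):
    """Pair every appellation with a randomly reassigned date."""
    shuffled = dates[:]
    rng.shuffle(shuffled)
    return list(zip(appellations, shuffled))

rng = random.Random(0)
true_total = total_score(list(zip(appellations, dates)))
permuted_totals = [total_score(random_pairing(rng)) for _ in range(1000)]

# With a date-independent bonus, every permuted total equals the true total.
all_equal = all(t == true_total for t in permuted_totals)
```

In this simplified picture the charisma contributes nothing to the ranking of the true pairing. The residual effects claimed by Dr. McKay are outside what this toy model captures.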
II. The new measurement: In our estimation the residual effect mentioned by Dr. McKay is marginal, and has only
a negligible effect on the results. To demonstrate this, we subjected the second sample to
a different randomization test [
1. If a word
2. This strategy solves the problem for the "first" word, but not for the "second". Therefore, we must arrange that every expression occurring as a "second word" be used no more than once.
The sample under investigation consists of word pairs in which one word is the appellation of a rabbi and the other is a date. Usually there is more than one appellation for each personality. If we take the appellation as the "first" word, then we will have the same date as the "second" word several times.
The date of birth or death was used in 3 different forms: ) T$RY , B) T$RY , ) BT$RY. Therefore, each appellation will, as a rule, take part in 3 pairings, that is, in association with each form of the date. Thus if we take the date as the "first" word, we will have to take each appellation as the "second" word several times.
The solution: Let us divide the sample into three sets:
Let us look, for example, at Set 1: The first personality on our list of rabbis has
several appellations: RBY )BRHM , HR)BY , HR)BD , HRB )BD , and H)$KWL. He passed away on the 20 K X$WN --- RBY )BRHM
In all our calculations we will take the date as the "first" word, and we will take it only as ELSs. However, the ELSs of each appellation (the "second" word) will compete with its PLSs over the more successful proximities to the ELSs of the date. We will follow the same procedure for all the dates and appellations in the sample.
In each set every appellation appears only once, with the exception of appellations of the form "Rabbi So-and-So", which sometimes apply to more than one personality (for example, several personalities were known as "Rabbi Avraham"). To avoid this problem one could, for example, take only that "Rabbi So-and-So" whose date is the first in the sample which appears as an ELS.
3. In this manner we receive a set of results
4. To this end we will perform the
The number of perturbed samples one can construct in this manner is enormous. Let us
label it N (one of these samples is the original Set 1). Theoretically one could calculate
P'. We could then arrange these values in order of magnitude.
If the phenomenon we are measuring is random, the value P_{i} (the Overall
Measure of Proximity for Set 1) has an equal chance of occupying any of the N positions on
the list of values of P'_{i}.
As has been mentioned previously, the number N is enormous. For this reason we were
unable to calculate all the values of P'_{i}. Instead, we chose M of the samples at random and
calculated P'_{i} for each of these samples. Including P_{i},
we will have M+1 values, which we can then arrange according to the usual order of real
numbers. We will define the "rank" of P_{i} among the M+1 values
as the number of P'_{i} whose magnitude is no greater than that of P_{i}
(if some of the values of P'_{i} are exactly equal to P_{i},
we will consider half of them to "exceed" P_{i}). Next we will
define r_{i} as the rank of P_{i} divided by M+1. r_{i}
expresses the probability of P_{i} achieving such a low ranking.
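The rank definition above can be sketched in a few lines. This is a minimal illustration that follows the wording literally: permuted values strictly below the original count in full, and exact ties count by half. The function name `rank_ratio` and the sample numbers are hypothetical, and the text does not say whether the original value itself is counted in its own rank, so the sketch does not count it.

```python
def rank_ratio(p_original, p_permuted):
    """Return (rank, r) for p_original among the M permuted values.

    rank counts the permuted values strictly below p_original plus half
    of the exact ties; r is that rank divided by M + 1.
    """
    below = sum(1 for p in p_permuted if p < p_original)
    ties = sum(1 for p in p_permuted if p == p_original)
    rank = below + ties / 2
    return rank, rank / (len(p_permuted) + 1)

# Hypothetical values: M = 9 permuted statistics, two of them tied with the original.
rank, r = rank_ratio(0.02, [0.5, 0.02, 0.01, 0.9, 0.02, 0.3, 0.7, 0.6, 0.8])
```

Here one permuted value lies below 0.02 and two are tied with it, so the rank is 1 + 2/2 = 2 and r = 2/10.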
III. The results: We ran the above test using M = 999,999 permuted samples. We recorded the ranking
out of 1,000,000 for the values of
Table 1
The first set was the most successful, particularly P_{2} of this set using M=999,999,999 permuted
samples. Its ranking was 313 out of 1,000,000,000. We calculated r_{i} and min r_{i}
for each set. The level of significance of each group is 2 min r_{i}.
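As a small worked illustration of the last step, the significance level is twice the smaller of the two ratios. The ratio 313/1,000,000,000 is taken from the text; the second ratio below is a made-up placeholder, since the tabulated values are not reproduced here, and the function name `significance` is hypothetical. The factor 2 can be read as an allowance for choosing the better of two statistics, though that reading is an assumption rather than something stated in the text.

```python
def significance(r_values):
    """Twice the smallest of the given rank ratios."""
    return 2 * min(r_values)

r_from_text = 313 / 1_000_000_000   # ranking 313 out of 1,000,000,000
r_placeholder = 0.01                # hypothetical second ratio, for illustration only
level = significance([r_from_text, r_placeholder])
```

If 313/1,000,000,000 were indeed the smaller ratio for Set 1, the resulting level would be 6.26e-7.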
Table 2
Using a completely
Appendix: Here are some technical points concerning the measurement above:
1. We took one of the set's pairs and carried out 100 different permutations of the appellation. In the event that the number of possible different permutations n was less than 100, we performed n permutations. The permutations were conducted in a standardized manner using a program designed by Yaakov Rosenberg. For example: the first pair is "K X$WN - RBY )BRHM". We present here some of the pairs formed by the permutations (in order, from left to right) together with the original pair:
2. We calculated the values of c(w,w') for the convergences of all 100
(or n) permuted appellations with the date, taken only as ELSs, as described
in sec. II, par. 4. For example, with regard to the example above, we
obtain a row of cells. Each cell holds the c-value of the specific
pair. An empty cell means that the permutation of the appellation did
not appear as an ELS:
3. If an appellation of one of the personalities is part of another of his appellations, we took care to preserve this relation in their permutations as well. For example: the appellation "MHRX$" is included in the appellation "HMHRX$". The permutations of "HMHRX$" were taken as the permutations of "MHRX$" with an "H" as a prefix. Here are some of the pairs formed by the permutations of "MHRX$" (in order, from left to right) together with the original pair:
Their c-values are:
In parallel, the permutations for "HMHRX$" give the following:
And their c-values are:
We slotted these numbers into one row of cells. In each cell there are two c-values: one for the permutation of "MHRX$" and one for the parallel permutation of "HMHRX$":
4. Stages 1, 2 and 3 were performed with regard to all the pairs in the set. We thus obtained rows of cells, each containing 101 (or n+1) cells. In each cell which is not empty, there are one, two or more values of c(w,w').
5. We then chose by lottery one of the cells in the first row, one of the cells in the second row, and so on. We obtained a set of values of c(w,w') and we calculated the values of P_{i} for them.
6. We repeated this procedure 999,999 times, using an algorithm for randomization similar to that described in [1]. The program used was also prepared by Yaakov Rosenberg. We used a seed of 10.
7. For Set 1 we ran the lottery 999,999,999 times using the same program and the same seed.
Remark: We intend to repeat the experiment with 1000 permutations for each appellation
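The steps of the appendix can be sketched roughly as follows. This is not the program by Yaakov Rosenberg, whose "standardized manner" of permuting and whose random number generator are not specified here. The sketch assumes that permutations are letter reorderings drawn by an ordinary shuffle, that the original spelling is excluded from the permuted forms, and that the statistic computed from the pooled c-values is passed in as a function. All function names and the toy rows are hypothetical.

```python
import itertools
import math
import random
from collections import Counter

def count_orderings(word):
    """Number of distinct letter orderings of `word` (multinomial count)."""
    total = math.factorial(len(word))
    for c in Counter(word).values():
        total //= math.factorial(c)
    return total

def permuted_forms(word, limit, rng):
    """Up to `limit` distinct orderings of `word`, original excluded."""
    if count_orderings(word) - 1 <= limit:
        # Few orderings exist: enumerate them all (the "n permutations" case).
        forms = {"".join(p) for p in itertools.permutations(word)}
        forms.discard(word)
        return sorted(forms)
    forms = set()
    letters = list(word)
    while len(forms) < limit:
        rng.shuffle(letters)
        candidate = "".join(letters)
        if candidate != word:
            forms.add(candidate)
    return sorted(forms)

def draw_sample(rows, rng):
    """Step 5: pick one cell per row and pool the c-values in the chosen cells."""
    values = []
    for row in rows:
        values.extend(rng.choice(row))
    return values

def run_lottery(rows, statistic, draws, seed):
    """Steps 6-7: repeat the draw `draws` times and apply `statistic` each time."""
    rng = random.Random(seed)
    return [statistic(draw_sample(rows, rng)) for _ in range(draws)]

# Step 3: the longer appellation reuses the permutations of the shorter one.
base = permuted_forms("MHRX$", 100, random.Random(10))
prefixed = ["H" + form for form in base]

# Toy rows: each cell is a list of c-values, and an empty list is an empty cell.
toy_rows = [
    [[0.12], [], [0.40, 0.35]],
    [[0.08, 0.91], [0.77]],
]
# `len` stands in for the real statistic, which is not defined in this sketch.
results = run_lottery(toy_rows, statistic=len, draws=5, seed=10)
```

Keeping the paired c-values of "MHRX$" and "HMHRX$" in the same cell means that a single draw selects both together, which is one way to preserve the inclusion relation described in step 3.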
Bibliography:
[1] D. Witztum, E. Rips & Y. Rosenberg, "Equidistant Letter Sequences in the Book of Genesis", *Stat. Science*, Vol. 9 ('94), No. 3, pp. 429-438.
[2] B. D. McKay, "Equidistant Letter Sequences in Genesis - A Report" (Draft), Apr. 3, '97.
[3] D. Witztum, E. Rips & Y. Rosenberg, "A Hidden Code in the Book of Genesis - the Statistical Significance of the Phenomenon" ("CPN XBWY BSPR BR)$YT"), preprint '96 (Hebrew).
Notes: Note 3: This test is essentially the same randomization test that we proposed and implemented in our work on "Headline" samples (see [3]).