Description of Radin analysis for Y2K
(From email of Feb 7 2000)
[I have been] doing a reanalysis from scratch, [using freshly downloaded data from] noosphere. This is the simplest version I've been able to come up with, and quite close to my original plan.
The idea was to see whether the noise represented by the egg values would become constrained around Y2K. To look for this I calculated the variance among egg values per second, then used superposed epoch analysis to examine whether the per-second variances would become constrained around midnight in all time zones. I assumed 24 time zones and that midnight occurred at the middle of each one-hour epoch. Later I learned from Ed that there are actually 38 different time zones, and that 2 of the eggs were running not on-the-hour, but on-the-half-hour. Not having planned originally to examine more than 24 time zones, or eggs that are not in sync with the others (with respect to when midnight occurred), I discuss here only an analysis method and results based upon my original expectations.
1) Take all raw data from the 24-hour period 12/31/99 10:30 AM GMT to 1/1/00 9:30 AM GMT. This window is chosen to straddle all midnights, including the first egg at -13 GMT, where midnight fell at 11 AM GMT.
2) Form one variance per second from 26 egg values. Do not count the two eggs in India that are running on the half hour with respect to the other eggs. This creates a vector of variances, of length 3600 seconds x 24 hours.
3) Create a one-hour superposed epoch analysis matrix out of these variances. The matrix is 24 columns wide, indicating integer time zones from -13 to +10, and 3600 rows deep, corresponding to one hour, where the midpoint of the epoch is midnight.
4) Now normalize the variances in this matrix by turning each row into z scores in the usual way (i.e., find the ave and sd of each row then transform each variance score in that row into a z score). This creates a matrix of the same size as step 3, but now the values are z scores instead of variances.
5) Now find the average deviation (meaning, the average absolute deviation from the empirical mean) for each row of z scores. You do not want to find the average of the z scores, because this is zero by construction.
6) Now create a 5-minute sliding average of the vector created in step 5. The attached graph shows what you get.
7) Do a permutation analysis to find the combined p that the minimum is as low as observed, and as close to midnight as observed. For each permutation you scramble both the rows and the columns of the matrix in step 4. Results of 2000 such permutations show that for the time of the minimum (3 seconds before midnight in the observed data) z = -3.317, and for the value of the minimum z = -1.861. Combined by Stouffer z, z = -3.662.
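Steps 2 through 7 can be sketched in Python with NumPy roughly as follows. This is an illustration under stated assumptions, not the code actually used: all function names are hypothetical, the raw data are assumed to arrive as a complete (seconds x eggs) array starting at the first epoch boundary, "scramble both the rows and the columns" is read as permuting the row order and the column order, and each z is taken to be the observed statistic standardized against its permutation distribution.

```python
import numpy as np

EPOCH_LEN = 3600   # seconds per one-hour epoch (step 3)
N_ZONES = 24       # integer time zones assumed in the text


def per_second_variance(raw):
    """Step 2: raw has shape (n_seconds, n_eggs); one variance per second."""
    return np.var(raw, axis=1, ddof=1)


def epoch_matrix(variances, n_zones=N_ZONES, epoch_len=EPOCH_LEN):
    """Step 3: consecutive one-hour blocks become columns, so each row is
    one second-offset within the epoch and the middle of the rows is midnight."""
    return variances[: n_zones * epoch_len].reshape(n_zones, epoch_len).T


def row_zscores(matrix):
    """Step 4: z-score each row using that row's own mean and sd."""
    mean = matrix.mean(axis=1, keepdims=True)
    sd = matrix.std(axis=1, ddof=1, keepdims=True)
    return (matrix - mean) / sd


def smoothed_abs_dev(z, window=300):
    """Steps 5-6: average absolute deviation per row, then a sliding mean
    (300 seconds = 5 minutes)."""
    dev = np.abs(z - z.mean(axis=1, keepdims=True)).mean(axis=1)
    return np.convolve(dev, np.ones(window) / window, mode="valid")


def min_stats(curve):
    """Minimum value of the curve and its distance (in seconds) from the
    midpoint of the epoch."""
    i = int(np.argmin(curve))
    return curve[i], abs(i - (len(curve) - 1) / 2)


def permutation_test(z, n_perm=2000, window=300, seed=0):
    """Step 7 (one possible reading): permute row order and column order,
    recompute both minimum statistics, standardize the observed values
    against the permutation distribution, and combine by Stouffer's method."""
    rng = np.random.default_rng(seed)
    obs = np.array(min_stats(smoothed_abs_dev(z, window)))
    null = np.empty((n_perm, 2))
    for k in range(n_perm):
        rows = rng.permutation(z.shape[0])
        cols = rng.permutation(z.shape[1])
        null[k] = min_stats(smoothed_abs_dev(z[rows][:, cols], window))
    z_scores = (obs - null.mean(axis=0)) / null.std(axis=0, ddof=1)
    stouffer = z_scores.sum() / np.sqrt(len(z_scores))
    return z_scores, stouffer
```

Under this reading, permuting the column order leaves the step 5 statistic unchanged, since each row's average absolute deviation does not depend on column order; only the row permutation changes the smoothed curve. Other readings of "scramble" (for example, shuffling values within rows) would give a different null distribution.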
Note: Early on I found that you could get essentially this same result by applying -kurtosis to each row of variances. But some reviewers objected that a 4th order statistic was so far removed from our usual ways of thinking about RNG-type data that despite the strong statistical significance, it was probably meaningless.
I understood the objections, but couldn't quite believe that such a strong statistical result that appeared precisely where it was predicted to appear was meaningless. So I thought a bit about the equation that defines kurtosis, and then I decided that a 2nd order statistic *should* be sufficient to find the distributional changes I was looking for. So I spent a few minutes deconstructing the mathematics of kurtosis and came up with the method described above.
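For comparison, the earlier kurtosis-based variant mentioned in the note might look like the sketch below. It assumes "kurtosis" means excess kurtosis computed from population moments (fourth central moment over squared variance, minus 3); the note does not say which definition was used, and the function name is hypothetical.

```python
import numpy as np


def neg_kurtosis_rows(matrix):
    """Negative excess kurtosis of each row of the epoch matrix of variances.
    Assumption: population (biased) moments, excess kurtosis = m4 / m2**2 - 3."""
    centered = matrix - matrix.mean(axis=1, keepdims=True)
    m2 = (centered ** 2).mean(axis=1)   # 2nd central moment per row
    m4 = (centered ** 4).mean(axis=1)   # 4th central moment per row
    return -(m4 / m2 ** 2 - 3.0)
```

Applied to the 3600 x 24 matrix from step 3, this yields one value per second of the epoch, which could then be smoothed and tested in the same way as the average absolute deviation in steps 6 and 7.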
One way to interpret these results is that the distribution of raw values generated by the eggs did in fact become constrained, and significantly so with p < .0005, within a few seconds of Y2K. Another interpretation is that I'm a good data analyst and I can find virtually any result I want given a rich enough dataset and enough time to explore it. I don't know whether other analyses using this same general method with 38 time zones and using the two eggs running off-the-hour will find the same results.
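As an arithmetic check on the figures quoted above, the Stouffer combination and the corresponding one-tailed normal probability can be recomputed with the Python standard library. The function names are hypothetical, and the sketch assumes the two z scores are combined with equal weights and that p is the lower-tail probability.

```python
import math


def stouffer(zs):
    """Equal-weight Stouffer combination: sum of z scores over sqrt(count)."""
    return sum(zs) / math.sqrt(len(zs))


def lower_tail_p(z):
    """P(Z <= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


combined = stouffer([-3.317, -1.861])   # the two z scores reported in step 7
p_value = lower_tail_p(combined)
```

With the rounded inputs from step 7, the combined z comes out at about -3.66, in line with the reported -3.662 up to rounding, and the lower-tail probability is on the order of 1e-4, consistent with the stated p < .0005.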