| Analysis, Chisquare Calculation | ||
The primary focus for GCP analyses is anomalous shifts of the mean during periods of time specified in formal predictions. The standard test of such departures from expectation compares the Chisquare of the composite deviation across all eggs during a specified event against theoretical expectation. This composite Chisquare is based on the "Stouffer Z", a normalized sum of the Z-scores for all predefined segments (see below). The segments may be defined either as the whole period of the prediction, or as sub-segments of it (e.g., seconds or 15-minute blocks).

A prediction specifies a moment or a period of time during which a deviation is expected in the data, corresponding to a global event. This provides most of the information needed for analysis, and leads to the algorithm for processing the data and calculating the statistics that may provide evidence for the hypothesis. The following is a description of the stable, standard procedures as of early 2000.

The exact algorithmic procedures for the analysis must be specified as part of the prediction, before the data are examined. This is done most often by indicating that the "standard analysis" will be used. This and other defined analyses that we have used over the course of the experiment are detailed in recipes that, if followed, will duplicate the original GCP analysis. (In some cases, extra data will have been accumulated from dial and drop eggs.)

The earliest analyses used a summation of Z² across eggs, a different algorithm that is now used only when explicitly pre-specified (or in contextual explorations). This page continued to describe that superseded procedure, which simply measures the variance among the eggs, until October 2001, when the outdated description was noted.
The standard or default analysis for the record is based on the composite, signed meanshift across eggs (the Stouffer Z), which properly represents an underlying hypothesis that the behavior of the eggs will tend to be correlated if there is a "global consciousness" effect. The Stouffer Z is defined as Zs = Sum(Zi)/Sqrt(N), where N is the number of Z-scores in the set. In words, the Stouffer Z is the algebraic sum of the individual Z-scores in a set, divided by the square root of their number. York Dobyns has provided a rigorous description of the relationship of the Stouffer Z based measure and the variance measure.

We begin with the assumption that the eggs are synchronized (even though this isn't 100% true). We calculate the mean, variance, and Z across eggs for each second, properly treating missing values. This yields a single time-series representing the composite egg behavior, which can then be used in various analytical explorations like those done for the Y2K event.

For short periods, we do not need to block the data, but in some cases, given a pre-specified reason for doing so, such as creating a manageable dataset for 6 days' worth of seconds, we block the data by a standard unit, typically minutes. For some analyses, like the inter-egg correlations, it is always necessary to choose some blocking period. (Doug Mast also uses 1 minute, so that the correlations are calculated for 60 pairs of egg-trials.)
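
The Stouffer Z, the per-second composite, and the optional blocking step can be illustrated with a minimal Python sketch. It is not the GCP recipe itself. It assumes that each egg's trial has already been converted to a Z-score, that missing values are marked as NaN, and that the composite Chisquare is formed by summing the squared Stouffer Z values over segments, with one degree of freedom per segment. The function names (`stouffer_z`, `composite_series`, `block_series`, `chisquare`) are hypothetical and chosen only for illustration.

```python
import math

def stouffer_z(z_scores):
    """Sum of the Z-scores divided by the square root of their number.

    NaN entries (assumed here to mark missing values) are skipped, so N
    counts only the Z-scores actually present.
    """
    valid = [z for z in z_scores if not math.isnan(z)]
    if not valid:
        return math.nan
    return sum(valid) / math.sqrt(len(valid))

def composite_series(z_by_second):
    """One Stouffer Z per second; each row holds the per-egg Z-scores."""
    return [stouffer_z(row) for row in z_by_second]

def block_series(zs, block_size):
    """Combine consecutive Stouffer Z values into blocks of block_size.

    A trailing partial block is dropped (an assumption of this sketch).
    """
    n_full = len(zs) // block_size
    return [stouffer_z(zs[i * block_size:(i + 1) * block_size])
            for i in range(n_full)]

def chisquare(zs):
    """Sum of squared Z over segments, plus the degrees of freedom."""
    valid = [z for z in zs if not math.isnan(z)]
    return sum(z * z for z in valid), len(valid)

# Two seconds of data from four eggs; one egg is missing in the second row.
rows = [
    [1.0, -1.0, 2.0, 0.0],
    [0.5, math.nan, 0.5, 1.0],
]
series = composite_series(rows)
chi2, dof = chisquare(series)
print(series, chi2, dof)
```

In the first row the four Z-scores sum to 2.0 and are divided by Sqrt(4), giving a Stouffer Z of 1.0. In the second row only three eggs report, so the divisor is Sqrt(3). The resulting Chisquare would then be compared with the theoretical chi-square distribution on `dof` degrees of freedom.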
We still have much to learn, particularly about the dependence
of results on seemingly arbitrary factors: the order in which
composites are constructed, the size of blocks, etc. This means
we still need to balance the desirable features of specificity
and flexibility. There is nothing new here, but it is especially
notable because there are so many questions and so much
apparently relevant data.
The actual calculation for statistical tests involves a sequence of steps.
Control data are needed to establish the viability of the statistical
results from "active" data generated during events specified via the prediction
protocol. The control data are expected to produce chance results because by hypothesis no
engaging event is specified. The complex nature of the data in the Global Consciousness
Project and the situation-dependent nature of the predictions require specially designed
procedures for ensuring that the statistical characterizations of the data are valid.
The main components of statistical control
are quality-controlled equipment design, thorough device calibration, and
a procedure called resampling. In addition, a "clone" database of
algorithmic pseudo-random data is automatically generated, and
these data may be assessed as "control" data. The combined force
of these efforts ensures that the GCP data meet appropriate standards, and that the
"active" subsets subjected to hypothesis-testing are evaluated against chance
expectation as well as a large body of surrounding "control" and calibration data.
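
The text does not spell out the resampling procedure, so the following Python sketch shows only one generic way such a control comparison could be made. It assumes a series of composite Z-scores from non-event ("control") data, draws random segments of the same length as the event, computes the same Chisquare statistic on each, and reports how often the control value is at least as large as the event value. The function name `resample_pvalue` and its parameters are hypothetical.

```python
import random

def resample_pvalue(control_zs, event_stat, segment_len,
                    n_resamples=1000, seed=0):
    """Empirical p-value for event_stat against resampled control segments.

    control_zs  : composite Z-scores from non-event data (assumed input)
    event_stat  : Chisquare (sum of Z squared) observed for the event
    segment_len : number of segments in the event
    """
    if segment_len > len(control_zs):
        raise ValueError("control series is shorter than the event segment")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    max_start = len(control_zs) - segment_len
    count = 0
    for _ in range(n_resamples):
        start = rng.randint(0, max_start)  # inclusive on both ends
        segment = control_zs[start:start + segment_len]
        stat = sum(z * z for z in segment)
        if stat >= event_stat:
            count += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (count + 1) / (n_resamples + 1)
```

The same function could be applied to the pseudo-random "clone" data in place of `control_zs`. That is a design choice of this sketch, not a documented GCP step.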
See also Appendix, Nelson et al., FieldREG II.
|