The Prediction Registry describes events and specifies analytical
parameters to document the details of each prediction. However, since
most events are processed in the same way, the term "standard analysis"
(#1 below) is used.
This refers to a definite set of steps, presented in the procedures
section, so that they need not be repeated for each item in the
registry. For a small subset (about 5 to 10%, depending on the criteria
for standardization) some other analysis recipe is used. In these
cases, sufficient detail is usually given in the registry entry itself.
Most of these "other" analyses are what we call "device variance
analyses", and the algorithmic recipe for these can also be given in a
separate, general description (#3 below),
to which we can refer in the event descriptions. Here, with some
contextual discussion, are the major recipes.
As of this writing in late August 2002, we are considering whether to
change the default procedure from the correlated meanshift (standard)
analysis to the device variance analysis. There are good arguments for
doing so, including that it may be more sensitive. In any case, we
intend to apply both algorithms to all events where this is feasible,
in order to learn more about the question. For an interim period, we
will use the composite probability of the two measures as the formal
output probability. This will, in effect, give an average of the two
outcomes.
Beginning in mid-2002, Peter Bancel posed the question:
"Without any a prioris, how many different 'recipes' are in the
prediction registry, and how do results look in subgroups?"
"By 'recipe' I simply mean a precise procedure that will get me from the
raw data to a stated formal GCP df and Chisquare for each event. I want to
count how many of these recipes there are and count how many predictions
go with each recipe. And eventually ask if effect size changes with
group."
We begin to answer these questions here, by describing the analysis
algorithms. This is work in progress.
Recipe #1: The "Standard Analysis"
1. Specify one or more periods of time.
2. Get the raw trial counts for all N regs for this time. If more
than one time period is specified, concatenate for each reg.
3. Convert the N reg data sets to z-scores using mean = 100 and
var = 50.
4. Calc a Z for each second as Z = Sum[z]/Sqrt[N] (Stouffer Z).
5. Sum Z^2 over all seconds. Note: df = number of seconds.
6. Calc an equivalent chi^2 on 600 df, i.e., the chi^2 on 600 df that
has the same p-value.
7. The resulting chi^2 and df = 600 can be used to give a composite
result for different predictions.
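The steps of Recipe #1 can be sketched in Python as follows. This is a minimal illustration, not the GCP's own code: it assumes the raw counts for the chosen period are already loaded as one row per second, each row listing the counts from whichever regs reported in that second (so step 4 divides by that second's own N). Step 6 uses a stdlib-only chi-square tail function, built on a standard series/continued-fraction evaluation of the regularized incomplete gamma function, and finds the equivalent chi^2 on 600 df by bisection. All function names are hypothetical.

```python
import math

# Hypothetical helper names for illustration; not taken from any GCP codebase.
MEAN, VAR = 100.0, 50.0  # null mean and variance of one 200-bit trial count

def stouffer_chisq(counts_per_second):
    """Steps 3-5: one row of raw reg counts per second -> (sum of Z^2, df)."""
    chi2 = 0.0
    for row in counts_per_second:
        z = [(c - MEAN) / math.sqrt(VAR) for c in row]  # step 3: reg z-scores
        Z = sum(z) / math.sqrt(len(z))                  # step 4: Stouffer Z
        chi2 += Z * Z                                   # step 5: sum of Z^2
    return chi2, len(counts_per_second)                 # df = number of seconds

def _upper_gamma_q(a, x, eps=1e-14, max_iter=100000):
    """Regularized upper incomplete gamma function Q(a, x)."""
    if x <= 0.0:
        return 1.0
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:
        # Power series for the lower function P(a, x); then Q = 1 - P.
        term = total = 1.0 / a
        for n in range(1, max_iter):
            term *= x / (a + n)
            total += term
            if term < total * eps:
                break
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_prefactor) * h

def chi2_sf(x, df):
    """Upper-tail p-value of a chi-square value x on df degrees of freedom."""
    return _upper_gamma_q(df / 2.0, x / 2.0)

def equivalent_chi2(chi2, df, target_df=600):
    """Step 6: the chi^2 on target_df degrees of freedom with the same p-value."""
    p = chi2_sf(chi2, df)
    lo, hi = 0.0, float(target_df)
    while chi2_sf(hi, target_df) > p:  # widen until the root is bracketed
        hi *= 2.0
    for _ in range(200):               # bisection; chi2_sf decreases in x
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, target_df) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def composite(equiv_chi2s, target_df=600):
    """Step 7: pool per-event equivalent chi^2 values into one result."""
    total = sum(equiv_chi2s)
    df = target_df * len(equiv_chi2s)
    return total, df, chi2_sf(total, df)
```

A typical call chain for one event would be `equivalent_chi2(*stouffer_chisq(rows))`, with the per-event results then passed to `composite`. Mapping every event onto the same 600 df is what lets events of different lengths be summed on an equal footing.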
If there is a need to modify this recipe for a given prediction,
then it is a new recipe and the prediction goes into a new group.
A good, if very specialized, example is the formal prediction for
event 38, which requires an appropriate modification of Recipe #1 to
replicate the stated GCP result. The prediction for event 38 was
specified as two contiguous segments, predefined to show positive
and negative expected deviations, respectively. Duplicating the analysis
and the GCP "bottom line" result therefore requires an extra step in
Recipe #1.
Recipe #1.5 (Recipe #1 with one additional step, 3.5)
1. Specify one or more periods of time.
2. Get the raw trial counts for all N regs for this time. If more
than one time period is specified, concatenate for each reg.
3. Convert the N reg data sets to z-scores using mean = 100 and var = 50.
3.5 For each period with a negative-going prediction, invert the sign of
the z-scores.
4. Calc a Z for each second as Z = Sum[z]/Sqrt[N] (Stouffer Z).
5. Sum Z^2 over all seconds. Note: df = number of seconds.
6. Calc an equivalent chi^2 on 600 df, i.e., the chi^2 on 600 df that
has the same p-value.
7. The resulting chi^2 and df = 600 can be used to give a composite
result for different predictions.
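A sketch of Recipe #1.5 under the same assumptions (one row of raw counts per second, hypothetical function names) is given below; steps 6 and 7 are unchanged from Recipe #1 and are omitted. One point deserves to be explicit: because step 5 squares each per-second Z, inverting the sign of every z in a period leaves that period's contribution to the chi^2 unchanged. The inversion only matters when the signed Z values themselves are accumulated, so the sketch also returns a signed Stouffer combination over all seconds. That extra output is an illustrative assumption, not part of the stated recipe.

```python
import math

# Hypothetical function name; periods is a list of (rows, sign) pairs where
# rows holds one list of raw reg counts per second and sign is +1 for a
# positive-going prediction, -1 for a negative-going one.
def recipe_1_5(periods):
    chi2, df, z_total = 0.0, 0, 0.0
    for rows, sign in periods:
        for row in rows:
            # Steps 3 and 3.5: z-scores, sign-inverted for negative-going periods.
            z = [sign * (c - 100.0) / math.sqrt(50.0) for c in row]
            Z = sum(z) / math.sqrt(len(z))  # step 4: Stouffer Z for this second
            chi2 += Z * Z                   # step 5: Z^2 ignores the sign flip
            z_total += Z                    # assumed signed accumulation
            df += 1
    signed_z = z_total / math.sqrt(df) if df else 0.0
    return chi2, df, signed_z
```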
The next recipe is for predictions that
are listed as having "15-minute" resolution. Steps 1-3 are the same as
in #1. Although both recipes use the same basic measurable (a composite
signed Z-score across eggs; Stouffer Z), the 15-minute resolution events
may be considered a separate group because recipes #1 and #2 give
different results for the same identified event.
In general, blocking in this way changes the outcome in proportion to
the blocking size. Thus anyone wishing to reproduce the GCP results from
the data needs to know the exact recipe and which events go with which
recipe.
Recipe #2:
1. Specify one or more periods of time.
2. Get the raw trial counts for all N regs for this time. If more
than one period, concatenate for each reg.
3. Convert the N reg data sets to z-scores using mean = 100 and var = 50.
4. For each reg, form a z-score over each 15-minute (900-second) block
as W = Sum[z]/Sqrt[900].
5. Sum all W^2. Note: df = (number of 15-minute blocks) x (number of
regs).
6. Use Sum[W^2] and df in the composite result of all predictions.
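Recipe #2 can be sketched as below. The sketch assumes each reg's counts cover the specified period with no missing seconds, and that an incomplete final block is simply dropped; the recipe does not say how partial blocks are handled, so that choice is an assumption. The function name is hypothetical.

```python
import math

# Hypothetical function name. counts_by_reg holds one list of per-second raw
# trial counts per reg; block is the block length in seconds (15 min = 900).
def recipe_2(counts_by_reg, block=900):
    total, df = 0.0, 0
    for counts in counts_by_reg:
        z = [(c - 100.0) / math.sqrt(50.0) for c in counts]  # step 3
        n_blocks = len(z) // block  # assumption: drop an incomplete final block
        for b in range(n_blocks):
            W = sum(z[b * block:(b + 1) * block]) / math.sqrt(block)  # step 4
            total += W * W                                            # step 5
            df += 1  # one df per (block, reg) pair
    return total, df
```

With equal-length data for every reg, the returned df equals (number of 15-minute blocks) x (number of regs), as in step 5.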
Predictions for several events in the formal database were specified in
a fundamentally different manner, one that examines the variability
among the individual egg scores. One way to express this is as the sum
across eggs of their squared z-scores. The result is a chi-square
distributed quantity that can be combined over the period of interest
much as in Recipe #1. The formal analyses use a direct computation of
the variance among the eggs (which is essentially the same measure).
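Why the squared z-score sum and the variance among eggs are essentially the same measure can be seen from a standard decomposition. For a single second, write x_i for the trial count of egg i, x̄ for the mean of the N counts, and s^2 for their sample variance (notation introduced here for illustration), with z_i = (x_i - 100)/Sqrt[50] and Z the Stouffer Z of Recipe #1:

```latex
\sum_{i=1}^{N} z_i^2
  = \frac{1}{50}\sum_{i=1}^{N} (x_i - 100)^2
  = \frac{(N-1)\,s^2}{50} + \frac{N(\bar{x} - 100)^2}{50},
\qquad
Z^2 = \left(\frac{1}{\sqrt{N}}\sum_{i=1}^{N} z_i\right)^2 = \frac{N(\bar{x} - 100)^2}{50}.
```

The first term is the among-egg variance scaled by its expectation of 50, and the second is exactly the squared Stouffer Z, so the squared z-score sum (N df) splits into the variance measure (N - 1 df) plus the Recipe #1 term (1 df).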
Recipe #3: The "Device Variance Analysis"
1. Specify one or more periods of time.
2. Get the raw trial counts for all N regs for this time. If more
than one time period is specified, concatenate for each reg.
3. Calc the variance (Var) of the egg trial counts for each second.
4. Cumulate the deviation of the variance from expectation, as
Cumsum(Var - 50), or as Cumsum(Var - eVar), where eVar is the empirical
expectation for Var, calculated as Mean(Var).
5. Extract contiguous data for the full day surrounding the period in
step 1.
6. Compute the probability of the maximum absolute deviation during the
period, based on 10,000 random permutations of the contiguous data.
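A sketch of the device variance analysis follows, under several assumptions that the recipe leaves open: Var is the sample (n - 1) variance of the egg counts in each second, whose null expectation is 50; eVar, when used, is the mean of Var over the full day; and each permutation shuffles the day's per-second Var values and recomputes the maximum absolute cumulative deviation over the same window as the event. The p-value is the fraction of permutations whose maximum reaches the observed one. Function names are hypothetical.

```python
import random
import statistics

# Hypothetical function names; day_rows holds one list of egg trial counts
# per second for the full day (step 5), and [start, end) marks the event.
def per_second_variance(rows):
    """Step 3: sample variance across eggs for each second."""
    return [statistics.variance(row) for row in rows]

def max_abs_cumdev(var_series, expected):
    """Step 4: maximum of |Cumsum(Var - expected)| over the series."""
    s, best = 0.0, 0.0
    for v in var_series:
        s += v - expected
        best = max(best, abs(s))
    return best

def device_variance_p(day_rows, start, end, n_perm=10000, empirical=False, seed=None):
    day_var = per_second_variance(day_rows)
    # Expectation: theoretical 50, or assumed full-day Mean(Var) as eVar.
    expected = statistics.fmean(day_var) if empirical else 50.0
    observed = max_abs_cumdev(day_var[start:end], expected)
    rng = random.Random(seed)
    shuffled = list(day_var)
    hits = 0
    for _ in range(n_perm):  # step 6: permutation distribution of the maximum
        rng.shuffle(shuffled)
        if max_abs_cumdev(shuffled[start:end], expected) >= observed:
            hits += 1
    return observed, hits / n_perm
```

For a full day of 86,400 seconds with 10,000 permutations this pure-Python loop is slow. The sketch favors transparency over speed.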