Comment 050108
Things are starting to look very interesting indeed.
I have been working on two fronts.
The big news is that I have looked at the alternate-second data, and there
is a very clear correlation between the Stouffer z-squared cumdev [and its
Janus twin, the network variance] for two datasets built from alternating
seconds of data. That is, all odd seconds go to one set and all even
seconds to the other. A strong correlation is exactly what one expects if
an anomalous effect is responsible for the structure in the data. [The
significance depends on the numbers, and I'm calculating empirical
distributions for my correlation function, so it will take a day or two to
have more precision.]
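For concreteness, the alternate-second split can be sketched as follows. This is a minimal illustration, assuming the data has already been reduced to one netvar value per second; the function names and toy data are mine, not the project's code:

```python
import numpy as np

def split_alternate_seconds(netvar):
    """Send even-indexed seconds to one set, odd-indexed to the other."""
    even, odd = netvar[0::2], netvar[1::2]
    n = min(len(even), len(odd))     # equalize lengths
    return even[:n], odd[:n]

def cumdev(x):
    """Cumulative deviation of the statistic from its null expectation."""
    return np.cumsum(x - 1.0)        # netvar has expectation 1 under the null

# Toy example with pure noise (a squared standard normal has mean 1):
rng = np.random.default_rng(0)
fake_netvar = rng.normal(size=100000) ** 2
a, b = split_alternate_seconds(fake_netvar)
r = np.corrcoef(cumdev(a), cumdev(b))[0, 1]
```

Note that a raw Pearson r between two cumdev traces has a very broad null distribution (independent random walks correlate spuriously), which is exactly why the empirical distributions mentioned above are needed before quoting a significance.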
But it's better than that.
The correlation only comes through for the post-9/11 data.
And it is coming from the same structure that we see correlating with the
poll results. So this alternate-second result is independent of the poll
result and also supportive of it (or vice versa). Anyway, it is starting
to make a very nice story, since we now have:
1. event experiment significant for an effect on short timescales and
connecting to global scales.
2. alternate datasets showing significance on long timescales.
3. poll results connecting long time behavior to global events.
That's the short of it.
The attached html doc has some more details and some more are to come.
The technical point of importance is that I've devised a correlation test
capable of detecting correlations in structure [as opposed to the usual
correlation coefficients, which are too limited and miss detailed
structure like peaks]. This is sketched in the doc.
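The details are in the doc; purely as my own reconstruction of the general idea (not the actual test), one can correlate local structure, e.g., windowed slopes, rather than raw values, and calibrate significance with an empirical null built from circular shifts, which preserve each series' autocorrelation:

```python
import numpy as np

def structure_corr(a, b, window=50):
    """Correlate smoothed local slopes, so shared peaks and kinks
    register even when the overall levels of a and b differ."""
    kernel = np.ones(window) / window
    sa = np.convolve(np.diff(a), kernel, mode="valid")
    sb = np.convolve(np.diff(b), kernel, mode="valid")
    return np.corrcoef(sa, sb)[0, 1]

def empirical_pval(a, b, n_shifts=1000, seed=0):
    """One-tailed empirical p-value from random circular shifts of a."""
    rng = np.random.default_rng(seed)
    observed = structure_corr(a, b)
    null = np.array([
        structure_corr(np.roll(a, rng.integers(1, len(a))), b)
        for _ in range(n_shifts)
    ])
    return observed, (null >= observed).mean()
```

The circular-shift null is one standard way to get the empirical distributions mentioned in the previous memo; block permutations would be an alternative.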
The other front is the poll results. I'll use my structure correlation
test to look at the data/poll correlation quantitatively. But another very
interesting avenue is to study the correspondence by making a mapping from
one to the other. I fiddled around a bit and it looks possible, but it
will take some work. It would be a nice demo to show how you can generate
the structure in the GCP data by a simple transformation of the poll data.
I think we have some red meat.
At this point we should reconsider the late April date for a meeting. This
could be the good moment to do it, and we could still make it if we move
fast. Read over the doc and let me know what you think.
Comment 050109
Here's a little update.
I should have some p-value envelopes done a little later in the day, but
at the moment it indeed looks like a z-score of 3 for the correlation on
post-9/11 alt-sec netvar sets. The z-score for the pre-9/11 data will be
small, probably less than 0.4. In terms of p-values, the post-9/11
correlation is near 0.002.
I have also looked at the device variance for the same correlation. There
is nothing there. That's very interesting indeed, because it helps us in
our quest to find "the right statistic". It's looking more and more like
the Stouffer z-squared, aka the netvar, is a good one. This ties in nicely
with the significant result of the event-based analyses, which are mostly
standard analyses, aka netvar. Interestingly, our official NYear variance
events measure the device variance, and we don't see any effect there.
This could be corroborating evidence.
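To keep the two statistics straight, here is how I understand them, sketched under my own normalization assumptions, with z a seconds-by-devices matrix of REG z-scores:

```python
import numpy as np

def netvar(z):
    """Stouffer z-squared per second: the square of the network Stouffer Z.
    Sensitive to coherence across devices; expectation 1 under the null."""
    n_devices = z.shape[1]
    return (z.sum(axis=1) / np.sqrt(n_devices)) ** 2

def devvar(z):
    """Device variance per second: mean squared individual device z-score.
    Sensitive to per-device spread; also expectation 1 under the null."""
    return (z ** 2).mean(axis=1)
```

A small correlated shift common to all devices moves the netvar much more than the devvar, which may be why the alt-sec correlation shows up in one statistic and not the other.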
So these are some things we can look at with regard to the alt-sec
results. I will send you an updated memo with these new results by the end
of the day.
When do you want to talk about 'what's next?'
Comment 050110
Here are two revised plots with envelopes updated (tho' I'm still
calculating...).
Netvarcorr.gif should replace index_gr_5 (give it that name if you want it
to load in the memo htm page).
NetvsDevcorr.gif replaces index_gr_9.
The basic result is a p-value of better than 0.003 for the alt-sec
post-9/11 correlation.
The Z-score for that is 2.8.
The big picture that is emerging is this:
We show correlation between GCP data and a societal metric AND we show
independently that the GCP data has non-random structure (via the alt-sec
analysis).
The icing on the cake is that we can unpack the alt-sec analysis to show
that the data features that correlate on alternate seconds are precisely
the ones that correlate with the poll. So it's a check-and-mate situation.
I think we can show that this hangs together at the 0.001 p-value level.
The big things we are learning are:
1. We can test to see if data trends are non-random (incredible!).
2. We can determine what stats capture the effect. For instance, is it the
netvar or the devvar? (this was the goal of the event-based analysis).
What we are simplifying away for the moment is the possibility that the
effect has several aspects (it could be global consc + experimenter, after
all...)
So where do we go from here?
First, we should talk. Can we do it today?
We need to move quickly if we want to try for a Spring meeting.
Also, I need to make some commitments for the next 6-9 months in the next
few days.
[I delayed decisions when I first saw the poll correlation.]
What we decide to do for the project affects my choices.
Here's what I'd like.
1. Have a meeting in late April
If we want a meeting we should send the analysis memo to Dean
(and Marylin?) asap to get them excited and fix the Ions date.
If they're ok, send emails to principals and nail it down.
2. Write a paper for FoP
A big lesson I learned (the hard way) during my thesis and
later during the post-doc at IBM was when one should cut the
work and sit down and write a Letter. My gut is telling me
big-time this is a cut-and-write case. I'm pretty sure we can
get a Letter published. This is also an excellent preparation
for the April meeting. It will also help loads for funding
requests, so best to get it in the pipeline now.
3. Find some money so I can put time into the analysis.
Eternal problem but I'm a pumpkin without some revenue.
Some immediate next things to do:
1. Calculate the correlation of netvar and presidential poll data.
2. See if the correlation for alt-secs works on shorter timescales:
look at 9/11.
If this is so, we have independent evidence that the strong 3-day
deviation after 9/11 was not merely an extraordinary chance
fluctuation. That would be a substantial result.
3. Check the correlation for alternate minutes of data, instead of
alternate seconds. [This will be a nail in the coffin for "inherent
electronic autocorrelations in the devices"-type arguments against
anomalous interpretations. Actually, there is a good story with
several parts to destroy those objections.]
These 3 are all quick to do.
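Point 3 is just a generalization of the alternate-second split to longer blocks; a minimal sketch, where the block length in samples is a free parameter (1 reproduces alternate seconds, 60 gives alternate minutes):

```python
import numpy as np

def split_alternate_blocks(x, block=60):
    """Assign alternating blocks of `block` consecutive samples to two
    sets. If a device-level electronic autocorrelation only spans a few
    seconds, any alt-sec correlation it fakes should vanish at minute
    blocking, while a genuine long-timescale effect should survive."""
    n = (len(x) // (2 * block)) * 2 * block   # drop the ragged tail
    rows = x[:n].reshape(-1, block)
    return rows[0::2].ravel(), rows[1::2].ravel()
```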
There are important and obvious further tracks to take. But most of
them could potentially get bogged down and take considerable time to
get right.
One priority direction is to look for another metric like the poll data.
Another is to look for a better stat than the netvar.
[Actually, I suspect that a measure of the average REG pair-correlation
is the underlying statistic. This is a major component of the netvar...]
But I think we should focus on a draft paper for the end of February.
Comment 050110.2
It's quite amazing how things are coming together.
Here is another piece of the puzzle.
Attached is a plot from the early blocking page I did.
Look at the upper right plot.
The BLACK trace is the event analysis cumdev for the 1-sec standard
analysis (aka netvar).
It's a bit hard to see, but it shows the cumulative result for all
accepted events using the standard recipe.
The big downward kink between 85-125 corresponds exactly to the big drop
in the overall netvar from Jan 2002 - Oct 2003.
[Detail: the plot is a little different than the usual z-score cumdev. It
plots the terminal value of the aggregate result at each point.]
So all those events were affected by that persistent year(s)-long trend.
If we want to estimate post hoc what the real-time effect of the events
is, we need to take out the long-timescale background trend. So clearly
there is a stronger event effect happening than we get from the formal
analysis result.
And, by the way, it appears that the decrease effect goes away at larger
blockings, just like the overall event effect. More evidence that the
1-sec netvar is our best stat.