When Station Data Goes Missing
February 9, 2010
Yesterday I mentioned that when stations that were in the Baseline drop out at a later time, the trend line changes, since you are no longer comparing recent data to the same data that makes up the Baseline. This in turn will also give you “false” anomalies on a gridded map.
What got this all started was a post by EM Smith on his site about a “Dying of Thermometers” ( http://chiefio.wordpress.com/2009/11/03/ghcn-the-global-analysis/ ) that started in 1990 and progressed through 2005. The Global Historical Climatology Network (GHCN) dataset, which NASA GISS uses as its base dataset, has a lot of stations in the base period of 1951-80 that no longer show up in the later years, but they are still used to compute the “average” that is subtracted from the newer readings. Even more, some of the newest data records don’t reach back to the Baseline period at all and so are not used in making the “average” temperature. The deciding factor for GISS is that a record has to be at least 20 years long to be usable, so if a station was set up in 1984 and reported data through 2003, GISS could use that record in its work. What EM Smith postulated was that because those stations are lost in the later years but are still counted in the average, the trend is not representative of the data.
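To make the two cases above concrete, here is a minimal sketch of the bookkeeping. The function names and the two example stations are made up for illustration, and the 20-year threshold and 1951-80 base period are simply taken from the description above; I have not checked them against the GISS code itself.

```python
# Hypothetical helpers; the 20-year threshold and the 1951-80 base period
# come from the description in this post, not from the GISS source code.

def record_length(first_year, last_year):
    """Number of calendar years covered, counting both end years."""
    return last_year - first_year + 1

def is_usable(first_year, last_year, min_years=20):
    """True if the record is long enough under the assumed 20-year rule."""
    return record_length(first_year, last_year) >= min_years

def overlaps_baseline(first_year, last_year, base=(1951, 1980)):
    """True if any part of the record falls inside the base period."""
    return first_year <= base[1] and last_year >= base[0]

# Two made-up stations: one that starts after the base period,
# and one that sits in the base period but stops reporting in 1992.
late_starter = (1984, 2003)
dropout = (1940, 1992)

for name, (first, last) in [("late_starter", late_starter), ("dropout", dropout)]:
    print(name, is_usable(first, last), overlaps_baseline(first, last))
```

The late starter is long enough to be used but contributes nothing to the Baseline average, while the dropout contributes to the Baseline average but has no recent readings to compare against it.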
So I decided to do a simple test of this, and since I had already gone through the New Zealand (NZ) record I decided to use those stations.
Let's start the test this way. We will take the 12 stations in the GHCN adjusted data and get a trend for the 1894 to 2007 record with a Baseline period of 1961-90. Then we will take out of the Baseline every station that does not have at least a 50% complete data record for the years 2000 to 2007. This brings us down to 4 stations and gets us Figure 1.
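The steps just described can be sketched roughly as follows. This is not the spreadsheet or code behind Figure 1 and it does not use the NZ data: the 12 stations here are synthetic straight lines I made up (4 that warm faster and report through 2007, 8 that warm slower and stop after 2001), and all the function names are hypothetical. It only shows the mechanics of the 50% completeness filter and the before/after trend comparison.

```python
# Sketch on synthetic data; NOT the GHCN New Zealand records.

def mean(values):
    """Average of the values that are present; None if nothing is present."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

def slope(years, values):
    """Ordinary least-squares slope, skipping years with no value."""
    pts = [(y, v) for y, v in zip(years, values) if v is not None]
    mx = sum(y for y, _ in pts) / len(pts)
    my = sum(v for _, v in pts) / len(pts)
    num = sum((y - mx) * (v - my) for y, v in pts)
    den = sum((y - mx) ** 2 for y, _ in pts)
    return num / den

def completeness(record, start, end):
    """Fraction of years in start..end (inclusive) that have a reading."""
    span = range(start, end + 1)
    return sum(1 for y in span if y in record) / len(span)

def anomalies(record, base_start=1961, base_end=1990):
    """Readings minus the station's own mean over the Baseline years."""
    base = mean([record.get(y) for y in range(base_start, base_end + 1)])
    if base is None:
        return {}
    return {y: t - base for y, t in record.items()}

def regional_trend(stations, years):
    """Average the station anomalies year by year, then fit a line."""
    anoms = [anomalies(rec) for rec in stations.values()]
    series = [mean([a.get(y) for a in anoms]) for y in years]
    return slope(years, series)

YEARS = list(range(1950, 2008))

def make_station(rate, last_year):
    """Made-up station: linear warming at `rate` C/year until it stops."""
    return {y: 10.0 + rate * (y - 1950) for y in YEARS if y <= last_year}

stations = {}
for i in range(12):
    if i < 4:
        stations[f"S{i}"] = make_station(0.02, 2007)   # keeps reporting
    else:
        stations[f"S{i}"] = make_station(0.005, 2001)  # drops out early

# Keep only stations at least 50% complete for 2000-2007.
kept = {n: r for n, r in stations.items() if completeness(r, 2000, 2007) >= 0.5}

trend_all = regional_trend(stations, YEARS)
trend_kept = regional_trend(kept, YEARS)
print(len(kept), "stations kept")
print("trend, all 12:", round(trend_all, 4), "C/year")
print("trend, kept only:", round(trend_kept, 4), "C/year")
```

Because I chose the dropouts to be the slower-warming stations, the filtered trend comes out higher than the all-station trend here. Flip the rates and it goes the other way.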
What we find is that by cutting the number of stations to only the ones that have enough data in both the 1961-90 baseline period and 2000-2007, the trend shifts. Yes, it’s a small change, and it shows that the other 8 stations had a “cooling” effect on the trend. Would this apply to the entire GHCN dataset? I do not know at this time. It could all average out, or it could amplify as the areas of lost spatial coverage increase. Remember that over half of the African continent and most of Canada go “missing” starting in 1990, as shown by the 2008 NOAA non-infilled anomaly map in Figure 2.
Also, just because the example in Figure 1 shows more warming after removing stations does not mean it always works that way; it could go in the opposite direction. To show this I did another little test. I cut the timescale of the 12 stations down to just the 1961-90 Baseline. This is our control trend, since it has all 12 stations in it. Then I checked the trend for each individual station and made two more groups: the four stations with the weakest warming trend, which I call the Cool 4, and the four with the strongest, which I call the Warm 4. Now we simulate what happens if I lose the 1981-90 data for the Cool 4, and separately for the Warm 4, and see how each affects the trend line in Figure 3.
The control trend is roughly 0.4 C. When I lose the data from the 4 stations with the strongest warming trend, the trend drops to roughly 0.3 C, and when I lose the 4 stations with the weakest warming trend, the trend rises to almost 0.5 C.
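The same kind of experiment can be mocked up in a few lines. Again, this is a sketch under my own assumptions and not the calculation behind Figure 3: the 12 warming rates are invented, each station is a straight line measured as a departure from its own 1961 value (a shortcut in place of a proper baseline anomaly), and the names are hypothetical. The numbers it prints will not match the NZ results; only the direction of the shifts is the point.

```python
# Sketch on invented warming rates; NOT the NZ station data.

YEARS = list(range(1961, 1991))
RATES = [0.005 + 0.002 * i for i in range(12)]  # made-up, C per year

def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def mean_series(rates, lost=(), lost_from=1981):
    """Yearly average departure; stations in `lost` vanish from `lost_from` on."""
    series = []
    for y in YEARS:
        active = [r for i, r in enumerate(rates)
                  if not (i in lost and y >= lost_from)]
        series.append(sum(r * (y - YEARS[0]) for r in active) / len(active))
    return series

# Rank stations by their individual trend to pick the two groups.
order = sorted(range(len(RATES)), key=lambda i: RATES[i])
cool4, warm4 = order[:4], order[-4:]

trend_control = slope(YEARS, mean_series(RATES))
trend_without_warm = slope(YEARS, mean_series(RATES, lost=warm4))
trend_without_cool = slope(YEARS, mean_series(RATES, lost=cool4))

print("control:", round(trend_control, 4))
print("Warm 4 lost after 1980:", round(trend_without_warm, 4))
print("Cool 4 lost after 1980:", round(trend_without_cool, 4))
```

Losing the Warm 4 pulls the fitted trend below the control and losing the Cool 4 pushes it above, which is the same pattern as in the real test.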
In conclusion, I believe I have shown that when stations that were in the Baseline are lost from the later years' data, it can have an impact on the trend line. How much and in what direction needs to be checked, but it cannot just be waved away as unimportant: as shown here, when we “lost” 1/3 of the data from just a third of the stations (4 of 12), it had an effect on the trend line of +/- 0.1 C.