Before Using Temperature Data Read The Fine Print

Whoever thought you would have to treat temperature datasets and graphs like the credit card offers that come in the mail? You know the kind: a really low introductory rate, but the fine print on the back of the third page tells you that by accepting the terms you are locked into three years of 21% interest after the first six months. Well, you do, and here is an example that started with one of Steve Goddard’s posts over on WUWT.

http://wattsupwiththat.com/2010/02/16/global-warming-in-texas/     

Here is a reproduction of one of the graphs used in his piece:

Figure 1

Now where did that graph come from? Did Steve make it himself?

The answer is that it came from the US Government; Steve did not make that graph, nor any of the other graphs in his piece. They all came from the National Climatic Data Center’s (NCDC) United States Historical Climatology Network (USHCN) website.

You see, NCDC provides a nice simple interface where you can get monthly station data for any station in the USHCN v2 dataset in a convenient Comma Separated Values (CSV) file. This is similar to the setup that NASA GISS uses; however, unlike GISS, where selecting a station shows you a graph first and then lets you get the data used to make it, the USHCN site keeps those functions separate.

You see, you don’t even need to get the data; you can just plot the graph on the USHCN site and copy it, and that is where you needed to read the fine print. Part of the data plotted in Figure 1 is not real. That’s right: the data from 2003 through 2008 is not data taken from a thermometer at the Temple, TX station.

How do I know this?  

Well, let’s walk through the steps I took to get here, and I’ll provide the links so you can follow in my footsteps.

A reader of the post on WUWT (John Slayton) had a question about that plot, because the station in Temple, TX closed in 2003. He knew this because he took part in the Surface Stations project and went looking for it at the water treatment plant. See the photos and survey here:

http://gallery.surfacestations.org/main.php?g2_itemId=7241

So John was really curious where the data in the graph came from, since there is no station there anymore. He asked in the WUWT thread, but his question probably got lost in the chatter, so he went over to Chiefio’s site and asked if he could find out whether there was “infilling” occurring.

http://chiefio.wordpress.com/2010/02/15/thermometer-zombie-walk/

Now this is where I got involved. Chiefio has had a lot on his plate recently (see John Coleman and KUSI), and I have done a certain amount of research on the different databases. John asked about this and Chiefio answered in the context of GISTemp, but I knew that the info Steve used didn’t come from GISS, so I started looking through the USHCN site.

First I went to the USHCN site which is here:   

http://cdiac.ornl.gov/epubs/ndp/ushcn/access.html

There you notice that you can download the entire dataset from the FTP server, or you can get individual station data from the web interface. I selected the web interface, which takes us to a page with a Google Map and a drop-down menu:

http://cdiac.ornl.gov/epubs/ndp/ushcn/ushcn_map_interface.html

Now, from there, select Texas from the drop-down menu and click the Map Site button. This gives you a list of stations with the USHCN station ID. Scroll down the list until you get to Temple, Texas, ID # 418910. Click on it and a quote bubble appears back up on the map. In that bubble you get some options, such as Get Monthly or Daily Data and Monthly and Daily Documentation.

TEMPLE, TX (418910)
Latitude 31.0781, Longitude -97.3183
Elevation 193.5 Meters
Get Monthly Data, Monthly Documentation
Get Daily Data, Daily Documentation     

From there, when you select the monthly data, it takes you to a screen where you can get the CSV file or just plot the graph. What Steve did was plot the graph and copy it:

http://cdiac.ornl.gov/cgi-bin/broker?_PROGRAM=prog.climsite_monthly.sas&_SERVICE=default&id=418910

Now, from there I also got the CSV file for that station, and in that file there was data up to 2008:

http://cdiac.esd.ornl.gov/sasserv/TX418910_7471.csv    

So I thought I would just read the documentation, expecting a simple explanation that a new thermometer was used, along with the ID number of the station used to extend the record.

So the first stop is the station list for USHCN v2, which I happen to have on my computer (I have the whole USHCN dataset stored), and this is what it says:

418910 31.0781 -97.3183 210.0 TX TEMPLE —— —— —— +6

Now what does that mean? We go to the documentation page:   

http://cdiac.ornl.gov/epubs/ndp/ushcn/monthly_doc.html

and from there we get this:   

STATION INFORMATION

The format of each record in the USHCN station inventory file (ushcn-stations.txt) is as follows.

Variable   Columns   Type
COOP ID   1-6   Character
LATITUDE   8-15   Real
LONGITUDE   17-25   Real
ELEVATION   27-32   Real
STATE   34-35   Character
NAME   37-66   Character
COMPONENT 1   68-73   Character
COMPONENT 2   75-80   Character
COMPONENT 3   82-87   Character
UTC OFFSET   89-90   Integer

These variables have the following definitions: 

COOP ID   is the U.S. Cooperative Observer Network station identification code. Note that the first two digits in the Coop ID correspond to the assigned state number (see Table 1 below).
     
LATITUDE   is latitude of the station (in decimal degrees).
     
LONGITUDE   is the longitude of the station (in decimal degrees).
     
ELEVATION   is the elevation of the station (in meters, missing = -999.9).
     
STATE   is the U.S. postal code for the state.
     
NAME   is the name of the station location.
     
COMPONENT 1   is the Coop Id for the first station (in chronologic order) whose records were joined with those of the HCN site to form a longer time series. “——” indicates “not applicable”.
     
COMPONENT 2   is the Coop Id for the second station (if applicable) whose records were joined with those of the HCN site to form a longer time series.
     
COMPONENT 3   is the Coop Id for the third station (if applicable) whose records were joined with those of the HCN site to form a longer time series.
     
UTC OFFSET   is the time difference between Coordinated Universal Time (UTC) and local standard time at the station (i.e., the number of hours that must be added to local standard time to match UTC).
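To make that layout concrete, here is a minimal sketch in Python that slices one station-inventory record using the column positions from the table above. Two assumptions on my part: the empty component fields render as long dashes in the blog, but I assume the actual fixed-width file stores them as runs of hyphens (so I use "------" below), and the sample record is reconstructed to the documented column widths because pasting it into a post loses the fixed spacing.

```python
# Field positions (1-indexed, inclusive) from the USHCN monthly documentation.
FIELDS = {
    "coop_id":    (1, 6),
    "latitude":   (8, 15),
    "longitude":  (17, 25),
    "elevation":  (27, 32),
    "state":      (34, 35),
    "name":       (37, 66),
    "component1": (68, 73),
    "component2": (75, 80),
    "component3": (82, 87),
    "utc_offset": (89, 90),
}

def parse_station(line):
    """Slice one ushcn-stations.txt record into a dict of stripped strings."""
    return {key: line[start - 1:end].strip() for key, (start, end) in FIELDS.items()}

# Reconstruct the Temple, TX record to the documented column layout.
# "------" is my assumption for how the file marks "not applicable".
record = "%-6s %-8s %-9s %-6s %-2s %-30s %-6s %-6s %-6s %2s" % (
    "418910", "31.0781", "-97.3183", "210.0", "TX", "TEMPLE",
    "------", "------", "------", "+6")

info = parse_station(record)
print(info["coop_id"], info["name"], info["component1"])
# -> 418910 TEMPLE ------
```

The point of the exercise: all three component fields come back empty, meaning no other station's records were officially joined to extend Temple's series.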

OK, according to that, there is no other station joined to extend the record. That leaves a big question mark. According to the station list, there is a station with ID # 418910, and it has no component stations. The data in the CSV file goes up to 2008, and so does the graph. On the other hand, we know the CO-OP station was closed in 2003, since that is what the plant manager told the survey team, and they saw neither an old-style Stevenson screen nor an MMTS. So I went to the site where the paper copies from the CO-OP stations are accessible in PDF format:

http://www7.ncdc.noaa.gov/IPS/coop/coop.html?foreign=false&_page=0&jsessionid=181F99AA3B75589EEC107270C9EB9A9F&state=TX&_target1=Next+%3E

And there it is listed: the CO-OP station data ended in September of 2003.

So where is that data coming from? If they were using another station to extend the record, you are supposed to see its ID # in the component fields of the station list.

So I thought about it for a while and remembered something about the selection screen for making the CSV file. There is an option just under the checkbox for the monthly temperature averages for that station, called the “Mean Temperature Flag”. That doesn’t sound like much unless you read the fine print.

You see, all the way back, even before you select the station you want, there is this little blurb:

Please refer to the daily and monthly data documentation before using these data.

Yep, you need to go through the selection process and read the documentation file closely. In that link, where they explain what each field in the station list means, there is something further down:

Each USHCN data file contains data for all 1218 stations for one of the four meteorological variables (also known as data “elements”). Each record (line) in the files contains one year of 12 monthly values plus an annual value, with formatting as follows:

Variable   Columns   Type
STATION ID   1-6   Character
ELEMENT   7-7   Integer
YEAR   8-11   Integer
VALUE1   13-17   Integer
FLAG1   18-18   Character
VALUE2   20-24   Integer
FLAG2   25   Character
.   .   .
.   .   .
VALUE13   97-101   Integer
FLAG13   102   Character

These variables have the following definitions:

STATION ID   is the station identification code. Note that the first two characters in the Station ID correspond to the state number in Table 1.
     
ELEMENT   is the element code. There are four values corresponding to the element contained in the file:
    1 = mean maximum temperature (in tenths of degrees F)
    2 = mean minimum temperature (in tenths of degrees F)
    3 = average temperature (in tenths of degrees F)
    4 = total precipitation (in hundredths of inches)
     
YEAR   is the year of the record.
     
VALUE1   is the value for January in the year of record (missing = -9999).
   
FLAG1   is the flag for January in the year of record. There are five possible values:
    Blank = no flag is applicable
    E = value is an estimate from surrounding values; no original value is available;
    I = monthly value calculated from incomplete daily data (1 to 9 days were missing);
    Q = value is an estimate from surrounding values; the original value was flagged by the monthly quality control algorithms;
    X = value is an estimate from surrounding values; the original was part of block of monthly values that was too short to adjust in the temperature homogenization algorithm.
     
VALUE2   is the value for February in the year of record.
     
FLAG2   is the flag for February in the year of record.
.   .
.   .
VALUE12   is the value for December in the year of record.
     
FLAG12   is the flag for December in the year of record.
     
VALUE13   is the annual value (mean for temperature; total for precipitation).
     
FLAG13   is the flag for the annual value.
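That data-record layout can be turned into a parser the same way as the station inventory. A sketch under my own assumptions: the sample line below is made up for illustration (it is not real Temple data), and I lean on the fact that the documented columns put each (value, flag) pair on a 7-column stride starting at column 13.

```python
def parse_record(line):
    """Parse one USHCN v2 data record: station id, element, year,
    then 13 (value, flag) pairs at a 7-column stride."""
    station_id = line[0:6]
    element = int(line[6])          # 1-3 temperatures in tenths of deg F, 4 precip
    year = int(line[7:11])
    values, flags = [], []
    for i in range(13):
        start = 12 + 7 * i          # VALUE(i+1) occupies columns 13-17 + 7*i
        raw = int(line[start:start + 5])
        values.append(None if raw == -9999 else raw / 10.0)
        flags.append(line[start + 5])
    return station_id, element, year, values, flags

# A made-up record for illustration: element 3 (average temperature),
# year 2003, with the April-onward values carrying the E flag.
pairs = []
for month in range(13):
    flag = "E" if month >= 3 else " "
    pairs.append("%5d%s" % (650 + month, flag))
line = "418910" + "3" + "2003" + " " + " ".join(pairs)

sid, elem, year, values, flags = parse_record(line)
estimated = [i + 1 for i, f in enumerate(flags[:12]) if f == "E"]
print(year, estimated)   # months whose values are estimates, not measurements
```

With a real file you would run every yearly record for a station through this and see at a glance which stretches of the series are flagged E.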

Notice that they are talking about flag variables but don’t say where those flags are. Well, it turns out that is the “Mean Temperature Flag” option, so if you check it along with the mean temperature data option, you get a little more info in the CSV file:

http://cdiac.esd.ornl.gov/sasserv/TX418910_0958.csv    

So, after each value there is now a letter flag in the Flag 1 variable slot. We see that from 04/2003 all the way through 2008 there is the E flag, which, as the documentation above tells us:

E = value is an estimate from surrounding values; no original value is available;

So, basically, from April 2003 through the end of 2008 they made up the data for that station. But do they come straight out and tell you this? No.
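Spotting those flags programmatically is a one-liner once the CSV is in hand. I don’t know the exact header layout of the file the USHCN interface serves, so the fragment and column names below are my guesses standing in for the real thing; swap in the actual headers from the downloaded file.

```python
import csv
import io

# A made-up fragment in the spirit of the flagged CSV. "Year", "MeanTemp",
# and "Flag1" are hypothetical column names, not the site's actual headers.
sample = """\
Year,MeanTemp,Flag1
2002,67.3,
2003,68.1,E
2004,69.0,E
"""

estimated_years = [
    row["Year"]
    for row in csv.DictReader(io.StringIO(sample))
    if row["Flag1"] == "E"          # E = estimate from surrounding values
]
print(estimated_years)  # -> ['2003', '2004']
```

Run against the real TX418910 file (with the flag option checked), this kind of filter would have exposed the 2003-2008 stretch immediately.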

If you did what Steve Goddard did and just plotted the graph, is there any inkling, just by looking at it, that over five years’ worth of plotted data is an “estimate”? Nope.

Instead you have to read the Monthly Documentation file, download a CSV file with the flag option checked, and see the E flag; only then would you know that the plot you are making from the helpful USHCN site has fake data on it. Also notice that the graph option is at the top of the page, while the area to get the CSV file is further down, with that innocuous “Mean Temperature Flag” option and no clue how important it is. Nothing is mentioned about that option in the documentation either; in fact, the information on that page looks geared to the FTP full-dataset download rather than to the individual-station web interface. So, just as you have to hunt through multiple pages of that wonderful credit card offer from the mail to find the fine print on how much it will really cost you, you need to do the same with US Government temperature datasets.

What makes this even worse is that even NASA GISS doesn’t trust that “estimated” data for this record: they cut off the data they use at 2003.

http://data.giss.nasa.gov/work/gistemp/STATIONS//tmp.425722570050.0.1/station.txt    

So we have data estimators and infillers at NASA, and they don’t trust the data estimation and infilling at NCDC.

But you can trust that the science is settled and gamble the world’s economy on it, don’t you know!


8 responses to “Before Using Temperature Data Read The Fine Print”

  1. John Slayton February 20, 2010 at 12:32 pm

    Thanks for your detailed exploration. Seems like we wind up with more questions than we started with. Like what were the surrounding stations that conveniently generated that temperature spike? How were the stations’ outputs merged? Was the station choice an interactive administrative decision, or does FILNET have a procedure to choose? And on and on…

  2. boballab February 20, 2010 at 2:26 pm

    From what I found, there used to be several different stations back in the ’30s and ’40s. From that it went down to one CO-OP station, until they put the automated system at the nearby airport, which is the only nearby station still operating. You can see this when you look up the airport station on the NCDC website here:
    http://www4.ncdc.noaa.gov/cgi-win/wwcgi.dll?WWDI~StnSrch

    Nothing is explicitly said, but it looks like FILNET is using the airport to “stitch” 2003-2008 data onto the end of the Temple record.

  3. John Slayton February 22, 2010 at 1:22 am

    Temple is not alone. Just looking over my visits from last year I find the following:
    Fremont OR closed 19 Apr 96
    Modena UT closed 9 Jul 04
    Corinne UT closed 1 Mar 07
    Pecos TX closed 21 Dec 01

    All these show graphed data to 2008 as though it were genuinely from these sites.

  4. John Slayton February 22, 2010 at 10:55 am

    Add Forks 4NNE (Montana) closed 1 Apr 96

    and Gage, New Mexico closed 1 Feb 07 but just reopened 14 Jan. No gap in the graph.

  5. vjones February 27, 2010 at 2:31 pm

    Thanks for the detective work. I’ve not delved into the USHCN data, but I have found an example of this type of FILNET activity in the GISS dataset. You’ve made me want to go and look for more.

    Is there anywhere a list of station closure dates, other than looking at individual station data? (I’ll check out your links more thoroughly after posting this; perhaps I’ll find it.)

  6. boballab February 27, 2010 at 2:54 pm

    I haven’t found anything yet, but who knows, they could have a list tucked away in some obscure corner. As for GISS and infilling, I have an upcoming post on how much GISS’s infill changes the temperature trends for each and every grid box on the planet.

  7. KevinUK February 28, 2010 at 6:44 am

    boballab,

    That sounds like a very interesting thread you are about to do on the effects of in-filling of missing data.

    How are you going to manage to find out what missing GISS data has been infilled? Presumably you must have the binary data files from before and after the infilling process, which you’ve managed to read/unscramble somehow?

  8. boballab February 28, 2010 at 11:19 am

    Kevin

    The post is already up, and what I did was just use the tools GISS provides. GISS has an option to make maps with 1200 km infill or 250 km infill. From the page where they display the map there is an option (just like for single-station data) to get the data for the map in text format. So from there it’s a simple matter of copying it into a spreadsheet and doing a comparison. You see temperature trend swings of up to 2 degrees C. The worst case I saw is an area in Northern Russia where, with 1200 km infill, the trend was warming at 1.6 degrees per century, and when you cut the infill back to 250 km the trend went to -0.61 degrees per century. The worse aspect is that just about every grid box is affected by changing the amount of infill. For the full picture you really need to see the two different posts. As to the data, the only way to get closer than 250 km is to modify GISTemp, which is a dicey thing, and still get it to run LOL.
