Why did Australia cross Cap-n-Tax Road? Could it be this?
So why did Australia unexpectedly dump their “cap and trade” scheme? Could it be related to the following from the HARRY_READ_ME.txt file? Where they talk about throwing 2/3rds of the Australia data out, then bringing it back in because dumping it for HADCRUT3 would reveal the lies processed in HADCRUT2? The money quote is at the end:
I’ve tried the simple method (as used in Tim O’s geodist.pro, and the more complex and accurate method found elsewhere (wiki and other places). Neither give me results that are anything near reality.
Does anyone else get a laugh out of these supposedly brilliant “scientists” going to WIKIPEDIA to get code?
I’ve tried to snip out the worst of the code-mumbo-jumbo for those not inclined to read code, but left enough that no one may claim I’m taking anything “out of context.”
Decided to process temperature all the way. Ran IDL:
then glo2abs, then mergegrids, to produce monthly output grids. It apparently worked:
As a reminder, these output grids are based on the tmp.0705101334.dtb database, with no merging of neighbourly stations and a limit of 3 standard deviations on anomalies.
Decided to (re-) process precip all the way, in the hope that I was in the zone or something. Started with IDL:
“Hoping” to be close to right? Sounds like junk code to me. But I digress.
Then glo2abs, then mergegrids.. all went fine, apparently.
31. And so.. to DTR! First time for generation I think.
Wrote ‘makedtr.for’ to tackle the thorny problem of the tmin and tmax databases not being kept in step. Sounds familiar, if worrying. am I the first person to attempt to get the CRU databases in working order?!! The program pulls no punches. I had already found that tmx.0702091313.dtb had seven more stations than tmn.0702091313.dtb, but that hadn’t prepared me for the grisly truth:
Yes, the difference is a lot more than seven! And the program helpfully dumps a listing of the surplus stations to the log file. Not a pretty sight. Unfortunately, it hadn’t worked either. It turns out that there are 3518 stations in each database with a WMO Code of ‘ 0’. So, as the makedtr program indexes on the WMO Code.. you get the picture. *cries*
Rewrote as makedtr2, which uses the first 20 characters of the header to match:
The big jump in the number of ‘surplus’ stations is because we are no longer automatically matching stations with WMO=0.
Here’s what happened to the tmin and tmax databases, and the new dtr database:
Old tmin: tmn.0702091139.dtb Total Records Read: 14309
New tmin: tmn.0705162028.dtb Total Records Read: 14106
Del tmin: tmn.0702091139.dtb.del Total Records Read: 203
Old tmax: tmx.0702091313.dtb Total Records Read: 14315
New tmax: tmx.0705162028.dtb Total Records Read: 14106
Del tmax: tmx.0702091313.dtb.del Total Records Read: 209
New dtr: dtr.0705162028.dtb Total Records Read: 14107
*sigh* – one record out! Also three header problems:
BLANKS (expected at 8,14,21,26,47,61,66,71,78)
Why?!! Well the sad answer is.. because we’ve got a date wrong. All three ‘header’ problems relate to this line:
6190 94 95 98 100 101 101 102 103 102 97 94 94
..and as we know, this is not a conventional header. Oh bum. But, but.. how? I know we do muck around with the header and start/end years, but still..
Wrote filtertmm.for, which simply steps through one database (usually tmin) and looks for a ‘perfect’ match in another database (usually tmax). ‘Perfect’ here means a match of WMO Code, Lat, Lon, Start-Year and End-Year. If a match is found, both stations are copied to new databases:
Old tmin database: tmn.0702091139.dtb had 14309 stations
New tmin database: tmn.0705182204.dtb has 13016 stations
Old tmax database: tmx.0702091313.dtb had 14315 stations
New tmax database: tmx.0705182204.dtb has 13016 stations
I am going to *assume* that worked! So now.. to incorporate the Australian monthly data packs. Ow. Most future-proof strategy is probably to write a converter that takes one or more of the packs and creates CRU-format databases of them. Edit: nope, thought some more and the *best* strategy is a program that takes *pairs* of Aus packs and updates the actual databases. Bearing in mind that these are trusted updates and won’t be used in any other context.
From Dave L – who incorporated the initial Australian dump – for the tmin/tmax bulletins, he used a threshold of 26 days/month or greater for inclusion.
Obtained two files from Dave – an email that explains some of the Australian bulletin data/formatting, and a list of Austraian headers matched with their internal codes (the latter being generated by Dave).
Actually.. although I was going to assume that filtertmm had done the synching job OK, a brief look at the Australian stations in the databases showed me otherwise. For instance, I pulled all the headers with ‘AUSTRALIA’ out of the two 0705182204 databases. Now because these were produced by filtertmm, we know that the codes (if present), lats, lons and dates will all match. Any differences will be in altitude and/or name. And so they were:
crua6 diff tmn.0705182204.dtb.oz tmx.0705182204.dtb.oz | wc -l
..so roughly 100 don’t match. They are mostly altitude discrepancies, though there are an alarming number of name mismatches too. Examples of both:
0 -3800 14450 8 AVALON AIRPORT AUSTRALIA 2000 2006 -999 -999.00
0 -4230 14650 595 TARRALEAH CHALET AUSTRALIA 2000 2006 -999 -999.00
Examples of the second kind (name mismatch) are most concerning as they may well be different stations. Looked for all occurences in all tmin/tmax databases:
crua6 grep ‘TARRALEAH’ *dtb
tmn.0702091139.dtb: 0 -4230 14650 585 TARRALEAH VILLAGE AUSTRALIA 2000 2006 -999 -999.00
tmn.0702091139.dtb:9597000 -4230 14645 595 TARRALEAH CHALET AUSTRALIA 1991 2000 -999 -999.00
tmn.0705182204.dtb: 0 -4230 14650 585 TARRALEAH VILLAGE AUSTRALIA 2000 2006 -999 -999.00
tmn.0705182204.dtb:9597000 -4230 14645 595 TARRALEAH CHALET AUSTRALIA 1991 2000 -999 -999.00
tmx.0702091313.dtb: 0 -4230 14650 595 TARRALEAH CHALET AUSTRALIA 2000 2006 -999 -999.00
tmx.0702091313.dtb:9597000 -4230 14645 595 TARRALEAH CHALET AUSTRALIA 1991 2000 -999 -999.00
tmx.0705182204.dtb: 0 -4230 14650 595 TARRALEAH CHALET AUSTRALIA 2000 2006 -999 -999.00
tmx.0705182204.dtb:9597000 -4230 14645 595 TARRALEAH CHALET AUSTRALIA 1991 2000 -999 -999.00
This takes a little sorting out. Well first, recognise that we are dealing with four files: tmin and tmax, early and late (before and after filtertmm.for). We see there are two TARRALEAH entries in each of the four files. We see that ‘TARRALEAH VILLAGE’ only appears in the tmin file. We see, most importantly perhaps, that they are temporally contiguous – that is, each pair could join with minimal overlap, as one is 1991-2000 and the other 2000-2006. Also, we note that the ‘early’ one of each pair has a slightly different longitude and altitude (the former being the thing that distinguished the stations in filtertmm.for).
Finally, this, from the tmax.2005120120051231.txt bulletin:
95018, 051201051231, -42.30, 146.45, 18.0, 00, 31, 31, 585, TARRALEAH VILLAGE
So we can resolve this case – a single station called TARRALEAH VILLAGE, running from 1991 to 2006.
But what about the others?! There are close to 1000 incoming stations in the bulletins, must every one be identified in this way?!! Oh God. There’s nothing for it – I’ll have to write a prog to find matches for the incoming Australian bulletin stations in the main databases. I’ll have to use the databases from before the filtertmm application, so *0705182204.dtb. And it will only need the Australian headers, so I used grep to create *0705182204.dtb.auhead files. The other input is the list of stations taken from the monthly bulletins. Now these have a different number of stations each month, so the prog will build an array of all possible stations based on the files we have. Oh boy. And the program shall be called, ‘auminmaxmatch.for’.
Assembled some information:
crua6 wc -l *auhead
(the ‘auhead’ files were created with )
Actually, stopped work on that. Trying to match over 800 ‘bulletin’ stations against over 3,000 database stations *in two unsynchronised files* was just hurting my brain. The files have to be properly synchronised first, with a more lenient and interactive version of filtertmm. Or… could I use mergedb?! Pretend to merge tmin into tmax and see what pairings it managed? No roll through obviously. Well it’s worth a play.
..unfortunately, not. Because when I tried, I got a lot of odd errors followed by a crash. The reason, I eventually deduced, was that I didn’t build mergedb with the idea that WMO codes might be zero (many of the australian stations have wmo=0). This means that primary matching on WMO code is impossible. This just gets worse and worse: now it looks as though I’ll have to find WMO Codes (or pseudo-codes) for the *3521* stations in the tmin file that don’t have one!!!
OK.. let’s break the problem down. Firstly, a lot of stations are going to need WMO codes, if available. It shouldn’t be too hard to find any matches with the existing WMO coded stations in the other databases (precip, temperature). Secondly, we need to exclude stations that aren’t synchronised between the two databases (tmin/tmax). So can mergedb be modified to treat WMO codes of 0 as ‘missing’? Had a look, and it does check that the code isn’t -999 OR 0.. but not when preallocating flags in subroutine ‘countscnd’. Fixed that and tried running it again.. exactly the same result (crash). I can’t see anything odd about the station it crashes on:
0 -2810 11790 407 MOUNT MAGNET AERO AUSTRALIA 2000 2006 -999 -999.00
2000 339 344 280 252 214 202 189 196 262 291 316 377
2001 371 311 310 300 235 212 201 217 249 262 314 333
2002-9999-9999 339 297 258 209 205 212 246 299 341 358
2003 365 367 336 296 249 195 193 200 238 287 325 368
2004 395 374 321 284 219 214 173 188 239 309 305 370
2005 389 396 358 315 251 182 189 201 233 267 332 341
2006 366 331 314 246 240-9999-9999-9999-9999-9999-9999-9999
.. it’s very similar to preceding (and following) stations, and the station before has even less real data (the one before that has none at all and is auto-deleted). The nature of the crash is ‘forrtl: error (65): floating invalid’ – so a type mismatch possibly. The station has a match in the tmin database (tmn.0702091139.dtb) but the longitude is different:
0 -2810 11780 407 MOUNT MAGNET AERO AUSTRALIA 2000 2006 -999 -999.00
0 -2810 11790 407 MOUNT MAGNET AERO AUSTRALIA 2000 2006 -999 -999.00
It also appears in the tmin/tmax bulletins, eg:
7600, 070401070430, -28.12, 117.84, 16.0, 00, 30, 30, 407, MOUNT MAGNET AERO
Note that the altitude matches (as distinct from the station below).
Naturally, there is a further ‘MOUNT MAGNET’ station, but it’s probably distinct:
9442800 -2807 11785 427 MOUNT MAGNET (MOUNT AUSTRALIA 1956 1992 -999 -999.00
9442800 -2807 11785 427 MOUNT MAGNET (MOUNT AUSTRALIA 1957 1992 -999 -999.00
I am at a bit of a loss. It will take a very long time to resolve each of these ‘rogue’ stations. Time I do not have. The only pragmatic thing to do is to dump any stations that are too recent to have normals. They will not, after all, be contributing to the output. So I knocked out ‘goodnorm.for’, which simply uses the presence of a valid normals line to sort. The results were pretty scary:
Stations retained: 5026
Stations removed: 9283
GOODNORM: Extract stations with non-missing normals
Please enter the input database name: tmx.0702091313.dtb
The output database will be called: tmx.0705281724.dtb
(removed stations will be placed in: tmx.0705281724.del)
Stations retained: 4997
Stations removed: 9318
Essentially, two thirds of the stations have no normals! Of course, this still leaves us with a lot more stations than we had for tmean (goodnorm reported 3316 saved, 1749 deleted) though still far behind precipitation (goodnorm reported 7910 saved, 8027 deleted).
I suspect the high percentage lost reflects the influx of modern Australian data. Indeed, nearly 3,000 of the 3,500-odd stations with missing WMO codes were excluded by this operation. This means that, for tmn.0702091139.dtb, 1240 Australian stations were lost, leaving only 278.
This is just silly. I can’t dump these stations, they are needed to potentially match with the bulletin stations. I am now going to try the following:
1. Attempt to pair bulletin stations with existing in the tmin database. Mark pairings in the database headers and in a new ‘Australian Mappings’ file. Program auminmatch.for.
2. Run an enhanced filtertmm to synchronise the tmin and tmax databases, but prioritising the ‘paired’ stations from step 1 (so they are not lost). Mark the same pairings in the tmax headers too, and update the ‘Australian Mappings’ file.
3. Add the bulletins to the databases.
OK.. step 1. Modified auminmaxmatch.for to produce auminmatch.for. Hit a semi-philosophical problem: what to do with a positive match between a bulletin station and a zero-wmo database station? The station must have a real WMO code or it’ll be rather hard to describe the match!
Got a list of around 12,000 wmo codes and stations from Dave L; unfortunately there was a problem with its formatting that I just couldn’t resolve.
So.. current thinking is that, if I find a pairing between a bulletin station and a zero-coded Australain station in the CRU database, I’ll give the CRU database station the Australian local (bulletin) code twice: once at the end of the header, and once as the WMO code *multiplied by -1* to avoid implying that it’s legitimate. Then if a ‘proper’ code is found or allocated later, the mapping to the bulletin code will still be there at the end of the header. Of course, an initial check will ensure that a match can’t be found, within the CRU database, between the zero-coded station and a properly-coded one.
Debated header formats with David. I think we’re going to go with (i8,a8) at the end of the header, though really it’s (2x,i6,a8) as I remember the Anders code being i2 and the real start year being i4 (both from the tmean database). This will mean post-processing existing databases of course, but that’s not a priority.
A brief (hopefully) diversion to get station counts sorted. David needs them so might as well sort the procedure. In the upside-down world of Mark and Tim, the numbers of stations contributing to each cell during the gridding operation are calculated not in the IDL gridding program – oh, no! – but in anomdtb! Yes, the program which reads station data and writes station data has a second, almost-entirely unrelated function of assessing gridcell contributions. So, to begin with it runs in the usual way:
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
> Enter the suffix of the variable required:
> Will calculate percentage anomalies.
> Select the .cts or .dtb file to load:
> Specify the start,end of the normals period:
> Specify the missing percentage permitted:
> Data required for a normal: 23
> Specify the no. of stdevs at which to reject data:
But then, we choose a different output, and it all shifts focus and has to ask all the IDL questions!!
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
> Check for duplicate stns after anomalising? (0=no,>0=km range)
> Select the .stn file to save:
> Enter the correlation decay distance:
> Submit a grim that contains the appropriate grid.
> Enter the grim filepath:
> Grid dimensions and domain size: 720 360 67420
> Select the first,last years AD to save:
> NORMALS MEAN percent STDEV percent
> .dtb 7315040 73.8
> .cts 299359 3.0 7613600 76.8
> PROCESS DECISION percent %of-chk
> no lat/lon 17911 0.2 0.2
> no normal 2355275 23.8 23.8
> out-of-range 13253 0.1 0.2
> accepted 7521013 75.9
> Calculating station coverages…
And then.. it unhelpfully crashes:
> ##### WithinRange: Alloc: DataB #####
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Ho hum. I did try this last year which is why I’m not tearing my hair out. The plan is to use the outputs from the regular anomdtb runs – ie, the monthly files of valid stations. After all we need to know the station counts on a per month basis. We can use the lat and lon, along with the correlation decay distance.. shouldn’t be too awful. Just even more programming and work. So before I commit to that, a quick look at the IDL gridding prog to see if it can dump the figures instead: after all, this is where the actual ‘station count’ information is assembled and used!!
..well that was, erhhh.. ‘interesting’. The IDL gridding program calculates whether or not a station contributes to a cell, using.. graphics. Yes, it plots the station sphere of influence then checks for the colour white in the output. So there is no guarantee that the station number files, which are produced *independently* by anomdtb, will reflect what actually happened!!
Well I’ve just spent 24 hours trying to get Great Circle Distance calculations working in Fortran, with precisely no success. I’ve tried the simple method (as used in Tim O’s geodist.pro, and the more complex and accurate method found elsewhere (wiki and other places). Neither give me results that are anything near reality. FFS.