06. Global atmospheric temperature trends

Analysing large data sets in MATLAB/Octave

By Edward Sternin, 2023-10

This simple demo of Octave-based data plotting uses the publicly available data from UAH (University of Alabama, Huntsville), as reported by the group of John Christy (christy@atmos.uah.edu). The raw satellite data (NOAA series of satellites; AQUA, starting in mid-2002) from the National Space Science and Technology Center (http://vortex.nsstc.uah.edu/) is extensively verified and corrected for calibration errors as those are detected. For example, camera drift as the NOAA-18 and NOAA-19 satellites aged was discovered and corrected in 2017, as described here. This is the best large-scale atmospheric temperature data that we (the humans) have, free of any bias that may (or may not) exist in the surface-measured temperatures.

The following only needs to be done once, using either method below. Rather than read the external data files directly, we first make local copies using the operating system command wget or urlwrite(external_URL,local_file_name):
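As a minimal sketch of the urlwrite route, the lines below download a file only if no local copy exists yet. The path after the host name and the local file name are assumptions made for illustration; check them against the directory listing on the UAH server before relying on them.

```octave
% ASSUMPTION: the path below the host name and the local file name are
% illustrative; verify the actual file location on the UAH server.
data_url   = 'http://vortex.nsstc.uah.edu/data/msu/v6.0/tlt/uahncdc_lt_6.0.txt';
local_file = 'uah_lt.txt';

% Download only once: skip the transfer if a local copy already exists.
if exist(local_file, 'file') ~= 2
  % Requesting the status outputs makes urlwrite report failure
  % instead of raising an error.
  [~, ok, msg] = urlwrite(data_url, local_file);
  if ~ok
    warning('download of %s failed: %s', data_url, msg);
  end
end
```

Re-running this cell is harmless, since the existence check prevents repeated requests to the remote server.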

Reading the data in
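A whitespace-delimited text file with a single header line can be read as sketched below. To keep the example self-contained, it first writes a tiny sample file; the three-column layout (Year, Mo, Globe) and the values are made up for illustration and are only a simplified stand-in for the real file, which has more columns.

```octave
% Write a small sample file (made-up values, simplified column layout).
sample_file = [tempname() '.txt'];
fid = fopen(sample_file, 'w');
fprintf(fid, 'Year Mo Globe\n');
fprintf(fid, '1979 1 -0.10\n1979 2 -0.20\n1979 3 0.05\n');
fclose(fid);

% Read it back: skip the header line, then read numbers three per row.
% fscanf stops at the first non-numeric token, so any trailing text
% after the numeric table is ignored.
fid = fopen(sample_file, 'r');
header = fgetl(fid);
raw = fscanf(fid, '%f', [3 Inf])';   % one row per month
fclose(fid);
delete(sample_file);

year  = raw(:, 1);
month = raw(:, 2);
T     = raw(:, 3);
t     = year + (month - 0.5) / 12;   % decimal year at mid-month
```

The decimal-year vector t is convenient as a common x-axis when several data sets are plotted together.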

Notes:

Visualizing the data and identifying influencers

The most obvious influencer is the solar input, and a visual inspection suggests a likely correlation between the solar activity and the atmospheric temperatures.

Clearly, some of the oscillations observed in the temperature data have the same periodicity as, and are similar in phase to, the solar activity reported by either of the sunspot data sets. The next step is to perform some form of regression analysis, to try to remove the effects of this strong influencer.
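A single-factor version of such a regression can be sketched as follows. The series here are synthetic stand-ins, not the real data: an 11-year sinusoid plays the role of the sunspot number, and the temperature is constructed with a known response to it, so that we can see the fit recover that response.

```octave
% Synthetic stand-ins (NOT real data).
t     = (1979 : 1/12 : 2023)';                 % decimal years, monthly
solar = 80 + 60 * sin(2 * pi * t / 11);        % sunspot-number-like series
T     = 0.3 + 0.002 * solar + 0.05 * randn(size(t));

% Correlation coefficient between the two series.
R = corrcoef([solar, T]);
r = R(1, 2);

% Least-squares fit of T = beta(1) + beta(2) * solar.
X    = [ones(size(t)), solar];
beta = X \ T;

% Remove the fitted solar contribution, keeping the mean level.
T_adj = T - beta(2) * (solar - mean(solar));
```

With real data the same three steps apply, but the residual T_adj will still contain every influence not included in X.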

However, it is also obvious, especially from the lower stratospheric data, that something else has a dramatic short-term effect on temperature. The two peaks at 1982-83 and 1992-93 stand out. These seem to correspond to the times of two major recent volcanic eruptions, El Chichón (Mexico, 1982) and Mt. Pinatubo (Philippines, June 1991). Therefore, a full analysis will have to include multiple regression against both the solar activity and the atmospheric transmission, which was greatly affected by the volcanic ash emissions.

Unfortunately, the data for atmospheric transparency is not as extensive, and some missing dates may need to be excluded from consideration. The atmospheric transmission data is derived from measurements of direct solar radiation at Mauna Loa, Hawaii. Data measured in Hawaii is very valuable for the global record because the site is remote from anthropogenic activity and pollution sources, other land-related events, cross-Atlantic dust storm signatures, and continental weather conditions. It is not a perfect global marker; for example, the April 2010 volcanic eruption in Iceland (Eyjafjallajökull) does not seem to affect the Mauna Loa data very strongly, with only a small feature around 2010. Still, it is a good starting point.

First we download the transparency data and add another graph to the previous plot(s). The decreases in transparency caused by volcanic eruptions appear to be highly correlated with the prominent temperature peaks in the lower stratosphere.

Dataframes are useful

Dataframes are objects that offer database or spreadsheet-like properties. After you have all your data gathered into a single dataframe, with descriptive headers, column types (strings, numerical values, etc.) established, and erratic or missing values identified, certain global operations become particularly simple.

To analyze the atmospheric temperature data further and to perform statistical or other forms of analysis, we blend into a dataframe the data on solar activity from two different sources. Since the data in this case is a time series of measurements, we establish a common timebase index for all data.
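The idea of a common timebase can be illustrated without any dataframe package, using plain vectors and intersect. This is only a sketch with made-up values: each series gets an integer month key, and only the months present in both series are kept, which also takes care of missing dates.

```octave
% Made-up values. Key = year*12 + (month-1), one integer per calendar month.
key_T = 1979 * 12 + (0:5)';                    % Jan..Jun 1979
T     = [-0.10; -0.20; 0.05; 0.00; 0.10; 0.15];

key_S = 1979 * 12 + [0; 1; 3; 4; 5; 6];        % Mar missing, Jul extra
solar = [160; 150; 140; 135; 130; 120];

% Keep only months present in both series; iT and iS index the originals.
[key, iT, iS] = intersect(key_T, key_S);
key = key(:);

year   = floor(key / 12);
month  = mod(key, 12) + 1;
merged = [year, month, T(iT), solar(iS)];      % columns: year, month, T, solar
```

Each row of merged now refers to a single month, so column-wise operations such as regressions can be applied directly.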

Multi-factor regression analysis

A full regression analysis can calculate the statistical influence of the two major factors and remove them from the data. Other effects and long-term trends that may be obscured by these major influencers may then become apparent. However, there could be a time lag between a rise in ash emissions, with the associated reduction in the amount of radiation reaching the surface, and any temperature changes in the various layers of the atmosphere. In fact, each of the identified layers may well have its own time delay, different from the others.
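One possible way to handle the lags, shown here on synthetic data only, is to scan a grid of candidate lags for each influencer and keep the pair that gives the smallest residual sum of squares. The series, the true lags (3 and 6 months), the coefficients, and the 12-month search range are all assumptions chosen for illustration; this is a sketch of the technique, not a prescribed solution.

```octave
% Synthetic stand-ins (NOT real data), monthly samples.
n     = 480;
m     = (0:n-1)';
solar = sin(2 * pi * m / 132);                 % 11-year cycle, arbitrary units
trans = ones(n, 1);  trans(150:170) = 0.9;     % eruption-like transparency dip

% Delay a column vector by k samples, padding the start with NaN.
shift = @(x, k) [NaN(k, 1); x(1:end-k)];

% Temperature with known lagged responses, a small trend, and noise.
T = 0.2 * shift(solar, 3) - 2.0 * shift(trans, 6) ...
    + 0.001 * m + 0.005 * randn(n, 1);

% Use the same rows for every lag pair so the fits are comparable.
maxlag = 12;
rows   = (maxlag + 1 : n)';

best = struct('rss', Inf, 'lagS', NaN, 'lagV', NaN, 'beta', []);
for ls = 0:maxlag
  for lv = 0:maxlag
    s = shift(solar, ls);
    v = shift(trans, lv);
    X = [ones(numel(rows), 1), m(rows), s(rows), v(rows)];
    beta = X \ T(rows);
    rss  = sum((T(rows) - X * beta) .^ 2);
    if rss < best.rss
      best = struct('rss', rss, 'lagS', ls, 'lagV', lv, 'beta', beta);
    end
  end
end

% Subtract the two fitted influences at their best lags.
s = shift(solar, best.lagS);
v = shift(trans, best.lagV);
T_adj = T(rows) - best.beta(3) * s(rows) - best.beta(4) * v(rows);
```

Here best.lagS and best.lagV hold the selected lags in months. With real data the minimum is usually much shallower than in this synthetic case, so it is worth inspecting how the residual varies across the whole lag grid rather than trusting a single best pair.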

Homework

Complete the multi-factor regression analysis of the data, and plot the temperature data adjusted by subtracting the effects of the two major influencers: the level of solar activity and the atmospheric transparency. Be sure to consider and allow for an appropriate time lag in the response of the temperature to changes in the influencers.

Follow-up questions

Evaluate the long-term trends (slope, in $^\circ$C/decade) in the three observed atmospheric zones. Use appropriate statistical methods to evaluate the 95% confidence interval for the result (equivalently, test the null hypothesis of zero slope at $p<0.05$). Does this interval include zero? What does this mean?

Note the sign of the slope in the linear regression of the atmospheric transparency. What does this imply in terms of the planetary radiative energy balance for Earth?

Note the size of the regression estimates for the three atmospheric zones, and compare the strengths of our influencers on the temperature in the three zones. Discuss.
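For the first question, the mechanics of a slope estimate and its interval can be sketched on synthetic data as below. Two assumptions are worth stating loudly: the critical value 1.96 is the large-sample normal approximation (the exact Student-t quantile would come from tinv in a statistics package), and the residuals are treated as independent, which monthly anomalies generally are not, so the interval from this naive formula is likely too narrow for real data.

```octave
% Synthetic stand-in (NOT real data): 0.13 degC/decade trend plus noise.
t = (1979 : 1/12 : 2023)';
T = 0.013 * (t - 1979) + 0.1 * randn(size(t));

% Ordinary least squares with a centred time axis.
tc   = t - mean(t);
X    = [ones(size(t)), tc];
beta = X \ T;                        % beta(2) is the slope in degC/year

% Standard error of the slope from the residual variance.
res = T - X * beta;
dof = numel(t) - 2;
s2  = sum(res .^ 2) / dof;
se  = sqrt(s2 / sum(tc .^ 2));

% ASSUMPTION: normal approximation to the t distribution (large dof),
% and independent residuals.
tcrit        = 1.96;
slope_decade = 10 * beta(2);
ci_decade    = 10 * (beta(2) + [-1, 1] * tcrit * se);
```

If ci_decade contains zero, the data are consistent with no trend at this confidence level; if it does not, the zero-slope null hypothesis is rejected at $p<0.05$ under the stated assumptions.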