Working with Multidimensional Coordinates
Author: Ryan Abernathey
Many datasets have physical coordinates which differ from their logical coordinates. Xarray provides several ways to plot and analyze such datasets.
[1]:
%matplotlib inline
import numpy as np
import pandas as pd
import xarray as xr
import cartopy.crs as ccrs
from matplotlib import pyplot as plt
As an example, consider this dataset from the xarray-data repository.
[2]:
ds = xr.tutorial.open_dataset("rasm").load()
ds
In this example, the logical coordinates are x and y, while the physical coordinates are xc and yc, which represent the longitudes and latitudes of the data.
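When the tutorial file cannot be downloaded, the same logical-versus-physical layout can be reproduced offline with a small synthetic dataset. The following is only a sketch: the name `demo`, the grid sizes, and all coordinate values are made up for illustration and are not taken from the rasm file.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the "rasm" layout: a small curvilinear grid.
nt, ny, nx = 3, 4, 6

# Logical positions are plain integer indices along each dimension.
jj, ii = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")

# Physical coordinates vary along BOTH logical dimensions (made-up values).
yc = 20.0 + 10.0 * jj + 1.5 * ii  # "latitude", degrees north
xc = 190.0 + 8.0 * ii - 2.0 * jj  # "longitude", degrees east

rng = np.random.default_rng(0)
demo = xr.Dataset(
    data_vars={"Tair": (("time", "y", "x"), rng.normal(size=(nt, ny, nx)))},
    coords={
        # Two-dimensional coordinates attached to the (y, x) dimensions.
        "xc": (("y", "x"), xc, {"units": "degrees_east"}),
        "yc": (("y", "x"), yc, {"units": "degrees_north"}),
    },
)

print(demo.Tair.dims)
print(demo.xc.dims, demo.xc.attrs)
```

The data variable is indexed by the logical dimensions `y` and `x`, while `xc` and `yc` are stored as two-dimensional coordinates on those same dimensions.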
[3]:
print(ds.xc.attrs)
print(ds.yc.attrs)
Plotting
Let’s examine these coordinate variables by plotting them.
[4]:
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(14, 4))
ds.xc.plot(ax=ax1)
ds.yc.plot(ax=ax2)
Note that the variables xc (longitude) and yc (latitude) are two-dimensional scalar fields.
If we try to plot the data variable Tair, by default we get the logical coordinates.
[5]:
ds.Tair[0].plot()
In order to visualize the data on a conventional latitude-longitude grid, we can take advantage of xarray’s ability to apply cartopy map projections.
[6]:
plt.figure(figsize=(14, 6))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_global()
ds.Tair[0].plot.pcolormesh(
    ax=ax, transform=ccrs.PlateCarree(), x="xc", y="yc", add_colorbar=False
)
ax.coastlines()
ax.set_ylim([0, 90]);
Multidimensional Groupby
The above example allowed us to visualize the data on a regular latitude-longitude grid. But what if we want to do a calculation that involves grouping over one of these physical coordinates (rather than the logical coordinates), for example, calculating the mean temperature at each latitude? This can be achieved using xarray’s groupby method, which accepts multidimensional variables. By default, groupby will use every unique value in the variable, which is probably not what we want.
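To see why grouping on every unique value is rarely useful, it helps to count the unique values in a two-dimensional latitude field. The sketch below uses plain NumPy and a made-up latitude array (`lat2d` and its values are assumptions for illustration, not the rasm coordinates).

```python
import numpy as np

# Hypothetical 2D latitude field on a 20 x 30 curvilinear grid (made-up values).
ny, nx = 20, 30
jj, ii = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
lat2d = 20.0 + 2.0 * jj + 0.05 * ii

# Grouping on unique values would create one group per distinct latitude.
n_cells = lat2d.size
n_unique = np.unique(lat2d).size

# Two-degree-wide bins give far fewer, more meaningful groups.
lat_bins = np.arange(20, 61, 2)
n_bins = lat_bins.size - 1

print(n_cells, n_unique, n_bins)
```

On this synthetic grid every cell has its own latitude, so a plain groupby would return one group per grid cell, whereas the two-degree bins reduce the field to a short latitude profile.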
Instead, we can use the groupby_bins function to specify the output coordinates of the group.
[7]:
# define two-degree wide latitude bins
lat_bins = np.arange(0, 91, 2)
# define a label for each bin corresponding to the central latitude
lat_center = np.arange(1, 90, 2)
# group according to those bins and take the mean
Tair_lat_mean = ds.Tair.groupby_bins("yc", lat_bins, labels=lat_center).mean(
    dim=xr.ALL_DIMS
)
# plot the result
Tair_lat_mean.plot()
The resulting coordinate for the groupby_bins operation has the _bins suffix appended: yc_bins. This helps us distinguish it from the original multidimensional variable yc.
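The renaming can be checked without the tutorial file. The sketch below applies groupby_bins to a synthetic field whose values equal its own latitude, so each bin mean must fall inside its bin. The array `field`, the grid, and the bin range are assumptions chosen for illustration, and `dim=...` is used here as the equivalent of reducing over all dimensions.

```python
import numpy as np
import xarray as xr

# Hypothetical 2D latitude coordinate on a 20 x 30 grid (made-up values).
ny, nx = 20, 30
jj, ii = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
yc = 20.0 + 2.0 * jj + 0.05 * ii

# The data equals the latitude itself, which makes the result easy to verify.
field = xr.DataArray(
    yc.copy(),
    dims=("y", "x"),
    coords={"yc": (("y", "x"), yc)},
    name="field",
)

# Two-degree bins and a label at the center of each bin.
lat_bins = np.arange(20, 61, 2)
lat_center = np.arange(21, 60, 2)

# Bin by the 2D coordinate and average over all dimensions within each bin.
binned = field.groupby_bins("yc", lat_bins, labels=lat_center).mean(dim=...)

print(binned.dims)
```

The result is one-dimensional along `yc_bins`, while the original two-dimensional `yc` remains attached to `field` unchanged.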
Note: This group-by-latitude approach does not take into account the finite-size geometry of grid cells. It simply bins each value according to the coordinates at the cell center. Xarray has no understanding of grid cells and their geometry. More precise geographic regridding for xarray data is available via the xesmf package.