5. Warming levels

This Notebook explores the options provided in xscen to analyze climate simulations through the scope of global warming levels, instead of temporal horizons.

First, we just need to prepare some data. We’ll use NorESM2-MM as our example dataset.

[2]:

# Basic imports
from pathlib import Path

import xarray as xr
import xesmf as xe
from matplotlib import pyplot as plt

import xscen as xs
from xscen.testing import datablock_3d, fake_data

# Prepare a Projectcatalog for this Tutorial.
output_folder = Path().absolute() / "_data"
project = {
    "title": "example-warminglevel",
    "description": "This is an example catalog for xscen's documentation.",
}
pcat = xs.ProjectCatalog(
    str(output_folder / "example-wl.json"),
    project=project,
    create=True,
    overwrite=True,
)

# Extract the data needed for the Tutorial
cat_sim = xs.search_data_catalogs(
    data_catalogs=[str(output_folder / "tutorial-catalog.json")],
    variables_and_freqs={"tas": "D"},
    other_search_criteria={"source": "NorESM2-MM", "activity": "ScenarioMIP"},
)
region = {
    "name": "example-region",
    "method": "bbox",
    "tile_buffer": 1.5,
    "lon_bnds": [-75, -74],
    "lat_bnds": [45, 46],
}

for ds_id, dc in cat_sim.items():
    ds = xs.extract_dataset(
        catalog=dc,
        region=region,
        xr_open_kwargs={"drop_variables": ["height", "time_bnds"]},
    )["D"]
    # Since the sample files are very small, we'll create fake data covering a longer time period
    data = fake_data(
        nyears=121,
        ny=len(ds.lat),
        nx=len(ds.lon),
        rand_type="tas",
        seed=list(cat_sim.keys()).index(ds_id),
        amplitude=15,
        offset=2,
    )
    attrs = ds.attrs
    ds = datablock_3d(
        data,
        "tas",
        "lon",
        -75,
        "lat",
        45,
        x_step=1,
        y_step=1.5,
        start="1/1/1981",
        freq="D",
        as_dataset=True,
    )
    ds.attrs = attrs

    filename = str(
        output_folder
        / f"wl_{ds.attrs['cat:id']}.{ds.attrs['cat:domain']}.{ds.attrs['cat:processing_level']}.{ds.attrs['cat:frequency']}.zarr"
    )
    chunks = xs.io.estimate_chunks(ds, dims=["time"], target_mb=50)
    xs.save_to_zarr(ds, filename, rechunk=chunks, mode="o")
    pcat.update_from_ds(ds=ds, path=filename, info_dict={"format": "zarr"})

Successfully wrote ESM catalog json file to: file:///home/docs/checkouts/readthedocs.org/user_builds/xscen/checkouts/latest/docs/notebooks/_data/example-wl.json

5.1. Find warming levels with only the model name

If all that you want to know is the year or the period during which a climate model reaches a given warming level, then xs.get_period_from_warming_level is the function to use since you can simply give it a string or a list of strings and receive that information.

The usual arguments of xs.get_period_from_warming_level are:

realization: Dataset, dict or string.
- Strings should follow the format ‘mip-era_source_experiment_member’. Those fields should be found in the dict or in the attributes of the dataset (allowing for a possible ‘cat:’ prefix).
- In all cases, regex is allowed to relax the name matching.
- The “source” part can also be a driving_model name. If a Dataset is passed and it’s driving_model attribute is non-null, it is used.
wl: warming level.
window: Number of years in the centered window during which the warming level is reached. Note that in the case of an even number, the IPCC standard is used (-n/2+1, +n/2).
tas_baseline_period: The period over which the warming level is calculated, equivalent to “+0°C”. Defaults to 1850-1900.
ignore_member: The default tas_src only contains data for 1 member. If you want a result regardless of the realization number, set this to True.
return_central_year: Whether to return the start/end of the period or to return the middle year.

It returns either a string or ['start_yr', 'end_yr'], depending on return_central_year. For entries that it fails to find in the database, or for instances where a given warming level is not reached, the function returns None (or [None, None]).

If realization is a list of the accepted types, or a DataArray or a DataFrame, the function returns a sequence of the same size (and with the same index, if relevant). It can happen that a requested model’s name was not found exactly in the database, but that arguments allowed for a relaxed search (ignore_member = True or regex in realization). In that case, the selected model doesn’t have the same name as the requested one and this information is only shown in the log, unless one passes output='selected' to receive a dictionary instead where the keys are the selected models in the database.

[3]:

# Multiple entries, returns a list of the same length
print(
    xs.get_period_from_warming_level(
        [
            "CMIP6_CanESM5_ssp126_r1i1p1f1",
            "CMIP6_CanESM5_ssp245_r1i1p1f1",
            "CMIP6_CanESM5_ssp370_r1i1p1f1",
            "CMIP6_CanESM5_ssp585_r1i1p1f1",
        ],
        wl=2,
        window=20,
    )
)
# Returns a list by default
print(
    xs.get_period_from_warming_level("CMIP6_CanESM5_ssp585_r1i1p1f1", wl=2, window=20)
)
# Only the middle year is requested, returns a string
print(
    xs.get_period_from_warming_level(
        "CMIP6_CanESM5_ssp585_r1i1p1f1", wl=2, window=20, return_central_year=True
    )
)
# +10°C is never reached, returns None
print(
    xs.get_period_from_warming_level("CMIP6_CanESM5_ssp585_r1i1p1f1", wl=10, window=20)
)

[['2017', '2036'], ['2015', '2034'], ['2014', '2033'], ['2013', '2032']]
['2013', '2032']
2022
[None, None]

The opposite search, getting the warming level associated to a given time period, is possible through a sister function get_warming_level_from_period. The function takes similar arguments, but with wl and window replaced by a period argument that follows the usual period formatting (e.g. ["2041", "2070"]).

[4]:

# Get the warming level associated to 2041-2070
print(
    xs.get_warming_level_from_period("CMIP6_CanESM5_ssp585_r1i1p1f1", ["2041", "2070"])
)

4.107697058823529

5.2. Find and extract data by warming levels

If you instead need to subset and analyze data, then two options are currently provided in xscen: subset_warming_level and produce_horizon.

Use subset_warming_level when you want to cut a dataset for a period corresponding to a given warming level, but leave its frequency untouched.
Use produce_horizon when you need to: subset a time period, compute indicators, and compute the climatological mean for one or multiple horizons.

The two methods are detailed in the following section.

5.2.1. Method #1: Subsetting datasets by warming level

xs.subset_warming_level can be used to subset a dataset for a window over which a given global warming level is reached. A new dimension named warminglevel is created by the function.

The function calls get_period_from_warming_level, so the arguments are essentially the same.:

ds: input dataset.
wl: warming level.
window: Number of years in the centered window during which the warming level is reached. Note that in the case of an even number, the IPCC standard is used (-n/2+1, +n/2).
tas_baseline_period: The period over which the warming level is calculated, equivalent to “+0°C”. Defaults to 1850-1900.
ignore_member: The default database only contains data for 1 member. If you want a result regardless of the realization number, set this to True.
to_level: Contrary to other methods, you can use “{wl}”, “{period0}” and “{period1}” in the string to dynamically include wl, ‘tas_baseline_period[0]’ and ‘tas_baseline_period[1]’ in the processing_level.
wl_dim: The string used to fill the new warminglevel dimension. You can use “{wl}”, “{period0}” and “{period1}” in the string to dynamically include wl, tas_baseline_period[0] and tas_baseline_period[1]. Or you can use True to have a float coordinate with units of °C. If None, no new dimension will be added.

If the source, experiment, (member), and warming level are not found in the database. The function returns None.

[5]:

ds = pcat.search(
    processing_level="extracted",
    experiment="ssp245",
    member="r1.*",
    source="NorESM2-MM",
    frequency="day",
).to_dataset()

xs.subset_warming_level(
    ds,
    wl=2,
    window=20,
)

[5]:

<xarray.Dataset> Size: 409kB
Dimensions:       (warminglevel: 1, time: 7305, lat: 3, lon: 2)
Coordinates:
  * warminglevel  (warminglevel) object 8B '+2Cvs1850-1900'
  * time          (time) datetime64[ns] 58kB 2069-01-01 ... 2088-12-31
  * lat           (lat) float64 24B 45.0 46.5 48.0
  * lon           (lon) int64 16B -75 -74
Data variables:
    tas           (warminglevel, time, lat, lon) float64 351kB dask.array<chunksize=(1, 7305, 3, 2), meta=np.ndarray>
Attributes: (12/22)
    cat:_data_format_:       zarr
    cat:activity:            ScenarioMIP
    cat:date_end:            2101-12-02 00:00:00
    cat:date_start:          1981-01-01 00:00:00
    cat:domain:              example-region
    cat:experiment:          ssp245
    ...                      ...
    cat:xrfreq:              D
    comment:                 This is a test file created for the xscen tutori...
    history:                 [2026-07-06 21:09:11] bbox spatial subsetting wi...
    intake_esm_dataset_key:  CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1f1...
    intake_esm_vars:         ('tas',)
    version:                 v20191108

5.2.1.1. Vectorized subsetting

The function can also vectorize the subsetting over multiple warming levels or over a properly constructed “realization” dimension. In that case, the original time axis can’t be preserved. It is replaced by a fake one starting in 1000. However, as this process is a bit complex, the current xscen version only supports this if the data is annual. As the time axis doesn’t carry any information, a warminglevel_bounds coordinate is added with the time bounds of the subsetting. If a warming level was not reached, a NaN slice is inserted in the output dataset.

This option is to be used when “scalar” subsetting is not enough, but you want to do things differently than produce_horizons.

Here, we’ll open all experiments into a single ensemble dataset where the realization dimension is constructed exactly as get_period_from_warming_level expects it to be. We’ll also average the daily data to an annual scale.

[6]:

ds = pcat.search(
    processing_level="extracted",
    member="r1.*",
    frequency="day",
).to_dataset(
    # Value of the "realization" dimension will be constructed by concatenating those fields with a '_'
    create_ensemble_on=["mip_era", "source", "experiment", "member"]
)
ds = ds.resample(time="YS").mean()
ds

[6]:

<xarray.Dataset> Size: 24kB
Dimensions:      (time: 121, realization: 4, lat: 3, lon: 2)
Coordinates:
  * time         (time) object 968B 1981-01-01 00:00:00 ... 2101-01-01 00:00:00
  * realization  (realization) object 32B 'CMIP6_NorESM2-MM_ssp126_r1i1p1f1' ...
  * lat          (lat) float64 24B 45.0 46.5 48.0
  * lon          (lon) int64 16B -75 -74
Data variables:
    tas          (time, realization, lat, lon) float64 23kB dask.array<chunksize=(121, 1, 3, 2), meta=np.ndarray>
Attributes: (12/19)
    cat:_data_format_:       zarr
    cat:activity:            ScenarioMIP
    cat:date_end:            2101-12-02 00:00:00
    cat:date_start:          1981-01-01 00:00:00
    cat:domain:              example-region
    cat:frequency:           day
    ...                      ...
    cat:variable:            tas
    cat:xrfreq:              D
    comment:                 This is a test file created for the xscen tutori...
    intake_esm_vars:         ('tas',)
    version:                 v20191108
    intake_esm_dataset_key:  ScenarioMIP_NCC_example-region.example-region.ex...

[7]:

xs.subset_warming_level(ds, wl=[1.5, 2, 3], wl_dim=True, to_level="warming-level")

[7]:

<xarray.Dataset> Size: 12kB
Dimensions:              (realization: 4, warminglevel: 3, time: 20, lat: 3,
                          lon: 2, wl_bounds: 2)
Coordinates:
  * realization          (realization) object 32B 'CMIP6_NorESM2-MM_ssp126_r1...
  * warminglevel         (warminglevel) float64 24B 1.5 2.0 3.0
  * time                 (time) object 160B 1000-01-01 00:00:00 ... 1019-01-0...
  * lat                  (lat) float64 24B 45.0 46.5 48.0
  * lon                  (lon) int64 16B -75 -74
    warminglevel_bounds  (realization, warminglevel, wl_bounds) object 192B n...
Dimensions without coordinates: wl_bounds
Data variables:
    tas                  (realization, warminglevel, time, lat, lon) float64 12kB dask.array<chunksize=(2, 3, 20, 3, 2), meta=np.ndarray>
Attributes: (12/19)
    cat:_data_format_:       zarr
    cat:activity:            ScenarioMIP
    cat:date_end:            2101-12-02 00:00:00
    cat:date_start:          1981-01-01 00:00:00
    cat:domain:              example-region
    cat:frequency:           day
    ...                      ...
    cat:variable:            tas
    cat:xrfreq:              D
    comment:                 This is a test file created for the xscen tutori...
    intake_esm_vars:         ('tas',)
    version:                 v20191108
    intake_esm_dataset_key:  ScenarioMIP_NCC_example-region.example-region.ex...

5.2.2. Method #2: Producing horizons

If what you need is to compute indicators and their climatological mean, xs.aggregate.produce_horizon is a more convenient function to work with than subset_warming_level. Since the years are meaningless for warming levels, and are even detrimental to making ensemble statistics, the function formats the output as to remove the ‘time’ and ‘year’ information from the dataset, while the seasons/months are unstacked to different coordinates. Hence, the single dataset outputted can contain indicators of different frequencies, as well as multiple warming levels or temporal horizons.

The arguments of xs.aggregate.produce_horizon are:

ds: input dataset.
indicators: As in compute_indicators
periods: Periods to cut.
warminglevels: Dictionary of arguments to pass to subset_warming_level. If ‘wl’ is a list, the function will be called for each value and produce multiple horizons.

If both periods and warminglevels are None, the full time series will be used. If a dataset does not contain a given period or warming level, then that specific period will be skipped.

[8]:

dict_input = pcat.search(processing_level="extracted", xrfreq="D").to_dataset_dict(
    xarray_open_kwargs={"decode_timedelta": False}
)

for id_input, ds_input in dict_input.items():
    # 1981-2010 will be used as our reference period. We can compute it at the same time.
    ds_hor = xs.produce_horizon(
        ds_input,
        indicators="samples/indicators.yml",
        periods=[["1981", "2010"]],
        warminglevels={"wl": [1, 1.5, 2], "window": 30, "ignore_member": True},
        to_level="horizons",
    )

    # Save
    filename = str(
        output_folder
        / f"wl_{ds_hor.attrs['cat:id']}.{ds_hor.attrs['cat:domain']}.{ds_hor.attrs['cat:processing_level']}.{ds_hor.attrs['cat:frequency']}.zarr"
    )
    xs.save_to_zarr(ds_hor, filename, mode="o")
    pcat.update_from_ds(ds=ds_hor, path=filename, info_dict={"format": "zarr"})


--> The keys in the returned dictionary of datasets are constructed as follows:
        'id.domain.processing_level.xrfreq'

100.00% [5/5 00:00<00:00]

[9]:

display(ds_hor)

<xarray.Dataset> Size: 680B
Dimensions:              (horizon: 4, lat: 3, lon: 2)
Coordinates:
  * horizon              (horizon) <U16 256B '1981-2010' ... '+2Cvs1850-1900'
  * lat                  (lat) float64 24B 45.0 46.5 48.0
  * lon                  (lon) int64 16B -75 -74
Data variables:
    growing_degree_days  (horizon, lat, lon) float64 192B dask.array<chunksize=(1, 3, 2), meta=np.ndarray>
    tg_min               (horizon, lat, lon) float64 192B dask.array<chunksize=(1, 3, 2), meta=np.ndarray>
Attributes: (12/22)
    cat:_data_format_:       zarr
    cat:activity:            ScenarioMIP
    cat:date_end:            2101-12-02 00:00:00
    cat:date_start:          1981-01-01 00:00:00
    cat:domain:              example-region
    cat:experiment:          ssp245
    ...                      ...
    cat:xrfreq:              fx
    comment:                 This is a test file created for the xscen tutori...
    history:                 [2026-07-06 21:09:11] bbox spatial subsetting wi...
    intake_esm_dataset_key:  CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r2i1p1f1...
    intake_esm_vars:         ('tas',)
    version:                 v20191108

5.3. Deltas and spatial aggregation

This step is done as in the Getting Started Notebook. Here we will spatially aggregate the data, but the datasets could also be regridded to a common grid.

[10]:

dict_wl = pcat.search(processing_level="horizons").to_dataset_dict(
    xarray_open_kwargs={"decode_timedelta": False}
)

for id_wl, ds_wl in dict_wl.items():
    # compute delta
    ds_delta = xs.aggregate.compute_deltas(
        ds=ds_wl, reference_horizon="1981-2010", to_level="deltas"
    )

    # remove the reference period from the dataset
    ds_delta = ds_delta.sel(horizon=~ds_delta.horizon.isin(["1981-2010"]))

    # aggregate
    ds_delta["lon"].attrs["axis"] = "X"
    ds_delta["lat"].attrs["axis"] = "Y"
    ds_agg = xs.spatial_mean(
        ds_delta,
        method="cos-lat",
        to_level="deltas-agg",
    )

    # Save
    filename = str(
        output_folder
        / f"wl_{ds_agg.attrs['cat:id']}.{ds_agg.attrs['cat:domain']}.{ds_agg.attrs['cat:processing_level']}.{ds_agg.attrs['cat:frequency']}.zarr"
    )
    xs.save_to_zarr(ds_agg, filename, mode="o")
    pcat.update_from_ds(ds=ds_agg, path=filename, info_dict={"format": "zarr"})


--> The keys in the returned dictionary of datasets are constructed as follows:
        'id.domain.processing_level.xrfreq'

100.00% [5/5 00:00<00:00]

[11]:

display(ds_agg)

<xarray.Dataset> Size: 240B
Dimensions:                              (horizon: 3)
Coordinates:
  * horizon                              (horizon) <U16 192B '+1Cvs1850-1900'...
Data variables:
    growing_degree_days_delta_1981_2010  (horizon) float64 24B dask.array<chunksize=(1,), meta=np.ndarray>
    tg_min_delta_1981_2010               (horizon) float64 24B dask.array<chunksize=(1,), meta=np.ndarray>
Attributes: (12/22)
    cat:_data_format_:       zarr
    cat:activity:            ScenarioMIP
    cat:date_end:            2101-12-02 00:00:00
    cat:date_start:          1981-01-01 00:00:00
    cat:domain:              example-region
    cat:experiment:          ssp245
    ...                      ...
    cat:xrfreq:              fx
    comment:                 This is a test file created for the xscen tutori...
    history:                 [2026-07-06 21:09:16] weighted mean(dim=('lon', ...
    intake_esm_dataset_key:  CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1f1...
    intake_esm_vars:         tg_min, growing_degree_days
    version:                 v20191108

5.4. Ensemble statistics

Even more than with time-based horizons, the first step of ensemble statistics should be to generate the weights. Indeed, if a model has 3 experiments reaching a given warming level, we want it to have the same weight as a model with only 2 experiments reaching that warming level. The argument skipna=False should be passed to xs.generate_weights to properly assess which simulations reaches which warming level. If the horizon dimension differs between datasets (as is the case here), they are reindexed and given a weight of 0.

When working with warming levels, how to assess experiments is more open-ended. The IPCC Atlas splits the statistics and climate change signals by SSPs, even when they are being analyzed through warming levels, but experiments could also be considered as ‘members’ of a given model and used to bolster the number of realizations.

[12]:

datasets = pcat.search(processing_level="deltas-agg").to_dataset_dict()


--> The keys in the returned dictionary of datasets are constructed as follows:
        'id.domain.processing_level.xrfreq'

100.00% [5/5 00:00<00:00]

[13]:

# All realisations of NorESM2-MM are given the same weight for the first horizon, while it correctly recognizes that +1.5°C and +2°C weren't reached for the SSP126.
weights = xs.ensembles.generate_weights(
    datasets=datasets, independence_level="model", skipna=False
)
display(weights)

/tmp/ipykernel_4979/4289363083.py:2: UserWarning: Extra dimension horizon is not the same for all datasets. Reindexing.

<xarray.DataArray (realization: 5, horizon: 3)> Size: 120B
array([[0.2 , 0.25, 0.25],
       [0.2 , 0.25, 0.25],
       [0.2 , 0.25, 0.25],
       [0.2 , 0.25, 0.25],
       [0.2 , 0.  , 0.  ]])
Coordinates:
  * realization  (realization) <U92 2kB 'CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp...
  * horizon      (horizon) <U16 192B '+1Cvs1850-1900' ... '+2Cvs1850-1900'

Next, the weights and the datasets can be passed to xs.ensemble_stats to calculate the ensemble statistics.

[14]:

ds_ens = xs.ensemble_stats(
    datasets=datasets,
    common_attrs_only=True,
    weights=weights,
    statistics={"ensemble_mean_std_max_min": None},
    to_level=f"ensemble-deltas-wl",
)

# It is sometimes useful to keep track of how many realisations made the ensemble.
ds_ens.horizon.attrs["ensemble_size"] = len(datasets)

filename = str(
    output_folder
    / f"wl_{ds_ens.attrs['cat:id']}.{ds_ens.attrs['cat:domain']}.{ds_ens.attrs['cat:processing_level']}.{ds_ens.attrs['cat:frequency']}.zarr"
)
xs.save_to_zarr(ds_ens, filename, mode="o")
pcat.update_from_ds(ds=ds_ens, path=filename, info_dict={"format": "zarr"})

[15]:

display(ds_ens)

<xarray.Dataset> Size: 384B
Dimensions:                                    (horizon: 3)
Coordinates:
  * horizon                                    (horizon) <U16 192B '+1Cvs1850...
Data variables:
    growing_degree_days_delta_1981_2010_mean   (horizon) float64 24B dask.array<chunksize=(1,), meta=np.ndarray>
    growing_degree_days_delta_1981_2010_stdev  (horizon) float64 24B dask.array<chunksize=(1,), meta=np.ndarray>
    growing_degree_days_delta_1981_2010_max    (horizon) float64 24B dask.array<chunksize=(1,), meta=np.ndarray>
    growing_degree_days_delta_1981_2010_min    (horizon) float64 24B dask.array<chunksize=(1,), meta=np.ndarray>
    tg_min_delta_1981_2010_mean                (horizon) float64 24B dask.array<chunksize=(1,), meta=np.ndarray>
    tg_min_delta_1981_2010_stdev               (horizon) float64 24B dask.array<chunksize=(1,), meta=np.ndarray>
    tg_min_delta_1981_2010_max                 (horizon) float64 24B dask.array<chunksize=(1,), meta=np.ndarray>
    tg_min_delta_1981_2010_min                 (horizon) float64 24B dask.array<chunksize=(1,), meta=np.ndarray>
Attributes: (12/16)
    cat:_data_format_:     zarr
    cat:activity:          ScenarioMIP
    cat:domain:            example-region
    cat:frequency:         fx
    cat:institution:       NCC
    cat:mip_era:           CMIP6
    ...                    ...
    cat:xrfreq:            fx
    comment:               This is a test file created for the xscen tutorial...
    intake_esm_vars:       tg_min_delta_1981_2010, growing_degree_days_delta_...
    version:               v20191108
    cat:id:                CMIP6_ScenarioMIP_NCC_NorESM2-MM_example-region
    ensemble_size:         5