{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "# Remove flox spam\n", "\n", "import logging\n", "\n", "# Get the logger for the 'flox' package\n", "logger = logging.getLogger(\"flox\")\n", "# Set the logging level to WARNING\n", "logger.setLevel(logging.WARNING)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Ensembles" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ensemble reduction\n", "\n", "This tutorial will explore ensemble reduction (also known as ensemble selection) using `xscen`. This will use pre-computed annual mean temperatures from `xclim.testing`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pooch\n", "import xarray as xr\n", "from xclim.testing.utils import nimbus\n", "\n", "import xscen as xs\n", "\n", "downloader = pooch.HTTPDownloader(headers={\"User-Agent\": f\"xscen-{xs.__version__}\"})\n", "\n", "datasets = {\n", " \"ACCESS\": \"EnsembleStats/BCCAQv2+ANUSPLIN300_ACCESS1-0_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc\",\n", " \"BNU-ESM\": \"EnsembleStats/BCCAQv2+ANUSPLIN300_BNU-ESM_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc\",\n", " \"CCSM4-r1\": \"EnsembleStats/BCCAQv2+ANUSPLIN300_CCSM4_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc\",\n", " \"CCSM4-r2\": \"EnsembleStats/BCCAQv2+ANUSPLIN300_CCSM4_historical+rcp45_r2i1p1_1950-2100_tg_mean_YS.nc\",\n", " \"CNRM-CM5\": \"EnsembleStats/BCCAQv2+ANUSPLIN300_CNRM-CM5_historical+rcp45_r1i1p1_1970-2050_tg_mean_YS.nc\",\n", "}\n", "\n", "for d in datasets:\n", " file = nimbus().fetch(datasets[d], downloader=downloader)\n", " ds = xr.open_dataset(file).isel(lon=slice(0, 4), lat=slice(0, 4))\n", " ds = xs.climatological_op(\n", " ds,\n", " op=\"mean\",\n", " window=30,\n", " periods=[[1981, 2010], [2021, 2050]],\n", " horizons_as_dim=True,\n", " ).drop_vars(\"time\")\n", " datasets[d] = xs.compute_deltas(ds, reference_horizon=\"1981-2010\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Preparing the data\n", "\n", "Ensemble reduction is built upon climate indicators that are relevant to represent the ensemble's variability for a given application. In this case, we'll use the mean temperature delta between 2021-2050 and 1981-2010, but monthly or seasonal indicators could also be required. The `horizons_as_dim` argument in `climatological_op` can help combine indicators of multiple frequencies into a single dataset. Alternatively, `xscen.utils.unstack_dates` can also accomplish the same thing if the climatological operations have already been computed.\n", "\n", "The functions implemented in `xclim.ensembles._reduce` require a very specific 2-D DataArray of dimensions \"realization\" and \"criteria\". The first solution is to first create an ensemble using `xclim.ensembles.create_ensemble`, then pass the result to `xclim.ensembles.make_criteria`. Alternatively, the datasets can be passed directly to `xscen.ensembles.reduce_ensemble` and the necessary preliminary steps will be accomplished automatically.\n", "\n", "In this example, the number of criteria will corresponds to: `indicators x horizons x longitude x latitude`, but criteria that are purely NaN across all realizations will be removed.\n", "\n", "Note that `xs.spatial_mean` could have been used prior to calling that function to remove the spatial dimensions.\n", "\n", "### Selecting a reduced ensemble\n", "\n", "
NOTE\n", " \n", "Ensemble reduction in `xscen` is built upon `xclim.ensembles`. For more information on basic usage and available methods, [please consult their documentation](https://xclim.readthedocs.io/en/stable/notebooks/ensembles-advanced.html).\n", "
\n", "\n", "Ensemble reduction through `xscen.reduce_ensemble` consists in a simple call to `xclim`. The arguments are:\n", "- `data`, which is the aforementioned 2D DataArray, or the list/dict of datasets required to build it.\n", "- `method` is either `kkz` or `kmeans`. See the link above for further details on each technique.\n", "- `horizons` is used to instruct on which horizon(s) to build the data from, if data needs to be constructed.\n", "- `create_kwargs`, the arguments to pass to `xclim.ensembles.create_ensemble` if data needs to be constructed.\n", "- `kwargs` is a dictionary of arguments to send to the clustering method chosen." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "selected, clusters, fig_data = xs.reduce_ensemble(\n", " data=datasets, method=\"kmeans\", horizons=[\"2021-2050\"], max_clusters=3\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The method always returns 3 outputs (selected, clusters, fig_data):\n", "- `selected` is a DataArray of dimension 'realization' listing the selected simulations.\n", "- `clusters` (kmeans only) groups every realization in their respective clusters in a python dictionary.\n", "- `fig_data` (kmeans only) can be used to call `xclim.ensembles.plot_rsqprofile(fig_data)`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "selected" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# To see the clusters in more details\n", "clusters" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from xclim.ensembles import plot_rsqprofile\n", "\n", "plot_rsqprofile(fig_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ensemble partition\n", "This tutorial will show how to use xscen to create the input for [xclim partition functions](https://xclim.readthedocs.io/en/stable/api.html#uncertainty-partitioning)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "is_executing": true }, "outputs": [], "source": [ "# Get catalog\n", "from pathlib import Path\n", "\n", "import xclim as xc\n", "\n", "output_folder = Path().absolute() / \"_data\"\n", "cat = xs.DataCatalog(str(output_folder / \"tutorial-catalog.json\"))\n", "\n", "# create a dictionary of datasets wanted for the partition\n", "input_dict = cat.search(variable=\"tas\", member=\"r1i1p1f1\").to_dataset_dict(\n", " xarray_open_kwargs={\"engine\": \"h5netcdf\"}\n", ")\n", "datasets = {}\n", "for k, v in input_dict.items():\n", " ds = xc.atmos.tg_mean(v.tas).to_dataset()\n", " ds.attrs = v.attrs\n", " datasets[k] = ds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From a dictionary of datasets, the function creates a dataset with new dimensions in `partition_dim`(`[\"source\", \"experiment\", \"bias_adjust_project\"]`, if they exist). In this toy example, we only have different experiments.\n", "- By default, it translates the xscen vocabulary (eg. `experiment`) to the xclim partition vocabulary (eg. `scenario`). It is possible to pass `rename_dict` to rename the dimensions with other names.\n", "- If the inputs are not on the same grid, they can be regridded through `regrid_kw` or subset to a point through `subset_kw`. The functions assumes that if there are different `bias_adjust_project`, they will be on different grids (with all `source` on the same grid). If there is one or less `bias_adjust_project`, the assumption is that`source` have different grids.\n", "- You can also compute indicators on the data if the input is daily. This can be especially useful when the daily input data is on different calendars." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# build a single dataset\n", "import xclim as xc\n", "\n", "ds = xs.ensembles.build_partition_data(\n", " datasets,\n", " subset_kw=dict(name=\"mtl\", method=\"gridpoint\", lat=[45.5], lon=[-73.6]),\n", ")\n", "ds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pass the input to an xclim partition function." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "# This is a hidden cell.\n", "# extend with fake data to have at least 3 years\n", "import xarray as xr\n", "\n", "ds2 = ds.copy()\n", "ds[\"time\"] = xr.cftime_range(start=\"2001-01-01\", periods=len(ds[\"time\"]), freq=\"YS\")\n", "ds2[\"time\"] = xr.cftime_range(start=\"2003-01-01\", periods=len(ds[\"time\"]), freq=\"YS\")\n", "ds = xr.concat([ds, ds2], dim=\"time\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# compute uncertainty partitioning\n", "mean, uncertainties = xc.ensembles.hawkins_sutton(ds.tg_mean)\n", "uncertainties" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
NOTE\n", " \n", "Note that the [figanos library](https://figanos.readthedocs.io/en/latest/) provides a function `fg.partition` to plot the uncertainties.\n", " \n", "
" ] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "celltoolbar": "Aucun(e)", "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 4 }