{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "# Remove flox spam\n",
    "\n",
    "import logging\n",
    "\n",
    "# Get the logger for the 'flox' package\n",
    "logger = logging.getLogger(\"flox\")\n",
    "# Set the logging level to WARNING\n",
    "logger.setLevel(logging.WARNING)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ensembles"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Ensemble reduction\n",
    "\n",
    "This tutorial will explore ensemble reduction (also known as ensemble selection) using `xscen`. This will use pre-computed annual mean temperatures from `xclim.testing`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pooch\n",
    "import xarray as xr\n",
    "from xclim.testing.utils import nimbus\n",
    "\n",
    "import xscen as xs\n",
    "\n",
    "downloader = pooch.HTTPDownloader(headers={\"User-Agent\": f\"xscen-{xs.__version__}\"})\n",
    "\n",
    "datasets = {\n",
    "    \"ACCESS\": \"EnsembleStats/BCCAQv2+ANUSPLIN300_ACCESS1-0_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc\",\n",
    "    \"BNU-ESM\": \"EnsembleStats/BCCAQv2+ANUSPLIN300_BNU-ESM_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc\",\n",
    "    \"CCSM4-r1\": \"EnsembleStats/BCCAQv2+ANUSPLIN300_CCSM4_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc\",\n",
    "    \"CCSM4-r2\": \"EnsembleStats/BCCAQv2+ANUSPLIN300_CCSM4_historical+rcp45_r2i1p1_1950-2100_tg_mean_YS.nc\",\n",
    "    \"CNRM-CM5\": \"EnsembleStats/BCCAQv2+ANUSPLIN300_CNRM-CM5_historical+rcp45_r1i1p1_1970-2050_tg_mean_YS.nc\",\n",
    "}\n",
    "\n",
    "for d in datasets:\n",
    "    file = nimbus().fetch(datasets[d], downloader=downloader)\n",
    "    ds = xr.open_dataset(file).isel(lon=slice(0, 4), lat=slice(0, 4))\n",
    "    ds = xs.climatological_op(\n",
    "        ds,\n",
    "        op=\"mean\",\n",
    "        window=30,\n",
    "        periods=[[1981, 2010], [2021, 2050]],\n",
    "        horizons_as_dim=True,\n",
    "    ).drop_vars(\"time\")\n",
    "    datasets[d] = xs.compute_deltas(ds, reference_horizon=\"1981-2010\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Preparing the data\n",
    "\n",
    "Ensemble reduction is built upon climate indicators that are relevant to represent the ensemble's variability for a given application. In this case, we'll use the mean temperature delta between 2021-2050 and 1981-2010, but monthly or seasonal indicators could also be required. The `horizons_as_dim` argument in `climatological_op` can help combine indicators of multiple frequencies into a single dataset. Alternatively, `xscen.utils.unstack_dates` can also accomplish the same thing if the climatological operations have already been computed.\n",
    "\n",
    "The functions implemented in `xclim.ensembles._reduce` require a very specific 2-D DataArray of dimensions \"realization\" and \"criteria\". The first solution is to first create an ensemble using `xclim.ensembles.create_ensemble`, then pass the result to `xclim.ensembles.make_criteria`. Alternatively, the datasets can be passed directly to `xscen.ensembles.reduce_ensemble` and the necessary preliminary steps will be accomplished automatically.\n",
    "\n",
    "In this example, the number of criteria will corresponds to: `indicators x horizons x longitude x latitude`, but criteria that are purely NaN across all realizations will be removed.\n",
    "\n",
    "Note that `xs.spatial_mean` could have been used prior to calling that function to remove the spatial dimensions.\n",
    "\n",
    "### Selecting a reduced ensemble\n",
    "\n",
    "<div class=\"alert alert-info\"> <b>NOTE</b>\n",
    "    \n",
    "Ensemble reduction in `xscen` is built upon `xclim.ensembles`. For more information on basic usage and available methods, [please consult their documentation](https://xclim.readthedocs.io/en/stable/notebooks/ensembles-advanced.html).\n",
    "</div>\n",
    "\n",
    "Ensemble reduction through `xscen.reduce_ensemble` consists in a simple call to `xclim`. The arguments are:\n",
    "- `data`, which is the aforementioned 2D DataArray, or the list/dict of datasets required to build it.\n",
    "- `method` is either `kkz` or `kmeans`. See the link above for further details on each technique.\n",
    "- `horizons` is used to instruct on which horizon(s) to build the data from, if data needs to be constructed.\n",
    "- `create_kwargs`, the arguments to pass to `xclim.ensembles.create_ensemble` if data needs to be constructed.\n",
    "- `kwargs` is a dictionary of arguments to send to the clustering method chosen."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "selected, clusters, fig_data = xs.reduce_ensemble(\n",
    "    data=datasets, method=\"kmeans\", horizons=[\"2021-2050\"], max_clusters=3\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The method always returns 3 outputs (selected, clusters, fig_data):\n",
    "- `selected` is a DataArray of dimension 'realization' listing the selected simulations.\n",
    "- `clusters` (kmeans only) groups every realization in their respective clusters in a python dictionary.\n",
    "- `fig_data` (kmeans only) can be used to call `xclim.ensembles.plot_rsqprofile(fig_data)`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "selected"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# To see the clusters in more details\n",
    "clusters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from xclim.ensembles import plot_rsqprofile\n",
    "\n",
    "plot_rsqprofile(fig_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Ensemble partition\n",
    "This tutorial will show how to use xscen to create the input for [xclim partition functions](https://xclim.readthedocs.io/en/stable/api.html#uncertainty-partitioning)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "is_executing": true
   },
   "outputs": [],
   "source": [
    "# Get catalog\n",
    "from pathlib import Path\n",
    "\n",
    "import xclim as xc\n",
    "\n",
    "output_folder = Path().absolute() / \"_data\"\n",
    "cat = xs.DataCatalog(str(output_folder / \"tutorial-catalog.json\"))\n",
    "\n",
    "# create a dictionary of datasets wanted for the partition\n",
    "input_dict = cat.search(variable=\"tas\", member=\"r1i1p1f1\").to_dataset_dict(\n",
    "    xarray_open_kwargs={\"engine\": \"h5netcdf\"}\n",
    ")\n",
    "datasets = {}\n",
    "for k, v in input_dict.items():\n",
    "    ds = xc.atmos.tg_mean(v.tas).to_dataset()\n",
    "    ds.attrs = v.attrs\n",
    "    datasets[k] = ds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From a dictionary of datasets, the function creates a dataset with new dimensions in `partition_dim`(`[\"source\", \"experiment\", \"bias_adjust_project\"]`, if they exist). In this toy example, we only have different experiments.\n",
    "- By default, it translates the xscen vocabulary (eg. `experiment`) to the xclim partition vocabulary (eg. `scenario`). It is possible to pass `rename_dict` to rename the dimensions with other names.\n",
    "- If the inputs are not on the same grid, they can be regridded through `regrid_kw` or subset to a point through `subset_kw`. The functions assumes that if there are different `bias_adjust_project`, they will be on different grids (with all `source` on the same grid). If there is one or less `bias_adjust_project`, the assumption is that`source` have different grids.\n",
    "- You can also compute indicators on the data if the input is daily. This can be especially useful when the daily input data is on different calendars."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# build a single dataset\n",
    "import xclim as xc\n",
    "\n",
    "ds = xs.ensembles.build_partition_data(\n",
    "    datasets,\n",
    "    subset_kw=dict(name=\"mtl\", method=\"gridpoint\", lat=[45.5], lon=[-73.6]),\n",
    ")\n",
    "ds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pass the input to an xclim partition function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "# This is a hidden cell.\n",
    "# extend with fake data to have at least 3 years\n",
    "import xarray as xr\n",
    "\n",
    "ds2 = ds.copy()\n",
    "ds[\"time\"] = xr.cftime_range(start=\"2001-01-01\", periods=len(ds[\"time\"]), freq=\"YS\")\n",
    "ds2[\"time\"] = xr.cftime_range(start=\"2003-01-01\", periods=len(ds[\"time\"]), freq=\"YS\")\n",
    "ds = xr.concat([ds, ds2], dim=\"time\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# compute uncertainty partitioning\n",
    "mean, uncertainties = xc.ensembles.hawkins_sutton(ds.tg_mean)\n",
    "uncertainties"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\"> <b>NOTE</b>\n",
    "    \n",
    "Note that the [figanos library](https://figanos.readthedocs.io/en/latest/) provides a function `fg.partition` to plot the uncertainties.\n",
    "    \n",
    "</div>"
   ]
  }
 ],
 "metadata": {
  "@webio": {
   "lastCommId": null,
   "lastKernelId": null
  },
  "celltoolbar": "Aucun(e)",
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}