{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "# Remove flox spam\n",
    "\n",
    "import logging\n",
    "\n",
    "# Get the logger for the 'flox' package\n",
    "logger = logging.getLogger(\"flox\")\n",
    "# Set the logging level to WARNING\n",
    "logger.setLevel(logging.WARNING)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1",
   "metadata": {},
   "source": [
    "# Warming levels\n",
    "\n",
    "This Notebook explores the options provided in `xscen` to analyze climate simulations through the scope of global warming levels, instead of temporal horizons.\n",
    "\n",
    "First, we just need to prepare some data. We'll use NorESM2-MM as our example dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Basic imports\n",
    "from pathlib import Path\n",
    "\n",
    "import xarray as xr\n",
    "import xesmf as xe\n",
    "from matplotlib import pyplot as plt\n",
    "\n",
    "import xscen as xs\n",
    "from xscen.testing import datablock_3d, fake_data\n",
    "\n",
    "# Prepare a Projectcatalog for this Tutorial.\n",
    "output_folder = Path().absolute() / \"_data\"\n",
    "project = {\n",
    "    \"title\": \"example-warminglevel\",\n",
    "    \"description\": \"This is an example catalog for xscen's documentation.\",\n",
    "}\n",
    "pcat = xs.ProjectCatalog(\n",
    "    str(output_folder / \"example-wl.json\"),\n",
    "    project=project,\n",
    "    create=True,\n",
    "    overwrite=True,\n",
    ")\n",
    "\n",
    "# Extract the data needed for the Tutorial\n",
    "cat_sim = xs.search_data_catalogs(\n",
    "    data_catalogs=[str(output_folder / \"tutorial-catalog.json\")],\n",
    "    variables_and_freqs={\"tas\": \"D\"},\n",
    "    other_search_criteria={\"source\": \"NorESM2-MM\", \"activity\": \"ScenarioMIP\"},\n",
    ")\n",
    "region = {\n",
    "    \"name\": \"example-region\",\n",
    "    \"method\": \"bbox\",\n",
    "    \"tile_buffer\": 1.5,\n",
    "    \"lon_bnds\": [-75, -74],\n",
    "    \"lat_bnds\": [45, 46],\n",
    "}\n",
    "\n",
    "for ds_id, dc in cat_sim.items():\n",
    "    ds = xs.extract_dataset(\n",
    "        catalog=dc,\n",
    "        region=region,\n",
    "        xr_open_kwargs={\"drop_variables\": [\"height\", \"time_bnds\"]},\n",
    "    )[\"D\"]\n",
    "    # Since the sample files are very small, we'll create fake data covering a longer time period\n",
    "    data = fake_data(\n",
    "        nyears=121,\n",
    "        ny=len(ds.lat),\n",
    "        nx=len(ds.lon),\n",
    "        rand_type=\"tas\",\n",
    "        seed=list(cat_sim.keys()).index(ds_id),\n",
    "        amplitude=15,\n",
    "        offset=2,\n",
    "    )\n",
    "    attrs = ds.attrs\n",
    "    ds = datablock_3d(\n",
    "        data,\n",
    "        \"tas\",\n",
    "        \"lon\",\n",
    "        -75,\n",
    "        \"lat\",\n",
    "        45,\n",
    "        x_step=1,\n",
    "        y_step=1.5,\n",
    "        start=\"1/1/1981\",\n",
    "        freq=\"D\",\n",
    "        as_dataset=True,\n",
    "    )\n",
    "    ds.attrs = attrs\n",
    "\n",
    "    filename = str(\n",
    "        output_folder\n",
    "        / f\"wl_{ds.attrs['cat:id']}.{ds.attrs['cat:domain']}.{ds.attrs['cat:processing_level']}.{ds.attrs['cat:frequency']}.zarr\"\n",
    "    )\n",
    "    chunks = xs.io.estimate_chunks(ds, dims=[\"time\"], target_mb=50)\n",
    "    xs.save_to_zarr(ds, filename, rechunk=chunks, mode=\"o\")\n",
    "    pcat.update_from_ds(ds=ds, path=filename, info_dict={\"format\": \"zarr\"})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3",
   "metadata": {},
   "source": [
    "## Find warming levels with only the model name\n",
    "\n",
    "If all that you want to know is the year or the period during which a climate model reaches a given warming level, then ``xs.get_period_from_warming_level`` is the function to use since you can simply give it a string or a list of strings and receive that information.\n",
    "\n",
    "The usual arguments of ``xs.get_period_from_warming_level`` are:\n",
    "\n",
    "- `realization`: Dataset, dict or string.\n",
    "    * Strings should follow the format 'mip-era_source_experiment_member'. Those fields should be found in the dict or in the attributes of the dataset (allowing for a possible 'cat:' prefix).\n",
    "    * In all cases, regex is allowed to relax the name matching.\n",
    "    * The \"source\" part can also be a `driving_model` name. If a `Dataset` is passed and it's `driving_model` attribute is non-null, it is used.\n",
    "- `wl`: warming level.\n",
    "- `window`: Number of years in the centered window during which the warming level is reached. Note that in the case of an even number, the IPCC standard is used (-n/2+1, +n/2).\n",
    "- `tas_baseline_period`: The period over which the warming level is calculated, equivalent to \"+0°C\". Defaults to 1850-1900.\n",
    "- `ignore_member`: The default `tas_src` only contains data for 1 member. If you want a result regardless of the realization number, set this to True.\n",
    "- `return_central_year`: Whether to return the start/end of the period or to return the middle year.\n",
    "\n",
    "It returns either a string or `['start_yr', 'end_yr']`, depending on `return_central_year`. For entries that it fails to find in the database, or for instances where a given warming level is not reached, the function returns None (or `[None, None]`).\n",
    "\n",
    "If `realization` is a list of the accepted types, or a DataArray or a DataFrame, the function returns a sequence of the same size (and with the same index, if relevant). It can happen that a requested model's name was not found exactly in the database, but that arguments allowed for a relaxed search (`ignore_member = True` or regex in  `realization`). In that case, the _selected_ model doesn't have the same name as the requested one and this information is only shown in the log, unless one passes `output='selected'` to receive a dictionary instead where the keys are the _selected_ models in the database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Multiple entries, returns a list of the same length\n",
    "print(\n",
    "    xs.get_period_from_warming_level(\n",
    "        [\n",
    "            \"CMIP6_CanESM5_ssp126_r1i1p1f1\",\n",
    "            \"CMIP6_CanESM5_ssp245_r1i1p1f1\",\n",
    "            \"CMIP6_CanESM5_ssp370_r1i1p1f1\",\n",
    "            \"CMIP6_CanESM5_ssp585_r1i1p1f1\",\n",
    "        ],\n",
    "        wl=2,\n",
    "        window=20,\n",
    "    )\n",
    ")\n",
    "# Returns a list by default\n",
    "print(\n",
    "    xs.get_period_from_warming_level(\"CMIP6_CanESM5_ssp585_r1i1p1f1\", wl=2, window=20)\n",
    ")\n",
    "# Only the middle year is requested, returns a string\n",
    "print(\n",
    "    xs.get_period_from_warming_level(\n",
    "        \"CMIP6_CanESM5_ssp585_r1i1p1f1\", wl=2, window=20, return_central_year=True\n",
    "    )\n",
    ")\n",
    "# +10°C is never reached, returns None\n",
    "print(\n",
    "    xs.get_period_from_warming_level(\"CMIP6_CanESM5_ssp585_r1i1p1f1\", wl=10, window=20)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": [
    "The opposite search, getting the warming level associated to a given time period, is possible through a sister function `get_warming_level_from_period`. The function takes similar arguments, but with `wl` and `window` replaced by a `period` argument that follows the usual `period` formatting (e.g. `[\"2041\", \"2070\"]`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the warming level associated to 2041-2070\n",
    "print(\n",
    "    xs.get_warming_level_from_period(\"CMIP6_CanESM5_ssp585_r1i1p1f1\", [\"2041\", \"2070\"])\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7",
   "metadata": {},
   "source": [
    "## Find and extract data by warming levels\n",
    "\n",
    "If you instead need to subset and analyze data, then two options are currently provided in `xscen`: `subset_warming_level` and `produce_horizon`.\n",
    "- Use `subset_warming_level` when you want to cut a dataset for a period corresponding to a given warming level, but leave its frequency untouched.\n",
    "- Use `produce_horizon` when you need to: subset a time period, compute indicators, and compute the climatological mean for one or multiple horizons.\n",
    "\n",
    "The two methods are detailed in the following section.\n",
    "\n",
    "### Method #1: Subsetting datasets by warming level\n",
    "\n",
    "``xs.subset_warming_level`` can be used to subset a dataset for a window over which a given global warming level is reached. A new dimension named `warminglevel` is created by the function.\n",
    "\n",
    "The function calls `get_period_from_warming_level`, so the arguments are essentially the same.:\n",
    "\n",
    "- `ds`: input dataset.\n",
    "- `wl`: warming level.\n",
    "- `window`: Number of years in the centered window during which the warming level is reached. Note that in the case of an even number, the IPCC standard is used (-n/2+1, +n/2).\n",
    "- `tas_baseline_period`: The period over which the warming level is calculated, equivalent to \"+0°C\". Defaults to 1850-1900.\n",
    "- `ignore_member`: The default database only contains data for 1 member. If you want a result regardless of the realization number, set this to True.\n",
    "- `to_level`: Contrary to other methods, you can use \"{wl}\", \"{period0}\" and \"{period1}\" in the string to dynamically include `wl`, 'tas_baseline_period[0]' and 'tas_baseline_period[1]' in the `processing_level`.\n",
    "- `wl_dim`: The string used to fill the new `warminglevel` dimension. You can use \"{wl}\", \"{period0}\" and \"{period1}\" in the string to dynamically include `wl`, `tas_baseline_period[0]` and `tas_baseline_period[1]`. Or you can use `True` to have a float coordinate with units of °C. If None, no new dimension will be added.\n",
    "\n",
    "If the source, experiment, (member), and warming level are not found in the database. The function returns None."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "ds = pcat.search(\n",
    "    processing_level=\"extracted\",\n",
    "    experiment=\"ssp245\",\n",
    "    member=\"r1.*\",\n",
    "    source=\"NorESM2-MM\",\n",
    "    frequency=\"day\",\n",
    ").to_dataset()\n",
    "\n",
    "xs.subset_warming_level(\n",
    "    ds,\n",
    "    wl=2,\n",
    "    window=20,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9",
   "metadata": {},
   "source": [
    "#### Vectorized subsetting\n",
    "\n",
    "The function can also vectorize the subsetting over multiple warming levels or over a properly constructed \"realization\" dimension. In that case, the original time axis can't be preserved. It is replaced by a fake one starting in 1000. However, as this process is a bit complex, the current xscen version only supports this if the data is annual. As the time axis doesn't carry any information, a `warminglevel_bounds` coordinate is added with the time bounds of the subsetting. If a warming level was not reached, a NaN slice is inserted in the output dataset.\n",
    "\n",
    "This option is to be used when \"scalar\" subsetting is not enough, but you want to do things differently than `produce_horizons`.\n",
    "\n",
    "Here, we'll open all experiments into a single ensemble dataset where the `realization` dimension is constructed exactly as `get_period_from_warming_level` expects it to be. We'll also average the daily data to an annual scale."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds = pcat.search(\n",
    "    processing_level=\"extracted\",\n",
    "    member=\"r1.*\",\n",
    "    frequency=\"day\",\n",
    ").to_dataset(\n",
    "    # Value of the \"realization\" dimension will be constructed by concatenating those fields with a '_'\n",
    "    create_ensemble_on=[\"mip_era\", \"source\", \"experiment\", \"member\"]\n",
    ")\n",
    "ds = ds.resample(time=\"YS\").mean()\n",
    "ds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11",
   "metadata": {},
   "outputs": [],
   "source": [
    "xs.subset_warming_level(ds, wl=[1.5, 2, 3], wl_dim=True, to_level=\"warming-level\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12",
   "metadata": {},
   "source": [
    "### Method #2: Producing horizons\n",
    "\n",
    "If what you need is to compute indicators and their climatological mean, ``xs.aggregate.produce_horizon`` is a more convenient function to work with than `subset_warming_level`. Since the years are meaningless for warming levels, and are even detrimental to making ensemble statistics, the function formats the output as to remove the 'time' and 'year' information from the dataset, while the seasons/months are unstacked to different coordinates. Hence, the single dataset outputted can contain indicators of different frequencies, as well as multiple warming levels or temporal horizons.\n",
    "\n",
    "The arguments of ``xs.aggregate.produce_horizon`` are:\n",
    "\n",
    "- `ds`: input dataset.\n",
    "- `indicators`: As in `compute_indicators`\n",
    "- `periods`: Periods to cut.\n",
    "- `warminglevels`: Dictionary of arguments to pass to `subset_warming_level`. If 'wl' is a list, the function will be called for each value and produce multiple horizons.\n",
    "\n",
    "If both periods and warminglevels are None, the full time series will be used. If a dataset does not contain a given period or warming level, then that specific period will be skipped."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "dict_input = pcat.search(processing_level=\"extracted\", xrfreq=\"D\").to_dataset_dict(\n",
    "    xarray_open_kwargs={\"decode_timedelta\": False}\n",
    ")\n",
    "\n",
    "for id_input, ds_input in dict_input.items():\n",
    "    # 1981-2010 will be used as our reference period. We can compute it at the same time.\n",
    "    ds_hor = xs.produce_horizon(\n",
    "        ds_input,\n",
    "        indicators=\"samples/indicators.yml\",\n",
    "        periods=[[\"1981\", \"2010\"]],\n",
    "        warminglevels={\"wl\": [1, 1.5, 2], \"window\": 30, \"ignore_member\": True},\n",
    "        to_level=\"horizons\",\n",
    "    )\n",
    "\n",
    "    # Save\n",
    "    filename = str(\n",
    "        output_folder\n",
    "        / f\"wl_{ds_hor.attrs['cat:id']}.{ds_hor.attrs['cat:domain']}.{ds_hor.attrs['cat:processing_level']}.{ds_hor.attrs['cat:frequency']}.zarr\"\n",
    "    )\n",
    "    xs.save_to_zarr(ds_hor, filename, mode=\"o\")\n",
    "    pcat.update_from_ds(ds=ds_hor, path=filename, info_dict={\"format\": \"zarr\"})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "display(ds_hor)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15",
   "metadata": {},
   "source": [
    "## Deltas and spatial aggregation\n",
    "\n",
    "This step is done as in the [Getting Started](2_getting_started.ipynb#Computing-deltas) Notebook. Here we will spatially aggregate the data, but the datasets could also be regridded to a common grid."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "dict_wl = pcat.search(processing_level=\"horizons\").to_dataset_dict(\n",
    "    xarray_open_kwargs={\"decode_timedelta\": False}\n",
    ")\n",
    "\n",
    "for id_wl, ds_wl in dict_wl.items():\n",
    "    # compute delta\n",
    "    ds_delta = xs.aggregate.compute_deltas(\n",
    "        ds=ds_wl, reference_horizon=\"1981-2010\", to_level=\"deltas\"\n",
    "    )\n",
    "\n",
    "    # remove the reference period from the dataset\n",
    "    ds_delta = ds_delta.sel(horizon=~ds_delta.horizon.isin([\"1981-2010\"]))\n",
    "\n",
    "    # aggregate\n",
    "    ds_delta[\"lon\"].attrs[\"axis\"] = \"X\"\n",
    "    ds_delta[\"lat\"].attrs[\"axis\"] = \"Y\"\n",
    "    ds_agg = xs.spatial_mean(\n",
    "        ds_delta,\n",
    "        method=\"cos-lat\",\n",
    "        to_level=\"deltas-agg\",\n",
    "    )\n",
    "\n",
    "    # Save\n",
    "    filename = str(\n",
    "        output_folder\n",
    "        / f\"wl_{ds_agg.attrs['cat:id']}.{ds_agg.attrs['cat:domain']}.{ds_agg.attrs['cat:processing_level']}.{ds_agg.attrs['cat:frequency']}.zarr\"\n",
    "    )\n",
    "    xs.save_to_zarr(ds_agg, filename, mode=\"o\")\n",
    "    pcat.update_from_ds(ds=ds_agg, path=filename, info_dict={\"format\": \"zarr\"})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "display(ds_agg)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18",
   "metadata": {},
   "source": [
    "## Ensemble statistics\n",
    "\n",
    "Even more than with time-based horizons, the first step of ensemble statistics should be to generate the weights. Indeed, if a model has 3 experiments reaching a given warming level, we want it to have the same weight as a model with only 2 experiments reaching that warming level. The argument `skipna=False` should be passed to `xs.generate_weights` to properly assess which simulations reaches which warming level. If the `horizon` dimension differs between datasets (as is the case here), they are reindexed and given a weight of 0.\n",
    "\n",
    "When working with warming levels, how to assess experiments is more open-ended. The IPCC Atlas splits the statistics and climate change signals by SSPs, even when they are being analyzed through warming levels, but experiments could also be considered as 'members' of a given model and used to bolster the number of realizations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "19",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "datasets = pcat.search(processing_level=\"deltas-agg\").to_dataset_dict()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# All realisations of NorESM2-MM are given the same weight for the first horizon, while it correctly recognizes that +1.5°C and +2°C weren't reached for the SSP126.\n",
    "weights = xs.ensembles.generate_weights(\n",
    "    datasets=datasets, independence_level=\"model\", skipna=False\n",
    ")\n",
    "display(weights)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21",
   "metadata": {},
   "source": [
    "Next, the weights and the datasets can be passed to `xs.ensemble_stats` to calculate the ensemble statistics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "ds_ens = xs.ensemble_stats(\n",
    "    datasets=datasets,\n",
    "    common_attrs_only=True,\n",
    "    weights=weights,\n",
    "    statistics={\"ensemble_mean_std_max_min\": None},\n",
    "    to_level=f\"ensemble-deltas-wl\",\n",
    ")\n",
    "\n",
    "# It is sometimes useful to keep track of how many realisations made the ensemble.\n",
    "ds_ens.horizon.attrs[\"ensemble_size\"] = len(datasets)\n",
    "\n",
    "filename = str(\n",
    "    output_folder\n",
    "    / f\"wl_{ds_ens.attrs['cat:id']}.{ds_ens.attrs['cat:domain']}.{ds_ens.attrs['cat:processing_level']}.{ds_ens.attrs['cat:frequency']}.zarr\"\n",
    ")\n",
    "xs.save_to_zarr(ds_ens, filename, mode=\"o\")\n",
    "pcat.update_from_ds(ds=ds_ens, path=filename, info_dict={\"format\": \"zarr\"})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23",
   "metadata": {},
   "outputs": [],
   "source": [
    "display(ds_ens)"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}