2. Getting Started
This Notebook will go through all major steps of creating a climate scenario using xscen
. These steps are:
search_data_catalogs
to find a subset of datasets that match a given project’s requirements.extract_dataset
to extract them.regrid_dataset
to regrid all data to a common grid.train
andadjust
to bias correct the raw simulations.compute_indicators
to compute a list of indicators.climatological_op
andspatial_mean
for spatio-temporal aggregation.compute_deltas
to compute deltas.ensemble_stats
for ensemble statistics.clean_up
for minor adjustments that have to be made in preparation for the final product.
2.1. Initialisation
Typically, the first step should be to create a new ProjectCatalog to store the files that will be created during the process. More details on basic usage are provided in the Catalogs Notebook.
[2]:
from pathlib import Path
# A temporary bug fix waiting for xclim 0.49
import xclim as xc
import xscen as xs
xc.set_options(sdba_encode_cf=False)
# Folder where to put the data
output_folder = Path().absolute() / "_data"
output_folder.mkdir(exist_ok=True)
project = {
"title": "example-gettingstarted",
"description": "This is an example catalog for xscen's documentation.",
}
pcat = xs.ProjectCatalog(
str(output_folder / "example-gettingstarted.json"),
create=True,
project=project,
overwrite=True,
)
Successfully wrote ESM catalog json file to: file:///home/docs/checkouts/readthedocs.org/user_builds/xscen/checkouts/stable/docs/notebooks/_data/example-gettingstarted.json
ERROR 1: PROJ: proj_create_from_database: Open of /home/docs/checkouts/readthedocs.org/user_builds/xscen/conda/stable/share/proj failed
2.1.1. Searching a subset of datasets within DataCatalogs
INFO
At this stage, the search criteria should be for variables that will be bias corrected, not necessarily the variables required for the final product. For example, if sfcWindfromdir
is the final product, then uas
and vas
should be searched for since these are the variables that will be bias corrected.
xs.search_data_catalogs
is used to consult a list of DataCatalogs and find a subset of datasets that match given search parameters. More details on that function and possible usage are given in the Understanding Catalogs Notebook.
The function also plays the double role of preparing certain arguments for the extraction function, as detailed in the relevant section of this tutorial.
Due to how different reference datasets are from climate simulations, this function might have to be called multiple times and the results concatenated into a single dictionary.
For the purpose of this tutorial, temperatures and the land fraction from NorESM2-MM will be used:
[3]:
variables_and_freqs = {"tas": "D", "sftlf": "fx"}
other_search_criteria = {
"source": ["NorESM2-MM"],
"processing_level": ["raw"],
"experiment": "ssp245",
}
cat_sim = xs.search_data_catalogs(
data_catalogs=[str(output_folder / "tutorial-catalog.json")],
variables_and_freqs=variables_and_freqs,
other_search_criteria=other_search_criteria,
periods=[2001, 2002],
match_hist_and_fut=True,
)
cat_sim
[3]:
{'CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r2i1p1f1_example-region': <tutorial-catalog catalog with 2 dataset(s) from 2 asset(s)>,
'CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1f1_example-region': <tutorial-catalog catalog with 2 dataset(s) from 2 asset(s)>}
The result of search_data_catalog
is a dictionary with one entry per unique ID. Note that a unique ID can be associated to multiple intake datasets, as is the case here, because intake-esm
groups catalog lines per id - domain - processing_level - xrfeq.
[4]:
cat_sim["CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1f1_example-region"].df
[4]:
id | type | processing_level | bias_adjust_institution | bias_adjust_project | mip_era | activity | driving_model | institution | source | ... | member | xrfreq | frequency | variable | domain | date_start | date_end | version | format | path | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1... | simulation | raw | NaN | NaN | CMIP6 | ScenarioMIP | NaN | NCC | NorESM2-MM | ... | r1i1p1f1 | D | day | (tas,) | example-region | 2001-01-01 | 2002-12-31 | NaN | nc | /home/docs/checkouts/readthedocs.org/user_buil... |
1 | CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1... | simulation | raw | NaN | NaN | CMIP6 | ScenarioMIP | NaN | NCC | NorESM2-MM | ... | r1i1p1f1 | fx | fx | (sftlf,) | example-region | NaT | NaT | NaN | nc | /home/docs/checkouts/readthedocs.org/user_buil... |
2 rows × 21 columns
2.2. Extracting data
WARNING
It is heavily recommended to stop and analyse the results of search_data_catalogs
before proceeding to the extraction function.
2.2.1. Defining the region
The region for a given project is defined using a dictionary with the relevant information to be used by clisops.core.subset
. The required fields are as follows:
region =
"name": str, {this will overwrite the *domain* column in the catalog}
"method": str, [gridpoint, bbox, shape, sel]
"tile_buffer": float, {approximate number of pixels used to expand the domain based on model resolution}
**kwargs {other arguments to send `clisops`}
The argument tile_buffer (optional) is used to apply a buffer zone around the region that is adjusted dynamically according to model resolution during the extraction process (for bbox and shape only). This is useful to make sure that grid cells that only partially cover the region are selected too.
The documentation for the supported subsetting methods are in the following links:
[5]:
region = {
"name": "example-region",
"method": "bbox",
"tile_buffer": 1.5,
"lon_bnds": [-75, -74],
"lat_bnds": [45, 46],
}
2.2.2. Preparing arguments for xarray
xscen
makes use of intake_esm
’s to_dataset_dict() for the extraction process, which will automatically compute missing variables as required. Also, this function manages Catalogs, IDs, and both NetCDF and Zarr files seamlessly. When the catalog is made of a single dataset, to_dataset()
can be used instead to directly obtain an xr.Dataset instead of a dictionary.
There are a few key differences compared to using xarray directly, one of which being that it uses xr.open_dataset
, even when multiple files are involved, with a subsequent call to xr.combine_by_coords
. Kwargs are therefore separated in two:
xr_open_kwargs
is used for optional arguments inxarray.open_dataset
.xr_combine_kwargs
is used for optional arguments inxarray.combine_by_coords
.
More information on possible kwargs can be obtained here: xarray.open_dataset & xarray.combine_by_coords
[6]:
# Kwargs for xr.open_dataset
xr_open_kwargs = {"drop_variables": ["height", "time_bnds"], "engine": "h5netcdf"}
# Kwargs for xr.combine_by_coords
xr_combine_kwargs = {"data_vars": "minimal"}
2.2.3. Extraction function
Extraction is done on each dataset by calling xs.extract_dataset()
. Since the output could have multiple frequencies, the function returns a python dictionary with keys following the output frequency.
catalog
is the DataCatalog to extract.variables_and_freqs
is the same as previously used forsearch_data_catalogs
.periods
is used to extract specific time periods.to_level
will change the processing_level of the output. Defaults to “extracted”.region
,xr_open_kwargs
, andxr_combine_kwargs
are described above.
NOTE: Calling the extraction function without passing by search_data_catalogs
beforehand is possible, but will not support DerivedVariables.
NOTE
extract_dataset
currently only accepts a single unique ID at a time.
[7]:
# Example with a single simulation
ds_dict = xs.extract_dataset(
catalog=cat_sim["CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1f1_example-region"],
variables_and_freqs=variables_and_freqs,
periods=[2001, 2002],
region=region,
xr_open_kwargs=xr_open_kwargs,
xr_combine_kwargs=xr_combine_kwargs,
)
ds_dict
--> The keys in the returned dictionary of datasets are constructed as follows:
'id.domain.processing_level.xrfreq'
[7]:
{'D': <xarray.Dataset> Size: 41kB
Dimensions: (time: 730, lat: 3, lon: 2)
Coordinates:
* time (time) datetime64[ns] 6kB 2001-01-01 2001-01-02 ... 2002-12-31
* lat (lat) float64 24B 45.0 46.5 48.0
* lon (lon) int64 16B -75 -74
Data variables:
tas (time, lat, lon) float64 35kB dask.array<chunksize=(730, 3, 2), meta=np.ndarray>
Attributes: (12/22)
comment: This is a test file created for the xscen tutori...
version: v20191108
intake_esm_vars: ['tas']
cat:id: CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1f1...
cat:type: simulation
cat:processing_level: extracted
... ...
cat:date_start: 2001-01-01 00:00:00
cat:date_end: 2002-12-31 00:00:00
cat:_data_format_: nc
cat:path: /home/docs/checkouts/readthedocs.org/user_builds...
intake_esm_dataset_key: CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1f1...
history: [2024-05-07 19:19:25] bbox spatial subsetting wi...,
'fx': <xarray.Dataset> Size: 88B
Dimensions: (lat: 3, lon: 2)
Coordinates:
* lat (lat) float64 24B 45.0 46.5 48.0
* lon (lon) int64 16B -75 -74
Data variables:
sftlf (lat, lon) float64 48B dask.array<chunksize=(3, 2), meta=np.ndarray>
Attributes: (12/20)
comment: This is a test file created for the xscen tutori...
version: v20200702
intake_esm_vars: ['sftlf']
cat:id: CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1f1...
cat:type: simulation
cat:processing_level: extracted
... ...
cat:variable: sftlf
cat:domain: example-region
cat:_data_format_: nc
cat:path: /home/docs/checkouts/readthedocs.org/user_builds...
intake_esm_dataset_key: CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1f1...
history: [2024-05-07 19:19:25] bbox spatial subsetting wi...}
2.2.4. Saving files to disk
extract_dataset
does not actually save anything to disk. It simply opens and prepares the files as per requested, using lazy computing. The result is a python dictionary containing the results, separated per xrfreq.
xscen
has two functions for the purpose of saving data: save_to_netcdf
and save_to_zarr
. If possible for a given project, it is strongly recommended to use Zarr files since these are often orders of magnitude faster to read and create compared to NetCDF. They do have a few quirks, however:
Chunk size must separate the dataset in exactly equal chunks (with the exception of the last). While it is recommended to calculate ideal chunking and provide them explicitely to the function,
io.estimate_chunks()
can be used to automatically estimate a decent chunking. In a similar way,io.subset_maxsize()
can be used to roughly cut a dataset along the time dimension into subsets of a given size (in Gb), which is especially useful for saving NetCDF files. Chunk sizes can be passed to the two saving functions in a dictionary. Spatial dimensions can be generalized as'X'
and'Y'
, which will be mapped to the xarray.Dataset’s actual grid type’s dimension names.Default behaviour for a Zarr is to act like a directory, with each new variable being assigned a subfolder. This is great when all required variables have the same dimensions and frequency, but will crash otherwise. If you have daily tasmax and subdaily pr, for example, they need to be assigned different paths.
2.2.4.1. Updating the catalog
intake-esm
will automatically copy the catalog’s entry in the dataset’s metadata, with a cat:attr
format. Where appropriate, xscen
updates that information to keep the metadata up to date with the manipulations. ProjectCatalog.update_from_ds
will in turn read the metadata of a Dataset and fill in the information into a new catalog entry.
This loop means that upon completing a step in the creation of a climate scenario, ProjectCatalog.update_from_ds()
can be called to update the catalog.
[8]:
for ds in ds_dict.values():
filename = str(
output_folder
/ f"{ds.attrs['cat:id']}.{ds.attrs['cat:domain']}.{ds.attrs['cat:processing_level']}.{ds.attrs['cat:frequency']}.zarr"
)
chunks = xs.io.estimate_chunks(ds, dims=["time"], target_mb=50)
xs.save_to_zarr(ds, filename, rechunk=chunks, mode="o")
# Strongly suggested to update the project catalog AFTER you save to disk, in case it crashes during the process
pcat.update_from_ds(ds=ds, path=filename, info_dict={"format": "zarr"})
pcat.df
[8]:
id | type | processing_level | bias_adjust_institution | bias_adjust_project | mip_era | activity | driving_model | institution | source | ... | member | xrfreq | frequency | variable | domain | date_start | date_end | version | format | path | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1... | simulation | extracted | NaN | NaN | CMIP6 | ScenarioMIP | NaN | NCC | NorESM2-MM | ... | r1i1p1f1 | D | day | (tas,) | example-region | 2001-01-01 00:00:00 | 2002-12-31 00:00:00 | NaN | zarr | /home/docs/checkouts/readthedocs.org/user_buil... |
1 | CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1... | simulation | extracted | NaN | NaN | CMIP6 | ScenarioMIP | NaN | NCC | NorESM2-MM | ... | r1i1p1f1 | fx | fx | (sftlf,) | example-region | NaN | NaN | NaN | zarr | /home/docs/checkouts/readthedocs.org/user_buil... |
2 rows × 21 columns
2.2.5. Simplifying the call to extract_dataset() with search_data_catalogs()
When a catalog was produced using search_data_catalogs
, xscen
will automatically save the requested periods and frequencies, in addition to DerivedVariables. This means that these items do not need to be included during the call to extract_dataset
and make it possible to extract datasets that have different requirements (such as reference datasets and future simulations).
[9]:
cat_sim[
"CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1f1_example-region"
]._requested_periods
[9]:
[['2001', '2002']]
[10]:
print(
cat_sim[
"CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1f1_example-region"
]._requested_variables_true
)
print(
cat_sim[
"CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1f1_example-region"
]._requested_variable_freqs
)
['tas', 'sftlf']
['D', 'fx']
Since cat_sim
contains multiple datasets, extracting the data should be done by looping on .items()
or .values()
. Also, since ‘CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp126_r1i1p1f1_example-region’ was extracted in the previous step, pcat.exists_in_cat
can be used to skip re-extracting.
[11]:
for key, dc in cat_sim.items():
if not pcat.exists_in_cat(id=key, processing_level="extracted"):
dset_dict = xs.extract_dataset(
catalog=dc,
region=region,
xr_open_kwargs=xr_open_kwargs,
xr_combine_kwargs=xr_combine_kwargs,
)
for ds in dset_dict.values():
filename = str(
output_folder
/ f"{ds.attrs['cat:id']}.{ds.attrs['cat:domain']}.{ds.attrs['cat:processing_level']}.{ds.attrs['cat:frequency']}.zarr"
)
chunks = xs.io.estimate_chunks(ds, dims=["time"], target_mb=50)
xs.save_to_zarr(ds, filename, rechunk=chunks, mode="o")
# Strongly suggested to update the project catalog AFTER you save to disk, in case it crashes during the process
pcat.update_from_ds(ds=ds, path=filename, info_dict={"format": "zarr"})
--> The keys in the returned dictionary of datasets are constructed as follows:
'id.domain.processing_level.xrfreq'
2.3. Regridding data
NOTE
Regridding in xscen
is built upon xESMF
. For more information on basic usage and available regridding methods, please consult their documentation. Their masking and extrapolation tutorial is of particular interest.
More details on the regridding functions themselves can be found within the ESMPy and ESMF documentation.
The only requirement for using datasets in xESMF
is that they contain lon and lat, with mask as an optional data variable. Using these, the package can manage both regular and rotated grids. The main advantage of xESMF
compared to other tools such as scipy’s griddata, in addition to the fact that the methods are climate science-based, is that the transformation weights are calculated once and broadcasted on the time dimension.
2.3.1. Preparing the destination grid
xscen
currently does not explicitely support any function to create a destination grid. If required, however, xESMF
itself has utilities that can easily create custom regular grids, such as xesmf.util.cf_grid_2d
.
[13]:
import xesmf
ds_grid = xesmf.util.cf_grid_2d(-75, -74, 0.25, 45, 48, 0.55)
# cf_grid_2d does not set the 'axis' attribute
ds_grid["lon"].attrs["axis"] = "X"
ds_grid["lat"].attrs["axis"] = "Y"
# xscen will use the domain to re-assign attributes, so it is important to set it up for custom grids like this
ds_grid.attrs["cat:domain"] = "finer-grid"
ds_grid
[13]:
<xarray.Dataset> Size: 224B Dimensions: (lon: 4, bound: 2, lat: 5) Coordinates: * lon (lon) float64 32B -74.88 -74.62 -74.38 -74.12 * lat (lat) float64 40B 45.27 45.82 46.37 46.92 47.47 latitude_longitude float64 8B nan Dimensions without coordinates: bound Data variables: lon_bounds (lon, bound) float64 64B -75.0 -74.75 ... -74.25 -74.0 lat_bounds (lat, bound) float64 80B 45.0 45.55 45.55 ... 47.2 47.75 Attributes: cat:domain: finer-grid
2.3.2. Masking grid cells
Masks can be used on both the original grid and the destination grid to ignore certain grid cells during the regridding process. These masks follow the ESMF
convention, meaning that the mask is a variable within the Dataset, named mask and comprised of 0 and 1.
xs.create_mask
will create an adequate DataArray, following the instructions given by mask_args. In the case of variables that have a time component, the first timestep will be chosen.
mask_args:
- 'variable' (optional)
- 'where_operator' (optional)
- 'where_threshold' (optional)
- 'mask_nans': bool
[14]:
# Will mask all pixels that do not match these requirements (at least 25% land)
mask_args = {
"variable": "sftlf",
"where_operator": ">",
"where_threshold": 25,
"mask_nans": True,
}
# to_dataset() will open the dataset, as long as the search() gave a single result.
ds_example = pcat.search(
id="CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1f1_example-region",
processing_level="extracted",
variable="sftlf",
).to_dataset()
# Masking function
ds_example["mask"] = xs.create_mask(ds_example, mask_args=mask_args)
[15]:
import matplotlib.patches as patches
import matplotlib.pyplot as plt
# Plot sftlf
ax = plt.subplot(121)
ds_example.sftlf.plot.imshow()
plt.title("sftlf")
# Plot the mask
plt.subplot(122)
ds_example.mask.plot.imshow()
plt.title("mask")
[15]:
Text(0.5, 1.0, 'mask')
2.3.3. Preparing arguments for xESMF.Regridder
NOTE
xESMF’s API appears to be broken on their ReadTheDocs. For a list of available arguments and options in Regridder()
, please consult their Github page directly.
xESMF.Regridder
is the main utility that computes the transformation weights and performs the regridding. It is supported by many optional arguments and methods, which can be called in xscen
through regridder_kwargs
.
Available options are:
method: str
Regridding method. More details are given within the ESMF documentation and xESMF tutorials.
- 'bilinear' (Default)
- 'nearest_s2d'
- 'nearest_d2s'
- 'conservative'
- 'conservative_normed'
- 'patch'
extrap_method: str
Extrapolation method. Defaults to None.
- 'inverse_dist'
- 'nearest_s2d'
extrap_dist_exponent: float
Exponent to raise the distance to when calculating weights for extrapolation.
Defaults to 2.0.
extrap_num_src_pnts : int, optional
The number of source points to use for the extrapolation methods that use more than one source point.
Defaults to 8
unmapped_to_nan: boolean, optional
Set values of unmapped points to `np.nan` instead of the ESMF default of 0.
Defaults to True.
periodic: boolean, optional
Only really useful for global grids with non-conservative regridding.
Other options exist in ESMF/ESMPy
, but not xESMF
. As they get implemented, they should automatically get supported by xscen
.
NOTE
Some utilities that exist in xESMF
have not yet been explicitely added to xscen
. If conservative regridding is desired, for instance, some additional scripts might be required on the User’s side to create the lon/lat boundaries
[16]:
regridder_kwargs = {"extrap_method": "inverse_dist"}
2.3.4. Regridding function
Regridding for a Dataset is done through xs.regrid_dataset
, which manages calls to xESMF.Regridder
and makes sure that the output is CF-compliant.
weights_location
provides a path where to save the regridding weights (NetCDF file). This file (alongsidereuse_weights=True
) is used byxESMF
to reuse the transformation weights between datasets that are deemed to have the same grid and vastly improve the speed of the function.intermediate_grids
can be called to perform the regridding process in multiple steps. This is recommended when the jump in resolution is very high between the original and destination grid (e.g. from 3° to 0.08°).ds_grid
®ridder_kwargs
are described above.
[17]:
# to_dataset_dict() is called to cast the search results as xr.Dataset objects
# frequency="^(?!fx$).*$" is used to exclude fixed fields from the results
ds_dict = pcat.search(
processing_level="extracted", frequency="^(?!fx$).*$", domain="example-region"
).to_dataset_dict()
mask_args = {
"variable": "sftlf",
"where_operator": ">",
"where_threshold": 25,
"mask_nans": True,
}
for ds in ds_dict.values():
# Add a mask on the original grid.
ds["mask"] = xs.create_mask(
pcat.search(
id=ds.attrs["cat:id"], processing_level="extracted", variable="sftlf"
).to_dataset(),
mask_args=mask_args,
)
# Regridding function
ds_regrid = xs.regrid_dataset(
ds=ds,
weights_location=str(output_folder / "gs-weights"),
ds_grid=ds_grid,
regridder_kwargs=regridder_kwargs,
)
# Save to zarr
filename = str(
output_folder
/ f"{ds_regrid.attrs['cat:id']}.{ds_regrid.attrs['cat:domain']}.{ds_regrid.attrs['cat:processing_level']}.{ds_regrid.attrs['cat:frequency']}.zarr"
)
chunks = xs.io.estimate_chunks(ds, dims=["time"], target_mb=50)
xs.save_to_zarr(ds=ds_regrid, filename=filename, rechunk=chunks, mode="o")
pcat.update_from_ds(ds_regrid, path=filename, format="zarr")
--> The keys in the returned dictionary of datasets are constructed as follows:
'id.domain.processing_level.xrfreq'
[18]:
import matplotlib.patches as patches
plt.figure(figsize=[15, 5])
vmin = float(ds.tas.isel(time=0).min())
vmax = float(ds.tas.isel(time=0).max())
ax = plt.subplot(131)
ds.tas.isel(time=0).plot.imshow(vmin=vmin, vmax=vmax)
plt.title("tas: original grid")
rect = patches.Rectangle(
(-75, 45), 1, 2.75, linewidth=1, edgecolor="r", facecolor="none"
)
ax.add_patch(rect)
ax = plt.subplot(132)
(ds.tas.isel(time=0).where(ds.mask == 1)).plot.imshow(vmin=vmin, vmax=vmax)
rect = patches.Rectangle(
(-75, 45), 1, 2.75, linewidth=1, edgecolor="r", facecolor="none"
)
ax.add_patch(rect)
plt.title("tas: original + mask")
plt.tight_layout()
plt.subplot(133)
ds_regrid.tas.isel(time=0).plot.imshow(vmin=vmin, vmax=vmax)
plt.title("tas: regridded with mask + extrapolation")
plt.tight_layout()
2.4. Bias adjusting data
NOTE
Bias adjustment in xscen
is built upon xclim.sdba
. For more information on basic usage and available methods, please consult their documentation.
2.4.1. Preparing arguments for xclim.sdba
Many optional arguments are used by xclim.sdba
during the training and adjustment processes. These options heavily depend on the bias adjustment method used, so it is recommended to consult their documentation before proceeeding further.
These arguments can be sent by using xclim_train_kwargs
and xclim_adjust_kwargs
during the call to xs.train
and xs.adjust
respectively.
[19]:
xclim_train_args = {"kind": "+", "nquantiles": 50}
xclim_adjust_args = {"detrend": 3, "interp": "linear", "extrapolation": "constant"}
2.4.2. Bias adjustment function
Bias adjustment is done through xs.train
and xs.adjust
. They are kept separate to account for cases where a voluminous dataset would require saving after the training step.
The arguments to train() are:
dref
anddhist
indicate the reference and historical datasets.var
indicates which variable to bias correct.period
defines the period used for building the transfer function.method
indicates which bias adjusting method to call withinxclim.sdba
.maximal_calendar
instructs on which calendar to use, following this hierarchy: 360_day < noleap < standard < all_leapadapt_freq
is used for bias adjusting the frequency of dry/wet days (precipitation only).jitter_under
adds a random noise under a given threshold.jitter_over
adds a random noise over a given threshold.xclim_train_kwargs
is described above.
The arguments to adjust() are:
dtrain
is the result ofbiasadjust.train
.dsim
is the simulation to bias adjust.periods
defines the period(s) to bias adjust.xclim_adjust_kwargs
is described above.
NOTE
These functions currently do not support multiple variables due to the fact that train and adjust arguments might vary. The function must be called separately for every variable.
[20]:
ds_dict = pcat.search(processing_level="regridded").to_dataset_dict()
# # Open the reference dataset, in this case ERA5-Land
ds_ref = pcat.search(processing_level="extracted", source="ERA5-Land").to_dataset()
# Currently, only a single variable can be bias adjusted at a time
variables = ["tas"]
for v in variables:
for ds in ds_dict.values():
# Train
ds_train = xs.train(
dref=ds_ref,
dhist=ds,
var=["tas"],
period=["1981", "2010"],
xclim_train_args=xclim_train_args,
)
# Adjust
ds_adj = xs.adjust(
dtrain=ds_train,
dsim=ds,
periods=["1981", "2050"],
bias_adjust_institution="Ouranos", # add new attribute cat:bias_adjust_institution
bias_adjust_project="xscen-tutorial", # add new attribute cat:bias_adjust_project
xclim_adjust_args=xclim_adjust_args,
)
# Save to zarr
filename = str(
output_folder
/ f"{ds_adj.attrs['cat:id']}.{ds_adj.attrs['cat:domain']}.{ds_adj.attrs['cat:processing_level']}.{ds_adj.attrs['cat:frequency']}.zarr"
)
chunks = xs.io.estimate_chunks(ds_adj, dims=["time"], target_mb=50)
xs.save_to_zarr(ds=ds_adj, filename=filename, rechunk=chunks, mode="o")
pcat.update_from_ds(ds_adj, path=filename, format="zarr")
--> The keys in the returned dictionary of datasets are constructed as follows:
'id.domain.processing_level.xrfreq'
[21]:
ds = pcat.search(
processing_level="regridded",
variable="tas",
id="CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1f1_example-region",
).to_dataset()
ds_adj = pcat.search(
processing_level="biasadjusted",
variable="tas",
id="CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1f1_example-region",
).to_dataset()
vmin = min(
[
float(ds_ref.tas.sel(time=slice("1981", "2010")).mean(dim="time").min()),
float(ds.tas.sel(time=slice("1981", "2010")).mean(dim="time").min()),
]
)
vmax = max(
[
float(ds_ref.tas.sel(time=slice("1981", "2010")).mean(dim="time").max()),
float(ds.tas.sel(time=slice("1981", "2010")).mean(dim="time").max()),
]
)
fig = plt.figure(figsize=(15, 5))
plt.subplot(141)
ds_ref.tas.sel(time=slice("1981", "2010")).mean(dim="time").transpose(
"lat", ...
).plot.imshow(vmin=vmin, vmax=vmax)
plt.title("tas - Ref")
plt.subplot(142)
ds.tas.sel(time=slice("1981", "2010")).mean(dim="time").transpose(
"lat", ...
).plot.imshow(vmin=vmin, vmax=vmax)
plt.title("tas - Raw")
plt.subplot(143)
ds_adj.tas.sel(time=slice("1981", "2010")).mean(dim="time").transpose(
"lat", ...
).plot.imshow(vmin=vmin, vmax=vmax)
plt.title("tas - Bias adjusted")
plt.tight_layout()
plt.subplot(144)
(
ds_adj.tas.sel(time=slice("1981", "2010")).mean(dim="time")
- ds_ref.tas.sel(time=slice("1981", "2010")).mean(dim="time")
).transpose("lat", ...).plot.imshow(vmin=-1, vmax=1, cmap="RdBu_r")
plt.title("Bias (°C)")
plt.tight_layout()
2.5. Computing indicators
NOTE
xscen
relies heavily on xclim
’s YAML support for calculating indicators. For more information on how to build the YAML file, consult this Notebook.
xs.compute_indicators
makes use of xclim’s indicator modules functionalities to compute a given list of indicators. It is called by either using:
The path to a YAML file structured in a way compatible with xclim’s
build_indicator_module_from_yaml
An indicator module directly
A sequence of indicators
A sequence of tuples as returned by calling
iter_indicators()
on an indicator module.
Same as the extraction function, since the output could have multiple frequencies, the function returns a python dictionary with the output frequency as keys. The inputs of xs.compute_indicators
are:
ds
is the xr.Dataset containing the required variables.indicators
instructs on which indicator(s) to compute. It can be a number of things, as listed above.periods
is a list of [start, end] of continuous periods to be considered.
This example will use a simple YAML file structured like this:
realm: atmos
indicators:
growing_degree_days:
base: growing_degree_days
tg_min:
base: tg_min
[22]:
ds_dict = pcat.search(processing_level="biasadjusted").to_dataset_dict()
for ds in ds_dict.values():
# Output is dict, but it has only one frequency.
_, ds_ind = xs.compute_indicators(
ds=ds,
indicators=Path().absolute() / "samples" / "indicators.yml",
).popitem()
# Save the results
filename = str(
output_folder
/ f"{ds_ind.attrs['cat:id']}.{ds_ind.attrs['cat:domain']}.{ds_ind.attrs['cat:processing_level']}.{ds_ind.attrs['cat:frequency']}.zarr"
)
chunks = xs.io.estimate_chunks(ds, dims=["time"], target_mb=50)
xs.save_to_zarr(ds_ind, filename, rechunk=chunks, mode="o")
# Strongly suggested to update the project catalog AFTER you save to disk, in case it crashes during the process
pcat.update_from_ds(ds=ds_ind, path=filename, format="zarr")
--> The keys in the returned dictionary of datasets are constructed as follows:
'id.domain.processing_level.xrfreq'
[23]:
display(ds_ind)
<xarray.Dataset> Size: 23kB Dimensions: (lat: 5, lon: 4, time: 70) Coordinates: * lat (lat) float64 40B 45.27 45.82 46.37 46.92 47.47 * lon (lon) float64 32B -74.88 -74.62 -74.38 -74.12 * time (time) object 560B 1981-01-01 00:00:00 ... 2050-01-0... Data variables: growing_degree_days (time, lon, lat) float64 11kB dask.array<chunksize=(70, 4, 5), meta=np.ndarray> tg_min (time, lon, lat) float64 11kB dask.array<chunksize=(70, 4, 5), meta=np.ndarray> Attributes: (12/25) cat:_data_format_: zarr cat:activity: ScenarioMIP cat:bias_adjust_institution: Ouranos cat:bias_adjust_project: xscen-tutorial cat:date_end: 2050-12-31 00:00:00 cat:date_start: 1981-01-01 00:00:00 ... ... comment: This is a test file created for the xscen t... history: [2024-05-07 19:19:28] regridded with argume... intake_esm_dataset_key: CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r2i... intake_esm_vars: ('tas',) regrid_method: bilinear version: v20191108
2.6. Spatio-temporal aggregation
2.6.1. Climatological operations
xs.climatological_op
is used to perform n-year operations over ds.time.dt.year
.
NOTE: The aggregation is over year, not over time. For example, if given monthly data, the climatological operation will be computed separately for January, February, etc. This means that the data should already be aggregated at the required frequency, for example using xs.compute_indicators
to compute yearly, seasonal, or monthly indicators.
The function call requires a xr.Dataset
and argument op
specifies the operation to perform. It can be any of the following: ['max', 'mean', 'median', 'min', 'std', 'sum', 'var', 'linregress']
.
The optional arguments are as follows:
window
indicates how many year to use for the average. Uses all available years by default.min_period
minimum number of years required for a value to be computed durring therolling
operation.stride
indicates the stride (in years) at which to provide an output.periods
is a list of [start, end] of continuous periods to be considered.
Additional arguments allow to control the output of the function by automatically renaming variables to reflect the operation performed, restructuring the output dataset and setting the to_level
attribute.
In the following example, we will use op='mean'
.
[24]:
ds_dict = pcat.search(processing_level="indicators").to_dataset_dict()
for key, ds in ds_dict.items():
ds_mean = xs.climatological_op(
ds=ds,
op="mean",
window=30,
stride=10,
rename_variables=False,
to_level="30yr-climatology",
horizons_as_dim=False,
)
# Save to zarr
filename = str(
output_folder
/ f"{ds_mean.attrs['cat:id']}.{ds_mean.attrs['cat:domain']}.{ds_mean.attrs['cat:processing_level']}.{ds_mean.attrs['cat:frequency']}.zarr"
)
xs.save_to_zarr(ds=ds_mean, filename=filename, mode="o")
pcat.update_from_ds(ds_mean, path=filename, format="zarr")
--> The keys in the returned dictionary of datasets are constructed as follows:
'id.domain.processing_level.xrfreq'
[25]:
display(ds_mean)
<xarray.Dataset> Size: 2kB Dimensions: (lat: 5, lon: 4, time: 5) Coordinates: * lat (lat) float64 40B 45.27 45.82 46.37 46.92 47.47 * lon (lon) float64 32B -74.88 -74.62 -74.38 -74.12 horizon (time) <U9 180B '1981-2010' '1991-2020' ... '2021-2050' * time (time) object 40B 1981-01-01 00:00:00 ... 2021-01-01... Data variables: growing_degree_days (time, lon, lat) float64 800B dask.array<chunksize=(5, 4, 5), meta=np.ndarray> tg_min (time, lon, lat) float64 800B dask.array<chunksize=(5, 4, 5), meta=np.ndarray> Attributes: (12/25) cat:_data_format_: zarr cat:activity: ScenarioMIP cat:bias_adjust_institution: Ouranos cat:bias_adjust_project: xscen-tutorial cat:date_end: 2050-01-01 00:00:00 cat:date_start: 1981-01-01 00:00:00 ... ... comment: This is a test file created for the xscen t... history: [2024-05-07 19:19:28] regridded with argume... intake_esm_dataset_key: CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r2i... intake_esm_vars: ('tg_min', 'growing_degree_days') regrid_method: bilinear version: v20191108
2.6.1.1. Horizon coordinate and time dimension
Even if no stride
is called, xs.climatological_op
will substantially change the nature of the time
dimension, because it now represents an aggregation over time. While no standards exist on how to reflect that in a dataset, the following was chosen for xscen
:
time
corresponds to the first timestep of each temporal average.horizon
is a new coordinate that either follows the format YYYY-YYYY or a warming-level specific nomenclature.The
cat:frequency
andcat:xrfreq
attributes remain unchanged.
Alternatively, setting the horizons_as_dim
argument to True will rearrange the dataset with a new dimension horizon
and a dimension named according to the temporal aggregation when it is month
or season
, but omitting the singleton dimension year
. The time stamps are conserved in the time
coordinate as an array with those new dimensions.
[26]:
print(f"time: {ds_mean.time.values}")
print(f"horizon: {ds_mean.horizon.values}")
print(f"cat:xrfreq attribute: {ds_mean.attrs['cat:xrfreq']}")
time: [cftime.DatetimeNoLeap(1981, 1, 1, 0, 0, 0, 0, has_year_zero=True)
cftime.DatetimeNoLeap(1991, 1, 1, 0, 0, 0, 0, has_year_zero=True)
cftime.DatetimeNoLeap(2001, 1, 1, 0, 0, 0, 0, has_year_zero=True)
cftime.DatetimeNoLeap(2011, 1, 1, 0, 0, 0, 0, has_year_zero=True)
cftime.DatetimeNoLeap(2021, 1, 1, 0, 0, 0, 0, has_year_zero=True)]
horizon: ['1981-2010' '1991-2020' '2001-2030' '2011-2040' '2021-2050']
cat:xrfreq attribute: YS-JAN
2.6.2. Computing deltas
xs.compute_deltas
is pretty self-explanatory. However, note that this function relies on the horizon
coordinate described above and, thus, is intended to be performed following some kind of temporal aggregation.
It has the following arguments:
reference_horizon
indicates which horizon to use as reference.kind
is either “+”, “/”, or “%” for absolute, relative, or percentage deltas respectively. This argument can also be a dictionary, with the keys corresponding to data variables.
[27]:
ds_dict = pcat.search(processing_level="30yr-climatology").to_dataset_dict()
for key, ds in ds_dict.items():
ds_delta = xs.compute_deltas(
ds=ds,
reference_horizon="1981-2010",
kind={"growing_degree_days": "%", "tg_min": "+"},
to_level="deltas",
)
# Save to zarr
filename = str(
output_folder
/ f"{ds_delta.attrs['cat:id']}.{ds_delta.attrs['cat:domain']}.{ds_delta.attrs['cat:processing_level']}.{ds_delta.attrs['cat:frequency']}.zarr"
)
chunks = xs.io.estimate_chunks(ds, dims=["time"], target_mb=50)
xs.save_to_zarr(ds=ds_delta, filename=filename, rechunk=chunks, mode="o")
pcat.update_from_ds(ds_delta, path=filename, format="zarr")
--> The keys in the returned dictionary of datasets are constructed as follows:
'id.domain.processing_level.xrfreq'
[28]:
print(f"Deltas over {ds_delta.horizon.values}")
display(ds_delta.tg_min_delta_1981_2010.isel(lon=0, lat=0).values)
Deltas over ['1981-2010' '1991-2020' '2001-2030' '2011-2040' '2021-2050']
array([0. , 0.47775064, 1.10851666, 2.00552566, 3.16497266])
2.6.3. Spatial mean
xs.spatial_mean
is used to compute the spatial average over a given region, using various methods. The argument call_clisops
can also be used to subset the domain prior to the averaging.
method: cos-lat
will perform an average operation over the spatial dimensions, accounting for changes in grid cell area along the ‘lat’ coordinate.method: interp_centroid
will perform an interpolation towards given coordinates or towards the centroid of a region.kwargs
is used to sent arguments to.interp()
, includinglon
andlat
.region
can alternatively be used to send a gridpoint, bbox, or shapefile and compute the centroid. This argument is a dictionary that follows the same requirements as the one forxs.extract
described previously.
method: xesmf
will perform a call to xESMF’s SpatialAverager. This method is the most precise, especially for irregular regions, but can be much slower.kwargs
is used to sent arguments toxesmf.SpatialAverager
.region
is used to send a bbox or shapefile to theSpatialAverager
. This argument is a dictionary that follows the same requirements as the one forxs.extract
described previously.simplify_tolerance
is a float that can be used to change the precision (in degree) of a shapefile before sending it toSpatialAverager
.
[29]:
ds_dict = pcat.search(processing_level="deltas", domain="finer-grid").to_dataset_dict()
for key, ds in ds_dict.items():
ds_savg = xs.spatial_mean(
ds=ds,
method="interp_centroid",
kwargs={"method": "linear", "lon": -74.5, "lat": 47},
to_domain="aggregated",
)
# Save to zarr
filename = str(
output_folder
/ f"{ds_savg.attrs['cat:id']}.{ds_savg.attrs['cat:domain']}.{ds_savg.attrs['cat:processing_level']}.{ds_savg.attrs['cat:frequency']}.zarr"
)
chunks = xs.io.estimate_chunks(ds, dims=["time"], target_mb=50)
xs.save_to_zarr(ds=ds_savg, filename=filename, rechunk=chunks, mode="o")
pcat.update_from_ds(ds_savg, path=filename, format="zarr")
--> The keys in the returned dictionary of datasets are constructed as follows:
'id.domain.processing_level.xrfreq'
[30]:
# Aggregated deltas over the study area
display(ds_savg)
<xarray.Dataset> Size: 316B Dimensions: (time: 5) Coordinates: horizon (time) <U9 180B dask.array<chunksize=(5,), meta=np.ndarray> * time (time) object 40B 1981-01-01 00:00:0... lon float64 8B -74.5 lat int64 8B 47 Data variables: growing_degree_days_delta_1981_2010 (time) float64 40B dask.array<chunksize=(5,), meta=np.ndarray> tg_min_delta_1981_2010 (time) float64 40B dask.array<chunksize=(5,), meta=np.ndarray> Attributes: (12/25) cat:_data_format_: zarr cat:activity: ScenarioMIP cat:bias_adjust_institution: Ouranos cat:bias_adjust_project: xscen-tutorial cat:date_end: 2021-01-01 00:00:00 cat:date_start: 1981-01-01 00:00:00 ... ... comment: This is a test file created for the xscen t... history: [2024-05-07 19:20:17] xarray.interp(**{'met... intake_esm_dataset_key: CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r2i... intake_esm_vars: ('growing_degree_days_delta_1981_2010', 'tg... regrid_method: bilinear version: v20191108
2.7. Ensemble statistics
2.7.1. Weights
Typically, if an ensemble is inhomogeneous (uneven number of realizations per model, mix of GCMs and RCMs, etc.), the first step should be to call xs.generate_weights
to create an adequate guess of what the weights should be between the various datasets. Do note, however, that this function does not replace an explicit assessment of the performance or independence of the simulations, and the results provided should be taken with a grain of salt.
The arguments are as follows:
independence_level
instruct on which weighting scheme to use and strongly influences the outputs. One of ‘model’, ‘GCM’, ‘institution’.experiment_weights
can be used to assign a given total weight to each experiment (currently only supports giving 1 to each experiment).skipna
instructs on whether the weights should account for simulations with missing data.v_for_skipna
is the variable to use in the case ofskipna=False
.standardize
to make the weights sum to 1 for each instance of ‘horizon’ or ‘time’.
NOTE
generate_weights
was built with xscen
in mind, and thus relies on the cat:
attributes automatically generated by intake-esm
when data is loaded from a catalog. In the case of data generated elsewhere, the required and recommended attributes should minimally be added to the dataset before using this function.
[31]:
# We don't have many simulations in this example to perform ensemble statistics, but we'll use the two SSP2-4.5 realizations
datasets = pcat.search(
processing_level="deltas", domain="aggregated", experiment="ssp245"
).to_dataset_dict()
weights = xs.generate_weights(datasets, independence_level="model")
display(weights)
--> The keys in the returned dictionary of datasets are constructed as follows:
'id.domain.processing_level.xrfreq'
<xarray.DataArray (realization: 2)> Size: 16B array([0.5, 0.5]) Coordinates: * realization (realization) <U88 704B 'CMIP6_ScenarioMIP_NCC_NorESM2-MM_ss...
2.7.2. Ensemble stats
xs.ensemble_stats
creates an ensemble out of many datasets and computes statistics on that ensemble (min, max, mean, percentiles, etc.) using the xclim.ensembles
module. The inputs can be given in the form of a list or a dictionary of xr.Dataset or of paths.
The arguments are as follows:
statistics
is a dictionary that instructs on whichxclim.ensembles
statistics to call. It follows the format {function: arguments}.weights
is used to weight the various datasets, if needed.common_attrs_only
:xclim.ensembles.create_ensemble
copies the attributes from the first dataset, but this might not be representative of the new ensemble. Ifcommon_attrs_only
is True, it only keeps the global attributes that are the same for all datasets and generates a new ID.create_kwargs
: If given a set of paths,xclim.ensembles.create_ensemble
will ignore the chunking on disk and open the datasets with only a chunking along the newrealization
dimension. Thus, for large datasets, this should be used to explicitely specify chunks.
[32]:
ens_stats = xs.ensemble_stats(
datasets=datasets,
statistics={
"ensemble_percentiles": {"split": False}
}, # should be an existing function in xclim.ensembles
weights=weights,
common_attrs_only=True,
)
path = output_folder / f"ensemble_{ens_stats.attrs['cat:id']}.zarr"
xs.save_to_zarr(ens_stats, filename=path, mode="o")
pcat.update_from_ds(ds=ens_stats, path=path, format="zarr")
[33]:
display(ens_stats)
<xarray.Dataset> Size: 500B Dimensions: (time: 5, percentiles: 3) Coordinates: horizon (time) <U9 180B dask.array<chunksize=(5,), meta=np.ndarray> lat int64 8B 47 lon float64 8B -74.5 * time (time) object 40B 1981-01-01 00:00:0... * percentiles (percentiles) int64 24B 10 50 90 Data variables: growing_degree_days_delta_1981_2010 (percentiles, time) float64 120B dask.array<chunksize=(3, 5), meta=np.ndarray> tg_min_delta_1981_2010 (percentiles, time) float64 120B dask.array<chunksize=(3, 5), meta=np.ndarray> Attributes: (12/20) cat:_data_format_: zarr cat:activity: ScenarioMIP cat:bias_adjust_institution: Ouranos cat:bias_adjust_project: xscen-tutorial cat:domain: aggregated cat:experiment: ssp245 ... ... comment: This is a test file created for the xscen t... intake_esm_vars: ('growing_degree_days_delta_1981_2010', 'tg... regrid_method: bilinear version: v20191108 cat:id: xscen-tutorial_CMIP6_ScenarioMIP_NCC_NorESM... ensemble_size: 2
2.8. Clean up
At any time, such as after bias adjustment, xs.clean_up
can be called to perform a number of small modifications to the datasets. That function can:
convert the variables to non-CF units using
xs.utils.change_units
call the
xs.utils.maybe_unstack
functionconvert the calendar and interpolate over missing dates
remove, remove everything but, and/or add a list of attributes
change the prefix of the catalog attrs (by default:
cat:
)
in that order.
2.8.1. Calendars
During the bias adjustment step, it is frequent to convert the calendar to ‘noleap’. However, once that step has been processed, we might want to put back all the February 29th (or other missing data in the case of ‘360_day’ calendar). This can be done using the convert_calendar_kwargs
argument of xs.clean_up
, which passes a dictionary to xclim.core.calendar.convert_calendar
.
Usually, we want to linearly interpolate the missing temperatures, but put 0 mm/day for missing precipitation. If our dataset has many variables, the missing
argument (for convert_calendar
) can be set for each variable with missing_by_var
. If missing_by_var
is given ‘interpolate’, the missing data will be filled with NaNs, then linearly interpolated over time.
Eg. {'tasmax':'interpolate', 'pr':[0]}
[34]:
convert_calendar_kwargs = {"target": "standard"}
missing_by_var = {"tas": "interpolate"}
2.8.2. Attributes
We might want to add, remove or modify the attributes.
It is possible to write a list of attributes to remove with attrs_to_remove
, or a list of attributes to keep and remove everything else with remove_all_attrs_except
. Both take the shape of a dictionnary where the keys are the variables (and ‘global’ for global attrs) and the values are the list.
The element of the list can be exact matches for the attribute names or use the same regex matching rules as intake_esm
:
ending with a ‘*’ means checks if the substring is contained in the string
starting with a ‘^’ means check if the string starts with the substring.
Attributes can also be added to datasets using add_attrs
. This is a dictionary where the keys are the variables and the values are a another dictionary of attributes.
It is also possible to modify the catalogue prefix ‘cat:’ by a new string with change_attr_prefix
. Don’t use this if this is not the last step of your workflow.
[35]:
attrs_to_remove = {
"tas": ["name*"]
} # remove tas attrs that contain the substring 'name'
remove_all_attrs_except = {
"global": ["^cat:"]
} # remove all the global attrs EXCEPT for the one starting with cat:
add_attrs = {
"tas": {"notes": "some crucial information"}
} # add a new tas attribute named 'notes' with value 'some crucial information'
change_attr_prefix = "dataset:" # change /cat to dataset:
[36]:
ds = pcat.search(
processing_level="biasadjusted", variable="tas", experiment="ssp245", member="r1.*"
).to_dataset()
ds_clean = xs.clean_up(
ds=ds,
variables_and_units={"tas": "degC"}, # convert units
convert_calendar_kwargs=convert_calendar_kwargs,
missing_by_var=missing_by_var,
attrs_to_remove=attrs_to_remove,
remove_all_attrs_except=remove_all_attrs_except,
add_attrs=add_attrs,
change_attr_prefix=change_attr_prefix,
)
[37]:
from xclim.core.calendar import get_calendar
# Inspect calendars and the interpolated values
print("Initial calendar: ", get_calendar(ds.time))
print(ds.time.sel(time=slice("2000-02-28", "2000-03-01")).values)
print(ds.tas.sel(time=slice("2000-02-28", "2000-03-01")).isel(lat=1, lon=1).values)
print("Final calendar: ", get_calendar(ds_clean.time))
print(ds_clean.time.sel(time=slice("2000-02-28", "2000-03-01")).values)
print(
ds_clean.tas.sel(time=slice("2000-02-28", "2000-03-01")).isel(lat=1, lon=1).values
)
print("")
print("Inspect initial attributes")
display(ds)
print("")
print("Inspect final attributes")
display(ds_clean)
Initial calendar: noleap
[cftime.DatetimeNoLeap(2000, 2, 28, 0, 0, 0, 0, has_year_zero=True)
cftime.DatetimeNoLeap(2000, 3, 1, 0, 0, 0, 0, has_year_zero=True)]
[10.1320762 8.56638259]
Final calendar: standard
[cftime.DatetimeGregorian(2000, 2, 28, 0, 0, 0, 0, has_year_zero=False)
cftime.DatetimeGregorian(2000, 2, 29, 0, 0, 0, 0, has_year_zero=False)
cftime.DatetimeGregorian(2000, 3, 1, 0, 0, 0, 0, has_year_zero=False)]
[10.1320762 9.34922939 8.56638259]
Inspect initial attributes
<xarray.Dataset> Size: 4MB Dimensions: (time: 25550, lon: 4, lat: 5) Coordinates: * lat (lat) float64 40B 45.27 45.82 46.37 46.92 47.47 * lon (lon) float64 32B -74.88 -74.62 -74.38 -74.12 * time (time) object 204kB 1981-01-01 00:00:00 ... 2050-12-31 00:00:00 Data variables: tas (time, lon, lat) float64 4MB dask.array<chunksize=(25550, 4, 5), meta=np.ndarray> Attributes: (12/25) cat:_data_format_: zarr cat:activity: ScenarioMIP cat:bias_adjust_institution: Ouranos cat:bias_adjust_project: xscen-tutorial cat:date_end: 2050-12-31 00:00:00 cat:date_start: 1981-01-01 00:00:00 ... ... comment: This is a test file created for the xscen t... history: [2024-05-07 19:19:29] regridded with argume... intake_esm_dataset_key: CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i... intake_esm_vars: ['tas'] regrid_method: bilinear version: v20191108
Inspect final attributes
<xarray.Dataset> Size: 4MB Dimensions: (time: 25567, lon: 4, lat: 5) Coordinates: * lat (lat) float64 40B 45.27 45.82 46.37 46.92 47.47 * lon (lon) float64 32B -74.88 -74.62 -74.38 -74.12 * time (time) object 205kB 1981-01-01 00:00:00 ... 2050-12-31 00:00:00 Data variables: tas (time, lon, lat) float64 4MB dask.array<chunksize=(25567, 4, 5), meta=np.ndarray> Attributes: (12/19) dataset:_data_format_: zarr dataset:activity: ScenarioMIP dataset:bias_adjust_institution: Ouranos dataset:bias_adjust_project: xscen-tutorial dataset:date_end: 2050-12-31 00:00:00 dataset:date_start: 1981-01-01 00:00:00 ... ... dataset:path: /home/docs/checkouts/readthedocs.org/us... dataset:processing_level: biasadjusted dataset:source: NorESM2-MM dataset:type: simulation dataset:variable: tas dataset:xrfreq: D