{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Package Comparison\n", "\n", "*Martin Vonk - 2023*\n", "\n", "This notebooks compares the calculated drought indices to other (Python) packages or time series retrieved from other locations.\n", "Current comparisons include: \n", "* standard_precip (Python)\n", "* climate_indices (Python)\n", "* pastas (Python)\n", "* SPEI (R)\n", "\n", "Please note that it can be difficult to install these packages. SPEI (R) requires the R library. Pastas depends on Numba which has strict requirements for NumPy. Climate Indices only supports Python 3.11 and lower. Therefore running this notebook can be cumbersome.\n", "\n", "Future comparisons:\n", "* [KNMI](https://gitlab.com/KNMI-OSS/climexp/climexp_numerical/-/blob/be0f081a9d62856e4c52a370e70fec2ddfc45cfa/src/calcSPI3.f)\n", "\n", "## Required packages" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import scipy.stats as scs\n", "\n", "import spei as si\n", "\n", "print(si.show_versions())" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Read Precipitation Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\"data/DEBILT.csv\", index_col=0, parse_dates=True)\n", "df.index.name = \"date\"\n", "prec = df[\"Prec [m/d] 260_DEBILT\"].multiply(1e3).rename(\"rain\")\n", "head = df[\"Head [m] B32C0572_DEBILT\"].rename(\"B32C0572\").dropna()\n", "\n", "_ = prec.plot(grid=True, linewidth=0.5, title=\"Precipitation\", figsize=(6.5, 4))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# get rolling sum\n", "prec_rsum = prec.resample(\"ME\").sum()\n", "_ = prec_rsum.plot(\n", " grid=True, linewidth=0.5, title=\"Precipitation, monthly sum\", figsize=(6.5, 4)\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Compute Standardized Precipitation Index\n", "\n", "### Using SPEI package" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spi = si.spi(prec_rsum, dist=scs.gamma, prob_zero=True, timescale=3, fit_freq=\"ME\")\n", "spi # pandas Series" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Using standard_precip package" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from standard_precip import spi as sp_spi\n", "\n", "# standard_precip also needs rolling sum dataframe, even though you provide freq=\"M\" and scale = 1\n", "precdf = prec_rsum.to_frame().reset_index().copy()\n", "\n", "# initialize spi\n", "standardp_spi_inst = sp_spi.SPI()\n", "\n", "# caclulate index with many parameters\n", "standardp_spi = standardp_spi_inst.calculate(\n", " precdf,\n", " date_col=\"date\",\n", " precip_cols=\"rain\",\n", " freq=\"M\",\n", " scale=3, # note that scale is not the same for the standard deviation in SciPy\n", " fit_type=\"mle\",\n", " dist_type=\"gam\",\n", ")\n", "standardp_spi.index = standardp_spi.loc[\n", " :, \"date\"\n", "].values # create datetimeindex because date had to be a column\n", "\n", "standardp_spi # pandas DataFrame" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Using climate_indices package\n", "\n", "Previously there was a significant difference beteween the SPEI and climate_indices package, not sure why. I thought it had something to do with the fitting method used for the gamma distribution. In issue [#61](https://github.com/martinvonk/SPEI/issues/61) it was mentioned that the same outcome could be achieved. However, I found it difficult to install `climate_indces` due to lack of support (for newer python versions)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# from climate_indices.compute import scale_values, Periodicity\n", "# from climate_indices import compute, indices, utils" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# initial_year = prec_rsum.index[0].year\n", "# calibration_year_initial = prec_rsum.index[0].year\n", "# calibration_year_final = prec_rsum.index[-1].year\n", "# period_times = 366\n", "# scale = 1\n", "# periodicity = compute.Periodicity.daily\n", "\n", "# values = prec_rsum.values\n", "\n", "# scaled_values = compute.scale_values(\n", "# values,\n", "# scale=scale,\n", "# periodicity=periodicity,\n", "# )\n", "\n", "# alphas, betas = compute.gamma_parameters(\n", "# scaled_values,\n", "# data_start_year=initial_year,\n", "# calibration_start_year=calibration_year_initial,\n", "# calibration_end_year=calibration_year_final,\n", "# periodicity=periodicity,\n", "# )\n", "\n", "# gamma_params = {\"alpha\": alphas, \"beta\": betas}\n", "\n", "# spival = indices.spi(\n", "# values,\n", "# scale=scale,\n", "# distribution=indices.Distribution.gamma,\n", "# data_start_year=initial_year,\n", "# calibration_year_initial=calibration_year_initial,\n", "# calibration_year_final=calibration_year_final,\n", "# periodicity=compute.Periodicity.daily,\n", "# fitting_params=gamma_params,\n", "# )\n", "\n", "# climateind_spi = pd.Series(spival, index=prec_rsum.index, name=\"Climate Index SPI\")\n", "# climateind_spi" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using SPEI R package" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from rpy2.robjects import pandas2ri\n", "from rpy2.robjects.packages import importr\n", "\n", "sr = importr(\"SPEI\")\n", "\n", "with pandas2ri.converter.context(): # pandas2ri.activate()\n", " spir_res = sr.spi(prec_rsum.values, scale=3)\n", "\n", "r_spi = pd.Series(spir_res[2].ravel(), index=prec_rsum.index, name=\"SPI\")\n", "r_spi" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Plot and compare" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f, ax = plt.subplot_mosaic(\n", " [[\"SPI\"], [\"DIFF\"]],\n", " figsize=(8, 4),\n", " sharex=True,\n", " height_ratios=[2, 1],\n", ")\n", "spi.plot(ax=ax[\"SPI\"], grid=True, linestyle=\"-\", label=\"SPI\")\n", "standardp_spi.iloc[:, -1].plot(\n", " ax=ax[\"SPI\"],\n", " color=\"C1\",\n", " grid=True,\n", " linestyle=\"--\",\n", " label=\"standard_precip\",\n", ")\n", "# climateind_spi.plot(\n", "# ax=ax[\"SPI\"], color=\"C2\", grid=True, linestyle=\":\", label=\"climate_indices\"\n", "# )\n", "# r_spi.plot(ax=ax[\"SPI\"], color=\"C2\", grid=True, linestyle=\":\", label=\"R package\")\n", "\n", "(ax[\"SPI\"].set_ylim(-3.5, 3.5),)\n", "(ax[\"SPI\"].set_title(\"Comparison\"),)\n", "(ax[\"SPI\"].set_ylabel(\"SPI\"),)\n", "ax[\"SPI\"].legend(ncol=3)\n", "\n", "(spi - standardp_spi.iloc[:, -1]).plot(\n", " ax=ax[\"DIFF\"], color=\"C4\", label=\"SPEI - standard_precip\", grid=True\n", ")\n", "# (spi - r_spi).plot(ax=ax[\"DIFF\"], color=\"C3\", label=\"SPEI - R Package\")\n", "\n", "# ax[\"DIFF1\"].set_ylim(-0.05, 0.05)\n", "ax[\"DIFF\"].legend(ncol=2)\n", "ax[\"DIFF\"].set_title(\"SPEI minus other package\")\n", "ax[\"DIFF\"].set_ylabel(\"Difference\")\n", "ax[\"DIFF\"].set_xlim(\"1996\", \"1999\")\n", "f.tight_layout()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Difference is very small between SPEI an the standard_precip package.\n", "\n", "The standard_precip package does not explicitely support the Standardized Precipitation Evaporation Index, as far as I can see. However, the SPI class in standard_precip could probably be used, even though the naming of `precip_cols` is not universal. In general, the standard_precip package needs much more keyword arguments while the SPEI package makes more use of all the nice logic already available in SciPy and Pandas.\n", "\n", "The climate_indices package needs even more code.\n", "\n", "The SPEI R package also has a similar result but seems to vary a bit more. More research is needed to understand why that is the case. Most likely is the differences in fitting the gamma distribution." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Compute Standardized Groundwater Index" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pastas as ps\n", "\n", "sgi = si.sgi(head, fit_freq=\"ME\")\n", "sgi_pastas = ps.stats.sgi(head)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pd.concat([sgi, sgi_pastas], axis=1).rename(columns={0: \"SGI\", \"head\": \"Pastas\"})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f, ax = plt.subplot_mosaic(\n", " [[\"SGI\"], [\"DIFF\"]],\n", " figsize=(8, 4),\n", " sharex=True,\n", " height_ratios=[2, 1],\n", ")\n", "sgi.plot(ax=ax[\"SGI\"], grid=True, linestyle=\"-\", label=\"SGI\")\n", "sgi_pastas.plot(ax=ax[\"SGI\"], color=\"C1\", grid=True, linestyle=\"--\", label=\"pastas\")\n", "(ax[\"SGI\"].set_ylim(-3.5, 3.5),)\n", "(ax[\"SGI\"].set_title(\"Comparison\"),)\n", "(ax[\"SGI\"].set_ylabel(\"SGI\"),)\n", "ax[\"SGI\"].legend(ncol=3)\n", "\n", "(sgi - sgi_pastas).plot(ax=ax[\"DIFF\"], color=\"C3\", label=\"SGI - pastas\")\n", "\n", "ax[\"DIFF\"].legend(ncol=2)\n", "ax[\"DIFF\"].set_title(\"SPEI minus other package\")\n", "ax[\"DIFF\"].set_ylabel(\"Difference\")\n", "ax[\"DIFF\"].set_xlim(\"1996\", \"1999\")\n", "f.tight_layout()" ] } ], "metadata": { "kernelspec": { "display_name": "SPEI", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }