{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Package Comparison\n",
    "\n",
    "*Martin Vonk - 2023*\n",
    "\n",
    "This notebooks compares the calculated drought indices to other (Python) packages or time series retrieved from other locations.\n",
    "Current comparisons include: \n",
    "* standard_precip (Python)\n",
    "* climate_indices (Python)\n",
    "* pastas (Python)\n",
    "* SPEI (R)\n",
    "\n",
    "Please note that it can be difficult to install these packages. SPEI (R) requires the R library. Pastas depends on Numba which has strict requirements for NumPy. Climate Indices only supports Python 3.11 and lower. Therefore running this notebook can be cumbersome.\n",
    "\n",
    "Future comparisons:\n",
    "* [KNMI](https://gitlab.com/KNMI-OSS/climexp/climexp_numerical/-/blob/be0f081a9d62856e4c52a370e70fec2ddfc45cfa/src/calcSPI3.f)\n",
    "\n",
    "## Required packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "import scipy.stats as scs\n",
    "\n",
    "import spei as si\n",
    "\n",
    "print(si.show_versions())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Read Precipitation Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv(\"data/DEBILT.csv\", index_col=0, parse_dates=True)\n",
    "df.index.name = \"date\"\n",
    "prec = df[\"Prec [m/d] 260_DEBILT\"].multiply(1e3).rename(\"rain\")\n",
    "head = df[\"Head [m] B32C0572_DEBILT\"].rename(\"B32C0572\").dropna()\n",
    "\n",
    "_ = prec.plot(grid=True, linewidth=0.5, title=\"Precipitation\", figsize=(6.5, 4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get rolling sum\n",
    "prec_rsum = prec.resample(\"ME\").sum()\n",
    "_ = prec_rsum.plot(\n",
    "    grid=True, linewidth=0.5, title=\"Precipitation, monthly sum\", figsize=(6.5, 4)\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Compute Standardized Precipitation Index\n",
    "\n",
    "### Using SPEI package"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "spi = si.spi(prec_rsum, dist=scs.gamma, prob_zero=True, timescale=3, fit_freq=\"ME\")\n",
    "spi  # pandas Series"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using standard_precip package"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from standard_precip import spi as sp_spi\n",
    "\n",
    "# standard_precip also needs rolling sum dataframe, even though you provide freq=\"M\" and scale = 1\n",
    "precdf = prec_rsum.to_frame().reset_index().copy()\n",
    "\n",
    "# initialize spi\n",
    "standardp_spi_inst = sp_spi.SPI()\n",
    "\n",
    "# caclulate index with many parameters\n",
    "standardp_spi = standardp_spi_inst.calculate(\n",
    "    precdf,\n",
    "    date_col=\"date\",\n",
    "    precip_cols=\"rain\",\n",
    "    freq=\"M\",\n",
    "    scale=3,  # note that scale is not the same for the standard deviation in SciPy\n",
    "    fit_type=\"mle\",\n",
    "    dist_type=\"gam\",\n",
    ")\n",
    "standardp_spi.index = standardp_spi.loc[\n",
    "    :, \"date\"\n",
    "].values  # create datetimeindex because date had to be a column\n",
    "\n",
    "standardp_spi  # pandas DataFrame"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using climate_indices package\n",
    "\n",
    "Previously there was a significant difference beteween the SPEI and climate_indices package, not sure why. I thought it had something to do with the fitting method used for the gamma distribution. In issue [#61](https://github.com/martinvonk/SPEI/issues/61) it was mentioned that the same outcome could be achieved. However, I found it difficult to install `climate_indces` due to lack of support (for newer python versions)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# from climate_indices.compute import scale_values, Periodicity\n",
    "# from climate_indices import compute, indices, utils"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# initial_year = prec_rsum.index[0].year\n",
    "# calibration_year_initial = prec_rsum.index[0].year\n",
    "# calibration_year_final = prec_rsum.index[-1].year\n",
    "# period_times = 366\n",
    "# scale = 1\n",
    "# periodicity = compute.Periodicity.daily\n",
    "\n",
    "# values = prec_rsum.values\n",
    "\n",
    "# scaled_values = compute.scale_values(\n",
    "#     values,\n",
    "#     scale=scale,\n",
    "#     periodicity=periodicity,\n",
    "# )\n",
    "\n",
    "# alphas, betas = compute.gamma_parameters(\n",
    "#     scaled_values,\n",
    "#     data_start_year=initial_year,\n",
    "#     calibration_start_year=calibration_year_initial,\n",
    "#     calibration_end_year=calibration_year_final,\n",
    "#     periodicity=periodicity,\n",
    "# )\n",
    "\n",
    "# gamma_params = {\"alpha\": alphas, \"beta\": betas}\n",
    "\n",
    "# spival = indices.spi(\n",
    "#     values,\n",
    "#     scale=scale,\n",
    "#     distribution=indices.Distribution.gamma,\n",
    "#     data_start_year=initial_year,\n",
    "#     calibration_year_initial=calibration_year_initial,\n",
    "#     calibration_year_final=calibration_year_final,\n",
    "#     periodicity=compute.Periodicity.daily,\n",
    "#     fitting_params=gamma_params,\n",
    "# )\n",
    "\n",
    "# climateind_spi = pd.Series(spival, index=prec_rsum.index, name=\"Climate Index SPI\")\n",
    "# climateind_spi"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using SPEI R package"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from rpy2.robjects import pandas2ri\n",
    "from rpy2.robjects.packages import importr\n",
    "\n",
    "sr = importr(\"SPEI\")\n",
    "\n",
    "with pandas2ri.converter.context():  # pandas2ri.activate()\n",
    "    spir_res = sr.spi(prec_rsum.values, scale=3)\n",
    "\n",
    "r_spi = pd.Series(spir_res[2].ravel(), index=prec_rsum.index, name=\"SPI\")\n",
    "r_spi"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Plot and compare"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "f, ax = plt.subplot_mosaic(\n",
    "    [[\"SPI\"], [\"DIFF\"]],\n",
    "    figsize=(8, 4),\n",
    "    sharex=True,\n",
    "    height_ratios=[2, 1],\n",
    ")\n",
    "spi.plot(ax=ax[\"SPI\"], grid=True, linestyle=\"-\", label=\"SPI\")\n",
    "standardp_spi.iloc[:, -1].plot(\n",
    "    ax=ax[\"SPI\"],\n",
    "    color=\"C1\",\n",
    "    grid=True,\n",
    "    linestyle=\"--\",\n",
    "    label=\"standard_precip\",\n",
    ")\n",
    "# climateind_spi.plot(\n",
    "#     ax=ax[\"SPI\"], color=\"C2\", grid=True, linestyle=\":\", label=\"climate_indices\"\n",
    "# )\n",
    "# r_spi.plot(ax=ax[\"SPI\"], color=\"C2\", grid=True, linestyle=\":\", label=\"R package\")\n",
    "\n",
    "(ax[\"SPI\"].set_ylim(-3.5, 3.5),)\n",
    "(ax[\"SPI\"].set_title(\"Comparison\"),)\n",
    "(ax[\"SPI\"].set_ylabel(\"SPI\"),)\n",
    "ax[\"SPI\"].legend(ncol=3)\n",
    "\n",
    "(spi - standardp_spi.iloc[:, -1]).plot(\n",
    "    ax=ax[\"DIFF\"], color=\"C4\", label=\"SPEI - standard_precip\", grid=True\n",
    ")\n",
    "# (spi - r_spi).plot(ax=ax[\"DIFF\"], color=\"C3\", label=\"SPEI - R Package\")\n",
    "\n",
    "# ax[\"DIFF1\"].set_ylim(-0.05, 0.05)\n",
    "ax[\"DIFF\"].legend(ncol=2)\n",
    "ax[\"DIFF\"].set_title(\"SPEI minus other package\")\n",
    "ax[\"DIFF\"].set_ylabel(\"Difference\")\n",
    "ax[\"DIFF\"].set_xlim(\"1996\", \"1999\")\n",
    "f.tight_layout()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Difference is very small between SPEI an the standard_precip package.\n",
    "\n",
    "The standard_precip package does not explicitely support the Standardized Precipitation Evaporation Index, as far as I can see. However, the SPI class in standard_precip could probably be used, even though the naming of `precip_cols` is not universal. In general, the standard_precip package needs much more keyword arguments while the SPEI package makes more use of all the nice logic already available in SciPy and Pandas.\n",
    "\n",
    "The climate_indices package needs even more code.\n",
    "\n",
    "The SPEI R package also has a similar result but seems to vary a bit more. More research is needed to understand why that is the case. Most likely is the differences in fitting the gamma distribution."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Compute Standardized Groundwater Index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pastas as ps\n",
    "\n",
    "sgi = si.sgi(head, fit_freq=\"ME\")\n",
    "sgi_pastas = ps.stats.sgi(head)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.concat([sgi, sgi_pastas], axis=1).rename(columns={0: \"SGI\", \"head\": \"Pastas\"})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "f, ax = plt.subplot_mosaic(\n",
    "    [[\"SGI\"], [\"DIFF\"]],\n",
    "    figsize=(8, 4),\n",
    "    sharex=True,\n",
    "    height_ratios=[2, 1],\n",
    ")\n",
    "sgi.plot(ax=ax[\"SGI\"], grid=True, linestyle=\"-\", label=\"SGI\")\n",
    "sgi_pastas.plot(ax=ax[\"SGI\"], color=\"C1\", grid=True, linestyle=\"--\", label=\"pastas\")\n",
    "(ax[\"SGI\"].set_ylim(-3.5, 3.5),)\n",
    "(ax[\"SGI\"].set_title(\"Comparison\"),)\n",
    "(ax[\"SGI\"].set_ylabel(\"SGI\"),)\n",
    "ax[\"SGI\"].legend(ncol=3)\n",
    "\n",
    "(sgi - sgi_pastas).plot(ax=ax[\"DIFF\"], color=\"C3\", label=\"SGI - pastas\")\n",
    "\n",
    "ax[\"DIFF\"].legend(ncol=2)\n",
    "ax[\"DIFF\"].set_title(\"SPEI minus other package\")\n",
    "ax[\"DIFF\"].set_ylabel(\"Difference\")\n",
    "ax[\"DIFF\"].set_xlim(\"1996\", \"1999\")\n",
    "f.tight_layout()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "SPEI",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}