{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Distributions\n", "\n", "*Martin Vonk - 2025*\n", "\n", "This notebook guides you through the process of selecting and fitting probability distributions to time series data, with the aim of calculating a Standardized Drought Index. It is divided into three main parts:\n", "\n", "1. SciPy distributions: understanding continuous distribution functions available in `scipy.stats`\n", "2. Comparing data and distributions: fit candidate distributions to data and assess their suitability\n", "3. Flexible time scales and distribution fitting: apply fitting procedures to different time scales and understand their impact on the standardized index\n", "\n", "This notebook is meant to improve understanding of the statistical underpinnings behind drought index modeling. While some backend code is shown, you don’t need to use of it to compute an SDI in practice.\n", "\n", "## Prerequisites\n", "\n", "### Packages" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from calendar import month_name\n", "from typing import Literal\n", "\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import scipy.stats as sps\n", "from matplotlib import patheffects\n", "from scipy.stats._survival import EmpiricalDistributionFunction\n", "\n", "import spei as si # si for standardized index\n", "\n", "print(si.show_versions())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Helper functions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot_ecdf(\n", " ax: plt.Axes,\n", " data: pd.Series,\n", " ecdf: EmpiricalDistributionFunction,\n", " s: float,\n", " color: str,\n", " label: str,\n", " cdf: pd.Series | None = None,\n", " **kwargs,\n", ") -> None:\n", " data = data.drop_duplicates()\n", " ax.scatter(\n", " data,\n", " ecdf.probabilities,\n", " s=s,\n", " facecolor=color,\n", " label=label,\n", " **kwargs,\n", " )\n", " if cdf is not None:\n", " for idata, icdf, iecdf in zip(data, cdf, ecdf.probabilities):\n", " ax.plot(\n", " [idata, idata],\n", " [iecdf, icdf],\n", " color=color,\n", " linewidth=0.5,\n", " **kwargs,\n", " )\n", "\n", "\n", "def axes_indicator(\n", " ax: plt.Axes,\n", " letter: str,\n", " x: float,\n", " y: float,\n", " ha: Literal[\"left\", \"right\"],\n", " va: Literal[\"top\", \"bottom\"],\n", ") -> None:\n", " \"\"\"Add an indicator to the axes.\"\"\"\n", " ax.annotate(\n", " f\"({letter})\",\n", " xy=(x, y),\n", " xycoords=\"axes fraction\",\n", " fontsize=mpl.rcParams[\"axes.titlesize\"],\n", " horizontalalignment=ha,\n", " verticalalignment=va,\n", " path_effects=[\n", " patheffects.Stroke(linewidth=1, foreground=\"white\"),\n", " patheffects.Normal(),\n", " ],\n", " )\n", "\n", "\n", "def plot_box(ax: plt.Axes, dist: si.dist.Dist, bbox_edgecolor: str = \"k\") -> None:\n", " textstr = f\"{dist.dist.name}\\nparameters\\n\"\n", " textstr += f\"c = {dist.pars[0]:0.2f}\\n\"\n", " textstr += f\"loc = {dist.loc:0.1f}\\n\"\n", " textstr += f\"scale = {dist.scale:0.1f}\"\n", " ax.text(\n", " xmax - bin * 1.5,\n", " 0.05,\n", " textstr,\n", " fontsize=mpl.rcParams[\"legend.fontsize\"] - 0.5,\n", " horizontalalignment=\"right\",\n", " verticalalignment=\"bottom\",\n", " bbox=dict(\n", " boxstyle=\"round\", facecolor=\"white\", alpha=0.5, edgecolor=bbox_edgecolor\n", " ),\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Time series data\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dfi = pd.read_csv(\n", " \"data/CABAUW.csv\",\n", " index_col=0,\n", " parse_dates=True,\n", ")\n", "surplus = (dfi[\"prec\"] - dfi[\"evap\"]).rename(\"surplus\")\n", "surplusm = surplus.resample(\"MS\").sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SciPy distributions\n", "\n", "`scipy.stats` contains a wide array of probability distributions that you can use to model data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "scipy_continious_dists = sps._continuous_distns.__all__\n", "print(f\"Number of continuous distributions: {len(scipy_continious_dists)}\")\n", "print(f\"Examples: {scipy_continious_dists[0:10]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets inspect one of the available distributions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pearson3 = sps.pearson3\n", "pearson3?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As can be seen the continuous distribtion has a lot of different methods. Most important for this notebook is the fit method. The fit method, by default, uses a [maximum likelihood estimate](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.fit.html#scipy.stats.rv_continuous.fit). The method returns the shape parameters (loc, scale and others) of the distribution. Lets see what that looks like for one month of data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "surplus_feb = surplusm[surplusm.index.month == 2].sort_values()\n", "\n", "fit_parameters = pearson3.fit(surplus_feb)\n", "fit_cdf = pearson3.cdf(surplus_feb, *fit_parameters)\n", "print(\n", " \"Fit parameters pearson3: loc={:.2f}, scale={:.1f}, skew={:.1f}\".format(\n", " *fit_parameters\n", " )\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compare the fitted distribution to the Emperical Cumulative Density Function (similar to the cumulative histogram)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f, ax = plt.subplots(figsize=(5.0, 3.5))\n", "\n", "ax.plot(surplus_feb, fit_cdf, color=\"C1\", label=(\"Fitted pearson3 CDF\"))\n", "e_cdf = sps.ecdf(surplus_feb).cdf\n", "plot_ecdf(\n", " ax=ax, data=surplus_feb, ecdf=e_cdf, s=10, color=\"C0\", label=\"ECDF\", cdf=fit_cdf\n", ")\n", "ax.hist(\n", " surplus_feb,\n", " bins=e_cdf.quantiles,\n", " cumulative=True,\n", " density=True,\n", " alpha=0.3,\n", " color=\"C0\",\n", " label=\"Cumulative histogram\",\n", ")\n", "ax.legend()\n", "ax.grid(True)\n", "ax.set_xlim(-20.0, 100.0)\n", "ax.set_ylim(0.0, 1.0)\n", "ax.yaxis.set_major_locator(mpl.ticker.MultipleLocator(0.1))\n", "ax.xaxis.set_major_locator(mpl.ticker.MultipleLocator(10.0))\n", "ax.set_xlabel(\"Precipitation surplus [mm]\")\n", "ax.set_ylabel(\"Cumulative probability\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the SPEI package there is a little wrapper around this dist class that helps with doing some analysis on the distribution. For instance the two-sided Kolmogorov-Smirnov test for goodness of fit. The null hypothesis two-sided test is that the two distributions are identical, the alternative is that they are not identical." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pearson3_pvalue = si.dist.Dist(surplus_feb, pearson3).ks_test()\n", "print(f\"Pearson3 Kolmogorov-Smirnov test p-value: {pearson3_pvalue:0.3f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Say we choose a confidence level of 95%; that is, we will reject the null hypothesis in favor of the alternative if the p-value is less than 0.05. For e.g. march the p-value is lower than our threshold of 0.05, so we reject the null hypothesis in favor of the default “two-sided” alternative: the data are not distributed according to the fitted pearson3 distribution. But not finding the appropriate distribution is one of the big uncertainties of the standardized index method. However, not a perfect fit does not mean this distribution is not fit-for-purpose of calculating a drought index. That is up to the modeller to decide. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparing data and distributions\n", "\n", "We'll reproduce some steps that are normally done internally. Therefor we need to create the SI class and fit the distribution. Well do that for two distributions, the pearson3 and fisk distribution. We'll make a subplot per month and compare the fitted distribution." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spei_pearson3 = si.SI(surplusm, dist=pearson3)\n", "spei_pearson3.fit_distribution()\n", "\n", "spei_fisk = si.SI(surplusm, dist=sps.fisk)\n", "spei_fisk.fit_distribution()\n", "\n", "f, axl = plt.subplots(\n", " 4, 3, figsize=(9.0, 9.0), sharey=True, sharex=False, layout=\"tight\"\n", ")\n", "axsr = axl.ravel()\n", "for (date, distf), (_, distp) in zip(\n", " spei_fisk._dist_dict.items(), spei_pearson3._dist_dict.items()\n", "):\n", " i = date.month - 1\n", " cdf_fisk = distf.cdf().sort_values()\n", " cdf_pearson3 = distp.cdf().sort_values()\n", " data = distf.data.loc[cdf_fisk.index]\n", " e_cdf = sps.ecdf(data).cdf\n", " plot_ecdf(ax=axsr[i], data=data, ecdf=e_cdf, s=2.0, color=\"C3\", label=\"ECDF\")\n", " axsr[i].plot(\n", " data.values,\n", " cdf_fisk.values,\n", " color=\"C0\",\n", " linewidth=1.5,\n", " label=f\"Fisk {distf.ks_test():0.2f}\",\n", " )\n", " axsr[i].plot(\n", " data.values,\n", " cdf_pearson3.values,\n", " color=\"C1\",\n", " linewidth=1.5,\n", " label=f\"Pearson3 {distp.ks_test():0.2f}\",\n", " linestyle=\"--\",\n", " )\n", " axsr[i].grid(True)\n", " axsr[i].legend(fontsize=6, title=month_name[date.month], title_fontsize=8)\n", "\n", "_ = [ax.set_ylabel(\"Cumulative\\nprobability [-]\") for ax in axl[:, 0]]\n", "_ = [ax.set_xlabel(\"Surplus [mm/d]\") for ax in axl[-1, :]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both the Fisk and Pearson3 distirbutions seem to describe the precipitation surplus pretty well visually. The p-value of the KS-test, as shown in the legend, is more than 0.05 for all months and distributions however. This might not be an issue if the series are fit for purpose. For instance, the pearson3 distirbutions seems to fit a better for larger surplus numbers. Only for November the Fisk distribution looks significantly better/smoother than the pearson3 distribution." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Flexible time scales and distribution fitting\n", "Meteorological and hydrological time series are nowadays typically available at a daily frequency. To accommodate this, the `timescale` argument in the drought index function is designed to be flexible, with units that match the frequency of the input time series. For example, when using daily data, a `timescale` value of `30` corresponds approximately to a one-month drought index, `90` for three months, `180` for six months, and so on.\n", "\n", "The frequency at which distributions are fitted (`fit_freq`) determines how many different distributions are fitted. With a daily fit frequency (`fit_freq=\"D\"`), one distribution is fitted for every day of the year — 365 or 366 in total, depending on leap years. In contrast, a monthly fit (`fit_freq=\"MS\"` or `\"ME\"`) fits a distribution for every month of the year; 12 in total. Although daily fitting is more computationally intensive, it can yield more precise results, as shown in later sections. Therefore, the number of data points available for each distribution fit depends on both `fit_freq`, the frequency, and the time length of the time series. For instance, with 30 years of monthly data and `fit_freq=\"MS\"`, each monthly distribution is based on 30 data points. However, fitting a distribution to just 30 values can be challenging — especially for daily data, which is more prone to noise and outliers. By default, the package attempts to infer `fit_freq` based on the time series frequency. If inference fails, it defaults to a monthly fit. Users can also specify `fit_freq` manually for full control.\n", "\n", "To improve fit stability, the `fit_window` argument allows users to include additional data points around each time step. The window size is specified in the same units as the fit frequency. For example for daily data and fit frequency, `fit_window=3` includes data from the day before and after a given date (e.g., March 14th–16th for March 15th) for the fitting of the distribution. A `fit_window=31` for daily data provides a sample size similar to monthly fitting, while retaining daily resolution. Though experimental, this fit window feature has shown to improve the robustness of daily fits. A fit window with an even number is automaticaly transformed to an odd number." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "timescale = 30\n", "\n", "spei_d = si.SI(\n", " series=surplus,\n", " dist=sps.fisk,\n", " timescale=timescale,\n", " fit_freq=\"D\",\n", " fit_window=0,\n", ")\n", "spei_d.fit_distribution()\n", "\n", "fit_window_dw = 31\n", "spei_dw = si.SI(\n", " series=surplus,\n", " dist=sps.fisk,\n", " timescale=timescale,\n", " fit_freq=\"D\",\n", " fit_window=fit_window_dw,\n", ")\n", "spei_dw.fit_distribution()\n", "\n", "spei_m = si.SI(\n", " series=surplus,\n", " dist=sps.fisk,\n", " timescale=timescale,\n", " fit_freq=\"MS\",\n", " fit_window=0,\n", ")\n", "spei_m.fit_distribution()\n", "\n", "speid = spei_d.norm_ppf()\n", "speidw = spei_dw.norm_ppf()\n", "speim = spei_m.norm_ppf()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "month = 4\n", "ts_d = pd.Timestamp(f\"2000-{month}-15\")\n", "ts_m = pd.Timestamp(f\"2000-{month}-01\")\n", "dist_d = spei_d._dist_dict[ts_d]\n", "dist_dw = spei_dw._dist_dict[ts_d]\n", "dist_m = spei_m._dist_dict[ts_m]\n", "data_d = dist_d.data.sort_values()\n", "data_dw = dist_dw.data_window.sort_values()\n", "data_m = dist_m.data.sort_values()\n", "cdf_d = dist_d.cdf().loc[data_d.index]\n", "cdf_dw = dist_dw.dist.cdf(data_dw.values, *dist_dw.pars, dist_dw.loc, dist_dw.scale)\n", "cdf_m = dist_m.cdf().loc[data_m.index]\n", "ecdf_d = sps.ecdf(data_d).cdf\n", "ecdf_dw = sps.ecdf(data_dw).cdf\n", "ecdf_m = sps.ecdf(data_m).cdf\n", "\n", "fig, axd = plt.subplot_mosaic(\n", " [[\"d\", \"dw\", \"m\", \"cdf\"], [\"si\", \"si\", \"si\", \"si\"]],\n", " layout=\"constrained\",\n", " figsize=(9.0, 5.0),\n", " height_ratios=[1.0, 1.2],\n", ")\n", "scatter_kwargs = dict(\n", " alpha=0.6,\n", " zorder=2,\n", " edgecolors=\"k\",\n", ")\n", "plot_ecdf(\n", " ax=axd[\"d\"],\n", " data=data_d,\n", " ecdf=ecdf_d,\n", " color=\"C0\",\n", " label=f\"Data {ts_d.strftime('%B %d')}th\",\n", " linewidths=0.5,\n", " s=5.0,\n", " **scatter_kwargs,\n", ")\n", "plot_ecdf(\n", " ax=axd[\"dw\"],\n", " data=data_dw,\n", " ecdf=ecdf_dw,\n", " color=\"C1\",\n", " label=f\"Data {ts_d.strftime('%B %d')}th and\\n{fit_window_dw} day window\",\n", " linewidths=0.2,\n", " s=4.0,\n", " **scatter_kwargs,\n", ")\n", "plot_ecdf(\n", " ax=axd[\"m\"],\n", " data=data_m,\n", " ecdf=ecdf_m,\n", " color=\"C2\",\n", " label=f\"Data {ts_m.strftime('%B')}\",\n", " linewidths=0.2,\n", " s=4.0,\n", " **scatter_kwargs,\n", ")\n", "bin = 5.0\n", "xmin = min(data_d.min(), data_dw.min(), data_m.min())\n", "xmax = max(data_d.max(), data_dw.max(), data_m.max())\n", "bins = np.arange(xmin // bin * bin, xmax + bin, bin)\n", "axd[\"d\"].set_xlim(xmin, xmax)\n", "axd[\"d\"].xaxis.set_minor_locator(mpl.ticker.MultipleLocator(bin))\n", "axd[\"d\"].xaxis.set_major_locator(mpl.ticker.MultipleLocator(bin * 5))\n", "axd[\"d\"].set_ylim(0.0, 1.0)\n", "axd[\"d\"].yaxis.set_major_locator(mpl.ticker.MultipleLocator(0.1))\n", "axd[\"d\"].yaxis.set_major_formatter(mpl.ticker.PercentFormatter(1.0))\n", "for iax in [axd[\"dw\"], axd[\"m\"], axd[\"cdf\"]]:\n", " iax.sharex(axd[\"d\"])\n", " iax.sharey(axd[\"d\"])\n", " for t in iax.get_yticklabels():\n", " t.set_visible(False)\n", "axd[\"d\"].set_ylabel(\"Cumulative probability\")\n", "axd[\"d\"].set_xlabel(\"Precipitation\\nsurplus (mm)\")\n", "axd[\"dw\"].set_xlabel(\"Precipitation\\nsurplus (mm)\")\n", "axd[\"m\"].set_xlabel(\"Precipitation\\nsurplus (mm)\")\n", "axd[\"cdf\"].set_xlabel(\"Precipitation\\nsurplus (mm)\")\n", "\n", "axd[\"d\"].plot(\n", " bins,\n", " dist_d.dist.cdf(bins, *dist_d.pars, loc=dist_d.loc, scale=dist_d.scale),\n", " color=\"C0\",\n", " linewidth=0.5,\n", ")\n", "axd[\"dw\"].plot(\n", " bins,\n", " dist_dw.dist.cdf(bins, *dist_dw.pars, loc=dist_dw.loc, scale=dist_dw.scale),\n", " color=\"C1\",\n", " linewidth=0.5,\n", ")\n", "axd[\"m\"].plot(\n", " bins,\n", " dist_m.dist.cdf(bins, *dist_m.pars, loc=dist_m.loc, scale=dist_m.scale),\n", " color=\"C2\",\n", " linewidth=0.5,\n", ")\n", "axd[\"cdf\"].plot(\n", " [],\n", " [],\n", " color=\"k\",\n", " linewidth=1.5,\n", " label=f\"Fitted {dist_d.dist.name} distribution\",\n", ")\n", "axd[\"cdf\"].plot(\n", " bins,\n", " dist_d.dist.cdf(bins, *dist_d.pars, loc=dist_d.loc, scale=dist_d.scale),\n", " color=\"C0\",\n", " linewidth=2.0,\n", ")\n", "axd[\"cdf\"].plot(\n", " bins,\n", " dist_dw.dist.cdf(bins, *dist_dw.pars, loc=dist_dw.loc, scale=dist_dw.scale),\n", " color=\"C1\",\n", " linewidth=1.5,\n", ")\n", "axd[\"cdf\"].plot(\n", " bins,\n", " dist_m.dist.cdf(bins, *dist_m.pars, loc=dist_m.loc, scale=dist_m.scale),\n", " color=\"C2\",\n", " linewidth=1.0,\n", ")\n", "\n", "mpl.rcParams[\"legend.fontsize\"] = 8.0\n", "\n", "plot_box(axd[\"d\"], dist_d, bbox_edgecolor=\"C0\")\n", "plot_box(axd[\"dw\"], dist_dw, bbox_edgecolor=\"C1\")\n", "plot_box(axd[\"m\"], dist_m, bbox_edgecolor=\"C2\")\n", "\n", "axd[\"d\"].legend(\n", " loc=(0, 1), frameon=False, fontsize=mpl.rcParams[\"legend.fontsize\"] - 0.5\n", ")\n", "axd[\"dw\"].legend(\n", " loc=(0, 1), frameon=False, fontsize=mpl.rcParams[\"legend.fontsize\"] - 0.5\n", ")\n", "axd[\"m\"].legend(\n", " loc=(0, 1), frameon=False, fontsize=mpl.rcParams[\"legend.fontsize\"] - 0.5\n", ")\n", "axd[\"cdf\"].legend(\n", " loc=(0, 1), frameon=False, fontsize=mpl.rcParams[\"legend.fontsize\"] - 0.5\n", ")\n", "\n", "axd[\"si\"].plot([], [], color=\"k\", label=\"SPEI-1\")\n", "axd[\"si\"].plot(\n", " speid.index,\n", " speid.values,\n", " linewidth=2.0,\n", " color=\"C0\",\n", " label=\"Daily fit\",\n", ")\n", "axd[\"si\"].plot(\n", " speidw.index,\n", " speidw.values,\n", " linewidth=1.5,\n", " color=\"C1\",\n", " label=\"Daily fit with window\",\n", ")\n", "axd[\"si\"].plot(\n", " speim.index,\n", " speim.values,\n", " linewidth=1.0,\n", " color=\"C2\",\n", " label=\"Monthly fit\",\n", ")\n", "\n", "year = 2001\n", "axd[\"si\"].fill_betweenx(\n", " [-3.0, 3.0],\n", " ts_m + pd.Timedelta(days=365),\n", " ts_m + pd.Timedelta(days=365 + 30),\n", " color=\"k\",\n", " alpha=0.1,\n", " linewidth=0,\n", ")\n", "axd[\"si\"].fill_betweenx(\n", " [-3.0, 3.0],\n", " ts_d + pd.Timedelta(days=365),\n", " ts_d + pd.Timedelta(days=365 + 1),\n", " color=\"k\",\n", " alpha=0.1,\n", " linewidth=0,\n", ")\n", "axd[\"si\"].set_xlim(pd.Timestamp(f\"{year}\"), pd.Timestamp(f\"{year + 1}\"))\n", "axd[\"si\"].xaxis.set_major_locator(mpl.dates.MonthLocator())\n", "axd[\"si\"].xaxis.set_major_formatter(mpl.dates.DateFormatter(\"%b%_d\\n%Y\"))\n", "axd[\"si\"].set_ylabel(\"Z-score\")\n", "axd[\"si\"].grid(False)\n", "axd[\"si\"].legend(ncol=4, loc=\"lower left\")\n", "axd[\"si\"].set_ylim(-3.0, 3.0)\n", "axd[\"si\"].yaxis.set_major_locator(mpl.ticker.MultipleLocator())\n", "\n", "axes_indicator(axd[\"d\"], \"a\", 0.01, 0.98, ha=\"left\", va=\"top\")\n", "axes_indicator(axd[\"dw\"], \"b\", 0.01, 0.98, ha=\"left\", va=\"top\")\n", "axes_indicator(axd[\"m\"], \"c\", 0.01, 0.98, ha=\"left\", va=\"top\")\n", "axes_indicator(axd[\"si\"], \"e\", 0.995, 0.98, ha=\"right\", va=\"top\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The figure above illustrates the influence of different distribution fitting strategies — namely, `fit_freq` and `fit_window` on the calculation of the SPEI-1 index over the year 2001. The top row displays the cumulative distribution functions of precipitation surplus data for an excerpt of the time series of April (the 15th). Subplot a shows the case where distributions are fitted daily (`fit_freq=\"D\"`) without using a fitting window. Here, the fit is based solely on data from April 15th across 30 years, resulting in a limited sample size and consequently a noisier empirical distribution with a less stable fit. Subplot b also uses a daily fitting frequency but applies a 31-day fitting window (`fit_window=31`) centered on April 15th. This expands the sample to include 31 days of data, significantly increasing the total number of observations and yielding a much smoother and more robust distribution fit. In contrast, Subplot c shows a monthly fitting approach (`fit_freq=\"MS\"`) with no fit window, where all April data from each year is used. This produces a stable fit, but because each month is treated separately, sharp transitions can occur at month boundaries, which may introduce artificial discontinuities into the resulting index. This is shown in the red line of subplot e, which is smoother overall but exhibits abrupt changes at the start of each month (e.g., April 1st and November 1st), due to transitions between monthly distributions. These settings allow users to tailor the standardization process to their data and desired level of temporal precision. Subplot d shows the specific fisk distributions from each parameter set, shown in the text boxes of subplots a,b and c. The changes to the fitted parameter values are obvious, but the fitted distrubtions look similar as seen in subplot d. The result for the z-score is minimal on april 15th but larger for different dates as seen in subplot e.\n" ] } ], "metadata": { "kernelspec": { "display_name": "SPEI", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.1" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }