Implement Generalized Pareto distribution #294
base: main
Conversation
When I was rebasing, I accidentally added the recent PR by @zaxtax.
Thanks for opening up the PR. I left some comments and questions.
For the rebase, what did you try to do exactly?
```
@@ -221,6 +221,163 @@ def moment(rv, size, mu, sigma, xi):
    return mode


# Generalized Pareto Distribution
class GenParetoRV(RandomVariable):
    name: str = "Generalized Pareto Distribution"
```
name: str = "Generalized Pareto Distribution" | |
name: str = "Generalized Pareto" |
```
@@ -221,6 +221,163 @@ def moment(rv, size, mu, sigma, xi):
    return mode


# Generalized Pareto Distribution
class GenParetoRV(RandomVariable):
```
Should subclass ScipyRandomVariable, because SciPy RVs (sometimes) do something dumb with size=(1,).
Suggested change:
```diff
-class GenParetoRV(RandomVariable):
+class GenParetoRV(ScipyRandomVariable):
```
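To make the suggestion concrete, here is a minimal sketch of the RV op subclassing ScipyRandomVariable, combining the pieces visible in this diff with the naming suggestions above. The import location of ScipyRandomVariable and the `rng_fn_scipy` hook are assumptions about PyMC internals, so verify them against the PyMC source:

```python
from typing import List, Tuple

import scipy.stats as stats
from pymc.distributions.distribution import ScipyRandomVariable  # assumed import path


class GenParetoRV(ScipyRandomVariable):
    name: str = "Generalized Pareto"
    ndim_supp: int = 0
    ndims_params: List[int] = [0, 0, 0]
    dtype: str = "floatX"
    _print_name: Tuple[str, str] = ("Generalized Pareto", "\\operatorname{GenPareto}")

    @classmethod
    def rng_fn_scipy(cls, rng, mu, sigma, xi, size):
        # ScipyRandomVariable is assumed to wrap this in an rng_fn that fixes
        # the scipy size=(1,) quirk mentioned above.
        return stats.genpareto.rvs(c=xi, loc=mu, scale=sigma, random_state=rng, size=size)


gen_pareto = GenParetoRV()
```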
```python
    def __call__(self, mu=0.0, sigma=1.0, xi=1.0, size=None, **kwargs) -> TensorVariable:
        return super().__call__(mu, sigma, xi, size=size, **kwargs)
```
This is not strictly necessary because most users will never call the RV directly. We usually provide default values through the PyMC distribution class
Suggested change:
```diff
-    def __call__(self, mu=0.0, sigma=1.0, xi=1.0, size=None, **kwargs) -> TensorVariable:
-        return super().__call__(mu, sigma, xi, size=size, **kwargs)
```
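For reference, a rough sketch of where such defaults usually live instead, namely the `dist` classmethod of the PyMC distribution class (class names here mirror this PR but are illustrative, not its exact code):

```python
import pytensor.tensor as pt
from pymc.distributions import Continuous


class GenPareto(Continuous):
    rv_op = gen_pareto  # the RV op instance defined above

    @classmethod
    def dist(cls, mu=0.0, sigma=1.0, xi=1.0, **kwargs):
        # Defaults are applied here, so the RV's __call__ does not need them.
        mu = pt.as_tensor_variable(mu)
        sigma = pt.as_tensor_variable(sigma)
        xi = pt.as_tensor_variable(xi)
        return super().dist([mu, sigma, xi], **kwargs)
```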
```python
        return stats.genpareto.rvs(c=xi, loc=mu, scale=sigma, random_state=rng, size=size)


gp = GenParetoRV()
```
Suggested change:
```diff
-gp = GenParetoRV()
+gen_pareto = GenParetoRV()
```
```python
    ndim_supp: int = 0
    ndims_params: List[int] = [0, 0, 0]
    dtype: str = "floatX"
    _print_name: Tuple[str, str] = ("Generalized Pareto Distribution", "\\operatorname{GP}")
```
Suggested change:
```diff
-    _print_name: Tuple[str, str] = ("Generalized Pareto Distribution", "\\operatorname{GP}")
+    _print_name: Tuple[str, str] = ("Generalized Pareto", "\\operatorname{GenPareto}")
```
```python
    def test_logp(self):
        def ref_logp(value, mu, sigma, xi):
            if xi == 0:
```
Scipy genpareto logpdf fails for xi = 0?
Yes, I did notice a tiny bug in scipy's function when calculating the pdf of the generalized Pareto distribution with xi == 0. Will double-check and submit a PR to fix that as well.
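A quick way to check the xi == 0 special case being discussed is to compare scipy's genpareto against the closed-form exponential limit:

```python
import numpy as np
from scipy import stats

x, mu, sigma = 1.5, 0.0, 1.0
# For xi -> 0 the GPD logpdf reduces to -log(sigma) - (x - mu) / sigma
closed_form = -np.log(sigma) - (x - mu) / sigma
print(closed_form)
print(stats.genpareto.logpdf(x, c=0.0, loc=mu, scale=sigma))
```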
```python
            decimal=select_by_precision(float64=6, float32=2),
            skip_paramdomain_outside_edge_test=True,
```
Why are you skipping the outside edge test?
Since I am bounding xi to be >= 0, I'd like to skip the outside edge test.
The point of that test is to make sure the bounding is defined correctly, so you shouldn't skip it.
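Roughly, what that test exercises is that parameter values outside the declared domain are rejected rather than silently accepted. A sketch, assuming the distribution ends up exposed as pymc_experimental.GenPareto (the name is an assumption here):

```python
import pymc as pm
import pymc_experimental as pmx

# xi = -0.5 is outside the declared xi >= 0 bound
dist = pmx.GenPareto.dist(mu=0.0, sigma=1.0, xi=-0.5)
try:
    pm.logp(dist, 1.0).eval()
except Exception as err:
    # check_parameters should flag the invalid xi when the logp is evaluated
    print(type(err).__name__, err)
```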
```
    xi : float
        Shape parameter (xi >= 0)
```
Could be worth a note saying this is more restrictive than other definitions of the GenPareto (on Wikipedia there seem to be special cases for xi < 0?)
xi < 0 is less common for modelling extreme values. I will add a note here.
```python
def moment(rv, size, mu, sigma, xi):
    r"""
    Mean is defined when :math:`\xi < 1`
```
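For reference, that mean is $\mathbb{E}[X] = \mu + \sigma/(1 - \xi)$, which is finite only for $\xi < 1$.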
We don't need to provide a real "moment", just anything that always has finite logp. So in this case moment = mu may be good enough?
Are you suggesting that we only need to return mu instead of the true mean? Or shall I just leave it as is?
Yes, you can just return mu and take away the check_parameters part. Just make sure you broadcast mu with the other parameters in case size is None.
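A minimal sketch of that simplification (the helper rv_size_is_none and its import path are assumptions about PyMC internals, so double-check against the codebase):

```python
import pytensor.tensor as pt
from pymc.distributions.shape_utils import rv_size_is_none  # assumed import path


def moment(rv, size, mu, sigma, xi):
    # Just return mu, broadcast against the other parameters so the shape is
    # correct even when size is None.
    mu, *_ = pt.broadcast_arrays(mu, sigma, xi)
    if not rv_size_is_none(size):
        mu = pt.full(size, mu)
    return mu
```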
You should add an entry in the docs API: https://github.com/pymc-devs/pymc-experimental/blob/main/docs/api_reference.rst
@ricardoV94 Thanks for your comments! I have addressed your suggested changes. Thank you!
A few more comments, thanks for the work so far!
""" | ||
Calculate log-probability of Generalized Pareto distribution | ||
at specified value. | ||
|
||
Parameters | ||
---------- | ||
value: numeric | ||
Value(s) for which log-probability is calculated. If the log probabilities for multiple | ||
values are desired the values must be provided in a numpy array or Pytensor tensor | ||
|
||
Returns | ||
------- | ||
TensorVariable | ||
""" |
These docstrings are incomplete, and the logp function is not really user facing, so it's better to not include anything at all.
I think the logp for the Generalized Pareto distribution should be

```python
def logp(value, mu, sigma, xi):
    """
    Calculate log-probability of Generalized Pareto distribution
    at specified value.

    Parameters
    ----------
    value: numeric
        Value(s) for which log-probability is calculated. If the log probabilities for multiple
        values are desired the values must be provided in a numpy array or Pytensor tensor

    Returns
    -------
    TensorVariable
    """
    scaled = (value - mu) / sigma
    logp_expression = pt.switch(
        pt.isclose(xi, 0),
        -1 * scaled,
        -1 * pt.log(sigma) - ((xi + 1) / xi) * pt.log1p(xi * scaled),
    )
    logp = pt.switch(pt.gt(1 + xi * scaled, 0), logp_expression, -np.inf)
    return check_parameters(
        logp, sigma > 0, pt.and_(xi > -1, xi < 1), msg="sigma > 0 or -1 < xi < 1"
    )
```
""" | ||
Compute the log of the cumulative distribution function for Generalized Pareto | ||
distribution at the specified value. | ||
|
||
Parameters | ||
---------- | ||
value: numeric or np.ndarray or `TensorVariable` | ||
Value(s) for which log CDF is calculated. If the log CDF for | ||
multiple values are desired the values must be provided in a numpy | ||
array or `TensorVariable`. | ||
|
||
Returns | ||
------- | ||
TensorVariable | ||
""" |
Same here
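For completeness, a rough sketch (not the PR's exact code) of what this log-CDF computes under the same parameterization; the plain log1p/exp form is used here for readability rather than numerical optimality:

```python
import numpy as np
import pytensor.tensor as pt
from pymc.distributions.dist_math import check_parameters


def logcdf(value, mu, sigma, xi):
    scaled = (value - mu) / sigma
    # log survival function: exponential limit for xi ~ 0, power law otherwise
    log_sf = pt.switch(
        pt.isclose(xi, 0),
        -scaled,
        (-1 / xi) * pt.log1p(xi * scaled),
    )
    logcdf_expression = pt.log1p(-pt.exp(log_sf))
    # CDF is 0 (log-CDF is -inf) below the lower bound mu
    logcdf_expression = pt.switch(pt.lt(scaled, 0), -np.inf, logcdf_expression)
    return check_parameters(logcdf_expression, sigma > 0, msg="sigma > 0")
```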
```
@@ -138,6 +138,64 @@ class TestGenExtreme(BaseTestDistributionRandom):
    ]


class TestGenParetoClass:
```
Missing the test for moment
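A sketch of the kind of test being asked for, assuming the moment is simplified to mu as discussed above; the class name GenPareto and the import path of assert_moment_is_expected are assumptions (the helper has moved between PyMC versions):

```python
import numpy as np
import pymc as pm
import pymc_experimental as pmx
import pytest
from pymc.testing import assert_moment_is_expected  # assumed import path


@pytest.mark.parametrize(
    "mu, sigma, xi, size, expected",
    [
        (0, 1, 0, None, 0),
        (1, 2, 0.5, 5, np.full(5, 1)),
        (np.arange(3), 1, 0.5, None, np.arange(3)),
    ],
)
def test_gen_pareto_moment(mu, sigma, xi, size, expected):
    with pm.Model() as model:
        pmx.GenPareto("x", mu=mu, sigma=sigma, xi=xi, size=size)
    assert_moment_is_expected(model, expected)
```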
The Generalized Pareto distribution is commonly used for modelling the tail of another distribution. It has wide applications in risk management, finance, and quality assurance. See the Wikipedia page.
I added a new distribution to the pymc-experimental branch.
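As a hypothetical usage sketch of what the PR enables (the GenPareto name and pymc_experimental namespace are assumptions), a peaks-over-threshold style model for tail data:

```python
import numpy as np
import pymc as pm
import pymc_experimental as pmx

threshold = 1.0
exceedances = np.array([1.3, 2.1, 1.7, 4.2, 1.1])  # toy observations above the threshold

with pm.Model():
    sigma = pm.HalfNormal("sigma", 1.0)
    xi = pm.HalfNormal("xi", 0.5)  # xi >= 0, matching the restriction in this PR
    pmx.GenPareto("obs", mu=threshold, sigma=sigma, xi=xi, observed=exceedances)
    idata = pm.sample()
```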