The Total Area Within Any Continuous Probability Distribution is Equal to 1

Probability and Sampling Distributions

Donna L. Mohr, ... Rudolf J. Freund, in Statistical Methods (Fourth Edition), 2022

2.4.1 Characteristics of a Continuous Probability Distribution

The characteristics of a continuous probability distribution are as follows:

1. The graph of the distribution (the equivalent of a bar graph for a discrete distribution) is usually a smooth curve. A typical example is seen in Fig. 2.2. The curve is described by an equation or a function that we call $f(y)$. This equation is often called the probability density and corresponds to the $p(y)$ we used for discrete variables in the previous section (see additional discussion following).

Figure 2.2. Graph of a Continuous Distribution.
2. The total area under the curve is one. This corresponds to the sum of the probabilities being equal to 1 in the discrete case.
3. The area between the curve and the horizontal axis from the value $a$ to the value $b$ represents the probability of the random variable taking on a value in the interval $(a, b)$. In Fig. 2.2 the area under the curve between the values $-1$ and $0.5$, for example, is the probability of finding a value in this interval. This corresponds to adding probabilities of mutually exclusive outcomes from a discrete probability distribution.
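Properties 2 and 3 can be checked numerically. The equation of the curve in Fig. 2.2 is not given here, so the sketch below assumes the standard normal density as a stand-in. It integrates the density with Simpson's rule and compares the area between $-1$ and $0.5$ with the exact value obtained from the error function.

```python
import math

def normal_pdf(y, mu=0.0, sigma=1.0):
    """Normal density f(y); used here only as an assumed example curve."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(y, mu=0.0, sigma=1.0):
    """Exact P(Y <= y) for the same normal distribution, via math.erf."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the area under f on [a, b]."""
    if n % 2:  # Simpson's rule needs an even number of subintervals
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

# Property 2: the total area is 1 (the tails beyond +/-10 are negligible).
total_area = simpson(normal_pdf, -10.0, 10.0)

# Property 3: the area from a = -1 to b = 0.5 is P(-1 < Y < 0.5).
area_ab = simpson(normal_pdf, -1.0, 0.5)
exact_ab = normal_cdf(0.5) - normal_cdf(-1.0)

print(f"total area      = {total_area:.6f}")
print(f"P(-1 < Y < 0.5) = {area_ab:.6f} (exact: {exact_ab:.6f})")
```

For this assumed curve the interval probability is about 0.533; a different $f(y)$ would of course give a different area.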

There are similarities but also some important differences between continuous and discrete probability distributions. Some of the most important differences are as follows:

1. The equation $f(y)$ does not give the probability that $Y = y$ as did $p(y)$ in the discrete case. This is because $Y$ can take on an infinite number of values (any value in an interval), and therefore it is impossible to assign a probability value to each $y$. In fact, the value of $f(y)$ is not a probability at all; hence $f(y)$ can take any nonnegative value, including values greater than 1.
2. Since the area under the curve above a single point is zero, the probability of obtaining exactly a specific value is zero. Thus, for a continuous random variable, $P(a \leq Y \leq b)$ and $P(a < Y < b)$ are equivalent, which is generally not true for discrete distributions.
3. Finding areas under curves representing continuous probability distributions involves the use of calculus and may become quite difficult. For some distributions, areas cannot even be computed in closed form and require special numerical techniques. For this reason, the areas required to calculate probabilities for the most frequently used distributions have been calculated and appear in tabular form in this and other texts, as well as in books devoted entirely to tables (e.g., Pearson and Hartley, 1972). Of course, statistical computer programs easily calculate such probabilities.
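Differences 1 and 2 are easy to see with a hypothetical uniform distribution on $[0, 0.25]$, chosen here only for illustration. Its density equals 4 on that interval, which is larger than 1, yet the total area is still 1. A single point has zero width and therefore zero probability, so including or excluding the endpoints of an interval does not change its probability.

```python
def uniform_pdf(y, lo=0.0, hi=0.25):
    """Density of a uniform distribution on [lo, hi] (an assumed example)."""
    return 1.0 / (hi - lo) if lo <= y <= hi else 0.0

def uniform_prob(a, b, lo=0.0, hi=0.25):
    """P(a <= Y <= b): overlap of [a, b] with [lo, hi], times the density.
    The overlap length is the same whether the endpoints are included or not."""
    overlap = max(0.0, min(b, hi) - max(a, lo))
    return overlap / (hi - lo)

peak = uniform_pdf(0.1)              # density value 4.0, larger than 1
total = uniform_prob(0.0, 0.25)      # total area 1.0
point = uniform_prob(0.1, 0.1)       # a single point: probability 0.0
interval = uniform_prob(0.05, 0.15)  # same for [0.05, 0.15] and (0.05, 0.15)

print(peak, total, point, round(interval, 12))
```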

In some cases, recording limitations may make continuous random variables look as if they are discrete. Rounding may result in a continuous variable being represented in a discrete manner. For example, people's weight is almost always recorded to the nearest pound, even though weight is conceptually continuous. Nevertheless, if the variable is continuous, then the probability distribution describing it is continuous, regardless of how the values are recorded.

As in the case of discrete distributions, several common continuous distributions are used in statistical inference. This section discusses most of the distributions used in this text.


URL:

https://www.sciencedirect.com/science/article/pii/B9780128230435000023

Information Geometry

Angelo Plastino, ... Diana Monteoliva, in Handbook of Statistics, 2021

2.5 Obtaining Fisher information measure

This is our main objective. Given a continuous probability distribution function (PDF) $f(x)$ with $x \in \Delta \subset \mathbb{R}$ and $\int_{\Delta} f(x)\,dx = 1$, its concomitant Shannon entropy $S$ is

(27) $S(f) = -\int_{\Delta} f \ln(f)\,dx,$

a quantifier of global nature, as is well known: the entropy is not very sensitive to strong changes in the distribution that take place in a small region. That is definitely not the case for the Fisher information measure (FIM), which we denote by $F$. It measures gradient content (Roy Frieden, 2004). One has

(28) $F(f) = \int_{\Delta} \frac{1}{f(x)} \left[\frac{df(x)}{dx}\right]^{2} dx = 4 \int_{\Delta} \left[\frac{d\psi(x)}{dx}\right]^{2} dx.$

FIM can be looked at in several ways: (i) as a quantifier of the capacity to estimate a parameter; (ii) as the amount of information that can be gathered from a set of measurements; (iii) as a quantifier of the degree of order of a system (or phenomenon) (Roy Frieden, 2004), as has been strongly emphasized recently (Frieden and Hawkins, 2010). The division by $f(x)$ in the FIM definition is best avoided if $f(x) \to 0$ at certain $x$-values. We overcome this difficulty by appealing to real probability amplitudes, $f(x) = \psi^{2}(x)$ (Roy Frieden, 2004), which yields a simpler form (with no divisors), as seen at the extreme right of the equation above. Because FIM depends on the local gradient of $f$, it is called a local measure (Roy Frieden, 2004).
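As an illustration of (27) and (28), the sketch below evaluates both functionals numerically for a Gaussian density. The Gaussian and its width are our own choice of example (not the density considered next); for a normal density with standard deviation $\sigma$ the known closed forms are $S = \tfrac{1}{2}\ln(2\pi e\sigma^{2})$ and $F = 1/\sigma^{2}$. The code also checks that the amplitude form, with $f = \psi^{2}$, gives the same $F$ as the form with the divisor.

```python
import math

SIGMA = 0.5  # assumed standard deviation of the example Gaussian

def f(x):
    """Gaussian density with mean 0 and standard deviation SIGMA."""
    return math.exp(-x * x / (2 * SIGMA**2)) / (SIGMA * math.sqrt(2 * math.pi))

def df(x):
    """Analytic derivative f'(x)."""
    return -x / SIGMA**2 * f(x)

def psi(x):
    """Real probability amplitude, so that f = psi**2."""
    return math.sqrt(f(x))

def dpsi(x, h=1e-5):
    """Central-difference approximation to psi'(x)."""
    return (psi(x + h) - psi(x - h)) / (2 * h)

def simpson(g, a, b, n=4000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return s * h / 3

L = 10 * SIGMA  # integrate over [-L, L]; the tails beyond contribute negligibly
entropy = simpson(lambda x: -f(x) * math.log(f(x)), -L, L)   # Eq. (27)
fisher_ratio = simpson(lambda x: df(x) ** 2 / f(x), -L, L)   # Eq. (28), left form
fisher_psi = 4 * simpson(lambda x: dpsi(x) ** 2, -L, L)      # Eq. (28), right form

print(f"S = {entropy:.8f}  (closed form {0.5 * math.log(2 * math.pi * math.e * SIGMA**2):.8f})")
print(f"F = {fisher_ratio:.8f} and {fisher_psi:.8f}  (closed form {1 / SIGMA**2:.8f})")
```

Narrowing the Gaussian (smaller `SIGMA`) raises $F$ sharply while $S$ changes only logarithmically, which is one way to see the local versus global contrast drawn in the text.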

Let us begin by considering the probability density $f$

(29) $f = \frac{e^{\beta U_{0}/r}}{Z}.$

The information that it contains is conveyed by Fisher's information measure, a functional of $f$,

(30) $F(f) = \int_{\Delta} \frac{1}{f(x)} [\nabla f]^{2}\, dx.$

In our case, we have

(31) $F(f) = \frac{1}{Z} (\beta U_{0})^{2} \int r^{-4} e^{\beta U_{0}/r}\, d^{3}x.$

This integral must be regularized as well.


URL:

https://www.sciencedirect.com/science/article/pii/S0169716121000213

Truth, Possibility and Probability

In North-Holland Mathematics Studies, 1991

Theorem X.6

Let $X: \Omega \to {}^{*}\mathbb{R}$ be a random variable with a nearly continuous probability distribution, $\Pr_{X}$, with density function $f$. Then:

(1) If $\int_{-\infty}^{\infty} |x| f(x)\,dx$ exists and the series $\sum_{x \in \Lambda_{X}} |x| \Pr(x)$ nearly converges, we have

$E X \approx \int_{-\infty}^{\infty} x f(x)\,dx.$

(2) If $\int_{-\infty}^{\infty} (x - E X)^{2} f(x)\,dx$ exists and the series $\sum_{x \in \Lambda_{X}} (x - E X)^{2} \Pr(x)$ nearly converges, we have

$\operatorname{Var}(X) \approx \int_{-\infty}^{\infty} (x - E X)^{2} f(x)\,dx.$
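The theorem is stated in nonstandard analysis, but its content can be mimicked loosely with ordinary arithmetic: place a discrete distribution on a fine grid with $\Pr(x_k)$ proportional to $f(x_k)\,h$, and its sums for $E X$ and $\operatorname{Var}(X)$ approach the integrals as the spacing $h$ shrinks. The sketch below is only that standard-analysis analogy, with an assumed density $f(x) = 2x$ on $[0, 1]$ (mean $2/3$, variance $1/18$); it does not implement hyperfinite sums.

```python
def f(x):
    """Assumed example density on [0, 1]."""
    return 2.0 * x

def grid_moments(n):
    """Mean and variance of a discrete distribution on n grid midpoints,
    with Pr(x_k) proportional to f(x_k) * h (normalized to sum to 1)."""
    h = 1.0 / n
    xs = [(k + 0.5) * h for k in range(n)]
    weights = [f(x) * h for x in xs]
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(x * p for x, p in zip(xs, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(xs, probs))
    return mean, var

for n in (10, 100, 1000):
    mean, var = grid_moments(n)
    print(f"n = {n:5d}  E X ~ {mean:.6f}  Var X ~ {var:.6f}")
# Integral values for comparison: E X = 2/3, Var X = 1/18
```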


URL:

https://www.sciencedirect.com/science/article/pii/S0304020808724216

Probability, Statistics, and Experimental Errors

Robert G. Mortimer, in Mathematics for Physical Chemistry (Fourth Edition), 2013

Continuous Probability Distributions

In this case, a numerical property of a member of a population can take on any value within a certain range. For a given independent variable (a random variable), $x$, we define a continuous probability distribution $f(x)$, or probability density, such that

(15.18) $\text{probability of values of } x \text{ between } x' \text{ and } x' + dx = f(x')\,dx,$

where $dx$ is an infinitesimal range of values of $x$ and $x'$ is a particular value of $x$. The probability is proportional to $dx$, so the function $f(x)$ depends on $x$ but is independent of $dx$. If $c$ and $d$ are two values of $x$ within the range of possible values of $x$, the probability that $x$ will lie between $c$ and $d$ is the integral

(15.19) $\text{probability that } c < x < d = \int_{c}^{d} f(x)\,dx.$

If $x_{min}$ is the smallest possible value of $x$ and $x_{max}$ is the largest possible value of $x$, the total probability of all occurrences is given by

(15.20) $\text{total probability} = \int_{x_{min}}^{x_{max}} f(x)\,dx.$

If the probability distribution is normalized, the total probability is equal to unity:

(15.21) $\int_{x_{min}}^{x_{max}} f(x)\,dx = 1 \quad \text{(normalized probability distribution)}.$

The mean value of $x$ is given by

(15.22) $\langle x \rangle = \frac{\int_{x_{min}}^{x_{max}} x f(x)\,dx}{\int_{x_{min}}^{x_{max}} f(x)\,dx}.$

We will frequently use the symbol $\mu$ to stand for a population mean value. If the probability distribution is normalized, the mean is given by

(15.23) $\mu = \langle x \rangle = \int_{x_{min}}^{x_{max}} x f(x)\,dx \quad \text{(normalized probability distribution)}.$

Unless otherwise stated, we will assume that all probability distributions are normalized.

The mean-square value of x is given by

(15.24) $\langle x^{2} \rangle = \int_{x_{min}}^{x_{max}} x^{2} f(x)\,dx.$

The root-mean-square value of $x$ is given by

(15.25) $x_{rms} = \langle x^{2} \rangle^{1/2} = \left(\int_{x_{min}}^{x_{max}} x^{2} f(x)\,dx\right)^{1/2}.$
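Equations (15.20) to (15.25) can be evaluated numerically for any given $f(x)$. The sketch below assumes an unnormalized example, $g(x) = x^{2}$ on $[0, 1]$, finds its total probability (1/3), divides by it to obtain the normalized density $3x^{2}$, and then computes the mean, mean-square, and root-mean-square values; the exact answers are $3/4$, $3/5$, and $\sqrt{3/5}$.

```python
import math

X_MIN, X_MAX = 0.0, 1.0

def g(x):
    """Assumed example: an unnormalized distribution g(x) = x**2 on [0, 1]."""
    return x * x

def simpson(func, a, b, n=1000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = func(a) + func(b) + sum((4 if i % 2 else 2) * func(a + i * h) for i in range(1, n))
    return s * h / 3

total = simpson(g, X_MIN, X_MAX)  # Eq. (15.20); exact value 1/3, so g is not normalized

def f(x):
    """Normalized density f(x) = g(x) / total, i.e. 3x**2."""
    return g(x) / total

mu = simpson(lambda x: x * f(x), X_MIN, X_MAX)          # Eq. (15.23); exact value 3/4
mean_sq = simpson(lambda x: x**2 * f(x), X_MIN, X_MAX)  # Eq. (15.24); exact value 3/5
x_rms = math.sqrt(mean_sq)                              # Eq. (15.25); exact value sqrt(3/5)

print(total, mu, mean_sq, x_rms)
```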

The variance is defined by

(15.26) $\sigma_{x}^{2} = \int_{x_{min}}^{x_{max}} (x - \mu)^{2} f(x)\,dx,$

where we add a subscript $x$ to remind us that $x$ is the variable being discussed. The square root of the variance is called the standard deviation:

(15.27) $\sigma_{x} = \left(\int_{x_{min}}^{x_{max}} (x - \mu)^{2} f(x)\,dx\right)^{1/2} \quad \text{(definition)}.$

The variance and the standard deviation are measures of the spread of the distribution. For most distributions with a single peak, about two-thirds of the members of a population lie in the region between $μ - σ_{x}$ and $μ + σ_{x}$ .
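How rough the "about two-thirds" rule is can be checked by computing $P(\mu - \sigma_x < x < \mu + \sigma_x)$ from the cumulative distribution of a few example distributions. The four distributions below are our own choices for illustration; the fraction ranges from roughly 0.58 (uniform, which has no single peak) to roughly 0.86 (exponential).

```python
import math

def within_one_sigma(cdf, mu, sigma):
    """P(mu - sigma < X < mu + sigma) computed from a cumulative distribution function."""
    return cdf(mu + sigma) - cdf(mu - sigma)

examples = {
    # name: (cumulative distribution function, mean, standard deviation)
    "standard normal": (lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2))), 0.0, 1.0),
    "f(x) = 3x^2 on [0, 1]": (lambda x: min(max(x, 0.0), 1.0) ** 3, 0.75, math.sqrt(3 / 80)),
    "uniform on [0, 1]": (lambda x: min(max(x, 0.0), 1.0), 0.5, 1 / math.sqrt(12)),
    "exponential, rate 1": (lambda x: 1 - math.exp(-x) if x > 0 else 0.0, 1.0, 1.0),
}

for name, (cdf, mu, sigma) in examples.items():
    print(f"{name:22s} {within_one_sigma(cdf, mu, sigma):.3f}")
```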

We can write a more convenient formula for the variance:

$\sigma_{x}^{2} = \int_{x_{min}}^{x_{max}} (x - \mu)^{2} f(x)\,dx = \int_{x_{min}}^{x_{max}} x^{2} f(x)\,dx - 2\mu \int_{x_{min}}^{x_{max}} x f(x)\,dx + \mu^{2} \int_{x_{min}}^{x_{max}} f(x)\,dx.$

The second integral on the right-hand side of this equation is equal to $\mu$, and since we assume that the probability distribution is normalized, the third integral on the right-hand side is equal to unity, so that

(15.28) $\sigma_{x}^{2} = \int_{x_{min}}^{x_{max}} (x - \mu)^{2} f(x)\,dx = \langle x^{2} \rangle - 2\mu^{2} + \mu^{2} = \langle x^{2} \rangle - \mu^{2} = \langle x^{2} \rangle - \langle x \rangle^{2}.$
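A quick numerical check that the definition of the variance and the shortcut $\langle x^{2} \rangle - \langle x \rangle^{2}$ in (15.28) agree. The sketch assumes an exponential density $f(x) = e^{-x}$ on $[0, \infty)$, for which $\mu = 1$, $\langle x^{2} \rangle = 2$, and $\sigma_x^{2} = 1$, and it truncates the upper limit at $x_{max} = 60$ as a stand-in for infinity.

```python
import math

def f(x):
    """Assumed example density: exponential with rate 1, f(x) = exp(-x) for x >= 0."""
    return math.exp(-x)

def simpson(func, a, b, n=6000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = func(a) + func(b) + sum((4 if i % 2 else 2) * func(a + i * h) for i in range(1, n))
    return s * h / 3

X_MIN, X_MAX = 0.0, 60.0  # x_max = 60 stands in for infinity; exp(-60) is negligible

mu = simpson(lambda x: x * f(x), X_MIN, X_MAX)
mean_sq = simpson(lambda x: x * x * f(x), X_MIN, X_MAX)
var_definition = simpson(lambda x: (x - mu) ** 2 * f(x), X_MIN, X_MAX)  # first form in (15.28)
var_shortcut = mean_sq - mu**2                                           # <x^2> - <x>^2

print(mu, mean_sq, var_definition, var_shortcut)
```

The shortcut is convenient analytically, but in floating-point arithmetic it subtracts two nearly equal numbers when the mean is large compared with $\sigma_x$, so the direct definition is usually the safer numerical choice.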


URL:

https://www.sciencedirect.com/science/article/pii/B978012415809200015X

Probability and Statistics

Frank E. Harris, in Mathematics for Physical Science and Engineering, 2014

Exercises

18.2.3

Repeat the analysis of Example 18.2.3 for a box that originally contained 4 black balls and 2 red balls.

18.2.4

Two cards are drawn from a standard 52-card deck of playing cards. The first card is not returned to the deck before drawing the second card. Let $X$ be a random variable with value 1 if the first card is a spade and zero otherwise, let $Y$ be a random variable with value 1 if the second card is a spade and zero otherwise, and let $Z$ be a random variable with value 1 if the second card is red and zero otherwise.

(a) Calculate the mean values and variances of $X$, $Y$, and $Z$.
(b) Calculate the covariance and correlation of each pair of variables, $(X, Y)$, $(X, Z)$, and $(Y, Z)$.
(c) Make comments that provide qualitative explanations for the results of part (b).

18.2.5

Repeat Exercise 18.2.4 for a situation in which the first card drawn is returned to the deck and the deck is reshuffled before drawing the second card. Your answer should include the comments demanded in part (c).

Continuous Distributions

Up to this point, we have restricted attention to discrete random variables. But there are many problems in which a random variable can have any value in a continuous range. An example would be the time at which a radioactive particle decays; the decay can take place at any time and is caused by a nuclear reaction whose behavior is only predictable statistically.

Our method for dealing with continuous probability distributions must be a limiting process similar to those often used in calculus. Continuing with the radioactive decay problem, it makes no sense to associate a finite nonzero probability with each instant in time (there is an infinite number of such instants). It is more useful to recognize that for a short time interval $\Delta t$ the decay probability will have some value proportional to $\Delta t$, and that the relevant quantity is the decay probability per unit time (computed in the limit of small $\Delta t$). Since this probability of decay per unit time may differ at different times, a general description for the decay within a time interval $dt$ at time $t$ must be of the form $f(t)\,dt$, where $f$ can be called a probability function. Then the overall probability of decay during a time interval from $t_{1}$ to $t_{2}$ will be given by

$\text{Probability of decay between times } t_{1} \text{ and } t_{2} = \int_{t_{1}}^{t_{2}} f(t)\,dt.$

By making an analogy between decay per unit time and mass per unit volume (which is density), the probability function $f (t)$ is often referred to as a probability density.
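The text leaves $f(t)$ general. If we assume the standard exponential decay law, $f(t) = \lambda e^{-\lambda t}$ for $t \geq 0$ with a decay constant $\lambda$ (our assumption, with a hypothetical value below), the integral has the closed form $e^{-\lambda t_1} - e^{-\lambda t_2}$. The sketch compares that closed form with a direct numerical integration of the probability density.

```python
import math

LAMBDA = 0.1  # hypothetical decay constant, per unit time

def decay_density(t):
    """Exponential decay probability density f(t) (assumed model)."""
    return LAMBDA * math.exp(-LAMBDA * t) if t >= 0 else 0.0

def prob_decay_between(t1, t2, n=2000):
    """Integral of f(t) from t1 to t2 by the composite trapezoidal rule."""
    h = (t2 - t1) / n
    inner = sum(decay_density(t1 + i * h) for i in range(1, n))
    return h * (0.5 * decay_density(t1) + inner + 0.5 * decay_density(t2))

t1, t2 = 5.0, 15.0
numeric = prob_decay_between(t1, t2)
closed_form = math.exp(-LAMBDA * t1) - math.exp(-LAMBDA * t2)
print(f"numeric {numeric:.8f}  closed form {closed_form:.8f}")
```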

We now restate and slightly expand the ideas of the preceding paragraph, framing the discussion in terms of a continuous random variable $X$ whose probability density is $p (x)$ . We assume the range of $x$ to be $(- \infty, \infty)$ . If part of that entire range is irrelevant, we simply set $p (x) = 0$ there. The probability distribution of our random variable has the following features:

(1) The probability that $X$ has a value in the range $(x, x + dx)$ is $p(x)\,dx$.

(2) Because $p(x)\,dx$ must represent a probability, $p(x)$ must be nonnegative for all $x$.

(3) Because the total probability for all $x$ must sum to unity, $p(x)$ must be such that

(18.45) $\int_{-\infty}^{\infty} p(x)\,dx = 1.$

The reader may note that we did not impose a requirement that $p(x) \leq 1$. In fact, $p(x)$ may even become infinite, provided it does so in a way that still permits the integral relation of Eq. (18.45) to be satisfied.

Now that we have a way of describing continuous probability distributions, let's see how to compute average values, including the mean and the variance. The sums of our earlier subsections now need to be replaced by integrals. The expectation value of a function $g (x)$ is given by

(18.46) $\langle g \rangle = \int_{-\infty}^{\infty} g(x) p(x)\,dx.$

A first application of Eq. (18.46) is to the mean and variance of a continuous random variable $X$, which are described by the formulas

(18.47) $\langle X \rangle = \mu = \int_{-\infty}^{\infty} x p(x)\,dx,$

(18.48) $σ^{2} = \int_{- \infty}^{\infty} {(x - μ)}^{2} p (x) dx .$

Example 18.2.4 Classical Particle in Potential Well

A particle of unit mass moves (assuming classical mechanics) subject to the potential $V = x^{2} / 2$ , with a total energy $E = 1 / 2$ (dimensionless units). These conditions correspond to motion with kinetic energy $T = (1 - x^{2}) / 2$ , so that when the particle is at $x$ its velocity can be found from

$\frac{v^{2}}{2} = \frac{1 - x^{2}}{2}$, leading to $v(x) = \pm\sqrt{1 - x^{2}}$.

We see from this form for $v(x)$ that the particle will move back and forth between turning points at $x = \pm 1$, and that it will move fastest at $x = 0$ and momentarily become stationary at $x = \pm 1$.

The probability density for the particle's position will be proportional to the time spent in each element $dx$ of its range, which in turn is (at $x$) proportional to $1/|v(x)|$. We therefore have

(18.49) $p(x) = \frac{C}{\sqrt{1 - x^{2}}},$

where $C$ must be assigned a value such that

$\int_{-1}^{1} p(x)\,dx = \int_{-1}^{1} \frac{C\,dx}{\sqrt{1 - x^{2}}} = 1.$

Evaluating the integral, which equals $C[\arcsin x]_{-1}^{1} = C\pi$, we find $C = 1/\pi$.

Let's comment briefly on the formula for $p(x)$ given in Eq. (18.49), which we plot in Fig. 18.3.

The probability density becomes infinite at the turning points (the velocity is zero there) but does so in a way such that the overall probability for any region near $x = \pm 1$ remains finite.

Given $p (x)$ , we can now compute the mean and variance of $X$ , viewed as a random variable. We start by computing $〈 X 〉$ and $〈 X^{2} 〉$ . We get

$\langle X \rangle = \int_{-1}^{1} x p(x)\,dx = \int_{-1}^{1} \frac{x\,dx}{\pi\sqrt{1 - x^{2}}} = 0$ (a result due to symmetry),

$\langle X^{2} \rangle = \int_{-1}^{1} x^{2} p(x)\,dx = \int_{-1}^{1} \frac{x^{2}\,dx}{\pi\sqrt{1 - x^{2}}} = \frac{1}{2}$ (the reader can check this).

From the first of these integrals we read out $\langle X \rangle = 0$; to compute the variance we can use Eq. (18.26), which is true for both discrete and continuous probability distributions:

$\sigma^{2} = \langle X^{2} \rangle - \langle X \rangle^{2} = \frac{1}{2} - 0 = \frac{1}{2}.$

This result is equivalent to the standard deviation $\sigma = 1/\sqrt{2} \approx 0.71$. The standard deviation is larger than halfway to the turning points because the particle spends the majority of its time in the outer half of its excursions.

▪
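Example 18.2.4 can also be checked by simulation. With unit mass, $V = x^{2}/2$, and $E = 1/2$, the motion is an oscillation of amplitude 1 and unit angular frequency, $x(t) = \sin t$ (taking the phase as zero, an arbitrary choice). Sampling times uniformly over one period and recording the position weights each $x$ by the time spent there, which is exactly how $p(x)$ was constructed. The density predicts $\langle X \rangle = 0$, $\langle X^{2} \rangle = 1/2$, and, since $P(|X| < 1/2) = (2/\pi)\arcsin(1/2) = 1/3$, a fraction $2/3$ of the time with $|x| > 1/2$.

```python
import math
import random

def sample_positions(n, seed=0):
    """Positions x(t) = sin(t) at times drawn uniformly over one period."""
    rng = random.Random(seed)
    return [math.sin(rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n)]

xs = sample_positions(200_000)
mean_x = sum(xs) / len(xs)                                # near <X> = 0
mean_x2 = sum(x * x for x in xs) / len(xs)                # near <X^2> = 1/2
outer_fraction = sum(abs(x) > 0.5 for x in xs) / len(xs)  # near 2/3
sigma = math.sqrt(mean_x2 - mean_x**2)                    # near 1/sqrt(2)

print(f"<X> ~ {mean_x:.4f}  <X^2> ~ {mean_x2:.4f}  sigma ~ {sigma:.4f}  outer ~ {outer_fraction:.4f}")
```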

Cumulative Distribution Functions

Sometimes we are interested in the probability that the values of a continuous random variable $X$ will fall between specified values. Calculations of that sort are most easily carried out if we first define the cumulative distribution function

(18.50) $P(x) = \int_{-\infty}^{x} p(t)\,dt.$

Then the probability that $x$ is between $x_{1}$ and $x_{2}$ (with $x_{2} > x_{1}$) is $P(x_{2}) - P(x_{1})$. In favorable cases it may be possible to obtain an analytical expression for $P(x)$.
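The density of Example 18.2.4 is one of the favorable cases: integrating $p(x) = 1/(\pi\sqrt{1 - x^{2}})$ from $-1$ to $x$ gives $P(x) = \tfrac{1}{2} + \arcsin(x)/\pi$ for $-1 \leq x \leq 1$ (0 below, 1 above). The sketch uses this closed form to compute $P(x_2) - P(x_1)$ and checks it against a direct numerical integral of $p(x)$; the interval $(-1/2, 1/2)$ is an arbitrary choice.

```python
import math

def pdf(x):
    """p(x) = 1 / (pi * sqrt(1 - x^2)) on (-1, 1), Eq. (18.49) with C = 1/pi."""
    return 1.0 / (math.pi * math.sqrt(1.0 - x * x)) if -1.0 < x < 1.0 else 0.0

def cdf(x):
    """Cumulative distribution P(x), obtained by integrating p from -1 to x."""
    if x <= -1.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 0.5 + math.asin(x) / math.pi

def simpson(func, a, b, n=1000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = func(a) + func(b) + sum((4 if i % 2 else 2) * func(a + i * h) for i in range(1, n))
    return s * h / 3

x1, x2 = -0.5, 0.5
prob_cdf = cdf(x2) - cdf(x1)         # exact value 1/3
prob_numeric = simpson(pdf, x1, x2)  # direct integral of p(x) for comparison
print(prob_cdf, prob_numeric)
```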

Covariance for Continuous Distributions

The covariance and correlation of continuous random variables $X$ and $Y$ with joint probability density $p(x, y)$ are given by the formulas

(18.51) $\operatorname{cov}(X, Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p(x, y)(x - \langle X \rangle)(y - \langle Y \rangle)\,dx\,dy,$

(18.52) $\operatorname{corr}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma(X)\,\sigma(Y)}.$
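To make (18.51) and (18.52) concrete, the sketch below evaluates them for an assumed joint density $p(x, y) = x + y$ on the unit square (zero elsewhere), approximating each double integral with the midpoint rule on a grid. For this density the exact values are $\langle X \rangle = 7/12$, $\operatorname{cov}(X, Y) = -1/144$, and $\operatorname{corr}(X, Y) = -1/11$.

```python
import math

def joint_pdf(x, y):
    """Assumed joint density p(x, y) = x + y on [0, 1] x [0, 1]."""
    return x + y

def expect(g, n=200):
    """Midpoint-rule approximation of the double integral of g(x, y) * p(x, y)."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    return sum(g(x, y) * joint_pdf(x, y) for x in pts for y in pts) * h * h

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
cov = expect(lambda x, y: (x - mean_x) * (y - mean_y))  # Eq. (18.51)
sd_x = math.sqrt(expect(lambda x, y: (x - mean_x) ** 2))
sd_y = math.sqrt(expect(lambda x, y: (y - mean_y) ** 2))
corr = cov / (sd_x * sd_y)                              # Eq. (18.52)

print(f"cov = {cov:.6f}  corr = {corr:.5f}")
```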


URL:

https://www.sciencedirect.com/science/article/pii/B9780128010006000183

Probability and statistics

Mary Attenborough, in Mathematics for Electrical Engineering and Computing, 2003

Finding probabilities from a continuous graph

Before we look at the normal distribution in more detail we need to find out how to relate the graph of a continuous function to our previous idea of probability. In Section 21.3, we identified the probability of a class with its relative frequency in a frequency distribution. That was all right when we had already divided the various sample values into classes. The problem with the normal distribution is that it has no such divisions along the x-axis and no individual class heights, just a nice smooth curve.

To overcome this problem we define the probability of the outcome lying in some interval of values as the area under the graph of the probability function between the two ends of that interval, as shown in Figure 21.22.

As we found in Chapter 7, the area under a curve is given by the integral; therefore, for a continuous probability distribution $f(x)$, we define

$P(x \text{ lies between } a \text{ and } b) = \int_{a}^{b} f(x)\,dx.$

The cumulative distribution function gives us the probability of this value or any previous value (it is like the cumulative relative frequency). For a continuous distribution it is thus the 'area so far' function: the integral from the lowest possible value that can occur in the distribution up to the current value.

The cumulative distribution up to a value a is represented by

$F (a) = \int_{- \infty}^{a} f (x) d x$

and it is the total area under the graph of the probability function up to $a$; this is shown in Figure 21.23.

We can also use the cumulative distribution function to represent probabilities of a certain interval. The area between two values can be found by subtracting two values of the cumulative distribution function as in Figure 21.24.

However, there is a problem with the normal distribution function in that it is not easy to integrate! The probability density function for $x$, where $x$ is $N(\mu, \sigma^{2})$, is given by

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^{2}/(2\sigma^{2})}.$

Its integral cannot be expressed in terms of elementary functions, so it must be evaluated by numerical methods, and the values of the integrals are therefore tabulated. The values that we have tabulated are the areas in the tail of the standardized normal distribution; that is

$\int_{u}^{\infty} f(z)\,dz$

where $f(z)$ is the probability density of the normal distribution with mean 0 ($\mu = 0$) and standard deviation 1 ($\sigma = 1$). This is shown in Figure 21.25 and tabulated in Table 21.3. In order to use these values we need to use ideas of transformation of graphs from Chapter 2 to transform any normal distribution into its standardized form.

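Table 21.3. Areas in the tail of the standardized normal distribution. Values of P(z > u) are given, where z is a variable with distribution N(0, 1); each row label gives u to one decimal place and each column heading adds the second decimal place.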

u	0.00	0.01	0.02	0.03	0.04	0.05	0.06	0.07	0.08	0.09
0.0	0.50000	0.49601	0.49202	0.48803	0.48405	0.48006	0.47608	0.47210	0.46812	0.46414
0.1	0.46017	0.45620	0.45224	0.44828	0.44433	0.44038	0.43644	0.43251	0.42858	0.42465
0.2	0.42074	0.41683	0.41294	0.40905	0.40517	0.40129	0.39743	0.39358	0.38974	0.38591
0.3	0.38209	0.37828	0.37448	0.37070	0.36693	0.36317	0.35942	0.35569	0.35197	0.34827
0.4	0.34458	0.34090	0.33724	0.33360	0.32997	0.32636	0.32276	0.31918	0.31561	0.31207
0.5	0.30854	0.30503	0.30153	0.29806	0.29460	0.29116	0.28774	0.28434	0.28096	0.27760
0.6	0.27425	0.27093	0.26763	0.26435	0.26109	0.25785	0.25463	0.25143	0.24825	0.24510
0.7	0.24196	0.23885	0.23576	0.23270	0.22965	0.22663	0.22363	0.22065	0.21770	0.21476
0.8	0.21186	0.20897	0.20611	0.20327	0.20045	0.19766	0.19489	0.19215	0.18943	0.18673
0.9	0.18406	0.18141	0.17879	0.17619	0.17361	0.17106	0.16853	0.16602	0.16354	0.16109
1.0	0.15866	0.15625	0.15386	0.15151	0.14917	0.14686	0.14457	0.14231	0.14007	0.13786
1.1	0.13567	0.13350	0.13136	0.12924	0.12714	0.12507	0.12302	0.12100	0.11900	0.11702
1.2	0.11507	0.11314	0.11123	0.10935	0.10749	0.10565	0.10383	0.10204	0.10027	0.09853
1.3	0.09680	0.09510	0.09342	0.09176	0.09012	0.08851	0.08691	0.08534	0.08379	0.08226
1.4	0.08076	0.07927	0.07780	0.07636	0.07493	0.07353	0.07215	0.07078	0.06944	0.06811
1.5	0.06681	0.06552	0.06426	0.06301	0.06178	0.06057	0.05938	0.05821	0.05705	0.05592
1.6	0.05480	0.05370	0.05262	0.05155	0.05050	0.04947	0.04846	0.04746	0.04648	0.04551
1.7	0.04457	0.04363	0.04272	0.04182	0.04093	0.04006	0.03920	0.03836	0.03754	0.03673
1.8	0.03593	0.03515	0.03438	0.03362	0.03288	0.03216	0.03144	0.03074	0.03005	0.02938
1.9	0.02872	0.02807	0.02743	0.02680	0.02619	0.02559	0.02500	0.02442	0.02385	0.02330
2.0	0.02275	0.02222	0.02169	0.02118	0.02068	0.02018	0.01970	0.01923	0.01876	0.01831
2.1	0.01786	0.01743	0.01700	0.01659	0.01618	0.01578	0.01539	0.01500	0.01463	0.01426
2.2	0.01390	0.01355	0.01321	0.01287	0.01255	0.01222	0.01191	0.01160	0.01130	0.01101
2.3	0.01072	0.01044	0.01017	0.00990	0.00964	0.00939	0.00914	0.00889	0.00866	0.00842
2.4	0.00820	0.00798	0.00776	0.00755	0.00734	0.00714	0.00695	0.00676	0.00657	0.00639
2.5	0.00621	0.00604	0.00587	0.00570	0.00554	0.00539	0.00523	0.00508	0.00494	0.00480
2.6	0.00466	0.00453	0.00440	0.00427	0.00415	0.00402	0.00391	0.00379	0.00368	0.00357
2.7	0.00347	0.00336	0.00326	0.00317	0.00307	0.00298	0.00289	0.00280	0.00272	0.00264
2.8	0.00256	0.00248	0.00240	0.00233	0.00226	0.00219	0.00212	0.00205	0.00199	0.00193
2.9	0.00187	0.00181	0.00175	0.00169	0.00164	0.00159	0.00154	0.00149	0.00144	0.00139
3.0	0.00135	0.00131	0.00126	0.00122	0.00118	0.00114	0.00111	0.00107	0.00104	0.00100
3.1	0.00097	0.00094	0.00090	0.00087	0.00084	0.00082	0.00079	0.00076	0.00074	0.00071
3.2	0.00069	0.00066	0.00064	0.00062	0.00060	0.00058	0.00056	0.00054	0.00052	0.00050
3.3	0.00048	0.00047	0.00045	0.00043	0.00042	0.00040	0.00039	0.00038	0.00036	0.00035
3.4	0.00034	0.00032	0.00031	0.00030	0.00029	0.00028	0.00027	0.00026	0.00025	0.00024
3.5	0.00023	0.00022	0.00022	0.00021	0.00020	0.00019	0.00019	0.00018	0.00017	0.00017
3.6	0.00016	0.00015	0.00015	0.00014	0.00014	0.00013	0.00013	0.00012	0.00012	0.00011
3.7	0.00011	0.00010	0.00010	0.00010	0.00009	0.00009	0.00008	0.00008	0.00008	0.00008
3.8	0.00007	0.00007	0.00007	0.00006	0.00006	0.00006	0.00006	0.00005	0.00005	0.00005
3.9	0.00005	0.00005	0.00004	0.00004	0.00004	0.00004	0.00004	0.00004	0.00003	0.00003
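Entries of Table 21.3 can be reproduced with the complementary error function in Python's standard library, since $P(z > u) = \tfrac{1}{2}\operatorname{erfc}(u/\sqrt{2})$. The sketch also shows the standardization step: for $X \sim N(\mu, \sigma^{2})$, $P(X > x)$ equals the tabulated tail area at $u = (x - \mu)/\sigma$. The particular $\mu$, $\sigma$, and $x$ used are hypothetical.

```python
import math

def upper_tail(u):
    """P(z > u) for z ~ N(0, 1), the quantity tabulated in Table 21.3."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))

def normal_upper_tail(x, mu, sigma):
    """P(X > x) for X ~ N(mu, sigma^2), by standardizing to u = (x - mu) / sigma."""
    return upper_tail((x - mu) / sigma)

# Spot-check a few entries of Table 21.3 (rounded to five decimals there).
for u in (0.0, 1.0, 1.96, 2.33, 3.09):
    print(f"u = {u:4.2f}  P(z > u) = {upper_tail(u):.5f}")

# Hypothetical example: X ~ N(100, 15^2), so P(X > 130) is the table entry at u = 2.00.
print(f"P(X > 130) = {normal_upper_tail(130.0, 100.0, 15.0):.5f}")
```

Using `erfc` rather than `1 - cdf` keeps full relative precision far out in the tail, where the table entries become very small.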


URL:

https://www.sciencedirect.com/science/article/pii/B9780750658553500474

Advanced Math and Statistics

Robert Kissell, Jim Poserina, in Optimal Sports Math, Statistics, and Fantasy, 2017

Probability Distributions

Mathematicians utilize probability distribution functions in many different ways. For example, probability distribution functions can be used to "quantify" and "describe" random variables, they can be used to determine statistical significance of estimated parameter values, they can be used to predict the likelihood of a specified outcome, and also to calculate the likelihood that an outcome falls within a specified interval (i.e., confidence intervals). As mentioned, these probability distribution functions are commonly summarized by their mean, variance, skewness, and kurtosis.

A probability mass function (pmf) is a function used to describe the probability associated with each value of a discrete random variable. A cumulative mass function (cmf) is a function used to determine the probability that the observation will be less than or equal to some specified value.

In general terms, if $X$ is a discrete random variable and $x^{*}$ is a specified value, then the pmf and cmf functions are defined as follows:

Probability Mass Function (pmf):

$f(x^{*}) = Prob(X = x^{*})$

Cumulative Mass Function (cmf):

$F(x^{*}) = Prob(X \leq x^{*})$

Probability distribution functions for continuous random variables are similar to those for discrete random variables with one exception. Since a continuous random variable can take on any value in an interval, the probability that it will equal a specified value is zero. Therefore, the probability distribution function (pdf) for a continuous random variable is used to find the probability that the variable will be within a specified interval (say between $a$ and $b$), and the cumulative distribution function for a continuous random variable gives the probability that the variable will be less than or equal to a specified value $x^{*}$.

A probability distribution function (pdf) is used to describe the probability that a continuous random variable will fall within a specified range. In theory, the probability that a continuous random variable takes any one specified value is zero because it can take an infinite number of values. The cumulative distribution function (cdf) is a function used to determine the probability that the random variable will be less than or equal to some specified value. In general terms, these functions are:

Probability Distribution Function (pdf):

$Prob (a \leq X \leq b) = \int_{a}^{b} f (x) dx$

Cumulative Distribution Function (cdf):

$F(x) = Prob(X \leq x) = \int_{-\infty}^{x} f(t)\,dt$

Going forward, we will use the terminology "pdf" to refer to probability distribution function and probability mass function, and we will use the terminology "cdf" to refer to cumulative distribution function and cumulative mass function.

Example 4.1

Discrete Probability Distribution Function

Consider a scenario where a person rolls two dice and adds up the numbers rolled. Since the numbers on each die range from 1 to 6, the possible sums range from 2 to 12. A pdf can be used to show the probability of realizing any value from 2 to 12, and the cdf can be used to show the probability that the sum will be less than or equal to a specified value.

Table 4.1 shows the set of possible outcomes along with the number of ways of achieving each outcome value, the probability of achieving each outcome value (pdf), and the probability that the outcome will be less than or equal to that value (cdf). For example, there are 6 different ways to roll a 7 with two dice: (1,6), (2,5), (3,4), (4,3), (5,2), and (6,1). Since there are 36 equally likely combinations of outcomes from the two dice, the probability of rolling a 7 is 6/36 = 1/6, and thus the pdf of 7 is 16.7%. Additionally, there are 21 ways to roll a value that is less than or equal to 7. Thus, the cdf is 21/36 ≈ 58%. The pdf and cdf graphs for this example are shown in Figs. 4.1 and 4.2, respectively.

Table 4.1. Discrete Random Variable: Rolling Dice

Value	Count	Pdf	Cdf
2	1	3%	3%
3	2	6%	8%
4	3	8%	17%
5	4	11%	28%
6	5	14%	42%
7	6	17%	58%
8	5	14%	72%
9	4	11%	83%
10	3	8%	92%
11	2	6%	97%
12	1	3%	100%
Total	36	100%
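The counts, pdf, and cdf columns of Table 4.1 can be regenerated by enumerating all 36 equally likely outcomes. The sketch keeps exact fractions and rounds only for display, so the cumulative column accumulates no rounding error.

```python
from fractions import Fraction
from itertools import product

# Count the ways each sum 2..12 can occur among the 36 equally likely rolls.
counts = {}
for a, b in product(range(1, 7), repeat=2):
    counts[a + b] = counts.get(a + b, 0) + 1

pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}
cdf = {}
running = Fraction(0)
for s, p in pmf.items():
    running += p
    cdf[s] = running

for s in pmf:
    print(f"{s:2d}  {counts[s]}  {float(pmf[s]):4.0%}  {float(cdf[s]):4.0%}")
```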

Example 4.2

Continuous probability distribution function

An example of a continuous probability distribution function is best shown via the familiar standard normal distribution. This distribution is also commonly referred to as the Gaussian distribution or the bell curve.

Table 4.2 provides a sample of data for a standard normal distribution. The left-hand side of the table has the interval values a and b. The corresponding probability to the immediate right in this table shows the probability that the standard normal distribution will have a value between a and b. That is, if x is a standard normal variable, the probability that x will have a value between a and b is shown in the probability column.

Table 4.2. Standard Normal Distribution

a	b	Pdf	z	Cdf
−1	1	68.3%	−3	0.1%
−2	2	95.4%	−2	2.3%
−3	3	99.7%	−1	15.9%
−inf	−1	15.9%	0	50.0%
−inf	−2	2.3%	1	84.1%
1	inf	15.9%	2	97.7%
2	inf	2.3%	3	99.9%

For a standard normal distribution, the values shown in column "a" and column "b" can also be thought of as numbers of standard deviations, where 1 = plus one standard deviation and −1 = minus one standard deviation (and likewise for the other values). Readers familiar with probability and statistics will surely recall that the probability that a standard normal random variable will be between −1 and +1 is 68.3%, the probability that it will be between −2 and +2 is 95.4%, and the probability that it will be between −3 and +3 is 99.7%.

The data on the right-hand side of the table correspond to the probability that a standard normal random variable will be less than the value indicated in the column titled z. Readers familiar with probability and statistics will recall that the probability that a standard normal variable will be less than 0 is 50%, less than 1 is 84.1%, less than 2 is 97.7%, and less than 3 is 99.9%.

Fig. 4.3 illustrates a standard normal pdf curve and Fig. 4.4 illustrates a standard normal cdf curve. Analysts can use the pdf curves to determine the probability that an outcome event will be within a specified range and can use the cdf curves to determine the probability that an outcome event will be less than or equal to a specified value. For example, we utilize these curves to estimate the probability that a team will win a game and/or win a game by more than a specified number of points. These techniques are discussed in the subsequent sports chapters.
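Both halves of Table 4.2 follow from the standard normal cdf, which Python's standard library exposes through `math.erf`. The sketch below recomputes the interval probabilities as $F(b) - F(a)$ and the cumulative values $F(z)$; infinite endpoints are handled explicitly.

```python
import math

def std_normal_cdf(z):
    """Prob(X <= z) for a standard normal X, computed with the error function."""
    if math.isinf(z):
        return 1.0 if z > 0 else 0.0
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Left-hand side of Table 4.2: Prob(a <= X <= b) = F(b) - F(a).
intervals = [(-1, 1), (-2, 2), (-3, 3), (-math.inf, -1), (-math.inf, -2), (1, math.inf), (2, math.inf)]
interval_probs = {(a, b): std_normal_cdf(b) - std_normal_cdf(a) for a, b in intervals}
for (a, b), p in interval_probs.items():
    print(f"a = {a:>5}  b = {b:>5}  Prob = {p:.1%}")

# Right-hand side of Table 4.2: Prob(X <= z).
cdf_values = {z: std_normal_cdf(z) for z in (-3, -2, -1, 0, 1, 2, 3)}
for z, p in cdf_values.items():
    print(f"z = {z:>2}  cdf = {p:.1%}")
```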

Important Notes:

• One of the most important items regarding computing probabilities such as the likelihood of scoring a specified number of points, winning a game, or winning by at least a specified number of points is using the proper distribution function to compute these probabilities.
• Different distribution functions will have different corresponding probability values for the same outcome value.
• It is essential that analysts perform a thorough review of the outcome variable they are looking to estimate and determine the correct underlying distribution.
• While there are many techniques that can be used to determine the proper distribution functions, analysts can gain important insight using histograms, p-p plots, and q-q plots as the starting points.
• We provide information about some of the more useful distributions below and analysts are encouraged to evaluate a full array of these distributions to determine which is most appropriate before drawing conclusions about outcomes, winning teams, scores, etc.


URL:

https://www.sciencedirect.com/science/article/pii/B9780128051634000049

Elementary Probability and Statistics

Prasanta S. Bandyopadhyay, Steve Cherry, in Philosophy of Statistics, 2011

5.2 The normal distribution

The normal distribution is the most important in statistics. It is the distribution people have in mind when they refer to the "bell-shaped" curve, and it is also often referred to as the Gaussian distribution after the mathematician Carl Friedrich Gauss.

A continuous random variable X is said to have a normal distribution (or be normally distributed) with mean μ and variance σ ² if its probability density function is

$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}, \quad -\infty < x < \infty.$

The mean can be any real number and the variance any positive real number. We say that X is N(μ, σ²).
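A quick numerical check of the density formula: the sketch below codes f(x) as written and integrates it with the trapezoidal rule over μ ± 10σ, where essentially all of the area lies. The values μ = 10 and σ² = 4 are arbitrary illustrative choices.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x, written exactly as the formula above."""
    sigma = math.sqrt(sigma2)
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / (math.sqrt(2.0 * math.pi) * sigma)

def area_under(f, a, b, steps=20_000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, steps))
    return total * h

mu, sigma2 = 10.0, 4.0
# Integrate over mu +/- 10 sigma; the tails beyond contribute a negligible amount.
width = 10.0 * math.sqrt(sigma2)
total_area = area_under(lambda x: normal_pdf(x, mu, sigma2), mu - width, mu + width)
print(f"Total area under the N({mu}, {sigma2}) density: {total_area:.6f}")
```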

There is a different normal distribution for each pair of mean and variance values and it is mathematically more appropriate to refer to the family of normal distributions but this distinction is generally not explicitly made in introductory courses.

The history of the normal distribution is fascinating [Stigler, 1986]. It seems to have first appeared in the work of Abraham DeMoivre in the mid-18th century, and Gauss found it useful for work he was doing in the late 18th and early 19th centuries. It was initially imbued with semi-mystical significance. Some were impressed by the fact that this one distribution contained the three most famous irrational numbers, e, $\sqrt{2}$, and π. Normally distributed variables were considered a law of nature. Although it is not viewed quite as reverentially today, it is still important for reasons discussed in more detail below.

The graph of the density function is symmetric about μ. There is a different curve for every μ and σ². In any normal distribution, approximately 68% of the values fall within one standard deviation σ of the mean μ, 95% fall within 1.96σ of μ, and 99.7% fall within 3σ of μ. These proportions are the same for every normally distributed population. For simplicity, we frequently convert values from the units in which they were measured to unitless standard values. Figure 4 shows an example of the so-called standard normal distribution, with mean 0 and variance 1, along with an illustration of the 68-95-99.7 rule.
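Because Z = (X − μ)/σ removes the dependence on μ and σ, each of these proportions equals P(|Z| < k) = erf(k/√2). A minimal check using only the Python standard library:

```python
import math

def prob_within(k):
    """P(|X - mu| < k * sigma) for any normal X, which equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

for k in (1.0, 1.96, 2.0, 3.0):
    print(f"within {k:g} sigma: {prob_within(k):.4f}")
```

It prints approximately 0.6827, 0.9500, 0.9545 and 0.9973. The exact 95% figure corresponds to 1.96σ, while the "95" in the name of the 68-95-99.7 rule is the rounded value for 2σ.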

To transform a normally distributed random variable into a standard normal random variable we subtract the mean and divide by the standard deviation. The result is typically referred to as a Z score,

$Z = \frac{X - μ}{σ}$

The random variable Z has a N (0,1) distribution.

As an example, suppose the heights of young American women are approximately normally distributed with μ = 65.5 inches and σ = 2.5 inches. The standardized height

$Z = \frac{\text{height} - 65.5}{2.5}$

follows a standard normal distribution. A woman's standard height is the number of standard deviations by which her height differs from the mean height of all young American women. A woman who is 61 inches tall, for example, has a standard height of

$Z = \frac{61 - 65.5}{2.5} = - 1.8$

or 1.8 standard deviations less than the mean height.
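The height example can be reproduced in a few lines. The sketch takes μ = 65.5 and σ = 2.5 inches from the example and additionally computes the proportion of women shorter than 61 inches from the standard normal cdf, written here with math.erf.

```python
import math

MU, SIGMA = 65.5, 2.5  # the example's mean and standard deviation of heights, in inches

def z_score(x, mu=MU, sigma=SIGMA):
    """Number of standard deviations by which x differs from the mean."""
    return (x - mu) / sigma

def standard_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = z_score(61.0)
print(f"z = {z:.2f}")                                     # z = -1.80
print(f"P(height < 61 in) = {standard_normal_cdf(z):.4f}")  # about 0.0359
```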

The standard normal distribution is important in introductory classes because it simplifies probability calculations involving normally distributed random variables. Because the normal distribution is continuous, probabilities are computed as areas under the density curve. But the density has no closed-form antiderivative, so these areas must be determined numerically. Further, many introductory statistics courses do not require calculus as a prerequisite, so integration is not an assumed skill. Tables of probabilities (areas) for the standard normal distribution are therefore provided in introductory statistics texts. The probability that a normally distributed random variable X with mean μ and variance σ² falls in some interval (a, b) is found by converting to standard units and using the tabled values, that is, P(a < X < b) = Φ((b − μ)/σ) − Φ((a − μ)/σ), where Φ denotes the standard normal cdf.
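To make the "determined numerically" step concrete, the sketch below standardizes the interval endpoints and integrates the N(0, 1) density with composite Simpson's rule, then compares the result with the expression in terms of the error function (which the math library itself evaluates numerically). The interval from 63 to 68 inches, one standard deviation either side of the mean in the height example, is an illustrative choice.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4.0 * sum(f(a + k * h) for k in range(1, n, 2))
    s += 2.0 * sum(f(a + k * h) for k in range(2, n, 2))
    return s * h / 3.0

def prob_between(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2): standardize, then integrate the N(0, 1) density."""
    return simpson(std_normal_pdf, (a - mu) / sigma, (b - mu) / sigma)

def prob_between_erf(a, b, mu, sigma):
    """The same probability as Phi((b - mu)/sigma) - Phi((a - mu)/sigma), via erf."""
    def phi(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return phi(b) - phi(a)

p_simpson = prob_between(63.0, 68.0, 65.5, 2.5)
p_erf = prob_between_erf(63.0, 68.0, 65.5, 2.5)
print(f"Simpson: {p_simpson:.6f}, erf: {p_erf:.6f}")
```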

Using the standard normal distribution to solve probability problems is no longer of much practical importance, because probabilities can now be computed with software, but the standard normal random variable still plays a major role in statistical inference, as we will see.

The family of normal distributions has some convenient mathematical properties not shared by most other probability distributions. These properties explain, in part, the importance of the normal distribution in statistics. Users of statistics are often interested in linear transformations of their data or in combining data through linear combinations; for example, the sample mean is a linear combination of the observed data. The application of probability theory to data analysis must take this into account. Linear functions and linear combinations of normally distributed random variables have a property not shared by most other distributions that might serve as a model for a data-generating process.

Suppose X is a normally distributed random variable with mean μ and variance σ². Then any linear transformation of the form Y = a + bX (with b ≠ 0) is normally distributed with mean a + bμ and variance b²σ².

Suppose we have a sequence of independent random variables X₁, X₂, …, X_n with means μ₁, μ₂, …, μ_n and variances $σ_{1}^{2}, σ_{2}^{2}, \ldots, σ_{n}^{2}$. Let a₁, a₂, …, a_n be constants. What is the probability distribution of the linear combination

$Y = a_{1} X_{1} + a_{2} X_{2} + \cdots + a_{n} X_{n} ?$

It can be shown that if each X_i is normally distributed, then the linear combination Y is also normally distributed, with mean a₁μ₁ + a₂μ₂ + ⋯ + a_nμ_n and variance

$a_{1}^{2} σ_{1}^{2} + a_{2}^{2} σ_{2}^{2} + \cdots + a_{n}^{2} σ_{n}^{2} .$
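The mean and variance formulas can be checked by simulation. In this sketch the coefficients, means and variances are hypothetical values. The Monte Carlo part draws independent normal X_i with random.Random.gauss; the fraction of draws within 1.96 standard deviations of the mean is only a rough indication that Y is itself normal, not a proof.

```python
import random
import statistics

def linear_combination_moments(a, means, variances):
    """Mean and variance of Y = sum(a_i * X_i) for independent X_i."""
    mean = sum(ai * mi for ai, mi in zip(a, means))
    variance = sum(ai ** 2 * vi for ai, vi in zip(a, variances))
    return mean, variance

a = [2.0, -1.0, 0.5]          # hypothetical coefficients a_i
means = [1.0, 3.0, -2.0]      # hypothetical means mu_i
variances = [4.0, 1.0, 9.0]   # hypothetical variances sigma_i^2
mean_y, var_y = linear_combination_moments(a, means, variances)
print(mean_y, var_y)          # -2.0 19.25

# Monte Carlo check with independent normal X_i (gauss takes the standard deviation).
rng = random.Random(2024)
draws = [sum(ai * rng.gauss(mi, vi ** 0.5) for ai, mi, vi in zip(a, means, variances))
         for _ in range(100_000)]
print(statistics.fmean(draws), statistics.variance(draws))
sd_y = var_y ** 0.5
within = sum(abs(y - mean_y) < 1.96 * sd_y for y in draws) / len(draws)
print(f"fraction within 1.96 sd: {within:.4f}")  # close to 0.95 if Y is normal
```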

The important point of the above results is not the resulting means and variances: those formulas hold for any probability distribution (for the variance of a linear combination, given independence). What is important and, mathematically speaking, remarkable is that linear transformations and linear combinations of normally distributed random variables are themselves normally distributed.

Many commonly used statistical methods start with the assumption that the observed data are a representative sample drawn from a population of individuals. A frequent goal is to use summary information in the sample to draw inferences about the corresponding unknown quantities in the population. For example, the sample mean of a data set is commonly used to estimate an unknown population mean. Quantifying the uncertainty associated with the estimate requires a probability model: the data are viewed as realizations of a sequence of independent random variables X₁, X₂, …, X_n. The sample mean, viewed as a random variable, is

$\bar{X} = (1 / n) (X_{1} + X_{2} + \cdot \cdot \cdot + X_{n}) .$

If, in addition, the values in the population can be approximated by a normal distribution with mean μ and variance σ², then $\bar{X}$ is normally distributed with mean μ and variance σ²/n. We discuss the implications of this result in more detail below.
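The mean and variance of the sample mean follow directly from the linear-combination result by setting a_i = 1/n, μ_i = μ and σ_i² = σ²; the short derivation below spells out this step (only independence is needed for the variance).

```latex
\bar{X} = \sum_{i=1}^{n} a_i X_i, \qquad a_i = \tfrac{1}{n},\quad \mu_i = \mu,\quad \sigma_i^2 = \sigma^2,

E(\bar{X}) = \sum_{i=1}^{n} \tfrac{1}{n}\,\mu = \mu,
\qquad
\operatorname{Var}(\bar{X}) = \sum_{i=1}^{n} \left(\tfrac{1}{n}\right)^{2} \sigma^{2} = \frac{n\,\sigma^{2}}{n^{2}} = \frac{\sigma^{2}}{n}.
```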

The normal distribution is important in statistics for another reason, a truly remarkable and fascinating result: the Central Limit Theorem (CLT). There are different versions of the CLT; we will consider the one that pertains to the probability distribution of a particular linear combination of random variables, the sample mean. Suppose we have a sequence of independent and identically distributed random variables X₁, X₂, …, X_n with finite mean μ and finite variance σ². In the context of statistics, we think of these random variables as a random sample from a population with mean μ and variance σ². We make no other distributional assumptions about the random variables, that is, about the distribution of values in the population. The sample mean is

$\bar{X} = (1 / n) (X_{1} + X_{2} + \cdot \cdot \cdot + X_{n}) .$

The Central Limit Theorem says that if the sample size n is large enough, then $\bar{X}$ is approximately distributed as N(μ, σ²/n).

How large is "large enough"? A value frequently seen in introductory statistics texts is n ≥ 30. But, like all rules of thumb, this one should not be applied indiscriminately. For some well-behaved distributions (e.g., symmetric ones with light tails), sample sizes of 5 to 10 may suffice. For other distributions, especially strongly skewed or heavy-tailed ("fat-tailed") ones, the required sample size can be considerably greater than 30.
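A small simulation illustrates why n ≥ 30 is only a rule of thumb. It assumes an Exponential(1) population (mean 1, standard deviation 1, strongly right-skewed) and records how often the sample mean falls below μ − 1.96σ/√n or above μ + 1.96σ/√n; under an exact N(μ, σ²/n) distribution each fraction would be 0.025. The population, seed and replication count are illustrative choices.

```python
import math
import random

def tail_fractions(n, reps=10_000, seed=1):
    """Simulate sample means of n Exponential(1) draws (population mean 1, SD 1).
    Return the fractions falling below / above 1 -/+ 1.96 / sqrt(n)."""
    rng = random.Random(seed)
    half_width = 1.96 / math.sqrt(n)
    below = above = 0
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if xbar < 1.0 - half_width:
            below += 1
        elif xbar > 1.0 + half_width:
            above += 1
    return below / reps, above / reps

results = {n: tail_fractions(n) for n in (5, 30, 200)}
for n, (lo, hi) in results.items():
    print(f"n = {n:>3}: below = {lo:.3f}, above = {hi:.3f}  (normal theory: 0.025 each)")
```

For n = 5 the upper tail is markedly too heavy and the lower tail nearly empty; by n = 200 both fractions are much closer to 0.025, although the skewness has not entirely vanished.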

URL:

https://www.sciencedirect.com/science/article/pii/B9780444518620500022

Asymptotic optimality and efficiency

Jaroslav Hájek , ... Pranab K. Sen , in Theory of Rank Tests (Second Edition), 1999

8.3.1 Exact Bahadur efficiency.

Asymptotic efficiency of tests as defined in Section 8.2 compares tests locally, i.e. for alternatives approaching the hypothesis. However, there are many other ways of comparing tests; see, e.g., the survey by Singh (1981) or Nikitin (1995).

Among the non-local measures of asymptotic efficiency, the most popular is the Bahadur efficiency. It works for fixed alternatives (and fixed hypotheses), and its main idea is to compare the rates at which the attained significance levels of tests approach zero as the sample sizes tend to infinity.

We will present here basic definitions and theorems for some important special cases, as elaborated by Bahadur (1960, 1967, 1971). In principle, we are now returning to the end of Subsection 2.3.1.

Let X = (X₁, X₂, …) be an infinite sequence of independent and identically distributed random variables (possibly vector-valued or abstract) whose distribution depends on a parameter Θ in a set Θ. Consider testing the null hypothesis that Θ lies in a subset Θ₀ of Θ. For each n = 1, 2, …, let T_n(X) be a test statistic that depends only on (X₁, …, X_n) and whose large values form the relevant critical regions. If T_n has the distribution function F_n under the null hypothesis, i.e.

$P_{Θ}(T_{n} \leq z) = F_{n}(z) \quad \text{for all } Θ \in Θ_{0},\ -\infty < z < \infty$

(where F_n(z) does not depend on Θ), then the level attained by T_n is defined to be the random variable

$L_{n} (X) = 1 - F_{n} (T_{n} (X)) .$

If in a given case we observe (x₁, …, x_n), then L_n(x₁, …, x_n) is the probability, under the null hypothesis, of getting a larger value of T_n than the observed value T_n(x₁, …, x_n). Under the alternative hypothesis we generally expect L_n(X) to be small.

The level attained is also popularly known as the p-value.
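As a minimal sketch of the level attained, consider a setting not taken from the text: X_i independent N(θ, 1), null hypothesis Θ₀ = {0}, and T_n = √n·X̄_n. Under the null, T_n is exactly N(0, 1) for every n, so F_n = Φ and L_n = 1 − Φ(T_n), the familiar one-sided z-test p-value. The data values are hypothetical.

```python
import math

def std_normal_sf(z):
    """1 - Phi(z) via erfc, which stays accurate far into the upper tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def level_attained(sample):
    """L_n = 1 - F_n(T_n) for T_n = sqrt(n) * sample mean.

    Assumes X_i ~ N(theta, 1) and H0: theta = 0, so F_n is the standard normal cdf."""
    n = len(sample)
    t_n = math.sqrt(n) * sum(sample) / n
    return std_normal_sf(t_n)

data = [0.8, -0.1, 1.3, 0.4, 0.9, 0.2, 1.1, -0.3, 0.7, 0.5]  # hypothetical observations
print(f"T_n = {math.sqrt(len(data)) * sum(data) / len(data):.3f}, "
      f"L_n = {level_attained(data):.4f}")
```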

(At the end of Subsection 2.3.1 and in some other publications, the level actually attained was defined as l(X) with l(x) = P(T_n(X) ≥ T_n(x)), whereas here we define L_n(x) = P(T_n(X) > T_n(x)). The difference lies only in whether the probability P(T_n(X) = T_n(x)) is included, which is largely a matter of taste; moreover, this probability usually tends to 0 asymptotically. The discussion in 2.3.1 was also more 'philosophical', while the present 'level attained by T_n' is a focal point in the formulation of Bahadur efficiency.)

For finite samples in our area of interest, an exact handling of L_n is usually intractable, because F_n is, as a rule, discontinuous for rank tests. Asymptotically, however, L_n is typically uniformly distributed over (0, 1) under the null hypothesis, while under the alternative hypothesis L_n → 0 exponentially fast (with probability one). We will say that, for the alternative characterized by Θ, the sequence {T_n} has exact slope c(Θ) if

(1) $\lim_{n \to \infty} n^{-1} \log L_{n}(X) = -c(Θ)/2 \quad [P_{Θ}\ \text{a.s.}]$

Bahadur's exact slope, for reasons that will soon become clear, can be taken as a measure of the asymptotic efficiency of the test at hand. The relative Bahadur efficiency of two tests is defined as the ratio of their exact Bahadur slopes. (Here we use Bahadur's (1971) original definition of the exact slope. In some publications, however, the exact slope is defined as −c(Θ)/2; of course, this does not change relative efficiencies.)

Now, for a given α with 0 < α < 1 and a given x = (x₁, x₂, …), let N = N(α, x) be the smallest integer m such that L_n(x) < α for all n ≥ m, and let N = ∞ if no such m exists; i.e., N is the sample size at which {T_n} becomes significant at level α and remains so for all larger n. The following theorem shows that, for small α, N is approximately inversely proportional to the exact slope.

Theorem 1

If (1) holds with 0 < c(Θ) < ∞, then

(2) $\lim_{α \to 0} \frac{N(α, x)}{2 \log α^{-1}} = \frac{1}{c(Θ)} \quad [P_{Θ}\ \text{a.s.}]$

Proof.

Fix some Θ such that 0 < c(Θ) < ∞ and fix some x = (x₁, x₂, …) such that n⁻¹ log L_n(x) → −c(Θ)/2. Then L_n > 0 for all sufficiently large n and L_n → 0 as n → ∞. It follows that N < ∞ for every α > 0 and that N → ∞ as α → 0. Hence 2 ≤ N < ∞ for all sufficiently small α, say for α < α₁. For α < α₁ we have, by definition, L_N < α ≤ L_{N−1}. Thus N⁻¹ log L_N < N⁻¹ log α ≤ ((N − 1)/N) · (N − 1)⁻¹ log L_{N−1}. Since N → ∞, both bounds tend to −c(Θ)/2 as α → 0, so N⁻¹ log α → −c(Θ)/2, which is equivalent to (2).
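Theorem 1 can be illustrated numerically under simplifying assumptions that are not part of the text. Take the one-sided z-test with X_i ~ N(θ, 1) and T_n = √n·X̄_n, whose exact slope at θ > 0 is the standard value c(θ) = θ² (it follows from Theorem 2 with τ(θ) = θ and v(t) = t²/2). The sketch uses an idealized path on which X̄_n = θ for every n, so that L_n = 1 − Φ(√n θ) decreases in n; real sample paths would fluctuate. It then compares N(α, x)/(2 log α⁻¹) with 1/c(θ).

```python
import math

def std_normal_sf(z):
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def first_significant_n(alpha, theta):
    """Smallest n with L_n = 1 - Phi(sqrt(n) * theta) < alpha.
    On this idealized path L_n decreases in n, so once below alpha it stays below."""
    n = 1
    while std_normal_sf(math.sqrt(n) * theta) >= alpha:
        n += 1
    return n

theta = 0.5
c = theta ** 2  # exact slope of the one-sided z-test at theta (standard result)
ratios = []
for alpha in (1e-10, 1e-50, 1e-150):
    n_alpha = first_significant_n(alpha, theta)
    ratio = n_alpha / (2.0 * math.log(1.0 / alpha))
    ratios.append(ratio)
    print(f"alpha = {alpha:.0e}: N = {n_alpha}, N / (2 log(1/alpha)) = {ratio:.2f}, 1/c = {1 / c:.2f}")
```

The ratio approaches 1/c = 4 from below, and only slowly, because log L_n contains correction terms of lower order than n.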

Let us compare two sequences of test statistics {T_n⁽¹⁾} and {T_n⁽²⁾} with positive finite exact slopes c₁(Θ) and c₂(Θ), respectively, under a non-null alternative Θ. Denoting by N_i(α, x) the sample size required to make $T_{n}^{(i)}$ significant at level α, it follows from Theorem 1 that $\frac{N_{2}(α, x)}{N_{1}(α, x)} \to \frac{c_{1}(Θ)}{c_{2}(Θ)}$ as α → 0, with P_Θ-probability 1. Consequently, c₁(Θ)/c₂(Θ) is a measure of the asymptotic efficiency (exact Bahadur efficiency) of $T_{n}^{(1)}$ relative to $T_{n}^{(2)}$ when the alternative Θ obtains.

The following theorem often provides a useful method for finding the exact slope.

Theorem 2

Suppose that

(3) $\lim_{n \to \infty} n^{-1/2} T_{n}(X) = τ(Θ) \quad [P_{Θ}\ \text{a.s.}]$

for each Θ ∈ Θ - Θ₀, where −∞ < τ(Θ) < ∞, and that

(4) $\lim_{n \to \infty} n^{-1} \log [1 - F_{n}(n^{1/2} t)] = -v(t)$

for each t in an open interval I, where v is a continuous function on I, and $\{τ(Θ) : Θ \in Θ - Θ_{0}\} \subset I$. Then (1) holds with c(Θ) = 2v(τ(Θ)) for each Θ ∈ Θ − Θ₀.

Proof.

Fix some Θ ∈ Θ − Θ₀ and fix some x = (x₁, …, x_n, …) such that n^{−1/2} T_n(x) → τ = τ(Θ) as n → ∞. Let ε > 0 be so small that τ − ε ∈ I and τ + ε ∈ I. Since F_n(t) is non-decreasing in t, it follows from the definition of L_n that n^{1/2}(τ − ε) < T_n < n^{1/2}(τ + ε) implies 1 − F_n(n^{1/2}(τ − ε)) ≥ L_n ≥ 1 − F_n(n^{1/2}(τ + ε)); hence the latter inequality holds for all sufficiently large n. Assumption (4) now implies that $\limsup_{n \to \infty} n^{-1} \log L_{n} \leq -v(τ - ε)$ and similarly $\liminf_{n \to \infty} n^{-1} \log L_{n} \geq -v(τ + ε)$. Since v is continuous and ε is arbitrary, this implies that $\lim_{n \to \infty} n^{-1} \log L_{n} = -v(τ)$, which is (1) with c(Θ) = 2v(τ(Θ)).

Theorem 2 shows therefore that the standard method for evaluating the Bahadur slope usually involves two ingredients: a strong convergence result expressed by (3) and a large deviation result expressed by (4), the latter being under the null hypothesis.
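For the z-test setting X_i ~ N(θ, 1), T_n = √n·X̄_n (an illustrative assumption, not an example from the text), both ingredients are explicit: (3) holds with τ(θ) = θ by the strong law of large numbers, and the null tail evaluated at √n·t, as in the proof of Theorem 2, satisfies n⁻¹ log[1 − Φ(√n t)] → −t²/2 for t > 0. Hence v(t) = t²/2 and c(θ) = θ². The sketch checks the large deviation limit numerically.

```python
import math

def log_tail_rate(n, t):
    """n^{-1} log[1 - F_n(sqrt(n) t)] with F_n = Phi, the null cdf of T_n = sqrt(n) * mean."""
    tail = 0.5 * math.erfc(math.sqrt(n) * t / math.sqrt(2.0))  # erfc keeps the tiny tail accurate
    return math.log(tail) / n

t = 0.5
for n in (10, 100, 1000, 4000):
    print(f"n = {n:>4}: rate = {log_tail_rate(n, t):.4f}   (limit -v(t) = {-t * t / 2:.4f})")
```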

Bahadur's monograph (1971), as well as works of other authors, contains a number of examples based on Theorem 2. While the strong convergence result corresponding to (3) is almost always easy to verify, the large deviation result corresponding to (4) is, as a rule, much more difficult to establish. Historically, there were at first only a few results on Bahadur efficiency, precisely because large deviation theory was still insufficiently developed; later on, this link to Bahadur efficiency served as a strong impetus for the development of large deviation theory.

The following two examples illustrate how to find Bahadur slopes in specific situations.

Example 1.

(Bahadur (1971), p. 31.) Let X be the real line, let Θ be the set of all continuous probability distribution functions Θ(x) on X, and let P_Θ(B) denote the probability measure on X determined by the distribution function Θ. The null hypothesis is that Θ = Θ₀, where Θ₀ is a given continuous distribution function.

For each n, let F_n(t) = F_n(t | x₁, …, x_n) be the empirical distribution function based on {x₁, …, x_n}, and let T_n be the Kolmogorov statistic, i.e. T_n(x) = n^{1/2} sup{|F_n(t) − Θ₀(t)|: −∞ < t < ∞}. It follows from the Glivenko-Cantelli theorem that (3) holds for T_n with τ(Θ) = sup{|Θ(t) − Θ₀(t)|: −∞ < t < ∞}, where 0 < τ(Θ) < 1 for Θ ≠ Θ₀.

Further, it follows from Bahadur (1971, Ex. 5.3) that T_n satisfies (4), where v is defined as follows. Let

$f(a, y) = \begin{cases} (a + y) \log\left(\frac{a + y}{y}\right) + (1 - a - y) \log\left(\frac{1 - a - y}{1 - y}\right) & \text{for } 0 \leq y \leq 1 - a, \\ \infty & \text{for } y > 1 - a, \end{cases}$

$g(a) = \inf \{f(a, y) : 0 \leq y \leq 1\} .$

In (4) we put v(t) = g(t), and, consequently, the exact slope of T_n is c(Θ) = 2g(τ(Θ)).
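The function g of Example 1 has no simple closed form, but it is easy to evaluate numerically. For 0 < y ≤ 1 − a, f(a, y) is the Kullback-Leibler divergence between Bernoulli(a + y) and Bernoulli(y), so g(a) is a minimum divergence. The sketch reads the first logarithm as log((a + y)/y) and approximates the infimum by a fine grid search over y, a crude but transparent choice. For small τ, a second-order expansion about y = 1/2 gives g(τ) ≈ 2τ², i.e. c ≈ 4τ², which the printout shows for reference.

```python
import math

def f(a, y):
    """f(a, y) of Example 1: for 0 < y <= 1 - a, the Kullback-Leibler divergence
    between Bernoulli(a + y) and Bernoulli(y); infinite otherwise."""
    if y <= 0.0 or y > 1.0 - a:
        return math.inf
    value = (a + y) * math.log((a + y) / y)
    rest = 1.0 - a - y
    if rest > 0.0:  # the second term is taken as 0 when 1 - a - y = 0
        value += rest * math.log(rest / (1.0 - y))
    return value

def g(a, grid=100_000):
    """g(a) = inf over y of f(a, y), approximated by a grid search on (0, 1 - a]."""
    return min(f(a, (1.0 - a) * k / grid) for k in range(1, grid + 1))

def kolmogorov_exact_slope(tau):
    """Exact Bahadur slope c = 2 g(tau) of the Kolmogorov statistic."""
    return 2.0 * g(tau)

for tau in (0.01, 0.1, 0.3):
    print(f"tau = {tau}: c = {kolmogorov_exact_slope(tau):.6f}, 4 tau^2 = {4 * tau * tau:.6f}")
```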

Example 2.

The formulas for the two-sample rank-test problem are even more involved. The following case and its solution were adapted from Kremer (1983). Let X₁, …, X_m and Y₁, …, Y_n be two samples of N = m + n independent random variables, where the X_i's have a continuous distribution function F and the Y_j's have a continuous distribution function G. We wish to test the hypothesis of randomness, F = G, against the alternatives F ≠ G. To this end, we use a linear rank statistic

$T_{N} = N^{- 1 / 2} \sum_{i = 1}^{m} a_{N} (R_{i}),$

where R₁,…, R_N are the ranks of the combined sample X₁,…, X_m, Y₁,…, Y_n and the scores a_N (i), i = 1,…, N, satisfy

$lim_{N \to \infty} \int_{0}^{1} | a_{N} (1 + [u N]) - φ (u) | d u = 0,$

with some non-constant integrable function φ(u) (for simplicity assumed to be non-decreasing). We present basic asymptotic results for m, n → ∞ with m/N → s, where s ∈ (0, 1) is fixed.

Verifying condition (3), which concerns the limit τ(Θ) in Theorem 2, is usually not difficult, so we assume it has been done.

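The derivation of the large deviation statement (4), however, is much more complicated. First define $t_{0}(φ, s) = s \int_{0}^{1} φ(u)\, du$ and $t_{1}(φ, s) = \int_{1 - s}^{1} φ(u)\, du$ (t₀ is the null limit of N^{−1/2} T_N, and t₁ its largest attainable limit, reached when the X's occupy the m largest ranks). Then (4) holds for t < t₁(φ, s), with v(t) defined by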

$v(t) = \begin{cases} 0 & \text{for } t \leq t_{0}(φ, s), \\ a t + s \log z - \int_{0}^{1} \log\left((1 - s) + s z e^{a φ(u)}\right) du & \text{for } t \in (t_{0}(φ, s), t_{1}(φ, s)), \end{cases}$

where a, z ≥ 0 are the unique solutions of the integral equations

$\begin{array}{r} \int_{0}^{1} s φ (u) \frac{z e^{a φ (u)}}{(1 - s) + s z e^{a φ (u)}} d u = t, \\ \int_{0}^{1} \frac{z e^{a φ (u)}}{(1 - s) + s z e^{a φ (u)}} d u = 1. \end{array}$

This statement can be proved in two stages. First, we assume that a_N(1 + [uN]) = φ(u) is a step function, which reduces (4) to a large deviation statement for a multinomially distributed statistic. Second, we use the assumption that the step functions a_N(1 + [uN]) converge to φ(u).

Consequently, the exact slope of T_N equals 2v(τ(Θ)).
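A numerical sketch of v(t) for Example 2, under illustrative assumptions: Wilcoxon scores φ(u) = u and s = 1/2. It implements v(t) = a t + s log z − ∫₀¹ log((1 − s) + s z e^{aφ(u)}) du, with (a, z) solving the two integral equations, and takes t₁(φ, s) = ∫_{1−s}^{1} φ(u) du. Integrals use the midpoint rule and (a, z) are found by nested bisection; the function name bahadur_v and its tuning parameters m and iters are hypothetical. As a sanity check, near t₀ one expects the quadratic behaviour v(t) ≈ (t − t₀)²/(2s(1 − s)Var φ(U)) with U uniform on (0, 1), which here equals 24(t − t₀)².

```python
import math

def bahadur_v(t, phi, s, m=200, iters=50):
    """Sketch: evaluate v(t) for the linear rank statistic of Example 2.

    Integrals over (0, 1) use the midpoint rule with m nodes.  For t0 < t < t1,
    z solves the second integral equation for a given a (bisection), and a is
    then chosen by bisection so that the first equation holds."""
    nodes = [phi((k + 0.5) / m) for k in range(m)]
    t0 = s * sum(nodes) / m
    t1 = sum(sorted(nodes)[m - round(s * m):]) / m  # approximates the integral of phi over (1 - s, 1)
    if t <= t0:
        return 0.0
    if t >= t1:
        raise ValueError("t must lie below t1(phi, s)")

    def solve_z(a):
        e = [math.exp(a * p) for p in nodes]
        def eq2(z):  # mean of z e^{a phi} / ((1 - s) + s z e^{a phi}) minus 1; increasing in z
            return sum(z * ei / ((1 - s) + s * z * ei) for ei in e) / m - 1.0
        lo, hi = 0.0, 1.0
        while eq2(hi) < 0.0:
            hi *= 2.0
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if eq2(mid) < 0.0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi), e

    def eq1(a):  # left side of the first equation minus t; increasing in a
        z, e = solve_z(a)
        return sum(s * p * z * ei / ((1 - s) + s * z * ei) for p, ei in zip(nodes, e)) / m - t

    lo, hi = 0.0, 1.0
    while eq1(hi) < 0.0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if eq1(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    z, e = solve_z(a)
    log_term = sum(math.log((1 - s) + s * z * ei) for ei in e) / m
    return a * t + s * math.log(z) - log_term

def wilcoxon(u):
    """Wilcoxon scores phi(u) = u (illustrative choice)."""
    return u

for t in (0.26, 0.30, 0.34):
    print(f"t = {t:.2f}: v(t) = {bahadur_v(t, wilcoxon, 0.5):.5f}")
```

The exact slope at an alternative Θ would then be 2·v(τ(Θ)), with τ(Θ) taken from the strong-convergence step that the text leaves to the reader.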

Concerning the literature on these topics, in addition to Bahadur's works quoted above, we refer to Kremer (1983) (a survey paper); Woodworth (1970) (a general class of rank tests, including linear rank tests for the two-sample and independence problems); Hoadley (1965) (Wilcoxon test); Stone (1967, 1968) (Wilcoxon and normal scores tests); and Nikitin (1995).

As a complement to our considerations, we will now present a useful non-asymptotic property of L_n(X) in the null case, namely its relation to the uniform distribution. We will now work with left-continuous distribution functions.

Theorem 3

For each Θ ∈ Θ₀ and each n,

$\begin{matrix} P_{Θ} (L_{n} (X) \leq u) \leq u & for \end{matrix} 0 \leq u \leq 1.$

Proof.

Fix some Θ ∈ Θ₀ and some statistic T = T_n; we omit Θ and n in what follows since they are fixed. If F, the distribution function of T, is continuous, then L is uniformly distributed over [0, 1] and P(L ≤ u) = u for 0 ≤ u ≤ 1. In the general case, let U be a random variable distributed uniformly over (0, 1) and independent of X, and put T* = T*(X, U) = T(X) + αU, with α > 0 a constant. (This corresponds essentially to randomization of the outcome X; cf. also the Problems to Section 2.3.) Then the distribution function F* of T* is continuous, hence F*(T*) is uniform on [0, 1]. Now, for any t, F*(t) = P(T + αU < t) ≥ P(T < t − α) = F(t − α); this shows F*(T*) ≥ F(T* − α) ≥ F(T − α), since T* ≥ T and F is non-decreasing. Therefore P(1 − F(T − α) < t) ≤ P(1 − F*(T*) < t) ≤ t for t ≥ 0. Now let α₁, α₂, … be a decreasing sequence of positive constants with α_k → 0, and let A_k(t) be the event 1 − F(T − α_k) < t. Then P(A_k(t)) ≤ t for each k. Since F is non-decreasing and left-continuous, we have $A_{k}(t) \subset A_{k + 1}(t)$ for each k, and $\cup_{k = 1}^{\infty} A_{k}(t)$ is the event $1 - F(T) (\equiv L) < t$. Consequently, $P(L < t) = \lim_{k \to \infty} P(A_{k}(t)) \leq t$. Since t is arbitrary, it follows easily that P(L ≤ u) ≤ u for 0 ≤ u ≤ 1.
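Theorem 3 is easy to check exactly for a discrete statistic. The sketch assumes T ~ Binomial(10, 1/2) under the null hypothesis (an illustrative choice) and uses the theorem's left-continuous convention, so L = 1 − F(T) = P(T' ≥ T) for an independent copy T'. It computes P(L ≤ u) on a grid of u and contrasts the result with the right-continuous convention P(T' > T), for which the bound fails at u = 0.

```python
from math import comb

def null_levels(n, p=0.5):
    """T ~ Binomial(n, p) under H0.  Returns the pmf and L[k] = 1 - F(k) for the
    left-continuous F(t) = P(T < t), i.e. L[k] = P(T >= k)."""
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    levels = [sum(pmf[k:]) for k in range(n + 1)]
    return pmf, levels

def prob_level_at_most(u, pmf, levels):
    """P(L <= u) under H0: sum the pmf over outcomes whose attained level is <= u."""
    return sum(pk for pk, lk in zip(pmf, levels) if lk <= u)

pmf, levels = null_levels(10)
grid = [i / 1000 for i in range(1001)]
worst_excess = max(prob_level_at_most(u, pmf, levels) - u for u in grid)
print(f"max over grid of P(L <= u) - u: {worst_excess:.3g}")          # never positive
print(f"P(L <= 0.5) = {prob_level_at_most(0.5, pmf, levels):.4f}")   # strictly below 0.5

# With the right-continuous convention L'[k] = P(T > k) the bound fails at u = 0:
strict_levels = [sum(pmf[k + 1:]) for k in range(11)]
print(prob_level_at_most(0.0, pmf, strict_levels))  # 0.0009765625, i.e. P(T = 10) = 1/1024
```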

URL:

https://www.sciencedirect.com/science/article/pii/B9780126423501500266

Invited article by M. Gidea Extreme events and emergency scales

Veniamin Smirnov , ... Dimitri Volchenkov , in Communications in Nonlinear Science and Numerical Simulation, 2020

4.1 Generalized extreme value distributions

The generalized extreme value distribution (GEVD) is a flexible three-parameter continuous probability distribution that was developed within extreme value theory to combine the Gumbel, Fréchet, and Weibull extreme value distributions into a single distribution [34,35]. The GEV distribution has the following pdf [36]:

$f (x; μ, σ, ξ) = \frac{1}{σ} t {(x)}^{ξ + 1} e^{- t (x)},$

where

$t(x) = \begin{cases} \left(1 + ξ\left(\frac{x - μ}{σ}\right)\right)^{-1/ξ} & ξ \neq 0, \\ e^{-(x - μ)/σ} & ξ = 0, \end{cases}$

and $μ \in \mathbb{R}$ is the location parameter, σ > 0 is the scale parameter, and $ξ \in \mathbb{R}$ is the shape parameter. When the shape parameter ξ is equal to 0, greater than 0, or less than 0 [33], the GEV distribution is equivalent to the Gumbel [37], Fréchet [38], and "reversed" Weibull [39] distributions, respectively.

The Gumbel distribution, also known as the Extreme Value Type I distribution, has the following pdf and cdf:

(4.1) $f (x; μ, β) = \frac{1}{β} e^{- (\frac{x - μ}{β} + e^{- \frac{x - μ}{β}})},$

(4.2) $F (x; μ, β) = e^{- e^{- \frac{x - μ}{β}}}$

where $x \in \mathbb{R}$, μ is the location parameter, and β > 0 is the scale parameter. In particular, when $μ = 0$ and $β = 1$, the distribution becomes the standard Gumbel distribution. Generalizations of the Gumbel distribution that add one more shape parameter, and therefore allow flexible skewness and kurtosis, are widely used for extreme value data because they fit such data better [40]. The distribution in (4.1) has been employed as a model for extreme values [41,42]. It has a light right tail that decays exponentially, and its skewness and kurtosis coefficients are constant (they do not depend on μ or β).
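The GEV density can be coded directly from f(x; μ, σ, ξ) and t(x), and its ξ = 0 case compared with the Gumbel density (4.1) with β = σ. The sketch returns 0 outside the support, where 1 + ξ(x − μ)/σ ≤ 0; that support condition is standard but is not spelled out in the text. A tiny nonzero ξ is also included to show that the density varies continuously as ξ → 0. Parameter values are arbitrary.

```python
import math

def gev_t(x, mu, sigma, xi):
    """t(x) from the GEV definition; None outside the support, where 1 + xi (x - mu) / sigma <= 0."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-z)
    base = 1.0 + xi * z
    if base <= 0.0:
        return None
    return base ** (-1.0 / xi)

def gev_pdf(x, mu, sigma, xi):
    """f(x; mu, sigma, xi) = (1 / sigma) t(x)^(xi + 1) exp(-t(x)); zero outside the support."""
    t = gev_t(x, mu, sigma, xi)
    if t is None:
        return 0.0
    return t ** (xi + 1.0) * math.exp(-t) / sigma

def gumbel_pdf(x, mu, beta):
    """Gumbel density (4.1)."""
    z = (x - mu) / beta
    return math.exp(-(z + math.exp(-z))) / beta

for x in (-2.0, 0.0, 1.5, 6.0):
    print(x, gev_pdf(x, 1.0, 2.0, 0.0), gumbel_pdf(x, 1.0, 2.0), gev_pdf(x, 1.0, 2.0, 1e-8))
```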

The Fréchet distribution, also known as the Extreme Value Type II distribution, has the following pdf and cdf, respectively:

$\begin{matrix} f (x; α, β) & = \frac{α}{β} {(\frac{β}{x})}^{α + 1} e^{- {(\frac{β}{x})}^{α}}, \\ F (x; α, β) & = e^{- {(\frac{β}{x})}^{α}} \end{matrix}$

where α > 0 is the shape parameter and β > 0 is the scale parameter.

The Weibull distribution is known as the Extreme Value Type III distribution. The pdf and cdf of a Weibull random variable are shown as follows, respectively:

$f(x; λ, k) = \begin{cases} \frac{k}{λ}\left(\frac{x}{λ}\right)^{k - 1} e^{-(x/λ)^{k}} & x \geq 0, \\ 0 & x < 0, \end{cases}$

$F(x; λ, k) = \begin{cases} 1 - e^{-(x/λ)^{k}} & x \geq 0, \\ 0 & x < 0, \end{cases}$

where λ > 0 is the scale parameter and k > 0 is the shape parameter.
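The correspondence with the Fréchet and Weibull forms can be verified numerically from the GEV cdf F(x) = exp(−t(x)), a standard fact that is not stated explicitly above. The matching parameters are worked out here rather than taken from the source: Fréchet(α, β) is the GEV with μ = β, σ = β/α, ξ = 1/α; and if X is Weibull(λ, k), then −X follows the GEV with μ = −λ, σ = λ/k, ξ = −1/k, which is why the ξ < 0 case is called a "reversed" Weibull. Parameter values are arbitrary.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """GEV cdf F(x) = exp(-t(x)) for xi != 0, with t(x) as defined above."""
    base = 1.0 + xi * (x - mu) / sigma
    if base <= 0.0:                    # outside the support:
        return 0.0 if xi > 0 else 1.0  # below the lower end (xi > 0) / above the upper end (xi < 0)
    return math.exp(-(base ** (-1.0 / xi)))

def frechet_cdf(x, alpha, beta):
    return math.exp(-((beta / x) ** alpha)) if x > 0 else 0.0

def weibull_cdf(x, lam, k):
    return 1.0 - math.exp(-((x / lam) ** k)) if x >= 0 else 0.0

alpha, beta = 3.0, 2.0
lam, k = 1.5, 2.0
for x in (0.5, 1.0, 2.0, 4.0):
    # Frechet(alpha, beta) matches the GEV with mu = beta, sigma = beta / alpha, xi = 1 / alpha.
    print(frechet_cdf(x, alpha, beta), gev_cdf(x, beta, beta / alpha, 1.0 / alpha))
    # X ~ Weibull(lam, k)  =>  P(-X <= -x) = 1 - F_Weibull(x) matches the GEV with
    # mu = -lam, sigma = lam / k, xi = -1 / k evaluated at -x.
    print(1.0 - weibull_cdf(x, lam, k), gev_cdf(-x, -lam, lam / k, -1.0 / k))
```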

Next, we apply the GEV model to stock-market closing prices, using weekly returns computed as

$R(t) = \frac{(\text{maximum close price of week } t) - (\text{maximum close price of week } t - 1)}{\text{maximum close price of week } t - 1} .$
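A minimal sketch of this return definition, assuming daily (date, close) records grouped into ISO calendar weeks. The text does not say how weeks were delimited, so the grouping rule is an assumption, and the price data are hypothetical.

```python
from datetime import date, timedelta

def weekly_max_close(daily):
    """daily: (datetime.date, close) pairs.  Returns the maximum close of each ISO week,
    ordered by (ISO year, ISO week)."""
    maxima = {}
    for day, close in daily:
        key = tuple(day.isocalendar())[:2]  # (ISO year, ISO week number)
        maxima[key] = max(close, maxima.get(key, float("-inf")))
    return [maxima[key] for key in sorted(maxima)]

def weekly_returns(weekly_max):
    """R(t) = (M_t - M_{t-1}) / M_{t-1}, where M_t is the maximum close price of week t."""
    return [(cur - prev) / prev for prev, cur in zip(weekly_max, weekly_max[1:])]

# Hypothetical closes for two trading weeks (Mon 1 Jan 2024 to Fri 12 Jan 2024).
start = date(2024, 1, 1)
closes = [100.0, 102.0, 101.0, 103.0, 99.0, 104.0, 106.0, 105.0, 101.0, 100.0]
days = [start + timedelta(days=d) for d in (0, 1, 2, 3, 4, 7, 8, 9, 10, 11)]
maxima = weekly_max_close(zip(days, closes))
print(maxima)                  # [103.0, 106.0]
print(weekly_returns(maxima))  # one return: (106 - 103) / 103
```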

The results of fitting the GEV distribution to the weekly block-maxima data are presented in Fig. 6 and Table 1. Fig. 6 includes a quantile-quantile plot (QQ-plot) of quantiles from a sample drawn from the fitted GEV distribution against the empirical data quantiles, with 95% confidence bands. The maximum likelihood estimates of the GEV parameters are the values of (μ, σ, ξ) that maximize the log-likelihood. The positive sign and magnitude of ξ̂ indicate that the weekly-return data are fat-tailed, which is consistent with the QQ-plot.

Table 1. Parameter estimates for the GEV model fitted by maximum likelihood, with 95% confidence intervals for each estimate.

	Location $\hat{μ}$	Scale $\hat{σ}$	Shape $\hat{ξ}$
Estimate	606.3260	511.1713	0.1215
95% CI lower bound	576.43	486.42	0.05
95% CI upper bound	636.22	535.92	0.19
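For a QQ-plot like the one in Fig. 6, theoretical quantiles come from inverting F(x) = exp(−t(x)), which gives x_p = μ + (σ/ξ)[(−log p)^(−ξ) − 1] for ξ ≠ 0. The sketch plugs in the point estimates of Table 1 and pairs the quantiles with sorted data using plotting positions (i − 0.5)/n, one common convention that may differ from the one used for Fig. 6. The sample values are hypothetical.

```python
import math

MU_HAT, SIGMA_HAT, XI_HAT = 606.3260, 511.1713, 0.1215  # point estimates from Table 1

def gev_quantile(p, mu=MU_HAT, sigma=SIGMA_HAT, xi=XI_HAT):
    """Inverse of the GEV cdf F(x) = exp(-t(x)), for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    y = -math.log(p)
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)

def qq_pairs(data):
    """(theoretical, empirical) quantile pairs using plotting positions (i - 0.5) / n."""
    xs = sorted(data)
    n = len(xs)
    return [(gev_quantile((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

for theoretical, empirical in qq_pairs([350.0, 900.0, 420.0, 1500.0, 610.0]):  # hypothetical data
    print(f"{theoretical:9.1f}  {empirical:9.1f}")
```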

Based on the statistical analysis presented above (Fig. 6a), the distribution of the weekly-return data appears to be describable by a combination of different distributions. The density plot (Fig. 6b) has two humps, which supports the idea of a mixture of distributions.

URL:

https://www.sciencedirect.com/science/article/pii/S1007570420301829

desailllytrustre.blogspot.com

Source: https://www.sciencedirect.com/topics/mathematics/continuous-probability-distribution

Desaillly Trustre
