Foundation of Data Analysis Study Notes (2): Probability and Distributions

2019-10-09 00:23

Study notes

Notes for Chapter 2, Probability and Distributions.

Some notation:

Symbol	Meaning	Definition
$\forall$	for all	$\forall x: P(x)$ means that $P(x)$ is true for all $x$

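Probability

Properties of a probability space

The probability space is the foundation of probability theory; the rigorous definition of probability is built on this concept. For events $C, C_{1}, C_{2}$ in the collection of events $\mathcal{B}$, the probability function $P$ satisfies: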

$P(C)=1-P\left(C^{c}\right)$
$P(\emptyset)=0$
$P\left(C_{1}\right) \leq P\left(C_{2}\right)$ if $C_{1} \subset C_{2}$
$0 \leq P(C) \leq 1, \quad \forall C \in \mathcal{B}$
Inclusion-exclusion formula
$P\left(C_{1} \cup C_{2}\right)=P\left(C_{1}\right)+P\left(C_{2}\right)-P\left(C_{1} \cap C_{2}\right)$
More generally,
$P\left(C_{1} \cup \ldots \cup C_{k}\right)=p_{1}-p_{2}+p_{3}-\ldots+(-1)^{k+1} p_{k},$
where $p_{r}$ is the sum of the probabilities of all intersections of $r$ distinct events:
$p_{1}=\sum_{i=1}^{k} P\left(C_{i}\right), \quad p_{2}=\sum_{i=1}^{k} \sum_{j=i+1}^{k} P\left(C_{i} \cap C_{j}\right), \quad \ldots, \quad p_{k}=P\left(C_{1} \cap \ldots \cap C_{k}\right)$
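
The general inclusion-exclusion formula can be checked on a small finite example. The sketch below assumes equally likely outcomes; the sample space `omega` and the three events are hypothetical choices made only for illustration.

```python
from fractions import Fraction
from itertools import combinations

def prob(event, omega):
    """Probability of an event when all outcomes in omega are equally likely."""
    return Fraction(len(event), len(omega))

def union_prob_inclusion_exclusion(events, omega):
    """Compute P(C_1 ∪ ... ∪ C_k) as p_1 - p_2 + ... + (-1)^(k+1) p_k."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        # p_r: sum of probabilities of all intersections of r distinct events
        p_r = sum(prob(set.intersection(*combo), omega)
                  for combo in combinations(events, r))
        total += (-1) ** (r + 1) * p_r
    return total

omega = set(range(1, 13))  # hypothetical sample space: outcomes 1..12
events = [
    {x for x in omega if x % 2 == 0},  # multiples of 2
    {x for x in omega if x % 3 == 0},  # multiples of 3
    {x for x in omega if x > 8},       # outcomes greater than 8
]

direct = prob(set.union(*events), omega)
via_formula = union_prob_inclusion_exclusion(events, omega)
print(direct, via_formula)
```

Using `Fraction` keeps the comparison exact, so the two values can be compared with `==` rather than with a tolerance.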

Law of total probability

Let $\left\{C_{1}, \ldots, C_{k}\right\}$ be a partition of the sample space $\mathcal{C}$ with $P\left(C_{i}\right)>0$ for every $i$, and let $C$ be any event. Then

$P(C)=\sum_{i=1}^{k} P\left(C_{i}\right) P\left(C | C_{i}\right)$

is called the law of total probability.

Bayes’ Theorem:

$P\left(C_{j} | C\right)=\frac{P\left(C \cap C_{j}\right)}{P(C)}=\frac{P\left(C_{j}\right) P\left(C | C_{j}\right)}{\sum_{i=1}^{k} P\left(C_{i}\right) P\left(C | C_{i}\right)}$
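
As a worked illustration of the law of total probability and Bayes' theorem, here is a minimal sketch. The numbers are hypothetical: three machines $C_1, C_2, C_3$ produce 50%, 30%, and 20% of all items, with defect rates of 1%, 2%, and 3%, and $C$ is the event that an item is defective.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(C) = sum_i P(C_i) * P(C | C_i)."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def posterior(priors, likelihoods):
    """P(C_j | C) for every j, by Bayes' theorem."""
    p_c = total_probability(priors, likelihoods)
    return [p * l / p_c for p, l in zip(priors, likelihoods)]

# Hypothetical inputs: P(C_i) and P(C | C_i) for i = 1, 2, 3
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]

print(total_probability(priors, likelihoods))
print(posterior(priors, likelihoods))
```

The posteriors always sum to 1, because the denominator in Bayes' theorem is exactly the sum of the numerators.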

Distribution

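Bernoulli experiment and Bernoulli distribution

A Bernoulli experiment is a random experiment that can be repeated independently under identical conditions and that has only two possible outcomes: the event occurs (success) or it does not (failure).

If the experiment is repeated independently $n$ times, the resulting sequence of trials is called an $n$-fold Bernoulli experiment, also known as the Bernoulli scheme. A single Bernoulli trial is of little interest on its own.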

Let $X$ be a random variable associated with a Bernoulli trial, defined by

$X(\text{success})=1 \quad \text{and} \quad X(\text{failure})=0$

If $p$ denotes the probability of success, the pmf of $X$ can be written as

$p(x)=p^{x}(1-p)^{1-x}, \quad x=0,1$

The Bernoulli distribution is the familiar 0-1 distribution, also called the two-point distribution.

Binomial Distribution

Let $X$ equal the number of observed successes in $n$ independent Bernoulli trials, each with success probability $p$. The possible values of $X$ are $0,1, \cdots, n$. We say that $X$ follows a binomial distribution and write $X \sim B(n, p)$. The pmf of $X$ is

$p(x)=\left\{\begin{array}{ll}{\binom{n}{x} p^{x}(1-p)^{n-x},} & {x=0,1, \cdots, n} \\ {0,} & {\text{elsewhere,}}\end{array}\right.$

where $\binom{n}{x}=\frac{n !}{x !(n-x) !}$.

A resource on the binomial distribution, with some worked exercises: Binomial Distribution
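
The binomial pmf can be evaluated directly from its definition. The following is a minimal sketch using exact rational arithmetic; the values $n=4$ and $p=1/2$ are hypothetical and chosen only to keep the numbers small.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """p(x) = C(n, x) p^x (1 - p)^(n - x) for x = 0..n, and 0 elsewhere."""
    if not (isinstance(x, int) and 0 <= x <= n):
        return Fraction(0)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 4, Fraction(1, 2)  # hypothetical parameters
pmf_values = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))

print(pmf_values)
print(sum(pmf_values), mean)
```

The pmf values sum to 1 by the binomial theorem, and the mean computed from the pmf agrees with the standard result $np$.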

Geometric Distribution

Let $X$ be the index of the Bernoulli trial on which the first “yes” (success) appears, so that $\mathcal{D}_{X}=\{1,2, \ldots\}$.
Let $Y$ be the number of “no” outcomes before the first “yes”, so that $Y=X-1$ and $\mathcal{D}_{Y}=\{0,1, \ldots\}$. Then

$P(X=n)=P(Y=n-1)=p(1-p)^{n-1}, \quad n=1,2, \cdots$
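
A short sketch of the two parameterizations, assuming a hypothetical success probability $p=1/4$. It also checks the partial-sum identity $\sum_{n=1}^{N} p(1-p)^{n-1}=1-(1-p)^{N}$, which is the probability that the first success occurs within the first $N$ trials.

```python
from fractions import Fraction

def geometric_pmf(n, p):
    """P(X = n): the first success occurs on trial n (n = 1, 2, ...)."""
    if n < 1:
        return Fraction(0)
    return p * (1 - p) ** (n - 1)

def failures_pmf(y, p):
    """P(Y = y) for Y = X - 1, the number of failures before the first success."""
    return geometric_pmf(y + 1, p)

p = Fraction(1, 4)  # hypothetical success probability
N = 10
partial = sum(geometric_pmf(n, p) for n in range(1, N + 1))
print(partial, 1 - (1 - p) ** N)
```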

Multinomial Distribution

This is an extension of the binomial distribution.
Let a random experiment be repeated $n$ independent times.
Each repetition results in exactly one of $k$ mutually exclusive
and exhaustive outcomes, say $C_{1}, \ldots, C_{k}$. Let $p_{i}$ be the probability that the
outcome is an element of $C_{i}$, so that $p_{1}+\cdots+p_{k}=1$.
Let $X_{i}$ be the number of outcomes that are elements of $C_{i}$. We
have $X_{1}+X_{2}+\cdots+X_{k}=n$.
The joint pmf of $X_{1}, \ldots, X_{k}$ is
$P\left(X_{1}=n_{1}, \ldots, X_{k}=n_{k}\right)=\left\{\begin{array}{ll}{\frac{n !}{n_{1} ! \cdots n_{k} !} p_{1}^{n_{1}} \cdots p_{k}^{n_{k}},} & {n_{1}+\cdots+n_{k}=n} \\ {0,} & {\text{elsewhere.}}\end{array}\right.$
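
The multinomial pmf can be sketched as follows. The probabilities $(1/2, 1/3, 1/6)$ and $n=3$ are hypothetical values used only to show that the pmf sums to 1 over all admissible count vectors.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def multinomial_pmf(counts, n, probs):
    """P(X_1 = n_1, ..., X_k = n_k); zero unless the counts are >= 0 and sum to n."""
    if any(c < 0 for c in counts) or sum(counts) != n:
        return Fraction(0)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)  # stays an integer: n! / (n_1! ... n_k!)
    value = Fraction(coeff)
    for c, p in zip(counts, probs):
        value *= p ** c
    return value

n = 3
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # hypothetical p_1, p_2, p_3

# Sum over every candidate count vector; vectors not summing to n contribute 0.
total = sum(multinomial_pmf(c, n, probs)
            for c in product(range(n + 1), repeat=len(probs)))
print(total, multinomial_pmf((1, 1, 1), n, probs))
```

With $k=2$ the same function reduces to the binomial pmf, since $X_{2}=n-X_{1}$ is then determined by $X_{1}$.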

Poisson Distribution

A random variable $X$ that has the pmf

$p(x)=\left\{\begin{array}{ll}{\frac{m^{x} e^{-m}}{x !},} & {x=0,1, \cdots} \\ {0,} & {\text{elsewhere,}}\end{array}\right.$

is said to have a Poisson distribution with parameter $m>0$.

Suppose $X_{1}, X_{2}, \cdots, X_{n}$ are independent random variables and suppose $X_{i}$ has a Poisson distribution with parameter $m_{i} .$ Then $Y=\sum_{i=1}^{n} X_{i}$ has a Poisson distribution with parameter $\sum_{i=1}^{n} m_{i}$ .
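
The additivity property can be checked numerically for two variables by convolving their pmfs. This is only a sketch: the parameters $m_{1}=1.5$ and $m_{2}=2.5$ are hypothetical, and floating-point values are compared with a tolerance.

```python
import math

def poisson_pmf(x, m):
    """p(x) = m^x e^(-m) / x! for x = 0, 1, ..., and 0 elsewhere."""
    if x < 0:
        return 0.0
    return m ** x * math.exp(-m) / math.factorial(x)

def convolve_pmf(y, m1, m2):
    """P(X1 + X2 = y) for independent Poisson X1, X2, summing over X1 = j."""
    return sum(poisson_pmf(j, m1) * poisson_pmf(y - j, m2) for j in range(y + 1))

m1, m2 = 1.5, 2.5  # hypothetical parameters
for y in range(6):
    print(y, convolve_pmf(y, m1, m2), poisson_pmf(y, m1 + m2))
```

The two columns agree because the sum over $j$ collapses, via the binomial theorem, to $(m_{1}+m_{2})^{y} e^{-(m_{1}+m_{2})} / y!$.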

Exponential Distribution (指数分布)

The exponential distribution $E(\lambda)$ with the pdf

$f(x)=\left\{\begin{array}{ll}{\lambda e^{-\lambda x},} & {x \geq 0} \\ {0,} & {x<0}\end{array}\right.$

is one of the most important continuous distributions in reliability theory, queueing theory, and telephone systems.

Gamma Distribution

The Gamma Function

For $\alpha>0$, the following integral is called the gamma function of $\alpha$, and we write

$\Gamma(\alpha)=\int_{0}^{\infty} y^{\alpha-1} e^{-y} d y$

Properties:

$\Gamma(1)=1$
$\Gamma(\alpha)=(\alpha-1) \Gamma(\alpha-1)$ for $\alpha>1$
$\Gamma(n)=(n-1) !$ if $n$ is a positive integer
$\Gamma(\alpha) \rightarrow \infty$ as $\alpha \rightarrow 0^{+}, \quad \Gamma\left(\frac{1}{2}\right)=\sqrt{\pi}, \quad \Gamma(\alpha) \Gamma(1-\alpha)=\frac{\pi}{\sin (\pi \alpha)}$ for $0<\alpha<1$
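
These properties can be spot-checked with `math.gamma` from the Python standard library. The sketch below uses hypothetical test values ($\alpha=3.7$, $n=6$, and $0.3$ for the reflection formula).

```python
import math

def gamma_property_pairs(alpha=3.7, n=6, a=0.3):
    """Return (left-hand side, right-hand side) pairs for the listed properties."""
    return {
        "Gamma(1) = 1": (math.gamma(1.0), 1.0),
        "recursion": (math.gamma(alpha), (alpha - 1) * math.gamma(alpha - 1)),
        "factorial": (math.gamma(n), float(math.factorial(n - 1))),
        "Gamma(1/2) = sqrt(pi)": (math.gamma(0.5), math.sqrt(math.pi)),
        "reflection": (math.gamma(a) * math.gamma(1 - a),
                       math.pi / math.sin(math.pi * a)),
    }

for name, (lhs, rhs) in gamma_property_pairs().items():
    print(f"{name}: {lhs:.12g} vs {rhs:.12g}")
```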

The $\Gamma$ -distribution

A random variable $X$ that has the pdf of the form

$f(x)=\left\{\begin{array}{ll}{\frac{1}{\Gamma(\alpha) \beta^{\alpha}} x^{\alpha-1} e^{-x / \beta},} & {0<x<\infty} \\ {0,} & {\text { elsewhere }}\end{array}\right.$

is said to have a gamma distribution with parameters $\alpha$ and $\beta,$ where $\alpha>0$ and $\beta>0 .$ We will write $X \sim \Gamma(\alpha, \beta)$ or $X \sim gamma(\alpha, \beta)$ .
This $f$ is a valid pdf:

$f(x) \geq 0$;
$\int_{0}^{\infty} \frac{1}{\Gamma(\alpha) \beta^{\alpha}} x^{\alpha-1} e^{-x / \beta} d x=1$.

The second property follows from the substitution $y=x / \beta$ in the integral defining $\Gamma(\alpha)$:

$\Gamma(\alpha)=\int_{0}^{\infty}\left(\frac{x}{\beta}\right)^{\alpha-1} e^{-x / \beta}\left(\frac{1}{\beta}\right) d x$

Dividing both sides by $\Gamma(\alpha)$ gives the stated integral.

The $\Gamma$-distribution includes many useful distributions as special cases.

The standard $\Gamma$-distribution $\Gamma(\alpha, 1)$ with pdf

$f(x)=\left\{\begin{array}{ll}{x^{\alpha-1} e^{-x} / \Gamma(\alpha),} & {x \geq 0} \\ {0,} & {x<0}\end{array}\right.$

The exponential distribution $(\alpha=1, \lambda=1 / \beta)$ with pdf

$f(x)=\left\{\begin{array}{ll}{\lambda e^{-\lambda x},} & {x \geq 0} \\ {0,} & {x<0}\end{array}\right.$

The $\chi^{2}$-distribution $(\alpha=n / 2, \beta=2)$ with pdf

$f(x)=\left\{\begin{array}{ll}{\frac{1}{2^{n / 2} \Gamma(n / 2)} x^{n / 2-1} e^{-x / 2},} & {x \geq 0} \\ {0,} & {x<0}\end{array}\right.$
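
The special cases can be compared numerically against the general gamma pdf. The sketch below codes each density from its formula above; the evaluation points and the parameters $\lambda=2$ and $n=5$ are hypothetical. The densities are evaluated only at $x>0$, since the single point $x=0$ does not affect any probability.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma(alpha, beta) pdf with scale beta; taken as 0 for x <= 0."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)

def exponential_pdf(x, lam):
    """Exponential pdf: lam * exp(-lam * x) for x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def chi2_pdf(x, n):
    """Chi-square pdf with n degrees of freedom; taken as 0 for x <= 0."""
    if x <= 0:
        return 0.0
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))

lam, n = 2.0, 5  # hypothetical parameters
for x in (0.1, 1.0, 3.5):
    print(x,
          gamma_pdf(x, 1.0, 1.0 / lam), exponential_pdf(x, lam),
          gamma_pdf(x, n / 2, 2.0), chi2_pdf(x, n))
```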

The $\chi^{2}$-distribution

Example. If $X$ has the pdf

$f(x)=\left\{\begin{array}{ll}{\frac{1}{4} x e^{-x / 2},} & {0<x<\infty} \\ {0,} & {\text{elsewhere,}}\end{array}\right.$

then $X \sim \chi^{2}(4)$, since this is the $\chi^{2}$ pdf with $n=4$: $2^{n / 2} \Gamma(n / 2)=4 \Gamma(2)=4$.

Corollary:
Let $X_{1}, X_{2}, \cdots, X_{n}$ be independent and
$X_{i} \sim \chi^{2}\left(n_{i}\right), i=1, \ldots, n .$ Then

$Y=\sum_{i=1}^{n} X_{i} \sim \chi^{2}(m)$

where $m=\sum_{i=1}^{n} n_{i} .$

The $\beta$-distribution

The beta function

$B(a, b)=\int_{0}^{1} y^{a-1}(1-y)^{b-1} d y ; \quad a>0, b>0$

Properties

$B(a, b)=B(b, a)$
$B(a, b)=\frac{\Gamma(a) \Gamma(b)}{\Gamma(a+b)}$
$B(a, b-a)=\int_{0}^{\infty} x^{a-1}(1+x)^{-b} d x$ for $b>a>0$
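
The relation between the beta and gamma functions can be checked against a direct numerical evaluation of the defining integral. This is a rough sketch using the midpoint rule; it assumes $a \geq 1$ and $b \geq 1$ so that the integrand is bounded, and the values $a=2$, $b=3$ are hypothetical.

```python
import math

def beta_via_gamma(a, b):
    """B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_via_integral(a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of y^(a-1) (1-y)^(b-1) over [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += y ** (a - 1) * (1 - y) ** (b - 1)
    return total * h

a, b = 2.0, 3.0  # hypothetical parameters
print(beta_via_gamma(a, b), beta_via_integral(a, b))
```

For $a=2$, $b=3$ the closed form is $\Gamma(2) \Gamma(3) / \Gamma(5)=2 / 24=1 / 12$.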

The $\beta$-distribution

Let $X_{1}$ and $X_{2}$ be two independent random variables, where $X_{1} \sim \Gamma(\alpha, 1)$ and $X_{2} \sim \Gamma(\beta, 1)$. The distribution of $B=\frac{X_{1}}{X_{1}+X_{2}}$ is called the $\beta$-distribution with parameters $\alpha$ and $\beta$, and we write $B \sim \beta(\alpha, \beta)$ or $B \sim \operatorname{beta}(\alpha, \beta)$. Its pdf is $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} x^{\alpha-1}(1-x)^{\beta-1}$ for $0<x<1$ and 0 elsewhere.

Special cases of the $\beta$-distribution:

The uniform distribution $=\beta(1,1)$, with pdf 1 on $[0,1]$ and 0 elsewhere.
The arcsine (inverse sine) distribution $=\beta(1 / 2,1 / 2)$. Its pdf is

$p(x)=\left\{\begin{array}{ll}{\frac{1}{\pi \sqrt{x(1-x)}},} & {0<x<1} \\ {0,} & {\text{elsewhere}}\end{array}\right.$

The power distribution $=\beta(\alpha, 1)$. Its pdf is

$p(x)=\left\{\begin{array}{ll}{\alpha x^{\alpha-1},} & {0 \leq x \leq 1} \\ {0,} & {\text{elsewhere}}\end{array}\right.$

Normal Distribution

Definition. A random variable $X$ that has the pdf

$p(x)=\frac{1}{\sqrt{2 \pi} \sigma} \exp \left\{-\frac{(x-\mu)^{2}}{2 \sigma^{2}}\right\}, \quad -\infty<x<\infty,$

is said to have a normal distribution with parameters $\mu$ and $\sigma^{2}$, and we write $X \sim N\left(\mu, \sigma^{2}\right)$. When $\mu=0$ and $\sigma^{2}=1$, we say that $X$ follows a standard normal distribution.

If $X \sim N\left(\mu, \sigma^{2}\right)$ with $\sigma^{2}>0$, then the random variable $V=(X-\mu)^{2} / \sigma^{2} \sim \chi^{2}(1)$.
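
The relation between the normal and $\chi^{2}(1)$ distributions can be illustrated by simulation. The sketch below is a Monte Carlo check, not a proof: it draws $X \sim N(\mu, \sigma^{2})$ with hypothetical values $\mu=3$, $\sigma=2$, forms $V$, and compares the sample with two facts about $\chi^{2}(1)$ that are assumed here rather than taken from the notes: its mean is 1, and its cdf is $P(V \leq v)=\operatorname{erf}(\sqrt{v / 2})$.

```python
import math
import random

def simulate_v(mu, sigma, size, seed=0):
    """Draw X ~ N(mu, sigma^2) and return the values V = (X - mu)^2 / sigma^2."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [((rng.gauss(mu, sigma) - mu) / sigma) ** 2 for _ in range(size)]

def chi2_1_cdf(v):
    """cdf of chi-square(1): erf(sqrt(v / 2)) for v > 0, else 0."""
    return math.erf(math.sqrt(v / 2)) if v > 0 else 0.0

samples = simulate_v(mu=3.0, sigma=2.0, size=100_000)
mean_v = sum(samples) / len(samples)
frac_below_one = sum(v <= 1.0 for v in samples) / len(samples)

print(mean_v, frac_below_one, chi2_1_cdf(1.0))
```

The sample mean should be close to 1 and the empirical fraction close to `chi2_1_cdf(1.0)`, up to Monte Carlo error that shrinks as `size` grows.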

The $t$-distribution

Let $W \sim N(0,1)$ and $V \sim \chi^{2}(n)$ be independent random variables.
Define a new random variable $T$ by writing

$T=\frac{W}{\sqrt{V / n}}=\sqrt{n} \frac{W}{\sqrt{V}}$

We say that $T$ follows a $t$ -distribution with $n$ degrees of freedom.

The $F$-distribution

Let $U \sim \chi^{2}(m)$ and $V \sim \chi^{2}(n)$ be independent. Then

$F=\frac{U / m}{V / n}$

has the pdf

$p(f)=\left\{\begin{array}{ll}{\frac{\Gamma\left(\frac{m+n}{2}\right)(m / n)^{m / 2}}{\Gamma(m / 2) \Gamma(n / 2)} \frac{f^{m / 2-1}}{\left(1+\frac{m f}{n}\right)^{(m+n) / 2}},} & {0<f<\infty} \\ {0,} & {\text { elsewhere }}\end{array}\right.$

We say that $F$ follows an $F$-distribution with $m$ and $n$ degrees of
freedom, and write $F \sim F_{m, n}$.
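
As a sanity check on the $F$ pdf, the sketch below integrates it numerically over $(0, \infty)$ and confirms that the result is close to 1. It maps the half-line to $(0,1)$ with the substitution $f=u /(1-u)$, $d f=d u /(1-u)^{2}$, and applies the midpoint rule. The degrees of freedom $m=4$, $n=6$ are hypothetical, and the step count is an arbitrary choice.

```python
import math

def f_pdf(f, m, n):
    """pdf of the F-distribution with m and n degrees of freedom; 0 for f <= 0."""
    if f <= 0:
        return 0.0
    const = (math.gamma((m + n) / 2) * (m / n) ** (m / 2)
             / (math.gamma(m / 2) * math.gamma(n / 2)))
    return const * f ** (m / 2 - 1) / (1 + m * f / n) ** ((m + n) / 2)

def integrate_f_pdf(m, n, steps=100_000):
    """Midpoint rule for the integral of p(f) over (0, inf) via f = u / (1 - u)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h              # midpoint in (0, 1)
        f = u / (1 - u)                # mapped point in (0, inf)
        total += f_pdf(f, m, n) / (1 - u) ** 2  # Jacobian of the substitution
    return total * h

print(integrate_f_pdf(4, 6))
```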

End

Tags: study

Article link: https://www.goozp.com/article/116.html
