Maximum Likelihood and Maximum Likelihood Estimation

1. What is Maximum Likelihood?

Maximum likelihood is a way to find the function (more precisely, the parameter values of a chosen model) that most plausibly explains a set of observed data.

In elementary statistics, you are usually given a model to find probabilities. For example, you might be asked to find the probability that X is greater than 2, given the following Poisson distribution:

X ~ Poisson(2.4)

In this example, you are given the parameter of the Poisson distribution, λ = 2.4. In real life, you don't have the luxury of being handed a model: you have to fit a model to your data. That is where maximum likelihood estimation (MLE) comes in.
In statistics, maximum likelihood estimation is a method of estimating the parameters of a statistical model from observations. MLE looks for the parameter values that maximize the likelihood function, given the observations. The resulting estimate is called a maximum likelihood estimate, which is also abbreviated MLE.
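For the "given model" case above, the probability can be computed directly from the Poisson probability mass function P(X = k) = e^(−λ) λ^k / k!, using P(X > 2) = 1 − P(X ≤ 2). A minimal Python sketch; the function names are illustrative and not from any library:

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_prob_greater_than(n: int, lam: float) -> float:
    """P(X > n) = 1 - P(X <= n), summing the PMF for k = 0..n."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(n + 1))


p = poisson_prob_greater_than(2, 2.4)
print(f"P(X > 2) = {p:.4f}")  # P(X > 2) = 0.4303
```

Here λ is known in advance; the rest of these notes deal with the opposite situation, where λ (or another parameter) has to be estimated from data.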

MLE takes known probability distributions (like the normal distribution) and compares data sets to those distributions in order to find a suitable match for the data. A family of distributions can have infinitely many possible parameter values. For example, the mean of a normal distribution could be zero, or it could be ten billion or more. Maximum likelihood estimation is one way to find the population parameters that are most likely to have generated the sample being tested. How well the data match the model is known as "goodness of fit."

For example, a researcher might be interested in finding out the mean weight gain of rats eating a particular diet. The researcher cannot weigh every rat in the population, so instead takes a sample. Weight gains of rats tend to follow a normal distribution, so maximum likelihood estimation can be used to estimate the mean and variance of the weight gain in the general population from this sample.
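For a normal model, maximizing the likelihood has a closed-form solution: the MLE of the mean is the sample mean, and the MLE of the variance is the average squared deviation from that mean. Note that it divides by n, not n − 1, so it is slightly biased. A minimal sketch; `normal_mle` is a hypothetical helper name, and no real rat data is used:

```python
from statistics import fmean


def normal_mle(sample: list[float]) -> tuple[float, float]:
    """Return the maximum likelihood estimates (mu_hat, sigma2_hat) for a normal model."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    mu_hat = fmean(sample)
    # The MLE divides by n, unlike the unbiased sample variance (which divides by n - 1)
    sigma2_hat = sum((x - mu_hat) ** 2 for x in sample) / n
    return mu_hat, sigma2_hat
```

The standard library's `statistics.pvariance` computes the same divide-by-n quantity, so it can be used instead of the explicit sum.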

MLE chooses as the model parameters the values that maximize the likelihood function.

2. The Likelihood Function

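The likelihood of a sample is the probability (or, for continuous data, the probability density) of getting that sample, given a specified probability distribution model. The likelihood function is a way to express that probability as a function of the model's parameters: the parameter values that maximize the probability of getting that sample are the maximum likelihood estimates.

In one sentence: the likelihood is the probability of the observed sample; the likelihood function expresses that probability in terms of the parameters; and maximum likelihood picks the parameters that make that probability as large as possible.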
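For an independent sample, the likelihood is the product of the individual probabilities. As a sketch, assuming a Poisson model and a small set of hypothetical counts (made up for illustration only), you can evaluate the likelihood at different candidate values of λ and see which one makes the observed sample more probable:

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def likelihood(sample: list[int], lam: float) -> float:
    """L(lam): probability of the whole i.i.d. sample, i.e. the product of the PMFs."""
    return math.prod(poisson_pmf(k, lam) for k in sample)


sample = [2, 3, 1, 4, 2]  # hypothetical counts, for illustration only
for lam in (1.0, 2.4):
    print(lam, likelihood(sample, lam))
```

For this particular sample, λ = 2.4 gives a larger likelihood than λ = 1.0; 2.4 also happens to be the sample mean, which is the maximum likelihood estimate of λ for a Poisson model.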

Suppose you have a set of random variables X₁, X₂, ..., X_n drawn from a population whose distribution has an unknown parameter Θ. This distribution has a probability density function (PDF) f(x_i; Θ), where f is the model, x_i are the observed values of the random variables X_i, and Θ is the unknown parameter. For maximum likelihood, you want to know the most likely value of Θ given the observed values. If the X_i are independent (a random sample), the joint probability density function for this example is:

$$f(x_1, x_2, \ldots, x_n; \Theta) = \prod_{i=1}^{n} f(x_i; \Theta)$$

3. The Basic Idea

It seems reasonable that a good estimate of the unknown parameter θ would be the value of θ that maximizes the probability, errrr... that is, the likelihood... of getting the data we observed. (So, do you see where the name "maximum likelihood" comes from?) That, in a nutshell, is the idea behind the method of maximum likelihood estimation. But how would we implement the method in practice? Well, suppose we have a random sample X₁, X₂, ..., X_n for which the probability density (or mass) function of each X_i is f(x_i; θ). Then the joint probability mass (or density) function of X₁, X₂, ..., X_n, which we'll (not so arbitrarily) call L(θ), is:

$$L(\theta) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = f(x_1;\theta)\cdot f(x_2;\theta)\cdots f(x_n;\theta) = \prod_{i=1}^{n} f(x_i;\theta)$$

The first equality is, of course, just the definition of the joint probability mass function. The second equality comes from the fact that we have a random sample, which by definition implies that the X_i are independent. The last equality just uses the shorthand mathematical notation for a product of indexed terms. In light of the basic idea of maximum likelihood estimation, one reasonable way to proceed is to treat the "likelihood function" L(θ) as a function of θ and find the value of θ that maximizes it.
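Treating L(θ) as a function of θ, the maximizer can also be found numerically. The sketch below assumes a Poisson model, the same kind of hypothetical counts used for illustration only, and a simple ternary search (a swapped-in technique chosen for brevity; any 1-D optimizer would do). It maximizes log L(θ) rather than L(θ), a common practical choice: log is increasing, so the maximizer is the same, and sums are numerically better behaved than long products.

```python
import math


def poisson_log_likelihood(lam: float, sample: list[int]) -> float:
    """log L(lam) = sum of log PMFs; log(k!) is computed as lgamma(k + 1)."""
    return sum(-lam + k * math.log(lam) - math.lgamma(k + 1) for k in sample)


def maximize_on_interval(f, lo: float, hi: float, tol: float = 1e-9) -> float:
    """Ternary search for the maximizer of a unimodal function f on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1  # the maximum cannot lie in [lo, m1]
        else:
            hi = m2  # the maximum cannot lie in [m2, hi]
    return (lo + hi) / 2


sample = [2, 3, 1, 4, 2]  # hypothetical counts, for illustration only
lam_hat = maximize_on_interval(lambda lam: poisson_log_likelihood(lam, sample), 1e-6, 20.0)
print(round(lam_hat, 4))  # 2.4, the sample mean
```

The Poisson log-likelihood is concave in λ, so the ternary search's unimodality assumption holds here; for other models, a general-purpose optimizer is usually the safer choice.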

4. Example 1

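Suppose the weights of randomly selected American female college students are normally distributed with unknown mean μ and standard deviation σ. A random sample of 10 American female college students yielded the following weights (in pounds):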

115   122   130   127   149   160   152   138   149   180

Based on the definitions given above, identify the likelihood function and the maximum likelihood estimator of μ, the mean weight of all American female college students. Using the given sample, also find a maximum likelihood estimate of μ.
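One way to work Example 1: the normal likelihood is L(μ, σ) = ∏ (1/(σ√(2π))) exp(−(x_i − μ)²/(2σ²)). Setting the derivative of log L with respect to μ to zero gives Σ(x_i − μ) = 0, so the maximum likelihood estimator is μ̂ = X̄, the sample mean, whatever the value of σ. A minimal Python check on the ten weights listed above:

```python
from statistics import fmean

# Weights (in pounds) of the 10 sampled students, copied from Example 1
weights = [115, 122, 130, 127, 149, 160, 152, 138, 149, 180]

# For a normal model, the MLE of mu is the sample mean
mu_hat = fmean(weights)
print(mu_hat)  # 142.2
```

So the maximum likelihood estimate of μ from this sample is 142.2 pounds.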

5. Example 2

Suppose we have a random sample X₁, X₂,..., X_n where:

X_i = 0 if a randomly selected student does not own a sports car, and
X_i = 1 if a randomly selected student does own a sports car.

Assuming that the X_i are independent Bernoulli random variables with unknown parameter p, find the maximum likelihood estimator of p, the proportion of students who own a sports car.
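One way to work Example 2 is to write the Bernoulli likelihood, take logs, and set the derivative with respect to p to zero. This sketch assumes 0 < Σx_i < n, so the maximum lies strictly between 0 and 1:

```latex
L(p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = p^{\sum_{i=1}^{n} x_i}\,(1-p)^{\,n-\sum_{i=1}^{n} x_i}

\log L(p) = \Big(\sum_{i=1}^{n} x_i\Big)\log p + \Big(n - \sum_{i=1}^{n} x_i\Big)\log(1-p)

\frac{d}{dp}\log L(p) = \frac{\sum_{i=1}^{n} x_i}{p} - \frac{n - \sum_{i=1}^{n} x_i}{1-p} = 0
\quad\Longrightarrow\quad
\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i
```

The maximum likelihood estimator of p is therefore the sample proportion of students who own a sports car.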

6. References

https://newonlinecourses.science.psu.edu/stat414/node/191/ (well written, with many examples)

https://en.wikipedia.org/wiki/Maximum_likelihood_estimation

https://www.statisticshowto.datasciencecentral.com/maximum-likelihood-estimation/