最尤法 - 暇つぶしWikipedia

最尤法

最尤推定（さいゆうすいてい、英: maximum likelihood estimationという）や最尤法（さいゆうほう、英: method of maximum likelihood）とは、統計学において、与えられたデータからそれが従う確率分布の母数を点推定する方法である。

X ∼ f ( X ∣ θ 0 ) θ ^ = arg ⁡ max θ L ( θ ∣ X = x ) = arg ⁡ max θ f ( X = x ∣ θ ) {\displaystyle {\begin{array}{lcl}X&\thicksim &f(X\mid \theta _{0})\\{\hat {\theta }}&=&\arg \max \limits _{\theta }L(\theta \mid X=x)=\arg \max \limits _{\theta }f(X=x\mid \theta )\end{array}}}

この方法はロナルド・フィッシャーが1912年から1922年にかけて開発した。

観測されたデータからそれを生んだ母集団を説明しようとする際に広く用いられる。生物学では塩基やアミノ酸配列のような分子データの置換に関する確率モデルに基づいて系統樹を作成する際に、一番尤もらしくデータを説明する樹形を選択するための有力な方法としても利用される。機械学習ではニューラルネットワーク（特に生成モデル）を学習する際に最尤推定（負の対数尤度最小化として定式化）が用いられる。
基本的理論

最尤推定が解く基本的な問題は「パラメータ θ {\displaystyle \theta } が不明な確率分布 f D {\displaystyle f_{D}} に従う母集団から標本が得られたとき、データを良く説明する良い θ {\displaystyle \theta } は何か」である。

ある母集団が確率分布関数 f D {\displaystyle f_{D}} と母数 θ {\displaystyle \theta } で表される離散確率分布 D {\displaystyle D} を従うとする。そこから n {\displaystyle n} 個の標本 X 1 , X 2 , . . . X n {\displaystyle X_{1},X_{2},...X_{n}} を取り出すことを考えよう。すると分布関数から、観察されたデータ（標本）が得られる確率を次のように計算できる（離散分布はP=f)：

P ( x 1 , x 2 , … , x n ∣ θ ) = f D ( x 1 , … , x n ∣ θ ) {\displaystyle \mathbb {P} (x_{1},x_{2},\dots ,x_{n}\mid \theta )=f_{D}(x_{1},\dots ,x_{n}\mid \theta )}

このとき、母集団分布 D {\displaystyle D} の形（確率分布 f D {\displaystyle f_{D}} ）はわかっているが母数 θ {\displaystyle \theta } は不明な場合、どうしたら θ {\displaystyle \theta } を良く推定できるか？利用できる情報はこの母集団から得られた n {\displaystyle n} 個の標本 X 1 , X 2 , . . . X n {\displaystyle X_{1},X_{2},...X_{n}} である。

最尤法では、 θ {\displaystyle \theta } を仮定したときに今回サンプリングされた標本が得られる確率に着目する。すなわち上記にある、母数 θ {\displaystyle \theta } で条件付けられた確率Pに着目する。異なる θ {\displaystyle \theta } （ θ a {\displaystyle \theta _{a}} と θ b {\displaystyle \theta _{b}} ）を仮定して P θ a < P θ b {\displaystyle P_{\theta _{a}}<P_{\theta _{b}}} だった場合、これは何を意味するか？例えばコイン振りの表確率 θ {\displaystyle \theta } を θ a = 0.01 {\displaystyle \theta _{a}=0.01} と θ b = 0.5 {\displaystyle \theta _{b}=0.5} と仮定し、実際の標本が（表・表・表・表・裏）となって P ( x ∣ θ = 0.01 ) = 0.000...9 {\displaystyle \mathbb {P} (x\mid \theta =0.01)=0.000...9} 、 P ( x ∣ θ = 0.5 ) = 0.03125 {\displaystyle \mathbb {P} (x\mid \theta =0.5)=0.03125} （ P θ a << P θ b {\displaystyle P_{\theta _{a}}<<P_{\theta _{b}}} ）だった場合、これは何を意味するか？

直感的には「 θ b = 0.5 {\displaystyle \theta _{b}=0.5} の方がそれっぽい」と考えられる。すなわち2つの θ {\displaystyle \theta } を仮定したとき、片方ではほぼあり得ない現象が起きたことになり、もう片方ではまぁありうる確率の現象が起きたと考えられるので、より P ( x ∣ θ ) {\displaystyle \mathbb {P} (x\mid \theta )} が大きい方が尤もらしいと推定しているのである。もちろん奇跡的に稀な表が続いた（ θ a = 0.01 {\displaystyle \theta _{a}=0.01} である）可能性もありうるが、より尤もらしいのはより起きやすい現象であろう、という論理が最尤推定の根底にある論理である（「起きやすい現象が起きた」と「起きづらい現象が起きた」なら前者と考えるのが合理的、という論理）。

このような論理に基づき、母数 θ {\displaystyle \theta } の一番尤もらしい値を探す（ θ {\displaystyle \theta } のすべての可能な値の中から、観察された標本の尤度 P ( x ∣ θ ) {\displaystyle \mathbb {P} (x\mid \theta )} を最大にするものを探す）方法が最尤推定である。これは他の推定量を求める方法と対照的である。たとえば θ {\displaystyle \theta } の不偏推定量は、 θ {\displaystyle \theta } を過大評価することも過小評価することもないが、必ずしも一番尤もらしい値を与えるとは限らない。尤度関数を次のように定義する：

L ( θ ) = f D ( x 1 , … , x n ∣ θ ) {\displaystyle L(\theta )=f_{D}(x_{1},\dots ,x_{n}\mid \theta )}

この関数を母数 θ {\displaystyle \theta } のすべての可能な値から見て最大になるようにする。そのような値 θ ^ {\displaystyle {\hat {\theta }}} を母数 θ {\displaystyle \theta } に対する最尤推定量（さいゆうすいていりょう、maximum likelihood estimator、これもMLEと略す）という。最尤推定量は（適当な仮定の下では）しばしば尤度方程式（ゆうどほうていしき、likelihood equation） ∂ ∂ θ log ⁡ L ( θ ) = 0 {\displaystyle {\frac {\partial }{\partial \theta }}\log L(\theta )=0}

の解として求められる。
注意

尤度は θ {\displaystyle \theta } を変数とし x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\ldots ,x_{n}} を定数とする関数である。

最尤推定量は唯一ではないこともあるし、存在しないことさえある [1]。

f D {\displaystyle f_{D}} を離散確率分布関数でなく確率密度関数として考えれば、上の定義は連続確率分布にも当てはまる。

尤度の解釈

尤度 P ( x ∣ θ ) {\displaystyle \mathbb {P} (x\mid \theta )} は条件付確率の定義から「 θ {\displaystyle \theta } を仮定したときに今回サンプリングされた標本が得られる確率」である。「観測データから求まる、パラメータが θ {\displaystyle \theta } である確率」では決してない。それは事後確率 P ( θ ∣ x ) {\displaystyle \mathbb {P} (\theta \mid x)} である。

よって尤度最大の θ {\displaystyle \theta } を求める最尤推定は「パラメータが θ {\displaystyle \theta } である確率をデータから最大化する統計的推論手法」ではない。起きやすい現象が起きた場合が最も尤もらしいという考えに基づいて、尤度を最大化する θ {\displaystyle \theta } を母集団の推定値とする手法が最尤推定である。
他手法との関係性
 MAP推定

最尤推定は最大事後確率推定（MAP推定）の特殊例とみなせる。ベイズの定理より P ( θ ∣ x ) ∼ L ( θ ∣ x ) ⋅ P ( θ ) {\displaystyle \mathbb {P} (\theta \mid x)\sim \mathbb {L} (\theta \mid x)\cdot \mathbb {P} (\theta )} は常に成り立ちここで P ( θ ) {\displaystyle \mathbb {P} (\theta )} を一様分布と仮定すると、 P ( θ ∣ x ) ∼ L ( θ ∣ x ) {\displaystyle \mathbb {P} (\theta \mid x)\sim \mathbb {L} (\theta \mid x)} となってこの最大値推定量はMLEと一致する（c.f. 計量経済学）。

Size:73 KB
出典: フリー百科事典『ウィキペディア（Wikipedia）』
担当:undef