PSOLA - 暇つぶしWikipedia

PSOLA


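PSOLA (Pitch Synchronous Overlap and Add) is a speech-processing framework that performs "pitch-based segmentation, modification, and resynthesis of speech" [1]. In Japanese it is known as ピッチ同期重畳加算 or ピッチ同期波形重畳法 [2].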

Speech processing based on PSOLA can change pitch and duration (tempo) while preserving the spectral envelope and formants.

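PSOLA consists of the following three stages (analysis, modification, resynthesis) [3].
Analysis: convert the signal into a set of short segments [4]. The segment length is variable and synchronized with the short-time pitch (pitch-synchronous) [5].

Modification: operate within each segment or on whole segments as units.

Resynthesis: overlap-add (OLA).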

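In the analysis stage, analysis windows synchronized with the period (pitch) of the input speech waveform are used [6] to split the signal into short, mutually overlapping segments, each about twice the fundamental period long [6].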
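The analysis stage can be illustrated with a minimal sketch. It assumes that pitch marks (one sample index per period) have already been found by some other means, that a Hann window is used, and that the local period is taken as the distance to the neighbouring mark. The names `hann` and `extract_segments` are hypothetical and not taken from the cited papers.

```python
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def extract_segments(signal, marks):
    """Cut one windowed segment per pitch mark.

    Each segment is centred on its mark and spans about two local pitch
    periods, so neighbouring segments overlap.
    Returns a list of (centre_index, samples) pairs.
    Assumption: `marks` is a strictly increasing list of sample indices.
    """
    if len(marks) < 2:
        raise ValueError("need at least two pitch marks")
    segments = []
    for k, m in enumerate(marks):
        # Local period: distance to the next mark (previous one at the end).
        if k + 1 < len(marks):
            period = marks[k + 1] - m
        else:
            period = m - marks[k - 1]
        start, stop = m - period, m + period + 1
        window = hann(stop - start)
        # Samples outside the signal are treated as silence.
        samples = [
            (signal[i] if 0 <= i < len(signal) else 0.0) * w
            for i, w in zip(range(start, stop), window)
        ]
        segments.append((m, samples))
    return segments


# Toy input: a cosine with a 50-sample period, marks placed on its peaks.
signal = [math.cos(2.0 * math.pi * i / 50) for i in range(400)]
marks = list(range(50, 400, 50))
segments = extract_segments(signal, marks)
```

With this toy input every segment is 101 samples long (two periods plus the centre sample), and the window leaves the sample at each mark unchanged while fading both ends to zero.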

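As an example of modification, the pitch is lowered by moving the segments farther apart and raised by moving them closer together. Because spreading or overlapping the segments changes the signal's length (duration), the following compensation is applied [6]: to lengthen the signal, the same segment is repeated several times in succession; to shorten it, some segments are dropped.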
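One way to picture this repositioning is as a plan that lists where each segment goes in the output and which input segment is used there. The sketch below is an assumption for illustration, not the procedure of the cited papers: it steps through the output in intervals of (local period / `pitch_factor`) and picks the input pitch mark nearest to the corresponding input time. `plan_synthesis`, `pitch_factor`, and `time_factor` are hypothetical names.

```python
import bisect


def plan_synthesis(marks, pitch_factor=1.0, time_factor=1.0):
    """Return (output_position, segment_index) pairs.

    pitch_factor > 1 places segments closer together (higher pitch);
    time_factor > 1 lengthens the output, which makes segments repeat.
    Assumption: `marks` is a strictly increasing list of sample indices.
    """
    if len(marks) < 2 or pitch_factor <= 0 or time_factor <= 0:
        raise ValueError("need >= 2 marks and positive factors")
    plan = []
    t = marks[0] * time_factor          # current position on the output axis
    end = marks[-1] * time_factor
    while t <= end:
        src = t / time_factor           # matching position on the input axis
        # Nearest input mark to src (ties go to the earlier mark).
        j = bisect.bisect_left(marks, src)
        if j == len(marks):
            j -= 1
        elif j > 0 and src - marks[j - 1] <= marks[j] - src:
            j -= 1
        # Local period of the chosen segment.
        if j + 1 < len(marks):
            period = marks[j + 1] - marks[j]
        else:
            period = marks[j] - marks[j - 1]
        plan.append((int(round(t)), j))
        t += period / pitch_factor      # closer spacing means higher pitch
    return plan


marks = [100, 200, 300, 400, 500]
print(plan_synthesis(marks, time_factor=2.0))
print(plan_synthesis(marks, time_factor=0.5))
```

For these marks, doubling the duration reuses each of the first four segments twice (indices 0, 0, 1, 1, 2, 2, 3, 3, 4), while halving it keeps only every other segment (indices 0, 2, 4), which matches the repeat and drop rules described above.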

The modified segments are combined by overlap-add, and the signal is thereby resynthesized.
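Overlap-add itself is just a summation of the placed segments into an output buffer. The following is a minimal sketch, assuming each segment is given together with the output index where its middle sample should land; the name `overlap_add` and the `(centre_index, samples)` pair format are hypothetical.

```python
def overlap_add(placed, length):
    """Sum segments into an output buffer of `length` samples.

    placed: iterable of (centre_index, samples) pairs; each segment is
    added so that its middle sample lands on centre_index.
    Parts that fall outside the buffer are discarded.
    """
    out = [0.0] * length
    for centre, samples in placed:
        start = centre - len(samples) // 2
        for k, value in enumerate(samples):
            i = start + k
            if 0 <= i < length:
                out[i] += value
    return out


# Three 5-sample Hann windows spaced half a window apart.
window = [0.0, 0.5, 1.0, 0.5, 0.0]
out = overlap_add([(2, window), (4, window), (6, window)], 9)
print(out)  # [0.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.0]
```

The example uses bare windows rather than speech to show why windowed segments can be summed without amplitude ripple: Hann windows spaced one period apart add up to a constant in the region where they overlap.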

Algorithms that use PSOLA and operate in the time domain are collectively called TD-PSOLA, and those that operate in the frequency domain are collectively called FD-PSOLA [7].

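PSOLA can be used to modify the prosody of a speech signal.
See also

 Time stretching / pitch shifting

 Overlap-add method

 Concatenative speech synthesis

 Phase vocoder (analysis with a fixed segment length / STFT)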

References

 Eric Moulines; Francis Charpentier (December 1990), “Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones”, Speech Communication 9: 453–467, doi:10.1016/0167-6393(90)90021-Z

Eric Moulines; Jean Laroche (February 1995), “Non-parametric techniques for pitch-scale and time-scale modification of speech”, Speech Communication 16 (2), doi:10.1016/0167-6393(94)00054-E
^ It is not a specific algorithm but a general flow or pattern. "a pitch-synchronous overlap-add (PSOLA) approach ... we first present the common PSOLA framework" MOULINES, et al. (1990).
^ 板橋秀一 (2005), 音声工学 [Speech Engineering], 森北出版, p. 169, ISBN 9784627828117
^ "The PSOLA synthesis scheme involves the three following steps: an analysis of the original speech waveform ... modifications brought to this intermediate representation ... the synthesis of the modified signal from the modified intermediate representation" MOULINES, et al. (1990). PITCH-SYNCHRONOUS WAVEFORM PROCESSING TECHNIQUES FOR TEXT-TO-SPEECH SYNTHESIS USING DIPHONES.
^ "consists of a sequence of short-term signals xm(n)" MOULINES, et al. (1990).
^ "at a pitch-synchronous rate on the voiced portions of the signal and at a constant rate on the unvoiced portions." MOULINES, et al. (1990).
^ R. Kortekaas; A. Kohlrausch (1997), “Psychoacoustical Evaluation of the Pitch-Synchronous Overlap-and-Add Speech-Waveform Manipulation Technique Using Single-Formant Stimuli”, Journal of the Acoustical Society of America (JASA) 101 (4): 2202–2213, http://alexandria.tue.nl/repository/freearticles/622042.pdf
^ "The modifications of the speech signal are performed either in the frequency domain (FD-PSOLA) ... or directly in the time domain (TD-PSOLA)" MOULINES, et al. (1990).

External links

Changing Pitch with PSOLA for Voice Conversion (in English)

A thesis that discusses PSOLA with diagrams (PDF, in English); see page 35 (page 44 of the PDF)

Source: Wikipedia, the free encyclopedia