バイト順マーク - 暇つぶしWikipedia

バイト順マーク

Unicode
文字符号化スキーム
 UTF-7
UTF-8
CESU-8
UTF-16
UTF-32
UTF-EBCDIC
SCSU
Punycode (IDN/IDNA)
GB 18030
その他
UCS
マッピング
 書字方向
BOM
漢字統合
UnicodeとHTML
Unicodeと電子メール
Unicodeフォント
.mw-parser-output .hlist ul,.mw-parser-output .hlist ol{padding-left:0}.mw-parser-output .hlist li,.mw-parser-output .hlist dd,.mw-parser-output .hlist dt{margin-right:0;display:inline-block;white-space:nowrap}.mw-parser-output .hlist dt:after,.mw-parser-output .hlist dd:after,.mw-parser-output .hlist li:after{white-space:normal}.mw-parser-output .hlist li:after,.mw-parser-output .hlist dd:after{content:" ・\a0 ";font-weight:bold}.mw-parser-output .hlist dt:after{content:": "}.mw-parser-output .hlist-pipe dd:after,.mw-parser-output .hlist-pipe li:after{content:" |\a0 ";font-weight:normal}.mw-parser-output .hlist-hyphen dd:after,.mw-parser-output .hlist-hyphen li:after{content:" -\a0 ";font-weight:normal}.mw-parser-output .hlist-comma dd:after,.mw-parser-output .hlist-comma li:after{content:"、";font-weight:normal}.mw-parser-output .hlist-slash dd:after,.mw-parser-output .hlist-slash li:after{content:" /\a0 ";font-weight:normal}.mw-parser-output .hlist dd:last-child:after,.mw-parser-output .hlist dt:last-child:after,.mw-parser-output .hlist li:last-child:after{content:none}.mw-parser-output .hlist dd dd:first-child:before,.mw-parser-output .hlist dd dt:first-child:before,.mw-parser-output .hlist dd li:first-child:before,.mw-parser-output .hlist dt dd:first-child:before,.mw-parser-output .hlist dt dt:first-child:before,.mw-parser-output .hlist dt li:first-child:before,.mw-parser-output .hlist li dd:first-child:before,.mw-parser-output .hlist li dt:first-child:before,.mw-parser-output .hlist li li:first-child:before{content:" (";font-weight:normal}.mw-parser-output .hlist dd dd:last-child:after,.mw-parser-output .hlist dd dt:last-child:after,.mw-parser-output .hlist dd li:last-child:after,.mw-parser-output .hlist dt dd:last-child:after,.mw-parser-output .hlist dt dt:last-child:after,.mw-parser-output .hlist dt li:last-child:after,.mw-parser-output .hlist li dd:last-child:after,.mw-parser-output .hlist li dt:last-child:after,.mw-parser-output .hlist li li:last-child:after{content:")\a0 ";font-weight:normal}.mw-parser-output .hlist ol{counter-reset:listitem}.mw-parser-output .hlist ol>li{counter-increment:listitem}.mw-parser-output .hlist ol>li:before{content:" "counter(listitem)" ";white-space:nowrap}.mw-parser-output .hlist dd ol>li:first-child:before,.mw-parser-output .hlist dt ol>li:first-child:before,.mw-parser-output .hlist li ol>li:first-child:before{content:" ("counter(listitem)" "}.mw-parser-output .navbar{display:inline;font-size:75%;font-weight:normal}.mw-parser-output .navbar-collapse{float:left;text-align:left}.mw-parser-output .navbar-boxtext{word-spacing:0}.mw-parser-output .navbar ul{display:inline-block;white-space:nowrap;line-height:inherit}.mw-parser-output .navbar-brackets::before{margin-right:-0.125em;content:"[ "}.mw-parser-output .navbar-brackets::after{margin-left:-0.125em;content:" ]"}.mw-parser-output .navbar li{word-spacing:-0.125em}.mw-parser-output .navbar-mini abbr{font-variant:small-caps;border-bottom:none;text-decoration:none;cursor:inherit}.mw-parser-output .navbar-ct-full{font-size:114%;margin:0 7em}.mw-parser-output .navbar-ct-mini{font-size:114%;margin:0 4em}.mw-parser-output .infobox .navbar{font-size:88%}.mw-parser-output .navbox .navbar{display:block;font-size:88%}.mw-parser-output .navbox-title .navbar{float:left;text-align:left;margin-right:0.5em}

表

話

編

歴

バイト順マーク (バイトじゅんマーク、英: byte order mark) 、バイトオーダーマークあるいはBOM（ボム）は、Unicodeの符号化形式で符号化したテキストの先頭につける数バイトのデータ。元にUnicodeで符号化されていることおよび符号化の種類の判別に使用する。
概要

プログラムがテキストデータを読み込む時、その先頭の数バイトからそのデータがUnicodeで表現されていること、また符号化形式（エンコーディング）としてどれを使用しているかを判別できるようにしたものである。[1]
経緯

Unicodeが開発された当初は、アメリカでは ASCII、ヨーロッパなどではISO-8859、日本ではShift_JISやEUC-JPといった他の文字コードが主流であり、使用されている符号化方式がUnicodeのものであることを明示する必要があった。また、Unicodeの符号化方式は複数あり、特にUTF-16やUTF-32にはそれぞれエンディアンが異なる2種類があるため、符号化方式同士を区別する必要があった。その方法として、先頭のデータにテキスト以外のデータを入れることが発案された。
使用するべきか否か

UTF-8は文字コードとしてASCIIを前提としたプログラムでもおよそ支障なく動作するように設計されているが、BOMによって正常に処理できなくなる場合がある。Unicodeの規格において、UTF-8においてBOMは容認されるが、必須でも勧められるものでもないとされている [4]。また、データベースやメモリにロードするデータなど、内部的なデータ形式では、プログラムの性能や効率の観点から普通BOMは用いられない。

BOMによってUnicodeのテキストデータが他のUnicode符号化形式や、BOMのバイト表現（UTF-7を除く）に符号位置に該当する文字のない日本語の文字コードから正確に区別をすることができる一方で、0xFEに"t"、0xFFに"y"が割り当てられているISO/IEC 8859-1に対しては、この2文字が先頭にくる文章を誤ってUnicodeと判断してしまう問題がある。
各符号化形式（符号化スキーム）ごとのバイト順マーク

符号化形式（符号化スキーム）エンディアンの区別バイト順マーク (BOM)
UTF-80xEF 0xBB 0xBF
UTF-16BE0xFE 0xFF
LE0xFF 0xFE
UTF-16BE（付加は認められない）
UTF-16LE（付加は認められない）
UTF-32BE0x00 0x00 0xFE 0xFF
LE0xFF 0xFE 0x00 0x00
UTF-32BE（付加は認められない）
UTF-32LE（付加は認められない）
UTF-70x2B 0x2F 0x76 ※ （※は次のバイトの値によって異なり、0x38、0x39、0x2B、0x2Fのいずれかがくる）

関連項目

ゼロ幅ノーブレークスペース - U+FEFF の文字
 脚注 ^ “ ⇒Unicode FAQ”. 2012年7月25日閲覧。
^ “ ⇒RFC [https://datatracker.ietf.org/doc/html/rfc3023 3023 - XML Media Types]”. 2012年7月25日閲覧。
^ 8.1. Character Encoding - STD 90 - The JavaScript Object Notation (JSON) Data Interchange Format
^ the Unicode Consortium, Julie D. Allen (2007). ⇒The Unicode Standard -- Version 5.0. p. 36. ISBN 0-321-48091-0. ⇒http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf. "(from Chapter 2:General Structure) Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature"

Size:28 KB
出典: フリー百科事典『ウィキペディア（Wikipedia）』
担当:undef