Unicode#
The default is "utf-8". The value should be a charset registered with the Internet Assigned Numbers Authority (IANA).
"ASCII" (or "ANSI_X3.4-1968") is a 7-bit character set.
It is sometimes confused with IBM850, which is also called "DOS ASCII". Java cannot map characters from 8-bit or multibyte encodings (UTF-8, Latin-1, Latin-2, ...) into 7-bit ASCII.
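The limitation is not specific to Java. As a minimal sketch in Python (chosen here only for illustration; the sample string is made up), encoding text that contains non-ASCII characters to 7-bit ASCII either fails or loses information:

```python
text = "Größe Ω"  # hypothetical sample containing non-ASCII characters

# Strict encoding refuses any character outside the 7-bit range.
try:
    text.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With errors="replace", each unmappable character becomes "?".
lossy = text.encode("ascii", errors="replace")

print(strict_ok)  # False
print(lossy)      # b'Gr??e ?'
```

The replaced bytes cannot be converted back, so a lossy conversion like this is only safe when the original text is kept elsewhere.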
MSDN Windows Codepages | Wikipedia Windows Codepages
Using Unicode#
- The font must contain the character (e.g. "Arial Unicode MS")
- Word: type the hexadecimal code and press <Alt>+C (<Alt>+X in English versions of Word)
- In general: Alt + decimal code (e.g. "937" = Ω)
For the hexadecimal code of a character, see UTF-8 Codepage.
Enter Unicode | Enter unicode2
Sample characters: 缃考純৴৳۞ă
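The relationship between a character, its decimal code, and its hexadecimal code can be checked with a few lines of Python. This is only a sketch; the helper name `code_point` is made up for illustration.

```python
def code_point(ch: str) -> str:
    """Return the code point of a single character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"

# Decimal 937 and hexadecimal 03A9 name the same code point.
omega = chr(937)
print(omega, code_point(omega))  # Ω U+03A9

# Hexadecimal code point and UTF-8 byte sequence of each sample character.
for ch in "缃考純৴৳۞ă":
    print(ch, code_point(ch), ch.encode("utf-8").hex(" "))
```

The loop prints one line per character, which is a quick way to find the hexadecimal code needed for the Word shortcut.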
ABAP:
    " Create a converter that decodes the UTF-8 bytes in iv_uv_xml
    DATA: in1 TYPE REF TO cl_abap_conv_in_ce.
    in1 = cl_abap_conv_in_ce=>create( encoding = 'UTF-8' input = iv_uv_xml ).
    " Read the decoded text into uv_xml
    in1->read( IMPORTING data = uv_xml ).
Big / Little Endian#
| 44 00 69 00 65 00 | D i e | UTF-16LE / UCS-2LE | Little Endian | BOM at start of file = FF FE |
| 00 44 00 69 00 65 | D i e | UTF-16BE / UCS-2BE | Big Endian | BOM at start of file = FE FF |
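The two byte sequences in the table can be reproduced with a short Python sketch. The variable names are illustrative only; the codec names `utf-16-le` and `utf-16-be` come from Python's standard library.

```python
import codecs

text = "Die"

little = text.encode("utf-16-le")  # low-order byte first, no BOM written
big = text.encode("utf-16-be")     # high-order byte first, no BOM written

print(little.hex(" "))  # 44 00 69 00 65 00
print(big.hex(" "))     # 00 44 00 69 00 65

# With a BOM in front, the generic "utf-16" decoder picks the byte order itself.
from_le = (codecs.BOM_UTF16_LE + little).decode("utf-16")
from_be = (codecs.BOM_UTF16_BE + big).decode("utf-16")
print(from_le, from_be)  # Die Die
```

The explicit `-le` and `-be` codecs neither write nor strip a BOM, which is why it is prepended by hand here.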
Byte Order Mark (BOM)#
| UTF-8 | EF BB BF |
| UTF-16 (BE) | FE FF |
| UTF-16 (LE) | FF FE |
| UTF-32 (BE) | 00 00 FE FF |
| UTF-32 (LE) | FF FE 00 00 |
| UTF-7 | 2B 2F 76, followed by one of: 38, 39, 2B, 2F |
| UTF-1 | F7 64 4C |
| UTF-EBCDIC | DD 73 66 73 |
| SCSU | 0E FE FF |
| BOCU-1 | FB EE 28, optionally followed by FF |
| GB 18030 | 84 31 95 33 |
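A table like this is typically used to guess a file's encoding from its first bytes. The following Python sketch covers only the UTF-8, UTF-16 and UTF-32 rows; the function name `detect_bom` and the list `BOMS` are made up for illustration, while the `codecs.BOM_*` constants are part of the standard library.

```python
import codecs
from typing import Optional

# Longer signatures first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the UTF-16 LE BOM (FF FE), so it has to be tested before it.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding suggested by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

print(detect_bom(b"\xef\xbb\xbfDie"))   # utf-8
print(detect_bom(b"\xff\xfeD\x00i\x00"))  # utf-16-le
print(detect_bom(b"Die"))               # None
```

The result is a hint rather than a proof: a UTF-16 LE file whose first character is U+0000 also starts with FF FE 00 00, and a file without a BOM can still be in any of these encodings.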