Unicode#
The default is "utf-8". The value should be a charset registered with the Internet Assigned Numbers Authority (IANA). "ASCII" (registered as "US-ASCII", alias "ANSI_X3.4-1968") is a 7-bit character set.
It is sometimes confused with IBM850, which is also called "DOS ASCII". Java cannot map 8-bit or multibyte characters (UTF-8, Latin-1, Latin-2, ...) into 7-bit ASCII.
MSDN Windows Codepages | Wikipedia Windows Codepages
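The 7-bit limit described above can be sketched in a few lines. The example uses Python rather than Java, so it illustrates the character-set limitation itself and not Java's exact behaviour. The sample word "Größe" and the variable names are assumptions chosen for illustration.

```python
text = "Größe"  # contains two characters outside the 7-bit ASCII range

# Strict encoding refuses anything that does not fit into 7 bits.
try:
    text.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With errors="replace" every unmappable character becomes "?".
replaced = text.encode("ascii", errors="replace")

# Byte 0x84 is outside ASCII, but IBM850 ("DOS ASCII") maps it to "ä".
dos_char = bytes([0x84]).decode("cp850")

print(strict_ok, replaced, dos_char)
```

The last line shows why the two are easily confused: IBM850 agrees with ASCII on the lower 128 values and only differs in the upper half, which ASCII does not define at all.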
Using Unicode#
- The font must contain the character (e.g. "Arial Unicode MS")
- Word: type the hexadecimal code and press <Alt>+C (<Alt>+X in English versions of Word)
- In general: Alt + decimal code (e.g. "937" = Ω)
For the hexadecimal codes of the characters, see UTF-8 Codepage
Enter Unicode | Enter unicode2
Sample characters: 缃考純৴৳۞ă
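The relation between the decimal code used with Alt, the hexadecimal code used in Word, and the UTF-8 bytes can be checked with a small helper. This is a minimal sketch; `describe` is a hypothetical name, and `bytes.hex` with a separator needs Python 3.8 or later.

```python
def describe(ch):
    """Return (decimal code point, hex code point, UTF-8 bytes as hex)."""
    code = ord(ch)
    return code, f"U+{code:04X}", ch.encode("utf-8").hex(" ").upper()


# Decimal 937 is the code point of the Greek capital omega.
print(chr(937))        # Ω
print(describe("Ω"))   # (937, 'U+03A9', 'CE A9')
print(describe("ă"))   # (259, 'U+0103', 'C4 83')
```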
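ABAP:
DATA: in1 TYPE REF TO cl_abap_conv_in_ce.
" Create a converter that interprets the byte data in iv_uv_xml as UTF-8
in1 = cl_abap_conv_in_ce=>create( encoding = 'UTF-8' input = iv_uv_xml ).
" Read the converted characters into uv_xml
in1->read( IMPORTING data = uv_xml ).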
Big / Little Endian#
Bytes | Text | Encoding | Byte order | BOM at start of file |
44 00 69 00 65 00 | D i e | UTF-16LE / UCS-2LE | Little Endian | FF FE |
00 44 00 69 00 65 | D i e | UTF-16BE / UCS-2BE | Big Endian | FE FF |
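The two byte sequences for "Die" can be reproduced with Python's standard codecs. This is a small sketch; the variable names are illustrative only.

```python
import codecs

text = "Die"

# Explicit byte order: no BOM is written by these codecs.
le = text.encode("utf-16-le")
be = text.encode("utf-16-be")

print(le.hex(" ").upper())  # 44 00 69 00 65 00
print(be.hex(" ").upper())  # 00 44 00 69 00 65

# Prepending the BOM lets the generic "utf-16" decoder pick the byte order.
with_bom_le = codecs.BOM_UTF16_LE + le
decoded = with_bom_le.decode("utf-16")
print(decoded)  # Die
```

When the BOM is present, the reader does not need to know the byte order in advance. Without it, the encoding has to be named explicitly as UTF-16LE or UTF-16BE.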
Byte Order Mark (BOM)#
UTF-8 | EF BB BF |
UTF-16 (BE) | FE FF |
UTF-16 (LE) | FF FE |
UTF-32 (BE) | 00 00 FE FF |
UTF-32 (LE) | FF FE 00 00 |
UTF-7 | 2B 2F 76, followed by one byte from: 38, 39, 2B, 2F |
UTF-1 | F7 64 4C |
UTF-EBCDIC | DD 73 66 73 |
SCSU | 0E FE FF |
BOCU-1 | FB EE 28, optionally followed by FF |
GB 18030 | 84 31 95 33 |
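The table can be turned into a simple BOM sniffer. The following is a minimal sketch under stated assumptions: `BOMS` and `detect_bom` are hypothetical names, UTF-7 is matched on its three fixed bytes only, and BOCU-1 is matched without the optional trailing FF. Longer signatures are tested first, because the UTF-32 (LE) mark begins with the UTF-16 (LE) mark.

```python
# Signatures copied from the table above (hex notation).
BOMS = [
    ("UTF-8", bytes.fromhex("EFBBBF")),
    ("UTF-16 (BE)", bytes.fromhex("FEFF")),
    ("UTF-16 (LE)", bytes.fromhex("FFFE")),
    ("UTF-32 (BE)", bytes.fromhex("0000FEFF")),
    ("UTF-32 (LE)", bytes.fromhex("FFFE0000")),
    ("UTF-7", bytes.fromhex("2B2F76")),       # fixed part only (assumption)
    ("UTF-1", bytes.fromhex("F7644C")),
    ("UTF-EBCDIC", bytes.fromhex("DD736673")),
    ("SCSU", bytes.fromhex("0EFEFF")),
    ("BOCU-1", bytes.fromhex("FBEE28")),      # optional FF ignored (assumption)
    ("GB 18030", bytes.fromhex("84319533")),
]


def detect_bom(data):
    """Return the encoding name whose BOM starts `data`, or None."""
    # Check 4-byte marks before 3-byte and 2-byte marks.
    for name, mark in sorted(BOMS, key=lambda e: len(e[1]), reverse=True):
        if data.startswith(mark):
            return name
    return None


print(detect_bom(b"\xef\xbb\xbfHello"))   # UTF-8
print(detect_bom(b"\xff\xfeD\x00i\x00"))  # UTF-16 (LE)
print(detect_bom(b"\xff\xfe\x00\x00"))    # UTF-32 (LE)
print(detect_bom(b"plain text"))          # None
```

A result is only a hint: FF FE 00 00 could also be a UTF-16 (LE) file that starts with a NUL character, and a file without a BOM may still use any of these encodings.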