Unicode#

The default is "utf-8". The value should be a charset registered with the Internet Assigned Numbers Authority (IANA). "ASCII" (or "ANSI_X3.4-1968") is a 7-bit character set.

ASCII is sometimes confused with IBM850, which is also called "DOS ASCII". Java cannot map 8-bit or multibyte characters (UTF-8, Latin-1, Latin-2, ...) into 7-bit ASCII.
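The limitation is not specific to Java: any encoder has to reject or substitute characters that have no 7-bit ASCII code. A minimal Python sketch of both behaviours (the helper name `to_ascii` and the sample strings are made up for illustration):

```python
def to_ascii(text: str, errors: str = "strict") -> bytes:
    """Encode text as 7-bit ASCII; `errors` selects how unmappable characters are handled."""
    return text.encode("ascii", errors=errors)


# Pure ASCII text maps one character to one byte.
plain = to_ascii("Die")

# "ä" (U+00E4) has no ASCII code: strict mode raises an error ...
try:
    to_ascii("B\u00e4r")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# ... while "replace" substitutes a question mark for the unmappable character.
replaced = to_ascii("B\u00e4r", errors="replace")
```

Silent substitution loses information, so strict mode is usually the safer choice when the target really has to be ASCII.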

MSDN Windows Codepages | Wikipedia Windows Codepages

Using Unicode#

  1. The font must contain the character (e.g. "Arial Unicode MS")
  2. Word: type the hexadecimal code and press <Alt>+C (German Word; English versions use <Alt>+X)
  3. In general: Alt + decimal code (e.g. "937" = Ω)

For the hexadecimal codes of the characters, see UTF-8 Codepage
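The decimal and hexadecimal codes used for keyboard input are two notations for the same code point. A small Python sketch for converting between them (the helper name `code_point` is hypothetical):

```python
from typing import Tuple


def code_point(ch: str) -> Tuple[int, str]:
    """Return the decimal code point of one character and its 4-digit hex form."""
    n = ord(ch)
    return n, f"{n:04X}"


# Greek capital omega, written as an escape so the source stays ASCII.
omega_dec, omega_hex = code_point("\u03a9")

# The reverse direction: decimal code to character.
omega = chr(937)
```

Here `omega_dec` is 937 and `omega_hex` is "03A9", which are the decimal code and the hexadecimal code of the same character.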

Enter Unicode | Enter unicode2

Sample characters: 缃考純৴৳۞ă

ABAP:

  " Convert UTF-8 encoded binary data (iv_uv_xml) into a character string (uv_xml).
  DATA: in1   TYPE REF TO cl_abap_conv_in_ce.
  in1 = cl_abap_conv_in_ce=>create( encoding = 'UTF-8' input = iv_uv_xml ).
  in1->read( IMPORTING data = uv_xml ).
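For comparison, the same kind of conversion (UTF-8 encoded bytes to text) as a Python sketch. The sample bytes are an assumption standing in for the binary input; they are not taken from the ABAP snippet.

```python
# Stand-in for the binary input: a small XML fragment encoded as UTF-8.
raw = "<name>B\u00e4r</name>".encode("utf-8")

# Decode the bytes back into text; invalid UTF-8 raises UnicodeDecodeError.
text = raw.decode("utf-8")

# "ä" takes two bytes in UTF-8, so the byte count exceeds the character count.
extra_bytes = len(raw) - len(text)
```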

Big / Little Endian#

Bytes | Text | Encoding | Byte order | BOM at start of file
44 00 69 00 65 00 | D i e | UTF-16LE / UCS-2LE | Little Endian | FF FE
00 44 00 69 00 65 | D i e | UTF-16BE / UCS-2BE | Big Endian | FE FF
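The two byte sequences for "Die" can be reproduced with a short Python sketch (the helper name `hex_bytes` is made up for illustration):

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


# Same text, two byte orders: the low byte comes first in LE, last in BE.
little = hex_bytes("Die".encode("utf-16-le"))
big = hex_bytes("Die".encode("utf-16-be"))

# With a leading BOM, the generic "utf-16" decoder picks the byte order itself
# and drops the BOM from the result.
from_le = (b"\xff\xfe" + "Die".encode("utf-16-le")).decode("utf-16")
from_be = (b"\xfe\xff" + "Die".encode("utf-16-be")).decode("utf-16")
```

Both `from_le` and `from_be` come back as "Die", which is the point of writing the BOM at the start of the file.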

Byte Order Mark (BOM)#

UTF-8 EF BB BF
UTF-16 (BE) FE FF
UTF-16 (LE) FF FE
UTF-32 (BE) 00 00 FE FF
UTF-32 (LE) FF FE 00 00
UTF-7 2B 2F 76, followed by one byte from: [38 | 39 | 2B | 2F]
UTF-1 F7 64 4C
UTF-EBCDIC DD 73 66 73
SCSU 0E FE FF
BOCU-1 FB EE 28 optionally followed by FF
GB 18030 84 31 95 33
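A minimal Python sketch of BOM sniffing for the UTF-8, UTF-16 and UTF-32 rows of the list above. The function name `detect_bom` is hypothetical, and the other encodings are left out for brevity. The BOM constants come from the standard `codecs` module.

```python
import codecs
from typing import Optional

# Check the 4-byte signatures first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

This is a heuristic: UTF-16 LE text whose first character is U+0000 would be reported as UTF-32 LE, and a missing BOM says nothing about the actual encoding.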