Diffstat (limited to 'doc')
-rw-r--r-- doc/README.developer | 16
1 files changed, 11 insertions, 5 deletions
diff --git a/doc/README.developer b/doc/README.developer
index f371783dd1..57aaed46f6 100644
--- a/doc/README.developer
+++ b/doc/README.developer
@@ -2377,15 +2377,21 @@ order.
For string fields, the encoding specifies the character set used for the
string and the way individual code points in that character set are
encoded. For FT_UINT_STRING fields, the byte order of the count must be
-specified; when support for UTF-16 encoding is added, the byte order of
-the encoding will also have to be specified. In other cases, ENC_NA
-should be used. The character encodings that are currently
-supported are:
+specified; for UCS-2 and UTF-16, the byte order of the encoding must be
+specified (for counted UCS-2 and UTF-16 strings, the byte order of the
+count and the 16-bit values in the string must be the same). In other
+cases, ENC_NA should be used. The character encodings that are
+currently supported are:
- ENC_UTF_8 - UTF-8
ENC_ASCII - ASCII (currently treated as UTF-8; in the future,
all bytes with the 8th bit set will be treated as
errors)
+ ENC_UTF_8 - UTF-8
+ ENC_UCS_2 - UCS-2
+ ENC_UTF_16 - UTF-16 (currently treated as UCS-2; in the future,
+ surrogate pairs will be handled, and non-valid 16-bit
+ code points and surrogate pairs will be treated as
+ errors)
ENC_EBCDIC - EBCDIC
Other encodings will be added in the future.
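
The rule added above for counted UCS-2 and UTF-16 strings (the count and
the 16-bit values in the string share one byte order) can be illustrated
with a small standalone sketch. It is not Wireshark code: byte_order_t,
read_u16() and decode_counted_ucs2() are hypothetical names invented for
this example, and it assumes a 2-byte count that gives the string length
in bytes. In a dissector, the byte order would instead be OR'ed into the
encoding argument (for example ENC_UTF_16|ENC_LITTLE_ENDIAN).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a byte-order flag; not a Wireshark type. */
typedef enum { ORDER_BIG_ENDIAN, ORDER_LITTLE_ENDIAN } byte_order_t;

/* Read one 16-bit value from two bytes in the given byte order. */
uint16_t read_u16(const uint8_t *p, byte_order_t order)
{
    return order == ORDER_BIG_ENDIAN
        ? (uint16_t)((p[0] << 8) | p[1])
        : (uint16_t)((p[1] << 8) | p[0]);
}

/*
 * Decode a counted UCS-2 string: a 2-byte count (assumed here to be a
 * byte count) followed by that many bytes of 16-bit code units.  The
 * same byte order is applied to the count and to every code unit.
 * Returns the number of code units stored in out, or -1 if the data
 * is truncated, the count is odd, or out is too small.
 */
int decode_counted_ucs2(const uint8_t *buf, size_t buflen,
                        byte_order_t order,
                        uint16_t *out, size_t outcap)
{
    size_t nbytes, nunits, i;

    if (buflen < 2)
        return -1;
    nbytes = read_u16(buf, order);
    if (nbytes % 2 != 0 || nbytes > buflen - 2)
        return -1;
    nunits = nbytes / 2;
    if (nunits > outcap)
        return -1;
    for (i = 0; i < nunits; i++)
        out[i] = read_u16(buf + 2 + 2 * i, order);
    return (int)nunits;
}
```

Decoding a little-endian buffer with the big-endian order misreads the
count as well as the characters, which is why a single byte order has to
cover both.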