Encoding conversion between the IDE and the engine
The engine stores and processes data internally in UTF-8. Command-line C programs
take their parameters in ANSI but handle data in UTF-8. For ANSI data files, use the
-e switch to specify the encoding; the system then converts the data to UTF-8 for
internal processing. File names must be in ANSI, because the programs open and close
files through the Windows API, which accepts only ANSI or UTF-16.
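The conversion from an external encoding to UTF-8 can be illustrated with a minimal sketch. It assumes the external encoding is ISO-8859-1 (Latin-1), chosen only because its bytes map directly to Unicode code points; a real Windows ANSI code page such as Windows-1252 needs a lookup table or a system conversion routine. The function name latin1_to_utf8 is hypothetical and is not part of the engine.

```c
#include <stddef.h>

/* Hypothetical helper (not the engine's API): convert ISO-8859-1 bytes
 * to a NUL-terminated UTF-8 string.
 * Returns the number of bytes written, excluding the terminating NUL,
 * or (size_t)-1 if the output buffer is too small. */
size_t latin1_to_utf8(const unsigned char *in, size_t in_len,
                      char *out, size_t out_cap)
{
    size_t o = 0;
    for (size_t i = 0; i < in_len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII range: one byte, unchanged. Keep room for the NUL. */
            if (o + 1 >= out_cap) return (size_t)-1;
            out[o++] = (char)c;
        } else {
            /* 0x80..0xFF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx. */
            if (o + 2 >= out_cap) return (size_t)-1;
            out[o++] = (char)(0xC0 | (c >> 6));
            out[o++] = (char)(0x80 | (c & 0x3F));
        }
    }
    if (o >= out_cap) return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

Each input byte expands to at most two output bytes, so a buffer of 2 * in_len + 1 bytes is always large enough.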
Each thread has an encoding which controls the conversion between the external format and UTF-8. An XML encoding mode permits the external encoding to be determined from the XML encoding declaration.
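How an XML encoding mode might read the encoding declaration can be sketched as follows. This is an illustration under stated assumptions, not the engine's implementation: the input is a NUL-terminated, ASCII-compatible byte string (UTF-16 documents and byte order marks are not handled), and the function name xml_declared_encoding is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the engine's API): copy the encoding name from
 * an XML declaration such as <?xml version="1.0" encoding="ISO-8859-1"?>
 * into out. Returns 1 if a name was found, 0 otherwise, in which case the
 * caller would fall back to a default encoding. */
int xml_declared_encoding(const char *doc, char *out, size_t out_cap)
{
    /* The declaration must be the very first thing in the document. */
    if (strncmp(doc, "<?xml", 5) != 0) return 0;
    const char *end = strstr(doc, "?>");
    if (end == NULL) return 0;

    /* Look for the encoding pseudo-attribute inside the declaration only. */
    const char *p = strstr(doc, "encoding");
    if (p == NULL || p > end) return 0;
    p += strlen("encoding");

    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n') p++;
    if (*p != '=') return 0;
    p++;
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n') p++;

    /* The value may be quoted with either double or single quotes. */
    char quote = *p;
    if (quote != '"' && quote != '\'') return 0;
    p++;
    const char *close = strchr(p, quote);
    if (close == NULL || close > end) return 0;

    size_t len = (size_t)(close - p);
    if (len == 0 || len >= out_cap) return 0;
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

The returned name would then select the conversion applied between the external format and UTF-8.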
StdIn and StdOut are used when the IDE passes command-line parameters to the engine
and when the engine outputs data to the console or the IDE. The engine always writes
UTF-8 data to StdOut if the data originates from a message being processed. On
Windows the console code page is the ANSI code page, whereas on Linux it can be
UTF-8. A data file can be in a non-UTF-8 encoding; this is controlled by the -e
switch.