Encoding conversion between the IDE and the engine

The engine stores and processes data internally in UTF-8. The command-line C programs take their parameters in ANSI but handle data in UTF-8. For ANSI data files, use the -e switch to specify the encoding; the system then converts the data to UTF-8 for internal processing. Because these programs open and close files through the Windows API, which accepts only ANSI or UTF-16, file names must be passed in ANSI.
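
The conversion that the -e switch triggers can be pictured with a minimal sketch. This is Python for illustration only, not the engine's C code; the function name to_internal_utf8 is hypothetical, and cp1252 is just one example of an ANSI codepage.

```python
import codecs


def to_internal_utf8(raw: bytes, external_encoding: str) -> bytes:
    """Transcode bytes from an external (e.g. ANSI) encoding to UTF-8.

    Hypothetical helper: it stands in for whatever the engine does
    once an encoding has been named with the -e switch.
    """
    # Fail early with LookupError if the encoding name is unknown.
    codecs.lookup(external_encoding)
    # Decode to Unicode text; invalid bytes raise UnicodeDecodeError.
    text = raw.decode(external_encoding)
    # Re-encode in the internal format.
    return text.encode("utf-8")


# Example: one accented character, stored as a single byte in cp1252.
ansi_bytes = "caf\u00e9".encode("cp1252")
utf8_bytes = to_internal_utf8(ansi_bytes, "cp1252")
```

Note that the single ANSI byte for the accented letter becomes two bytes in UTF-8, so byte lengths and offsets measured on the external file do not carry over to the internal data.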

Each thread has an encoding setting that controls the conversion between the external format and UTF-8. An XML encoding mode lets the external encoding be determined from the XML encoding declaration (for example, <?xml version="1.0" encoding="ISO-8859-1"?>).
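
As a rough illustration of what an XML encoding mode has to do, the sketch below reads the encoding name from the XML declaration and uses it to convert the document to UTF-8. It is Python for illustration, not the engine's implementation; sniff_xml_encoding is a hypothetical name, and the sketch assumes an ASCII-compatible external encoding (it does not handle UTF-16 input).

```python
import re

# Matches an XML declaration at the very start of the document and
# captures the encoding name when one is declared.
_XML_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'(?:\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\'])?'
)


def sniff_xml_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the declared encoding in lower case, or the default."""
    # Skip a UTF-8 byte order mark if one is present.
    if raw.startswith(b"\xef\xbb\xbf"):
        raw = raw[3:]
    match = _XML_DECL.match(raw)
    if match and match.group(1):
        return match.group(1).decode("ascii").lower()
    return default


# Example: a Latin-1 document that declares its own encoding.
doc = '<?xml version="1.0" encoding="ISO-8859-1"?><name>Jos\u00e9</name>'.encode(
    "iso-8859-1"
)
encoding = sniff_xml_encoding(doc)
internal = doc.decode(encoding).encode("utf-8")
```

Falling back to UTF-8 when no encoding is declared follows the XML 1.0 default for documents without a byte order mark.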

StdIn and StdOut are used when the IDE passes the command-line parameters to the engine or when the engine outputs data to the console/IDE. The engine always writes UTF-8 to StdOut when the data originates from a message being processed. On Windows the console codepage is the ANSI codepage, whereas on Linux it can be UTF-8. A data file can be in a non-UTF-8 encoding; this is controlled by the -e switch.
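
One practical consequence is that whatever reads the engine's StdOut should decode it as UTF-8 rather than with the console's ANSI codepage. The sketch below is Python for illustration; decode_engine_output is a hypothetical consumer-side helper, and cp1252 stands in for a Windows ANSI codepage.

```python
def decode_engine_output(raw: bytes) -> str:
    """Decode bytes captured from the engine's StdOut as UTF-8.

    Invalid sequences are replaced rather than raising, so that a
    damaged message does not abort the reader.
    """
    return raw.decode("utf-8", errors="replace")


# Bytes as the engine would emit them for one accented character.
engine_bytes = "\u00e9".encode("utf-8")

# Decoding as UTF-8 recovers the original character.
correct = decode_engine_output(engine_bytes)

# Decoding the same bytes with an ANSI codepage yields two wrong
# characters instead of one right one.
mojibake = engine_bytes.decode("cp1252")
```

This is why UTF-8 output can appear garbled in a Windows console that is still set to the ANSI codepage, even though the bytes the engine wrote are correct.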