Author Topic: Correct way to handle charsets for console output?  (Read 1276 times)

Offline Ineth

  • Planewalker
  • *****
  • Posts: 4
Correct way to handle charsets for console output?
« on: January 14, 2016, 11:22:38 AM »
Here's the situation (if I'm not mistaken):

  • The original games and the Windows console expect text to be encoded in old-fashioned Windows codepages.
  • The EE games and the Mac and Linux consoles expect text to be encoded in UTF8.

HANDLE_CHARSETS can take care of the discrepancy between the games themselves.
But what about console output, e.g. translated component names that are shown to the user during interactive mod installation?

Before the EEs came along, strings would just be printed to standard output using the Windows encoding, and Mac and Linux users were left to fend for themselves. Not a great solution, but since Mac/Linux users are a minority and the games didn't officially support those platforms anyway, no one complained too much.

But now things are more complicated... Not just because the EEs upgraded Mac and Linux to officially supported platforms, but also because porting old mods to EE can now cause encoding problems for Windows users as well:

Many mods out in the wild use a combined "setup.tra" (or similar) containing both the component names seen during installation, and item descriptions etc. which will end up in the game's dialog.tlk.
So if I (or the BigWorld/EET guys etc.) want to patch such abandoned mods to make them EE-compatible, I have to add HANDLE_CHARSETS to reload that TRA file, because otherwise the game might misbehave or even crash for non-English users.
But this conversion will also cause non-English Windows users to see scrambled strings instead of component names during installation, right?
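
To make the "scrambled strings" concrete, here is a minimal Python sketch (not WeiDU code) of what happens when text stored in one encoding is displayed by a console that expects another. The sample string, the helper name show_as, and the choice of cp1252 as the Windows codepage are assumptions for illustration; a real console may well use a different codepage.

```python
def show_as(text: str, stored_as: str, console_expects: str) -> str:
    """Encode text as the TRA file stores it, then decode it as the console would."""
    raw = text.encode(stored_as)
    # errors="replace" mimics a terminal that shows a placeholder for invalid bytes.
    return raw.decode(console_expects, errors="replace")


# Hypothetical component name from a French TRA file.
name = "Épée améliorée"

# UTF-8 file shown by a console expecting cp1252:
# each accented letter turns into two unrelated characters.
on_windows = show_as(name, "utf-8", "cp1252")

# cp1252 file shown by a terminal expecting UTF-8:
# each accented letter becomes the replacement character U+FFFD.
on_unix = show_as(name, "cp1252", "utf-8")

assert show_as("é", "utf-8", "cp1252") == "Ã©"
```

Plain ASCII text survives both directions unchanged, which is why English-only mods rarely show the problem.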

Does WeiDU provide a solution for that?

Offline The Imp

  • Planewalker
  • *****
  • Posts: 288
  • Gender: Male
Re: Correct way to handle charsets for console output?
« Reply #1 on: January 14, 2016, 02:18:24 PM »
You use two .tra files: one for the install, i.e. the strings the .tp2 references (component names etc.),
and the other for the in-game content, i.e. the strings in item descriptions, and you use the correct encoding for each.
The English in-game content's .tra file is saved with the "Encode in ANSI" option, while the file for WeiDU's internal commands uses "Encode in UTF-8 without BOM". Both are easy to set in Notepad++, for example.

Not that you actually have to if you don't use special characters; it's fine without them in most cases, as long as you aren't writing in Russian. In that case you either use the specific codepage, or save the file in the encoding that HANDLE_CHARSETS then automatically converts to the Russian codepage.
« Last Edit: January 14, 2016, 02:27:43 PM by The Imp »

Offline Argent77

  • Planewalker
  • *****
  • Posts: 187
Re: Correct way to handle charsets for console output?
« Reply #2 on: January 14, 2016, 04:15:59 PM »
Getting text console output to behave correctly on every platform can be very tricky. You have to take into account various operating systems, regional settings and sometimes even special fonts used for the console on the user's system. Your best bet is probably to limit yourself to the US-ASCII charset (i.e. only characters defined in the lower half of an ANSI charset).
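
As a rough illustration of the ASCII-only approach, the following Python sketch flags lines of a TRA file that contain characters outside US-ASCII. The function name non_ascii_lines and the sample strings are hypothetical; this is a helper one might run on install-time strings, not part of WeiDU.

```python
from typing import List, Tuple


def non_ascii_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for lines containing non-ASCII characters."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if not line.isascii()  # str.isascii() requires Python 3.7+
    ]


# Hypothetical TRA content: the first string is console-safe, the second is not.
sample = "@1 = ~Improved sword~\n@2 = ~Épée améliorée~\n"
flagged = non_ascii_lines(sample)
```

Here flagged contains only line 2, since line 1 stays within the lower half of the charset.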

Offline Ineth

  • Planewalker
  • *****
  • Posts: 4
Re: Correct way to handle charsets for console output?
« Reply #3 on: January 14, 2016, 05:01:43 PM »
Quote
You use two .tra files: one for the install, i.e. the strings the .tp2 references (component names etc.).

That's what I was afraid of. Because while that's easy to do for new mods, it makes writing patches which port old/unmaintained mods to EE much more burdensome.

Also, it still doesn't improve matters for Mac and Linux users.

Quote
Your best bet is probably to limit yourself to the US-ASCII charset (i.e. only characters defined in the lower half of an ANSI charset).

For the English translation of new mods, that's a workable solution. But not for existing mods, and not for non-English translations...

Quote
Getting text console output to behave correctly on every platform can be very tricky. You have to take into account various operating systems, regional settings and sometimes even special fonts used for the console on the user's system.

It's a tricky problem to solve, but not impossible I think. In an ideal world, weidu would figure out all those things about its environment and encode strings accordingly on output to both the console and the game files. And it wouldn't even need a clunky "LAF HANDLE_CHARSETS" to tell it to do the right thing; it would simply Do The Right Thing™ by itself...

Hey, one can dream... :D

Offline Wisp

  • Moderator
  • Planewalker
  • *****
  • Posts: 1176
Re: Correct way to handle charsets for console output?
« Reply #4 on: January 23, 2016, 12:31:49 PM »
Quote
Also, it still doesn't improve matters for Mac and Linux users.
Once you've fixed your tra file(s), it becomes fairly trivial to create UTF-8-encoded files for use on *nix (loaded by LANGUAGE with the help of the %WEIDU_OS% variable).
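
Creating the UTF-8 copies could look roughly like this in Python. It is only a sketch of the conversion step: the function names, the directory layout in the usage note, and the default of cp1252 as the source codepage are assumptions (other languages use other codepages, e.g. cp1251 for Russian), and the WeiDU side (LANGUAGE plus %WEIDU_OS%) is not shown.

```python
from pathlib import Path


def to_utf8(raw: bytes, source_codepage: str = "cp1252") -> bytes:
    """Re-encode bytes from a legacy Windows codepage to UTF-8."""
    return raw.decode(source_codepage).encode("utf-8")


def convert_tra(src: Path, dst: Path, source_codepage: str = "cp1252") -> None:
    """Write a UTF-8 copy of src to dst, creating the target directory if needed."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_bytes(to_utf8(src.read_bytes(), source_codepage))
```

For example, convert_tra(Path("mymod/tra/french/setup.tra"), Path("mymod/tra/french/utf8/setup.tra")) would leave the original file untouched and put the UTF-8 copy next to it; both paths are made up for the example.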

Quote
It's a tricky problem to solve, but not impossible I think. In an ideal world, weidu would figure out all those things about its environment and encode strings accordingly on output to both the console and the game files. And it wouldn't even need a clunky "LAF HANDLE_CHARSETS" to tell it to do the right thing; it would simply Do The Right Thing™ by itself...

Hey, one can dream... :D
The practice of distributing mods with a setup-mymod.exe, together with the terms WeiDU is distributed under, precludes the use of most third-party code due to license problems.
« Last Edit: January 23, 2016, 06:56:49 PM by Wisp »

 
