An EE/tra tutorial based on BG1 NPC code?

Kulyok · July 16, 2014

It seems that the latest beta BG1 NPC has the best and most convenient code when it comes to tra files and converting stuff to utf-8 for EE installations.

I was wondering if you folks would take a few minutes to write a short tutorial for us to adapt in other mods? Basically, what files to take and which code to steal.

jastey · July 16, 2014

Not having to support two sets of tra files is preferable, especially if the mod is still in progress (not to mention that maintaining two sets of tra files in different encodings is a pain in the ***). I am very much interested in this as well.

AstroBryGuy · July 17, 2014

Sounds like a good idea. That code was written by Isaya, so I'll see if Isaya wants to write it. If not, I understand it well enough to write something up.

Isaya · July 17, 2014

Actually, Kulyok, the code in Xan Friendship is not really different, it's mostly a copy from BG1 NPC. I may even use it as a base to explain how to proceed. I'll see to writing a short guide in the next few weeks.

In a discussion on the same topic on the WeiDU forum, Wisp provided a template for those who prefer shipping both sets of files in the mod. I'd rather suggest his way for most mods, notably those with only a few tra files, as it's easier to implement and does not require checking the operating system and including a script that depends on it. For that case, I could suggest an equivalent script, to be run before packing the mod, that regenerates the utf8 set of files from the original ones.

Kulyok · July 18, 2014

Thing is, we don't have utf-8 files for most of our translations, and I personally don't think I should make them myself, because I'm not familiar with most of the languages and can't quality-control them (what does this symbol mean? that one? is it good or is it gibberish?). That's why I was hoping your method would help us out by automatically creating the desired files in utf8.

Jarno Mikkola · July 18, 2014

Thing is, we don't have utf-8 files for most of our translations...

And you can't just take a file and then edit it with Notepad++ to make it into a "utf-8 without BOM" encoded file?

argent77 · July 18, 2014

Thing is, we don't have utf-8 files for most of our translations...
And you can't just take a file and then edit it with Notepad++ to make it into a "utf-8 without BOM" encoded-file ?

I think the key problem is determining the right character set of the source files. Different localizations of the games require specific encodings (in some cases they are very exotic and not part of any official standard, e.g. for Polish or Chinese). Even Notepad++ can't do it automatically in many cases.

Isaya · July 18, 2014

Thing is, we don't have utf-8 files for most of our translations, and I personally don't think I should make them myself, because I'm not familiar with most of the languages and can't quality-control(what does this symbol mean? that one? is it good or is it gibberish?) That's why I was hoping your method would help us out by automatically creating the desired files in utf8.

Actually the process to obtain the UTF8 files is the same as during installation, except that you have to do it for each language, and provide the proper original encoding name. The question of knowing the proper encoding used in BG II (not EE) depending on language is the same whether you convert the files before packing the mod or during installation. At some point, you need to determine the initial encoding.
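
As a rough illustration of converting before packing, here is a minimal Python sketch that re-encodes every tra file of one language folder to UTF-8 without BOM. The folder layout (`tra/<language>`, `tra_utf8/<language>`), the function name and the small encoding table are assumptions made for the example; this is not the BG1 NPC code.

```python
from pathlib import Path

# Assumed source encodings per language folder; adjust to your mod.
SOURCE_ENCODINGS = {"english": "cp1252", "french": "cp1252", "polish": "cp1250"}


def convert_language(src_root: Path, dst_root: Path, language: str) -> list:
    """Re-encode every .tra file of one language to UTF-8 without BOM."""
    encoding = SOURCE_ENCODINGS[language]
    written = []
    for src in sorted((src_root / language).glob("*.tra")):
        # Strict decoding raises UnicodeDecodeError on bytes the code page
        # does not define, which hints at a wrong encoding guess.
        text = src.read_bytes().decode(encoding)
        dst = dst_root / language / src.name
        dst.parent.mkdir(parents=True, exist_ok=True)
        # The "utf-8" codec never writes a BOM ("utf-8-sig" would).
        dst.write_bytes(text.encode("utf-8"))
        written.append(dst)
    return written
```

Calling `convert_language(Path("tra"), Path("tra_utf8"), "polish")` would leave the original files untouched and write the converted copies next to them. Working on bytes keeps the line endings exactly as they were.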

I know them for Western European languages (English, French, Spanish, German and Italian all use CP1252, Portuguese/Brazilian probably too), Polish (CP1250, probably for Czech too) and Russian (CP1251), but I'm not able to provide a complete list. Because of the way BG II handled fonts, making the game in Polish or Cyrillic required specific font files. I have no clue which encoding was used to make versions in Chinese or Korean, for instance (they are available in BGT).

Not all languages are available in BGEE yet; for instance, Russian is only available in the latest beta version (I should try Xan with it).

And you can't just take a file and then edit it with Notepad++ to make it into a "utf-8 without BOM" encoded-file ?

In my experience (just tried now with Notepad++ 6.51, not the latest but not old either), this doesn't work except for the encoding used by Windows on your system.

When I start typing special characters in the default CP1252 (in my country), then tell Notepad++ to convert to UTF8 without BOM, it works. However, if I open a file from the Polish translation, encoded in CP1250, each byte is still displayed as the character with the same 8-bit code in CP1252. So I see things like superscript 1, 2 or 3, or the inverted question mark (¿) of Spanish, instead of the proper Polish character. These wrong characters remain after conversion to UTF8.
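
The effect can be reproduced in a few lines of Python. This is only a sketch; the sample phrase is made up for the demonstration.

```python
# Polish text as it would be stored in a CP1250-encoded tra file.
polish = "żółta łąka"
raw = polish.encode("cp1250")

# What an editor that assumes CP1252 shows for the very same bytes.
misread = raw.decode("cp1252")

# Converting that misreading to UTF-8 bakes the wrong characters in.
wrong_utf8 = misread.encode("utf-8")

# Declaring the real source encoding gives the intended result.
right_utf8 = raw.decode("cp1250").encode("utf-8")

print(misread)                   # prints: ¿ó³ta ³¹ka
print(wrong_utf8 == right_utf8)  # prints: False
```

The bytes never change until the conversion; only the assumed code page decides whether byte 0xB3 is read as ł or as ³.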

As Argent77 said, Notepad++ has no way of guessing the encoding of 8 bit characters.

Kulyok · July 19, 2014

Mostly it's the first four, so CP1252 it probably is, plus Polish (CP1250), plus Russian.

Wisp · July 19, 2014

A thing you need to watch out for is that some translators have mixed several (typically two) character encodings in the same tra file. This was typically done in the olden days, when the text added to dialog.tlk needed to be encoded in, for example, CP1252, while the text displayed during the installation needed to be encoded in the corresponding MS-DOS code page and the modder had put the two kinds of strings into the same TRA file. Later this has become much less common, with translators instead choosing to go with English for the latter text, or simply dropping the diacritical marks from it, but it is still fairly common in the wild (I've personally unscrewed several mods, out of the handful I've done charset work on).
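
A possible way to spot such mixed files before converting them is to print every line that contains non-ASCII bytes under both candidate encodings and let a human judge which reading makes sense. The sketch below does only that; the helper name is hypothetical, and CP850 is an assumption standing in for "the corresponding MS-DOS code page" of a Western European translation.

```python
def non_ascii_lines(data: bytes, encodings=("cp1252", "cp850")):
    """Yield (line_number, {encoding: text}) for lines with bytes >= 0x80."""
    for number, line in enumerate(data.splitlines(), start=1):
        if any(byte >= 0x80 for byte in line):
            # Decode the same bytes under each candidate code page.
            yield number, {
                enc: line.decode(enc, errors="replace") for enc in encodings
            }


# Made-up sample: line 1 in the DOS code page, line 2 in CP1252, line 3 ASCII.
sample = (
    "@1 = ~Installé~\n".encode("cp850")
    + "@2 = ~Épée~\n".encode("cp1252")
    + b"@3 = ~Sword~\n"
)

for number, views in non_ascii_lines(sample):
    print(number, views)
```

This does not decide anything automatically. It only narrows the review down to the lines where the choice of code page matters.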

argent77 · July 19, 2014

I've used the following list of character sets as reference when adding the Charset feature to NearInfinity (no liability assumed):

English: CP1252
French: CP1252
German: CP1252
Italian: CP1252
Spanish: CP1252
Polish: BG1: ISO-IR-179 (a supplement of ISO-8859-13), BG2: CP1250
Czech: CP1250
Russian: CP1251
Japanese: Shift JIS, may also be (known as) CP932
Simplified Chinese: CP936
Traditional Chinese: ? (most likely some proprietary charset available in Chinese Windows versions only)
Korean: CP949
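
For anyone scripting a conversion, the list above could be written as a lookup table. The following Python sketch uses the codec names that Python's standard library accepts; the dictionary keys and the function name are hypothetical. BG1 Polish and Traditional Chinese are left out because the list gives no definite code page for them.

```python
import codecs

# Language -> Python codec name, following the reference list above.
CHARSETS = {
    "english": "cp1252",
    "french": "cp1252",
    "german": "cp1252",
    "italian": "cp1252",
    "spanish": "cp1252",
    "polish": "cp1250",   # BG2 value; the BG1 charset is not covered here
    "czech": "cp1250",
    "russian": "cp1251",
    "japanese": "cp932",
    "simplified_chinese": "cp936",
    "korean": "cp949",
}

# Fail early if any codec name is unknown to this Python installation.
for name in CHARSETS.values():
    codecs.lookup(name)


def to_utf8(data: bytes, language: str) -> bytes:
    """Decode bytes with the language's code page, re-encode as UTF-8."""
    return data.decode(CHARSETS[language]).encode("utf-8")
```

An unlisted language raises `KeyError`, which is deliberate: guessing a default code page would silently produce the wrong characters.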

Kulyok · July 19, 2014

A thing you need to watch out for is that some translators have mixed several (typically two) character encodings in the same tra file. This was typically done in the olden days, when the text added to dialog.tlk needed to be encoded in, for example, CP1252, while the text displayed during the installation needed to be encoded in the corresponding MS-DOS code page and the modder had put the two kinds of strings into the same TRA file. Later this has become much less common, with translators instead choosing to go with English for the latter text, or simply dropping the diacritical marks from it, but it is still fairly common in the wild (I've personally unscrewed several mods, out of the handful I've done charset work on).

That happened with old Russian mods, yeah. I'm watching for it in mine (thankfully easy): all those MS-DOS lines went straight to English.
