Bay 12 Games Forum

Author Topic: Accented characters in item names  (Read 1342 times)

Dirtcopter77

Accented characters in item names
« on: June 19, 2022, 02:43:24 pm »

I'm working on a raw overhaul for personal use, and part of that is making some fixes and changes to abacá trees, one of which is putting the diacritic in the name. As I understand it, raws use Unicode and DF uses CP437, so you have to translate between them to put special characters in strings. I managed to find the correct Unicode character to input to get the á in-game (or so I thought?), but for some reason an extra character (┬) appears before it. Does anyone know what's causing this "ghost" character to insert itself? I've included a picture so you can see what I'm talking about.


Ziusudra

Re: Accented characters in item names
« Reply #1 on: June 19, 2022, 03:55:49 pm »

That's because the raws are not encoded in Unicode; like the game itself, they're encoded in cp437. The ┬ appears because your file is saved as UTF-8, where any non-ASCII character such as á takes two bytes, and the game shows each byte as a separate cp437 character.

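Edit: in UTF-8, á (U+00E1) is 0xC3 0xA1, which cp437 displays as ├í. The ┬á you're seeing is 0xC2 0xA0, which is the UTF-8 encoding of U+00A0, so it looks like the character you entered was the one whose code point number matches á's position in cp437 (0xA0).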
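
For anyone who wants to check the bytes themselves, here is a minimal Python sketch using only the standard codecs. The variable names are just for illustration, and it assumes the game reads each byte of the raw file as one cp437 character.

```python
# Bytes that end up in a UTF-8 file for two different characters.
a_acute = "\u00e1".encode("utf-8")  # the letter á
nbsp = "\u00a0".encode("utf-8")     # U+00A0, no-break space

# What a cp437 reader displays for those same bytes.
a_acute_shown = a_acute.decode("cp437")
nbsp_shown = nbsp.decode("cp437")

# The single byte that cp437 itself uses for á.
a_acute_cp437 = "\u00e1".encode("cp437")

print(a_acute.hex(), a_acute_shown)
print(nbsp.hex(), nbsp_shown)
print(a_acute_cp437.hex())
```

Either way, a two-byte UTF-8 sequence turns into two cp437 glyphs, while a file saved as cp437 holds á as one byte.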
« Last Edit: June 19, 2022, 04:32:46 pm by Ziusudra »
Ironblood didn't use an axe because he needed it. He used it to be kind. And right now he wasn't being kind.

Dirtcopter77

Re: Accented characters in item names
« Reply #2 on: June 19, 2022, 05:38:18 pm »

I realized I could just copy-paste the á from ngalák in dwarf.txt, and that worked, although it does feel a bit cheaty. Thank you for the clarification, though. The raw file I made was still using the default UTF-8 encoding, and that was part of the problem.
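
An alternative to copy-pasting would be to re-save the whole file as cp437. The sketch below is one way to do that in Python; the function names and the file name in the usage note are hypothetical, and it assumes the file is valid UTF-8 and contains only characters that exist in cp437 (anything else raises an error instead of being silently replaced).

```python
from pathlib import Path


def utf8_to_cp437(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as cp437 bytes."""
    # "utf-8-sig" also accepts files that start with a byte order mark.
    text = data.decode("utf-8-sig")
    # Strict encoding: a character with no cp437 equivalent raises
    # UnicodeEncodeError rather than being dropped or replaced.
    return text.encode("cp437")


def reencode_raw(path: str) -> None:
    """Rewrite one raw file in place as cp437 (hypothetical helper)."""
    raw_file = Path(path)
    raw_file.write_bytes(utf8_to_cp437(raw_file.read_bytes()))
```

Calling something like `reencode_raw("plant_custom.txt")` on a copy of the file first is the safer route, since the rewrite is done in place.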