Author Topic: GET_FILE_ARRAY and ACTION_BASH_FOR choke on filenames with special characters  (Read 2739 times)

Offline pro5

  • Planewalker
  • *****
  • Posts: 68
Such as those currently used as filename prefixes in Infinity Animations (¢A¢E1A1.BAM, £aag2e.bam, Øcpg13.bam, etc.)

This:
Code:
GET_FILE_ARRAY ~FooFiles1~ ~p5test/anim/src~ ~.*~

ACTION_PHP_EACH FooFiles1 AS String => Filename BEGIN
PRINT ~%Filename%~
END
works as expected for "normal" filenames, but with IA files it produces errors like this:
Quote
ERROR: Unix.Unix_error(20, "stat", "p5test/anim/src/?A?E1A1.BAM")
« Last Edit: March 01, 2014, 02:09:53 PM by pro5 »

Offline Wisp

  • Moderator
  • Planewalker
  • *****
  • Posts: 1176
What OS are you running this on?

Offline pro5

Windows 7 x64 SP1, codepage 1251.
« Last Edit: February 23, 2014, 05:44:23 AM by pro5 »

Offline Wisp

Are you sure the filenames themselves are encoded in CP1251? The code works on my CP1252 Windows 7 machine and, AFAIK, WeiDU doesn't do anything that would cause it to trip on non-ASCII characters on Windows (Linux is a different matter).

Offline pro5

The error happens with filenames containing characters that are not present in the character table of the current code page (CP1251 in my case).

For example, the file µbrghe.bam is processed without error (the symbol µ is in the table, with the same code as in CP1252), while Æaag31e.bam produces the described error, as there's no Æ symbol in CP1251. So in theory, you should be able to replicate this error by renaming a file to include any symbol not in CP1252.

Filenames in Windows are encoded in Unicode; I can see the correct names while browsing.
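
For anyone who wants to check the µ versus Æ behaviour without touching real files, here is a minimal sketch. It assumes Python's built-in `cp1251` and `cp1252` codecs are a fair stand-in for the Windows code page tables; the helper name `encodable` is made up for illustration, and Windows' own conversion may substitute characters differently.

```python
# Check which Infinity Animations prefix characters exist in CP1251 vs CP1252.
# Assumption: Python's codecs mirror the Windows code page tables closely enough.

def encodable(ch: str, codepage: str) -> bool:
    """Return True if ch has a byte value in the given code page."""
    try:
        ch.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False

for ch in "µÆ¢£Ø":
    print(ch, "cp1252:", encodable(ch, "cp1252"), "cp1251:", encodable(ch, "cp1251"))

# Forcing an unmappable name into CP1251 substitutes '?', which resembles
# the "?A?E1A1.BAM" name in the reported error.
print("¢A¢E1A1.BAM".encode("cp1251", errors="replace"))  # b'?A?E1A1.BAM'
```

Only µ comes out as encodable in both code pages; the other prefix characters exist in CP1252 but not in CP1251.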

Offline Wisp

Quote from: pro5
Error happens with filenames containing characters not present in character table for current (CP1251 in my case) code page.
There is nothing I or WeiDU can do here, then. You'll need to badger Miloch into completing the work on ASCII-fying IA, change your local code page to CP1252 while working with these files, or rename them into something that is valid under CP1251.

Quote from: pro5
encoded in Unicode
Not on Windows, they're not. Windows does not use Unicode (rather than the sensible choice of Unicode, Windows uses a series of local charsets they collectively refer to as "ANSI". Included are CP1252, CP1251 and others).

Offline Miloch

  • Barbarian
  • Planewalker
  • *****
  • Posts: 1032
  • Gender: Male
Quote from: Wisp
There is nothing I or WeiDU can do here, then. You'll need to badger Miloch into completing the work on ASCII-fying IA
Yeah, he's already done that. Would that badgering me could also give me extra time...

For what it's worth (in the interim anyway), I've never had any issues with WeiDUing any of those characters on CP-1252. Not a great solution for those on other codepages, and I agree it needs changing, but it requires more work than it appears on the surface.

Offline pro5

Quote from: Wisp
Not on Windows, they're not. Windows does not use Unicode (rather than the sensible choice of Unicode, Windows uses a series of local charsets they collectively refer to as "ANSI". Included are CP1252, CP1251 and others).

Sorry, but you're mistaken. :)

Try changing the codepage and see for yourself how non-CP1252 symbol filenames will display, or just do some reading: http://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows.

Codepages are used for non-unicode files and programs, not the filesystem.
« Last Edit: March 02, 2014, 04:37:14 AM by pro5 »

Offline The Imp

  • Planewalker
  • *****
  • Posts: 288
  • Gender: Male
Quote from: pro5 (quoting Wisp)
Not on Windows, they're not. Windows does not use Unicode (rather than the sensible choice of Unicode, Windows uses a series of local charsets they collectively refer to as "ANSI". Included are CP1252, CP1251 and others).
Sorry, but you're mistaken. :)

... changing the codepage and see for yourself how non-ANSI filenames will display, or just do some reading: http://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows

Codepages are used for non-unicode text files, not the filesystem.
Erhm, if we just insert the small word "non-" into Wisp's quote, he comes out as correct... The point being that the code page thing is there to standardize not just the look, but also how it SHOULD work independently of the code page used... But as the jerks would have it, they (Microsoft!) failed at the basic creation of a proper standard. But then again, we wouldn't know any of this if they hadn't... and/or if our system were not so complex.

EDIT: I just looked at one of the files in the IA archives with Notepad++, and I was not surprised to see that it had "µcd /elemental_earth_iwd" in the encoded .2da file, which used the "Encode ANSI" feature. So basically you are both wrong: there's a basic miscommunication at one of the steps, and the very same thing is referred to as ANSI and non-ANSI depending on the user's preferred term, or dick. ;D

PS: the original error might be there because the .2da file is encoded in ANSI as CP1252, while it should be in Cyrillic (Windows-1251) for you, pro5.
The files:
¢A¢E1A1.BAM, £aag2e.bam, Øcpg13.bam
Become:
ўAўE1A1.BAM, Јaag2e.bam, Шcpg13.bam
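
The renaming above is what you get when the same bytes are written as CP1252 and read back as CP1251. A minimal sketch, assuming Python's built-in codecs match the Windows tables; `reinterpret` is a made-up helper name, not anything from WeiDU:

```python
# Show how CP1252 filename bytes look when read under CP1251.
# Assumption: Python's cp1252/cp1251 codecs match the Windows code pages.

def reinterpret(name: str, written_as: str = "cp1252", read_as: str = "cp1251") -> str:
    """Encode name in one code page, then decode the same bytes in another."""
    return name.encode(written_as).decode(read_as)

for name in ["¢A¢E1A1.BAM", "£aag2e.bam", "Øcpg13.bam"]:
    print(name, "->", reinterpret(name))
```

The bytes themselves never change (0xA2, 0xA3 and 0xD8 for the three prefixes); only the glyph assigned to each byte differs between the two code pages.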

Of course, "ANSI" might just mean "non-ANSI" in a shorter form, for conciseness' sake.
« Last Edit: March 02, 2014, 03:52:25 AM by The Imp »

Offline pro5

Corrected "non-ANSI filenames" to "non-CP1252 filenames" in my previous post - that is what I meant. But that is irrelevant to my point there: Windows does use Unicode, and file names are encoded in Unicode in the file system, not in the current ANSI code page.

The contents of the file, or how those contents are encoded (Unicode, UTF-8, KOI-8, CP866, CP-125x or whatever else), is a different matter that has no effect on whether the error in the original post happens; it's just the filename that's giving WeiDU trouble.
« Last Edit: March 02, 2014, 04:42:01 AM by pro5 »

Offline Wisp

Quote from: pro5
Sorry, but you're mistaken. :)

Try changing the codepage and see for yourself how non-CP1252 symbol filenames will display, or just do some reading: http://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows.

Codepages are used for non-unicode files and programs, not the filesystem.
Okay, there is something of Unicode in Windows. I did not know that.

However, consider this: IA et al. work on CP1252 systems. IA et al. do not work on systems using other code pages, but can be made to work on those systems simply by switching them over to CP1252 for the duration. Clearly the code page does matter. Additionally, WeiDU/OCaml does absolutely nothing with character encodings and probably just deals with byte sequences. The error here is in how Windows and OCaml talk (specifically, the error occurs when the result of one system call is passed as an argument to another system call) and is out of my hands.
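
To illustrate the "result of one system call passed to another" failure, here is a rough model in Python. It is not WeiDU's or OCaml's actual code: `ansi_listing` and `ansi_stat` are hypothetical stand-ins for a byte-oriented directory listing and a later stat-like lookup, and the `replace` error handler is only an assumption about how unmappable characters get substituted.

```python
# Hypothetical model: the directory stores Unicode names, but a byte-oriented
# listing squeezes them through the active code page before returning them.
real_names = {"µbrghe.bam", "Æaag31e.bam"}

def ansi_listing(names, codepage):
    """Return names as bytes in the given code page, '?' for unmappable chars."""
    return [n.encode(codepage, errors="replace") for n in names]

def ansi_stat(raw, names, codepage):
    """Look a byte-string name up again, as a later stat-like call would."""
    return raw.decode(codepage) in names

for codepage in ("cp1252", "cp1251"):
    for raw in sorted(ansi_listing(real_names, codepage)):
        status = "found" if ansi_stat(raw, real_names, codepage) else "not found"
        print(codepage, raw, status)
```

Under `cp1252` both names survive the round trip. Under `cp1251` the Æ name comes back with a `?` and can no longer be matched to the real file, which is consistent with the code page mattering even though the program itself only handles bytes.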

 
