Topic: EOL matcher in regular expressions doesn't match CRLF line breaks correctly

Argent77

The following code always returns -1 (i.e. no match) when Windows-style line breaks are involved:
Code:
OUTER_TEXT_SPRINT nl ~%WNL%~   // replace %WNL% with %LNL% to make it work
OUTER_TEXT_SPRINT string ~First%nl%Second%nl%~

OUTER_SET ofs = INDEX("^First$" ~%string%~)
PRINT ~String: first offset = %ofs%~
OUTER_SET ofs = INDEX("^Second$" ~%string%~)
PRINT ~String: second offset = %ofs%~

All regex-related WeiDU commands I have tested so far behave the same way. It appears that '$' only matches directly before an LF, so the CR of a CRLF line break prevents the match. I was able to make it work by adding "%MNL%" (which resolves to CR) to the search strings.
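
For comparison, the same effect can be reproduced outside WeiDU. The following is only a minimal sketch in Python, not WeiDU code: it assumes Python's `re` module with `re.MULTILINE`, where '$' likewise matches only at the end of the string or directly before an LF. The `index` helper is a hypothetical stand-in for INDEX that returns -1 when nothing matches.

```python
import re


def index(pattern, text):
    """Return the offset of the first match, or -1 if there is none."""
    m = re.search(pattern, text, re.MULTILINE)
    return m.start() if m else -1


crlf = "First\r\nSecond\r\n"   # Windows-style line breaks
lf = "First\nSecond\n"         # Unix-style line breaks

print(index(r"^First$", crlf))      # -1: '$' sits before CR, not LF
print(index(r"^First$", lf))        # 0
print(index(r"^First\r$", crlf))    # 0: workaround, consume the CR explicitly
print(index(r"^Second\r$", crlf))   # 7
print(index(r"^Second\r?$", crlf))  # 7: optional CR tolerates both styles
print(index(r"^Second\r?$", lf))    # 6
```

The `\r?` variant is an assumption about what a portable search string could look like; whether an equivalent pattern behaves identically in WeiDU has not been verified here.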

Wisp

Known issue. OCaml's $ only recognises LF as a line end. OCaml's philosophy on the subject is that when you read a file, all line ends should be converted to LF, so you never work with other line ends in code. WeiDU does not do this, and specifically uses CRLF in places, resulting in this issue.

I've mentioned it before, but I've been thinking of attempting to resolve this by reading everything in as LF, writing it out with native line ends, and transparently converting all non-LF line ends in match regexps to LF. Reading everything in as LF breaks backward compatibility unless you also do something about regexps that are matched against the buffer contents. I'm just not sure it's sufficiently robust.
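
To make the idea concrete, here is a rough sketch of that normalisation scheme in Python. It is not WeiDU's implementation and nothing like it is confirmed to exist in WeiDU; the helper names `to_lf`, `to_native` and `index_lf` are hypothetical, and Python's `re.MULTILINE` stands in for the OCaml regexp engine. The patterns are deliberately non-raw strings so that they contain literal CR and LF characters, the way an expanded %MNL% or %WNL% would.

```python
import re


def to_lf(text):
    """Convert CRLF and lone CR line ends to LF (applied on read)."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def to_native(text, native="\r\n"):
    """Convert LF line ends to the native ones (applied on write)."""
    return text.replace("\n", native)


def index_lf(pattern, text):
    """Normalise both pattern and buffer to LF, then search."""
    m = re.search(to_lf(pattern), to_lf(text), re.MULTILINE)
    return m.start() if m else -1


crlf = "First\r\nSecond\r\n"

print(index_lf("^First$", crlf))            # 0: '$' now works on CRLF input
print(index_lf("^Second$", crlf))           # 6: offset differs from the raw buffer (7)
print(index_lf("^First\r\nSecond$", crlf))  # 0: CRLF inside the pattern is converted too
print(index_lf("^First\r$", crlf))          # -1: a lone CR before '$' becomes LF and no longer matches
print(to_native(to_lf(crlf)) == crlf)       # True: round trip restores CRLF
```

In this sketch the last two searches hint at the compatibility concern: offsets shift once CRs are dropped, and a search string that carries an explicit CR before '$' stops matching after naive conversion.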

 
