Topic Summary

Posted by: Wisp
« on: May 22, 2018, 05:25:06 PM »

Quote
I'm trying out some ideas for implementations that don't use Hashtbl.
Well, with the way LOCAL variables and macros work, any effort is more or less doomed to re-implement Hashtbl, so I'll just use the one from 4.03.0.
Posted by: Wisp
« on: May 21, 2018, 10:51:39 AM »

Quote
Disclaimer: Not knowing anything about anything...
I wonder if `Hashtbl.randomize` would help here?
This behavior reminds me of an "accidentally quadratic" issue that Rust had when resizing hash tables.
Hashtbl is implemented on top of Array, which is itself largely implemented in C. Before 4.04.0, Hashtbl was an array of primitives and copying one was as simple as copying the array (which is probably just a bit of C for allocating and copying memory). From 4.04.0 on, the implementation of Hashtbl was changed to be an array of records, and to copy it, you need to walk the array and copy each record, in addition to copying the array structure itself. The latter implementation is simply more work.
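
To make the difference concrete, here is a simplified model of the two layouts. It is only a sketch and not the actual standard library source; every type and function name in it (old_table, new_table, cell, old_copy, new_copy, copy_chain) is made up for illustration.

```ocaml
(* Simplified model, NOT the real stdlib source.
   All type and function names are hypothetical. *)

(* Older layout: each slot holds an immutable list, so a copy of the
   table can share the lists and only the array itself is duplicated. *)
type ('a, 'b) old_table = { mutable old_data : ('a * 'b) list array }

let old_copy t = { old_data = Array.copy t.old_data }

(* Newer layout: each slot holds a chain of mutable cells that are
   updated in place, so a copy has to duplicate every single cell. *)
type ('a, 'b) cell =
  | Nil
  | Cell of { key : 'a; mutable value : 'b; mutable next : ('a, 'b) cell }

type ('a, 'b) new_table = { mutable new_data : ('a, 'b) cell array }

(* Walk one chain and allocate a fresh cell for each entry. *)
let rec copy_chain = function
  | Nil -> Nil
  | Cell { key; value; next } -> Cell { key; value; next = copy_chain next }

let new_copy t = { new_data = Array.map copy_chain t.new_data }
```

In this model old_copy costs one array copy regardless of how many bindings the table holds, while new_copy allocates once per binding.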
Posted by: Wisp
« on: May 21, 2018, 10:46:38 AM »

Quote
I'd rather not see WeiDU internals made visible to the user/modder (via TP2 flag or otherwise). It would be a hack at best, and difficult to get rid of after fixing the root cause. Is there any way to replace the Hashtbl.copy calls (without too much effort)?
I don't see it as making the internals visible. It's (probably) an optimisation that breaks current functionality. Creating a new hash table is (probably) always going to be faster than copying an existing one.

I could also lift the implementation of Hashtbl from OCaml 4.03.0 and include it in WeiDU's source tree. It seems to work well; performance of WeiDU+4.06.0 does not differ significantly from that of plain +4.03.0.

I'm trying out some ideas for implementations that don't use Hashtbl.
Posted by: aqrit
« on: May 19, 2018, 11:52:27 AM »

Disclaimer: Not knowing anything about anything...
I wonder if `Hashtbl.randomize` would help here?
This behavior reminds me of an "accidentally quadratic" issue that Rust had when resizing hash tables.
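
For reference, a minimal sketch of what turning on randomization looks like with the standard library. The table names are hypothetical. As far as I know, randomization only changes how keys are spread over buckets (it is meant as protection against crafted collisions), so whether it would affect the cost of copying is an open question here.

```ocaml
(* Option 1: randomize a single table at creation time. *)
let per_table : (string, int) Hashtbl.t = Hashtbl.create ~random:true 16

(* Option 2: flip the global default; this affects tables created afterwards. *)
let () = Hashtbl.randomize ()
let after_switch : (string, int) Hashtbl.t = Hashtbl.create 16

let () =
  Hashtbl.replace per_table "x" 1;
  Hashtbl.replace after_switch "x" 1;
  (* Lookups behave the same; only the internal bucket order may differ. *)
  assert (Hashtbl.find per_table "x" = 1);
  assert (Hashtbl.find after_switch "x" = 1)
```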
Posted by: Argent77
« on: May 18, 2018, 12:40:01 PM »

I'd rather not see WeiDU internals made visible to the user/modder (via TP2 flag or otherwise). It would be a hack at best, and difficult to get rid of after fixing the root cause. Is there any way to replace the Hashtbl.copy calls (without too much effort)?
Posted by: Wisp
« on: May 18, 2018, 11:48:19 AM »

Yeah, so these are the (most significant) timings of Initialise on my machine with WeiDU 245+4.03.0:
Code:
COPY                             1.209
READ_*                           4.036
eval_pe                          8.723
process_patch2                  18.244
function overhead               70.090
TOTAL                          104.303
Pretty grim, but whatever. Then we try 245+4.04.0:
Code:
COPY                             1.894
READ_*                           6.195
eval_pe                         18.201
process_patch2                  32.231
function overhead              643.688
TOTAL                          704.793
Yeah...
Breaking out the Hashtbl.copy that is used to set up the in-function variable environment (still 245+4.04.0):
Code:
COPY                             1.849
READ_*                           6.232
eval_pe                         18.207
process_patch2                  31.968
function overhead              125.434
copying hash tables            555.494
TOTAL                          741.566
So we've found the culprit.
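
To put numbers on that, the ratios can be worked out from the timings above. This is just arithmetic on the posted figures; the variable names are made up.

```ocaml
(* Figures taken from the timing tables above. *)
let overhead_4_03 = 70.090         (* function overhead, 245+4.03.0 *)
let overhead_4_04 = 643.688        (* function overhead, 245+4.04.0 *)
let copy_time = 555.494            (* "copying hash tables", 245+4.04.0 *)
let total_with_breakout = 741.566  (* TOTAL of the run with the breakout *)

(* How much function overhead grew between the two compiler versions. *)
let slowdown = overhead_4_04 /. overhead_4_03

(* Share of the whole run spent in Hashtbl.copy. *)
let copy_share = copy_time /. total_with_breakout

let () =
  Printf.printf "function overhead grew %.1fx; copying is %.0f%% of the total\n"
    slowdown (100. *. copy_share)
```

That works out to function overhead growing roughly 9x, with copying hash tables taking about three quarters of the whole run.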

It seems like a compelling case could be made for staying on 4.03.0 for the time being. You could possibly work around the issue with Hashtbl.copy by changing functions to enter into an empty variable scope, rather than one copied from the function's parent scope, but it'd be a breaking change, so it'd have to be opt-in (say, TP2 flag). But there is also the matter of 245+4.04.0 being slower overall than 245+4.03.0, not just on function overhead, but stuff like eval_pe and process_patch2, though maybe that time is from one or more of the other Hashtbl.copies found in WeiDU.
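
A rough sketch of the two ways of entering a function scope, using a plain string-to-string table as a stand-in. The helper names are hypothetical and WeiDU's real variable environment is more involved than this.

```ocaml
(* Hypothetical helpers standing in for WeiDU's scope handling. *)

(* Current behaviour: the function starts from a copy of the parent scope. *)
let enter_copied_scope (parent : (string, string) Hashtbl.t) =
  Hashtbl.copy parent

(* Possible opt-in behaviour: the function starts from nothing. *)
let enter_empty_scope () : (string, string) Hashtbl.t =
  Hashtbl.create 16

let () =
  let parent = Hashtbl.create 16 in
  Hashtbl.replace parent "x" "1";
  let copied = enter_copied_scope parent in
  let empty = enter_empty_scope () in
  (* The copied scope still sees the caller's variable ... *)
  assert (Hashtbl.mem copied "x");
  (* ... while the empty scope does not, hence the breaking change. *)
  assert (not (Hashtbl.mem empty "x"));
  (* Writes inside the function do not leak back into the parent. *)
  Hashtbl.replace copied "x" "2";
  assert (Hashtbl.find parent "x" = "1")
```

The empty scope avoids the copy entirely, but any function body that reads a variable set by its caller would stop working, which is why it would need to be opt-in.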
Posted by: Wisp
« on: May 17, 2018, 04:07:01 PM »

Seems to be caused by a change to the implementation of hash tables. From the change log of 4.04.0:
Quote
Optimize Hashtbl by using in-place updates of its internal bucket lists. All operations run in constant stack size and are usually faster, except Hashtbl.copy which can be much slower
(And the bisect stops at seemingly related commits.) WeiDU does a number of Hashtbl.copys, notably any time variable scopes change (e.g., functions). OCaml probably would not accept this as a regression, given that the change seems to be intentional.
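
If anyone wants to check the Hashtbl.copy cost on their own compiler, something like the following should do. It is a minimal sketch: time_copies and the table sizes are made up, and it uses Sys.time (processor time) to avoid extra dependencies. Compile the same file with 4.03.0 and 4.04.0 and compare the printed figures.

```ocaml
(* Hypothetical micro-benchmark: fill a table, then copy it repeatedly. *)
let time_copies ~entries ~copies =
  let tbl = Hashtbl.create 16 in
  for i = 1 to entries do
    Hashtbl.replace tbl (string_of_int i) i
  done;
  let start = Sys.time () in
  for _i = 1 to copies do
    ignore (Hashtbl.copy tbl)
  done;
  (* Processor time, in seconds, spent on the copies only. *)
  Sys.time () -. start

let () =
  Printf.printf "%.3f s\n" (time_copies ~entries:10_000 ~copies:1_000)
```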
Posted by: Wisp
« on: May 16, 2018, 05:16:30 PM »

Heh, on 64-bit Linux, 4.03.0 reduced install time by about 5 % compared to 4.02.3 (obviously this does not mean it can't be different on other systems). I'll continue with later versions tomorrow.
Posted by: Argent77
« on: May 16, 2018, 03:58:20 PM »

64-bit macOS and Windows binaries were built with OCaml 4.03. 4.02 and earlier may be slightly faster (less than 10%), but I have noticed it only by inspecting the timings from the SCS logs.
Posted by: Wisp
« on: May 16, 2018, 03:38:25 PM »

I compile with 4.02 or 4.01. Argent77 said 4.03 exhibited a smaller performance penalty (than 4.05, which is what we used to build the affected binaries) and that it went off the cliff after that. I'll confirm and bisect.
Posted by: StefanO
« on: May 16, 2018, 01:54:45 PM »

The 64 bit v245 binary is now just as fast as the 32 bit binary. Thanks.

What OCaml compiler version did you use?
Posted by: subtledoctor
« on: May 16, 2018, 01:47:30 PM »

EDIT - sorry, wrong thread, nothing to see here...
Posted by: Wisp
« on: May 15, 2018, 05:01:01 PM »

Yeah, I get about 80 seconds with v237, as well as with v245 compiled with OCaml 4.01.0 and 4.02.3. I guess I'll be trying out the different OCaml versions between 4.02 and 4.05 to see if it's one of them that's doing it. It could be a gcc thing or something else, too, although I suppose gcc and the rest are unlikely compared to OCaml's compiler.
Posted by: Argent77
« on: May 15, 2018, 03:16:03 PM »

You could be right. The 32-bit macOS binary was compiled with OCaml 3.12.1, while the 64-bit binaries for macOS and Windows were both compiled with OCaml 4.05.

Your Windows timings aren't too far off either. I'm also getting timings of around 400-500 seconds for SCS init on Windows (with both 32/64 bit variants). Out of curiosity I have installed the same mod component with WeiDU 237 that came with the original SCS package, and it finished installation after about 50 seconds! There is definitely something not right with more recent OCaml versions.
Posted by: Wisp
« on: May 15, 2018, 01:56:51 PM »

This prompted me to compare performance of the Linux builds I do (for the first time ever) and I see a similar difference between 32-bit and 64-bit. However, it seems to be a toolchain thing, as the 32-bit version is compiled on 32-bit Debian Jessie (OCaml 4.01 and old toolchain) while the 64-bit version is compiled on 64-bit Fedora 27 (OCaml 4.05 and recent-ish toolchain); if I compile a 32-bit WeiDU on 32-bit Fedora 27, it's a lot slower than the version compiled on Debian even though they are both 32-bit. Both Windows versions are compiled with modern-ish OCamls and toolchains (right?) and I'm guessing so is the 64-bit Mac version, while the 32-bit Mac version is probably from an old-ish toolchain (right?), same as the performant 32-bit Linux version. Next, I'll be setting up a 64-bit Debian Jessie (if I can) to attempt to see if that produces a performant 64-bit WeiDU. If so, I guess I'll have to check with the OCaml people if they know what's going on.

Edit: am I and my old Sandy Bridge Windows system the only ones to get ~500-600 seconds to install Initialise with Windows WeiDU? Like StefanO, I get ~90 seconds with the good 32-bit Linux WeiDU.