Yeah, so these are the (most significant) timings of Initialise on my machine with WeiDU 245 built with OCaml 4.03.0 (245+4.03.0):
COPY 1.209
READ_* 4.036
eval_pe 8.723
process_patch2 18.244
function overhead 70.090
TOTAL 104.303
Pretty grim, but whatever. Then we try 245+4.04.0:
COPY 1.894
READ_* 6.195
eval_pe 18.201
process_patch2 32.231
function overhead 643.688
TOTAL 704.793
Yeah...
Timing the Hashtbl.copy that is used to set up the in-function variable environment separately from the rest of the function overhead (still 245+4.04.0):
COPY 1.849
READ_* 6.232
eval_pe 18.207
process_patch2 31.968
function overhead 125.434
copying hash tables 555.494
TOTAL 741.566
So we've found the culprit.
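
If you want to compare the two compilers on Hashtbl.copy alone, something like the following could be built with each of them. This is only a rough sketch, not WeiDU code: make_table, time_copies, the table size and the number of copies are all made up for illustration, and the real variable table will have a different shape.

```ocaml
(* Hypothetical micro-benchmark, not taken from WeiDU.
   The table size and copy count below are arbitrary assumptions. *)

(* Build a table with n string keys, standing in for a variable environment. *)
let make_table n =
  let tbl = Hashtbl.create n in
  for i = 0 to n - 1 do
    Hashtbl.replace tbl ("var_" ^ string_of_int i) i
  done;
  tbl

(* Return the CPU time (seconds) spent copying tbl the given number of times. *)
let time_copies tbl calls =
  let start = Sys.time () in
  for _i = 1 to calls do
    ignore (Hashtbl.copy tbl)
  done;
  Sys.time () -. start

let () =
  let tbl = make_table 10_000 in
  Printf.printf "1000 copies: %.3f s\n" (time_copies tbl 1_000)
```

Running the same binary source under 4.03.0 and 4.04.0 should show whether the copy itself got slower, independent of anything else WeiDU does.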
It seems like a compelling case could be made for staying on 4.03.0 for the time being. You could possibly work around the Hashtbl.copy issue by having functions enter an empty variable scope rather than one copied from the function's parent scope, but that would be a breaking change, so it would have to be opt-in (say, via a TP2 flag). There is also the matter of 245+4.04.0 being slower overall than 245+4.03.0, not just on function overhead but also on things like eval_pe and process_patch2, though maybe that time comes from one or more of the other Hashtbl.copy calls found in WeiDU.
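
To make the empty-scope idea concrete, here is a minimal sketch of the two ways of entering a function. It assumes the variable environment is a plain string-to-string Hashtbl; env, enter_inherited and enter_clean are hypothetical names, not WeiDU's actual types or functions.

```ocaml
(* Hypothetical model of a variable environment; WeiDU's real one differs. *)
type env = (string, string) Hashtbl.t

(* Inherited scope: the function starts from a full copy of the caller's
   variables, so the cost grows with the size of the parent table. *)
let enter_inherited (parent : env) : env = Hashtbl.copy parent

(* Empty scope: start from a fresh table and bind only the arguments,
   so the cost depends on the argument list, not on the parent table. *)
let enter_clean (args : (string * string) list) : env =
  let scope = Hashtbl.create 16 in
  List.iter (fun (k, v) -> Hashtbl.replace scope k v) args;
  scope

let () =
  let parent : env = Hashtbl.create 16 in
  Hashtbl.replace parent "mod_folder" "mymod";
  let inherited = enter_inherited parent in
  let clean = enter_clean [ ("x", "1") ] in
  (* The inherited scope can read the caller's variable, the clean one cannot. *)
  Printf.printf "inherited sees mod_folder: %b\n" (Hashtbl.mem inherited "mod_folder");
  Printf.printf "clean sees mod_folder: %b\n" (Hashtbl.mem clean "mod_folder")
```

The last two lines are why it would be a breaking change: any function that quietly relies on a variable set by its caller would stop finding it in the empty scope.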