Optimization idea for weidu/weinstall (sqlite3 or some index, vs file read/iteration)



I notice a LOT of tp2 files (or tpa I guess) have actions that read every file in override/ of some type (cre, itm, spl etc.) looking for files that "match" based on some criteria.

This can be pretty bad when a heavy install is present and a mod wants to find every troll to apply some patch or update, yet it iterates over (and opens) 15,000+ .cre files when there may be fewer than 50 troll files.

Imagine if, as mods added spells (ADD_SPELL) or wrote to the override directory, an indexing program would go through override/ and add each file (and its various queryable params) into an sqlite3 .db (or some home-grown db system).

Then, in a single query, one could ask WeiDU "give me all troll overrides" and get that list in a fraction of a second, versus potentially taking 1+ minutes to go through every file.
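
To make that concrete, here's a minimal sketch of the idea in Python (just a proof-of-concept shape, nothing WeiDU actually provides); the database file name, the table layout, the 0x272 race offset, and the race.ids value for trolls are illustrative assumptions:

    import sqlite3
    from pathlib import Path

    # Sketch of a one-off indexer: record one queryable attribute (the CRE race
    # byte) per file in override/. 0x272 is the race field of a v1.0 CRE header,
    # if I have the layout right; the schema, paths, and race value are illustrative.
    db = sqlite3.connect("override-index.db")
    db.execute("CREATE TABLE IF NOT EXISTS cre (name TEXT PRIMARY KEY, race INTEGER)")
    db.execute("CREATE INDEX IF NOT EXISTS cre_race ON cre (race)")

    for path in Path("override").glob("*.cre"):
        data = path.read_bytes()
        if len(data) > 0x272:
            db.execute("INSERT OR REPLACE INTO cre VALUES (?, ?)", (path.stem, data[0x272]))
    db.commit()

    # "Give me all troll overrides" then becomes a single indexed query
    # (183 is assumed to be TROLL in race.ids):
    trolls = [name for (name,) in db.execute("SELECT name FROM cre WHERE race = ?", (183,))]
    print(trolls)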

Has this ever been considered?  (I'm guessing yes, but I'm new to the scene, and the install/uninstall cycle is so inefficient)


It would require all tools (not just WeiDU but also NearInfinity) to maintain a lot of arbitrary databases of external info to cover most of this filtering, which would need to be generated at least on the first launch of WeiDU, making everyone suffer instead of just megamod installations being slower. I'd rather have longer install times than try and fail to come up with a universal database duplicating the entire game for faster queries.


There's also a common (inefficient) pattern used a lot, which makes sense for ease of implementation and configuration (and probably for the undo mechanism): a lack of batching of patch updates.

What happens a lot is something like this:

1. Do you want to add Xyz to all of X type items? (if [y] proceed to iterate+patch all files in the set)

2. Do you want to add Abc to all of X type items? (if [y] proceed to iterate+patch all files in the set)

So, in a large install, this becomes N*X, where N = files in override of the extension type, and X = questions which touch the same set of files.

If the questions were deferred and batched (I've seen this in a couple of mods, like the "tnt" (BG2 Tweaks) trap-related questions), the iteration would only have to happen once instead of many times.

Ideally, one file would never be opened more than once per mod (and on that one opening, all related patches would be applied - very similar to how Clojure transducers work with large data).
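
As a language-agnostic sketch of that batched shape (Python here purely for illustration; the patch functions, file selection, and the idea that the answers were gathered up front are placeholders):

    from pathlib import Path

    # Collect every patch the user answered "yes" to up front, then open each
    # file exactly once, apply them all, and write back only if something changed.
    def add_xyz(data: bytes) -> bytes:
        return data  # placeholder for the real Xyz patch

    def add_abc(data: bytes) -> bytes:
        return data  # placeholder for the real Abc patch

    patches = [add_xyz, add_abc]  # built from the questions asked up front

    for path in Path("override").glob("*.itm"):
        original = path.read_bytes()
        data = original
        for patch in patches:
            data = patch(data)         # one read, all patches applied in sequence
        if data != original:
            path.write_bytes(data)     # one write per changed file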

13 minutes ago, ahungry said:

Ideally, one file would never be opened more than once per mod (and on that one opening, all related patches would be applied - very similar to how Clojure transducers work with large data).

But ideally you would only install one mod and you wouldn't need to bother with any of this shit ever again. Do you see where this is going? And yes, the mod maker has all the privileges of knowing in advance what they intend to do, and they can write their ¤%%&/&¤% code as they want to... do you think they can do it with a little more ease knowing that a single atom doesn't break and shatter the whole %&/%# ton of a game?


Previous changes may affect later changes, so I don't see how batching could be done universally, if it would help with anything at all.

If you have enough memory, an in-memory "disk" would be an interesting test. Copy the game there, install mods there, record timing info and see the difference.

1 hour ago, ahungry said:

Ideally, one file would never be opened more than once per mod (and on that one opening, all related patches would be applied - very similar to how Clojure transducers work with large data).

Yeah, no. That's not going to work. Not even "once per component". For example, here's the code for one component of the mod I'm developing, with the intent of doing exactly one thing: fixing the 2.6 interaction where a creature with a fireshield hits another creature with a fireshield.


    ACTION_CLEAR_ARRAY fireshields
    OUTER_SET fsid = 0

    COPY_EXISTING_REGEXP GLOB ~^.+\.spl$~ ~override~
        READ_LONG 0x64 ofsAb
        READ_SHORT 0x68 numAb
        FOR (idx = 0; idx < numAb; ++idx) BEGIN
            SET ofsProj = ofsAb + 0x26 + (idx * 40)
            READ_SHORT ofsProj proj
            PATCH_IF ((343 < proj) AND (proj < 347)) BEGIN
                SPRINT $fireshields(~%fsid%~) ~%SOURCE_RES%~
                SET fsid = fsid + 1
                SET idx = numAb
            END
        END
    BUT_ONLY // First pass: find the fireshields, make an array.

    COPY_EXISTING_REGEXP GLOB ~^.+\.spl$~ ~override~
        READ_LONG 0x64 ofsAb
        READ_SHORT 0x68 numAb
        FOR (idx = 0; idx < numAb; ++idx) BEGIN
            SET ofsProj = ofsAb + 0x26 + (idx * 40)
            READ_SHORT ofsProj proj
            PATCH_IF ((343 < proj) AND (proj < 347)) BEGIN
                PHP_EACH fireshields AS index => fsref BEGIN
                    LPF ADD_SPELL_EFFECT INT_VAR opcode=206 target=1 timing=10 duration=1 header=idx+1 STR_VAR resource=EVAL "%fsref%" END
                END
            END
        END
    BUT_ONLY // Second pass: each fireshield hit gives caster 1-tick immunity to all fireshields
 

Two passes through the list of spells - one to figure out which ones are fireshields and generate a list (based on using certain projectiles new in 2.6), one to actually apply the patches. Because each spell in the list needs a new effect for each spell in the list.

The only way to avoid this two-pass behavior (or worse) is to start with a predefined static list of fireshields. That's naturally error-prone, and guaranteed to go wrong if you interact with another mod that adds spells of this type. In this case, if either fireshield wasn't on the list for some reason, the mod would have no effect on the interaction; the hits would bounce back and forth at an extremely high rate until either one party died or they moved far enough away, instead of each creature getting hit once as in older versions and the mod's intent.

8 hours ago, jmerry said:

The only way to avoid this two-pass behavior (or worse) is to start with a predefined static list of fireshields.

Like an sqlite database which is tracking it?  😆

But yea, as noted in my prior posts, I'm sure things are as they are for a practical reason and they've been considered before.

Having install scripts maintain an index would be bad (error-prone), but a single program could do it: some tool that scans override/, builds up the queryable data, and keeps it up to date (including using inotify or another OS file-watching mechanism to know when to refresh a file's metadata in the index).
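
A rough sketch of what such a maintainer could look like (Python again, as a self-contained variant of the earlier sketch with an added mtime column; it sidesteps inotify and simply refreshes entries whose modification time changed, and the 0x272 race-byte example is still an assumption):

    import sqlite3
    from pathlib import Path

    # Standalone index maintainer: rescan override/ and refresh only entries whose
    # modification time changed since the last pass. A real tool could subscribe to
    # inotify (or the platform equivalent) instead of rescanning.
    db = sqlite3.connect("override-index.db")
    db.execute("""CREATE TABLE IF NOT EXISTS cre
                  (name TEXT PRIMARY KEY, mtime REAL, race INTEGER)""")

    def refresh() -> None:
        for path in Path("override").glob("*.cre"):
            mtime = path.stat().st_mtime
            row = db.execute("SELECT mtime FROM cre WHERE name = ?", (path.stem,)).fetchone()
            if row is not None and row[0] == mtime:
                continue                                   # unchanged since last scan
            data = path.read_bytes()
            race = data[0x272] if len(data) > 0x272 else None
            db.execute("INSERT OR REPLACE INTO cre VALUES (?, ?, ?)",
                       (path.stem, mtime, race))
        db.commit()

    refresh()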

 

12 minutes ago, Graion Dilach said:

Biffing aka creating file packages during mod installations is a possibility.

And it is worth it in the old engine. But now we have RAM, and it's not. Yes, 32 MB versus 16 GB makes for quite a different engine design.

2 hours ago, ahungry said:

Like an sqlite database which is tracking it?

How many characteristics of files will be recorded? Every byte value of every file? It’s a nice idea, I guess... a map at 1:1 scale can have lots of detail, but using it is no more efficient than walking.

The post does not really identify a problem to be solved. “Opening” 15,000 .cre files to look for 50 trolls sounds bad, but is checking the byte value at offset 0x91 of 15,000 files appreciably different from checking the value of 15,000 entries in a database? 

I suppose the answer may be “yes,” but the important question is, how different. Is it a quarter-second versus a half-second? In an average mod install, will it cut install time from 20 seconds to 15 seconds? How much time should be devoted to this, in exchange for such meager gains? Will the result work for all the various platforms and systems people use?

Note that you can already do this in WeiDU, building arrays on the fly with whatever file information you want, and later polling that information to set about patching stuff. jmerry's code posted above is a great example. It doesn't look like a database to someone used to SQLite, but that's effectively what it is. This is often used in an ad-hoc manner, but again I would have to be convinced of the benefit of doing it in other than an ad-hoc manner before I considered putting time and energy into changing anything in a mod.

Last point: to the extent this thread suggests anything concrete, it implicitly suggests changing mods. Given the hassle of testing, the likelihood that making any changes could introduce new bugs that need fixing and testing, and the fact that many modders are not professionally competent programmers but rather random folks who managed to get a handle on the ins and outs of a weird little scripting language... this is likely to be a real, real hard sell.

A better idea, methinks, would be to generate some actual useful functions to effectively do this, drop-in code that modders could borrow and use. There’s a subforum here where such a thing could find a home. 

28 minutes ago, subtledoctor said:

There’s a subforum here where such a thing could find a home. 

Uhh, where is this? It's not this, is it? 'Cause that's a thread, not a subforum. The problem being that you didn't link to it. Yes, I like to sound clever, but I am not, but that's what trolls are for... so we ended up identifying at least one.


Some benchmarks would be interesting. I've done a lot of little things in the past to wrap projects in SQLite databases (a Nintendo eShop API, hundreds of GitHub repo readmes/docs) for fast interfaces, so I might just do this as a proof of concept (time to index, time to retrieve all of <something> by a common index, compared with some stubbed-out WeiDU equivalents).

Assuming you have a unique index on something (such as a "race/type" field in a .cre file), it's O(1) for the db and O(n) for a full scan, and these complexities get magnified greatly when run repeatedly (or, heaven forbid, inside a loop that is itself O(n)).

I agree it'd need to be abstracted and opaque, so script/mod authors would not have to concern themselves with the back end and what it was doing (the common loop patterns that scan and read bytes from every file would just change to loops that immediately pull a matching subset).

Time-wise, I would estimate you'd be changing ~20-second full directory iterations into 200 ms lookups (I would expect at least a 100x improvement, given how often I tail a *.debug in the install directory and see every .CRE, .ARE, .ITM, .SPL file being opened and closed).
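
For anyone curious, the proof-of-concept harness could be as small as this (assuming the index built in the first sketch above exists; the 183 troll value is still just an assumed race.ids entry):

    import sqlite3
    import time
    from pathlib import Path

    TROLL = 183  # assumed race.ids value, for illustration only

    # Time the full-scan approach: open every .cre and check the race byte.
    start = time.perf_counter()
    scan_hits = []
    for path in Path("override").glob("*.cre"):
        data = path.read_bytes()
        if len(data) > 0x272 and data[0x272] == TROLL:
            scan_hits.append(path.stem)
    scan_time = time.perf_counter() - start

    # Time the indexed query against the previously built override-index.db.
    db = sqlite3.connect("override-index.db")
    start = time.perf_counter()
    query_hits = [name for (name,) in db.execute("SELECT name FROM cre WHERE race = ?", (TROLL,))]
    query_time = time.perf_counter() - start

    print(f"full scan:     {len(scan_hits)} hits in {scan_time:.3f}s")
    print(f"indexed query: {len(query_hits)} hits in {query_time:.3f}s")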

