Bay 12 Games Forum

Pages: 1 ... 20 21 [22] 23 24

Author Topic: DFHack plugin embark-assistant  (Read 94174 times)

PatrikLundell

  • Bay Watcher
Re: DFHack plugin embark-assistant
« Reply #315 on: May 31, 2021, 05:51:00 pm »

No, I don't think the chance of catching it on a system where it doesn't happen when looked for is high, and, unfortunately, I don't think the installation is the culprit, but I might be wrong. The best odds are probably for some victim to debug it, preferably with a locally compiled DFHack (so debug symbols are available).

However, the only thing lost trying a zipped installation that doesn't reproduce it is some time...

RedDwarfStepper

  • Bay Watcher
Re: DFHack plugin embark-assistant
« Reply #316 on: June 01, 2021, 05:45:30 pm »

dumb question incoming: Won't "any" debug-dll - even one we built - give more details than the release version?
less dumb question: Have you ever worked with "Dr. Watson"? It should be available on most Windows systems and could allow us to get a crash log...
Or this might work as well:
https://docs.acrolinx.com/kb/en/how-do-i-capture-a-process-dump-of-a-crashing-application-for-support-13731081.html

PatrikLundell

  • Bay Watcher
Re: DFHack plugin embark-assistant
« Reply #317 on: June 02, 2021, 02:25:09 am »

I'd expect a DLL built with symbols to give more info to someone who's hooked up a debugger to DF, yes. Without a debugger I'd expect the same crash and no info, but I can't say with certainty as I haven't tried.

I've never used Dr Watson. However, I enabled crash dumping several years ago (I've done it for DF, and found Toady can't use the dumps, but I haven't examined any myself). If a crash dump is generated, I'd expect a DLL with symbols to be more useful than one without.

Deuslinks

  • Bay Watcher
Re: DFHack plugin embark-assistant
« Reply #318 on: June 02, 2021, 10:53:43 am »

Hi there, I managed to get the dump file and I've uploaded it here: https://dffd.bay12games.com/file.php?id=15556

One thing I have noticed, and I don't know if it might cause some issues, is that the tilesets don't seem to be complete: things like workshops look weird, and the up/down stairs show as an X when it should be an image. Would any issues with the tilesets potentially cause issues with the searching, or are they completely separate?

PatrikLundell

  • Bay Watcher
Re: DFHack plugin embark-assistant
« Reply #319 on: June 02, 2021, 11:51:45 am »

Thanks. I'll see if I can get anything out of it (I think I've looked at a dump only once, and that wasn't on a PC).

I think it's unlikely the tile set should be the culprit, as the plugin doesn't use the tile set at all, but rather characters (which aren't tiles, at least not in the normal sense).

Edit: I can sort of open the dump file with Visual Studio, but it seems to demand a debug symbol file for SDLreal.dll in the form of an SDL.pdb file. I first tried the one generated when DFHack was compiled with debug symbols, but that file was rejected as it wasn't considered to match. I then tried to recompile DFHack in Release mode, but that didn't produce any debug symbol file (which isn't completely unexpected).
VS' Output also shows a lot of reports that DLLs were compiled without symbols.

However, the Call Stack tab shows an address in SDLreal.dll, with the rest of the call stack entries pointing to DF itself, and the fact that it refers to SDLreal.dll is odd, since that ought to be DF's original version of SDL.dll, rather than the DFHack one, as if DF was running with the original DLL but somehow managed to call DFHack ones anyway (I tried to start DF with DFHack disabled, and couldn't reach the Embark Assistant, as expected).

Thus, unless someone who actually knows how to examine a crash dump can get something out of it (e.g. by matching the assembly at the address against code generated with symbols), I suspect we need it captured with symbols.

This is the code at the last call stack location:

00007FFDC856E804  test        eax,eax 
00007FFDC856E806  je          00007FFDC856E82F 
00007FFDC856E808  cmp         eax,102h 
00007FFDC856E80D  je          00007FFDC856E824 
00007FFDC856E80F  lea         rcx,[7FFDC859A200h] 
00007FFDC856E816  call        00007FFDC856C2E0 
00007FFDC856E81B  or          eax,0FFFFFFFFh 
00007FFDC856E81E  add         rsp,20h 
00007FFDC856E822  pop         rbx 
00007FFDC856E823  ret 
00007FFDC856E824  mov         eax,1 
00007FFDC856E829  add         rsp,20h 
00007FFDC856E82D  pop         rbx 
00007FFDC856E82E  ret 
00007FFDC856E82F  lock dec    dword ptr [rbx+8] 
00007FFDC856E833  xor         eax,eax 
00007FFDC856E835  add         rsp,20h 
00007FFDC856E839  pop         rbx 
00007FFDC856E83A  ret 

This operation looks like it could point to an invalid location "00007FFDC856E82F  lock dec    dword ptr [rbx+8]" if rbx contained garbage.
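
For anyone who finds the listing hard to follow, here is one possible reading of its control flow as a small Python sketch. It assumes EAX holds the result of a wait call made just before the listing starts, that 0 means the wait succeeded, and that 102h is the Windows WAIT_TIMEOUT code. The class and function names are made up for illustration, and nothing here is confirmed to be the actual SDL source.

```python
from dataclasses import dataclass

WAIT_OBJECT_0 = 0x000   # Windows: the wait succeeded
WAIT_TIMEOUT = 0x102    # Windows: the wait timed out


@dataclass
class FakeSem:
    """Hypothetical stand-in for whatever structure RBX points to."""
    count: int = 0   # assumed to be the dword at [rbx+8]


def after_wait(sem: FakeSem, wait_result: int) -> int:
    """Mirror the three exits of the listing (names are hypothetical)."""
    if wait_result == WAIT_OBJECT_0:      # test eax,eax / je ...E82F
        sem.count -= 1                    # lock dec dword ptr [rbx+8]
        return 0                          # xor eax,eax
    if wait_result == WAIT_TIMEOUT:       # cmp eax,102h / je ...E824
        return 1                          # mov eax,1
    return -1                             # error path: or eax,0FFFFFFFFh
```

Under this reading the decrement only runs on the success path, so a garbage RBX would only fault there after the wait had returned 0.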
« Last Edit: June 02, 2021, 01:13:58 pm by PatrikLundell »

lethosor

  • Bay Watcher
Re: DFHack plugin embark-assistant
« Reply #320 on: June 02, 2021, 03:52:03 pm »

Quote from: PatrikLundell
However, the Call Stack tab shows an address in SDLreal.dll, with the rest of the call stack entries pointing to DF itself, and the fact that it refers to SDLreal.dll is odd, since that ought to be DF's original version of SDL.dll, rather than the DFHack one, as if DF was running with the original DLL but somehow managed to call DFHack ones anyway (I tried to start DF with DFHack disabled, and couldn't reach the Embark Assistant, as expected).

Could you post the call stack? I'm not really sure how to interpret this.

I'm assuming you are aware that SDL.dll is essentially the DFHack core on Windows, and that SDLreal.dll is just a renamed copy of DF's SDL.dll (so I wouldn't be surprised if it expects debug info to be in "SDL.pdb", since that is likely compiled into the DLL). To my knowledge, DF will call functions in SDL.dll for any SDL functions, and DFHack will forward most of those calls to SDLreal.dll untouched. I don't believe there is a mechanism for SDLreal.dll to call into DF directly - there is some SDL code that gets called before main(), but that should be compiled into the DF executable as part of SDLmain.lib.
DFHack - Dwarf Manipulator (Lua) - DF Wiki talk

There was a typo in the siegers' campfire code. When the fires went out, so did the game.

RedDwarfStepper

  • Bay Watcher
Re: DFHack plugin embark-assistant
« Reply #321 on: June 02, 2021, 04:30:35 pm »

Quote from: PatrikLundell
I suspect we need it captured with symbols.

I'll try and create a debug version (aka "RelWithDebInfo") of the master-branch and upload it.
It might be so slow, though, that the crash won't be provoked. The next step would then be "ReleaseOptWithDebSymbs", which I cobbled together myself when I needed to debug something in the later search process.

Quote from: PatrikLundell
00007FFDC856E82F  lock dec    dword ptr [rbx+8]
This operation looks like it could point to an invalid location "00007FFDC856E82F  lock dec    dword ptr [rbx+8]" if rbx contained garbage.

Searching for "lock dec dword ptr" brings up hits for "semaphore", "Multiprocessor Protection" and "atomics". To me this all points to code that handles multi-threading or thread safety, which kind of makes sense for the real SDL code, as I think I recall that it is the only part of DF that is multi-threaded.
If SDL crashes after embark-assistant somehow corrupted the memory, this might be tricky.
Does anyone know of any compiler options that activate runtime memory checks that induce a crash at the exact moment some invalid operation happens?

RedDwarfStepper

  • Bay Watcher
Re: DFHack plugin embark-assistant
« Reply #322 on: June 02, 2021, 05:43:06 pm »

Quote from: RedDwarfStepper
I'll try and create a debug version (aka "RelWithDebInfo") of the master-branch and upload it.

Here we go: https://dffd.bay12games.com/file.php?id=15559
@Deuslinks: Could you please replace "SDL.dll" and "hack/plugins/embark-assistant.plug.dll" in your LNP with the files in the zip and try to reproduce the crash?
Then upload the new dump again - thank you very much!

PatrikLundell

  • Bay Watcher
Re: DFHack plugin embark-assistant
« Reply #323 on: June 02, 2021, 05:52:17 pm »

@lethosor: This is what the call stack looks like.

>   SDLreal.dll!00007ffdc856e804()   Unknown
    Dwarf Fortress.exe!00007ff775ad9a6d()   Unknown
    Dwarf Fortress.exe!00007ff775ada0a7()   Unknown
    Dwarf Fortress.exe!00007ff775ada580()   Unknown
    Dwarf Fortress.exe!00007ff775ada87d()   Unknown
    Dwarf Fortress.exe!00007ff775adb062()   Unknown
    Dwarf Fortress.exe!00007ff77645cf8e()   Unknown
    Dwarf Fortress.exe!00007ff77645cd25()   Unknown
    Dwarf Fortress.exe!00007ff77645c1fa()   Unknown
    [External Code]   

And, @Deuslinks: I suggest renaming SDL.dll to e.g. SDL.dll.orig and embark-assistant.plug.dll to embark-assistant.plug.dll.orig rather than replacing them. This would allow you to easily restore your system to the original configuration. And thanks for your support so far!
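
The rename-then-copy routine can also be scripted. This is only a sketch of the idea in Python: the helper names `swap_in` and `restore` are made up, and the paths you pass in depend on where your own DF folder and the unzipped test files live.

```python
import shutil
from pathlib import Path


def swap_in(original: Path, replacement: Path) -> Path:
    """Keep original as '<name>.orig', then copy replacement into its place."""
    backup = original.with_name(original.name + ".orig")
    if backup.exists():
        # Refuse to clobber an earlier backup of the real original.
        raise FileExistsError(f"{backup} already exists; not overwriting it")
    original.rename(backup)
    shutil.copy2(replacement, original)
    return backup


def restore(original: Path) -> None:
    """Undo swap_in: move '<name>.orig' back over the test build."""
    backup = original.with_name(original.name + ".orig")
    backup.replace(original)
```

Calling `swap_in` once for SDL.dll and once for embark-assistant.plug.dll, and `restore` on the same two paths afterwards, gets the installation back to its original state.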

Deuslinks

  • Bay Watcher
Re: DFHack plugin embark-assistant
« Reply #324 on: June 02, 2021, 06:19:15 pm »

Hi there, happy to help. I found and replaced them; thanks for the advice about putting .orig after them. Just to confirm, the only two files I replaced are SDL.dll and embark-assistant.plug.dll.

I have done this and used a previous world and it crashed when I searched Good - Present
Dump file https://dffd.bay12games.com/file.php?id=15560

I created a new world just in case and it also froze (not responding) with the same filter

lethosor

  • Bay Watcher
Re: DFHack plugin embark-assistant
« Reply #325 on: June 02, 2021, 11:43:50 pm »

Quote from: PatrikLundell
@lethosor: This is what the call stack looks like.

>   SDLreal.dll!00007ffdc856e804()   Unknown
    Dwarf Fortress.exe!00007ff775ad9a6d()   Unknown
    Dwarf Fortress.exe!00007ff775ada0a7()   Unknown
    Dwarf Fortress.exe!00007ff775ada580()   Unknown
    Dwarf Fortress.exe!00007ff775ada87d()   Unknown
    Dwarf Fortress.exe!00007ff775adb062()   Unknown
    Dwarf Fortress.exe!00007ff77645cf8e()   Unknown
    Dwarf Fortress.exe!00007ff77645cd25()   Unknown
    Dwarf Fortress.exe!00007ff77645c1fa()   Unknown
    [External Code]   

I suppose the answer to this question isn't really relevant, but I'd be interested in knowing whether ">" marks the current or oldest frame. If current, I would expect to see an intermediate call to SDL.dll, and I don't think such a call could have been optimized out. If oldest, it's possible that DF has registered a callback that SDL calls - this feels more feasible to me, since it wouldn't need to go through DFHack's SDL.dll, but I'm unaware of DF code that does this explicitly.

To RedDwarfStepper's points: DF does use some threading primitives implemented by SDL. Hard to say if that's part of this stack trace without knowing what "SDLreal.dll!00007ffdc856e804()" refers to, though. Multithreading isn't exclusive to the SDL layer, but that's what provides e.g. locks that DF uses.
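
The "is there an intermediate SDL.dll frame" check can be done mechanically on a pasted call stack. A rough Python sketch, assuming the frames are listed topmost first and use the `module!address()` form shown in the quoted stack; the function names are made up for illustration.

```python
import re

# Matches lines such as ">   SDLreal.dll!00007ffdc856e804()   Unknown".
FRAME = re.compile(r"^\s*>?\s*(?P<module>.+?)!(?P<address>[0-9a-fA-F]+)\(\)")


def parse_frames(text: str) -> list:
    """Return (module, address) pairs in the order they appear."""
    frames = []
    for line in text.splitlines():
        match = FRAME.match(line)
        if match:  # lines like "[External Code]" are skipped
            frames.append((match["module"], int(match["address"], 16)))
    return frames


def goes_through(frames, via: str, target: str) -> bool:
    """True if the frame listed right after the first `target` frame is in `via`."""
    modules = [module for module, _ in frames]
    if target not in modules:
        return False
    index = modules.index(target)
    return index + 1 < len(modules) and modules[index + 1] == via


if __name__ == "__main__":
    stack = """
>   SDLreal.dll!00007ffdc856e804()   Unknown
    Dwarf Fortress.exe!00007ff775ad9a6d()   Unknown
    Dwarf Fortress.exe!00007ff775ada0a7()   Unknown
    [External Code]
"""
    print(goes_through(parse_frames(stack), via="SDL.dll", target="SDLreal.dll"))
```

For the quoted stack this reports that SDLreal.dll is not entered via SDL.dll, which is exactly the oddity being discussed.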

PatrikLundell

  • Bay Watcher
Re: DFHack plugin embark-assistant
« Reply #326 on: June 03, 2021, 02:37:57 am »

I assume the stack trace has the most recent address at the top, but this is the first time I've looked at a Windows dump (I'm using Visual Studio), so I certainly can't claim to know.
However, to me it looks like Windows starts DF which then goes through a number of internal calls before ending up in SDLreal.dll (as noted, without any registered redirection via SDL.dll).

I'll grab Deuslinks' new dump to see if I can get anything useful out of that.

Edit:
This call stack looks a lot saner:
    SDLreal.dll!00007ffdcbcee804()   Unknown
>   SDL.dll!00007ffd42a1da19()   Unknown
    Dwarf Fortress.exe!00007ff775ad9a6d()   Unknown
    Dwarf Fortress.exe!00007ff775ada0a7()   Unknown
    Dwarf Fortress.exe!00007ff775ada580()   Unknown
    Dwarf Fortress.exe!00007ff775ada87d()   Unknown
    Dwarf Fortress.exe!00007ff775adb062()   Unknown
    Dwarf Fortress.exe!00007ff77645cf8e()   Unknown
    Dwarf Fortress.exe!00007ff77645cd25()   Unknown
    Dwarf Fortress.exe!00007ff77645c1fa()   Unknown
    [External Code]   

(The ">" is at SDL.dll because I've shifted the focus there: it was at the top originally).
The SDL.pdb file I've got still doesn't match.

However, I'm not sure if it's new data or just me noticing it, but EAX and EDX both are shown to have a value of 0 in SDLreal.dll. RBX = 1549282935824, corresponding to 168B8758410h, which doesn't seem to be an address anywhere near the other addresses used, which I suspect is the direct cause of the crash. However, I don't see where it got corrupted.
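
The decimal-to-hex conversion is easy to double-check, and the same few lines show how far RBX lies from the code addresses quoted in this post. A small Python sketch using only values from the thread; the 4 GiB "nearby" threshold is an arbitrary assumption for illustration. Keep in mind that distance from code addresses doesn't by itself prove a pointer is bad, since heap data commonly lives in a different range from loaded modules.

```python
RBX = 1549282935824

# Code addresses quoted in this thread.
CODE_ADDRESSES = {
    "SDLreal.dll": 0x00007FFDCBCEE804,
    "SDL.dll": 0x00007FFD42A1DA19,
    "Dwarf Fortress.exe": 0x00007FF775AD9A6D,
}

NEAR_LIMIT = 1 << 32  # arbitrary "nearby" threshold (assumption)


def to_masm_hex(value: int) -> str:
    """Format an integer the way the debugger shows it, e.g. 168B8758410h."""
    return f"{value:X}h"


def distances(pointer: int) -> dict:
    """Absolute distance from pointer to each known code address."""
    return {name: abs(pointer - addr) for name, addr in CODE_ADDRESSES.items()}


if __name__ == "__main__":
    print(to_masm_hex(RBX))
    for name, dist in distances(RBX).items():
        print(name, "near" if dist < NEAR_LIMIT else "far")
```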

The code referenced in SDL.dll is
00007FFD42A1DA13  call        qword ptr [7FFD432CFE90h]  ; This is the instruction before the one referenced.
> 00007FFD42A1DA19  add         rsp,28h  ; This is the address referenced. Might it be the address the SDLreal.dll call should return to?
00007FFD42A1DA1D  ret
with RSP having a value of 0000000F104FF5F0h

The last (i.e. topmost) DF call stack entry refers to this:
00007FF775AD9A67  call        qword ptr [7FF7764B85C0h]  ; This is the instruction before the one referenced.
> 00007FF775AD9A6D  mov         rcx,qword ptr [rbx+100h]  ; This is the address referenced.
00007FF775AD9A74  call        qword ptr [7FF7764B85C0h] 
00007FF775AD9A7A  lea         rdx,[rsp+30h] 
00007FF775AD9A7F  mov         rcx,rdi 
00007FF775AD9A82  call        00007FF775ADC430 
00007FF775AD9A87  mov         rdx,rax 
00007FF775AD9A8A  mov         rax,qword ptr [rax] 
00007FF775AD9A8D  test        rax,rax 
00007FF775AD9A90  jne         00007FF775AD9A97 

Here RBX has a value of 00007FF777293C10h

(I've added ">" to the assembly snippets above to indicate the instructions indicated by VS, plus the comments after the instructions).

Edit 2:
Yes, the SDL.dll call instruction before the one in the call stack calls this (I've enabled display of data in addition to assembly to be able to decode the address from the referenced memory, which is why it looks messy, as columns aren't preserved):
> 00007FFDCBCEE7C0 83 CA FF             or          edx,0FFFFFFFFh 
00007FFDCBCEE7C3 E9 08 00 00 00       jmp         00007FFDCBCEE7D0 
00007FFDCBCEE7C8 CC                   int         3 
00007FFDCBCEE7C9 CC                   int         3 
00007FFDCBCEE7CA CC                   int         3 
00007FFDCBCEE7CB CC                   int         3 
00007FFDCBCEE7CC CC                   int         3 
00007FFDCBCEE7CD CC                   int         3 
00007FFDCBCEE7CE CC                   int         3 
00007FFDCBCEE7CF CC                   int         3 
2>00007FFDCBCEE7D0 40 53                push        rbx 
00007FFDCBCEE7D2 48 83 EC 20          sub         rsp,20h 
00007FFDCBCEE7D6 48 8B D9             mov         rbx,rcx 
00007FFDCBCEE7D9 48 85 C9             test        rcx,rcx 
00007FFDCBCEE7DC 75 15                jne         00007FFDCBCEE7F3 
00007FFDCBCEE7DE 48 8D 0D 03 BA 02 00 lea         rcx,[7FFDCBD1A1E8h] 
00007FFDCBCEE7E5 E8 F6 DA FF FF       call        00007FFDCBCEC2E0 
00007FFDCBCEE7EA 83 C8 FF             or          eax,0FFFFFFFFh 
00007FFDCBCEE7ED 48 83 C4 20          add         rsp,20h 
00007FFDCBCEE7F1 5B                   pop         rbx 
00007FFDCBCEE7F2 C3                   ret 
00007FFDCBCEE7F3 48 8B 09             mov         rcx,qword ptr [rcx] 
00007FFDCBCEE7F6 83 C8 FF             or          eax,0FFFFFFFFh 
00007FFDCBCEE7F9 3B D0                cmp         edx,eax 
00007FFDCBCEE7FB 0F 44 D0             cmove       edx,eax 
00007FFDCBCEE7FE FF 15 C4 99 02 00    call        qword ptr [7FFDCBD181C8h] 
The snippet above is the SDLreal.dll code immediately before the one recorded in the call stack, and "2>" is where the call should jump to. As far as I can see, RBX is pushed and the code then loads something else into it.
At the end there is the call to:
00007FFDE17D4AB0 FF 25 72 D7 05 00    jmp         qword ptr [7FFDE1832228h] 
with the data at that address, unfortunately, shown as 00 00 7F FD ?? ?? ?? 90, so I can't determine the destination of the jump.
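
The byte column in the listing makes it possible to re-derive the displayed targets by hand: on x86-64, the last four bytes of these instructions are a signed little-endian displacement that is added to the address of the next instruction. A minimal Python sketch of that arithmetic, using four rows copied from this post (the helper name is made up):

```python
import struct


def rip_relative_target(address: int, raw: bytes) -> int:
    """Target of an instruction whose final 4 bytes are a rel32/disp32.

    The displacement is a signed little-endian 32-bit value relative to the
    address of the *next* instruction (address + instruction length).
    """
    (disp,) = struct.unpack("<i", raw[-4:])
    return address + len(raw) + disp


ROWS = [
    # (address, bytes as shown, target shown by the debugger)
    (0x00007FFDCBCEE7DE, "48 8D 0D 03 BA 02 00", 0x7FFDCBD1A1E8),  # lea rcx,[...]
    (0x00007FFDCBCEE7E5, "E8 F6 DA FF FF", 0x00007FFDCBCEC2E0),     # call rel32
    (0x00007FFDCBCEE7FE, "FF 15 C4 99 02 00", 0x7FFDCBD181C8),     # call qword ptr [...]
    (0x00007FFDE17D4AB0, "FF 25 72 D7 05 00", 0x7FFDE1832228),     # jmp qword ptr [...]
]

for address, hex_bytes, expected in ROWS:
    target = rip_relative_target(address, bytes.fromhex(hex_bytes))
    print(f"{address:016X} -> {target:016X} match={target == expected}")
```

The third row doubles as a sanity check on the call stack: the call at 00007FFDCBCEE7FE is 6 bytes long, so its return address is 00007FFDCBCEE804, which is the address recorded for SDLreal.dll.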
« Last Edit: June 03, 2021, 04:59:12 am by PatrikLundell »

RedDwarfStepper

  • Bay Watcher
Re: DFHack plugin embark-assistant
« Reply #327 on: June 03, 2021, 07:45:04 am »

I'll have to confess I'm really out of my depth here.
My initial hope was that using a debug-version of the plugin would provide more readable call stacks.
But seeing that the last call stack is mostly in Dwarf Fortress.exe itself (for which we don't have any sources) and not in any code associated with DFHack, this probably won't be the case.
Especially if it really is corrupted memory, which, like a booby trap, only triggers after being set and left alone.

So I'd like to suggest an additional/alternative approach, one where I can actually contribute something apart from confusion:
- our current assumption is that the error has its cause in the embark-assistant plugin. As we have 1.5/2 data points and the CTD is very likely to happen when a search with embark-assistant is started I think this is a strong hypothesis.
- my/our (implicit) assumption is that it is caused by code that was added recently (version-wise) as we haven't had any such bug reports previously. Not as strong a hypothesis as the previous one but good enough to work with for the moment.
To prove or refute these assumptions we could create a version of the plugin that corresponds to version 0.47.05-beta1 or perhaps better 0.47.04-r5.
If the error is gone then we are sure that something added after that version is the cause - otherwise we have to go further back.
Rinse and repeat until the error is gone or until it gets very unlikely that embark-assistant is the sole cause.
Then we can slowly start re-adding isolated changes until we get the crash again... so some kind of binary search + divide and conquer on the source code level.
Any thoughts?
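
The search part of the proposal can be sketched in a few lines of Python. This is only an illustration: `first_bad` and `crashes` are made-up names, `crashes` stands in for a manual test run of one build, and the sketch assumes the builds are ordered oldest to newest, the oldest does not crash, the newest does, and the bug stays once it appears. If the oldest candidate already crashes, we'd have to go further back first, as described above.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], crashes: Callable[[str], bool]) -> str:
    """Return the earliest version for which crashes(version) is True."""
    lo, hi = 0, len(versions) - 1   # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(versions[mid]):
            hi = mid                # bug already present: look earlier
        else:
            lo = mid                # still fine: look later
    return versions[hi]


if __name__ == "__main__":
    # Placeholder labels; only the first two are version names from this thread.
    builds = ["0.47.04-r5", "0.47.05-beta1", "commit-A", "commit-B", "master"]
    pretend_bad = {"commit-B", "master"}   # made-up test results
    print(first_bad(builds, lambda b: b in pretend_bad))
```

Each round halves the remaining range, so even a few dozen candidate builds need only a handful of test runs.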

PS: I can probably get a first version done tonight if Deuslinks still is game.

Deuslinks

  • Bay Watcher
Re: DFHack plugin embark-assistant
« Reply #328 on: June 03, 2021, 07:47:37 am »

I'm still game for helping and testing.

PatrikLundell

  • Bay Watcher
Re: DFHack plugin embark-assistant
« Reply #329 on: June 03, 2021, 10:09:41 am »

Well, RedDwarfStepper, you're not the only one whose feet don't reach the bottom...

The regression method seems to be our most likely avenue forwards (or is it backwards), although I don't have much hope. The Embark Assistant triggering the issue doesn't necessarily mean it actually caused it, but exploring what we can is better than giving up, and we still have a willing test subject.