When trying to solve a complex computer science problem faster you generally have 2 approaches:
1. getting a more powerful computing engine and adapting your code to work with it.
2. simplifying the problem
The approach you took is approach 1. Many people think this approach is the best course of action but in my opinion it's not. There is a limit to how fast you can make the program run using multi-core and multi-threading and in the age of mobile computing pushing the CPU to the edge is not the answer.
The best approach is to simplify the problem and the most common approach is divide and conquer . If you replace the problem with a many smaller problem you'll generally find a more efficient solution.
I had this problem with my path finding simulator. The original A* I used did 3 linear searches (O(N)), 1 to find best node, and two to check if the node is in the open or closed set . I replaced the "push" (O(1)) to "insert in order" (O(log N) ) which converted the "search for minimum" from O(N) to O(1). This made it run about 1000 times faster. with 128X128 map max(N)=16384.those 3 linear searchs take about 3*(N(N+1)/2) that means max complexity is 268.5 million iterations. This is reduced to 16384 insert in order which is much less than NlogN = 0.229 million iterations.
Now here is the problem. If you have all those optimizations in and I'm guessing Toady got them all. The problem is still too complicated when you have an max(N) = ~1 million. You go through max(N) every time 19*1million is still a lot . For this reason I generated an overlaying graph of "rooms" which takes about max(N) iterations . It generates room , how they are connected and the access point (I pick the first - not optimal). This reduces the "path doesn't exist " penalty from the number of nodes to the number of rooms. Instead of looking at the "big world" the pathfinding creature only looks at the road they picked . Most of the paths also become trivial where A* performs best case almost every time. This is also less memory intensive should have very good cache performance. Unfortunately it needs more work
because I didn't even start room updates yet .
I admit pathfinding isn't the only thing that's slowing DF down but I can work on 1 suggestion at a time
. I think that performance issues are a pressing concern but I'm hoping that showing a working alternative would make Toady's life easier when he finds the time to work on them.