Really, the weakness in any search algorithm is in the quantity of nodes in your search space. Most of your effort involves (finding and) searching around barriers: The result of an optimal search is really a straight line from edge-of-barrier to edge-of-barrier---minimizing the deflection from the optimal path. I sat down & wrote up an algorithm that does better than raw A* w/ minimal caching. It works by processing the map for "interesting" points---areas where ideally paths must deflect through to obey impassable areas. Then a mapping of line-of-sight connected interesting points is constructed. From this, no further processing is required (but is useful, as pathing maps in DF are read far more often than modified). Paths are generated by first locating all line-of-sight attached interesting points from the start, all line-of-sight interesting points attached to the destination, and then A* from that initial set to destination set. Finally, connect everything by lines. Empirically, I found that ~7-10% of tiles are interesting for my style of architecture.
Picking interesting nodes:
A node is interesting if any one of three things is true:
- It has a diagonal, planar-passable neighbor with which 2 common, planar-impassable neighbors are shared.
- It has a cardinal (NSWE) neighbor which adjoins no more than 1 common, planar-impassable neighbor.
- It has a vertical connection.
- The tile contains a sock.
In essence: A diagonal-only connection between two tiles makes both interesting; tiles at a cardinal (NSWE) of an outer corner (not an inner corner) are interesting; Stairwells are interesting; Socks are important.
This list allows you to determine if a node is interesting simply by looking at its local neighbors, without concern for where those neighbors might connect.
Strictly, the third can be constrained further in the case of NxM staircases where N, M > 2 by not considering anything but the edge staircases to be interesting. That requires more pre-processing effort, however, as now you become concerned with the connection characteristics of your neighbors.
You can then calculate which interesting nodes have line-of-sight to each other, storing that mapping.
Line-of-sight:
A node has line-of-sight to another node if you have some stepped-line algorithm that can route from the start to the end without passing over some impassable tile. (
Bresenham's, for example. But note that this line algorithm is sensitive to which tile is "first" and which is "second". hasLOS(a, b) does not imply hasLOS(b, a))
The algorithm for searching:
- From start location, calculate all interesting nodes that are in line-of-sight (hasLOS(start, interestingx) == true). This is the starting open set for A*.
- Calculate all interesting nodes that are in line-of-sight of the end (hasLOS(interestingy, end) == true. Note: order of arguments is often important). This is the destination set for A*.
- Run a standard A* across the interesting nodes (until you hit one in the destination set). Remember: the hasLOS(interestingj, interestingk) mapping was calculated in advance for all interesting nodes.
- Connect the chain of interesting nodes, as well as the start & end (in their correct spots). Use the line-of-sight algorithm to fill in any missing steps (which is simple step-calculations, as opposed to searching. And in the case of interesting node connections, will be cached).
Caching:
Any call to hasLOS(a,b) can be cached. Such a function should calculate the path & distance if there exists a line-of-sight. Line-of-sight information between interesting nodes
must be stored (and updated if mutations occur, as below). Line-of-sight info for normal nodes
may be cached, and said cache may be dropped in the face of nearby mutation (as opposed to updating)--in addition--if the language supports soft-references, I recommend the cache be setup with that in mind to aid with memory consumption.
Mutation:
If the map needs to be mutated (and this
is DF, after all), you can modify the caches without requiring a full re-processing. Consider the set of modified tiles and their immediate neighbors as the "changed set." Recalculate the interesting status for each of those tiles. If any tile becomes not-interesting, make note, and remove any mappings from other interesting nodes to it. For any interesting node that has a path crossing the changed set and if that path is now invalid, remove the path. If a tile becomes interesting, calculate the paths to-and-from it as in the pre-processing step.
For the above interesting nodes that lost paths but were not themselves in the changed set, they get their new paths when the new interesting tiles in the changed set calculate their own paths. And as a characteristic of the algorithm, if there are paths deleted due to a blocked line-of-sight (but there is still some total path around the obstruction), new interesting points (and their connections) that are made will ensure the "pathing hole" is filled.
For non-interesting nodes that are not in the changed set, but are in line-of-sight with it, you have 2 options: drop their cached hasLOS() information, or attempt to drop any invalid entries. I recommend the former.
Empirical:
Feeding one of my forts in---I use a consistent architecture, so they all look alike---and time-to-path was decreased by ~10x over raw A*. Substantially more if there was no path from start to end. Much of my time seems to being eaten by I/O concerns, and the time-to-cache for hasLOS(non-interesting, interesting). Both are a characteristic of the testing, as I'm choosing random start and end pairs that do not share line-of-sight, and thus for almost every path there is no cached hasLOS(start, interesting) or hasLOS(interesting, end) information. Both of which are O(nm), where n is the number of interesting points, and m is the average distance from either or end to an interesting point. In short, if I had ~7 dwarves standing in my dining hall, all the non-interesting tiles would know which interesting tiles they're connected to within a few dozen frames, and that information won't be changing, especially once the map stops changing nearby. I'd expect ~20-30x improvement over raw A* by then.