Topic: Improving Pathfinding. (Computer sciency) (Read 60904 times)

NW_Kohaku · « **Reply #345 on:** April 26, 2011, 12:29:00 pm »

Quote from: tps12 on April 26, 2011, 09:42:06 am

If "just wait until Toady gets around to it" isn't enough for people, then a real investigation into pathfinding should start with trying to reproduce the existing DF pathfinding on actual maps (via dfhack) and establish plausible test scenarios that demonstrate the slowdowns observed in real games. You can't make any progress on a solution until you know what problem you're trying to solve.

This has been done, however.

There have been threads specifically on testing out FPS results from different build strategies. This is why things like using ramps instead of regular stairs are used by those wanting a little extra edge in FPS. As Phoebus was recently pointing out, the game prefers to look for ramps over looking for stairs. The game will check for ramps before settling on stairs if no ramps are around, and as such, you save the game on 8 failed checks (although you probably will fail a number less than 8 if your spiral rampway requires going "backwards" horizontally in order to move the correct direction vertically, which will happen about half the time in a spiral ramp, but it is still faster than failing all eight diagonals every time before settling on just moving straight up or down).

While I disagree with Phoebus's assessment of what is the "tweak" and what is the "standard", his math is borne out by actual in-game results, and is verifiable.

Likewise, there have been threads which detail testing of how the A* is set up, and how the pseudo-ideal fort shape is a giant cube of up/down stairs that has no walls whatsoever, and allows dwarves to pathfind freely through space. (Making everything ramps would be more ideal, but impossible since you can't stack ramps on ramps.)

The reason why pathfinding takes up so much processor time is because the game needs to randomly access very large amounts of memory in nonsequential (functionally random) order, meaning that the typical pipelining performance boosts generally just burn processor power for no gain. The CPU has to idle while waiting for memory fetches of a huge number of tested tiles in a maze that are stored in very large vectors.

This is why the focus of all of this talk about pathfinding has been on one major issue: accessing memory less often.

This is why nodal pathfinding is faster, because you are capable of bundling many accesses to memory all into one single access to memory.

The fact that nodal pathfinding is faster is entirely verifiable, regardless of how, specifically, Toady implemented some special alteration of A* or not. Simply having to perform less idle times waiting on memory fetching will always result in faster performance, as you are basically getting the computer to make educated guesses about what tiles are going to be more likely to have an optimal result, and as such, no longer have to test every single individual tile by trial-and-error, waiting on a memory fetch for every single one.

Properly implemented hierarchical nodal pathfinding will undeniably result in less memory fetches because you bundle the guesses and data of multiple fetches into single accesses of memory. The actual amount of memory bandwidth the game uses is fairly small, as the game has to wait on each memory fetch sequentially, meaning passing a larger abstract data structure would not have any negative performance consequence due to a memory bottleneck. By passing a node that abstractly contains the data to pathfind through an area consisting of tens or even hundreds of tiles in a single fetch of memory as opposed to tens or hundreds of fetches as the computer blindly trial-and-errors through each individual tile, you are speeding up the pathfinding process by an order of magnitude or two, at the cost of the memory consumed by the data structure, and the processor cost of building and updating the data structure.

These things are not really disputable. What is really disputed is where, exactly, to strike the balance of the costs in return for the benefits. (Or, in many cases, explaining how all these things work.) To say that nothing about the process is knowable is an almost solipsistic approach.

devek · « **Reply #346 on:** April 26, 2011, 12:33:10 pm »

Designations are a separate 16x16 array per mapblock from the pathfinding data, they are for... designations like digging or channeling and such

If it was up to me, that data would be replaced with a designation list but whatever hehe.

Niseg · « **Reply #347 on:** April 26, 2011, 12:34:38 pm »

Quote from: devek on April 26, 2011, 10:50:33 am

Learning and applying this stuff is a good thing. Sadly, most colleges don't teach up to date programming techniques :/ CPUs today are fast, like REALLY fast but memory isn't very fast. When you get it, you'll understand why it is just as expensive to path up one z-level as it is to path out 16 tiles horizontally. Check out this pdf from Sony for example...

http://research.scee.net/files/presentations/gcapaustralia09/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf

It takes a common programming case and improves its performance from 20ms to 3ms with simple reorganization. The case is very similar to a lot of processing DF does

Being an embedded programmer I know this kind of stuff. I can tell you a thing or two about udp packet transmission and the downfalls of classic network stacks

. I am a big supporter of "link list = caching disaster" . I once improved udp transmission from 200us to 20us

.

This is generally why I look at node accesses as my benchmark . everytime you go through a node you don't know what kind of disastrous memory fetches you'll get unless the map is a very small bit vector. just to think that you have to go to check the weights too. Who know what other useless data you are gonna fetch along with it.

This is why a room pather is so great. let assume our map is 192X192 (4X4) with depth 100 . If we choose room size of 16x16 we can divide each Z level into 12X12 rooms . then instead of traversing a maximum of 3,686,400 nodes you'll traverse 14400 nodes. By saving the "edges" you can also make pathing between them cost nothing. You can also apply a smoothing algorithm and alternate paths . and if that isn't enough you can say those 12X12 rooms are 3X3 regions then from 14000 you are down to major regions and pathing inside them is 4X4 then a 16X16 intera-room pathing.You can also make regions cross the boundaries of Z levels.

This can make it possible to embark on 10X10 (provided you got enough memory) with little to no slow down.

on other news

I added room highlighting on my BFS-Collider output

. Now I know I got well defined rooms I can work with... Soon I'll have room pathing and I'll work on gaps.

tps12 · « **Reply #348 on:** April 26, 2011, 02:01:01 pm »

Quote from: devek on April 26, 2011, 12:33:10 pm

Designations are a separate 16x16 array per mapblock from the pathfinding data, they are for... designations like digging or channeling and such If it was up to me, that data would be replaced with a designation list but whatever hehe.

Right here I mean, it looks like traffic is stored as part of the designation data: https://github.com/peterix/dfhack/blob/master/library/include/dfhack/modules/Maps.h#L300

Or maybe I'm misreading.

Quote from: NW_Kohaku on April 26, 2011, 12:29:00 pm

This has been done, however.

...

As Phoebus was recently pointing out, the game prefers to look for ramps over looking for stairs.

...

Likewise, there have been threads which detail testing of how the A* is set up, and how the pseudo-ideal fort shape is a giant cube of up/down stairs that has no walls whatsoever, and allows dwarves to pathfind freely through space.

The part that hasn't been done (to my knowledge) is reproducing the DF pathfinding algorithm separately, so that it can be benchmarked in isolation and used as a baseline for evaluating other approaches. Which is the only way you can demonstrate that specific improvements can have any impact on actual performance.

Because something like this:

Quote from: NW_Kohaku on April 26, 2011, 12:29:00 pm

The reason why pathfinding takes up so much processor time is because the game needs to randomly access very large amounts of memory in nonsequential (functionally random) order, meaning that the typical pipelining performance boosts generally just burn processor power for no gain.

is an entirely empirical question and can't just be assumed.

devek · « **Reply #349 on:** April 26, 2011, 02:21:34 pm »

Quote from: tps12 on April 26, 2011, 02:01:01 pm

Quote from: devek on April 26, 2011, 12:33:10 pm
Designations are a separate 16x16 array per mapblock from the pathfinding data, they are for... designations like digging or channeling and such If it was up to me, that data would be replaced with a designation list but whatever hehe.

Right here I mean, it looks like traffic is stored as part of the designation data: https://github.com/peterix/dfhack/blob/master/library/include/dfhack/modules/Maps.h#L300

Or maybe I'm misreading.

No, you're not misreading *facepalm*. If that is true, then it means DF might be accessing more data than it needs to lol. The designation data for a mapblock is stored in a different memory allocation than the pathfinding data...

I had assumed it was packing all the data it needed for pathfinding into the pathfinding data :/ That still might be the case though, the traffic designations might be stored in both which wouldn't be a problem since its just 2 bits.

Niseg · « **Reply #350 on:** April 26, 2011, 04:17:21 pm »

It's not that bad even with 32 bit per block the weighted A* might cause it some trouble but the use of bit vectors means it should run pretty fast and should fit in cache easily enough.

Anyways I managed to create my first room navigation search based on the BFS-Collider rooms . The way I save the dimensions isn't optimal(x,y,dx,dy,width,height,mask[8]). I should start node exchange soon. My search function is fairly stupid it goes through all the rooms(it doesn't have to it can guess pretty well), find the start and end room , build a paths to their central points and then find the path between rooms( I need to fix that heuristic).

I think room pathing should cover anything the update scheme shouldn't be bad. if you dig a square you need to check which box it's in ,update bit vector or expand the box if needed. I guess I'll do it one step at a time

.

NW_Kohaku · « **Reply #351 on:** April 26, 2011, 09:14:37 pm »

Quote from: tps12 on April 26, 2011, 02:01:01 pm

Because something like this:

Quote from: NW_Kohaku on April 26, 2011, 12:29:00 pm
The reason why pathfinding takes up so much processor time is because the game needs to randomly access very large amounts of memory in nonsequential (functionally random) order, meaning that the typical pipelining performance boosts generally just burn processor power for no gain.

is an entirely empirical question and can't just be assumed.

It's not assumed, it's known that searching very large vector matrices thousands of times takes a lot of memory fetches.

This is also exactly what the others have just been saying:

Quote from: Niseg on April 26, 2011, 12:34:38 pm

This is generally why I look at node accesses as my benchmark . everytime you go through a node you don't know what kind of disastrous memory fetches you'll get unless the map is a very small bit vector. just to think that you have to go to check the weights too. Who know what other useless data you are gonna fetch along with it.

Quote from: devek on April 26, 2011, 10:50:33 am

CPUs today are fast, like REALLY fast but memory isn't very fast.

The trick is to somehow require less accesses to memory because memory is slower than the CPU. And when I say "slower", I mean we are comparing a CPU that can perform billions of tasks in a second to RAM, which can only be accessed thousands of times a second.

You can shove more data through the bandwidth the RAM has, but you need to make the CPU stop and wait for less accesses to RAM, which means you either want to put more of the weight of pathfinding on either having a more complex data structure that takes up extra RAM space, but which requires less fetches because it stores paths, as almost all of these hierarchical data structures that almost everyone else, including Niseg and myself have been talking about.

You can also try to put more of the burden upon the CPU and whatever data you can actually keep in the registers or cache, although I don't know of any trick for manipulating that sort of data that requires finding the solution to a maze without having to look at a tiny portion of the map, and relying purely upon computation, unless you combine it with a very wacky and sophisticated data structure I'm not imagining right now.

Niseg · « **Reply #352 on:** April 27, 2011, 03:56:11 am »

doh my response got erased

.

anyways I said :

- I looked at some of the files and without optimization like a passible map it would be HFS for the CPU.
-Sorting creatures by start and destination may help with caching using compare ={return hur(s1,s2) -hur(d1,d2} .stat from center go up x, go up x then go up x+1 , go down x+2 etc.

I also started writing about my progress so I made a new maze .

results a* 542 visits advanced room pathing 23 node visits.
here are times from chrome timeline in java console (just runtimes which are yellow):

`a*-max(draw)`	`room-min(draw)`	`a*`	`room`	`redraw`
139	7	184	37	40
85	19	130	49	33
86	9	131	39	45
96	12	141	42	30
97	11	142	41	38

Still that A* function I got isn't optimized and doesn't insert nodes in order.

Edit:
After messing with this maze I found out that my A* needs fixing because it isn't pushing far nodes on my room grid , while it should... $:-\$ I should be fixed by tomorrow.

EDIT: FIXED! now I need to add some kind of room spliting and node trading algorithms. someone managed to fool my program to go a silly distance when the destination was around the corner

. I can overcome it by splitting rooms that are way too big and then comes some update scheme ... I can just recreate them a second pass won't be that bad.

tps12 · « **Reply #353 on:** April 27, 2011, 05:20:45 am »

Quote from: NW_Kohaku on April 26, 2011, 09:14:37 pm

It's not assumed, it's known that searching very large vector matrices thousands of times takes a lot of memory fetches.

That part is known. The assumption is that that phenomenon is the main factor determining the performance of the current pathfinding algorithm.

EDIT: Or maybe "hypothesis" is a better word. It's based on reasonable assumptions, and fits previous observations, but has yet to be tested against alternative hypotheses.

kaenneth · « **Reply #354 on:** April 28, 2011, 04:56:33 pm »

So, not being totaly up to date on processor designs, is the time spent waiting on a memory fetch nearly completely idle on modern CPU's? or do other threads whos code/data are in cache get a chance to run? (and only 'hardware' threads, like Intel's 'Hyperthreading', or OS level threads/processes?

I'd love to see some benchmark results for DF on otherwise indentical CPU's except for L1 cache size.

NW_Kohaku · « **Reply #355 on:** April 28, 2011, 05:41:05 pm »

Quote from: kaenneth on April 28, 2011, 04:56:33 pm

So, not being totaly up to date on processor designs, is the time spent waiting on a memory fetch nearly completely idle on modern CPU's? or do other threads whos code/data are in cache get a chance to run? (and only 'hardware' threads, like Intel's 'Hyperthreading', or OS level threads/processes?

I'd love to see some benchmark results for DF on otherwise indentical CPU's except for L1 cache size.

The whole point of threads and "slicing" is to transfer processor power onto another task while waiting for a memory fetch. This is something that has existed for decades. Thing is, DF is not multi-threaded, so all this means is that your processor will spend its time working on all the other programs running on your computer during this time. (And please don't make this a multi-threading argument... there are enough threads on multi-threading already.)

Modern processors have the ability to make guesses about what data will be fetched, and just keep on processing based upon that guess, and then check if the guess was correct or not later on, when the memory fetch comes back... this is why I was talking about the problem of giant vector matrices making the data almost random, though. The guesses are almost never correct, and all that processing based upon an assumption gets tossed out as soon as the fetch comes back, and proves it was a poor guess.

Quote from: tps12 on April 27, 2011, 05:20:45 am

Quote from: NW_Kohaku on April 26, 2011, 09:14:37 pm
It's not assumed, it's known that searching very large vector matrices thousands of times takes a lot of memory fetches.

That part is known. The assumption is that that phenomenon is the main factor determining the performance of the current pathfinding algorithm.

EDIT: Or maybe "hypothesis" is a better word. It's based on reasonable assumptions, and fits previous observations, but has yet to be tested against alternative hypotheses.

Well, if we are agreeing that memory fetch time is a bottleneck, but not that it is the bottleneck, then what it comes down to is if there is something out there that is an even worse bottleneck that causes the CPU to burn tremendous numbers of cycles trying to calculate something when the math is actually fairly simple for a computer.

A* is simple, processor-wise. You generally just need to perform simple addition on a very small set of numbers, and test whether X, Y, and Z distance to destination is positive or negative or zero to determine preferred direction. There's several if-then-else-then comparisons to go with that. It's node accesses that consume time waiting on memory fetches.

It's theoretically possible that Toady did something extremely wasteful in coding how the game crunches numbers in something that is based on A*, but keep in mind that it takes about a million cycles of the CPU to equal one wait for a memory fetch, so it's hard to imagine someone doing that by accident.

tps12 · « **Reply #356 on:** April 29, 2011, 06:57:35 am »

Quote from: NW_Kohaku on April 28, 2011, 05:41:05 pm

Well, if we are agreeing that memory fetch time is a bottleneck, but not that it is the bottleneck, then what it comes down to is if there is something out there that is an even worse bottleneck that causes the CPU to burn tremendous numbers of cycles trying to calculate something when the math is actually fairly simple for a computer.

Exactly. And if you're right about the bottleneck then it's easy to contrive test cases that demonstrate the pathological case. Not only do you confirm that you're solving the right problem, but you end up with a baseline reference against which to test your solutions.

Niseg · « **Reply #357 on:** April 29, 2011, 03:13:09 pm »

I'm happy to inform everyone I just wrote a non path caching 2 layer A* (it's called " advanced rooms no path cache") that simplify pathfinding cases to its most trivial cases. Room shapes and handling need a lot of work but it's not as important because it works with any shape room. It's also a very simple program. the only complexity in it is A* and it's changes (not that bad either)

In my early example A* finds a 110 length path visiting 813 nodes (it's really lost). My algorithm finds it in 203 this may sound bad compared to path caching but the big difference is that it finds to finding 13+1 paths with an average length of about 8 steps . The "destination unreachable" penalty is also really low about 16 node visits ( I got 16 rooms ).

The way it works is simple : it uses BFS collider to find rooms and saves their shape (square + bit vector). The collider also save the first collision point of the path ( I think I need to move it but it's not important) . The info about the destination id now also saves the collision point which means the room know at what coordinate is closest to the other room.
After building all the rooms all it does to get from S to D is find the path between the rooms

S and D exist in and then it pushes S to the output_path. There is a loop that looks for a path from the end of the output_path to the next room's access point then it concatenate the path to the output_path but it slice the last element for it.

With this system all a mob need to do is save the overlaying path and then calculate paths as it go from room to room . It's so trivial it runs as fast as the room pather with the caching.

Be aware though - you we have to recreate rooms whenever you change the maze. As I said I'm not updating the room structure when the maze is changed yet. It may still work if you keep access points between rooms.

you can check other examples it should work you may get unusual results because I just save the first access point and doesn't consider others in any way.

EDIT: I'm marking the subpathes now with 'O'
you can see it here

Spoiler (click to show/hide)

devek · « **Reply #358 on:** May 01, 2011, 03:01:17 am »

Creating nodes(waypoints) over 2d space is pretty trivial, but you have to do it in 3d space.

Once you're done, you might as well be generating vectors. With a mesh you can do things like create monsters that take up more than one tile. Imagine a dragon that can't get in your fort because it is too tall. Imagine siege engines that can shoot downwards. Imagine a dwarf grabbing a nearby stone instead of crossing the entire map.

I brought this up once but you shot it down because it had to be generated and changed each time the map was modified, now that is exactly what you're talking about

Niseg · « **Reply #359 on:** May 01, 2011, 07:59:02 am »

Quote from: devek on May 01, 2011, 03:01:17 am

Creating nodes(waypoints) over 2d space is pretty trivial, but you have to do it in 3d space.

doing this in 3d space is trivial it's also not that necessary because you can keep 2d space per level and keep looking per z level . It might be a lot of memory but it's not too bad. I can convert my simulator to work in 3d but the transitions between Z levels would need a modification to alow it to know if there are "stairs" or something like that in the node.

The no cache algorithm acts as a guide to tell A* where to go .

Quote from: devek on May 01, 2011, 03:01:17 am

Once you're done, you might as well be generating vectors. With a mesh you can do things like create monsters that take up more than one tile. Imagine a dragon that can't get in your fort because it is too tall. Imagine siege engines that can shoot downwards.

The width can be handled easily by saving multiple access points and their width. I'm not sure A* can handle width anyways.

Quote from: devek on May 01, 2011, 03:01:17 am

Imagine a dwarf grabbing a nearby stone instead of crossing the entire map.

Nearby stone is simplified by this algorithm because if you need to find a path for each stone you will pay a smaller penalty for unreachable objects.

Quote from: devek on May 01, 2011, 03:01:17 am

I brought this up once but you shot it down because it had to be generated and changed each time the map was modified, now that is exactly what you're talking about

The fact the my map has to be regenerated with each modification is because I didn't finish the update scheme yet. It's not that bad.

If I know all the nodes that change I can check each room to see if something changed. I still need to fix my collider so it limits the depth correctly. be aware that you can still change the maze without updating as long as you don't change the access points. There are many ways to implement an update scheme I"m just starting it.

About meshes .. I can convert my rooms into meshes using some interpolation scheme or something maybe after I split them up( or find the shapes using another way). I'm not sure what benefit would come of this because A* is not that bad when the distances are really small. I should add another monitor to A * which would say how many neighbors were pushed to the queues . this would show you the insanity in super long path A*.

I can also add a traffic zone handling to the no cache room pathing algorithm. It can look at traffic zones when doing sub-paths (it's really trivial). The update scheme is a more important part of the simulation at the moment and after that's done it would be a pretty good alternative for super long A* paths. The solution I'm presenting is a simple application of the "divide and conquer" approach.

News:

Author Topic: Improving Pathfinding. (Computer sciency) (Read 60904 times)

NW_Kohaku

Re: Improving Pathfinding. (Computer sciency)

devek

Re: Improving Pathfinding. (Computer sciency)

Niseg

Re: Improving Pathfinding. (Computer sciency)

tps12

Re: Improving Pathfinding. (Computer sciency)

devek

Re: Improving Pathfinding. (Computer sciency)

Niseg

Re: Improving Pathfinding. (Computer sciency)

NW_Kohaku

Re: Improving Pathfinding. (Computer sciency)

Niseg

Re: Improving Pathfinding. (Computer sciency)

tps12

Re: Improving Pathfinding. (Computer sciency)

kaenneth

Re: Improving Pathfinding. (Computer sciency)

NW_Kohaku

Re: Improving Pathfinding. (Computer sciency)

tps12

Re: Improving Pathfinding. (Computer sciency)

Niseg

Re: Improving Pathfinding. (Computer sciency)

devek

Re: Improving Pathfinding. (Computer sciency)

Niseg

Re: Improving Pathfinding. (Computer sciency)