Bay 12 Games Forum

Please login or register.

Login with username, password and session length
Advanced search  
Pages: 1 2 [3] 4 5

Author Topic: Hardware and Game Performance  (Read 9766 times)

Pseudo

  • Bay Watcher
    • View Profile
Re: Hardware and Game Performance
« Reply #30 on: December 10, 2015, 10:16:27 am »

Thief, there are incremental pathfinding algorithms that work with changing geometry. For instance, D* lite.

If you want to get fancy, there are even incremental pathfinding algorithms that work with changing geometry and changing targets! (MTD* lite, for instance)
Logged
The lady the dog the paper the boy writ hit bit knit.

English is weird.

miauw62

  • Bay Watcher
  • Every time you get ahead / it's just another hit
    • View Profile
Re: Hardware and Game Performance
« Reply #31 on: December 10, 2015, 11:12:20 am »

so if a dwarf is pathing he does not get hungry, sleepy, bleed, etc?
it would still apply, the only thing it does is dump the time it takes to come up with a path onto another thread, and that time is translated into that specific dwarf literally waiting (hence the "paused") for an answer. its not like hes frozen in time
you are not processing the dwarf for however long it takes to compute a path, so they would not get hungry. hungry is just an arbitrary example. say that a dwarf ticks down nutrition every tick, and if it gets too low the dwarf generates a "path to food" task. how would you resolve this?

multithreading in large oo projects is a huge complicated mess because everything affects everything else and you have to carefully take apart things so they dont blow up. saying that multithreading df would be simple is just ignorant.
Logged

Quote from: NW_Kohaku
they wouldn't be able to tell the difference between the raving confessions of a mass murdering cannibal from a recipe to bake a pie.
Knowing Belgium, everyone will vote for themselves out of mistrust for anyone else, and some kind of weird direct democracy coalition will need to be formed from 11 million or so individuals.

Pseudo

  • Bay Watcher
    • View Profile
Re: Hardware and Game Performance
« Reply #32 on: December 10, 2015, 12:26:30 pm »

Ah. You are mistaken as to what exactly gets dumped onto another thread. It's just the pathfinding, not everything.

Roughly speaking, each dwarf has a slot for a path, and a slot for the current target. You replace any instance of pathfinding being called on the main thread with updating the current target, and if the target changed, scheduling a pathfinding request for the dwarf and invalidating the path. Oh, and when the dwarf would move, check if the path is valid, and if not, don't move. There are some subtleties (e.g. is it worth it to allow pathfinding requests to be cancelled?), but it's doable.

This does have the disadvantage of causing pathing to become dependent on current lagginess, but meh. DF isn't exactly deterministic to start with.

The bigger problem is overhead.
Logged
The lady the dog the paper the boy writ hit bit knit.

English is weird.

Thief^

  • Bay Watcher
  • Official crazy person
    • View Profile
Re: Hardware and Game Performance
« Reply #33 on: December 10, 2015, 12:32:37 pm »

Thief, there are incremental pathfinding algorithms that work with changing geometry. For instance, D* lite.

If you want to get fancy, there are even incremental pathfinding algorithms that work with changing geometry and changing targets! (MTD* lite, for instance)
Sure, but they generally assume that the geometry is not changing in the middle of the pathfinding code actually running. They run in discrete increments, with the changes expected to happen between those increments. That's still no good for threading without cloning the world, which is most likely hilariously expensive for DF.

Professional games coder here, remember :P
« Last Edit: December 10, 2015, 12:34:54 pm by Thief^ »
Logged
Dwarven blood types are not A, B, AB, O but Ale, Wine, Beer, Rum, Whisky and so forth.
It's not an embark so much as seven dwarves having a simultaneous strange mood and going off to build an artifact fortress that menaces with spikes of awesome and hanging rings of death.

Pseudo

  • Bay Watcher
    • View Profile
Re: Hardware and Game Performance
« Reply #34 on: December 10, 2015, 01:01:15 pm »

Eh, that's fairly simple to overcome. The main thread wants to change the world, it alerts the pathfinding threads that it wants to, waits for them all to get to a safe point, then modifies the world, then tells them to resume.

Alternatively, main thread has a(n atomic) counter. Every time the main thread changes the world (in such a way it affects pathfinding), it increments the counter. The pathfinding threads check the counter at the start and end of each increment, and if it has changed, restarts the increment. It's actually a little more complex than that (you need memory barriers and two counters, one incremented before the change and one after, lest you get hit with race conditions), but that's the general idea. Can fall back to the above if threads abort and retry too often.
Logged
The lady the dog the paper the boy writ hit bit knit.

English is weird.

Miczu

  • Escaped Lunatic
    • View Profile
Re: Hardware and Game Performance
« Reply #35 on: December 10, 2015, 04:59:20 pm »

I wounder if it would be smarter to just expose an interface for path finding and hope the community brute force the optimal solution.
It probably would require Toady One to do necessary changes to allow swapping the path finding library without requiring to recompile the game or at least separate all path finding calls to that library.

There could be at least 4 possible results from it:
1. Community doesn't create anything better that we already have - Toady One would sadly waste time providing the back door for this. Time invested in it should be still significantly less then learning the complexities of multi threading and/or redoing the path finding.
2. Community implements more efficient single threaded path finding algorithm - overall win, probably good ratio of invested time to performance increase.
3. Community reuses existing path finding, but builds multi threading around it - big win, most modern cpu's have multiple cores, so it would directly increase raw computation power available for DF.
4. Community implements different path finding with multi thread capability - big win (as in 2nd and 3rd result).

Don't think anyone is against modding, as there are many great projects that already change or hack into inner working of dwarf fortress (DFHack).
Possible gain probably outweigh initial cost and the risk (assuming the path finding is in fact big bottleneck in the simulation frame rate).
Hopping for community to do something could possibly cause bad taste, but if community is willing and taking the first step in this, maybe this isn't as bad.

I'm myself a software developer, but I'm not ignorant enough to say I would be the one that will implements best (or even as good) path finding module (limited free time and experience).
But if it was possible and entry to try not to high, I would definitively try my strengths (seeing dwarfs running into or through wall could be fun enough). Guessing there would be others with similar attitude.
Logged

BoogieMan

  • Bay Watcher
  • Hi
    • View Profile
Re: Hardware and Game Performance
« Reply #36 on: December 10, 2015, 08:11:28 pm »

When a unit chooses to move, what exactly is the process - put simply? Does it do a scan from the unit's current location to the destination and then calculate a path and begin? How often does it recheck for obstructions or alternate routes?

Would something like taking a periodic snapshot of the layout of the map and then telling the unit something like East 12, South East 4, East 3, North 12, and then West 3 to get to the destination, and only ask for help if something occurs that prevents the instructions from being followed exactly?

Like if the pathing, at least a times, isn't the unit "using it's senses" and is more like a blind unit being told X number of paces in what direction to arrive? A file that could be written that is sort of a map of available pathing and how it all links up and it updated when certain things are done cause a change, such as digging, placing objects, etc.

Of course I have no idea how it works so this could just all be silly nonsense.
Logged
(╯°□°)╯︵ ┻━┻ BoogieMan, Forumscrub cancels tantrum: Seeking Dr. Pepper

Pseudo

  • Bay Watcher
    • View Profile
Re: Hardware and Game Performance
« Reply #37 on: December 10, 2015, 09:04:01 pm »

That's actually pretty close to how it currently works.

Currently when a unit wants to move somewhere, it checks to make sure that the start and destination are in the same connected component (more on that in a moment). If it is, it calculates the path and stores it. (if they aren't it knows that there is no path.) Then every tick it replays from the stored path. If the creature tries to move into an invalid square (e.g. door closed, etc) it waits for a few ticks then calculates a new path. (Large amounts of lag from creatures in weird situations (like the former "pets through tightly closed doors" one) is often due to DF thinking that there is a path but being unable to actually find one (hence "path fail" errors in DF's errorlog) - every time it paths but there isn't actually a path it ends up scanning every connected bit of the entire map!)

The connected components thing is a little weird, but just think of it like this. Every tile has an identifier indicating what connected component it is in. When a tile is made pathable, it sets all adjacent connected components to the same ID. When a tile is made impassable, it floodfills each adjacent tile to a new ID. It's a little more complex than that, but it's just a little optimization (avoiding the entire floodfill unless it actually needs to), nothing that changes the core of the idea. (This is why a door / drawbridge rapidly opening and closing to a large dead-end space can cause a lot of lag, by the way.)

And, interestingly enough, although there are pathfinding algorithms that have "sort of a map of available pathing and how it all links up", in practice it isn't really worth it, weirdly enough.

I suspect that Toady could make vast improvements to pathfinding if he decided to. But it doesn't make a lot of sense to do so until you've figured out how things move! (For instance, jumping and climbing changes pathfinding more than it would first appear.)

Oh, and a thought. Anytime D* lite is a thing (roughly speaking, a variant of D* lite where you can stop at any time and get a (potentially suboptimal) route. Could be interesting to go "every dwarf has 100 pathfinding steps / tick" or something.
Logged
The lady the dog the paper the boy writ hit bit knit.

English is weird.

guessingo

  • Bay Watcher
    • View Profile
Re: Hardware and Game Performance
« Reply #38 on: December 10, 2015, 09:16:21 pm »

What is involved in converting DF from 32 bit to 64 bit? Isn't it just a recompile with a 64 bit compiler?
Logged

guessingo

  • Bay Watcher
    • View Profile
Re: Hardware and Game Performance
« Reply #39 on: December 10, 2015, 09:18:55 pm »

You have to remember that Dwarf Fortress is essentially written in C. (Yes, I know it's actually C++, but it's still paradigmatically C and a lot of the base code was written in C proper.) That means manual thread management. In Java you can lock an object and all the threads that want to access it will just wait; in C, you've got to tell each thread that might try to use the thread-unsafe thing that it needs to check and see if it's allowed to do the thing.

That said, I wish Toady would go ahead and bite the bullet. The longer he puts it off, the harder it's going to be.
boost::threads would do the job nicely with very little learning.

People should really understand the following though: multithreaded code is not that hard to design or write (although it's quite tricky to retrofit, even in a frame-based game). It is, however, a massive pain to debug compared to single-threaded code. Quite a lot of multithreaded programs spend a good portion of their development phase as single-threaded, especially while new features are being developed. Otherwise every problem takes at least twice as long.

Right now, my adventures are being cut short when I run into a tunnel full of horses. Unless I buy a 3600-core CPU to handle the lag, that's where the effort is rightly focused.

It says all about the alpha on the screen you ESCed past.

Toady has said he is concerned about switching to threading due to complexity and that it will take a long time. He mentioned that if he makes the game multi threaded it could be a long time between releases and the release won't add features. He mentioned that he is worried that donations may taper off if that happens. I can understand why he would be so worried.
Logged

BoogieMan

  • Bay Watcher
  • Hi
    • View Profile
Re: Hardware and Game Performance
« Reply #40 on: December 10, 2015, 09:40:42 pm »

I can understand something being yours, but sounds like the best option would be to contract that part out or take a part timer who just gets a flat rate or something. Or trust that enough of us would stick around.

Hasn't it already been like upwards of two years for a release before? Would it really be worse than that? More features is always good, but I don't know about all of you but more of my forts die do to performance problems rather than anything else. It can get just too slow to bear. So then you restrict yourself to increase performance, essentially ignoring features for fear of the extra load. I've already changed my playstyle away from nifty aesthetics and to more plain efficient design, smaller embarks, less use of flowing water, magma, fewer animals/pets.. Some cities in adventure mode are impossible to navigate due to insane lag.

When features aren't utilized to try and gain performance, maybe new features isn't always the way to go. I think more information would be needed to form an educated opinion on it though. A tricky situation for sure.

Still doesn't change the fact that Dwarf Fortress is one of my favorite games of all time. :)
Logged
(╯°□°)╯︵ ┻━┻ BoogieMan, Forumscrub cancels tantrum: Seeking Dr. Pepper

athenalras

  • Bay Watcher
    • View Profile
Re: Hardware and Game Performance
« Reply #41 on: December 11, 2015, 03:17:34 am »

Regardless of what Toady's fears concerning donations and an active audience, this is a very real problem that Toady will have to face sooner or later. Here's my short list:

- Technology is not built around Dwarf Fortress
- Donations from the general, 'average' populace
- Each subsequent DF release sees a slower performance
- Toady is working on a 64-bit release
- Toady works alone...sometimes with his blood relative
--------------------------------------------------------------------------------------------------------------------------------------------------
In a macro-sense, there are two paths that Toady can choose:
1. Retrofit for multi-threading and utilizing multiple cores
2. Choose not to retrofit for multi-threading and utilizing multiple cores

What Toady fears with going with choice 1.
[Very long time to retrofit] -> [much less frequent updates] -> [Fewer audience] -> [Fewer donations] -> [?]

Problems I see with going with choice 2.
[More frequent yearly, bi-annual or centennial releases] -> [Slower performance per subsequent release] -> [DF becomes near unplayable] -> [Fewer audience] -> [Fewer donations] -> [?]

Putting aside the hilarity of the linearity of the flowcharts above, what I'm trying to say is that without changing any of his work ethics, Toady and DF are likely to encounter the same ending.

That said, I might understand some rationale behind not releasing any of his coding being about intellectual property. Unfortunately the concept of the protection of intellectual property never managed to establish a firm foothold in my head but I do imagine there may be other ways to conscript a helping hand for either programming or to help maintain public relations while DF undergoes modernization.
Logged

Thief^

  • Bay Watcher
  • Official crazy person
    • View Profile
Re: Hardware and Game Performance
« Reply #42 on: December 11, 2015, 04:06:25 am »

Eh, that's fairly simple to overcome. The main thread wants to change the world, it alerts the pathfinding threads that it wants to, waits for them all to get to a safe point, then modifies the world, then tells them to resume.

Alternatively, main thread has a(n atomic) counter. Every time the main thread changes the world (in such a way it affects pathfinding), it increments the counter. The pathfinding threads check the counter at the start and end of each increment, and if it has changed, restarts the increment. It's actually a little more complex than that (you need memory barriers and two counters, one incremented before the change and one after, lest you get hit with race conditions), but that's the general idea. Can fall back to the above if threads abort and retry too often.
With the former I'd worry that the locks become a bottleneck, particularly given that some parts of the main thread are doing nothing but world updates (e.g. the water/magma simulation step). That said, I imagine large amounts of the main thread aren't doing updates, so you may well be able to get away with locks.

The latter is hilariously dangerous, changing the world while the algorithm is running on another thread could cause all sorts of crashes if the data it reads contains pointers or indices and you get partial writes/reads (e.g. due to unaligned data) or stale reads. Having an atomic counter that tells you the world has changed is no good if the code has already crashed!

I think personally the best approach would be an incremental algorithm, or even just terminating the pathfinding after a fixed amount of exploration to avoid the terminal case of flood-filling the entire map with the pathfinder.

Another possibility that may work well would be to queue all pathfinding requests and then execute them across multiple threads at a single point of the main thread. No concurrent updates of the world data from the main thread, and pathfinding still gets a lovely speed-up. The pathfinding itself wouldn't modify the world data, and read-only sharing is perfectly safe :)

If pathfinding's really that big of a bottleneck, you could make decent gains from that.

What is involved in converting DF from 32 bit to 64 bit? Isn't it just a recompile with a 64 bit compiler?
That, and fixing any assumptions in the code about the size of data pointers. e.g. replacing a hardcoded "* 4" when allocating memory with "* sizeof(int*)".
Logged
Dwarven blood types are not A, B, AB, O but Ale, Wine, Beer, Rum, Whisky and so forth.
It's not an embark so much as seven dwarves having a simultaneous strange mood and going off to build an artifact fortress that menaces with spikes of awesome and hanging rings of death.

guessingo

  • Bay Watcher
    • View Profile
Re: Hardware and Game Performance
« Reply #43 on: December 11, 2015, 07:10:27 am »

If you want Toady to make a multi-threaded release. Create a kickstarter in his name and raise a bunch of money. Right now DF raises between $45,000 - $60,000/year. This is pay without benefits and split between 2 people. He also has expenses to cover for his website, etc... This is far less than an entry level developer makes.

Raise $100k for Toady under the condition that he make a multi-threaded release then get back to him. Cut the guy some slack. Financially he is barely getting by doing this.
Logged

Pseudo

  • Bay Watcher
    • View Profile
Re: Hardware and Game Performance
« Reply #44 on: December 11, 2015, 08:29:04 am »

What is involved in converting DF from 32 bit to 64 bit? Isn't it just a recompile with a 64 bit compiler?
In theory, yes. In practice, a whole lot more than that.

Among other things:

  • Hidden assumptions in pointer arithmetic
  • Sudden potential over/underflows due to lengths (array and otherwise) being stored as 32-bit values
  • Casting pointers / pointer offsets to 32-bit values
  • Hidden assumptions about sizes of structs
  • External libraries that do any / all of the above
  • Other hisenbugs that become more common due to compilation differences

Most of these are (relatively) easy to avoid, if you are trying to avoid them from the start. But it's a whole lot harder to retrofit into an existing codebase, especially as many of the above aren't always reproducible.

With the former I'd worry that the locks become a bottleneck, particularly given that some parts of the main thread are doing nothing but world updates (e.g. the water/magma simulation step). That said, I imagine large amounts of the main thread aren't doing updates, so you may well be able to get away with locks.
You should, of course, tune the number of operations / amount of time between grabbing and releasing the lock. E.g. grab the lock, do all water / magma / cavein updates, then ungrab the lock.

The latter is hilariously dangerous, changing the world while the algorithm is running on another thread could cause all sorts of crashes if the data it reads contains pointers or indices and you get partial writes/reads (e.g. due to unaligned data) or stale reads. Having an atomic counter that tells you the world has changed is no good if the code has already crashed!
Of course it'd crash if you potentially followed invalid pointers, and I am aware of amusements with partial / stale / unexpected reads. I wasn't explicit, whoops. The pathfinding threads never follow (those) pointers. You have an array of bits (one / tile / passibility type) indicating passibility of each tile. The main thread (atomically) increments counter A, does a store fence, does (an) update(s) to the passibility array, then does another store fence, then (atomically) increments B. (If B wraps, you also need to do a store fence here at the end as well.) The pathfinding threads (atomically) load counter A, then issue a load fence, then work with the passibility array, then issue another load fence, then (atomicially) loads counter B and retries if they aren't the same.

Give or take, at least. I'd have to sit down to actually figure out the exact memory barriers required. (All that's required is that the write to A is visible before any writes to the passibility array, and the write to B is visible only after said writes. And ditto for pathfinding but for reads.) And "atomic" here for incrementing just means "no intermediate values can be visible". So more like a read of <value>, then an atomic set to <value + 1> than an actual atomic increment. (As only one thread is setting it anyways.)

I think personally the best approach would be an incremental algorithm, or even just terminating the pathfinding after a fixed amount of exploration to avoid the terminal case of flood-filling the entire map with the pathfinder.
Agreed with incremental. With termination - again, the connected components thing prevents that worst-case. The problem is that bugs happen, and occasionally a bug gets through that makes the connected components not match up with actual pathfinding.

Another possibility that may work well would be to queue all pathfinding requests and then execute them across multiple threads at a single point of the main thread. No concurrent updates of the world data from the main thread, and pathfinding still gets a lovely speed-up. The pathfinding itself wouldn't modify the world data, and read-only sharing is perfectly safe :)

If pathfinding's really that big of a bottleneck, you could make decent gains from that.
It'd certainly be a whole lot more simpler than the full version. I worry about the overhead, however. 60FPS is ~16.7ms / frame.

Actually, there are a number of things you could do in that same manner. For instance, power calculations and cavein detection (not the actual cavein itself). Maybe even creature decisionmaking. Although I wonder how much of a bottleneck those are.

Logged
The lady the dog the paper the boy writ hit bit knit.

English is weird.
Pages: 1 2 [3] 4 5