Bay 12 Games Forum
Dwarf Fortress => DF Suggestions => Topic started by: Magnani on October 19, 2020, 06:38:56 am
-
Hello, I'm an IT student and I'm into HPC and multi-threading.
I don't participate much in forum and social activities, because I have no time, but today I made the effort and searched a bit into WHY Dwarf Fortress has no MT support.
In the end (let me write a brief summary) there are two main reasons:
-Toady One is not used to parallel code and debugging;
-refactoring the code isn't worth the effort, because it will likely be rewritten and modified anyway.
I'm aware of all of this, so I'm just wondering: why not do a harmless, minimal parallelization?
The OpenMP framework allows exactly that, with low effort and minimal code modification (almost no modification at all, in fact: just a little #pragma line).
The very first examples we faced at university were about for loops: with the single line #pragma omp parallel for, threads are created, work is assigned, and performance rises!
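To illustrate what I mean (a minimal sketch of the technique, not actual DF code; the function name and the loop body are made up):

#include <cstddef>

// Hypothetical per-tile update; each iteration writes a different
// element, so there are no loop-carried dependencies.
void update_temperatures(float *temps, std::size_t n)
{
    // The one added line: OpenMP splits the iterations among threads.
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i)
        temps[i] = temps[i] * 0.99f + 0.01f;
}

Compile with -fopenmp (GCC/Clang) or /openmp (MSVC); without the flag the pragma is simply ignored and the loop runs serially, which is exactly why this approach is low-risk.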
Toady would not have to spend months and months on the black magic of fine-tuned parallelism (and believe me, it's REALLY HARD to master), nor learn advanced concepts of parallel coding, but just find ordinary loops without dependencies and add a single pragma line.
I think this would be a "minimum effort" approach with a significant gain, since I GUESS (I haven't seen the code) there are a lot of loops that could benefit from it.
Sorry for any errors in my first post and for any possible repost, but in my search I didn't find this specific suggestion.
Losing is fun! Greetings to everyone from Italy!
-
Toady mentioned in a DF Talk or an interview or something that OpenMP had been suggested to him, and I think he even mentioned putting a few #pragmas here and there relating to it.
Personally, for the most part I wouldn't use #pragma omp parallel for, because C++17 has features that replace it, and Microsoft's compiler is fully up to date on those. Stuff like:
std::for_each(std::execution::par_unseq, v.begin(), v.end(), [](auto &arg) { /* stuff */ });
Which tells the compiler that the work in the loop may be done in parallel, and interleaved, such that e.g. thread 1 can be given jobs #3, 7, and 25 rather than #1, 2, and 3. The implementation is also free to ignore the hint, mind.
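Spelled out as a self-contained sketch (the vector and the operation are placeholders of my own, not anything from DF):

#include <algorithm>
#include <execution>
#include <vector>

int main()
{
    std::vector<float> v(1'000'000, 1.0f);
    // par_unseq: iterations may run on several threads AND be
    // vectorized/interleaved within a thread, so the body must not
    // take locks or touch shared state.
    std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                  [](float &x) { x = x * 0.99f + 0.01f; });
}

One practical note: MSVC ships the parallel algorithms out of the box, while libstdc++ (GCC/Clang) delegates them to Intel TBB, so there you typically need to link with -ltbb.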
"performance rises!"
(note: performance does not rise unless the work done is greater than the overhead of parallelization. This is usually approximately 2 microseconds, which, yeah, does indeed mean that performance is likely to be gained here and there. The programmer is fully responsible for any synchronization issues parallelization may cause.)
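To make that last caveat concrete: a loop that accumulates into a shared variable becomes a data race the moment you parallelize it naively, and the safe idiom in the same C++17 style is a parallel reduction (again just a sketch):

#include <execution>
#include <numeric>
#include <vector>

// Doing `sum += x` inside a parallel for_each would race on `sum`.
// std::reduce instead lets the library form per-thread partial sums
// and combine them safely.
double total(const std::vector<double> &v)
{
    return std::reduce(std::execution::par, v.begin(), v.end(), 0.0);
}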