Bay 12 Games Forum

Author Topic: Advanced Tweaking (Warning: !!SCIENCE!!)  (Read 5809 times)

janglur

  • Bay Watcher
  • +Blood Soup+
    • View Profile
Advanced Tweaking (Warning: !!SCIENCE!!)
« on: September 25, 2010, 06:39:31 pm »

Part 1:

So I have been doing a LOT of research into what DF seems to prefer in terms of hardware and software settings, and I've uncovered a few tips to squeeze that extra 1% out when you really need it.
I am only testing with Vanilla DF, the standard map generation settings, nothing changed or disabled.  This is purely a test for research on the PC and hardware itself, not modding DF.

The Threads Dwarves Weave:
As we all know, DF is largely single-threaded, which has led to a lot of myths about its performance that are worth studying.  What I've discovered for tweaks are as follows:
DF enjoys being dedicated to a core.  Simply saying 'Stick to core 2' will improve performance a good 5-10%, though this is largely noticeable with pathfinding and only at already high (100+) FPS.
This can be enhanced further by moving every other system process and application (with Windows Task Manager or, better, Process Explorer) to a different core, leaving DF its own dedicated CPU and cache.  This gives another 5-20%, depending wildly on how much crap you actually have running, how much cache your cores have dedicated, and the phase of the moon.
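
For anyone who would rather script the pinning than click through Task Manager, here is a minimal sketch in Python. It only works out the affinity bitmasks (one bit per core, core 0 being the lowest bit). The function names and the four-core layout are made up for illustration, and the `start` line in the comment assumes Windows cmd and the stock executable name.

```python
def affinity_mask(cores):
    """Return a bitmask with one bit set per 0-based core index."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indexes are 0-based and non-negative")
        mask |= 1 << core
    return mask


def split_cores(total_cores, df_core):
    """Give DF one core and hand every other core to the rest of the system."""
    if not 0 <= df_core < total_cores:
        raise ValueError("df_core must be one of the machine's cores")
    others = [c for c in range(total_cores) if c != df_core]
    return affinity_mask([df_core]), affinity_mask(others)


if __name__ == "__main__":
    # Hypothetical quad-core machine with DF pinned to the last core (index 3).
    df_mask, other_mask = split_cores(total_cores=4, df_core=3)
    print(f"DF mask:     {df_mask:X}")
    print(f"Others mask: {other_mask:X}")
    # Windows cmd takes the mask in hex, for example:
    #   start "" /affinity 8 "Dwarf Fortress.exe"
```

The masks are plain integers, so the same numbers can be typed into any tool that accepts an affinity mask.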

Stockpiling Data:
Based on the above, a logical theory is that having more dedicated cache would improve performance.  I tested this theory by setting the L3 cache allocation to 'All Cores' and 'DSP only'.  DSP Only is a setting meaning only the main core (core0) has access to the L3 cache.  Theoretically, in my system, this means an extra dedicated 6 MB of cache for DF.  However, my testing showed quite the opposite result:  FPS actually dropped by a tiny 2-4% margin on initial embarks, and then became quite even after the '100,000 stone' test.  This suggests that allowing the L3 to be used by other cores is better.  I hypothesize this is because what can't fit into L1 and L2 goes to RAM if the L3 is not accessible, creating more cache competition in RAM.  More on this later.

Gangs and Undead Gangs:
I also tested whether Ganged and Unganged memory modes (on 2 sticks of Dual Channel memory) would work better.  First, an explanation.  Ganged and Unganged refer to CPUs with multiple onboard memory controllers. Namely, my Phenom II has two dedicated memory controllers.  These settings refer to how they are accessed.  Unganged means that your memory is effectively divided in half, and each of the controllers address it on a 64-bit lane.  So, 2x64bit.  Ganged means they work together, on a single 128-bit lane.  What this means in performance is that Unganged typically benefits multi-threaded tasks or extreme multitasking, while Ganged benefits single large transfers or single-threaded heavy duty apps like Dwarf Fortress.  My testing concluded almost no gain on initial embark, but a major gain in ultra-large (12x12+) maps and cluttered (100,000+ stones) tests.  This may be also due to RAM competition.  But Ganged is definitely better, by a 5-15% margin in large or older fortresses.

Lost but Not Forgotten:
Interestingly, although DF is single-threaded, it relies on hardware to perform many tasks.  And much of your hardware depends on the CPU.  As such, overclocking only the DF core is not the best solution:  Overclocking the not-in-use cores also helps!  Depending on the difference, it can be as much as a 60% FPS boost!  (This was tested with a 3.4 GHz core [DF dedicated] and two 100 MHz cores to save power and reduce heat.)  FPS suffered horribly with the other cores set to power save modes, so disabling Cool'n'Quiet or other features will help, as will synchronous overclocking.  This has the most effect on pathfinding and doesn't seem to affect clutter or map size speeds much.

RAM Competition:
The major killer of fort FPS for me has not been pathfinding, catsplosions, or even liquids, but in fact clutter!  Watching the resource usage of DF over the course of a fort's 10 years, massive stone stockpiling (and crafts, and food, and waste, etc.) always crippled the system irrecoverably.  This stems from simple RAM usage:  Not only does RAM usage increase from the much higher number of entries (though not as drastically as I expected), the number of read/writes, especially reads, on RAM increased astronomically.  It went from only a few tens of thousands a second (a few MHz) to a full bottleneck at well over 1 billion a second (Full 1066 MHz and up).  THIS is what has been behind the issue.  So I began testing RAM.  The first thing I noticed was that increasing the RAM's frequency, such as by overclocking, made a fairly linear improvement.  A 5% increase in speed resulted in a 5% increase of reads, and about 1% in FPS.  However, I then clocked to 800 MHz and dropped the timings from 5-5-5-15-30-2T to 4-4-4-12-26-2T, and the increase was much more noticeable.  While the reads/sec went down, the FPS went up!  I'm not entirely sure how this works, logically, and the drop from 1066 to 800 did cause overall damage to FPS.  So I went for a middle ground.  Attempting to keep the FSB/NB, HT and CPU at 2.8 GHz, I clocked the RAM to 917 MHz at 4-4-4-14-28-2T timings, and got the best performance!  It beat both 800 MHz at 4 and 1066 MHz at 5.  This suggests that DF needs not only hefty throughput, but more so, rapid response time from memory and the memory controller!  Take this as you will.  The overall benefit was an increase from 20-24 to 22-28 FPS.  Not huge, but nice.
Reducing the number of services, programs, apps, etc. running will also reduce memory competition as less and less stuff takes up DF's insatiable consumption of accesses.
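
A rough way to compare the three RAM settings above is to turn each clock and CAS pair into a latency in nanoseconds. The sketch below does that in Python. It assumes the 800 / 917 / 1066 figures are DDR data rates (two transfers per I/O clock) and that the first timing number is the CAS latency; neither assumption is confirmed by the tests, and the function names are made up for illustration.

```python
def cas_latency_ns(data_rate_mts, cas_cycles):
    """First-word CAS latency in nanoseconds for DDR memory.

    DDR transfers twice per I/O clock, so the clock in MHz is data_rate / 2
    and one clock cycle lasts 2000 / data_rate nanoseconds.
    """
    return cas_cycles * 2000.0 / data_rate_mts


def peak_bandwidth_mbs(data_rate_mts, bus_bytes=8):
    """Theoretical peak MB/s of one 64-bit (8-byte) memory channel."""
    return data_rate_mts * bus_bytes


if __name__ == "__main__":
    # (data rate, CAS) pairs from the tests above, read as DDR data rates.
    for rate, cas in [(1066, 5), (800, 4), (917, 4)]:
        print(f"{rate} @ CL{cas}: {cas_latency_ns(rate, cas):.2f} ns CAS, "
              f"{peak_bandwidth_mbs(rate)} MB/s peak per channel")
```

Under those assumptions, 917 at CL4 comes out with the shortest CAS latency of the three while keeping more bandwidth than 800 at CL4. That is consistent with it benchmarking best, though it does not prove latency is the cause.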

Give It A Break Already:
Although there does not seem to be a memory leak, DF gets a fair boost in FPS if you just save and exit (the client completely) every so often.  How frequently this makes a difference depends on your system's speed, as our P4 and Celeron test systems didn't benefit until many hours of play, while the Phenom II I use personally benefits after around 3-4 hours.  I'm not sure why this is.  Take it as you will.  The benefit is small, and proportional to how long since you closed it.  It increased FPS from 3-4 FPS to 15-30 FPS after I restarted from a 72-hour long nonstop Dwarfathon.  (There was also a lot of unattended Fun).  Readings were taken before and after the restart to ensure it wasn't from a sudden drop in pathfinding on dead dwarves or something trapped somewhere.

Goblin Processing Unit:
Another, albeit tiny, boost is in the videocard.  Using different driver versions on your system can make various improvements and detriments, and seems to vary from system to system and card to card.  I can't give a very solid prediction on this.
However, the GPU is a key point.  A faster GPU of identical architecture gets a better FPS (and G_FPS) when operating at a higher frequency.  The test was done across an ATI 1300 and an ATI 4830.  The 4830 was tested at 160 MHz, 300 MHz, 500 MHz, 575 MHz (default), 600 MHz, 650 MHz, and 695 MHz (max stable OC).  It increased linearly after 500 MHz, and drastically after 160 MHz.  We did the same test on the VRAM, and no frequency made any detectable difference good or bad, even at extreme OC and UC conditions.  This makes sense for the software rendering aspect being largely GPU based.

Go Flush Your Cold Buffer In Your SCSI DASD:
Strangely, we tried enabling and disabling Virtual Memory/Swapfile/Pagefile/Disk Cache and the effects it had on DF.  I found that it is actually best to *enable* virtual memory and allocate a fair amount.  I assume this is the result of reducing the amount of clutter in the RAM and thus shaving off a bit more RAM Competition.  The effect was surprisingly substantial at a fairly steady 7-8% FPS improvement.  This has been the single best result short of CPU and RAM overclocking.



This concludes Part 1 of our tests.  Next, I'll be testing differences between equivalent Intel and AMD lineups.  This will cost more than the $80 on Part 1.  All for the love of dwarf.
« Last Edit: September 25, 2010, 06:41:07 pm by janglur »
Logged

Duelmaster409

  • Bay Watcher
  • [DOES_NOT_FIGHT]
    • View Profile
Re: Advanced Tweaking (Warning: !!SCIENCE!!)
« Reply #1 on: September 25, 2010, 07:00:10 pm »

This thread reeks of science... I love it! This was an excellent read.
Logged
Dwarf fortress: Teaching uni level geology to sadistic elf killers for years.

Urist Imiknorris

  • Bay Watcher
  • In the flesh, on the phone and in your account...
    • View Profile
Re: Advanced Tweaking (Warning: !!SCIENCE!!)
« Reply #2 on: September 25, 2010, 07:06:52 pm »

This is true Science. I tip my ☼adamantine helm☼ to thee.
Logged
Quote from: LordSlowpoke
I don't know how it works. It does.
Quote from: Jim Groovester
YOU CANT NOT HAVE SUSPECTS IN A GAME OF MAFIA

ITS THE WHOLE POINT OF THE GAME
Quote from: Cheeetar
If Tiruin redirected the lynch, then this means that, and... the Illuminati! Of course!

Gearheart

  • Bay Watcher
    • View Profile
Re: Advanced Tweaking (Warning: !!SCIENCE!!)
« Reply #3 on: September 25, 2010, 07:11:42 pm »

Okay, so, what are the implications of this for those who are not quite technologically inept but fairly close to it?

I'm currently using a dual core 2.00GHz, with 4GB of RAM. The computer I recently ordered will have a 2.4GHz quad core with 8GB RAM.

How much impact will those two point alone have on the performance of DF, and what could average joe change if he wants to increase efficiency in a way which will not risk everything going FUBAR due to mistakes?
Logged

janglur

  • Bay Watcher
  • +Blood Soup+
    • View Profile
Re: Advanced Tweaking (Warning: !!SCIENCE!!)
« Reply #4 on: September 25, 2010, 07:44:40 pm »

Okay, so, what are the implications of this for those who are not quite technologically inept but fairly close to it?

I'm currently using a dual core 2.00GHz, with 4GB of RAM. The computer I recently ordered will have a 2.4GHz quad core with 8GB RAM.

How much impact will those two point alone have on the performance of DF, and what could average joe change if he wants to increase efficiency in a way which will not risk everything going FUBAR due to mistakes?

Simple:
Turn Pagefile on.  This is usually on by default.  Allocate a fairly good size (I recommend 4 GB).  The minimum and maximum should be the same.  This way the pagefile doesn't get (as) fragmented.

Easiest:
Use Process Explorer to switch *everything* to cores 0-2, and dedicate DF to core 3 (a quad core's cores are numbered 0-3).

Easier:
Close all unnecessary programs, and I mean ALL.  Turn off or stop any services or applications (especially antivirus) not in use.  (Since your AV is off, it's a good idea to disable your internet as well and temporarily stop the services.)

Easy:
In the CMOS/BIOS settings, turn memory to Ganged mode and lower your RAM timings as low as it can stand.  Try with the highest first, and move them down one by one.  Test using something like Memtest86+ or Prime95 for several hours, and then repeat until it becomes unstable.  Once you find the best possible, test it for 12+ hours to ensure perfect stability.  If it fails, move it back another bit.  Adjusting timings is safe for hardware (but can cause data corruption and bluescreens in Windows), so Memtest86+ is recommended, as it doesn't touch the hard drive.  Note that changing the *frequency*, however, can cause failure of hardware.  Same for voltage.

Not As Easy:
Disable Cool'n'Quiet, Spread Spectrum, and any other power-saving or heat-reduction options.  This can cause overheating and failure of components if not thoroughly cooled.  Not recommended on stock heatsinks that come with the CPUs.

Hard:
Manually adjust the frequencies of the CPUs, core buses, RAM, and GPU using one of many available programs.


Extra tips:
Go for dual core at minimum.  Having 3 or 4 cores makes no bigger difference.  Phenom II are the best AMD options.
After that, go for frequency.  When the 975's come out, they are going to be quad-core 3.6 GHz CPUs, stock.  These will be the best AMDs for a while.  Remember, DF is single-threaded, meaning it depends on *one* core to be very, very fast.  Also, set it to a single core, or else it slows down a LOT from rapidly switching between cores while running.
« Last Edit: September 25, 2010, 07:46:40 pm by janglur »
Logged

janglur

  • Bay Watcher
  • +Blood Soup+
    • View Profile
Re: Advanced Tweaking (Warning: !!SCIENCE!!)
« Reply #5 on: September 25, 2010, 08:15:46 pm »

Oh, also, another note:

Using memory that is paired (sold together in one package as paired) equal to your memory controllers (2, in the case of Phenom II's) is ideal.

In the vast majority of cases, this means two sticks in dual-channel work best, in ganged mode (per the testing in Part 1).

Using three or more sticks, or non-matching memory will stress the memory controller on the motherboard (or CPU, in AMD's case) and can hurt performance, or worse, cause instability.

Remember, it takes twice the work to address 4 sticks as it does 2.

Also, smaller amounts of memory seem to work better, IF AND ONLY IF you still have adequate amounts of RAM.  Not enough RAM will KILL your FPS.  But having 24 GB is also counter-productive.  Think of it this way:
1 GB 800 MHz memory in single-channel reads at 3.2 GB/s.  So, it takes 1/3rd of a second, roughly.
2 GB 800 MHz memory in single-channel reads at about 3.2 GB/s.  So, it takes twice as long as before!  Realistically this means it takes a bit longer for the RAM to search through and find the requested data for DF.  (Not twice as long, more like 5-10%, because RAM can semi-non-sequentially read)
1 GB 800 MHz memory in dual-channel reads at 6.4 GB/s.  So, 1/6th of a second.
2 GB 800 MHz memory in dual-channel reads at 6.4 GB/s.  So, about 1/3rd of a second, the same as 1 GB in single-channel.  See how that works?

It's a LOT more complex than that, but you get the idea.  Make sure you have plenty of RAM, but don't go overboard.  My 4 GB is plenty (2x2 GB)
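
The back-of-envelope numbers above are just size divided by bandwidth. A minimal Python sketch, using the 3.2 GB/s and 6.4 GB/s figures from the post as given (they are the post's round numbers, not measured values, and the function name is made up for illustration):

```python
SINGLE_CHANNEL_GBPS = 3.2  # figure used in the post for 800 MHz single-channel
DUAL_CHANNEL_GBPS = 6.4    # figure used in the post for 800 MHz dual-channel


def full_read_seconds(size_gb, bandwidth_gb_per_s):
    """Time to stream through the whole memory once at peak bandwidth."""
    return size_gb / bandwidth_gb_per_s


if __name__ == "__main__":
    for size_gb in (1, 2):
        for label, bw in (("single", SINGLE_CHANNEL_GBPS),
                          ("dual", DUAL_CHANNEL_GBPS)):
            seconds = full_read_seconds(size_gb, bw)
            print(f"{size_gb} GB, {label}-channel: {seconds:.4f} s")
```

This is only the worst case of reading everything end to end. Real access patterns jump around, which is why the real penalty for more RAM is far smaller than a straight doubling.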


!!SCIENCE!!:
The following diagram may help.  It uses 'made up' timings, but they are proportional, to give you an idea of the differences in design and the impact they have on responsiveness, namely in how some setups can cause many more steps to access specific data.

http://207.177.39.129/Memory.JPG
« Last Edit: September 25, 2010, 08:23:59 pm by janglur »
Logged

Cyntrox

  • Bay Watcher
    • View Profile
Re: Advanced Tweaking (Warning: !!SCIENCE!!)
« Reply #6 on: September 25, 2010, 08:35:38 pm »

I applaud your research.

However, there is one thing that irks me: You measure performance in FPS, which isn't accurate. I'll try to explain why.

Say you are running at 600 FPS. This means that each frame will take 1.6667 milliseconds (ms) to perform, because 1/600=0.0016667 seconds. Now let's say you drop to 200 FPS. This probably seems like a huge amount, and it means that each frame takes 5 ms to perform. So when going from 600 to 200 FPS, you go from 1.6667 to 5 ms frame time. The difference is 3.3333 ms.

Now let's say you are running at 30 FPS, which means 33.3333 ms per frame. You drop to 27 FPS, which is 37.037 ms per frame. The difference here is 3.7037 ms.

So when going from 600 FPS to 200 FPS, you use 3.3333 ms more per frame.
When going from 30 to 27 FPS, you lose 3.7037 ms.

See what this means? You lose more performance when going from 30 to 27 FPS than when you are going from 600 to 200 FPS. This means that saying that you gain 5 FPS by changing some setting doesn't say anything.
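
The same comparison as a small Python sketch, in case anyone wants to plug in their own numbers. The function names are made up for illustration.

```python
def frame_time_ms(fps):
    """Milliseconds spent on one frame at the given frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def added_ms_per_frame(fps_before, fps_after):
    """Extra milliseconds each frame costs after the FPS changes."""
    return frame_time_ms(fps_after) - frame_time_ms(fps_before)


if __name__ == "__main__":
    for before, after in [(600, 200), (30, 27)]:
        extra = added_ms_per_frame(before, after)
        print(f"{before} -> {after} FPS: {extra:.4f} ms more per frame")
```

A negative result means each frame got cheaper, so the same function works for reporting gains.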

It's a shame I can't find the article that explains this a lot better than me. It'd probably also help if I was sober, but I hope that's understandable.
Logged
"[...] begin to seek immortality, the secrets of which they can receive directly from any available death god [...]" -Toady

janglur

  • Bay Watcher
  • +Blood Soup+
    • View Profile
Re: Advanced Tweaking (Warning: !!SCIENCE!!)
« Reply #7 on: September 25, 2010, 08:53:22 pm »

Oh no, I understand entirely.  It's just that some tweaks only make a difference at high FPS, and some only at low.  Sadly, FPS is the only benchmarking available that's relevant to DF.  Synthetic benchmarks can be DRASTICALLY inaccurate on the grounds that they behave nothing like the real application.

Rest assured however that each tweak was tested across no less (but maybe more) than ten different maps of varying size, age, clutter, and activity, at the exact same frames and time periods.  (IE, I didn't test, save, then load and test again.  They were all tested on neutral grounds.)

I tried to only post those tests that were relevant, in FPS, on a fixed scale.  IE, the restarting made the same maps jump by a factor of ten or more.

Others were more unilateral, such as the CPU cores clocks-  the FPS drop was proportional to the frequency.  IE, 1000 went to 1200, 100 went to 120, 10 went to 12.


Also, one interesting note, I could not embark on an area larger than 42x42 in any but a dedicated-core, ganged and optimized site.  It ran well below 1 FPS.  To say it was crippled was ridiculous.  After an hour, they moved the twelve spaces to begin mining.  X.x



Edit:
This is also why I completely omitted what various driver versions did, because they had no clear and substantiated link, and no differences that were appreciable.

I also omitted any differences below 200 FPS.  Going from 150 to 200 isn't really pertinent, and 200 to 400 is as simple as designating mining.  I made a strong effort to only include findings that made a reasonably significant improvement.
VRAM, for example, made lots of G_FPS differences at extreme FPS (1000+), but no one plays at that realistically.  In real forts, it made absolutely no difference on FPS or G_FPS.  Certainly not enough to bother tweaking to accomplish.

I follow the '2% rule'.  If the FPS didn't, under heavy load, make an average of 2% or better improvement at a very severely lagging fort (10 FPS), I didn't bother to report it here.
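
For anyone repeating these tests, the proportional comparison and the 2% cutoff can be written down in a few lines of Python. This is only a sketch of the arithmetic; the function names and the idea of averaging several paired readings are assumptions, not the exact procedure used for the results above.

```python
def percent_change(before_fps, after_fps):
    """Relative FPS change in percent (positive means faster)."""
    if before_fps <= 0:
        raise ValueError("before_fps must be positive")
    return (after_fps - before_fps) / before_fps * 100.0


def passes_two_percent_rule(readings, threshold=2.0):
    """readings: list of (before_fps, after_fps) pairs from the same fort.

    Returns True when the average relative improvement meets the threshold.
    """
    if not readings:
        raise ValueError("need at least one reading")
    changes = [percent_change(before, after) for before, after in readings]
    return sum(changes) / len(changes) >= threshold


if __name__ == "__main__":
    # The proportional example from this post: the same 20% at every scale.
    for before, after in [(1000, 1200), (100, 120), (10, 12)]:
        print(f"{before} -> {after}: {percent_change(before, after):.1f}%")
    # Hypothetical readings from a badly lagging fort.
    print(passes_two_percent_rule([(10.0, 10.1), (10.0, 10.4)]))
```

Working in percent rather than raw FPS is what lets readings from a 10 FPS fort and a 1000 FPS fort be compared at all.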

So, take all my research with a grain of rock salt, but rest assured that although it may not be as solid as Granite, it's as valuable as Limestone.
« Last Edit: September 25, 2010, 09:00:41 pm by janglur »
Logged

jei

  • Bay Watcher
    • View Profile
Re: Advanced Tweaking (Warning: !!SCIENCE!!)
« Reply #8 on: September 25, 2010, 09:22:49 pm »

The major killer of fort FPS for me has not been pathfinding, catsplosions, or even liquids, but in fact clutter!  Noticing the resource usage of DF over the course of a fort's 10 years, massive stone stockpiling (and crafts, and food, and waste, etc.) always crippled the system irrecoverably. 

If you are interested, I have a savegame that lags from the embark start:

http://dffd.wimbli.com/file.php?id=3161

Extremely laggy savegame from the first Embark. FPS = 20-30.
Used same set to embark elsewhere, got 160 FPS last time.

Haven't dug or built much anything, just hauling stuff.

River & Volcano combination made with high variance and slow erosion cycles.
Year 500 due to elves having been all extinct in the previous worldgen, but it appears this is a NO-ELF world, ruled by titans and megabeasts.

No blood has been bled on this map yet.
Logged
Engraved on the monitor is an exceptionally designed image of FPS in Dwarf Fortress and it's multicore support by Toady. Toady is raising the multicore. The artwork relates to the masterful multicore support by Toady for the Dwarf Fortress in midwinter of 2010. Toady is surrounded by dwarves. The dwarves are rejoicing.

Urist Imiknorris

  • Bay Watcher
  • In the flesh, on the phone and in your account...
    • View Profile
Re: Advanced Tweaking (Warning: !!SCIENCE!!)
« Reply #9 on: September 25, 2010, 09:27:38 pm »

My money is on the magma sea draining into HFS.
Logged

Mel_Vixen

  • Bay Watcher
  • Hobby: accidently thread derailment
    • View Profile
Re: Advanced Tweaking (Warning: !!SCIENCE!!)
« Reply #10 on: September 25, 2010, 09:38:36 pm »

Hmmm you mentioned a memory leak somewhere. Could you nail it down?
Logged
[sarcasm] You know what? I love grammar Nazis! They give me that warm and fuzzy feeling. I am so ashamed of my bad english and that my first language is German. [/sarcasm]

Proud to be a Furry.

fivex

  • Bay Watcher
    • View Profile
Re: Advanced Tweaking (Warning: !!SCIENCE!!)
« Reply #11 on: September 25, 2010, 09:39:59 pm »

My money is on the magma sea draining into HFS.
That can happen?  :o
Logged

janglur

  • Bay Watcher
  • +Blood Soup+
    • View Profile
Re: Advanced Tweaking (Warning: !!SCIENCE!!)
« Reply #12 on: September 25, 2010, 11:23:30 pm »

Hmmm you mentioned a memory leak somewhere. Could you nail it down?

You misread.  I could *not* find a memory leak.  Instead, it's legitimate memory allocation, but the reason why massive amounts of items (such as stone) causes lag is a sheer overload of read/writes.  I assume it polls the location of each item or something at an ultra-regular pace, causing severe memory thrashing.  It's the only theory that makes sense for why tighter timings would help so much compared to bus overclocking.
Logged

Vigilant

  • Bay Watcher
    • View Profile
Re: Advanced Tweaking (Warning: !!SCIENCE!!)
« Reply #13 on: September 26, 2010, 12:08:26 am »

Comp Sci major reporting in.

Do not increase the page file. In fact, unless you have a really low amount of memory you're better off disabling virtual memory to make sure your computer uses it as little as possible. Windows will always use some HD space for paging programs even if you disable it. The problem is its memory management can be bad sometimes and it'll start using virtual memory when it doesn't actually need to. And that's bad because the hard drive is a half-dying snail compared to your CPU and memory speeds. If it starts trying to use the hard drive for memory, your speed will drop to hard drive access speeds since that's the weakest link.

Before matching up memory size, if you have an older computer it's better to check and see if your memory is even dual channel at all. If it isn't dual channel you receive no speed increase from having memory chips of the same size :(

The issues with memory are a LOT more complex than has been described so far, since newer operating systems and lower-level controllers are designed to use your memory and cache as best they can and have all sorts of fun algorithms and shortcuts. Unless you're a Computer Engineer I'd l

But memory is the place to consider as tracking items and doing pathfinding are extremely memory intensive. The faster (higher quality) memory you can get for your machine paired with the processor with the best-est cache you can find is where you'll really get performance. My laptop's CPU can match performance of heftier machines because it's got a larger cache and pretty new memory.

Oh. To contribute to the science rather than just picking on people that have posted so far: for any DFtermers here, here's something to consider. I have a server with two dual-core processors and 3 GB of memory, so it can theoretically sustain multiple games of DF at once. But the thing most people don't realize is multi-core work is still limited by the fact that there's usually only one channel to access memory, and the bottleneck will hurt performance. I loaded up my 100+ dwarf fort, which for a while was running at 30 FPS on the server, made a copy of it, and ran that too. While they bicker for resources instead of splitting evenly, the result is that it averages out to about 2/3 performance for both, dropping to around 21 FPS. Although the caches on that server are rather small, I'd expect a newer multi-core setup could host multiple games with less performance degradation.

Logged

Cyntrox

  • Bay Watcher
    • View Profile
Re: Advanced Tweaking (Warning: !!SCIENCE!!)
« Reply #14 on: September 26, 2010, 07:25:37 am »

Oh no, I understand entirely.  It's just that some tweaks only make a difference at high FPS, and some only at low.  Sadly, FPS is the only benchmarking available that's relevant to DF.  Synthetic benchmarks can be DRASTICALLY inaccurate on the grounds that they behave nothing like the real application.

*snip*
Although it's good that you're taking the measures you mentioned, it's very easy to convert between FPS and milliseconds per frame (ms/f): (1/FPS)*1000.

Now to try some of the stuff you mentioned...
« Last Edit: September 26, 2010, 09:24:27 am by Cyntrox »
Logged