Part 1:
So I have been doing a LOT of research into what DF seems to prefer in terms of hardware and software settings, and I've uncovered a few tips to squeeze that extra 1% out when you really need it.
I am only testing with Vanilla DF, with the standard map generation settings and nothing changed or disabled. This is purely research on the PC and hardware itself, not on modding DF.
The Threads Dwarves Weave:
As we all know, DF is largely single-threaded, which has led to a lot of myths about its performance that are worth studying. The tweaks I've discovered are as follows:
DF enjoys being dedicated to a core. Simply telling it 'Stick to core 2' (setting its processor affinity) will improve performance a good 5-10%, though this is mostly noticeable with pathfinding and only at already high (100+) FPS.
This can be enhanced further by moving every other system process and application (with Windows Task Manager or, better, Process Explorer) onto the other cores, leaving DF its own dedicated CPU and cache. This gives another 5-20%, depending wildly on how much crap you actually have running, how much cache your cores have dedicated, and the phase of the moon.
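If you'd rather script the affinity than click through Task Manager every time, the only fiddly part is the bitmask. Here is a minimal Python sketch of that arithmetic. It assumes cores are numbered from zero (as Task Manager shows them), a 4-core CPU, and that the executable is named "Dwarf Fortress.exe"; none of those come from my tests, so adjust to your system. The printed line uses the Windows `start /affinity` switch, which takes the mask in hex.

```python
def affinity_mask(cores):
    """Return a bitmask with one bit set per zero-based core index."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


def complement_mask(cores, total_cores):
    """Return the mask covering every core EXCEPT the ones listed."""
    all_cores = (1 << total_cores) - 1
    return all_cores & ~affinity_mask(cores)


# Assumptions for illustration: DF pinned to core 2 on a 4-core CPU.
DF_CORES = [2]
TOTAL_CORES = 4

df_mask = affinity_mask(DF_CORES)
others_mask = complement_mask(DF_CORES, TOTAL_CORES)

# The empty "" is the window title that `start` expects before a quoted path.
print(f'start "" /affinity {df_mask:X} "Dwarf Fortress.exe"')
print(f"mask for everything else: 0x{others_mask:X}")
```

The second mask is the one you would apply to the other processes so they stay off DF's core.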
Stockpiling Data:
Based on the above, a logical theory is that having more dedicated cache would improve performance. I tested this theory by setting the L3 cache allocation to 'All Cores' and then to 'BSP only'. BSP only means that only the main core (core0) has access to the L3 cache. Theoretically, in my system, this means an extra dedicated 6 MB of cache for DF. However, my testing showed quite the opposite result: FPS actually dropped by a tiny 2-4% margin on initial embarks, and then became quite even after the '100,000 stone' test. This suggests that allowing the L3 to be used by the other cores is better. I hypothesize this is because whatever the other cores can't fit into L1 and L2 goes to RAM if the L3 is not accessible, creating more competition in RAM. More on this later.
Gangs and Undead Gangs:
I also tested whether Ganged or Unganged memory mode (on 2 sticks of Dual Channel memory) would work better. First, an explanation. Ganged and Unganged refer to CPUs with multiple onboard memory controllers. Namely, my Phenom II has two dedicated memory controllers. These settings refer to how they are accessed. Unganged means that your memory is effectively divided in half, and each of the controllers addresses its half on a 64-bit lane. So, 2x64-bit. Ganged means they work together, on a single 128-bit lane. What this means in performance is that Unganged typically benefits multi-threaded tasks or extreme multitasking, while Ganged benefits single large transfers or single-threaded heavy-duty apps like Dwarf Fortress. My testing showed almost no gain on initial embark, but a major gain on ultra-large (12x12+) maps and in cluttered (100,000+ stones) tests. This may also be due to RAM competition. But Ganged is definitely better, by a 5-15% margin in large or older fortresses.
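For a feel of what 2x64-bit versus 1x128-bit means on paper, here is a small Python sketch of the theoretical peak numbers. It assumes a data rate of 1066 MT/s, which is an assumption for illustration and not a measurement. These are peak figures only, not a model of what DF actually gets out of the memory.

```python
def peak_bandwidth_gbs(data_rate_mts, bus_width_bits):
    """Theoretical peak in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mts * 1e6 * bytes_per_transfer / 1e9


DATA_RATE_MTS = 1066  # assumed effective data rate, in mega-transfers per second

ganged = peak_bandwidth_gbs(DATA_RATE_MTS, 128)               # one 128-bit lane
unganged_per_channel = peak_bandwidth_gbs(DATA_RATE_MTS, 64)  # each 64-bit lane

print(f"ganged, 1 x 128-bit:  {ganged:.3f} GB/s")
print(f"unganged, per 64-bit: {unganged_per_channel:.3f} GB/s")
print(f"unganged, both lanes: {2 * unganged_per_channel:.3f} GB/s")
```

The total is the same either way. What changes is how wide each individual access is, which fits the idea that one big single-threaded consumer prefers the wide lane.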
Lost but Not Forgotten:
Interestingly, although DF is single-threaded, it relies on hardware to perform many tasks, and much of your hardware depends on the CPU. As such, overclocking only the DF core is not the best solution: the clocks of the not-in-use cores matter too! Depending on the difference, it can be as much as a 60% FPS boost! (This was tested with a 3.4 GHz core [DF dedicated] and two cores dropped to 100 MHz to save power and reduce heat.) FPS suffered horribly with the other cores set to power-save modes, so disabling Cool'n'Quiet or similar features will help, as will synchronous overclocking. This has the most effect on pathfinding and doesn't seem to affect clutter or map size speeds much.
RAM Competition:
The major killer of fort FPS for me has not been pathfinding, catsplosions, or even liquids, but in fact clutter! Watching DF's resource usage over the course of a fort's 10 years, massive stone stockpiling (and crafts, and food, and waste, etc.) always crippled the system irrecoverably. This stems from simple RAM usage: not only does RAM usage increase with the much higher number of entries (though not as drastically as I expected), the number of reads/writes on RAM, especially reads, increased astronomically. It went from only a few tens of thousands a second (a few MHz) to a full bottleneck at well over 1 billion a second (full 1066 MHz and up). THIS is what has been behind the issue.
So I began testing RAM. The first thing I noticed was that increasing the RAM's frequency by overclocking made a fairly linear improvement. A 5% increase in speed resulted in a 5% increase in reads, and about 1% in FPS. However, I then clocked down to 800 MHz and tightened the timings from 5-5-5-15-30-2T to 4-4-4-12-26-2T, and the effect was much more noticeable. While the reads/sec went down, the FPS went up! I'm not entirely sure how this works, logically, and the drop from 1066 to 800 did hurt FPS overall.
So I went for a middle ground. Attempting to keep the FSB/NB, HT and CPU at 2.8 GHz, I clocked the RAM to 917 MHz at 4-4-4-14-28-2T timings, and got the best performance! It beat both 800 MHz at 4 and 1066 MHz at 5. This suggests that DF needs not only hefty throughput but, even more so, rapid response time from memory and the memory controller! Take this as you will. The overall benefit was an increase from 20-24 to 22-28 FPS. Not huge, but nice.
Reducing the number of services, programs, apps, etc. running will also reduce memory competition, since less stuff competes with DF's insatiable appetite for memory accesses.
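One possible way to make sense of the timing results is to convert the first timing number (CAS latency) from clock cycles into nanoseconds. Here is a minimal Python sketch of that arithmetic. It assumes the quoted RAM speeds are DDR effective data rates, so the real I/O clock is half the quoted figure; that is an assumption about the setup, not something that was measured. It also looks only at CAS latency and ignores the other timings.

```python
def cas_latency_ns(data_rate_mts, cas_cycles):
    """Convert CAS latency from clock cycles to nanoseconds.

    Assumes DDR memory, where the I/O clock is half the effective data rate.
    """
    io_clock_mhz = data_rate_mts / 2.0
    return cas_cycles / io_clock_mhz * 1000.0


# (label, assumed effective data rate in MT/s, CAS latency in cycles)
configs = [
    ("800 at CL4", 800, 4),
    ("917 at CL4", 917, 4),
    ("1066 at CL5", 1066, 5),
]

for label, rate, cas in configs:
    print(f"{label}: {cas_latency_ns(rate, cas):.2f} ns")
```

Under those assumptions the 917 setting has the shortest wait per access, 1066 at CL5 comes next, and 800 at CL4 is the slowest of the three, which lines up with the ranking in the tests above.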
Give It A Break Already:
Although there does not seem to be a memory leak, DF gets a fair boost in FPS if you just save and exit (closing the client completely) every so often. How soon this makes a difference depends on your system's speed: our P4 and Celeron test systems didn't benefit until after many hours of play, while the Phenom II I use personally benefits after around 3-4 hours. I'm not sure why this is. Take it as you will. The benefit is usually small, and proportional to how long it has been since you last closed the game. At the extreme, FPS went from 3-4 to 15-30 after I restarted from a 72-hour-long nonstop Dwarfathon. (There was also a lot of unattended Fun.) Readings were taken before and after the restart to ensure the gain wasn't from a sudden drop in pathfinding due to dead dwarves or something trapped somewhere.
Goblin Processing Unit:
Another, albeit tiny, boost is in the video card. Different driver versions can help or hurt, and the effect seems to vary from system to system and card to card. I can't give a very solid prediction on this.
However, the GPU is a key point. Within the same architecture, a GPU gets a better FPS (and G_FPS) when operating at a higher frequency. The test was done across an ATI 1300 and an ATI 4830. The 4830 was tested at 160 MHz, 300 MHz, 500 MHz, 575 MHz (default), 600 MHz, 650 MHz, and 695 MHz (max stable OC). FPS rose drastically going up from 160 MHz, and roughly linearly above 500 MHz. We did the same test on the VRAM, and no frequency made any detectable difference, good or bad, even at extreme OC and UC conditions. This makes sense if DF's rendering leans mostly on the GPU core rather than on video memory.
Go Flush Your Cold Buffer In Your SCSI DASD:
We also tried enabling and disabling Virtual Memory/Swapfile/Pagefile/Disk Cache to see what effect it had on DF. Strangely, I found that it is actually best to *enable* virtual memory and allocate a fair amount. I assume this is the result of reducing the amount of clutter in the RAM and thus shaving off a bit more RAM Competition. The effect was surprisingly substantial, at a fairly steady 7-8% FPS improvement. This has been the single best result short of CPU and RAM overclocking.
This concludes Part 1 of our tests. Next, I'll be testing differences between equivalent Intel and AMD lineups. This will cost more than the $80 spent on Part 1. All for the love of dwarf.