DF Dwarf Mode Discussion / Re: Pathfinding is not a major cause of FPS death
« on: January 23, 2023, 11:23:08 pm »
But we're talking like 100 ns for a round trip here. It's not "one hundred cycles" as Putnam threw out for a random example, it's certainly not 30 cycles as in your example, it's MULTIPLE HUNDREDS of cycles.
Halving the clock still increases the total time from 100ns to 110ns, enough to make a measurable difference if this is the bottleneck imo.
Op time and data transfer windows are tiny fractions of the total time the CPU spends waiting. Nobody talks about this part because there isn't dick anyone can do about it, because we haven't managed to make electron wave propagation any faster over those kinds of distances yet, but when you hear things like "accessing data in RAM is an order of magnitude slower than accessing data in the L2 cache" this is what they are talking about. Adjusting the RAM clock is a fractional percent difference here.
The RAM clock had no impact on FPS, whereas lowering the CPU clock speed gave a nearly 1:1 reduction in FPS.
If I could underclock my ram any more to get yet more evidence I would, but I think I'll try seeing how linked lists perform at 3200 vs 1600 instead since they are known to be ass slow because of the exact memory latency issue we're discussing.
Another thing to consider is that the unit list and other data structures in the game are afaik stored in vectors, which lend themselves to efficient memory prefetching when you iterate over them in order, so the latency issue isn't going to be pronounced.
1.6 vs. 3.2 billion cycles per second translates into each "gap" in communication taking 5/8ths of a nanosecond vs. 5/16ths of a nanosecond. I'm not really sure where you're getting '10 nanoseconds' from - I don't follow.
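To make that arithmetic easy to check, here is a minimal C++ sketch. The 100 ns round trip is just the ballpark figure used in this thread, and `clocked_gaps` is a hypothetical knob I made up for how many clock periods of the round trip actually scale with the RAM clock; neither is a measured value.

```cpp
#include <cassert>
#include <cstdio>

// Length of one clock period in nanoseconds, for a clock given in GHz
// (billions of cycles per second).
double period_ns(double clock_ghz) {
    assert(clock_ghz > 0.0);
    return 1.0 / clock_ghz;
}

// Extra round-trip time in nanoseconds from running at slow_ghz instead of
// fast_ghz, ASSUMING that `clocked_gaps` periods of the round trip scale
// with the clock. `clocked_gaps` is a hypothetical parameter for illustration.
double added_latency_ns(double slow_ghz, double fast_ghz, int clocked_gaps) {
    return clocked_gaps * (period_ns(slow_ghz) - period_ns(fast_ghz));
}

// Print the per-period numbers and what one extra gap costs relative to an
// assumed 100 ns round trip.
void print_clock_arithmetic() {
    const double round_trip_ns = 100.0;  // ballpark figure from the thread
    const double extra = added_latency_ns(1.6, 3.2, 1);
    std::printf("period at 1.6 GHz: %.4f ns\n", period_ns(1.6));
    std::printf("period at 3.2 GHz: %.4f ns\n", period_ns(3.2));
    std::printf("extra per gap: %.4f ns (%.2f%% of a %.0f ns round trip)\n",
                extra, 100.0 * extra / round_trip_ns, round_trip_ns);
}
```

Calling `print_clock_arithmetic()` from `main` shows one gap costing roughly 0.3 ns, about 0.3% of a 100 ns round trip. Under this model you would need around 32 clock-dependent gaps per access to reach 10 ns.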
Regardless, assuming you achieved a 10% difference in your primary bottleneck, that would make a measurable difference in a controlled benchmark. I don't know if that's going to reliably translate to a consistently measurable difference in a Dwarf Fortress FPS counter. My money would be on "no," because in my experience conducting proper benchmarks is hard and observation bias is a bitch.
The linked list vs. the flat vector is a great test case, and I would think it should illustrate the relative performance change in RAM I/O bandwidth vs. latency very clearly.
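For anyone who wants to try that comparison, here is a rough C++ sketch of one way to set it up. It is not anything from DF's code. Every name in it (`Node`, `make_scattered_nodes`, `run_comparison`, and so on) is made up for illustration. Instead of `std::list`, it uses index links in a random order inside one big array, which is a swapped-in technique: a freshly built `std::list` can end up nearly contiguous in memory, which would hide the latency effect.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

// One element of the test data. `next` is the index of the node to visit
// after this one when the data is walked like a linked list.
struct Node {
    std::uint32_t value;
    std::uint32_t next;
};

// Build n nodes whose `next` links form a single random cycle through every
// slot, so following the links jumps all over the array.
std::vector<Node> make_scattered_nodes(std::size_t n, std::uint32_t seed) {
    std::vector<std::uint32_t> order(n);
    std::iota(order.begin(), order.end(), 0u);
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);

    std::vector<Node> nodes(n);
    for (std::size_t k = 0; k < n; ++k) {
        nodes[order[k]].value = order[k];
        nodes[order[k]].next = order[(k + 1) % n];
    }
    return nodes;
}

// Flat-vector style: walk the array front to back (prefetch-friendly).
std::uint64_t sum_in_order(const std::vector<Node>& nodes) {
    std::uint64_t total = 0;
    for (const Node& node : nodes) total += node.value;
    return total;
}

// Linked-list style: each load depends on the previous one, so every cache
// miss pays the full memory round trip.
std::uint64_t sum_following_links(const std::vector<Node>& nodes) {
    std::uint64_t total = 0;
    std::uint32_t i = 0;
    for (std::size_t step = 0; step < nodes.size(); ++step) {
        total += nodes[i].value;
        i = nodes[i].next;
    }
    return total;
}

// Wall-clock time of a callable, in milliseconds.
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Time both traversals over the same data and print the results.
void run_comparison(std::size_t n) {
    const std::vector<Node> nodes = make_scattered_nodes(n, 12345u);
    std::uint64_t in_order = 0;
    std::uint64_t chased = 0;
    const double in_order_ms = time_ms([&] { in_order = sum_in_order(nodes); });
    const double chased_ms = time_ms([&] { chased = sum_following_links(nodes); });
    assert(in_order == chased);  // both walks visit every node exactly once
    std::printf("in order:        %.2f ms (sum %llu)\n", in_order_ms,
                static_cast<unsigned long long>(in_order));
    std::printf("following links: %.2f ms (sum %llu)\n", chased_ms,
                static_cast<unsigned long long>(chased));
}
```

To use it, call something like `run_comparison(std::size_t{1} << 24)` from `main`, build with optimizations on, and pick `n` so the array is far bigger than your last-level cache. Then run it at each RAM setting and see which of the two timings moves. The in-order walk should mostly track bandwidth and the link-following walk should mostly track latency, but treat any single run with suspicion and repeat it a few times.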
I would caution against attempting to reduce the problem space to 'oh but DF uses flat vectors so it can't fall victim to pathological cases involving a lot of blind pointer hopping' - this presumes an awful lot about the architecture of software that you don't have the source code for, and more to the point most of the shit I've heard Putnam say about this subject makes me extremely suspicious of that line of reasoning.