Bay 12 Games Forum

Author Topic: DF CPU Benchmarking  (Read 10638 times)

Putnam

Re: DF CPU Benchmarking
« Reply #45 on: July 24, 2021, 06:52:19 pm »

I did some actual profiling. There's a region which the Intel VTune Profiler reports as taking up more and more of the total CPU time as worldgen goes on:

Code:
Address     Source Line Assembly Clockticks Instructions Retired CPI Rate Retiring Front-End Bound Bad Speculation Back-End Bound
0x1407bafc0 0 mov rcx, qword ptr [rbx] 1,173,600,000 756,000,000 1.552 0.5% 11.9% 0.6% 1.0%
0x1407bafc3 0 mov rax, qword ptr [rcx] 622,800,000 255,600,000 2.437 0.0% 0.0%
0x1407bafc6 0 mov rdx, qword ptr [rsp+0x78] 33,375,600,000 8,229,600,000 4.056 5.8% 0.0% 18.6% 22.4%
0x1407bafcb 0 mov edx, dword ptr [rdx+0xe0] 579,600,000 158,400,000 3.659 0.1% 0.0% 0.0% 0.0%
0x1407bafd1 0 call qword ptr [rax+0x88] 784,800,000 331,200,000 2.370 0.2% 0.0% 0.7% 0.3%

This is inside a function whose exact offset wasn't consistent across the multiple runs I tested, but in these runs it was at 0x1407ba9d0. All profiler runs were 30 seconds.
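For anyone who wants to sanity-check the listing, here is a minimal Python sketch that recomputes the CPI Rate column as clockticks divided by instructions retired, and works out how much of the listed clockticks each instruction accounts for and how far it sits from the function start quoted above. The numbers are copied from the listing; the names `ROWS`, `FUNC_START` and `summarize` are made up for this example and have nothing to do with VTune's own export format.

```python
# Rows copied from the VTune listing: (address, clockticks, instructions retired).
ROWS = [
    (0x1407BAFC0, 1_173_600_000, 756_000_000),
    (0x1407BAFC3, 622_800_000, 255_600_000),
    (0x1407BAFC6, 33_375_600_000, 8_229_600_000),
    (0x1407BAFCB, 579_600_000, 158_400_000),
    (0x1407BAFD1, 784_800_000, 331_200_000),
]

# Function start reported for these particular runs (it moved between runs).
FUNC_START = 0x1407BA9D0


def summarize(rows, func_start):
    """Return (offset in function, CPI, share of listed clockticks) per row."""
    total_ticks = sum(ticks for _, ticks, _ in rows)
    return [
        (addr - func_start, round(ticks / instrs, 3), ticks / total_ticks)
        for addr, ticks, instrs in rows
    ]


if __name__ == "__main__":
    for offset, cpi, share in summarize(ROWS, FUNC_START):
        print(f"+{offset:#x}  CPI {cpi:.3f}  {share:6.1%} of listed clockticks")
```

By this count the `mov rdx, qword ptr [rsp+0x78]` line at +0x5f6 accounts for roughly 91% of the clockticks in these five instructions. The share is relative to the five listed rows only, not to the whole function or the whole run.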

I also did a 30-second microarchitecture profile of worldgen after all of those. The results for the slow function in question are included in the spoiler below, as the fourth image.

Spoiler: Screenshots
« Last Edit: July 24, 2021, 06:57:39 pm by Putnam »