Bay 12 Games Forum

Messages - Veroule

Pages: 1 ... 17 18 [19] 20 21 ... 50
271
I have begun work on many optimisations in the published BC code.  While I can't be sure what code is actually used by DF, I suggest staying tuned to this topic for faster versions of DF.

My goal is to optimise everything I can find a faster way for in the BC code.  Some of that code is directly used by DF, which is the entire point of this project and topic.  Other parts may give Toady an idea about how to do things better in DF.

*toast* Here is to a better DF.

272
The only problem I see with this is if you have hundreds of dwarfs on a really slow computer, you may have dwarfs that think so long they get nothing done because they'd get hungry/tired before finishing and then have to calculate a path to their bedroom/dining room and sit in the queue again waiting to die.

Of course, if your processor is so busy that this happens, you probably have other issues besides thinking dwarfs.
Yes, that is exactly what would happen.  If the FPS_CAP was set high enough the main thread would churn along at that rate and starve the pathing threads of processing time.  Because the main thread keeps going the dwarf gets hungry, thirsty, and sleepy while awaiting a path.  It then has to cancel that path thread and start another.  Things like this are part of the reason why multithreading hasn't been done.

273
I was actually just thinking about this in vague general terms earlier, before reading this topic.

My rough thoughts are:
1. All path finding should be done in separate threads.  This requires sectioning for the connectivity map and the actual map.  When sections lock then the main thread can't update either of these.
2. In order to keep sectioning from putting us right back into a single threaded mode each thread would have to lock, copy, and release the needed data.
3. Pathfinding may then occur on outdated data, removing some of the omniscience of creatures.  This is good.
4. When something wants to find a path it should start flashing a thinking icon.  DF already has a mechanism for this, but it is very rare to see.  Then it would put its path request into a distinct thread, and await the reply.  Each path request would be a completely separate thread in order to simulate that specific object thinking.
5. Pathfinding threads must be created as a lower priority in order to see a speed gain in the main thread and naturally extend the 'thinking time' of path requesters.
6. Extending #4 and #5 we can have the thread sleep early or use a higher priority based on the intelligence of the path requester, further simulating intelligence by the speed of deciding where to go.

The sum of the thoughts I had earlier would likely result in higher FPS with dwarves doing less per frame.  I think that from a design perspective it is the correct concept.
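
Points 2 and 3 can be sketched roughly as below.  This is only an illustration under my own assumptions: SharedMap, snapshot and path_length are made-up names, the map is a flat grid of 0 (open) and 1 (wall), and a plain breadth-first search stands in for whatever pathfinder DF really uses.  A worker thread would call snapshot, then search its private copy while the main thread keeps editing the real map.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <vector>

// Hypothetical shared map: 0 = open floor, 1 = wall. Not DF's real layout.
struct SharedMap {
    int width = 0, height = 0;
    std::vector<int> cells;   // row-major, width * height entries
    std::mutex lock;          // guards cells against concurrent edits
};

// Point 2: lock, copy, release. The lock is held only while copying.
std::vector<int> snapshot(SharedMap& map) {
    std::lock_guard<std::mutex> guard(map.lock);
    return map.cells;
}

// Breadth-first search over a private copy of the map.
// Returns the number of steps from start to goal, or -1 if unreachable.
int path_length(const std::vector<int>& cells, int width, int height,
                int start, int goal) {
    assert(cells.size() == static_cast<std::size_t>(width) * height);
    std::vector<int> dist(cells.size(), -1);
    std::queue<int> frontier;
    dist[start] = 0;
    frontier.push(start);
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    while (!frontier.empty()) {
        int cur = frontier.front();
        frontier.pop();
        if (cur == goal) return dist[cur];
        int x = cur % width, y = cur / width;
        for (int i = 0; i < 4; ++i) {
            int nx = x + dx[i], ny = y + dy[i];
            if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
            int next = ny * width + nx;
            if (cells[next] != 0 || dist[next] != -1) continue;
            dist[next] = dist[cur] + 1;
            frontier.push(next);
        }
    }
    return -1;
}
```

If the main thread walls off a tile after the copy was taken, the search on the copy still returns the old route, which is exactly the loss of omniscience from point 3.  The creature only finds out when it walks into the wall and has to ask again.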

274
DF Bug Reports / Re: 40d8 Working correctly.
« on: January 10, 2009, 12:18:46 pm »
All of the 40d# versions are test versions, and any problems with them should be reported in the discussion thread in the General Forum.

Those versions are based around code being worked on by Baughn and Bheyler.  The primary purpose of the work is to try to make DF faster by improving the efficiency of its display code, and to increase its portability by having system-specific things handled by external modules.

275
DF Bug Reports / Re: [40d]Double-bed assigning
« on: January 10, 2009, 12:09:29 pm »
There is actually a small bug that goes with this.  This leads to trouble when the spouse of a noble is a non-legendary dwarf.  Nobles can have thoughts about pretentious sleeping arrangements regarding the spouse.  I think it can also cause problems with the rent checks.

276
Regarding the minimizing and restoring I found this in the last BC source I downloaded. From enabler_SDL.cpp in function eventLoop (line 256),
Code:
case SDL_ACTIVEEVENT:
  if (event.active.state & SDL_APPACTIVE) {
    if (event.active.gain) {
      // FIXME: We must do a full screen refresh (not partial)
      // after being restored.
    } else {
      // TODO: Disable rendering when nobody would see it anyway
      // Or maybe pause?
    }
  }
The original BC sources went to a Windows WaitMessage when it was minimized.

The appropriate else section for the above snippet would be to set the isVisible property of the enabler to false.  Then above at the SDL_PollEvent you should check this property and call SDL_WaitEvent when the window isn't visible.
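
A rough sketch of that control flow follows.  To keep it runnable without SDL, FakeEvents is a stand-in I made up for the event queue: its poll plays the role of SDL_PollEvent and its wait plays the role of SDL_WaitEvent.  The Ev enum, run_loop and the is_visible variable are likewise hypothetical and not taken from enabler_SDL.cpp.

```cpp
#include <cassert>
#include <deque>

// Stand-in event types; the real loop would switch on SDL_ACTIVEEVENT etc.
enum class Ev { Hidden, Shown, Quit };

// Stand-in for the SDL event queue so the sketch runs without SDL.
struct FakeEvents {
    std::deque<Ev> pending;
    int polls = 0;
    int waits = 0;

    // Like SDL_PollEvent: returns false at once if nothing is queued.
    bool poll(Ev& out) {
        ++polls;
        if (pending.empty()) return false;
        out = pending.front();
        pending.pop_front();
        return true;
    }

    // Like SDL_WaitEvent: would block until an event arrives.
    // Nothing refills this fake queue, so it must not be empty here.
    void wait(Ev& out) {
        ++waits;
        assert(!pending.empty());
        out = pending.front();
        pending.pop_front();
    }
};

// Returns the number of frames rendered before Quit is seen.
// max_idle_frames only exists to keep the demonstration finite.
int run_loop(FakeEvents& events, int max_idle_frames) {
    bool is_visible = true;   // hypothetical enabler property
    int frames = 0;
    for (;;) {
        Ev ev = Ev::Shown;
        bool got = false;
        if (is_visible) {
            got = events.poll(ev);   // visible: keep looping and drawing
        } else {
            events.wait(ev);         // hidden: sleep until something happens
            got = true;
        }
        if (got) {
            if (ev == Ev::Quit) return frames;
            is_visible = (ev == Ev::Shown);   // Hidden clears it, Shown sets it
            continue;
        }
        ++frames;                    // visible with no input pending: render
        if (frames >= max_idle_frames) return frames;
    }
}
```

While the window is hidden the loop never reaches the render step, so no frames are produced and no CPU is burned polling.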

Are the sources on the main page of this topic still kept up to date with changes?  I have been digging around in many of the various classes in the original BC source for some other things that can be optimized in the graphics area.  The original code is just a little easier on my eyes.  I checked a few things I had spotted since I last downloaded the modified BC, and they were still there.  However the last time I downloaded was before 40d2.

Edit: BTW, I threw all of my other coding projects off my desk.  It seems like this is the time to make this happen and we may never get a chance to do it again.  I will probably be digging for a while to produce a list of optimizations, but there are a lot of really small things that I spotted in various loops where I know all compilers will produce less than efficient code.

277
Grab a copy of the Kobold Quest source.  It includes a slightly wider range of the classes used in DF to handle different displays than the BC code does.

Also I completely understand Toady's disinterest on this point.  If you dig up some very old posts my first complaint about DF was speed.  I tracked it down to my graphics drivers, then ranted about DF doing a graphical redraw with each frame.  When no one would listen I put DF into my debugger and created an assembly patch that set the graphics redraw to once per 10 game frames.  Shortly after publishing that, which included some example C code from KQ, Toady added the G_FPS option.

I said right from the start that doing graphics display when there is nothing requiring displaying is a waste.  I am glad to hear it makes sense to you, Baughn.  I am in mostly the same boat: I don't really have time to completely rebuild the code that Toady has been willing to make public.  I also do not have nearly the level of coding expertise required to do it well, but I do have enough expertise in more general design areas to be able to recognize the flaws without even seeing the code.

Perhaps based on the relationship you have established with Toady during the course of working on this portion he would be willing to release a few other portions to you privately.  This was the case with one developer I worked with in a similar volunteer fashion.  My established reputation for supporting his work permitted that code I wrote could be used to replace an existing function of the program.  When I suggested that a related function could be improved in a number of ways he was willing to provide me with the code for that function so that I could ensure anything I wrote had the highest level of compatibility.

What I am suggesting is that you may have taken things as far as possible at this point.  If Toady is willing to privately provide you with some of the pieces that are DF specific then you might be able to take it farther at a leisurely pace while Toady works on the actual gameplay.

My personal view is that the Interface Arc is more meant to better present information to the player.  Basic display stuff should not be considered as part of that.  Basic display is integral to all facets of gameplay and needs to be treated with that level of priority.

278
Correcting PP to only draw when there is a change in the data or an OS request will obviously eliminate this CPU overhead.  I don't know how many times I have to state it before it sinks in: if the image hasn't changed don't redraw it.
Isn't that what Partial Print does already?
Yes and no. 

The way the PP code is now it loops through all tiles in the display area and draws those that have a redraw count less than the PP redraw count.  It performs this particular testing loop and orders a redraw of the window at your specified G_FPS.  This does eliminate a very large number of steps in the drawing.

What I am saying is use a more general flag to indicate whether there are any changes since the last draw.  When there are new changes then perform the testing loop and redraw the frame in the same way as the current PP.  When nothing new has occurred then we can skip looping through the display region and skip the redraw order, as the current frame contains the latest data and it is properly displayed.

In order to really see this you have to turn on an on screen display FPS in your drivers.  Once you have that you will notice that redraw orders occur at your G_FPS when nothing has changed.  It is a stupid waste of processing time.
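
Here is a minimal sketch of the general flag, under my own simplifications.  The Screen class and its method names are invented for illustration, and each tile carries a plain stale bit instead of the redraw count the real PP code compares against.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a partial-print screen; names are illustrative.
class Screen {
public:
    explicit Screen(std::size_t tiles) : stale_(tiles, false) {}

    // Called by game logic whenever a tile's contents change.
    void set_tile_changed(std::size_t index) {
        assert(index < stale_.size());
        stale_[index] = true;
        dirty_ = true;            // the one general flag
    }

    // Called when the OS asks for a repaint (window exposed or restored).
    void request_full_redraw() {
        stale_.assign(stale_.size(), true);
        dirty_ = true;
    }

    // Called at the G_FPS rate. Returns how many tiles were drawn.
    std::size_t render() {
        if (!dirty_) return 0;    // skip the tile loop and the redraw order
        std::size_t drawn = 0;
        for (std::size_t i = 0; i < stale_.size(); ++i) {
            if (stale_[i]) {
                stale_[i] = false;
                ++drawn;
            }
        }
        ++redraw_orders_;         // stands in for the buffer swap
        dirty_ = false;
        return drawn;
    }

    int redraw_orders() const { return redraw_orders_; }

private:
    std::vector<bool> stale_;
    bool dirty_ = false;
    int redraw_orders_ = 0;
};
```

With nothing changed, render costs one boolean test instead of a pass over every tile plus a redraw order, so the driver's FPS counter would sit at zero on an idle screen.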

279
After you manage to actually implement the things in my previous post we can then move on to this idiocy of continuously looping when the app is only waiting on user input.  When sound is off, the main menu, options menu, keybindings menu, stocks screen (and all of its sub screens), trade screen, relationships screen, histories, and jobs list (since it doesn't flash legendaries) all rely solely on user input before a change in the display can occur.

Why run a loop that eats CPU testing for input when there is a simple system in place that already tests for this?  In Windows that is WaitMessage; SDL appears to have something similar, so establishing multiplatform support through a single call should not be hard.

All it takes is a single flag in each view (Toady would have to add it) that indicates whether the view should loop or just go to an appropriate wait.  Once that is done we can have 0 FPS and 0 CPU at all the views I listed above and still have instant user response.  Where this will actually make the most difference is in Adventure mode where everything is based on user input.

The keyhold, repeat characteristics would require adjustment for this.  The current message tracking is based off keydown and keyup messages.  It would be easiest to use looping after a keydown message then when its corresponding keyup message comes in switch back to waiting.
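
The flag plus the keydown/keyup switch could look something like this.  ViewInput, InputMode and the member names are all hypothetical; DF's view classes are not something I can see, so treat this as a sketch of the decision only.

```cpp
enum class InputMode { Wait, Loop };

// Hypothetical per-view input state; not taken from DF's view classes.
struct ViewInput {
    bool view_needs_loop;   // the flag each view would declare
    int keys_down = 0;      // keydown messages without a matching keyup

    explicit ViewInput(bool needs_loop) : view_needs_loop(needs_loop) {}

    void on_key_down() { ++keys_down; }

    // A stray keyup (key pressed before the window had focus) is ignored.
    void on_key_up() {
        if (keys_down > 0) --keys_down;
    }

    // Loop while the view animates or a key is held for repeat handling;
    // otherwise block until the next message arrives.
    InputMode mode() const {
        return (view_needs_loop || keys_down > 0) ? InputMode::Loop
                                                  : InputMode::Wait;
    }
};
```

The main loop would ask mode() once per pass and either poll or go to the blocking wait.  A menu view built with false sits in Wait until a key goes down, loops for the repeat timing while it is held, then drops back to Wait on the keyup.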

Just something to think about down the line.

280
Here is a simple thought.  The G_FPS cap is a maximum, but we don't actually have to draw that often.  One app I built dumped a bitmap into OpenGL for rendering, did SwapBuffers, then would only call SwapBuffers again on a WM_PAINT.  Effectively this is a G_FPS of 0, but I was dealing with a static image.  Because my card is single buffered all I had to do for WM_PAINT was a SwapBuffers.  If I had more buffers then I would have had to put the image into each, but once each had the image then no further drawing would be required, just the SwapBuffers.

DF on the other hand may change the data behind the image, but the same principle applies.  There is absolutely no reason to output anything to the graphics rendering routines unless the image has changed or the OS is calling for a redraw.

There should be an existing data point to indicate that the image has changed because of the existing partial print code.  It would take me a while to find it, but since you are already intimate with the code you should know where it is.  Doing a single check on this data point is much faster than calculating the time since last redraw, and again there is no need to draw anything if nothing has changed.

I remember from my study of KQ code a long time ago, when I first proposed separating the FPS and G_FPS, that the timing calculations to flash different things around a symbol (like thirsty, hungry, fey) and 2 different things standing in the same spot are done within a portion of the redraw code.  A quick look around the BC code makes me think it is in refresh_tiles.  That call could be placed on a timer in the enablerst::loop.

When everything properly sets whether or not a redraw is needed then it should be possible to set G_FPS to 50, but have the actual graphics redraws at the main menu be 0.  It would redraw once when a user input occurred.  This is part of what I was getting at when I initially suggested separating FPS and G_FPS back in the 2d versions.  There is absolutely no reason to draw ANYTHING when there have been no changes.
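
To make the idea concrete, here is a sketch of a gate that treats G_FPS as an upper bound only.  RedrawGate and its members are names I invented; the blink timer is my stand-in for putting the flashing logic on a timer, and times are plain milliseconds from whatever monotonic clock the loop already has.

```cpp
#include <cstdint>

// Hypothetical redraw gate; times are milliseconds from a monotonic clock.
class RedrawGate {
public:
    RedrawGate(int g_fps_cap, std::int64_t blink_period_ms)
        : min_interval_ms_(g_fps_cap > 0 ? 1000 / g_fps_cap : 0),
          blink_period_ms_(blink_period_ms) {}

    // Game logic or an OS repaint request marks the image as changed.
    void mark_changed() { changed_ = true; }

    // Call once per main-loop pass. Returns true when a frame should be drawn.
    bool should_draw(std::int64_t now_ms) {
        // Flashing symbols change the image on a timer, not on game events.
        if (blink_period_ms_ > 0 &&
            now_ms - last_blink_ms_ >= blink_period_ms_) {
            last_blink_ms_ = now_ms;
            if (has_blinking_tiles) changed_ = true;
        }
        if (!changed_) return false;   // the cheap check comes first
        if (has_drawn_ && now_ms - last_draw_ms_ < min_interval_ms_) {
            return false;              // changed, but the cap says not yet
        }
        last_draw_ms_ = now_ms;
        has_drawn_ = true;
        changed_ = false;
        return true;
    }

    bool has_blinking_tiles = false;   // e.g. a hungry dwarf is on screen

private:
    std::int64_t min_interval_ms_;
    std::int64_t blink_period_ms_;
    std::int64_t last_draw_ms_ = 0;
    std::int64_t last_blink_ms_ = 0;
    bool has_drawn_ = false;
    bool changed_ = true;              // always draw the very first frame
};
```

With G_FPS at 50 the gate never allows more than one draw per 20 ms, but on a screen with no changes and nothing flashing it allows none at all after the first frame.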

Quote
Yes, 40d7 does use more cpu time to draw frames than 40d; this is a necessary trade-off given DF's architecture and the apparently buggy buffer-object support on ATI cards.
Correcting PP to only draw when there is a change in the data or an OS request will obviously eliminate this CPU overhead.  I don't know how many times I have to state it before it sinks in: if the image hasn't changed don't redraw it.

281
DF Bug Reports / Re: [40d5] Biome's animals stop spawning
« on: January 04, 2009, 10:18:49 am »
I believe it isn't so much a bug as a feature.  Animal populations in a given region can be hunted into extinction, for example wolves in the US.  Currently in DF creatures are created instead of actually breeding and moving across the world.  The extinction feature just isn't fully completed yet, and this sometimes results in the region around your fortress reaching extinction early.

282
DF Bug Reports / Re: 40d6 - Trouble Generating Pocket World
« on: January 04, 2009, 10:09:47 am »
I have consistently seen large numbers of rejects with pocket worlds in all versions; it is quite normal.  The only thing I found to be bugged with it is the window that pops up after a certain number of rejections.  One of the options is to keep retrying and not pop up that window again for that rejection type.  I find this option does not work, and the window will often reappear just a few hundred rejects later.

283
In-game testing was to be my next series of tests.  Continuing from where I left off in my last post.

I copied 2 different fortress games from 40d.  Both are pocket worlds generated with 40d, and have various amounts of play on them.  I end each test by using task manager to kill the active DF version, and then reset the save data from a clean backup.

Set MOUSE:NO
Set PAUSE_ON_LOAD:YES
Set TEMPERATURE:NO
Set WEATHER:NO
Set INVADERS:NO
Set SHOW_FLOW_AMOUNTS:YES
Test 10:
 Load game "a game"
 Watch it paused
40d7: FPS 2200-2300, with odd spikes as low as 1900 and highs of 5000.  These spikes are a tiny flicker, but they occur with a periodic consistency of every 2 seconds.  Processor usage 60%, memory 84Mb

40d: FPS 1975, 2011-2050.  The 1975 number tends to be consistent for periods of 2 seconds, then the FPS moves through the 2000 range briefly and returns to 1975.  Processor usage 10-20% with the same pattern as the FPS, memory 79Mb.

 Press z
40d7: FPS 300-360, processor 100%
40d: FPS 370-386, processor 100%

 Press space
 Press tab until menu is gone
40d7: FPS 2380-2480, processor 70%
40d: FPS 1975, with fluctuation like above, processor 10-20% but holds more on the 20%

 Move display to completely unrevealed area
40d7: FPS 1900-2000 processor 50-60%, the CPU numbers look like 40d behavior here
40d: FPS 1975, almost flat minor spiking, processor 0-10% mostly holding at 10%

Test 11:
 Load "main"
 Watch it paused
40d7: FPS 2800-2900, similar spiking, and I even saw it jump to the 10K range once.  Processor 70%, memory 72Mb

40d: FPS identical to test 10.  Processor 20%-30% same behavior as test 10, memory 65Mb

 Press z
40d7: FPS 345-363, processor 100%
40d: FPS 350-383, processor 100%

 Press space
 Press tab until menu gone
40d7: FPS 2800-2900, processor 67-71%
40d: FPS 1975, with small fluctuations into 2000's, processor 20-30% mostly hanging at 30%

 Move to unrevealed area
40d7: FPS 2100 range, processor 60% with occasional drops as low as 50%
40d: FPS 1975 with small fluctuations into 2000's, processor 10-20% mostly hanging at 10%

These tests are all with the game paused so we shouldn't be looking at pathfinding, flow calculations, or any real dwarf activity.

I did check a few other views such as the units, jobs, and options screens; they all resulted in FPS numbers like the stocks screen.  This indicates there is something Toady really needs to optimize with these views.  These numbers are the same as my main menu results below.  This indicates that the problem with these views is not affected by whether a save is loaded.

Looking over all the numbers from a more statistical standpoint we see a processor usage that is consistently 3 times higher with 40d7 than 40d.  The best FPS gain was 50%, with an average gain of 20%.  Using that much more processor for such a small gain is a less than optimal result.  An optimal result would be 3x processor nearly equals 3x FPS.
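
Put as arithmetic, using only the summary figures above (3x processor, 20% average gain, 50% best gain).  The function name is just something I picked for illustration.

```cpp
// Frames delivered per unit of CPU, relative to the baseline build.
// Both arguments are "new build divided by old build".
double relative_efficiency(double fps_ratio, double cpu_ratio) {
    return fps_ratio / cpu_ratio;
}
```

The average case is relative_efficiency(1.2, 3.0), which comes to about 0.4, and the best case is relative_efficiency(1.5, 3.0), which is 0.5.  So per unit of processor 40d7 is pushing roughly half the frames that 40d does, or less.  The optimal result described above would be relative_efficiency(3.0, 3.0), which is 1.0.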

Of course neither version reached the FPS_CAP of 10K that I had set, and this is likely because of the Windows Sleep inaccuracy.  The code in 40d7 is definitely pushing more FPS through, and making greater utilization of my system.  It comes closer to doing what I have asked it to do, and the only reason I am aiming for such maximum numbers is limit testing.  Working with a more reasonable limit would likely show 40d superior due to its already more optimal code, and lower memory footprint.

I can move on to actually unpausing the game if anyone feels it is needed for comparative purposes.

284
My testing report on 40d7. 
OS: Win XP SP3.
1GHz Athlon
ATI Radeon 7200, AGP 4x, Omega 2.6.87

Each test is cumulative.

Unzip to new folder
Test 1:
Result: perfect

Set FPS display YES in init
Test 2: Repeat of Test1
Result: perfect

Set FPS_CAP to 10000
Test 3: Repeat of Test1
Result: Movies are not properly capped at 100 FPS

Set WINDOWED to YES
Set INTRO to OFF
Set SOUND to OFF
Set MOUSE to NO
Test 4: Main menu
Result FPS 1100 to 1400
Comparative 40d FPS 1500-1600

Set PARTIAL_PRINT to YES:0
Test 5: Main menu
Result FPS 1700-1800
Comparative 40d FPS 1700-1800, but with more in the 18xx range

GRID set to 112:50
Test 6:
Result FPS 390-440
Comparative 40d FPS 420-430, with odd highs and lows
Visibility identical

Test 7:
Bring any other window to the front, the task manager window works best
Click and hold on the title bar of the other window
Drag the other window over the DF window repeatedly, rapidly displaying and obscuring portions of the DF window.
Result: all text of the main menu is improperly drawn (textures are torn horribly), latency whitespaces appear, FPS drops to 30-40
Comparative 40d: all text is properly drawn, refreshes are smooth, FPS drops to 60-90

Test 8:
 Click the main X for the window
 Press SPACE
 Result: DF becomes locked in the option menu
 Comparative 40d, pressing SPACE returns to the Main menu

Set WINDOWEDX:896
Set WINDOWEDY:600
Test 9:
Result FPS 399-43x
Comparative 40d, 430-44x

Set G_FPS to 10
Test 10:
Result FPS 430-440
Comparative 40d, 440-450

So far my tests of 40d7 are showing it as having a slower overall draw rate on my system.  Also my tests 7 and 8 show bugs in the display.  In every case I repeated each of these tests multiple times in a row to make sure that stable numbers were reported for each version of DF.  No secondary loads could be called into account, and all tests starting with test 3 did utilize all of my processor.

285
Baughn, I finally got around to reading through the SDL documentation.  They of course do not provide anything remotely similar to what I mentioned in my last post.  It is yet another case of least common denominator.  If I didn't have 3 other programming projects taking up all my hobby time I would try to come up with something else to make the PP stuff simpler.  Sadly it would involve me spending quite a bit of time with the Mac API docs, and the little bit of time I spent poking around those this morning made me want to skip work, get castrated, have my eyes gouged out, and finish by being immolated.


I did find that you could use SDL_ListModes to gather information about available modes for doing the full screen switches cleanly with the best useable space.  I haven't seen many people reporting it as an issue, but it would be good to get right.
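
One possible way to use such a mode list, sketched without SDL so it runs on its own.  Mode is a stand-in for the w and h fields of the SDL_Rect entries that SDL_ListModes hands back, pick_mode is a name I made up, and "smallest mode that still fits the whole grid" is only my guess at a sensible policy for best usable space.

```cpp
#include <vector>

// Stand-in for the w/h fields of SDL_Rect so the sketch runs without SDL.
struct Mode {
    int w;
    int h;
};

// Picks the smallest listed mode that still fits the whole grid.
// Returns the index into modes, or -1 when no mode is large enough.
int pick_mode(const std::vector<Mode>& modes, int grid_w, int grid_h,
              int tile_w, int tile_h) {
    const long need_w = static_cast<long>(grid_w) * tile_w;
    const long need_h = static_cast<long>(grid_h) * tile_h;
    int best = -1;
    for (int i = 0; i < static_cast<int>(modes.size()); ++i) {
        if (modes[i].w < need_w || modes[i].h < need_h) continue;
        const long area = static_cast<long>(modes[i].w) * modes[i].h;
        if (best == -1 ||
            area < static_cast<long>(modes[best].w) * modes[best].h) {
            best = i;
        }
    }
    return best;
}
```

For example, a 112:50 grid of 8x12 tiles needs 896x600, so from a list of 1280x1024, 1024x768, 800x600 and 640x480 this would pick 1024x768.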
