Bay 12 Games Forum

Author Topic: Dwarf Therapist (Maintained Branch) v.37.0 | DF 42.06  (Read 974741 times)

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #690 on: June 14, 2014, 12:59:11 pm »

The goal of an ECDF is to capture the real distribution of values, so I disagree with you on garbage data. The roles are whatever a player feels is important for a role. Splinterz and I just found a reliable test that lets us find skewed distributions and preserve their low values properly. I have an even better proposal that works directly off the ECDF values (adjusted for skew) to create the true percent value of any given preference or attribute within a data set.

That is the purpose of statistical analysis: to quantify unknowns based on samples. So arguing that the distributions are not comparable ignores the reality that, regardless of value, you have an equal number of varying attributes that can be measured on an ordinal scale. I intend to use ordinal ranks to normalize the data around the median.

The goal of an ECDF is to say: out of, say, 1000 dwarves, regardless of the differences between values, each value is indicative of a 1/1000 slice of the distribution of dwarves. It is the true % rank of a value within that 1000-sample distribution.
« Last Edit: June 14, 2014, 02:17:59 pm by thistleknot »
Logged

indyofcomo

  • Bay Watcher
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #691 on: June 14, 2014, 03:35:34 pm »

a simple script such as
d.agility() > 5
doesn't filter any of my dwarves out. And no, I haven't selected against un-agile dwarves. (Though that would be fairly dwarfy, I guess.)
Logged

sal880612m

  • Bay Watcher
  • [SANITY:OPTIONAL]
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #692 on: June 14, 2014, 03:40:25 pm »

Quote
Is there a place I can see the pre-defined roles Therapist is using?

How does it calculate the % suitability a dwarf has for a given role? Traits and stats?

As far as seeing what is considered for the predefined roles: go to New Custom Role, and below the title field for the new role there is a dropdown box of all the predefined roles you can copy settings from. To be careful, you really should give the role a custom name, as I do believe it is possible to alter/overwrite the default roles. They are fairly easy to restore, but it can be tedious.

For how it calculates suitability, my understanding (which could be way off) is that it evaluates traits, stats, preferences, and skills, taking into account the weights you give each. At that point I lose any concrete idea of what is happening, but I think it goes like this: once all the roles are evaluated for every dwarf, those numbers are plotted on some sort of bell curve, and the curve is adjusted so that at least one of the roles is above 99.5%. This means that if you put a high value on skill, you can see a dwarf drop considerably in rating when a highly skilled immigrant arrives; whereas if you put more weight on traits, preferences and stats (and stat potential) and only stats change, you are more likely to get more static values for your roles.
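
A minimal sketch of that combine-then-rescale reading in Python (the category ratings, weights, and function names are hypothetical placeholders; this is not DT's actual implementation):

Code:
def combine_role_rating(ratings, weights):
    """Weighted average of per-category ratings (each already 0..1)."""
    total_weight = sum(weights.values())
    return sum(ratings[c] * weights[c] for c in weights) / total_weight

def rescale_to_ceiling(all_ratings, ceiling=0.995):
    """Scale every rating so the best role sits at the ceiling (e.g. 99.5%)."""
    best = max(all_ratings)
    return all_ratings if best == 0 else [r * ceiling / best for r in all_ratings]

# One dwarf, one role; the values below are made-up examples.
weights = {"attributes": 1.0, "skills": 1.0, "traits": 0.5, "preferences": 0.5}
ratings = {"attributes": 0.70, "skills": 0.90, "traits": 0.40, "preferences": 0.10}
print(combine_role_rating(ratings, weights))      # ~0.62
print(rescale_to_ceiling([0.62, 0.48, 0.33]))     # best rating becomes 0.995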
Logged
"I was chopping off little bits of 'im till he talked, startin' at the toes."
"You probably should have stopped sometime before his eyes."

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #693 on: June 14, 2014, 07:07:56 pm »

The way the %'s are currently calculated is based on pre-defined models of how we believed the data falls into frequency bins.

Traits are mapped into their own frequency bins based on http://dwarffortresswiki.org/index.php/DF2012:Personality_trait, straight into a %.

Attributes are a bit trickier. Initially we based them on their frequency bins as defined in http://dwarffortresswiki.org/index.php/DF2012:Attribute (attributes have 6 distinct frequency bins); then Splinterz did the hard work of scanning mods that have different frequency bins and took each mod's varying castes into account (yeah, he did all that), so we got a truer representation of all possible values. However, we realized that attributes can increase (unlike traits)! So we had to come up with a way to scale attributes from something like 5% to 95% based on the maximum possible value at creation. We decided to scale up the % based on what a dwarf can train up to: from whatever value you get within 5 to 95%, the % can go up a bit more based on the amount of attribute potential the dwarf can train up to (a dwarf can double his starting attribute through training), so we reserved something like 95% to 99.5% for what a dwarf can train up to. This is based on some tricky sigmoid-function math that takes the amount a dwarf can train up to and ensures we never exceed the 99.5% threshold; the last 0.5% is reserved in case a player cheats his dwarf's values even higher.
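
This isn't DT's exact math, but a rough sketch of the squashing idea described above, assuming a plain min-max base mapping into 5-95% and a logistic bonus capped below 99.5%; the attribute values and bounds are made up:

Code:
import math

def base_percent(value, lo, hi):
    """Map the starting attribute value into the 5%..95% band via min-max."""
    frac = (value - lo) / float(hi - lo)
    return 0.05 + 0.90 * min(max(frac, 0.0), 1.0)

def potential_bonus(current, maximum, band=0.045):
    """Squash trainable headroom into at most ~4.5% extra, so the total
    rating can approach but never exceed 99.5%."""
    if maximum <= current:
        return 0.0
    headroom = (maximum - current) / float(maximum)
    return band * (2.0 / (1.0 + math.exp(-4.0 * headroom)) - 1.0)

value, trainable_max = 1250, 2500            # hypothetical strength values
rating = base_percent(value, 0, 2250) + potential_bonus(value, trainable_max)
print(round(rating, 4))                      # ~0.58, always below 0.995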

Skills were a simple xp / max xp, but then I believe Maklak said we should base skills on the level of the skill rather than raw exp. However, there is a training rate in some mods: some dwarves train skills faster than others. So we did the sigmoid magic again to scale up the value based on this training rate. You'll have to find the old post for it; I'm not going to go into it right now.

So then you get a 0 to 100% rating for each category, attributes, traits, skills.

Then there's preferences.

I'm not sure how Splinterz did preferences; I believe he was just going to go with a simple additive value on top of the other 3. I'm not 100% sure. Preferences were hard to categorize.

The initial idea was to average the three values together using a weighted average based on the weights set for attributes, skills, and traits, with preferences added at the end or included as part of this weighting.

Not too sure.

I'm not sure if Splinterz will like the new idea I have to propose, but instead of doing all that hard fitting of data to pre-defined ranges as we have done, I was going to propose simply scaling the values from 0 to 100% relative to the current fort's distribution of values. We did something like that initially, using statistics and a cumulative distribution function, but we found that approach assumed a normal distribution. However, I honestly believe that can be avoided by using an empirical cumulative distribution function.
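
For reference, a minimal sketch of what such an ECDF rating would look like (illustrative only; the sample values are made up):

Code:
from bisect import bisect_right

def ecdf_percent(values):
    """Each value's rating is the fraction of the fort's values <= it."""
    ordered = sorted(values)
    n = len(values)
    return [bisect_right(ordered, v) / n for v in values]

strengths = [750, 900, 1100, 1100, 1500, 2200]   # hypothetical fort values
print([round(p, 2) for p in ecdf_percent(strengths)])
# [0.17, 0.33, 0.67, 0.67, 0.83, 1.0]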

But... I haven't heard from him yet on my new proposal to replace the way we calculate %'s, from raw frequency categories to a method that uses the ECDF of the current population.

The trickiest part with attributes and skills (versus, say, traits, which supposedly NEVER change from their embark value) is that attributes and skills change after a dwarf is created.

The great thing about attributes is that the amount a dwarf can "train" up to IS ALSO SET AT EMBARK, AND IS BASED ON THE INITIAL VALUE of the attribute; however, an attribute can also "decay" below its starting value, which means it can fall even below the lowest possible embark value (so an ECDF of all possible starting values would produce a percent from 0 to 100% and STILL wouldn't be descriptive of all possible ranges a fort can have). What's hidden in the attribute starting value (which is stored in memory) is the amount a dwarf can increase or decay from. So when we hard-modelled the data, we had to incorporate that.

An ECDF of the current values could take that into account, but a similar comparison would have to be performed on the initial value: the value's current ordinal position compared with the rest of the data in the set, and its max ordinal position [derived from the actual starting value of the attribute] compared with the rest of the data. This means a lot more would be stored in an ECDF conversion for an attribute. Keeping track of its initial, max, and current value gives three input variables that would be combined into a new relative number that is fed into an ECDF. I guess you would take the current formula and, just before you transform it into a %, run those numbers through an ECDF function.

Skills could be done the exact same way, but skewed data would be flagged before it's run. Zero values always [should] remain 0.

Flagging is possible with our error-check function: if an ECDF conversion of a set of values produces abs(mean - .5) > .275, the distribution is flagged as having an unacceptable new mean. Means should be within .25 of .5 to be considered normalized to each other (I actually tested this).

There's another test for skew: if the data has one value that is repeated across more than 50% of the set of values. Simple formula: if (count of values equal to the median / count of the data set) > 50%, then the distribution is skewed.

Then, instead of an ECDF, we do a max-min conversion from 0 to 100% of the original skill exp (or the Maklak formula, it doesn't matter), as long as 0 still remains 0%; i.e. we do a (x - min) / (max - min) conversion. (One issue with this is that the conversion sets the lowest value to 0%, which was only intended for the value of 0 itself. Update: I proposed running the non-zero values through their own ECDF (by removing the 0's), then re-inserting the 0 values into the list.)
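
A sketch of that flag-then-rebalance idea as I read it (not DT source; the thresholds come from the description above and the sample values are invented):

Code:
from bisect import bisect_right

def ecdf_percent(values):
    ordered = sorted(values)
    n = len(values)
    return [bisect_right(ordered, v) / n for v in values]

def is_skewed(values, mean_tolerance=0.275, repeat_fraction=0.5):
    """Flag a distribution whose ECDF mean drifts too far from 0.5,
    or where a single value dominates more than half the set."""
    ranks = ecdf_percent(values)
    mean_drift = abs(sum(ranks) / len(ranks) - 0.5)
    dominant_share = max(values.count(v) for v in set(values)) / len(values)
    return mean_drift > mean_tolerance or dominant_share > repeat_fraction

def skill_percent(skill_xp):
    """Zeros stay at 0%; non-zero skills are ranked only against each other."""
    nonzero = [x for x in skill_xp if x > 0]
    ranks = dict(zip(nonzero, ecdf_percent(nonzero))) if nonzero else {}
    return [0.0 if x == 0 else ranks[x] for x in skill_xp]

mining_xp = [0, 0, 0, 0, 0, 0, 0, 0, 500, 500, 2900, 11000]  # hypothetical
print(is_skewed(mining_xp))      # True: zeros dominate the distribution
print(skill_percent(mining_xp))  # zeros -> 0.0, the rest spread over (0, 1]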

The great thing about the ECDF is that we can apply this same skewed-distribution concept to preferences: we can rank dwarves from the least number of matching preferences to the most and get a good 0 to 100% rank of dwarves who do or do not have any matching preferences. I'm still thinking on that one.

Yeah, I could talk all day about this, but I've got a lot of thinking to do about ecdf conversions right now.

« Last Edit: June 15, 2014, 08:15:04 pm by thistleknot »
Logged

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #694 on: June 15, 2014, 06:58:35 am »

DFFD was acting funny, so I set up a Dropbox account.

Here's how I would propose calculating roles based on ECDF; see the Miner tab sheet.


https://www.dropbox.com/s/5qop6oowf1jjfys/ECDF%20Proposal-1.xlsx

fixed some numbers

http://dffd.wimbli.com/file.php?id=8654

Dwarves are numbered from left to right based on sorted role rating, from lowest to highest, for the old vs. new methods respectively.

http://imgur.com/wPBJh4i


The reason it matched so well was a bias given to attribute weights; this is entirely due to skills being underrepresented in current role calculations.

See the updates below.
« Last Edit: June 15, 2014, 08:16:56 pm by thistleknot »
Logged

splinterz

  • Bay Watcher
    • View Profile
    • Dwarf Therapist Branch
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #695 on: June 15, 2014, 11:33:52 am »

The roles are in the etc/game_data.ini file.

Quote
The roles are garbage. Useful and stuff, but I mean, technically the numbers behind them, it's just nonsense. Deriving things from garbage data is not good for you.
please elaborate on this. while admittedly some of the roles could use some tweaking, saying they're generally garbage is somewhat alarming, considering the numbers behind them are based on a lot of compiled !!SCIENCE!! already done on how attributes, preferences, traits and skill rates affect jobs and combat.
Quote
Shearer's a fine example, with it being mentioned here. There's no role there for if the dorfs like wool, sheep, goats, alpacas, and whatever else you can shear, because no one's put that in yet. Spinner doesn't care if they like/hate wool or hair or yarn. That's free happy thoughts, even if it doesn't make them work faster. Shearing is also typically a long-distance walking job, so benefits more from agility than other jobs.

There's no roles at all for the hauling labours, when dorfs who like cages should be priority Animal Haulers, dorfs who like minecarts should be vehicle pushers, dorfs who like assorted furniture should be furniture movers, weak dorfs shouldn't bother hauling stone but fat dorfs should. They're all just lumped on young dorfs who have no other skills yet, when really some of your multi-skilled dorfs would be happier doing some of it rather than sitting on "no job".

Like I say, they're useful as is, and I do appreciate the thought and effort that's gone into producing them. But when you do fancy math on numbers that are non-calibrated guesses, what you get is even worse numbers. Normalising them so they all run 1-100 or whatever, that's got thistleknot thinking he can use them as a comparison tool when they are not comparable.

I'm using "garbage data" as a semi-technical term, eh. You can't gain real information by doing math on guesswork. You can't actually compare if someone's going to be a more effective speardorf or sworddorf because we don't even know which weapons are better (or we do, and unless you come with a like for swords or a lot of sword skill you should be a speardorf).
alright so you can create roles yourself for shearer and hauling. not to say that they shouldn't be included by default, but i'm not sure what this is a good example of, other than that some roles could be added to the default set.

i don't know what you're referring to when you say 'non-calibrated guesses'. do you mean the weights applied to the roles? you can change them to whatever you want. do you mean how the ratings are calculated? that's still fairly straightforward and based on numbers the game provides (ie. we know how much xp/level, we know how attributes increase and by how much, we know that traits can't change), so there isn't much guesswork there. do you mean what attributes/traits/preferences/skills are associated with different roles? most of them are from science performed by other players (ie. how creativity impacts the quality of craftsdwarves, or how strength/agility changes speed). however you can still override the default roles with your own, or create completely new roles to use instead.

so i suppose my question would be, what exactly are you stating is guesswork that makes the roles garbage?

the military roles are somewhat lacking, because the only thing that really separates them is a weapon preference and skill. but weapon skill plays an awfully big (and known) role in how proficient a soldier is going to be with a weapon, and we do know the difference between sword and spear attack on different types of armor.

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #696 on: June 15, 2014, 03:43:14 pm »

Did some more work on this ECDF proposal for calculating role %'s.

If you look at the 'Miner Role' sheet you will see cell G8; changing it switches these graphs between the [old & new] proposed skill drawing methods and shows how each compares to a currently exported Miner role with the same weights applied.

http://imgur.com/9DNeIRY

http://dffd.wimbli.com/file.php?id=8654

The graph on the right has the same ending values as the as-is version (say dwarves 49 to 53). This is because high skill values translate into high %'s for those dwarves in both skill-calculating methods; the spikes come from the values on the right that were boosted higher than their counterparts in the drawing on the left. The new method enhances the effect of lower-ranked skill ECDF ratings while retaining all 0 values for skills; that's why you see the jump start at dwarf 41, and why between the two charts the shape of dwarves 41 to 53 (blue line) is a little different.

Removing the 1 value from cell G8 causes the lines to match more closely, but that is because skills currently work like method 0 in DT; I'm proposing that the new method 1 boost skill values a little more, since they already have a lower mean overall than the other distributions.

Removing the skill weight in J13 removes the additive effect of skills in the drawing, and due to skills' currently underrepresented nature in role calculations, the result more closely mimics how the role normally works (i.e. the orange line).

The goal is not to match the old method exactly, but to understand how and why the differences occur, and to verify it is intended to operate that way. So, in effect, dwarves that had a little skill will get a big boost compared to a dwarf with no skill; that is the intended purpose of re-balancing skewed [skill] distributions.
« Last Edit: June 15, 2014, 07:14:57 pm by thistleknot »
Logged

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #697 on: June 15, 2014, 11:30:41 pm »

It would be nice if Dwarf Therapist showed a dwarf's kill history.

Maklak

  • Bay Watcher
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #698 on: June 16, 2014, 01:36:00 pm »

Thistleknot asked me for input on labor optimization, but I just don't have the energy to read through the last two pages, so I don't even understand what you're doing and what you want done.

The problem of allocating Dwarves to jobs is a difficult one, but I have a few ideas which probably won't be of much use.

For something fully automated, you'd need to monitor the time Dwarves spend working, how many jobs there are, and how long they wait to get done. Based on this, some Dwarves could get some of their jobs disabled, while the idling / peasant population got some labours enabled. But this is probably beyond what you're trying to accomplish here.
It should be a tad easier to just display these statistics and let the player decide "I want between these 2 numbers in this labour" and prefer enabling jobs on idlers. But this is still a difficult problem, especially if you want to take individual Dwarves' suitability for various jobs into account.

If sorting by suitability for a job is all you want, then it is an aggregate of a few components:
* How high the skill is. Or rather, how long it will take to max it out. I've given some formulas for taking skill learning rates into account some time ago. Also, low skill combined with very low / disabled learning rates for the caste should pretty much disqualify a dwarf from a labour. (Which is why I prefer the geometric (multiplicative) mean to the arithmetic (additive) one; the weights are the powers to which you raise the components. See the sketch after this list.)
For a role you might want a combination of skills.
* Speed, based on caste SPEED, agility, strength, body mass and maybe a few other factors. You'd want to keep worn equipment weight mostly out of the equation, except if you want to take Armor User into account too. Anyway, speed means faster movement and faster attacks, and I've conclusively proven that it also means workshop jobs get finished faster. That is, finishing a workshop job takes a set number of actions, modified by skill; the faster a dwarf is, the less cooldown between actions, and the job gets done faster.
* Attributes and personality. Stronger and tougher soldiers are better, Dwarves with high Artistic Versatility make better engravings, Empathic and Dutiful doctors and nurses are better, and so on. I think you already did that part.
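
To illustrate the geometric-vs-arithmetic point from the list above (a sketch only; the component values and weights are made up):

Code:
def arithmetic_mean(components, weights):
    return sum(c * w for c, w in zip(components, weights)) / sum(weights)

def geometric_mean(components, weights):
    """Weights act as exponents, so a near-zero component (e.g. a disabled
    learning rate) drags the whole score toward zero."""
    product = 1.0
    for c, w in zip(components, weights):
        product *= c ** w
    return product ** (1.0 / sum(weights))

skill, speed, learn_rate = 0.8, 0.7, 0.01     # learning nearly disabled
weights = [2.0, 1.0, 1.0]
print(arithmetic_mean([skill, speed, learn_rate], weights))  # ~0.58
print(geometric_mean([skill, speed, learn_rate], weights))   # ~0.26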

Anyways, I feel a bit apologetic about not contributing anything useful this time, but I don't have the energy and willpower it would take me to do so, and we'd probably argue anyway. I have a raging anger problem. (That was a pun.) Heck, I barely even visit the forum these days. Besides, I don't use the optimizer myself and just sort by skill. I also haven't used any recent version of DT and stick to v20 which had pretty much everything I needed.
Logged
Quote from: Omnicega
Since you seem to criticize most things harsher than concentrated acid, I'll take that as a compliment.
On mining Organics
Military guide for FoE mod.
Research: Crossbow with axe and shield.
Dropbox referral

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #699 on: June 16, 2014, 07:28:32 pm »

maklak, thanks for chiming in.

You gave us valuable insight before, especially in dealing with skill rates as well as attribute potential maximums.

I know you are very concerned with skill ranks in DT and the ability to sort by skill ranks, which was default DT behavior. I believe the ability to sort by role ratings on the labor screen has been removed (I remember 3 sort options). Anyways, sorting by role ratings can still be accomplished in the role view. I do not wish to rob that feature [sorting by skill] from DT; personally, I think skills trump other values, but they're hard to account for with our current setup. But read below...

So...

I had this new idea to normalize the values of our input variables for roles prior to combining them with ((a * weight) + (b * weight) + (c * weight) + (d * weight)) / sum of weights.

I was thinking of using an empirical cumulative distribution function to normalize distributions of varying values next to each other.


There is a problem with comparative value between roles when you lose a lot of skilled dwarves in between. The scenario: you have 6 great combat dwarves, and you lose #2-5 (from highest to lowest). Then the % difference between your best melee dwarf and the next dwarf below it is smaller than it was before. I believe I found an answer to that issue.

I was googling "normalize distributions by ranking" and I found this:

http://en.wikipedia.org/wiki/Quantile_normalization

It is the answer to our problems.

I don't know why they call it quartile, because it doesn't actually use quartiles. derp, quantile normalization.

It's quite ingenious though.

You rank all distributions (i.e. each role) to each other from lowest to highest value.

Then you run a mean across each row, so you have a new column at the end of the distributions.

Then you divide each rank's value by the sum at the end to get a %.

This normalizes the distributions next to each other, so you can do it with all the variables involved in a role's calculation. This also solves the problem of seeking a distinct % for each value in the transformed %'s we're keeping. I suspect you would get a very interesting set of % values as a result.
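
A small sketch of those three steps as I read them (not DT code; the attribute columns are made up, and ties simply keep their original order):

Code:
def quantile_normalize(columns):
    """Sort each column, average the sorted rows, then hand every value the
    row-mean at its own rank, divided by the total to give a percent."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    row_means = [sum(col[i] for col in sorted_cols) / len(columns)
                 for i in range(n)]
    total = sum(row_means)
    normalized = []
    for col in columns:
        ranks = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank_pos, original_index in enumerate(ranks):
            out[original_index] = row_means[rank_pos] / total
        normalized.append(out)
    return normalized

strength = [450, 1250, 2000]     # hypothetical values for three dwarves
agility  = [1800, 300, 900]
print(quantile_normalize([strength, agility]))
# both columns now share the same set of percents, assigned by rank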

Originally I was thinking that normalization would save us time waiting for an update to find the new variables being brought into play that determine a character (essentially traits).

However, an ECDF ignores the distinct differences between values.

I was worried about this case: you have 1 great warrior and 5 other also-great warriors; your 5 die, and you are left with just your best dwarf. The problem is that you would have no % gap showing the difference to the next value below it, as the % is tied to rank position; it's basically a step of % value * rank position. So if there was a 30% difference to the 6th-lowest value before the loss, after the loss it would be something like 5 or 6%. It would drastically affect all the other role ratings.

INSTEAD, I think you should run it through quantile normalization, which preserves the difference in values between distributions. It makes a lot of sense, solves the issue above, and still doesn't require as much work as modelling the data after a lot of proposed frequency structures and fitting whatever distributions we can best estimate.

I'm still on the fence about one aspect, though. Splinterz wanted to calculate the ECDF rating against all distributions, versus recalculating it each time a labor optimization is run, using just the data within that run. That will be a small decision though; either way, differences in values will be respected more, even for values that can't be directly compared with each other numerically (i.e. trait vs. attribute). If the % values are preserved at the most basic level for the population for just skills, traits, attributes, and preferences (although I think we can do something with preferences to scale them from 0 to 100%, or whatever we need to normalize them to, especially after this realization and my understanding of ECDF rankings), we will save ourselves a lot of work and truly give a representative value of each value next to the others. Especially for purposes of the labor optimizer.

I figured the number of values in your population should give you the best estimate of your distribution, and if the raw score value can somehow be used as well, you can preserve the meaning of the differences.

I found it with quantile normalization. We can normalize all attributes next to each other, then traits, then skills. Skills' skewed distributions would be even easier to deal with using quantile normalization than with my other proposed method.

By normalizing all attributes next to each other using quantile normalization, each value is always compared with the sum across its ordinal ranking position. It's ingenious. And it works with data sets that have the same number of rows [elements], which is even better!

Update:

haha, all this time I was describing the % step between values, it's called a quantile.

http://en.wikipedia.org/wiki/Quantile

Here's me adding the rows of each ordinal rank together, compared with its flat % curve relative to the sum of all values:

http://imgur.com/qfJTzgK
« Last Edit: June 17, 2014, 04:31:22 am by thistleknot »
Logged

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #700 on: June 17, 2014, 12:18:54 pm »

Okay

This will hopefully be my last update before I do another rework of proposed changes.

To normalize, say, a category of values to each other:

For example, the 19 attributes: strength, agility, creativity, etc.

You would take a grid view of (# of dwarves) * (# of attributes) = # of values to normalize.

I.e. 90 dwarves * 19 attributes = 1710 values.

Step 1.

Calculate the sum of the 1710 values.

Divide each value in the 90x19 grid by that sum.

That gives you normalized values on which you can do either an ECDF conversion or a simple max-min conversion.

Max-min conversion = (x - min)/(max-min)

that will give you a % from 0 to 100%.

A max-min conversion centers the values around the mean, i.e. 50% = mean.
An ecdf conversion centers the values around the median.

ECDF has the advantage of giving the lowest value a non-zero value, which is useful for weighting purposes, but min-max can be altered to give the lowest value a non-zero result by doing something like this:

Range = max - min
Newrange = range *1.01 (padding)

Newmin = max -new range
Newmax = min + newrange
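
Putting those steps together in a quick sketch (my reading of the padding formulas above; the grid values are invented):

Code:
def grid_normalize(grid):
    """Divide every value in the dwarves-by-attributes grid by the grid total."""
    total = float(sum(v for row in grid for v in row))
    return [[v / total for v in row] for row in grid]

def padded_minmax(values, padding=1.01):
    """Min-max conversion with the padded range proposed above, so the
    lowest value no longer lands exactly on 0%."""
    lo, hi = min(values), max(values)
    new_range = (hi - lo) * padding
    new_min, new_max = hi - new_range, lo + new_range
    return [(v - new_min) / (new_max - new_min) for v in values]

grid = [
    [1100,  950, 1800],   # dwarf 1: strength, agility, toughness (made up)
    [ 700, 1500, 1250],   # dwarf 2
    [2000,  800,  900],   # dwarf 3
]
flat = [v for row in grid_normalize(grid) for v in row]
print([round(p, 3) for p in padded_minmax(flat)])  # lowest value is ~1%, not 0%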
« Last Edit: June 17, 2014, 02:35:42 pm by thistleknot »
Logged

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #701 on: June 17, 2014, 10:15:35 pm »

Here's what I'm working with.

Doing a simple division of each value by the sum of the grid values (i.e. dwarves * attributes),

I get a few interesting charts showing how the values are "normalized":

http://imgur.com/kRSO8wE

and

http://imgur.com/TZf1t3W

Versus normalizing each range against itself (where differences in average values between attributes have a negligible effect on the comparison), you can now directly compare attributes to each other.

I went with a new min-max formula, but I'm not against the ECDF either. I think if all values are equally probable (e.g. ranks 1 to 1000 for 1000 dwarves), then each value in my list represents a true distribution of my characters.

However... comparing to the mean (i.e. non-ECDF), you get a better range from 1 to 99% that is relative to the population. This could have unintended consequences when converting to an evenly distributed 0 - 50 - 100% range and comparing to other categories such as traits, skills, preferences, etc. So I recommend an ECDF after we derive the new normalized % values.

However, I think a radio button allowing a choice between ECDF vs. mean comparison would be warranted. A player can get either a +/-50% split around the median, or a +/-50% split around the mean. It could make an interesting difference in optimization.

That gives an equal up/down split around the median values across the board; skewed distributions could still be addressed with an ECDF conversion.
« Last Edit: June 17, 2014, 10:22:58 pm by thistleknot »
Logged

tussock

  • Bay Watcher
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #702 on: June 18, 2014, 05:18:21 am »

Quote
The goal of an ECDF is to capture the real distribution of values, so I disagree with you on garbage data. The roles are whatever a player feels is important for a role.
You're assuming anyone actually uses the custom role editor? I suppose I've seen a couple sets for download.

Quote
Splinterz and I just found a reliable test that lets us find skewed distributions and preserve their low values properly. I have an even better proposal that works directly off the ECDF values (adjusted for skew) to create the true percent value of any given preference or attribute within a data set.
I don't wish to dissuade your enthusiasm for fiddling, but you should be aware that it's not "true" anything, because your data is something you invented. It says that in your own readme and stuff. Yes, the stats and max and prefs and traits are real sampled data from a well-defined range, but then you go Stats * A + Max_Stats * B + Prefs * C + Traits * D.  The letters there represent the bits you invented by feel, which means your final numbers are garbage.

You haven't got a "true" order of dwarfs in the fort for any job, even though it's an order and that's very useful. I apologise for pointing out the most obvious cases rather than the typical ones. But a high-Agility skill-15 miner will easily beat a high-Strength skill-25 miner to the job and also mine faster than them, until their endurance and persistence come into play and various ones head off for a drink before finishing. Mining does train Strength though, so ex-miners make good military, so, um, what? What's "true" about the mining role? I like it because it helps me find dorfs who like picks, for instance, because they're happier when you let them carry one around, though it's good to check they're not a berserker first, because picks are lethal.

Quote
That is the purpose of statistical analysis. To quantify unknowns based on samples.
That's not what you're doing here. You know the true limits of Stats and Max_Stats and Skills and Traits, sampling doesn't change those. Arbitrarily multiplying a sample of those by fudge factors and then combining them into new values and changing the range to fit those new values isn't a sample of anything: it's possibly useful but it's not valid data. You're not going to discover anything, other than how easy it is to obfuscate any real information.


Like, if my best dwarf is only 37% of the estimated maximum combination of numbers, that could be handy to know. I might want to put six of them on the job instead of the two I'd pick if heaps of them were at 100% (because we only care if they're altruistic). That's already hidden, you're just hiding it further. That's more of a problem when your best mace user is 60% of ideal, but they're 70% of ideal at axe, but because you've got a lot of good axedorfs they get a bigger number for maces (just let them all be axedorfs! Axes are fine).

Normalising everything, it just seems like you're taking what little accurate data you do have and hiding it under more layers of ... I struggle to find a kind word. Maybe I'm just blind to the utility of post-facto modifying the distributions of your fudges to make them look more like ideal random samples from a large population: despite them not being actual samples. As much fun as that may be.
Logged

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #703 on: June 18, 2014, 08:17:40 am »

How can you say they are not samples? I intend to normalize the data to the population, and the normalized numbers will retain all the properties of the original data.

Originally I had proposed normalizing each attribute against itself, but realized that differences in [max/min] values between attributes are lost unless you normalize all attributes at once against the whole grid's [i.e. # of dwarves * number of attributes (19)] min-max range (i.e. min = ~0% and max = ~99%).


I mean, as is, what we're doing is in and of itself funky, but as is we're transforming the data into ranges that have been confirmed with very large data sets. The work we put in to come up with valid formulas for skill and attribute potential I could see you having issue with... (but even then we did a darn fine job of working out a fair formula, thanks to Maklak and me). Anyways... if I can normalize in a fair and consistent manner that retains all the meaning of the current population's data, I don't see how you can find fault with that, especially if any transformations of the data make sense and are applied in a consistent manner.
« Last Edit: June 18, 2014, 08:47:17 pm by thistleknot »
Logged

thistleknot

  • Bay Watcher
  • Escaped Normalized Spreadsheet Berserker
    • View Profile
Re: Dwarf Therapist (Maintained Branch) v.21.8
« Reply #704 on: June 19, 2014, 08:21:08 am »

My min-max proposal vs. as-is:

http://imgur.com/BmoQPg2

It preserves order and offers a better spread of %'s.

Never mind, it needs a little more work.

Update:
Okay, I redid it [3 times now] for a new proposal on how to calculate role %'s: basically, calculate any % without having to use frequency bins. It involves not just sampling each attribute individually, but sampling all the attributes against each other at once [i.e. as a matrix], getting a sample size that is 19x larger [for attributes specifically; traits will be even more!] than what we would normally expect. So you immediately have enough data to start drawing conclusions about the range of whatever category you're sampling, and you can start assigning percents based on your gridview size (i.e. number of dwarves * elements in the category; for example, 45 dwarves * 19 attributes = 855 elements in your sample to draw direct %'s from).

Same goes for skills.

We then use that to run either a special ECDF conversion that preserves 0 values and boosts non-zero values, or a min-max conversion.

http://dffd.wimbli.com/file.php?id=8679

Pics

http://imgur.com/L5WHshv

A Miner role comparison using a .75 attribute weight and a 1.25 skill weight, in the current proposal vs. as-is. There are some variations, due to the skewed nature of skills.

http://imgur.com/yUH2B1K

This is a comparison of the lowest-mean attribute and the highest-mean attribute compared with each other after transformation using either min-max or ECDF.

You can think of min-max vs. ECDF as using the raw value vs. the ranking value. The graphs would be very, very similar.

Update:
The best picture to make a case for the ECDF version:

http://imgur.com/LM8OeFd

And for comparative reference, this is what the min-max version looks like:

http://imgur.com/9XeJQYT

What it looks like when sorted by as-is vs. proposed:
http://imgur.com/cxOUpGM

What am I looking at?

The matching curve suggests that an ECDF conversion of attributes into their respective quantile %'s equals the fair distribution breakdown of the gridview (by the way, this is even after I lost about 2 of my best military dwarves). Since the values are all interrelated, you are getting a fair representation of each value within that gridview relative to the others [in terms of attributes when looking at attributes; the same goes for traits, etc.].

What you're looking at is the Miner role, which has 7 attributes mixed in together (one is even weighted), and my [non-caste-defined (bin)] formula matched a pretty good distribution curve to what is already programmed.

Note: I've put off playing for a week to get these ideas out and do some meaningful comparisons.
« Last Edit: June 20, 2014, 08:17:10 pm by thistleknot »
Logged