Sphinxmoth: The Goodness of Fit scores I ended up with

Thursday, October 13, 2016

The Goodness of Fit scores I ended up with

During recursive narrative processing, one needs a goodness-of-fit score for a narrative that is either linear, so that the sum of the sub-narrative scores is the same as the score of the whole, or superimpose-able, so that the sub-narrative scores can be superimposed to give the score of the whole. I do both: U, the number of used slots, acts as the linear aspect of the score, and the found indices ifound[] act as the superimpose-able aspect. At any time you can compare U to N, the total number of slots in the narrative. You can also compare the number of words read, R, to the span of indices where they occur, F = last - first + 1.

If all N slots of a narrative are used then U=N and (U/N)=1. Similarly if every word is read between the first and last indices then (R/F)=1. Thus I propose the formula

GOF = (U/N)*(R/F)

This is between 0 and 1, and it is equal to 1 if and only if every word between the first and last indices is read and every narrative slot is filled. (There are minor adjustments to F for dull words and known control words. Also, the high-level vaulting permits multiple occurrences of the narrative to be counted if they occur repeatedly.)
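As a rough illustration of the formula, here is a minimal Python sketch. The function name goodness_of_fit and its parameters are made up for this example, and it assumes R can be taken as the number of distinct entries in ifound[]. It leaves out the adjustments to F for dull words and control words, and the repeated-occurrence counting.

```python
def goodness_of_fit(used_slots, total_slots, found_indices):
    """Sketch of GOF = (U/N) * (R/F).

    used_slots    -- U, the number of narrative slots that were filled
    total_slots   -- N, the number of slots the narrative has
    found_indices -- word positions that were read (stands in for ifound[])

    Assumption: R is the count of distinct found indices.
    """
    if total_slots <= 0 or not found_indices:
        return 0.0  # nothing to score
    indices = sorted(set(found_indices))
    words_read = len(indices)            # R
    span = indices[-1] - indices[0] + 1  # F = last - first + 1
    return (used_slots / total_slots) * (words_read / span)


# Every slot filled and every word in the span read.
print(goodness_of_fit(3, 3, [4, 5, 6]))
# Half the slots filled, and one word skipped inside the span.
print(goodness_of_fit(2, 4, [2, 3, 5]))
```

The second call shows the two factors working independently: U/N is 2/4 and R/F is 3/4.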

But that GOF score is not linear and does not transfer up from the sub-narratives to the whole. So when a goodness-of-fit score is needed during recursion, the linear/superimpose-able aspects need to be used. But how? It only matters when reading the two-part narratives: sequence(a,b) and cause(a,b). What I do is try splitting the text into textA followed by textB. Let U_A be the number of slots used when reading textA with the narrative 'a', and let U_B be the number of slots used when reading textB with the narrative 'b'. Now we seek to maximize

g = U_A * U_B

over all possible ways of dividing the text into two consecutive pieces. It is tricky because the return value from reading this text will be U_A+U_B (using plus! for linearity), taken at the split where g was maximized. This formula for g favors dividing the text into equal-size pieces, but the sum does not.

Update: It occurs to me, after explaining this, that the linear and superimpose-able aspects are preserved in a recursion regardless of what formula you use for g. So I can see no reason not to use the full GOF formula for g as well. I'll have to think about it.
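The split search for a two-part narrative can be sketched in Python as follows. This is only an illustration under assumptions: best_split and make_slot_counter are hypothetical names, the slot counters are toy stand-ins for actually reading a text with narrative 'a' or 'b', and g is the product U_A * U_B rather than the full GOF.

```python
def make_slot_counter(slot_words):
    """Hypothetical stand-in for reading a text with a narrative:
    a slot counts as used if its word appears anywhere in the text."""
    slots = set(slot_words)

    def count(text_words):
        return len(slots & set(text_words))

    return count


def best_split(words, count_slots_a, count_slots_b):
    """Try every division of words into textA followed by textB.

    Picks the split that maximizes g = U_A * U_B, but returns the
    linear score U_A + U_B found at that split.
    Returns (split_position, U_A + U_B); the position is None when
    the text is too short to divide into two non-empty pieces.
    """
    best_g, best_k, best_sum = -1, None, 0
    for k in range(1, len(words)):
        u_a = count_slots_a(words[:k])  # slots used reading textA with 'a'
        u_b = count_slots_b(words[k:])  # slots used reading textB with 'b'
        g = u_a * u_b
        if g > best_g:                  # keep the earliest best split
            best_g, best_k, best_sum = g, k, u_a + u_b
    return best_k, best_sum


words = "the cat sat then the dog ran away".split()
read_a = make_slot_counter(["cat", "sat"])
read_b = make_slot_counter(["dog", "ran", "away"])
print(best_split(words, read_a, read_b))
```

Swapping in a different g, such as the full GOF, would change only the line that computes g. The returned value stays U_A + U_B either way.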
