Thursday, October 13, 2016

The Goodness of Fit scores I ended up with

During recursive narrative processing, one needs a goodness-of-fit score for a narrative that is either linear, so that the sum of the sub-narrative scores equals the score of the whole, or superimpose-able, so that the sub-narrative scores can be superimposed to give the score of the whole. I do both: U, the number of used slots, acts as the linear aspect of the score, and the found indices ifound[] act as the superimpose-able aspect. At any time you can compare U to N, the total number of slots in the narrative. You can also compare the number of words read, R, to the span of indices where they occur, F = last - first + 1.

If all N slots of a narrative are used then U=N and (U/N)=1. Similarly if every word is read between the first and last indices then (R/F)=1. Thus I propose the formula
GOF = (U/N)*(R/F)
This is between 0 and 1, and it equals 1 if and only if every word is read and every narrative slot is filled. (There are minor adjustments to F for dull words and known control words. Also, the high-level vaulting permits multiple occurrences of the narrative to be counted if they occur repeatedly.)
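
As a minimal sketch of the formula above, here it is in Python. The function name and parameter names are my own for illustration, and the adjustments to F for dull words and control words are not modeled.

```python
def goodness_of_fit(used_slots, total_slots, words_read, first_index, last_index):
    """Return GOF = (U/N) * (R/F) for one narrative read against a text.

    used_slots  -- U, the number of narrative slots that were filled
    total_slots -- N, the number of slots the narrative has
    words_read  -- R, the number of words actually read
    first_index, last_index -- positions of the first and last words read
    """
    span = last_index - first_index + 1  # F = last - first + 1
    if total_slots <= 0 or span <= 0:
        # Nothing to score; treating this as 0 is an assumption of the sketch.
        return 0.0
    return (used_slots / total_slots) * (words_read / span)


# Every slot filled and every word in the span read: a perfect fit.
perfect = goodness_of_fit(3, 3, 4, 2, 5)

# Half the slots filled, half the words in the span read.
partial = goodness_of_fit(2, 4, 3, 0, 5)
```

Here `perfect` comes out to 1.0 and `partial` to 0.25, which matches the claim that the score reaches 1 only when both ratios do.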

But that GOF score is not linear and does not transfer up from the sub-narratives to the whole. So when we need a goodness-of-fit score during recursion, the linear and superimpose-able aspects need to be used. But how? It only matters when reading the two-part narratives: sequence(a,b) and cause(a,b). What I do is try splitting the text into textA followed by textB. Let U_A be the number of slots used when reading textA with the narrative 'a', and let U_B be the number of slots used when reading textB with the narrative 'b'. Now we seek to maximize
g = U_A * U_B
over all possible ways of dividing the text into two consecutive pieces. It is tricky because the value returned from reading this text will be U_A+U_B (using plus! for linearity), taken at the split where g was maximized. This formula for g favors dividing the text into equal size pieces but the sum does not.
Update: It occurs to me, after explaining this, that the linear and superimpose-able aspects are preserved in a recursion regardless of what formula you use for g. So I can see no reason not to use the full GOF formula for g as well. I'll have to think about it.
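
The split search described above can be sketched in Python as follows. This is only an illustration under assumptions: `best_split`, `slots_used_a` and `slots_used_b` are hypothetical names, the two readers are passed in as plain functions that return a slot count, both pieces are assumed to be non-empty, and ties are broken in favor of the earliest split.

```python
def best_split(words, slots_used_a, slots_used_b):
    """Try every split of words into textA + textB and keep the best one.

    The split is chosen by maximizing g = U_A * U_B, but the value
    returned is the linear score U_A + U_B at that split.
    Returns (U_A + U_B, split_index); split_index is None if no split exists.
    """
    best_g = -1
    best_sum = 0
    best_k = None
    for k in range(1, len(words)):       # both pieces non-empty (assumption)
        u_a = slots_used_a(words[:k])    # U_A: slots used reading textA with 'a'
        u_b = slots_used_b(words[k:])    # U_B: slots used reading textB with 'b'
        g = u_a * u_b
        if g > best_g:                   # strict '>' keeps the earliest maximum
            best_g, best_sum, best_k = g, u_a + u_b, k
    return best_sum, best_k


# Toy readers (hypothetical): a slot counts as used if its keyword appears.
def make_reader(slot_words):
    return lambda text: sum(1 for w in slot_words if w in text)

read_a = make_reader(["dog", "barked"])
read_b = make_reader(["cat", "ran"])

words = "the dog barked and the cat ran".split()
score, split_at = best_split(words, read_a, read_b)
```

With these toy readers the search settles on `split_at == 3` ("the dog barked" / "and the cat ran") and returns `score == 4`. Swapping a different formula in for `g` only changes which split is chosen; the returned value is still the sum.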
