*linear*, so the sum of the sub-narrative scores is the same as the score of the whole, OR one needs a score that is

*superimpose-able*, so the sub-narrative scores can superimpose as the score of the whole. I do both, using U = the number of used slots acting as a

__linear aspect__ of the score, and the found indices ifound[] as a

__superimpose-able aspect__ of the score. At any time you can compare U to the total number, N, of slots in the narrative. Also you can compare the number of words read, R, to the span of indices (F = last - first + 1) over which they occur.

If all N slots of a narrative are used then U = N and (U/N) = 1. Similarly, if every word between the first and last indices is read, then (R/F) = 1. Thus I propose the formula

**GOF = (U/N)*(R/F)**
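Assuming the four counts above are available, the formula can be sketched as follows (the function and parameter names are my own, not from the original):

```python
def gof(used_slots, total_slots, words_read, first_idx, last_idx):
    """Goodness of fit: (U/N) * (R/F), each factor in [0, 1].

    U/N rewards using all slots of the narrative; R/F rewards the
    matched words being packed densely between the first and last
    matching indices.
    """
    U, N = used_slots, total_slots
    R = words_read
    F = last_idx - first_idx + 1  # span of indices where matches occur
    return (U / N) * (R / F)

# A perfect fit: all 4 slots used, 5 words read over a span of 5 indices.
print(gof(4, 4, 5, 0, 4))   # -> 1.0
# A partial fit: only 3 of 4 slots used, same dense span.
print(gof(3, 4, 5, 0, 4))   # -> 0.75
```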

But that GOF score

*is not linear* and does not transfer up from the sub-narratives to the whole. So when a goodness-of-fit score is needed during recursion, the linear/superimpose-able aspects need to be used. But how? It only matters when reading the two-part narratives sequence(a,b) and cause(a,b). What I do is try splitting the text into textA followed by textB: let U_A be the number of slots used when reading textA with narrative 'a', and let U_B be the number of slots used when reading textB with narrative 'b'. Now we seek to maximize

g = U_A * U_B

over all possible ways of dividing the text into two consecutive pieces. It is tricky because the return value from the reading of this text will be U_A

**+** U_B (using plus, for linearity!) where g was maximized. This formula for g favors dividing the text into equal-size pieces, but the sum does not.

**Update**: It occurs to me, having explained that linearity and superimpose-ability are preserved in a recursion regardless of what formula is used for g, that I can see no reason not to use the full GOF formula for g as well. I'll have to think about it.
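The split search can be sketched like this. The readers `read_a` and `read_b` are hypothetical stand-ins for whatever routine counts the slots used when reading a word list with narratives 'a' and 'b'; the key point is that the *product* g chooses the split, but the *sum* U_A + U_B is what gets returned up the recursion:

```python
def best_split(text_words, read_a, read_b):
    """Maximize g = U_A * U_B over all ways of cutting text_words
    into a nonempty textA followed by a nonempty textB.

    Returns (split_index, U_A + U_B): the product selects the split,
    but the linear sum is the score that propagates upward.
    """
    best = None  # (g, split_index, U_A + U_B)
    for i in range(1, len(text_words)):
        u_a = read_a(text_words[:i])   # slots used reading textA with 'a'
        u_b = read_b(text_words[i:])   # slots used reading textB with 'b'
        g = u_a * u_b
        if best is None or g > best[0]:
            best = (g, i, u_a + u_b)
    _, i, total = best
    return i, total

# Toy readers: count occurrences of a marker word as "slots used".
ra = lambda ws: sum(1 for w in ws if w == 'a')
rb = lambda ws: sum(1 for w in ws if w == 'b')
print(best_split(['a', 'a', 'b', 'b'], ra, rb))   # -> (2, 4)
```

With the toy readers, the product g is maximized at the split between the a's and the b's (g = 2*2 = 4), illustrating how the product favors a balanced division while the returned sum stays linear.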