Thursday, October 6, 2016

First "Goodness Of Fit" measure

I am having a good experience using Microsoft's "Python Tools for Visual Studio". In any case, getting a goodness of fit ("GOF") score for the first time is great. I have been aiming for this for months, so getting it to work in one case is a start. The GOF score is 0.5714...
The text is: "the hotel was near to the border and far from downtown" and the narrative pattern I was looking for was 'room/hotel in proximity to noise sources'. The fit is poor because the word "border" is not in any of the noise source dictionaries. However, the word "downtown" is a known noise source.
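To make the dictionary effect concrete, here is a minimal sketch of how a missing word drags a fit score down. It is only an illustration under stated assumptions: the dictionaries and the function `toy_score` are hypothetical, the scoring rule (fraction of proximity words that are followed by a known noise source) is a stand-in, and it is not meant to reproduce the 0.5714 figure.

```python
# Hypothetical dictionaries; the real ones are not shown in this post.
PROXIMITY = {"near", "far", "close", "beside"}
NOISE_SOURCES = {"downtown", "highway", "airport", "nightclub"}


def toy_score(text):
    """Fraction of proximity words that lead to a known noise source.

    For each proximity word, scan forward until the next proximity word
    (or the end of the text) looking for a word in NOISE_SOURCES.
    """
    words = text.lower().split()
    total = 0
    hits = 0
    for i, word in enumerate(words):
        if word not in PROXIMITY:
            continue
        total += 1
        for following in words[i + 1:]:
            if following in PROXIMITY:
                break  # the next proximity phrase starts here
            if following in NOISE_SOURCES:
                hits += 1
                break
    return hits / total if total else 0.0


text = "the hotel was near to the border and far from downtown"
print(toy_score(text))  # 0.5: "border" is unknown, "downtown" is known
```

In this toy version, adding "border" to the noise source dictionary would raise the score to 1.0, which is the same effect described above.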

1 comment:

  1. As I begin to get this straight, certain subtleties arise that take some thinking to handle. First, I note that the 0.57 number is incorrect. After more debugging I got a score of 1.0 by an awkward eliding of the extra words. Currently I find that the whole phrase, without elision or subdivision, gets a score of 0.88, which is good enough to proceed with along a different path that simply does NOT trigger on an "AND".

    A truly hard aspect of this is to use "scoreable data" to provide simpler scoring in the underlying read functions, followed by a compression of that data into one number at the top level, so vaulting can be driven by a final score. I sort of like the idea that before the high level "observation" the score remains probabilistic, like a wave becoming a particle. The need for this is driven by the recursive nature of the underlying read methods, where the "waves" of scoreable data superpose from the parts up to the whole and then become a "particle" of final score.
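
    The idea of carrying scoreable data up through recursive reads, and only compressing it to one number at the top, can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not the actual code: the `Scoreable` class, the `read` function, the nested pattern layout, and the dictionaries are all hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scoreable:
    """Uncollapsed score: how many pattern parts matched, out of how many."""
    matched: int = 0
    expected: int = 0

    def __add__(self, other):
        # "Superposition": the parts' data simply accumulates.
        return Scoreable(self.matched + other.matched,
                         self.expected + other.expected)

    def collapse(self):
        # "Observation": compress the data into one final number.
        return self.matched / self.expected if self.expected else 0.0


def read(pattern, words):
    """Recursively read a pattern against a set of words.

    A pattern is either a set of acceptable words (a leaf) or a list of
    sub-patterns. Every level returns Scoreable data, never a final score.
    """
    if isinstance(pattern, (set, frozenset)):
        return Scoreable(1 if pattern & words else 0, 1)
    total = Scoreable()
    for sub in pattern:
        total = total + read(sub, words)
    return total


# Hypothetical pattern: a place, then (proximity word, noise source).
pattern = [{"room", "hotel"}, [{"near", "far"}, {"downtown", "highway"}]]
words = set("the hotel was near to the border".split())

data = read(pattern, words)
print(data)             # Scoreable(matched=2, expected=3)
print(data.collapse())  # single number, computed only at the top level
```

    The design choice here is that the lower read functions stay simple because they never decide what counts as a good fit; only the top-level caller collapses the data and acts on the result.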