Tuesday, May 16, 2017

Guess the topic tree structure from a sentence with its keywords

I am browsing someone's GitHub project (https://github.com/natsheh/sensim), which is about a metric of similarity between sentences, and am thinking: how would I do this calculation?

Since Narwhal is about goodness of fit between a narrative and a sentence, it is tempting to calculate a distance between two sentences by treating one of them as the narrative and the other as the sentence (the answer could differ if you reversed the roles). But what is missing here is: how does one reconstruct a topic tree that encompasses the 'narrative' sentence?

Or maybe a better question is how to build a tree from a whole corpus of sentences. It would go like this: find the most distinctive words, look them up in PyDictionary, and get their synonyms. Then find a corpus that is rich in these same words and their synonyms. So: given two lists of synonyms A and B and a cloud of sentences enriched with all the synonyms (not just A and B), how would you know when to put B below A in the tree?

The example is: "loud" implies a sound; and "noise" implies a sound. So if "loud", "noise", and "sound" are in synonym lists, then "loud" and "noise" should be below "sound". Can this be deduced automatically somehow?
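One possible line of attack, offered only as a sketch and not as anything Narwhal does: a co-occurrence "subsumption" heuristic. Put A above B when almost every sentence that mentions B also mentions A, but not the other way around. Everything in the code below is an assumption made for illustration: the function name subsumption_pairs, the 0.8 threshold, the whitespace tokenizer, and the toy sentences. It can only find "loud" below "sound" if the corpus happens to use the words together.

```python
from itertools import combinations


def subsumption_pairs(sentences, groups, threshold=0.8):
    """Return (parent, child) pairs of synonym-group names.

    sentences: list of strings (the 'cloud' of sentences).
    groups: dict mapping a group name to a set of lowercase synonyms.
    A parent is placed above a child when P(parent | child) >= threshold
    and P(child | parent) is strictly smaller.
    """
    # For each group, collect the indices of sentences mentioning any synonym.
    hits = {name: set() for name in groups}
    for i, sentence in enumerate(sentences):
        words = set(sentence.lower().split())  # naive tokenizer (assumption)
        for name, synonyms in groups.items():
            if words & synonyms:
                hits[name].add(i)

    pairs = []
    for a, b in combinations(groups, 2):
        if not hits[a] or not hits[b]:
            continue  # a group that never occurs tells us nothing
        both = len(hits[a] & hits[b])
        p_a_given_b = both / len(hits[b])
        p_b_given_a = both / len(hits[a])
        if p_a_given_b >= threshold and p_b_given_a < p_a_given_b:
            pairs.append((a, b))  # a sits above b
        elif p_b_given_a >= threshold and p_a_given_b < p_b_given_a:
            pairs.append((b, a))  # b sits above a
    return pairs


# Toy data, invented for this example.
sentences = [
    "the loud sound woke me",
    "a loud sound came from the street",
    "that noise is a strange sound",
    "the sound of noise next door",
    "a soft sound in the hall",
    "the sound faded",
]
groups = {"sound": {"sound"}, "loud": {"loud"}, "noise": {"noise"}}

print(subsumption_pairs(sentences, groups))
```

On this toy corpus the heuristic puts "sound" above both "loud" and "noise" and says nothing about "loud" versus "noise", since they never share a sentence. Whether a real corpus behaves this nicely is an open question.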

You might ask: is there anything out there in 'reality' that guarantees these words should be in this relationship? I think the answer must be "no", since they are words. I cannot see how you would construe the relation between "loud" and "sound" as factual. But it sure does a good job of masquerading as fact.
