Tuesday, October 13, 2015

Theory of how to organize topic dictionaries

From past code:
// Dictionary.cpp : Defines the entry point for the console application.
The original "vision" wasn't followed too well. Here it is for reference (it is more or less nonsense at this point but mentions important topics):

Dictionary trees are built by linking together generic dictionary "nodes". In addition to properties described below, every dictionary is also a C++ class and is allowed to have other methods and member variables.

Every dictionary will have certain properties, to be explained below. These are:
 - an optional word list “self”
 - a "complete" flag that can be set or cleared. When cleared we say the dictionary is "in a neutral state". When "complete" the dictionary will have a current set of values that can be saved (or "vaulted" or “grasped”).
 - a readText( ) method.
 - a set of child dictionaries. The readToken function is called on them before it is called on the self. 
 - a vault "policy" which determines what information is saved when it is complete. [NAH, it is a way of delaying the saving until a subsequent event.]
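
The properties above can be pictured as a small base class. This is only a minimal sketch, not the original code: the names DictionaryNode, addWord, addChild, isComplete and vault are hypothetical, the method is called readText here although the post also calls it readToken, and the rule "any match sets complete" is an assumption made for illustration.

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generic dictionary node (illustrative names throughout).
class DictionaryNode {
public:
    virtual ~DictionaryNode() = default;

    // Children see the token first, then the optional "self" word list.
    // Assumption: any match puts this node in the "complete" state.
    virtual bool readText(const std::string& token) {
        bool matched = false;
        for (auto& child : children_) {
            matched = child->readText(token) || matched;
        }
        if (self_.count(token) > 0) {
            matched = true;
        }
        if (matched) {
            complete_ = true;
        }
        return matched;
    }

    // Default vault policy: save nothing, pass the command down to the
    // children, and return to the neutral state.
    virtual void vault() {
        for (auto& child : children_) {
            child->vault();
        }
        complete_ = false;
    }

    bool isComplete() const { return complete_; }
    void addWord(const std::string& word) { self_.insert(word); }
    void addChild(std::unique_ptr<DictionaryNode> child) {
        children_.push_back(std::move(child));
    }

protected:
    std::set<std::string> self_;                            // optional word list
    bool complete_ = false;                                 // cleared = neutral
    std::vector<std::unique_ptr<DictionaryNode>> children_; // child dictionaries
};
```

Derived classes would override readText and vault to get the specific kinds of dictionary described further down.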

The dictionary tree acts like a sieve, reading text. The incoming text is tokenized and groups of the tokens [nowadays I pass a token and ALL of the text through ALL dictionaries, once for each token] are sent down into the tree via readToken( ) being called at the root.
The simplest version of this is to pass in three tokens at a time: { previous, current, next }, looping through the text as current is incremented.
This results in various parts of the tree having content modified and, occasionally, some part of the tree is in a "complete" state (or "lit up").
Whenever that happens, the whole tree gets a "vault" signal, and each dictionary node will implement a "vault policy". Unused tokens are stored in a "dreg" or discard pile for external processing.
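
The { previous, current, next } loop and the dreg pile might look roughly like the sketch below. Everything here is assumed for illustration: TokenWindow, tokenize and feed are hypothetical names, the tokenizer just splits on whitespace, and the root's readToken( ) is stood in for by a callback. The vault signal is left out to keep the sketch short.

```cpp
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical three-token window; previous/next are empty at the edges.
struct TokenWindow {
    std::string previous;
    std::string current;
    std::string next;
};

// Assumption: tokens are simply whitespace-separated words.
std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token) {
        tokens.push_back(token);
    }
    return tokens;
}

// Slides the window over the text, calling readToken (a stand-in for the
// root dictionary) once per position. Current tokens that nothing in the
// tree matched are collected in the returned dreg pile.
std::vector<std::string> feed(
    const std::string& text,
    const std::function<bool(const TokenWindow&)>& readToken) {
    const std::vector<std::string> tokens = tokenize(text);
    std::vector<std::string> dregs;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        TokenWindow window;
        if (i > 0) {
            window.previous = tokens[i - 1];
        }
        window.current = tokens[i];
        if (i + 1 < tokens.size()) {
            window.next = tokens[i + 1];
        }
        if (!readToken(window)) {
            dregs.push_back(window.current);
        }
    }
    return dregs;
}
```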

Just as tokens are passed down through the tree from parent to child, so also the vault command is sent from above. [I WISH IT WAS SO CLEAN]

Simple- dictionary
A simple dictionary has no children but has a word list with a readText( ) method. It returns true if token was matched to an entry or entries in the dictionary. A simple dictionary is "complete" when this match has occurred.
This flag can be cleared. The vault policy is: no vault but always clear the "complete" flag. Vaulting is considered a parent responsibility.
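
As a rough sketch of the description above (not the original code; the class name SimpleDictionary and the methods isComplete and vault are hypothetical):

```cpp
#include <initializer_list>
#include <set>
#include <string>

// Hypothetical simple- dictionary: a word list and a "complete" flag.
class SimpleDictionary {
public:
    SimpleDictionary(std::initializer_list<std::string> words)
        : words_(words) {}

    // Returns true, and becomes complete, when the token is in the list.
    bool readText(const std::string& token) {
        if (words_.count(token) == 0) {
            return false;
        }
        complete_ = true;
        return true;
    }

    bool isComplete() const { return complete_; }

    // Vault policy: save nothing, only clear the flag. Saving is left
    // to the parent.
    void vault() { complete_ = false; }

private:
    std::set<std::string> words_;
    bool complete_ = false;
};
```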

Product- dictionary
A dictionary composed of an array of dictionaries called its "dimensions". A product- dictionary is "complete" only when all of its dimensions are complete. Like a simple- dictionary the vault policy is: no vault but clear the "complete" flag. (It is tempting to say this kind of dictionary is just a parent with all children needing to be complete. But that feels like the wrong ontology.)
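
A minimal sketch of the "all dimensions complete" rule, with hypothetical names (Dimension, ProductDictionary, isComplete, vault) and with each dimension reduced to a plain word list for brevity. Treating an empty product as not complete is an assumption.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one dimension: a word list with a flag.
struct Dimension {
    std::set<std::string> words;
    bool complete = false;

    bool readText(const std::string& token) {
        if (words.count(token) == 0) {
            return false;
        }
        complete = true;
        return true;
    }
};

// Hypothetical product- dictionary over an array of dimensions.
class ProductDictionary {
public:
    explicit ProductDictionary(std::vector<Dimension> dimensions)
        : dimensions_(std::move(dimensions)) {}

    // Every dimension gets to see the token.
    bool readText(const std::string& token) {
        bool matched = false;
        for (auto& dimension : dimensions_) {
            matched = dimension.readText(token) || matched;
        }
        return matched;
    }

    // Complete only when all dimensions are complete.
    // Assumption: a product with no dimensions is never complete.
    bool isComplete() const {
        if (dimensions_.empty()) {
            return false;
        }
        for (const auto& dimension : dimensions_) {
            if (!dimension.complete) {
                return false;
            }
        }
        return true;
    }

    // Vault policy: save nothing, clear the flags.
    void vault() {
        for (auto& dimension : dimensions_) {
            dimension.complete = false;
        }
    }

private:
    std::vector<Dimension> dimensions_;
};
```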

State- dictionary
This is a dictionary with an internal state, including a default state. This kind of dictionary is always "complete" and occurs as a dimension of a parent dictionary. Its readText( ) method can change its internal state. Its vault policy is to reset state to the default. When the parent gets a vault command it uses this state's current state to determine what and how the parent vaults its other information. Afterwards, the parent sends the vault command into the state dictionary (resetting it to default).
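
One way to picture this, as a sketch only: a polarity state that the word "not" flips, consulted by a parent when it vaults. The example is invented for illustration; PolarityState, Claim, the trigger word "not" and the keyword "open" are all hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical state- dictionary: default state Affirmed, always complete.
class PolarityState {
public:
    enum class State { Affirmed, Negated };

    // Reading a token may change the internal state.
    bool readText(const std::string& token) {
        if (token == "not") {
            state_ = State::Negated;
            return true;
        }
        return false;
    }

    bool isComplete() const { return true; }
    State state() const { return state_; }

    // Vault policy: reset to the default state.
    void vault() { state_ = State::Affirmed; }

private:
    State state_ = State::Affirmed;
};

// Hypothetical parent that uses the state to decide what it saves.
class Claim {
public:
    bool readText(const std::string& token) {
        bool matched = polarity_.readText(token);
        if (token == "open") {
            sawKeyword_ = true;
            matched = true;
        }
        return matched;
    }

    bool isComplete() const { return sawKeyword_; }

    // The parent vaults first, consulting the state, and only afterwards
    // sends the vault command into the state dictionary.
    void vault() {
        if (sawKeyword_) {
            if (polarity_.state() == PolarityState::State::Negated) {
                saved_.push_back("not open");
            } else {
                saved_.push_back("open");
            }
        }
        sawKeyword_ = false;
        polarity_.vault();
    }

    const std::vector<std::string>& saved() const { return saved_; }

private:
    PolarityState polarity_;
    bool sawKeyword_ = false;
    std::vector<std::string> saved_;
};
```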

Parent- dictionary
A dictionary with a list of child dictionaries. It can also contain a "self" word list. The readText( ) is implemented by calling it on each child. Then it calls readText( ) on its self dictionary. A parent- dictionary is "complete" if any one of its children is complete and [optionally] if its self dictionary is complete. The parent- dictionary implements a vault policy of saving data from its children and variables. Then it invokes the vault policy of its children.
We may distinguish parent "self" dictionaries that must be complete, for the parent to be complete, versus parents with a self dictionary which is not needed for completeness but may be used in the vault policy.
Call the second sort of parent a "lenient".
[In fact this idea was never put into the code.]
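
Although the lenient distinction never reached the code, the rule as described could be sketched like this. It is a hypothetical reconstruction, not the original design: WordList, ParentDictionary, the lenient flag and the choice to save each complete child's last matched word are all assumptions.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical word list that remembers its last match.
struct WordList {
    std::set<std::string> words;
    std::string lastMatch;
    bool complete = false;

    bool readText(const std::string& token) {
        if (words.count(token) == 0) {
            return false;
        }
        lastMatch = token;
        complete = true;
        return true;
    }

    void vault() {
        complete = false;
        lastMatch.clear();
    }
};

// Hypothetical parent- dictionary with a strict or lenient self list.
class ParentDictionary {
public:
    ParentDictionary(std::vector<WordList> children, WordList self, bool lenient)
        : children_(std::move(children)),
          self_(std::move(self)),
          lenient_(lenient) {}

    // Children read first, then the self word list.
    bool readText(const std::string& token) {
        bool matched = false;
        for (auto& child : children_) {
            matched = child.readText(token) || matched;
        }
        matched = self_.readText(token) || matched;
        return matched;
    }

    // Complete if any child is complete and, unless lenient, the self
    // word list is complete too.
    bool isComplete() const {
        bool anyChild = false;
        for (const auto& child : children_) {
            anyChild = anyChild || child.complete;
        }
        return anyChild && (lenient_ || self_.complete);
    }

    // Vault policy: save data from the children, then vault them.
    void vault() {
        for (const auto& child : children_) {
            if (child.complete) {
                saved_.push_back(child.lastMatch);
            }
        }
        for (auto& child : children_) {
            child.vault();
        }
        self_.vault();
    }

    const std::vector<std::string>& saved() const { return saved_; }

private:
    std::vector<WordList> children_;
    WordList self_;
    bool lenient_;
    std::vector<std::string> saved_;
};
```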

1 comment:

  1. It has all been subsumed under Narwhal and its trees of keyword lists.