I am posting this lest I forget about it, as an unfinished idea. It is about a meta notation on top of proto semantics. Some of it is pretty unclear, particularly the notational substitution '=>' versus the semantic '::'. So, for what it is worth:
With reference to the proto semantics, here we develop some abstract entities built from sets of proto semantic narratives.
Narrative Context
To analyze the meaning of a sentence dynamically as it begins, continues, and ends requires some kind of space or context comprising a set of narrative fragments (with an internal structure that remains TBD). To represent the set, use curly braces around the narratives and a semicolon ‘;’ to separate them as elements of the set. For example {X_/A} is a context containing one narrative ‘X_/A’. And { X->T ; T_/A } is a context containing two narratives ‘X->T’ and ‘T_/A’.
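As a throwaway illustration (not part of the notation itself), a context can be modeled directly as a set. Here is a minimal Python sketch, with all names my own:

def show(context):
    """Render a set of narrative strings in the { n1 ; n2 } notation."""
    return "{ " + " ; ".join(sorted(context)) + " }"

context = {"X->T", "T_/A"}        # the two-narrative context above
print(show(context))              # { T_/A ; X->T }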
Narrative equivalence
An immediate advantage of this notation is the possibility of an equivalence relation between narratives other than exact identity. For example, we will be able to equate {(X)} and {X}, and assume other equivalences like {X_/(A_/B)} and {(X_/A)_/B}. Therefore we define a one-way equivalence ‘=>’ which permits substitution of the right hand side for the left hand side; and we write ‘=’ for bi-directional substitution. I will also use a ‘+’ to indicate that two contexts in braces are part of a single larger context. Some typical equivalences are given later as “postulates”.
Effects – notation relating actions to persistent properties
An effect is a persistent attribute arising from an action. As a convenience, for an effect A we use the notation dA to indicate the action it arises from or is changed by. Thus the verb narrative:
X-dA -> Y
corresponds to the property: Y_/A
For example we will write
{X-dA->Y} => {Y_/A}
This convention can be modified or extended as needed to handle other aspects of the action/attribute relation, for example changes to the actor and differences between beginning and ending of a persistent property.
This convention is not part of proto semantics because it is a notational convention and not part of a narrative.
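To make the convention concrete, here is a small Python sketch of it; the regular expression and names are mine, and only the simple X-dA->Y shape is handled:

import re

# The hyphen-dA form follows the convention above: an action dA
# leaves behind the persistent property A on the object Y.
EFFECT = re.compile(r"^(?P<actor>\w+)-d(?P<attr>\w+)->(?P<obj>\w+)$")

def effect_of(narrative):
    """Apply {X-dA->Y} => {Y_/A}; return None for other narratives."""
    m = EFFECT.match(narrative)
    return f"{m.group('obj')}_/{m.group('attr')}" if m else None

assert effect_of("X-dA->Y") == "Y_/A"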
Another immediate advantage of the curly brace notation is the possibility of symbolic definition like the following.
Postulates
Notational equivalences
{(X)} => {X}
{X}+{Y} = {X ; Y}
{X_/[A]} = {[X_/A]}
{X-dA->Y} => {Y_/A}
X**=>X (WRONG!)
There are probably many other notational equivalences that could be assumed.
{X ; Y } = {X,Y}
{ X_/(A_/B) } = { (X_/A)_/B }
{X->Y, Z->Y } = {(X,Z)->Y }
{X->Y, X->Z} = {X->(Y,Z)}
X::(Y,Z) = (X::Y,X::Z)
[X::Y]* = [X::Y*]
etc.
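As an experiment, the one-way postulates can be treated as rewrite rules over a context. The sketch below uses plain string rewriting (no unification over the X, Y, A placeholders) and implements just two of them:

import re

def drop_parens(n):                       # {(X)} => {X}
    return n[1:-1] if n.startswith("(") and n.endswith(")") else None

def effect(n):                            # {X-dA->Y} => {Y_/A}
    m = re.match(r"^(\w+)-d(\w+)->(\w+)$", n)
    return f"{m.group(3)}_/{m.group(2)}" if m else None

RULES = [drop_parens, effect]

def rewrite(context):
    """Add the right hand side of every applicable one-way postulate."""
    out = set(context)
    for n in context:
        for rule in RULES:
            r = rule(n)
            if r is not None:
                out.add(r)
    return out

print(rewrite({"(X)", "X-dA->Y"}))        # adds 'X' and 'Y_/A'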
Dynamic equivalences (and the ghosts of syllogism)
{X-dA->Y} => {Y_/A ; X_/A’} (This is the assumption of effects)
{A::B} => {B}
{A::B} + {B::C} => {A::C}
{A::C} => {A::[B]} + {[B]::C}
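The third of these is where the ghost of syllogism lives: ‘::’ chains transitively. A small sketch, representing each ‘A::B’ as a pair and closing a context under the postulate:

def chain(becomings):
    """Close a set of (A, B) pairs under {A::B} + {B::C} => {A::C}."""
    closed = set(becomings)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for b2, c in list(closed):
                if b == b2 and (a, c) not in closed:
                    closed.add((a, c))
                    changed = True
    return closed

print(chain({("A", "B"), ("B", "C")}))    # includes ('A', 'C')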
Manipulators
Then there are the myriad grammatical modifiers: plurals, possessives, genders, etc. What do they do? They imbue the existing narrative fragment with auxiliary meanings, and luckily that can be expressed in the proto semantics using ‘_/’. But there is no end in sight for what conventions can be assumed about the implicit ‘[Z]’ in a particular manipulator. It is a large family of entities.
A manipulator is a narrative-valued function M(). Manipulators typically take the form:
M(X) = [X_/Z]
(Although I am sure you can come up with others.) For example “he” is a function M(X)=[X_/male]. The gender can remain implicit unless needed for later narrative continuity.
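In code the same idea is nearly a one-liner; a sketch, names mine:

def manipulator(attr):
    """Build M(X) = [X_/attr]; brackets mark the attribute as implicit."""
    return lambda x: f"[{x}_/{attr}]"

he, she = manipulator("male"), manipulator("female")
print(he("X"))        # [X_/male]
print(she("cook"))    # [cook_/female]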
I don’t know how to formulate this. But by 6N, ‘X_/[Z]’ is expected to be followed by ‘X_/Z’.
Quantifiers
The parts of speech known as quantifiers or determiners such as “all”, “one”, “some”, “most”, “much”, “fifteen”, “every”, “the”, “a” are set-theoretic and beyond the scope of proto semantics. For now let us ignore words like “the” or “a” if they occur in an expression to be analyzed. In the same spirit, let us treat pluralized entities as nouns of type ‘thing’.
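That scoping decision is easy to mechanize. A minimal sketch, where the bare trailing ‘s’ test for plurals is a deliberate oversimplification:

IGNORED = {"the", "a", "an"}      # determiners we ignore for now

def preprocess(tokens):
    """Drop ignored determiners; coerce plurals to the type 'thing'."""
    out = []
    for t in tokens:
        if t.lower() in IGNORED:
            continue
        out.append("thing" if t.endswith("s") else t)
    return out

print(preprocess(["the", "dogs", "ran"]))   # ['thing', 'ran']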
Pronouns
The word “she” adds a female attribute to the noun role:
“she was hungry”
If desired we can diagram this with a plus sign and an implicit additional fragment:
{she_/hungry} + {[she_/female]}
Conjunctions
From our point of view, the conjunctions are a mixed bag. The word “because” is equivalent to the becoming operator ‘::’. The words “and” and “or” play the role of the consecutive operator ‘,’. They take their meaning from the entities being juxtaposed.
One version of the word “and”, in an asymmetric use, anticipates what follows:
“I was hungry and…”
We can diagram this as:
{I_/hungry} + {[Z]}
Words like “but” and “while” are particularly interesting, as they both anticipate and exclude:
“I was hungry but…”
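A rough sketch of this mixed bag as code, with the anticipated continuation written as the placeholder ‘[Z]’; the exclusion carried by “but” is not modeled here, and all names are mine:

CONJUNCTIONS = {
    "because": "::",      # the becoming operator
    "and":     ",",       # consecutive
    "or":      ",",       # consecutive
}

def diagram(left, conj, right=None):
    """Join two fragments, or anticipate one when the right is missing."""
    op = CONJUNCTIONS.get(conj, ",")
    if right is None:                     # asymmetric use: "...and..."
        return "{" + left + "} + {[Z]}"
    return "{" + left + " " + op + " " + right + "}"

print(diagram("I_/hungry", "and"))        # {I_/hungry} + {[Z]}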
The Ghost of Syllogism
A more dynamic view of narrative allows for a collection of narrative fragments to evolve together within a context. Let me assume the context for an expression has already been narrowed down to being the description of an object, even (why not) an instance of a C++ object with a default constructor. Call this class the topic, or the narrow world, or the frame.
An expression comes in and is determined to be about the topic or not. If it is about the topic it is examined by the object instance and becomes part of its description – something you can now query the object about. It has become available information.
In this dynamic context a variety of narrative fragments can be considered equivalent, and to define this equivalence we define resolution. We say a narrative resolves to another narrative when we observe it happening or wish to postulate that it happens. The dynamic equivalences listed above are postulates of this kind.
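Here is a sketch of the frame as such an object, in Python rather than C++ for brevity; the about() test is a crude stand-in for real relevance checking, and all names are mine:

class Frame:
    """The narrow world: accepts expressions about its topic."""
    def __init__(self, topic):
        self.topic = topic
        self.facts = set()            # accepted narrative fragments

    def about(self, narrative):
        return self.topic in narrative

    def consider(self, narrative):
        if self.about(narrative):
            self.facts.add(narrative) # now available information

    def query(self, narrative):
        return narrative in self.facts

w = Frame("X")
w.consider("X_/A")
w.consider("Q_/B")                    # off topic, ignored
print(w.query("X_/A"), w.query("Q_/B"))   # True False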
The Diagramming Loop
Practical language recognition builds up dictionaries, narrative fragments, etc. But a lot of work is needed to parse and prepare incoming text.
You have incoming text:
~~~~~~~~~~~~~~~~~~~~~~~
You need topic-specific dictionaries, fragments, punctuation handling, and other things in order to consider all possible meaningful tokenizations. For example “subgingival” could be broken in two, and “1.7 mm” could be retained as a single token. However you do it, you end up considering one or more tokenizations:
Assume we have “dictionaries” (small synonym lists) D1, D2, D3 and one narrative fragment N(d1,d2,d3), where each di comes from dictionary Di; and assume a tokenization:
~~~ ~~~ ~ ~~~~~ ~~ ~~~~~
Every token should be marked either as something in a known dictionary or as an unrecognized dreg. (The basic rule for dregs is that if they come in a group you can ignore them as off topic. But when they are mixed with dictionary words, it is a bit riskier to ignore them.) Anyway, you get an ordered list of tokens:
~~~ ~~~ ~ ~~~~~ ~~ ~~~~~
Let D be the union of the Di. Here is an algorithm:
Step: Find the first element of D among the ordered tokens and fill in that entry in N(). Then read to the right looking for other dictionary entries, until either all slots of N() are filled or we run out of tokens. The result is a partially or completely filled narrative object, say N1. Now throw away all tokens used and keep the dregs. If what is left is more than dregs, repeat the Step to produce another partially or completely filled N2. Proceed this way, generating N1, N2, etc., until all that is left is dregs. The N1, N2, etc. are the candidate interpretations of the text.
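Here is the loop as a Python sketch; the three synonym lists and the sample tokens are invented for illustration, and “1.7 mm” is kept as one token per the remark above:

D1 = {"probe", "probing"}
D2 = {"subgingival"}
D3 = {"1.7 mm", "2 mm"}
DICTS = [D1, D2, D3]

def diagramming_loop(tokens):
    """Repeat the Step until only dregs remain; return N1, N2, ..."""
    candidates, remaining = [], list(tokens)
    while any(t in d for t in remaining for d in DICTS):
        slots = [None] * len(DICTS)       # one slot per dictionary of N()
        leftover = []
        for t in remaining:               # read left to right
            for i, d in enumerate(DICTS):
                if slots[i] is None and t in d:
                    slots[i] = t          # fill that entry of N()
                    break
            else:
                leftover.append(t)        # a dreg, for now
        candidates.append(tuple(slots))   # partially/completely filled
        remaining = leftover              # keep only the dregs
    return candidates

print(diagramming_loop(["subgingival", "probing", "at", "1.7 mm"]))
# [('probing', 'subgingival', '1.7 mm')]  -- 'at' is a dreg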
The theory of Best Models says we can entertain all the alternative Nj and try to choose one based on a “goodness of fit” measure. For the most part this will be a measure of how completely the narrative is filled and how “cleanly” it sits within the grammar of the source language. Starting to get it? I think I am.
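A first cut at the fit measure, scoring only completeness; the grammatical “cleanliness” term would need a real grammar and is left out:

def fit(candidate):
    """Fraction of the narrative's slots that are filled."""
    return sum(s is not None for s in candidate) / len(candidate)

def best_model(candidates):
    return max(candidates, key=fit)

print(best_model([("probing", None, None),
                  ("probing", "subgingival", "1.7 mm")]))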