In quantum mechanical terms, the state of affairs associated with a natural language utterance is a superposition of eigenstates of an eigenbasis pertaining to a specific vocabulary $V$. The vocabulary is the set of all symbols found in a language. Moreover, all these symbols are eigenstates of a language formulation operator $\hat{F}$. That is,
\[ |\psi\rangle = \sum_{i} c_i\,|w_i\rangle, \]
where $|w_i\rangle \in V$ and $c_i = \langle w_i|\psi\rangle$ is the projection of $|\psi\rangle$ on $|w_i\rangle$.
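As a rough illustration, a superposition over a vocabulary and its projections can be mimicked with plain complex vectors. The three-word vocabulary, the coefficients, and the helper names below are all hypothetical; the sketch only assumes that the vocabulary eigenstates are orthonormal.

```python
# Hypothetical three-word vocabulary; each word is one orthonormal basis state.
VOCAB = ["john", "loves", "mary"]


def basis_state(word):
    """Return the basis vector (eigenstate) for `word` as a list of complex numbers."""
    return [1 + 0j if w == word else 0j for w in VOCAB]


def inner(a, b):
    """Complex inner product <a|b>; the first argument is conjugated."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def superpose(coeffs):
    """Build sum_i c_i |w_i> from a {word: coefficient} mapping."""
    return [complex(coeffs.get(w, 0)) for w in VOCAB]


# An arbitrary example state with complex coefficients.
state = superpose({"john": 0.6, "mary": 0.8j})

# Projecting the state on each eigenstate recovers the coefficients.
projections = {w: inner(basis_state(w), state) for w in VOCAB}
```

Because the basis is orthonormal, each projection simply returns the coefficient that was used to build the state, and a word that does not occur has projection zero.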
In practice, a natural language utterance is usually written as an orthographic string. More generally, it could equally be a string of phonetic transcriptions. In a sense, we are free to choose our ``atomic'' symbol set (alphabets, phonetic symbols, or ideographs). In the problems tackled in this chapter, however, orthographic words are used as the building blocks (symbols or eigenstates) of the string. For example, the eigenstate corresponding to the word loves can be denoted by $|\mathit{loves}\rangle$.
Our first question is then: how can we put together a string of symbols to refer to a state of affairs? Since we are taking a physicalist account, the answer is to be found in physics. We need a particular unitary operator (called the preparation operator $\hat{U}_p(t)$, which is a function of time $t$) to place a particular symbol in its particular position in an utterance. In general, the unitary operator $\hat{U}_p(t)$ can be written as
\[ \hat{U}_p(t) = e^{-i\hat{H}t}, \]
where $\hat{H}$ is a Hermitian operator. If the string is constructed incrementally, we have
\[ |\psi\rangle = \sum_{k=1}^{n} \hat{U}_p(t_k)\, e^{i\theta_k}\,|s_k\rangle, \]
where $s_1 s_2 \ldots s_n$ is a string of symbols in the orthographic natural language utterance; $n$ is the length of the string; $t_k$ is the time of utterance of the $k$-th symbol; $\theta_k$ is the phase (argument of a complex number) of the $k$-th symbol. Generally speaking, the preparation operator $\hat{U}_p(t)$ may ``mix'' one symbol with others if $\hat{H}$ is not a diagonal matrix. Indeed, this could occur quite often in natural language. However, for simplicity, we assume that the symbols in the miniature languages discussed in this chapter do not mix with each other. That is,
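$\hat{H}$ is a diagonal matrix. In this case, in terms of the vocabulary eigenstates $|w_j\rangle$, we have
\[ \hat{U}_p(t) = \sum_{j=1}^{N} e^{-i h_j t}\,|w_j\rangle\langle w_j|, \]
where $h_j$ is the $j$-th diagonal component of $\hat{H}$; $N$ is the size of the vocabulary. To make the model even simpler, we assume that all $h_j$ are equal, $h_j = h$. Furthermore, we assume that the symbols in a string $s_1 s_2 \ldots s_n$ are uttered at uniform intervals ($\Delta t$) and that the argument $\theta_k$ of each eigenstate $|s_k\rangle$ is zero. Thus, after all these simplifications, we have
\[ |\psi\rangle = \sum_{k=1}^{n} e^{-i h k \Delta t}\,|s_k\rangle. \]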
A state of affairs $|\psi\rangle$ thus prepared is subject to a unitary operator $\hat{U}_r$ (the reasoning operator). That is,
\[ |\psi'\rangle = \hat{U}_r\,|\psi\rangle. \]
We can then use the conjugate gradient method [41], starting with a small random initial vector, to calculate $\hat{U}_r$.
Once $\hat{U}_r$ is calculated, an unseen state of affairs can be subjected to the same reasoning operator $\hat{U}_r$. The end state of affairs is then measured to generate the result of the natural language processing task. Note that since the input state of affairs is not normalized, the end state of affairs is not normalized either. This does not matter, because what we are interested in is an orthographic result, for which only the relative probabilities are crucial. Here, one needs another operator to generate the orthographic string. This should be a time-varying quantum state associated with the resulting utterance. Such an operator can be quite tricky and very time-consuming to train. Therefore, in this preliminary study a classical combinatorial optimizer is employed. Specifically, possible orthographic strings are superposed backward and compared with the end state of affairs. Each candidate is given a score, calculated by preparing a candidate state according to Equation 7.1 and taking the absolute value of the complex inner product of the normalized candidate state with the normalized end state of affairs. That is,
\[ \mathrm{score} = \frac{|\langle c|e\rangle|}{\sqrt{\langle c|c\rangle\,\langle e|e\rangle}}, \]
where $|c\rangle$ is a candidate state of affairs and $|e\rangle$ is the end state. In the ideal case, the score should be unity (1) for a perfect candidate. Since the vocabulary can be quite large, a ``brute force'' (complete search) method suffers a combinatorial explosion. We therefore need heuristics to avoid such a disaster. This is done according to the following algorithm:
0. Normalize the end state of affairs; set the initial threshold
   Theta = 0.01;
1. Build a set S of all symbols whose amplitude has absolute value
   greater than or equal to Theta;
2. Calculate the score of each permutation of S; note the one with
   the best score;
3. Theta := Theta + 0.01;
4. If Theta <= 0.4, go to step 1;
5. Output the permutation with the best score.
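The threshold heuristic above can be sketched as follows. This is a minimal illustration, not the original implementation. The preparation step is an assumption (the $k$-th word contributes a phase $e^{-i\omega k\Delta t}$ to its own basis state), and the names `prepare`, `score`, `decode`, `omega`, and `dt`, as well as the toy vocabulary, are hypothetical.

```python
import cmath
import itertools
import math


def prepare(words, vocab, omega=1.0, dt=0.5):
    """Assumed preparation: the k-th word adds exp(-i*omega*k*dt) to its basis state."""
    index = {w: i for i, w in enumerate(vocab)}
    state = [0j] * len(vocab)
    for k, word in enumerate(words, start=1):
        state[index[word]] += cmath.exp(-1j * omega * k * dt)
    return state


def norm(state):
    """Euclidean norm of a complex state vector."""
    return math.sqrt(sum(abs(a) ** 2 for a in state))


def score(candidate, end):
    """Absolute value of the inner product of the two normalized states."""
    overlap = sum(c.conjugate() * e for c, e in zip(candidate, end))
    denom = norm(candidate) * norm(end)
    return abs(overlap) / denom if denom else 0.0


def decode(end, vocab, theta_max=0.40, step=0.01):
    """Threshold search over permutations of the sufficiently strong symbols."""
    n = norm(end)
    if n == 0:
        return (), 0.0
    end = [a / n for a in end]                      # step 0: normalize
    best, best_score = (), -1.0
    for i in range(1, int(round(theta_max / step)) + 1):
        theta = i * step                            # steps 3-4: 0.01 ... 0.40
        # Step 1: keep symbols whose amplitude modulus reaches the threshold.
        symbols = [w for w, a in zip(vocab, end) if abs(a) >= theta]
        # Step 2: score every ordering of the surviving symbols.
        for perm in itertools.permutations(symbols):
            s = score(prepare(perm, vocab), end)
            if s > best_score:
                best, best_score = perm, s
    return best, best_score                         # step 5


vocab = ["john", "loves", "mary", "hates"]
end_state = prepare(["mary", "loves", "john"], vocab)
best, best_score = decode(end_state, vocab)
```

With this toy input the search recovers the word order that produced the end state, with a score of one up to rounding. The number of permutations still grows factorially with the size of S, so the sketch is only practical when few symbols survive the threshold.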
The string that yields the best score is taken as the orthographic result. The scheme described above is illustrated in Figure 7.3. We are now ready to apply this framework to NLP tasks.