
A machine-translation example

Let us begin with an example of state-of-the-art machine translation. We have here a German sentence together with its English translation (footnote 1.1):

Wovon man nicht sprechen kann, darüber muß man schweigen.
(Whereof one cannot speak, thereof one must be silent. [Translated by C.K. Ogden])

The German sentence is submitted to a popular machine translation system (footnote 1.2), which automatically produces the following English translation:

About which one cannot speak, over it one must be silent.

At first sight, the performance of the machine translation system seems fair. It is not a bad translation. In fact, the sense is kept almost faithfully, apart from the somewhat bizarre wording. Heartened by this positive result, we submit the translated English sentence to the machine translation system again, only this time with the target language set to German. We now have the following translation:

Über welche man nicht sprechen kann, über ihm muß man leise sein.

This result is amusing! For one thing, the English word "it" is translated as "ihm" (footnote 1.3), so we are almost completely lost as to what this word refers to. Perhaps more strangely, "silent" is translated as "leise", which must be counted as an error as far as the meaning of the sentence is concerned. As a further test, we submit this German sentence to the machine translation system again, with English as the target language. This time the translation turns out to be:

About which one cannot speak, over it one must be quiet.

It is sometimes surprising how creative machine translation can be! Still, this cannot be considered a bad translation, for it has kept the sense of the "bad" German translation to a certain degree. Encouraged by the result, we continue the experiment and submit this sentence to the machine translation system again. Now we have:

Über welche man nicht sprechen kann, über ihm muß man ruhig sein.

By now the translation has drifted a long way from the original. To see how far this procedure can go, we submit the sentence to the machine translation system once more. This time we have:

About which one cannot speak, over it one must be calm.

At this point, I guess one would be convinced that computers are not only creative but also humorous! The fact is that a computer program does not have the slightest understanding of what an utterance might mean.
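The round-trip procedure used above can be sketched in a few lines of Python. This is only a toy illustration: the two lexicons below are hypothetical, contain just the words that changed in the experiment, and work word by word (so "schweigen" is mapped to "silent" as a simplification of "be silent"). They are not the database or the method of the system that was actually used. The point is merely that a purely mechanical lookup, in which each word has one preferred equivalent, is enough to reproduce the drift from "silent" to "calm".

```python
# Toy illustration only: these tiny lexicons are hypothetical and are NOT
# the database of the machine translation system discussed in the text.
DE_TO_EN = {"schweigen": "silent", "leise": "quiet", "ruhig": "calm"}
EN_TO_DE = {"silent": "leise", "quiet": "ruhig", "calm": "ruhig"}


def translate(word, lexicon):
    """Return the lexicon's single preferred equivalent, or the word itself."""
    return lexicon.get(word, word)


def round_trip(word, passes):
    """Alternate German->English and English->German, recording each output."""
    history = [word]
    for i in range(passes):
        # Even-numbered passes go German->English, odd-numbered ones go back.
        lexicon = DE_TO_EN if i % 2 == 0 else EN_TO_DE
        word = translate(word, lexicon)
        history.append(word)
    return history


print(round_trip("schweigen", 5))
# ['schweigen', 'silent', 'leise', 'quiet', 'ruhig', 'calm']
```

Nothing in this loop checks whether the meaning survives a pass; each step is locally plausible, and the errors simply accumulate.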

Yet a large-scale machine translation system has a huge linguistic database, in this case perhaps even bigger than that of a competent student of German. For someone who is unfamiliar with natural language processing (NLP), it is hard to believe that a computer equipped with so much information (and backed by so much investment) cannot deliver a decent translation of a moderately complicated sentence. But this is no surprise to someone working in NLP. They know how difficult it is to program a computer so that it can understand a natural language sentence. In fact, many workers in NLP even assume that a computer will never understand what humans say, and direct their attention to more productive areas (such as computer-aided human NLP). The fact is that there is almost no adequate account of the aboutness of natural language. At the present time, most NLP systems simply manipulate symbolic structures mechanically.

A question immediately arises: are current mainstream NLP systems on the right track? After decades of endeavor in symbolic artificial intelligence (AI), we can hardly believe so [3,4]. For if they were, a state-of-the-art computer which can execute several million instructions per second (that could be millions of times faster than an ordinary human) would not perform so poorly in natural language processing. The fact is that a computer cannot even approximate a tiny fraction of human capability in natural language processing tasks. Indeed, it is very implausible that our own slow "computer" (the prevalent and one-sided, if not totally misleading, metaphor of the human mind) could achieve its current performance if it did not work in a much smarter way than computers do. It is revealing to compare how fast a computer can compile a very complex C++ program with how seldom an experienced C++ programmer can write even a short program without a syntax error on the first try. A computer is a remarkable genius at Chomskyan languages [5,6], but natural language is not something it is good at.

Indeed, a common weakness of many NLP projects today can be attributed mostly to their inability to accommodate meaning and their unbalanced attention to syntax. Many errors of today's NLP systems can be traced to the radical differences between their way of representing meaning and context (or the absence thereof) and that of a human. When we talk about syntax here, we include the various kinds of semantic formalisms as well, because according to the computer metaphor of the human mind, slot-filler and category-instance structures can be regarded as syntactic objects at a more abstract level and are therefore deprived of any meaning -- the meaning we human beings acquire in a bio-socio-cultural context. Specifically, meaning is entangled with the experiences of individuals in a very complicated way. In this respect, meaning depends heavily on contexts -- linguistic, socio-cultural, and ontogenetic / phylogenetic biological factors -- which are holistic in essence. This points to the first inadequacy of a computational approach, because classical computation is serial and local.

Moreover, something can make sense only if it makes sense for somebody, who must be a sentient being. So meaning is derived from subjectivity and intention. But there is no place for intention in a Turing machine -- a (for many, the) metaphor of the human mind. In this picture, at best, one has to smuggle intention into a program from without (that is, from the sentient program designer or designers) in order to "breathe the spirit into the nostrils of the robot made of earth." Without an account of holistic context or of sentient beings, we cannot avoid ending up with a theory of zombies. This summarizes the inadequacy of a top-down or computational approach as a unified scientific view of the human mind and language. It also has an unfortunate impact on NLP, for meaning is the central issue of natural language understanding.

It is often argued, however, that NLP is an engineering discipline, and thus that the question of meaning is only remotely related to NLP and should be put off. Instead, it is argued, one should pay more attention to practical issues. But this view is very limited. History has taught us all too often that more successful engineering (and this includes medicine) is always based on a "better" science. Now how can we tell which theory is "better"? That a theory is established, or old, or backed up by authority does not automatically make it a good theory. A "better" science must explain Nature more intelligibly. Moreover, a "good" theory has to accommodate more facts -- especially anomalies, in addition to the facts deliberately selected to fit into the theory (the practitioners of a "normal" science tend to ignore the anomalies [7]; they usually postulate ad hoc solutions to them). So a better science usually begins with an account of anomalies. (We have already encountered an important anomaly that the top-down computational approach cannot account for -- holistic context and intention.)

At this point, the reader may think I am advocating an alternative bottom-up or physicalist approach to mind and language. This is largely the case, but we should be careful not to fall into another questionable view -- that the human mind is nothing more than the activity of a classical machine, a piece of clockwork. On this view, we will unfortunately end up with another theory of zombies. Before we continue, let us consider the hurdles facing a theory of meaning in the existing scientific frameworks -- both from the top down and from the bottom up.


Joseph Chen 2002-09-05