Q: what would happen if unseen vocabulary occurs in the middle of the sequence (in POS labeling) in a hidden Markov model? (MW)
A: In a pure HMM the emission probability of that word would be zero and, as a consequence, the model would assign a zero probability to the complete input string (because multiplying by zero always results in zero). One popular countermeasure is to introduce an unknown word token (UNK) into the model. Unfortunately, it will (by definition) never occur in the training data, so it would also receive a zero emission probability. Smoothing helps here: it redistributes probability mass from the seen to the unseen observations, and thus to the UNK token as well. That even allows the model to assign a state label to the unknown word.
Additionally, a kind of guessing mechanism can be established which tries to estimate the probability of a certain output category from the shape of the word. E.g. unknown words ending in -ation are highly likely to be singular common nouns, whereas the suffix -ing is a much weaker cue for a gerund verb (words like 'sing' or 'thing' carry it as well). A small sketch of both ideas follows below.
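Here is a minimal Python sketch of both mechanisms; all counts, tag names and suffix weights are invented for illustration, the UNK handling uses simple add-one smoothing, and the suffix rules are purely heuristic:

<verbatim>
# Invented counts and tag names, for illustration only.
emission_counts = {                    # counts(tag -> word) from an imaginary corpus
    "NN":  {"house": 40, "information": 12},
    "VBG": {"running": 25, "singing": 8},
}
vocab = {w for words in emission_counts.values() for w in words} | {"UNK"}

def smoothed_emission(tag, word):
    """P(word | tag) with add-one smoothing; unseen words are mapped to UNK."""
    if word not in vocab:
        word = "UNK"
    counts = emission_counts[tag]
    total = sum(counts.values())
    return (counts.get(word, 0) + 1) / (total + len(vocab))

def suffix_guess(word):
    """Very rough prior over tags for an out-of-vocabulary word, based on its suffix."""
    if word.endswith("ation"):
        return {"NN": 0.9, "VBG": 0.1}   # -ation words are usually singular nouns
    if word.endswith("ing"):
        return {"VBG": 0.6, "NN": 0.4}   # -ing is only a weak cue (cf. 'sing', 'thing')
    return {"NN": 0.5, "VBG": 0.5}

print(smoothed_emission("NN", "globalization"))   # > 0: the UNK token receives probability mass
print(suffix_guess("globalization"))
</verbatim>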

Q: How can HMMs be used in combination with other machine learning models, and what are the benefits of doing so?[AH]
A: That happens quite often. In speech recognition and machine translation, the HMMs (modelling acoustics and transfer, respectively) are combined with language models. While the latter can be trained in self-supervised mode on large amounts of monolingual data, the former need annotated training data.
Even a combination with models from other ML paradigms is possible. Such models can be run in parallel, and their output is either combined by a simple voting scheme or a separate classifier is trained to decide on the combination of the different outcomes.
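A minimal sketch of the voting idea, assuming three hypothetical taggers (e.g. an HMM, a CRF and a neural tagger) that have already produced one tag sequence each for the same sentence:

<verbatim>
from collections import Counter

def majority_vote(predictions):
    """predictions: list of tag sequences, one per model, all of equal length."""
    combined = []
    for position_tags in zip(*predictions):
        tag, _count = Counter(position_tags).most_common(1)[0]
        combined.append(tag)
    return combined

# Illustrative outputs of three models run in parallel on the same sentence
hmm_tags    = ["DET", "NN",  "VBZ", "ADJ"]
crf_tags    = ["DET", "NN",  "VBZ", "NN"]
neural_tags = ["DET", "ADJ", "VBZ", "ADJ"]

print(majority_vote([hmm_tags, crf_tags, neural_tags]))
# ['DET', 'NN', 'VBZ', 'ADJ']
</verbatim>

Instead of the fixed majority vote, a separate classifier can be trained on the three predicted tags (plus additional features) to learn which model to trust in which situation.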

Q: how does named entity recognition (NER) recognize entities like German or John, does it use a list of words in a dictionary?[YM]
A: As a starting point it does, since this is often lexical knowledge. Unfortunately, named entities are particularly sensitive to out-of-vocabulary problems. In such cases, information from the context might help to compensate for the lack of lexical information, e.g. some verbs require persons as an agent. In the sentence 'XYZ talked about ...' it is fairly safe to assume that XYZ is a person. In a multi-pass approach this information can be used to correctly classify other occurrences of XYZ as well.
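The following sketch is purely illustrative (the word lists, verb list and label names are invented); it combines a gazetteer lookup with the 'person verb' cue and a second pass that propagates a label found once to later occurrences of the same name:

<verbatim>
PERSON_GAZETTEER = {"John", "Mary"}
LOCATION_GAZETTEER = {"Germany", "Ethiopia"}
PERSON_VERBS = {"talked", "said", "kissed"}    # verbs whose agent is usually a person

def tag_entities(tokens):
    labels = ["O"] * len(tokens)
    learned = {}                                # names classified in the first pass
    # Pass 1: gazetteer lookup and contextual cue
    for i, tok in enumerate(tokens):
        if tok in PERSON_GAZETTEER:
            labels[i] = "PER"
        elif tok in LOCATION_GAZETTEER:
            labels[i] = "LOC"
        elif i + 1 < len(tokens) and tokens[i + 1] in PERSON_VERBS and tok[0].isupper():
            labels[i] = "PER"                   # capitalized subject of a 'person verb'
        if labels[i] != "O":
            learned[tok] = labels[i]
    # Pass 2: propagate labels to the remaining occurrences of the same word
    for i, tok in enumerate(tokens):
        if labels[i] == "O" and tok in learned:
            labels[i] = learned[tok]
    return labels

print(tag_entities("XYZ talked about the trip . Later XYZ left .".split()))
# ['PER', 'O', 'O', 'O', 'O', 'O', 'O', 'PER', 'O', 'O']
</verbatim>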
    
Q: how are ambiguities handled in NER, e.g. the word Ethiopia -> it is a country name but could also be a person's name?[YM]
A: It is the major advantage of HMMs (and other approaches to sequence labeling) that they do not deal with isolated words, but with words in a sentential context. Compare the two sentences: 'Ethiopia kissed her husband.' and 'XY opened a representation in Ethiopia.'
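A toy Viterbi decoding sketch can make this concrete; all states and probabilities below are invented, and 'Ethiopia' is deliberately given the same emission probability as a person (PER) and as a location (LOC), so only the sentential context decides:

<verbatim>
import math

STATES = ["PER", "LOC", "V", "P", "O"]

start_p = {"PER": 0.25, "LOC": 0.25, "V": 0.05, "P": 0.05, "O": 0.40}

trans_p = {   # P(next state | state); persons often precede verbs, prepositions precede locations
    "PER": {"PER": 0.05, "LOC": 0.05, "V": 0.60, "P": 0.10, "O": 0.20},
    "LOC": {"PER": 0.05, "LOC": 0.05, "V": 0.20, "P": 0.10, "O": 0.60},
    "V":   {"PER": 0.10, "LOC": 0.10, "V": 0.05, "P": 0.25, "O": 0.50},
    "P":   {"PER": 0.10, "LOC": 0.60, "V": 0.05, "P": 0.05, "O": 0.20},
    "O":   {"PER": 0.10, "LOC": 0.10, "V": 0.20, "P": 0.20, "O": 0.40},
}

emit_p = {    # P(word | state); 'Ethiopia' is equally likely as PER and as LOC
    "PER": {"Ethiopia": 0.05, "She": 0.02},
    "LOC": {"Ethiopia": 0.05},
    "V":   {"kissed": 0.10, "lives": 0.10},
    "P":   {"in": 0.30},
    "O":   {"her": 0.10, "husband": 0.05, "She": 0.10},
}

def emit(state, word):
    return emit_p[state].get(word, 1e-4)       # crude smoothing for unseen words

def viterbi(words):
    delta = [{s: math.log(start_p[s] * emit(s, words[0])) for s in STATES}]
    back = []
    for word in words[1:]:
        prev, col, ptr = delta[-1], {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            col[s] = prev[best] + math.log(trans_p[best][s] * emit(s, word))
            ptr[s] = best
        delta.append(col)
        back.append(ptr)
    state = max(STATES, key=lambda s: delta[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return list(reversed(path))

print(viterbi("Ethiopia kissed her husband".split()))   # ['PER', 'V', 'O', 'O']
print(viterbi("She lives in Ethiopia".split()))         # ['O', 'V', 'P', 'LOC']
</verbatim>

In the first sentence the transition PER -> V (persons are typical subjects of verbs) wins, in the second the transition P -> LOC (prepositions are typically followed by locations) does, although the word itself provides no preference.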

Q: are named entity recognizers pre-trained models? how can they contextually understand sentences?[YM]
A: No. Pretrained models are representation learners to be used in many different applications, while a NER system is trained for a very specific task. While pretrained models can be fine-tuned to be better adapted to a certain task, NER models usually cannot. Of course, it is always possible to use the output of a pretrained model like word2vec or BERT in a NER system.
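As a hedged sketch of that last point: contextual embeddings from a pretrained BERT model (via the Hugging Face transformers library) can be fed into a separate, task-specific NER head. The model name, the label set and the untrained linear layer below are only illustrative.

<verbatim>
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
encoder = AutoModel.from_pretrained("bert-base-cased")

sentence = "XYZ opened a representation in Ethiopia ."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    # Contextual embeddings: one vector per (sub)word token
    embeddings = encoder(**inputs).last_hidden_state.squeeze(0)

# A task-specific NER head would be trained on top of these vectors,
# e.g. a linear layer mapping each token vector to entity labels:
num_labels = 4                      # e.g. O, PER, LOC, ORG (illustrative)
ner_head = torch.nn.Linear(encoder.config.hidden_size, num_labels)
logits = ner_head(embeddings)       # untrained here; only shows the data flow
print(logits.shape)                 # (number of subword tokens, num_labels)
</verbatim>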

-- WolfgangMenzel - 09 Mar 2023
 