
Lexemgraph - maintenance of lexeme graphs


Detailed Description

Author:
Ingo Schroeder (see also AUTHORS and THANKS for more)
Date:
1997-03-04
Id:
lexemgraph.c,v 1.140 2004/09/27 17:07:05 micha Exp
Id:
lexemgraph.h,v 1.68 2004/09/01 14:01:31 micha Exp


Functions

long long computeNoOfPathsFromStart (LexemGraph lg, GraphemNode gn, long long sofar, long long maximal)
long long computeNoOfPathsToEnd (LexemGraph lg, GraphemNode gn, long long sofar, long long maximal)
GraphemNode gnClone (GraphemNode gn, Lattice lat)
Boolean lgAreDeletableNodes (LexemGraph lg, List lexemes)
Boolean lgAreDeletedNodes (LexemGraph lg, List lexemes)
LexemGraph lgClone (LexemGraph lg)
Boolean lgCompatibleNodes (LexemGraph lg, LexemNode a, LexemNode b)
Boolean lgCompatibleSets (LexemGraph lg, List a, List b)
void lgComputeDistances (LexemGraph lg)
void lgComputeNoOfPaths (LexemGraph lg)
Boolean lgContains (LexemGraph lg, String form)
Boolean lgCopySelection (LexemGraph destination, LexemGraph source)
void lgCopyTagScores (LexemGraph destination, LexemGraph source)
void lgDelete (LexemGraph lg)
void lgDeleteNode (LexemGraph lg, LexemNode ln)
void lgDeleteNodes (LexemGraph lg, List nodes)
int lgDistanceOfNodes (LexemGraph lg, LexemNode a, LexemNode b)
Boolean lgForbiddenBy (LexemGraph lg, LexemNode ln, List lexemes)
void lgInitialize ()
Boolean lgIntersectingSets (List a, List b)
Boolean lgIsDeletedNode (LexemGraph lg, LexemNode ln)
Boolean lgIsEndNode (GraphemNode n)
Boolean lgIsStartNode (GraphemNode n)
Boolean lgLexemeInLexemNodeList (LexiconItem le, List list)
List lgMakePath (LexemGraph lg, List nodes)
Boolean lgMayModify (LexemGraph lg, GraphemNode down, GraphemNode up)
Boolean lgMember (LexemNode ln, List lexemes)
List lgMostProbablePath (LexemGraph lg)
LexemGraph lgNew (Lattice lat)
Boolean lgNewFinal (LexemGraph lg)
LexemGraph lgNewInit ()
Boolean lgNewIter (LexemGraph lg, Arc arc)
Boolean lgOverlap (LexemNode a, LexemNode b)
List lgPartitions (GraphemNode gn, BitString features)
void lgPrint (long unsigned int mode, LexemGraph lg)
void lgPrintNode (unsigned long mode, LexemNode ln)
List lgQueryCat (LexemGraph lg, GraphemNode gn)
void lgRequireLexeme (LexemGraph lg, ByteVector v, LexemNode ln)
void lgRequireLexemes (LexemGraph lg, ByteVector v, List which)
Boolean lgSimultaneous (LexemNode a, LexemNode b)
Boolean lgSpuriousUppercase (LexemGraph lg, Arc arc)
Boolean lgSubset (List a, List b)
Boolean lgUpdateArcs (LexemGraph lg, Lattice lat, List listArcs)
int lgWidth (LexemGraph lg)

Variables

Boolean lgCompactLVs = TRUE


Function Documentation

long long computeNoOfPathsFromStart (LexemGraph lg, GraphemNode gn, long long sofar, long long maximal)

computes LexemGraph::noOfPathsFromStart

This function computes the number of paths leading to gn from the start of lg. If gn corresponds to a start node, this is simply the number of lexeme nodes sprung from gn. Otherwise it is that number multiplied by the sum of the numbers of paths leading from the start to immediately preceding grapheme nodes. If gn is deleted, the number is always zero. Definition at line 128 of file lexemgraph.c.

References GraphemNodeStruct::arc, CDG_ERROR, cdgPrintf(), GraphemNode, LexemGraphStruct::graphemnodes, GraphemNodeStruct::live, LexemGraphStruct::min, GraphemNodeStruct::no, and LexemGraphStruct::noOfPathsFromStart.

Referenced by lgComputeNoOfPaths().

long long computeNoOfPathsToEnd (LexemGraph lg, GraphemNode gn, long long sofar, long long maximal)

computes LexemGraph::noOfPathsToEnd

This function computes the number of paths leading from gn to the end of lg. If gn corresponds to an end node, this is simply the number of lexeme nodes sprung from gn. Otherwise it is that number multiplied by the sum of the numbers of paths leading to the end from immediately following grapheme nodes. If gn is deleted, the number is always zero. Definition at line 180 of file lexemgraph.c.

References GraphemNodeStruct::arc, CDG_ERROR, cdgPrintf(), GraphemNode, LexemGraphStruct::graphemnodes, GraphemNodeStruct::live, LexemGraphStruct::max, GraphemNodeStruct::no, and LexemGraphStruct::noOfPathsToEnd.

Referenced by lgComputeNoOfPaths().
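The two recurrences described for computeNoOfPathsFromStart() and computeNoOfPathsToEnd() can be sketched in C as follows. This is a minimal illustration, not the lexemgraph.c code: ToyGrapheme, pathsFromStart(), pathsToEnd() and countPaths() are hypothetical names, each grapheme node is reduced to its arc (from, to) and its number of live lexeme nodes, arcs are assumed to point forward in time (from < to), and the sofar/maximal parameters of the real functions are not modelled. The total is formed as described for lgComputeNoOfPaths(), by summing the from-start counts of all end nodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a grapheme node: its arc and the number of
   live (undeleted) lexeme nodes derived from it; 0 means deleted. */
typedef struct {
    int from, to;
    long long live;
} ToyGrapheme;

/* Paths from the start of the graph that end in grapheme node i. */
static long long pathsFromStart(const ToyGrapheme *g, size_t n, int min,
                                size_t i, long long *memo)
{
    if (memo[i] >= 0)
        return memo[i];
    long long preceding = 1;              /* a start node has no predecessors */
    if (g[i].from != min) {
        preceding = 0;                    /* sum over immediately preceding nodes */
        for (size_t j = 0; j < n; j++)
            if (g[j].to == g[i].from)
                preceding += pathsFromStart(g, n, min, j, memo);
    }
    return memo[i] = g[i].live * preceding;
}

/* Paths from grapheme node i to the end of the graph. */
static long long pathsToEnd(const ToyGrapheme *g, size_t n, int max,
                            size_t i, long long *memo)
{
    if (memo[i] >= 0)
        return memo[i];
    long long following = 1;              /* an end node has no successors */
    if (g[i].to != max) {
        following = 0;                    /* sum over immediately following nodes */
        for (size_t j = 0; j < n; j++)
            if (g[j].from == g[i].to)
                following += pathsToEnd(g, n, max, j, memo);
    }
    return memo[i] = g[i].live * following;
}

/* Fills fromStart[] and toEnd[] and returns the total number of paths. */
long long countPaths(const ToyGrapheme *g, size_t n, int min, int max,
                     long long *fromStart, long long *toEnd)
{
    for (size_t i = 0; i < n; i++) {
        assert(g[i].from < g[i].to);      /* forward arcs keep the recursion finite */
        fromStart[i] = toEnd[i] = -1;     /* -1 marks "not computed yet" */
    }
    long long total = 0;
    for (size_t i = 0; i < n; i++) {
        pathsFromStart(g, n, min, i, fromStart);
        pathsToEnd(g, n, max, i, toEnd);
        if (g[i].to == max)
            total += fromStart[i];        /* every complete path ends in an end node */
    }
    return total;
}
```

For a lattice with arcs 0→1 (two readings), 1→2 (one reading) and 0→2 (one reading), countPaths() returns 3; setting the live count of the 0→2 node to 0 (deleting it) leaves 2.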

GraphemNode gnClone (GraphemNode gn, Lattice lat)

Clone a grapheme node.

The field GraphemNode::lexemes is not set; the caller has to do that. (The two-way links between grapheme nodes and lexeme nodes are easier to set once all nodes are known.) Definition at line 1713 of file lexemgraph.c.

References GraphemNodeStruct::arc, GraphemNodeStruct::chunk, GraphemNode, GraphemNodeStruct::lexemes, GraphemNodeStruct::lexemgraph, GraphemNodeStruct::no, and NULL.

Referenced by lgClone().

Boolean lgAreDeletableNodes (LexemGraph lg, List lexemes)

This function checks whether all lexeme nodes passed in lexemes can be deleted at the same time. This is the case if doing so will leave at least one complete path through the lexeme graph, according to the current state of deletions. To this end, the function checks whether the sum of the numbers of paths through each lexeme node is smaller than the total number of paths in lg.

Precondition:
lexemes must be a List of lexeme nodes with identical time spans. If this is not the case, the behaviour is undefined.
Definition at line 979 of file lexemgraph.c.

References CDG_ERROR, cdgPrintf(), GraphemNode, LexemGraphStruct::isDeletedNode, GraphemNodeStruct::lexemes, LexemNode, LexemNodeStruct::no, GraphemNodeStruct::no, LexemGraphStruct::noOfPaths, LexemGraphStruct::noOfPathsFromStart, LexemGraphStruct::noOfPathsToEnd, NULL, and TRUE.

Referenced by cnOptimizeNode().
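Because all nodes in lexemes share one time span, no complete path can pass through two of them, so their path counts can simply be added. The following is a minimal sketch of the test with hypothetical names (pathsThroughLexeme(), areDeletable()); it assumes that the number of paths through one lexeme node is derived from its grapheme node's from-start and to-end counts, each of which contains the factor live (the number of live lexeme nodes of that grapheme node) exactly once, as the definitions of computeNoOfPathsFromStart() and computeNoOfPathsToEnd() state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Paths through one live lexeme node. fromStart and toEnd are the counts of
   its grapheme node; both include the factor `live`, so dividing it out of
   each gives the paths reaching and leaving this one node. */
long long pathsThroughLexeme(long long fromStart, long long toEnd, long long live)
{
    assert(live > 0);
    return (fromStart / live) * (toEnd / live);
}

/* Can lexeme nodes with the given path counts be deleted together? Only if
   at least one of the totalPaths complete paths avoids all of them. */
bool areDeletable(const long long *pathsThrough, size_t n, long long totalPaths)
{
    long long removed = 0;
    for (size_t i = 0; i < n; i++)
        removed += pathsThrough[i];
    return removed < totalPaths;
}
```

For instance, each reading of a grapheme node with two live readings, fromStart = 2 and toEnd = 2 lies on one path; in a graph with 3 paths both readings may be deleted together (2 < 3), in a graph with only 2 paths they may not.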

Boolean lgAreDeletedNodes (LexemGraph lg, List lexemes)

This function checks whether the lexeme nodes in lexemes have been deleted. If so, TRUE is returned, otherwise FALSE. Definition at line 957 of file lexemgraph.c.

References FALSE, LexemNode, lgIsDeletedNode(), NULL, and TRUE.

Referenced by cnBuildIter(), cnOptimizeNode(), cnPrint(), and lgComputeDistances().

LexemGraph lgClone (LexemGraph lg)

Clone a lexeme graph.

This performs a totally deep copy; even the underlying lattice, lexicon items etc. are cloned. Definition at line 1735 of file lexemgraph.c.

References LexemNodeStruct::arc, LexemGraphStruct::chunks, LexemGraphStruct::distance, gnClone(), LexemNodeStruct::grapheme, GraphemNode, LexemGraphStruct::graphemnodes, LexemGraphStruct::isDeletedNode, LexemGraphStruct::lattice, LexemNodeStruct::lexem, LexemNode, lgComputeDistances(), lgComputeNoOfPaths(), lgCopyTagScores(), LexemNodeStruct::limit, LexemGraphStruct::max, LexemGraphStruct::min, LexemNodeStruct::no, GraphemNodeStruct::no, LexemGraphStruct::nodes, LexemGraphStruct::noOfPathsFromStart, LexemGraphStruct::noOfPathsToEnd, and NULL.

Boolean lgCompatibleNodes (LexemGraph lg, LexemNode a, LexemNode b)

returns TRUE if lexeme nodes a and b can lie on one path.

This function checks whether, in principle, a complete path can exist through lg that includes both a and b. This is independent of the current state of deletions. In fact, the function merely checks whether the distance between the nodes is not 0 by using lgDistanceOfNodes(). Note that two nodes are not automatically compatible merely because they do not overlap in time. Also, a lexeme node is not compatible with itself by this definition. Definition at line 847 of file lexemgraph.c.

References CDG_ERROR, cdgPrintf(), LexemNode, lgDistanceOfNodes(), and NULL.

Referenced by cnOptimizeNode(), lgCompatibleSets(), lgForbiddenBy(), lgMakePath(), lgRequireLexeme(), and lgRequireLexemes().

Boolean lgCompatibleSets (LexemGraph lg, List a, List b)

checks whether these sets of lexemes are compatible, i.e. either unrelated or intersecting.

Precondition:
Both a and b must be Lists of lexeme nodes spanning the same respective time interval. If this is not the case, the behaviour is undefined.
This function checks whether both sets of lexeme nodes may be selected in a solution. This is defined as follows:
  • If either set is empty, the result is TRUE
  • If the first elements of a and b are compatible, the result is TRUE
  • If the sets intersect, the result is TRUE
  • Otherwise the result is FALSE
Definition at line 903 of file lexemgraph.c.

References LexemNodeStruct::arc, FALSE, LexemNode, lgCompatibleNodes(), NULL, and TRUE.
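The four rules above translate directly into code. In this sketch (hypothetical names, not the CDG API) lexeme nodes are identified by their index, compatibility is read from a row-major distance matrix standing in for LexemGraph::distance, following the description of lgCompatibleNodes() (compatible iff the distance is not 0), and intersection is tested by comparing indices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Compatible iff the distance between the two nodes is not zero. */
static bool compatibleNodes(const int *dist, size_t n, size_t a, size_t b)
{
    assert(a < n && b < n);
    return dist[a * n + b] != 0;
}

bool compatibleSets(const int *dist, size_t n,
                    const size_t *a, size_t na, const size_t *b, size_t nb)
{
    if (na == 0 || nb == 0)
        return true;                          /* either set is empty */
    if (compatibleNodes(dist, n, a[0], b[0]))
        return true;                          /* first elements compatible */
    for (size_t i = 0; i < na; i++)           /* do the sets intersect? */
        for (size_t j = 0; j < nb; j++)
            if (a[i] == b[j])
                return true;
    return false;
}
```

With nodes 0 and 1 as alternative readings of one word (distance 0) and node 2 following both, {0} and {1} are incompatible, while {0, 1} and {1} are compatible because they intersect.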

void lgComputeDistances (LexemGraph lg)

(re-)computes the distance matrix LexemGraph::distance

This function computes the distance between any two lexeme nodes in lg and stores the result in lg->distance. Definition at line 56 of file lexemgraph.c.

References GraphemNodeStruct::arc, LexemGraphStruct::distance, GraphemNode, LexemGraphStruct::graphemnodes, GraphemNodeStruct::lexemes, and lgAreDeletedNodes().

Referenced by cnRenew(), lgClone(), lgDeleteNode(), lgDeleteNodes(), and lgNewFinal().

void lgComputeNoOfPaths (LexemGraph lg)

computes the number of paths possible in the graph, given the current state of deletions.

This function computes the number of paths possible in lg, according to the state of its Vector LexemGraph::isDeletedNode. It calls computeNoOfPathsToEnd() and computeNoOfPathsFromStart() for each lexeme node. The total number of all paths is the sum of all numbers of paths leading to grapheme nodes that are end nodes. Definition at line 232 of file lexemgraph.c.

References GraphemNodeStruct::arc, computeNoOfPathsFromStart(), computeNoOfPathsToEnd(), FALSE, GraphemNode, LexemGraphStruct::graphemnodes, LexemGraphStruct::isDeletedNode, GraphemNodeStruct::lexemes, LexemNode, GraphemNodeStruct::live, LexemGraphStruct::min, LexemNodeStruct::no, GraphemNodeStruct::no, LexemGraphStruct::nodes, LexemGraphStruct::noOfPaths, LexemGraphStruct::noOfPathsFromStart, LexemGraphStruct::noOfPathsToEnd, NULL, and TRUE.

Referenced by cnRenew(), lgClone(), lgDeleteNode(), lgDeleteNodes(), and lgNewFinal().

Boolean lgContains (LexemGraph lg, String form)

Does a lexemgraph contain at least one instance of a given form?

This function checks whether lg contains at least one instance of the form form. Capitalized versions of form are permissible if they are spurious (cf. lgSpuriousUppercase()). Definition at line 1657 of file lexemgraph.c.

References GraphemNodeStruct::arc, FALSE, GraphemNode, LexemGraphStruct::graphemnodes, lgSpuriousUppercase(), and TRUE.

Boolean lgCopySelection (LexemGraph destination, LexemGraph source)

Select the path in destination whose parts most closely match source.

This function inspects the undeleted words in source and undeletes those words in destination that most closely correspond to them. (This is necessary because two lexeme graphs built from the same lattice may have their nodes in different order, so you cannot simply re-use a LexemGraph::isDeletedNode vector across lexeme graphs.) Definition at line 1569 of file lexemgraph.c.

References LexemNodeStruct::arc, FALSE, LexemGraphStruct::isDeletedNode, LexemNodeStruct::lexem, LexemNode, LexemNodeStruct::no, LexemGraphStruct::nodes, NULL, and TRUE.

void lgCopyTagScores (LexemGraph destination, LexemGraph source)

This function simply transfers the field LexemNode::tagscore from each node in source to the corresponding node in destination. (This is only useful to save repeated invocations of taggerTag() for two graphs produced from the same lattice.) Definition at line 1498 of file lexemgraph.c.

References Chunk, GraphemNodeStruct::chunk, chunkerCloneChunk(), chunkerReplaceGraphemes(), LexemGraphStruct::chunks, GraphemNode, LexemNode, ChunkStruct::nodes, LexemGraphStruct::nodes, NULL, ChunkStruct::subChunks, LexemGraphStruct::tags, and LexemNodeStruct::tagscore.

Referenced by lgClone().

void lgDelete (LexemGraph lg)

deletes a LexemGraph

This function deallocates a lexeme graph. This deallocates all parts of the structure, even the lexeme nodes and lexical entries themselves. The lexicon remains unchanged as the LexicalEntry structures are merely clones of the structures in inputCurrentGrammar. Definition at line 1313 of file lexemgraph.c.

References chunkerChunkDelete(), LexemGraphStruct::chunks, LexemGraphStruct::distance, GraphemNode, LexemGraphStruct::graphemnodes, LexemGraphStruct::isDeletedNode, LexemNodeStruct::lexem, GraphemNodeStruct::lexemes, LexemNode, LexemGraphStruct::nodes, LexemGraphStruct::noOfPathsFromStart, LexemGraphStruct::noOfPathsToEnd, NULL, and LexemGraphStruct::tags.

Referenced by cmdAnno2Parse(), cmdChunk(), cnDelete(), and lgNew().

void lgDeleteNode (LexemGraph lg, LexemNode ln)

deletes a node from the lexeme graph itself.

This function marks a lexeme node as deleted. It does this by setting the cell ln->no in the Vector lg->isDeletedNode. If this destroys the last possible path through lg, a warning is displayed. This function always re-computes the number of remaining paths in lg. Definition at line 1045 of file lexemgraph.c.

References CDG_WARNING, cdgPrintf(), LexemGraphStruct::isDeletedNode, LexemNodeStruct::lexem, LexemNode, lgComputeDistances(), lgComputeNoOfPaths(), LexemNodeStruct::no, LexemGraphStruct::noOfPaths, and TRUE.

Referenced by cnOptimizeNode().

void lgDeleteNodes (LexemGraph lg, List nodes)

delete a list of lexeme nodes

This function behaves as if lgDeleteNode() were called on each element of nodes, but it is more efficient since it re-computes the number of remaining paths only once. Definition at line 1075 of file lexemgraph.c.

References CDG_WARNING, cdgPrintf(), LexemGraphStruct::isDeletedNode, LexemNodeStruct::lexem, LexemNode, lgComputeDistances(), lgComputeNoOfPaths(), LexemNodeStruct::no, LexemGraphStruct::noOfPaths, NULL, and TRUE.

Referenced by cnOptimizeNode().

int lgDistanceOfNodes (LexemGraph lg, LexemNode a, LexemNode b)

returns a distance measure for two lexeme nodes

This function computes the logical distance between a and b, measured in words. Usually this is just the corresponding element of LexemGraph::distance. If either of the nodes is underspecified, it is treated as if it directly followed the latest specified lexeme node. Hence, the return value may be greater than the value in LexemGraph::distance. Two underspecified lexeme nodes are considered to have distance zero. Definition at line 776 of file lexemgraph.c.

References CDG_ERROR, cdgPrintf(), LexemGraphStruct::distance, LexemNodeStruct::grapheme, LexemNode, GraphemNodeStruct::no, and NULL.

Referenced by cmdDistance(), lgCompatibleNodes(), and lgMayModify().

Boolean lgForbiddenBy (LexemGraph lg, LexemNode ln, List lexemes)

Does the existence of these lexemes exclude that lexeme?

Precondition:
lexemes must be a List of lexeme nodes with identical time spans. If this is not the case, the behaviour is undefined.
This function checks whether the List lexemes and the lexeme node ln can both be selected in a solution. This is the case if any of the following holds:
  • lexemes is empty
  • ln is compatible with the first element of lexemes
  • lexemes contains ln

In these cases FALSE is returned (ln is not forbidden). Otherwise TRUE is returned. Definition at line 872 of file lexemgraph.c.

References FALSE, LexemNode, lgCompatibleNodes(), and TRUE.
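A minimal sketch of the three conditions, with hypothetical names: lexeme nodes are indices, and compatibility is read from a row-major distance matrix standing in for LexemGraph::distance (nonzero means compatible, as lgCompatibleNodes() is described). Note that the result is inverted: the function returns true when ln is forbidden.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if ln is forbidden by the (simultaneous) nodes in lexemes. */
bool forbiddenBy(const int *dist, size_t n, size_t ln,
                 const size_t *lexemes, size_t count)
{
    assert(ln < n);
    if (count == 0)
        return false;                        /* lexemes is empty */
    if (dist[ln * n + lexemes[0]] != 0)
        return false;                        /* compatible with the first element */
    for (size_t i = 0; i < count; i++)
        if (lexemes[i] == ln)
            return false;                    /* lexemes contains ln */
    return true;
}
```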

void lgInitialize ()

Initialize the Lexemgraph module.

This function initializes the module Lexemgraph and registers the variable compactlevelvalues. Definition at line 1615 of file lexemgraph.c.

References lgCompactLVs, and NULL.

Referenced by cdgInitialize().

Boolean lgIntersectingSets (List a, List b)

Do two lexeme lists intersect?

Precondition:
Both a and b must be Lists of lexeme nodes spanning the same respective time interval. If this is not the case, the behaviour is undefined.
This function checks whether a and b intersect. Definition at line 1211 of file lexemgraph.c.

References FALSE, LexemNode, lgSimultaneous(), NULL, and TRUE.

Boolean lgIsDeletedNode (LexemGraph lg, LexemNode ln)

This function checks whether a lexeme node has been deleted. If so, TRUE is returned, otherwise FALSE. Definition at line 947 of file lexemgraph.c.

References LexemGraphStruct::isDeletedNode, LexemNode, and LexemNodeStruct::no.

Referenced by cnOptimizeNode(), lgAreDeletedNodes(), and lgMakePath().

Boolean lgIsEndNode (GraphemNode n)

returns TRUE if node is an end node

This function checks whether n->arc->to is equal to the maximal time point in the lexeme graph.

Todo:
Should nodes on segment boundaries of incrementally parsed input be considered end nodes?
Definition at line 758 of file lexemgraph.c.

References GraphemNodeStruct::arc, FALSE, GraphemNode, GraphemNodeStruct::lexemgraph, LexemGraphStruct::max, and NULL.

Referenced by cnIsEndNode().

Boolean lgIsStartNode (GraphemNode n)

returns TRUE if node is a start node

This function checks whether n->arc->from is equal to the minimal time point in the lexeme graph. Definition at line 741 of file lexemgraph.c.

References GraphemNodeStruct::arc, FALSE, GraphemNode, GraphemNodeStruct::lexemgraph, LexemGraphStruct::min, and NULL.

Referenced by cnIsStartNode().

Boolean lgLexemeInLexemNodeList (LexiconItem le, List list)

This function checks whether at least one of the LexemNode structures in list points to the lexicon item le. Definition at line 1549 of file lexemgraph.c.

References FALSE, LexemNodeStruct::lexem, LexemNode, NULL, and TRUE.

List lgMakePath (LexemGraph lg, List nodes)

Takes a set of lexeme nodes and extends it to a complete path through the graph that is composed of undeleted lexeme nodes. It returns a List of lexeme nodes that

  1. is a superset of nodes
  2. corresponds to a complete path through the graph and
  3. contains only undeleted lexeme nodes.

If this is not possible, NULL is returned.

We do this by simply appending arbitrary non-contradictory nodes until all time points are bound. Note that for this approach to be correct, there must not be any undeleted dangling nodes in the graph. This condition must have been ensured by cnOptimizeNet(). Definition at line 1250 of file lexemgraph.c.

References LexemNodeStruct::arc, FALSE, LexemNode, lgCompatibleNodes(), lgIsDeletedNode(), LexemGraphStruct::max, LexemGraphStruct::min, LexemGraphStruct::nodes, NULL, and TRUE.
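The greedy extension can be sketched as follows. All names are hypothetical, and the sketch assumes that compatibility is available as a precomputed matrix meaning "can lie on one path" in the spirit of lgCompatibleNodes(), and that the required nodes are mutually compatible. Starting at min, it repeatedly continues with an already selected node, or else with any undeleted node that is compatible with everything selected so far, until max is reached. Instead of returning a List it marks the path in a Boolean array, and like the original it is only reliable when there are no undeleted dangling nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lexeme node: its time span and deletion state. */
typedef struct {
    int from, to;
    bool deleted;
} ToyNode;

/* Extends the nodes marked in inPath[] to a complete path from min to max.
   compat is a row-major n x n matrix: nonzero if two nodes can lie on one
   path (a node is not compatible with itself). Returns false if stuck. */
bool makePath(const ToyNode *nodes, size_t n, const int *compat,
              int min, int max, bool *inPath)
{
    int t = min;
    while (t < max) {
        size_t pick = n;
        /* A node that is already selected and starts here continues the path. */
        for (size_t i = 0; i < n && pick == n; i++)
            if (inPath[i] && nodes[i].from == t)
                pick = i;
        /* Otherwise take any undeleted node starting here that is compatible
           with everything selected so far. */
        for (size_t i = 0; i < n && pick == n; i++) {
            if (nodes[i].deleted || nodes[i].from != t)
                continue;
            bool ok = true;
            for (size_t j = 0; j < n && ok; j++)
                if (inPath[j] && !compat[i * n + j])
                    ok = false;
            if (ok)
                pick = i;
        }
        if (pick == n || nodes[pick].deleted)
            return false;                 /* no way on, or a deleted node was required */
        assert(nodes[pick].to > t);       /* arcs must move forward in time */
        inPath[pick] = true;
        t = nodes[pick].to;
    }
    return true;
}
```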

Boolean lgMayModify (LexemGraph lg, GraphemNode down, GraphemNode up)

may these words modify each other?

Precondition:
lexemes must be a List of lexeme nodes with identical time spans. If this is not the case, the behaviour is undefined.
This function checks whether a LevelValue can exist whose modifier is a lexeme of down and whose modifiee is a lexeme of up. This is the case iff both can coexist on one path and do not overlap. Definition at line 824 of file lexemgraph.c.

References GraphemNode, GraphemNodeStruct::lexemes, LexemNode, lgDistanceOfNodes(), and TRUE.

Referenced by cnBuildLevelValues().

Boolean lgMember (LexemNode ln, List lexemes)

is this lexeme a member of this set?

Precondition:
lexemes must be a List of lexeme nodes with identical time spans. If this is not the case, the behaviour is undefined.
This function checks whether ln is an element of lexemes.

In the following cases ln is not a member (FALSE is returned):

  • a NIL binding (ln is NULL) is not an element of anything,
  • an empty set (lexemes is NULL) has no members,
  • ln belongs to another time span,
  • ln is not literally contained in the set.

Otherwise TRUE is returned. Definition at line 1152 of file lexemgraph.c.

References FALSE, LexemNode, lgSimultaneous(), and NULL.

Referenced by lgSubset().
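The four exclusion cases can be sketched with a hypothetical ToyLexeme that carries only its time span. The sketch compares spans with plain equality and ignores the NONSPEC special case of lgSimultaneous(); "literally contained" is read as pointer identity, so a different node with the same span is not a member.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lexeme node reduced to its time span. */
typedef struct { int from, to; } ToyLexeme;

static bool sameSpan(const ToyLexeme *a, const ToyLexeme *b)
{
    return a->from == b->from && a->to == b->to;
}

bool member(const ToyLexeme *ln, const ToyLexeme *const *lexemes, size_t n)
{
    if (ln == NULL)
        return false;                     /* NIL binding */
    if (n == 0)
        return false;                     /* empty set */
    assert(lexemes != NULL);
    if (!sameSpan(ln, lexemes[0]))
        return false;                     /* another time span */
    for (size_t i = 0; i < n; i++)
        if (lexemes[i] == ln)
            return true;                  /* literally contained */
    return false;
}
```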

List lgMostProbablePath (LexemGraph lg)

Returns the most probable path, as defined by tagging scores. Definition at line 1676 of file lexemgraph.c.

References CDG_WARNING, cdgPrintf(), GraphemNode, LexemGraphStruct::graphemnodes, LexemGraphStruct::lattice, GraphemNodeStruct::lexemes, LexemNode, NULL, and LexemNodeStruct::tagscore.

LexemGraph lgNew (Lattice lat)

This function creates a lexeme graph from a Lattice lat and a CDG lexicon. For each arc of the lattice, a grapheme node is allocated and annotated with all possible lexical entries. (If there is no lexical entry for an arc, a warning is given, but processing continues.)

For each grapheme node, as many lexeme nodes are created as there are lexical alternatives in the lexicon.

Furthermore:

Definition at line 685 of file lexemgraph.c.

References LexemGraphStruct::lattice, lgDelete(), lgNewFinal(), lgNewInit(), lgNewIter(), and NULL.

Referenced by cmdAnno2Parse(), cmdChunk(), and cnTag().

Boolean lgNewFinal (LexemGraph lg)

does the final computations for the lexemgraph

This function sets those fields of lg that can only be computed after all lexeme nodes are present:

The function can fail returning FALSE if there is no valid path through the lexeme graph. Definition at line 618 of file lexemgraph.c.

References CDG_INFO, CDG_WARNING, cdgPrintf(), FALSE, LexemGraphStruct::graphemnodes, LexemGraphStruct::isDeletedNode, LexemGraphStruct::lattice, lgComputeDistances(), lgComputeNoOfPaths(), lgPrint(), LexemGraphStruct::max, LexemGraphStruct::noOfPaths, NULL, LexemGraphStruct::tags, and TRUE.

Referenced by lgNew(), and lgUpdateArcs().

LexemGraph lgNewInit ()

initializes the lexemgraph

This function returns a new LexemGraph structure with all fields initialized to meaningless values. In particular, it contains no nodes whatsoever. Definition at line 427 of file lexemgraph.c.

References LexemGraphStruct::chunks, LexemGraphStruct::distance, LexemGraphStruct::graphemnodes, LexemGraphStruct::isDeletedNode, LexemGraphStruct::lattice, LexemGraphStruct::max, LexemGraphStruct::min, LexemGraphStruct::nodes, LexemGraphStruct::noOfPathsFromStart, LexemGraphStruct::noOfPathsToEnd, NULL, and LexemGraphStruct::tags.

Referenced by lgNew().

Boolean lgNewIter (LexemGraph lg, Arc arc)

Insert lexeme nodes into the LexemGraph that correspond to the Arc.

This function builds all possible lexeme nodes for the specific arc and adds them to lg. It fails if there is no matching entry in the lexicon.

Maybe undo capitalisation introduced by orthographic convention.

If the written word is uppercase, but that uppercase-ness is suspect because it is at the start of a phrase and might be mere orthographic convention, we have to decide which version we use for lexicon lookup.

If our lexicon contains items for the lower-case version but none for the upper-case versions, we use only those; if it contains only items for the upper-case version, we use those; and if it contains neither, we allow both and hope that there is a lexical template which will catch this word.

We do not use the obvious solution - look up both versions whenever a word is spurious - because it has the following defect: If a sentence starts with `Der', some naive lexical template could introduce a noun reading, and if POS tagging allows, it might actually survive even though it is exceedingly unlikely. Since we do not want this, we effectively force the reading to be `der'.

Moral: If you really need to have open-class items in your lexicon that are near-homonymous with closed-class items, you can bloody well write proper lexicon items for them and not templates.

Much the same goes for words in ALL UPPER CAPS, except that those can occur anywhere in a sentence, not only at the start, and we have to check three different spellings instead of two.

In one-letter words, the intermediate version is indistinguishable from the third one, so we suppress it. Definition at line 454 of file lexemgraph.c.

References GraphemNodeStruct::arc, LexemNodeStruct::arc, CDG_WARNING, cdgPrintf(), GraphemNodeStruct::chunk, FALSE, LexemNodeStruct::grapheme, GraphemNode, LexemGraphStruct::graphemnodes, LexemGraphStruct::isDeletedNode, LexemNodeStruct::lexem, GraphemNodeStruct::lexemes, GraphemNodeStruct::lexemgraph, LexemNodeStruct::lexemgraph, LexemNode, lgSpuriousUppercase(), LexemNodeStruct::limit, LexemGraphStruct::max, max, LexemGraphStruct::min, min, GraphemNodeStruct::no, LexemNodeStruct::no, LexemGraphStruct::nodes, NULL, LexemNodeStruct::tagscore, and TRUE.

Referenced by lgNew(), and lgUpdateArcs().
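The lookup decision for a word with a suspect initial capital can be sketched as below. The sketch covers only the two-spelling case (not the three spellings checked for ALL UPPER CAPS words), works on single-byte characters, and uses a hypothetical toy lexicon and helper names (hasEntries(), spellingsToLookUp()) instead of the real lexicon. The text does not say what happens when both spellings have lexicon items, so the sketch assumes that both are used.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy lexicon standing in for the real lexicon lookup. */
static const char *toyLexicon[] = { "der", "Mann", "Haus" };

static bool hasEntries(const char *form)
{
    for (size_t i = 0; i < sizeof toyLexicon / sizeof toyLexicon[0]; i++)
        if (strcmp(toyLexicon[i], form) == 0)
            return true;
    return false;
}

/* For a word whose initial capital is suspect, writes the spellings to look
   up into out[] and returns how many there are (1 or 2). */
int spellingsToLookUp(const char *word, char out[2][32])
{
    assert(word[0] != '\0' && strlen(word) < 32);
    char lower[32];
    strcpy(lower, word);
    lower[0] = (char)tolower((unsigned char)lower[0]);

    bool hasLower = hasEntries(lower);
    bool hasUpper = hasEntries(word);
    int n = 0;
    /* Keep the written form if it has items, or if neither form has. */
    if (hasUpper || !hasLower)
        strcpy(out[n++], word);
    /* Use the lower-case form if it has items, or if neither form has.
       When both have items, both are kept (assumption). */
    if (hasLower || !hasUpper)
        strcpy(out[n++], lower);
    return n;
}
```

With this toy lexicon, "Der" is looked up only as "der", "Mann" only as "Mann", and an unknown "Xyz" under both spellings, leaving it to lexical templates.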

Boolean lgOverlap (LexemNode a, LexemNode b)

Do these lexeme nodes overlap?

Returns TRUE if the two lexeme nodes have at least one time point in common.

This is subtly different from the more common question, "Can the two nodes coexist on one path?": two nodes can be compatible although they overlap if they are identical. Conversely, a and b may be incompatible even if they do not overlap if there is no path between them. Definition at line 1416 of file lexemgraph.c.

References LexemNodeStruct::arc, and LexemNode.
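Assuming that a lexeme node's span is the half-open interval [from, to) of its arc, so that a word ending at a time point does not overlap a word starting there (an assumption; the text does not say how boundary points are treated), the overlap test reduces to two comparisons. ToyArc is a hypothetical stand-in for the arc of a LexemNode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical time span of a lexeme node, read as [from, to). */
typedef struct { int from, to; } ToyArc;

/* True if the two spans share at least one time step. */
bool overlap(ToyArc a, ToyArc b)
{
    assert(a.from < a.to && b.from < b.to);
    return a.from < b.to && b.from < a.to;
}
```

Identical spans always overlap, which matches the remark that two overlapping nodes can still be compatible if they are identical.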

List lgPartitions (GraphemNode gn, BitString features)

partitions a set of lexeme nodes into equivalence classes

This function partitions the set of lexeme nodes of gn into equivalence classes. The equivalence relation used is the function inputCompareLeByAtts() with the argument features. The function returns a new List of new lists of lexemes. (The latter are re-used in ConstraintNode structures, the former are deallocated by cnBuildNodes().)

  result = [];

  FOR each lexeme l:
   IF l fits into one of the known classes,
    insert l there;
   ELSE
    create new class [l];
    insert the new class into result;
   FI
  ROF

  return result.
Definition at line 306 of file lexemgraph.c.

References CDG_DEBUG, cdgPrintf(), GraphemNode, LexemNodeStruct::lexem, GraphemNodeStruct::lexemes, LexemNode, lgCompactLVs, and NULL.

Referenced by cnBuildLevelValues().
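The pseudocode above can be sketched in C. ToyLexeme, equivalent() (a hypothetical stand-in for inputCompareLeByAtts(), with features as a plain bitmask instead of a BitString) and partition() are illustrative names. Instead of building a List of Lists, the sketch records the class index of every lexeme; since the relation is an equivalence, comparing a lexeme with the first member of each class is enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lexeme: just the attribute values the comparison looks at. */
typedef struct { int cat, casus, number; } ToyLexeme;

/* Equivalent iff the lexemes agree on every attribute selected in features
   (bit 0 = cat, bit 1 = casus, bit 2 = number). */
static bool equivalent(const ToyLexeme *a, const ToyLexeme *b, unsigned features)
{
    if ((features & 1u) && a->cat != b->cat) return false;
    if ((features & 2u) && a->casus != b->casus) return false;
    if ((features & 4u) && a->number != b->number) return false;
    return true;
}

/* Partitions lexemes[0..n) into classes; classOf[i] receives the class of
   lexeme i. Returns the number of classes. */
size_t partition(const ToyLexeme *lexemes, size_t n, unsigned features,
                 size_t *classOf)
{
    size_t rep[64];                  /* first member of each class */
    size_t classes = 0;
    assert(n <= 64);
    for (size_t i = 0; i < n; i++) {
        size_t c = 0;
        while (c < classes && !equivalent(&lexemes[rep[c]], &lexemes[i], features))
            c++;
        if (c == classes)
            rep[classes++] = i;      /* no known class fits: open a new one */
        classOf[i] = c;
    }
    return classes;
}
```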

void lgPrint (long unsigned int mode, LexemGraph lg)

prints a lexeme graph

This function displays a textual representation of the lexeme graph lg. Definition at line 708 of file lexemgraph.c.

References LexemNodeStruct::arc, cdgPrintf(), chunkerPrintChunks(), LexemGraphStruct::chunks, LexemGraphStruct::isDeletedNode, LexemGraphStruct::lattice, LexemNodeStruct::lexem, LexemNode, lgPrint(), LexemNodeStruct::no, LexemGraphStruct::nodes, and LexemNodeStruct::tagscore.

Referenced by lgNewFinal(), and lgPrint().

void lgPrintNode (unsigned long mode, LexemNode ln)

prints out a lexeme node

This function displays the identifier and the time span of ln in the format der_nom(0,1). Definition at line 1395 of file lexemgraph.c.

References LexemNodeStruct::arc, cdgPrintf(), LexemGraphStruct::isDeletedNode, LexemNodeStruct::lexem, LexemNodeStruct::lexemgraph, LexemNode, and LexemNodeStruct::no.

Referenced by cnOptimizeNode().

List lgQueryCat (LexemGraph lg, GraphemNode gn)

What categories can this node represent? (Needed while tagging.)

This function queries the lexicon about what syntactical categories gn can represent. (The syntactical category is that feature whose index is taggerCategoryIndex.) This function is used to check whether an assignment by the tagger can be honored by the lexicon. Definition at line 1629 of file lexemgraph.c.

References GraphemNode, LexemNodeStruct::lexem, GraphemNodeStruct::lexemes, LexemNode, and NULL.

void lgRequireLexeme (LexemGraph lg, ByteVector v, LexemNode ln)

Takes a Vector of Booleans and sets all cells that correspond to the numbers of nodes incompatible with ln. This function can be used in combination with lvVectorCompatible() to decide whether an LV is compatible with a set of other LVs. Definition at line 1428 of file lexemgraph.c.

References LexemNode, lgCompatibleNodes(), LexemNodeStruct::no, LexemGraphStruct::nodes, and TRUE.

void lgRequireLexemes (LexemGraph lg, ByteVector v, List which)

This function is similar to lgRequireLexeme(), but takes a List of lexeme nodes. It marks all those lexeme nodes that are incompatible with all lexeme nodes of which. Definition at line 1449 of file lexemgraph.c.

References LexemNode, lgCompatibleNodes(), LexemNodeStruct::no, LexemGraphStruct::nodes, and TRUE.
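Both marking functions can be sketched over a compatibility matrix. The names are hypothetical: v is a plain byte array standing in for the ByteVector, lexeme nodes are indices, and compat is a row-major matrix that is nonzero when two nodes can lie on one path. Because a node is not compatible with itself by the definition of lgCompatibleNodes(), the sketch assumes that the required node(s) themselves are left unmarked; the text does not say how this case is handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Marks in v every node incompatible with node ln (ln itself excluded). */
void requireLexeme(const int *compat, size_t n, size_t ln, unsigned char *v)
{
    assert(ln < n);
    for (size_t i = 0; i < n; i++)
        if (i != ln && !compat[ln * n + i])
            v[i] = 1;
}

/* Marks in v every node that is compatible with none of the nodes in which;
   members of which are left unmarked. */
void requireLexemes(const int *compat, size_t n,
                    const size_t *which, size_t count, unsigned char *v)
{
    for (size_t i = 0; i < n; i++) {
        bool member = false, compatible = false;
        for (size_t k = 0; k < count; k++) {
            assert(which[k] < n);
            if (which[k] == i)
                member = true;
            else if (compat[which[k] * n + i])
                compatible = true;
        }
        if (!member && !compatible)
            v[i] = 1;
    }
}
```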

Boolean lgSimultaneous (LexemNode a, LexemNode b)

do the lexemes span the same time interval?

This function checks whether a and b cover the same time span. An argument of NONSPEC always causes TRUE to be returned. However, the NULL node is not simultaneous with any lexeme node, not even with another root node. Definition at line 1108 of file lexemgraph.c.

References LexemNodeStruct::arc, CDG_WARNING, cdgPrintf(), FALSE, LexemNodeStruct::lexem, LexemNode, NULL, and TRUE.

Referenced by lgIntersectingSets(), lgMember(), and lgSubset().
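A minimal sketch of these rules, using a hypothetical sentinel object in place of CDG's NONSPEC node and a ToyLexeme that carries only its time span. The text does not say which rule wins when one argument is NONSPEC and the other is NULL; the sketch assumes NONSPEC takes precedence, since it "always" yields TRUE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lexeme node reduced to its time span. */
typedef struct { int from, to; } ToyLexeme;

/* Hypothetical sentinel playing the role of the NONSPEC lexeme node. */
static ToyLexeme toyNonspec;
#define TOY_NONSPEC (&toyNonspec)

bool simultaneous(const ToyLexeme *a, const ToyLexeme *b)
{
    if (a == TOY_NONSPEC || b == TOY_NONSPEC)
        return true;                  /* NONSPEC matches everything */
    if (a == NULL || b == NULL)
        return false;                 /* NULL (root) matches nothing */
    return a->from == b->from && a->to == b->to;
}
```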

Boolean lgSpuriousUppercase (LexemGraph lg, Arc arc)

Might this be a lowercase word that is spelled in upper case because of orthographic convention?

Spurious uppercase must be an upper case letter...

... followed by a lower case letter.

This is another instance of the "wordgraphs start at 0" assumption.

Ordinarily, this would be wrong, since the lexeme graph might start at some other time point. However, at this time lg->min may not be initialized, so we can't check it. Since spurious upper case only occurs in written text, and weird time points occur mainly in recognizer output for spoken text, I'm letting it pass here. Definition at line 1785 of file lexemgraph.c.

References FALSE, LexemGraphStruct::lattice, and TRUE.

Referenced by lgContains(), and lgNewIter().
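The two-character test and the "wordgraphs start at 0" assumption described above can be sketched as follows. spuriousUppercase() is a hypothetical name, the word and its start time point are passed directly instead of an Arc, and character classes come from isupper()/islower() in the current C locale, so multi-byte characters such as UTF-8 umlauts are not recognised.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* True if word looks like a lower-case word capitalised by convention:
   an upper-case letter followed by a lower-case letter, at time point 0. */
bool spuriousUppercase(const char *word, int from)
{
    assert(word != NULL);
    if (from != 0)
        return false;                 /* only the start of the lattice counts */
    /* Short-circuit: word[1] is only read if word[0] is a letter. */
    return isupper((unsigned char)word[0]) && islower((unsigned char)word[1]);
}
```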

Boolean lgSubset (List a, List b)

This function checks whether a is a subset of b.

Precondition:
Both a and b must be Lists of lexeme nodes spanning the same respective time interval. If this is not the case, the behaviour is undefined.
Definition at line 1173 of file lexemgraph.c.

References FALSE, LexemNode, lgMember(), lgSimultaneous(), NULL, and TRUE.

Boolean lgUpdateArcs (LexemGraph lg, Lattice lat, List listArcs)

updates the partial lexemgraph with the incoming arcs.

This function extends a lexeme graph by the Arc structures contained in listArcs. Definition at line 1476 of file lexemgraph.c.

References LexemGraphStruct::distance, LexemGraphStruct::lattice, lgNewFinal(), lgNewIter(), and NULL.

int lgWidth (LexemGraph lg)

Returns:
maximal ambiguity per time point
This function computes the maximal number of overlapping lexeme nodes for any time point in lg. Thus it gives an upper bound (not an estimate) on the average acoustic and lexical ambiguity of the graph. Definition at line 1362 of file lexemgraph.c.

References LexemNodeStruct::arc, LexemNode, LexemGraphStruct::max, and LexemGraphStruct::nodes.
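A brute-force sketch of the width computation: for every time step, count the lexeme nodes whose span covers it and keep the maximum. ToyArc and width() are hypothetical names, and the sketch assumes half-open spans [from, to), so a node covers step t when from <= t < to; every reading counts separately, so a word with three readings contributes three.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lexeme node reduced to its time span [from, to). */
typedef struct { int from, to; } ToyArc;

/* Maximal number of lexeme nodes covering any time step t in [min, max). */
int width(const ToyArc *arcs, size_t n, int min, int max)
{
    assert(min <= max);
    int best = 0;
    for (int t = min; t < max; t++) {
        int covering = 0;
        for (size_t i = 0; i < n; i++)
            if (arcs[i].from <= t && t < arcs[i].to)
                covering++;
        if (covering > best)
            best = covering;
    }
    return best;
}
```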


Variable Documentation

Boolean lgCompactLVs = TRUE

This variable controls whether level values should be deflated if they are equivalent. Usually we want this switched on; a value of FALSE is mainly appropriate for testing. Definition at line 48 of file lexemgraph.c.

Referenced by cmdStatus(), lgInitialize(), and lgPartitions().


CDG 0.95 (20 Oct 2004)