This article is part of a series on using the Understand API.


Many API scripts/programs rely on the entities and references stored in the Understand database, but sometimes you need to descend into the text of the file itself. Understand lets you do that with the lexer function and the lexeme class.


Lexeme – a chunk of text that means something to the parser: a string, a comment, a variable, etc.

Lexer – a stream of lexemes.


With Understand, we can walk through that stream of lexemes and query each one about its text, the entity or reference associated with it, what kind of token it is (Punctuation, Comment, Preprocessor, etc.), and what line it is on.
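As a rough sketch in Python (assuming the database is already open and contains a file named test.c; both the db variable and the file name are placeholders), that kind of query loop might look like this:


# Look up a file entity; "test.c" is a placeholder name
file = db.lookup("test.c", "file")[0]

# Walk the lexeme stream and ask each lexeme about itself
for lexeme in file.lexer():
	# text, token kind (Keyword, Identifier, Comment, ...), starting line,
	# and the entity associated with the lexeme (None if there is none)
	print(lexeme.text(), lexeme.token(), lexeme.line_begin(), lexeme.ent())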

So if you have a simple line like this:


int a=5;//radius

Its lexemes would have the following information:
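Roughly, and using Understand's standard token names (whitespace lexemes left out), the loop above would report something like:

int        Keyword
a          Identifier
=          Operator
5          Literal
;          Punctuation
//radius   Comment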



This plugin for the Understand GUI shows the lexical values for any file or entity.


Download: tokenizer.upl


To install it, drag the .upl file into the Understand GUI, then right-click on an entity and select Interactive Reports->Tokenizer.


For this sample line, the plugin would show:



An Example


The following scripts return the text of a file with all inactive code and comments removed and all macros expanded. They assume the Understand database has already been opened (see the templates at the end of the first tutorial).
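In a standalone Python script, opening the database typically looks like the snippet below (the project path is a placeholder):


import understand

# Open an existing Understand project (the .und path is a placeholder)
db = understand.open("myProject.und")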


Perl:


# Search for the first file named "test" and print
# the file name and the cleaned text
my ($ent) = $db->lookup("test","file");
print $ent->longname . "\n";
print fileCleanText($ent);

sub fileCleanText{
	my $file = shift;
	my $returnText;

	# Open the file lexer with inactive code removed
	# and macros expanded
	my $lexer = $file->lexer(0, 8, 0, 1);

	# Return undef if the lexer can't be opened
	return unless $lexer;

	# Go through all lexemes in the file and append the
	# text of non-comments to returnText
	foreach my $lexeme ($lexer->lexemes()){
		if ($lexeme->token ne "Comment"){
			$returnText .= $lexeme->text;
		}
	}
	return $returnText;
}

Python:


def fileCleanText(file):
	returnString = ""

	# Open the file lexer with macros expanded and inactive
	# code removed, then go through all lexemes in the file
	# and append the text of non-comments to returnString
	for lexeme in file.lexer(False, 8, False, True):
		if lexeme.token() != "Comment":
			returnString += lexeme.text()
	return returnString

# Search for the first file named "test" and print
# the file name and the cleaned text
file = db.lookup(".test.","file")[0]
print(file.longname())
print(fileCleanText(file))

Next Tutorial: Custom Graphs