This article is part of a series on using the Understand API.


Many API scripts/programs rely on the entities and references stored in the Understand database, but sometimes you need to descend into the text of the file itself. Understand lets you do that with the lexer function and the lexeme class.


Lexeme – a chunk of text that means something to the parser: a string, a comment, a variable, etc.

Lexer – a stream of lexemes.


With Understand, we can walk through that stream of lexemes and query each one about its text, the entity or reference associated with it, what kind of token it is (Punctuation, Comment, Preprocessor, etc.), and what line it is on.
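As a rough sketch in Python (assuming the database is already open and contains a file named test.c; both the db variable and the file name are placeholders), that kind of query loop might look like this:


# Look up a file entity; "test.c" is a placeholder name
file = db.lookup("test.c", "file")[0]

# Walk the lexeme stream and ask each lexeme about itself
for lexeme in file.lexer():
	# text, token kind (Keyword, Identifier, Comment, ...), starting line,
	# and the entity associated with the lexeme (None if there is none)
	print(lexeme.text(), lexeme.token(), lexeme.line_begin(), lexeme.ent())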

So if you have a simple line like this:


int a=5;//radius

Its lexemes would have the following information:
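Roughly, and using Understand's standard token names (whitespace lexemes left out), the loop above would report something like:

int        Keyword
a          Identifier
=          Operator
5          Literal
;          Punctuation
//radius   Comment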



This plugin for the Understand GUI shows the lexical values for any file or entity.


Download: tokenizer.upl


To install it, drag the .upl file into the Understand GUI, then right-click on an entity and select Interactive Reports->Tokenizer.


For this sample line, the plugin would show:



An Example


The following scripts return the text of a file with all inactive code and comments removed and all macros expanded. They assume the Understand database has already been opened (see the templates at the end of the first tutorial).
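In a standalone Python script, opening the database typically looks like the snippet below (the project path is a placeholder):


import understand

# Open an existing Understand project (the .und path is a placeholder)
db = understand.open("myProject.und")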


Perl:


# Search for the first file named "test" and print
# the file name and the cleaned text
my ($ent) = $db->lookup("test","file");
print $ent->longname . "\n";
print fileCleanText($ent);

sub fileCleanText{
	my $file = shift;
	my $returnText;

	# Open the file lexer with inactive code removed
	# and macros expanded
	my $lexer = $file->lexer(0, 8, 0, 1);

	# Return undef if the lexer can't be opened
	return unless $lexer;

	# Go through all lexemes in the file and append the
	# text of non-comments to returnText
	foreach my $lexeme ($lexer->lexemes()){
		if ($lexeme->token ne "Comment"){
			$returnText .= $lexeme->text;
		}
	}
	return $returnText;
}

Python:


def fileCleanText(file):
	returnString = ""

	# Open the file lexer with macros expanded and inactive
	# code removed, then go through all lexemes in the file
	# and append the text of non-comments to returnString
	for lexeme in file.lexer(False, 8, False, True):
		if lexeme.token() != "Comment":
			returnString += lexeme.text()
	return returnString

# Search for the first file named "test" and print
# the file name and the cleaned text
file = db.lookup(".test.","file")[0]
print(file.longname())
print(fileCleanText(file))

Next Tutorial: Custom Graphs