Class StreamTokenizer in Java

The StreamTokenizer class takes an input stream and parses it into ‘tokens’, allowing the tokens to be read one at a time. The parsing process is controlled by a syntax table and a number of flags that can be set to various states.

The StreamTokenizer can recognize identifiers, numbers, quoted strings, and various comment styles.

Each byte read from the input stream is regarded as a character in the range ‘\u0000’ through ‘\u00FF’. The character value is used to look up five possible attributes of the character.

This class performs a lexical analysis of a specified input stream and breaks the input up into tokens. It can be extremely useful when writing simple parsers. Although StreamTokenizer is not a general-purpose parser, it recognizes tokens that are similar to those used in the Java language. A StreamTokenizer recognizes identifiers, numbers, quoted strings, and various comment styles.

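A StreamTokenizer object can be wrapped around an InputStream. In this case, when the StreamTokenizer reads bytes from the stream, the bytes are converted to Unicode characters by simply zero-extending the byte values to 16 bits, so characters that are encoded in more than one byte (for example, in UTF-8) are not decoded correctly. A StreamTokenizer can be wrapped around a Reader to eliminate this problem, because the Reader performs the byte-to-character decoding.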
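To see the difference, here is a minimal sketch that tokenizes the UTF-8 bytes of "café" both ways. The class and method names (ReaderVsStreamDemo, firstWordViaReader, firstWordViaStream) are made up for this demo. Because ‘é’ lies in the default alphabetic range, the Reader-based tokenizer returns the word intact, while the InputStream-based one zero-extends the two UTF-8 bytes of ‘é’ into two unrelated characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ReaderVsStreamDemo {
    // Wraps the bytes in a Reader, which decodes them as UTF-8 before tokenizing.
    static String firstWordViaReader(byte[] utf8) {
        StreamTokenizer st = new StreamTokenizer(
                new InputStreamReader(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8));
        return firstWord(st);
    }

    // Uses the deprecated InputStream constructor, which zero-extends each byte to a char.
    @SuppressWarnings("deprecation")
    static String firstWordViaStream(byte[] utf8) {
        StreamTokenizer st = new StreamTokenizer(new ByteArrayInputStream(utf8));
        return firstWord(st);
    }

    // Reads one token and returns sval, which holds the text of a word token.
    private static String firstWord(StreamTokenizer st) {
        try {
            st.nextToken();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return st.sval;
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9 au lait".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstWordViaReader(utf8)); // the four-letter word, decoded correctly
        System.out.println(firstWordViaStream(utf8)); // "caf" plus two stray Latin-1 characters
    }
}
```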

nextToken() returns the next token in the stream. This is either one of the constants defined by the class (representing end-of-file, end-of-line, a parsed floating-point number, or a parsed word) or a character value.

pushBack() “pushes” the token back onto the stream, so that it is returned by the next call to nextToken(). The public variables sval and nval contain the string and numeric values (if applicable) of the most recently read token: sval is set when the token is TT_WORD (or a quoted string), and nval is set when the token is TT_NUMBER. lineno() returns the current line number. The remaining methods specify how tokens are recognized.

By default, a StreamTokenizer recognizes the following:

  • Whitespace characters between ‘\u0000’ and ‘\u0020’
  • Alphabetic characters from ‘a’ through ‘z’, ‘A’ through ‘Z’, and ‘\u00A0’ through ‘\u00FF’
  • Numeric characters ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’, ‘9’, ‘0’, ‘.’, and ‘-’
  • String quote characters: the single quote (') and the double quote (")
  • Comment character “/”
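Here is a small sketch of these defaults in action; DefaultSyntaxDemo and describe() are hypothetical names. It labels each token returned by a tokenizer that has not been configured at all, so the word, number, quote, ordinary-character and comment rules above all show up in one line of input.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class DefaultSyntaxDemo {
    // Labels every token produced by a tokenizer left in its default state.
    static List<String> describe(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD:" + st.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER:" + st.nval);
                        break;
                    case '\'':
                    case '"':
                        // For quoted strings, ttype is the quote character and sval is the body.
                        out.add("QUOTED:" + st.sval);
                        break;
                    default:
                        // Any other character comes back on its own as an ordinary token.
                        out.add("CHAR:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Everything from '/' to the end of the line is a comment by default.
        System.out.println(describe("total = 42.5 + x1 'quoted text' / comment ignored"));
        // [WORD:total, CHAR:=, NUMBER:42.5, CHAR:+, WORD:x1, QUOTED:quoted text]
    }
}
```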

wordChars() specifies a range of characters that should be treated as parts of words.

whitespaceChars() specifies a range of characters that serve to delimit tokens.

ordinaryChars() and ordinaryChar() specify characters that are never part of multi-character tokens; each such character is returned as-is, as a token on its own.

resetSyntax() makes all characters “ordinary.”

eolIsSignificant() specifies whether end-of-line is significant. If so, the TT_EOL constant is returned for end-of-lines. Otherwise, they are treated as whitespace.

commentChar() specifies a character that begins a comment that lasts until the end of the line. No characters in the comment are returned.

slashStarComments() and slashSlashComments() specify whether the StreamTokenizer should recognize C-style (/* … */) and C++-style (//) comments, respectively. If so, no parts of the comments are returned as tokens.

quoteChar() specifies a character used to delimit strings. When a string token is parsed, the quote character is returned as the token value, and the body of the string is stored in the sval variable.

lowerCaseMode() specifies whether TT_WORD tokens should be converted to all lowercase characters before being stored in sval.

parseNumbers() specifies that the StreamTokenizer should recognize and return double-precision floating-point number tokens.

The five possible attributes are:

  • Whitespace
  • Alphabetic
  • String quote
  • Comment character
  • Numeric

In addition, an instance has four flags. They are as follows:

  1. Whether line terminators are to be returned as tokens or treated as white space.
  2. Whether C-style comments are to be recognized and skipped.
  3. Whether C++-style comments are to be recognized and skipped.
  4. Whether the characters of identifiers are converted to lowercase.

A typical application first constructs an instance of this class, sets up the syntax tables and repeatedly loops calling the nextToken() method in each iteration of the loop until it returns the value TT_EOF.
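The following sketch follows that pattern: construct the tokenizer, adjust the syntax table, then loop on nextToken() until TT_EOF. The input format, the summarize() helper and the choice to make ‘_’ a word character are assumptions made only for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TypicalLoopDemo {
    // Records each word with its line number and adds up every number token.
    static String summarize(String text) {
        // 1. Construct the tokenizer around a Reader.
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        // 2. Adjust the syntax table: '_' becomes part of words (an example choice).
        st.wordChars('_', '_');
        List<String> words = new ArrayList<>();
        double sum = 0;
        try {
            // 3. Call nextToken() until it returns TT_EOF.
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    words.add(st.sval + "@" + st.lineno());
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    sum += st.nval;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return "words=" + words + " sum=" + sum;
    }

    public static void main(String[] args) {
        System.out.println(summarize("apples 3\noranges 4.5\nitem_count -2"));
        // words=[apples@1, oranges@2, item_count@3] sum=5.5
    }
}
```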

The structure of the class StreamTokenizer is given as follows:


public class java.io.StreamTokenizer extends java.lang.Object {
// member fields
public double nval; // If the current token is a number, this field contains its value and ttype is TT_NUMBER.
public String sval; // If the current token is a word, this field contains its characters and ttype is TT_WORD.
                    // If the current token is a quoted string, this field contains the body of the string.
public int ttype;   // After a call to nextToken(), this field contains the type of the token just read:
                    // for a single-character token, the character converted to an int;
                    // for a quoted string token, the quote character;
                    // otherwise, one of the following:
// TT_WORD   - the token is a word
// TT_NUMBER - the token is a number
// TT_EOL    - the end of a line has been read; only returned if eolIsSignificant(true) has been called
// TT_EOF    - the end of the input stream has been reached
public static final int TT_EOF;    // value -1
public static final int TT_EOL;    // value '\n'
public static final int TT_NUMBER; // value -2
public static final int TT_WORD;   // value -3
// constructors
@Deprecated
public StreamTokenizer(InputStream in); // Creates a StreamTokenizer that parses the given byte stream.
// Both constructors initialize the tokenizer to the following default state:
// 'A' through 'Z', 'a' through 'z', and '\u00A0' through '\u00FF' are alphabetic;
// '\u0000' through '\u0020' are whitespace;
// '/' is a comment character;
// single quote (') and double quote (") are string quote characters;
// numbers are parsed;
// ends of lines are treated as whitespace, not as separate tokens;
// C-style and C++-style comments are not recognized.
public StreamTokenizer(Reader r);
// methods
public void commentChar(int ch);
public void eolIsSignificant(boolean flag);
public void ordinaryChar(int ch);
public void ordinaryChars(int low, int hi);
public void parseNumbers();
public void pushBack();
public void quoteChar(int ch);
public void resetSyntax();
public void lowerCaseMode(boolean fl);
public void slashStarComments(boolean flag);
public void slashSlashComments(boolean flag);
public void whitespaceChars(int low, int hi);
public void wordChars(int low, int hi);
public int lineno();
public int nextToken() throws IOException;
public String toString();
}
 

The details of the class structure are given as follows:

public double nval;

public double nval represents a variable that contains the value of a TT_NUMBER token.

public String sval;

public String sval represents a variable that contains the value of a TT_WORD token. For a quoted-string token, it contains the body of the string.

public int ttype;

public int ttype represents a variable that indicates the token type. The value is either one of the TT_ constants defined below or the character that has just been parsed from the input stream.

public final static int TT_EOF;

public final static int TT_EOF is a token type indicating that the end of the stream has been reached.

public final static int TT_EOL;

public final static int TT_EOL is a token type indicating that the end of a line has been reached. This value is not returned by nextToken() unless eolIsSignificant(true) has been called.

public final static int TT_NUMBER;

public final static int TT_NUMBER is a token type indicating that a number has been parsed. The number is placed in nval.

public final static int TT_WORD;

public final static int TT_WORD is a token type indicating that a word has been parsed. The word is placed in sval.

public StreamTokenizer(InputStream in);

public StreamTokenizer(InputStream in) constructor creates a StreamTokenizer that reads from the given InputStream. This constructor is deprecated; StreamTokenizer(Reader) should be used instead.

Parameter
in – the input stream to tokenize.

public StreamTokenizer(Reader in); 

public StreamTokenizer(Reader in) constructor creates a StreamTokenizer that reads from the given Reader.

Parameter
in – the Reader (character stream) to tokenize.

public void commentChar(int ch);

public void commentChar(int ch) method tells this StreamTokenizer to treat the given character as the beginning of a comment that ends at the end of the line. The StreamTokenizer ignores all of the characters from the comment character to the end of the line. By default, a StreamTokenizer treats the ‘/’ character as a comment character. This method may be called multiple times if there are multiple characters that begin comment lines.

To specify that a character is not a comment character, use ordinaryChar().

Parameter
ch – The character to use to indicate comments.
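As a sketch, the following adds ‘#’ as a second comment character (CommentCharDemo and words() are made-up names). Since commentChar() only changes the entry for the given character, the default ‘/’ comment character keeps working.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CommentCharDemo {
    // Returns the word tokens that remain after comments are skipped.
    static List<String> words(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.commentChar('#'); // '#' now starts a comment, in addition to the default '/'
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("alpha # beta\ngamma / delta\nepsilon")); // [alpha, gamma, epsilon]
    }
}
```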

public void eolIsSignificant(boolean flag);

public void eolIsSignificant(boolean flag) method tells the StreamTokenizer whether to return TT_EOL tokens; to enable them, call eolIsSignificant(true). A StreamTokenizer recognizes “\n”, “\r”, and “\r\n” as the end of a line. By default, end-of-line characters are treated as whitespace, and thus the StreamTokenizer does not return TT_EOL tokens from nextToken().

Parameter
flag – A boolean value that specifies whether or not this StreamTokenizer returns TT_EOL
tokens.
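Here is a minimal sketch that uses TT_EOL tokens to group words by line; EolDemo and wordsPerLine() are illustrative names. Note that the “\r\n” pair produces a single TT_EOL token, just like “\n”.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class EolDemo {
    // Groups the word tokens of each input line into a separate list.
    static List<List<String>> wordsPerLine(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.eolIsSignificant(true); // nextToken() now returns TT_EOL at each line end
        List<List<String>> lines = new ArrayList<>();
        List<String> current = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_EOL) {
                    lines.add(current);
                    current = new ArrayList<>();
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    current.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!current.isEmpty()) {
            lines.add(current); // the last line had no terminator
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(wordsPerLine("a b\r\nc\nd e")); // [[a, b], [c], [d, e]]
    }
}
```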

public int lineno();

public int lineno() method returns the current line number. Line numbers begin at 1.

public void lowerCaseMode(boolean flag);

public void lowerCaseMode(boolean flag) method determines whether the value (sval) of a TT_WORD token returned by nextToken() is converted to lowercase.

By default, a StreamTokenizer does not change the case of the words that it parses. However, if we call lowerCaseMode(true), whenever nextToken() returns a TT_WORD token, the word in sval is converted to lowercase.

Parameter
flag – A boolean value that specifies whether or not this StreamTokenizer returns TT_WORD
tokens in lowercase.
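A short sketch follows (LowerCaseDemo and values() are hypothetical names). It shows that only word tokens are affected; the body of a quoted string keeps its original case.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LowerCaseDemo {
    // Returns the text of each word and single-quoted string token.
    static List<String> values(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.lowerCaseMode(true); // words are lowercased before being stored in sval
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD || st.ttype == '\'') {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Quoted strings are not words, so 'KeepCase' is left unchanged.
        System.out.println(values("SELECT Name FROM Users 'KeepCase'"));
        // [select, name, from, users, KeepCase]
    }
}
```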

public int nextToken() throws IOException;

public int nextToken() method reads the next token from the stream. The value returned is the same as the value of the variable ttype. The nextToken() method parses the following tokens:

  • TT_EOF – The end of the input stream has been reached.
  • TT_EOL – The end of a line has been reached. The eolIsSignificant() method controls whether end-of-line characters are treated as whitespace or returned as TT_EOL tokens.
  • TT_NUMBER – A number has been parsed. The value can be found in the variable nval. The parseNumbers() method tells the StreamTokenizer to recognize numbers distinct from words.
  • TT_WORD – A word has been parsed. The word can be found in the variable sval.
  • Quoted string – A quoted string has been parsed. The variable ttype is set to the quote character, and sval contains the string itself. We can tell the StreamTokenizer what characters to use as quote characters using quoteChar().
  • Character – A single character has been parsed. The variable ttype is set to the character value.

This method returns one of the token types (TT_EOF, TT_EOL, TT_NUMBER, or TT_WORD) or a character code.

public void ordinaryChar(int ch);

public void ordinaryChar(int ch) method causes this StreamTokenizer to treat the given character as an ordinary character. This means that the character has no special significance as a comment, string quote, alphabetic, numeric, or whitespace character.

For example, to tell the StreamTokenizer that the slash does not start a single-line comment, use ordinaryChar(‘/’).

Parameter
ch – The character to treat normally.
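The following sketch runs the same path-like input with and without ordinaryChar('/'); OrdinaryCharDemo and tokens() are illustrative names. With the default syntax, the first ‘/’ starts a comment and the rest of the line is lost.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class OrdinaryCharDemo {
    // Returns words as-is, numbers via nval, and ordinary characters as one-char strings.
    static List<String> tokens(String text, boolean slashIsOrdinary) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        if (slashIsOrdinary) {
            st.ordinaryChar('/'); // '/' no longer starts a comment
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("usr/local/bin", false)); // [usr]
        System.out.println(tokens("usr/local/bin", true));  // [usr, /, local, /, bin]
    }
}
```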

public void ordinaryChars(int low, int hi);

public void ordinaryChars(int low, int hi) method tells this StreamTokenizer to treat all of the characters in the given range as ordinary characters.

Parameter
low – The beginning of a range of character values.
hi – The end of a range of character values.

public void parseNumbers();

public void parseNumbers() method tells this StreamTokenizer to recognize numbers. The StreamTokenizer constructor calls this method, so the default behavior of a StreamTokenizer is to recognize numbers. This method modifies the syntax table of the StreamTokenizer so that the following characters have the numeric attribute: ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’, ‘9’, ‘0’, ‘.’, and ‘-’. When the parser encounters a token that has the format of a double-precision floating-point number, the token is treated as a number rather than a word. The ttype variable is set to TT_NUMBER, and nval is set to the value of the number.

To use a StreamTokenizer that does not parse numbers, we need to make the above characters ordinary using ordinaryChar() or ordinaryChars().
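A sketch of that approach follows (NumberParsingDemo and tokens() are hypothetical names). It first clears the numeric attribute with ordinaryChars() and ordinaryChar(), then marks the same characters as word characters, so digit runs come back as TT_WORD strings instead of doubles. The order matters: wordChars() only adds the alphabetic attribute, so calling it alone would leave number parsing switched on.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class NumberParsingDemo {
    // Labels each token; optionally turns off number parsing first.
    static List<String> tokens(String text, boolean disableNumbers) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        if (disableNumbers) {
            // Clear the numeric attribute set by the constructor's parseNumbers() call...
            st.ordinaryChars('0', '9');
            st.ordinaryChar('.');
            st.ordinaryChar('-');
            // ...then make the same characters part of words.
            st.wordChars('0', '9');
            st.wordChars('.', '.');
            st.wordChars('-', '-');
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add("WORD:" + st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("NUMBER:" + st.nval);
                } else {
                    out.add("CHAR:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("room 101 -5 2.5", false)); // [WORD:room, NUMBER:101.0, NUMBER:-5.0, NUMBER:2.5]
        System.out.println(tokens("room 101 -5 2.5", true));  // [WORD:room, WORD:101, WORD:-5, WORD:2.5]
    }
}
```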

public void pushBack();

public void pushBack() method has the effect of pushing the current token back onto the stream. In other words, after a call to this method, the next call to the nextToken() method returns the same result as the previous call to nextToken(), without reading any input.
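pushBack() is handy for one-token lookahead. The sketch below parses a made-up syntax in which each entry is either a bare name or name = value; when the token after a name turns out not to be ‘=’, it is pushed back so that the next loop iteration reads it as the next name. PushBackDemo and parse() are illustrative names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class PushBackDemo {
    // Parses whitespace-separated entries of the form "name" or "name = value".
    static List<String> parse(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() == StreamTokenizer.TT_WORD) {
                String name = st.sval;
                if (st.nextToken() == '=') {
                    st.nextToken(); // read the value token
                    String value = st.ttype == StreamTokenizer.TT_NUMBER
                            ? String.valueOf(st.nval) : st.sval;
                    out.add(name + "=" + value);
                } else {
                    // We looked one token too far: hand it back for the next iteration.
                    st.pushBack();
                    out.add(name);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("a = 1 b c = 3")); // [a=1.0, b, c=3.0]
    }
}
```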

public void quoteChar(int ch);

public void quoteChar(int ch) method tells this StreamTokenizer to treat the given character as the beginning or end of a quoted string. By default, the single-quote character and the double-quote character are string-quote characters. When the parser encounters a string-quote character, the ttype variable is set to the quote character, and sval is set to the actual string.

The string consists of all the characters after (but not including) the string-quote character up to (but not including) the next occurrence of the same string-quote character, a line terminator, or the end of the stream.

To specify that a character is not a string-quote character, we need to use ordinaryChar().

Parameter
ch – The character to use as a delimiter for quoted strings.
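As a sketch, the following makes ‘|’ an extra quote character (QuoteCharDemo and tokens() are illustrative names). For each quoted string, ttype holds the quote character that was used and sval holds the text between the quotes.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class QuoteCharDemo {
    // Labels each token; quoted strings are prefixed with their quote character.
    static List<String> tokens(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.quoteChar('|'); // '|' now delimits strings, alongside the default ' and "
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                int t = st.ttype;
                if (t == StreamTokenizer.TT_WORD) {
                    out.add("WORD:" + st.sval);
                } else if (t == StreamTokenizer.TT_NUMBER) {
                    out.add("NUMBER:" + st.nval);
                } else if (t == '|' || t == '\'' || t == '"') {
                    // ttype is the quote character; sval is the text between the quotes.
                    out.add((char) t + ":" + st.sval);
                } else {
                    out.add("CHAR:" + (char) t);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("say |hello world| 'x'")); // [WORD:say, |:hello world, ':x]
    }
}
```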

public void resetSyntax();

public void resetSyntax() method resets the syntax table of this StreamTokenizer, which causes it to treat all characters as ordinary characters.
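resetSyntax() is useful when we want to build a syntax table from scratch. In this sketch (ResetSyntaxDemo and fields() are hypothetical names), we start from an all-ordinary table and define a simple comma-separated format in which every other printable character belongs to a field, so numbers are not parsed and ‘/’ is not a comment.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ResetSyntaxDemo {
    // Splits text into comma- or whitespace-separated fields, kept as raw strings.
    static List<String> fields(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();             // all characters ordinary: no numbers, quotes or comments
        st.wordChars(33, 255);        // '!' through the end of Latin-1 form words
        st.whitespaceChars(0, ' ');   // control characters and space separate tokens
        st.whitespaceChars(',', ','); // so does the comma (this replaces its word setting)
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("3.14,-2, abc/def")); // [3.14, -2, abc/def]
    }
}
```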

public void slashSlashComments(boolean flag);

public void slashSlashComments(boolean flag) method controls whether double-slash (//) comments are recognized. After slashSlashComments(true) has been called, nextToken() recognizes and skips them. By default, a StreamTokenizer does not recognize double-slash comments.

Parameter
flag – A boolean value that specifies whether or not this StreamTokenizer recognizes double-slash comments (//).

public void slashStarComments(boolean flag);

public void slashStarComments(boolean flag) method controls whether slash-star (/* … */) comments are recognized. After slashStarComments(true) has been called, nextToken() recognizes and skips them. By default, a StreamTokenizer does not recognize slash-star comments.

Parameter
flag – A boolean value that specifies whether or not this StreamTokenizer recognizes slash-star (/* … */) comments.
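The sketch below enables both comment styles for a small expression-like input (SlashCommentsDemo and tokens() are illustrative names). It also calls ordinaryChar('/'): in the default syntax ‘/’ is itself a comment character, so without that call a lone ‘/’ (a division sign, say) would still start a single-line comment.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SlashCommentsDemo {
    // Labels each token of an expression that may contain // and /* */ comments.
    static List<String> tokens(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.ordinaryChar('/');        // a lone '/' is now an ordinary token
        st.slashSlashComments(true); // skip // comments
        st.slashStarComments(true);  // skip /* */ comments
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add("WORD:" + st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("NUMBER:" + st.nval);
                } else {
                    out.add("CHAR:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("x / y // line comment\n/* block comment */ z"));
        // [WORD:x, CHAR:/, WORD:y, WORD:z]
    }
}
```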

public String toString();

public String toString() method returns a string representation of the current token recognized by the nextToken() method. This string representation consists of a description of the token (the value of sval if the token is a word, the value of nval if the token is a number, or otherwise the token type or character) and the current line number.

This method returns a String representation of the current token.

public void whitespaceChars(int low, int hi);

public void whitespaceChars(int low, int hi) method causes this StreamTokenizer to treat characters in the specified range as whitespace. The only function of whitespace characters is to separate tokens in the stream.

Parameter
low – The beginning of a range of character values.
hi – The end of a range of character values.
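For example, making the comma a whitespace character is enough to read a simple comma-separated list of numbers. In this sketch (WhitespaceCharsDemo and tokens() are hypothetical names), the same input is tokenized with and without that change.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class WhitespaceCharsDemo {
    // Returns numbers via nval, words as-is, and ordinary characters as one-char strings.
    static List<String> tokens(String text, boolean commaIsWhitespace) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        if (commaIsWhitespace) {
            st.whitespaceChars(',', ','); // commas now only separate tokens
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("1, 2,3", false)); // the commas come back as separate tokens
        System.out.println(tokens("1, 2,3", true));  // [1.0, 2.0, 3.0]
    }
}
```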

public void wordChars(int low, int hi);

public void wordChars(int low, int hi) method causes this StreamTokenizer to treat characters in the specified range as characters that are part of a word token, or, in other words, consider the characters to be alphabetic. A word token consists of a sequence of characters that begins with an alphabetic character and is followed by zero or more numeric or alphabetic characters.

Parameter
low – The beginning of a range of character values.
hi – The end of a range of character values.
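For instance, ‘_’ is an ordinary character by default, so an identifier such as max_value is split into three tokens. This sketch (WordCharsDemo and tokens() are illustrative names) shows the effect of adding ‘_’ with wordChars().

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class WordCharsDemo {
    // Returns words as-is, numbers via nval, and ordinary characters as one-char strings.
    static List<String> tokens(String text, boolean underscoreInWords) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        if (underscoreInWords) {
            st.wordChars('_', '_'); // '_' can now appear inside a word token
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("max_value", false)); // [max, _, value]
        System.out.println(tokens("max_value", true));  // [max_value]
    }
}
```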

Apart from these, the StreamTokenizer class also inherits the following methods from class Object:

  • clone()
  • finalize()
  • hashCode()
  • notifyAll()
  • wait()
  • wait(long, int)
  • equals(Object)
  • getClass()
  • notify()
  • wait(long)




Animesh Chatterjee (https://techtravelhub.com/)
I am the founder and owner of the blog TechTravelHub.com, and I always love to share knowledge on test automation, tools, techniques and tips. I am a passionate coder of Java and VBScript. I also publish articles on travel ideas and great honeymoon destinations. Apart from these, I am a gear-head and love to drive across India. I have shared lots of articles here on how to travel to several parts of India. Customization of cars, aka car modification, is another hobby of mine. Get in touch with me on ani01104@gamil.com
