Tokenizes a stream.

#include <streamtokenizer.h>

Public Types
enum	ConsecutiveSeparatorRule { kRuleIgnoreSame, kRuleIgnoreAll, kRuleHeed }
	What to do when consecutive separators are found.

Public Member Functions
	StreamTokenizer (ConsecutiveSeparatorRule conSepRule=kRuleHeed)

void	addSeparator (uint32 c)
	Add a character on which to split tokens.

void	addChunkEnd (uint32 c)
	Add a character marking the end of a chunk.

void	addQuote (uint32 c)
	Add a character able to enclose (quote) separators and chunk ends.

void	addIgnore (uint32 c)
	Add a character to ignore.

UString	getToken (SeekableReadStream &stream)
	Parse a token out of the stream.

size_t	getTokens (SeekableReadStream &stream, std::vector< UString > &list, size_t min=0, size_t max=SIZE_MAX, const UString &def="")
	Parse tokens out of the stream.

void	findFirstToken (SeekableReadStream &stream)
	Find the first token character, skipping past separators.

void	skipToken (SeekableReadStream &stream, size_t n=1)
	Skip a number of tokens.

void	skipChunk (SeekableReadStream &stream)
	Skip to the end of the chunk.

void	nextChunk (SeekableReadStream &stream)
	Skip past end of chunk characters.

Private Member Functions
bool	isChunkEnd (SeekableReadStream &stream)

Static Private Member Functions
static bool	isIn (uint32 c, const std::list< uint32 > &list)

Private Attributes
ConsecutiveSeparatorRule	_conSepRule

std::list< uint32 >	_separators

std::list< uint32 >	_quotes

std::list< uint32 >	_chunkEnds

std::list< uint32 >	_ignores

Detailed Description

Tokenizes a stream.

Note: Currently only works with clean (non-extended) ASCII and UTF-8 streams.

Definition at line 42 of file streamtokenizer.h.

Member Enumeration Documentation

◆ ConsecutiveSeparatorRule

enum Common::StreamTokenizer::ConsecutiveSeparatorRule

What to do when consecutive separators are found.

Enumerator
kRuleIgnoreSame	Ignore the repeated separator, but only if it's the same.
kRuleIgnoreAll	Ignore all repeated separators.
kRuleHeed	Heed each separator.

Definition at line 45 of file streamtokenizer.h.
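
The three rules can be illustrated with a minimal, self-contained sketch. It works on a plain std::string rather than a stream, and the names SepRule and splitWithRule are hypothetical stand-ins, not part of StreamTokenizer. It assumes that "ignoring" a repeated separator means swallowing it without starting a new token.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for ConsecutiveSeparatorRule; not the real enum.
enum class SepRule { IgnoreSame, IgnoreAll, Heed };

// Split `in` on any character found in `seps`. Separators that directly
// follow another separator are handled according to `rule`.
std::vector<std::string> splitWithRule(const std::string &in,
                                       const std::string &seps,
                                       SepRule rule) {
    std::vector<std::string> tokens;
    std::string token;
    std::size_t i = 0;

    while (i < in.size()) {
        const char c = in[i++];
        if (seps.find(c) == std::string::npos) {
            token += c;
            continue;
        }

        // c is a separator: the current token ends here.
        tokens.push_back(token);
        token.clear();

        // Optionally swallow the separators that directly follow.
        while (i < in.size() && seps.find(in[i]) != std::string::npos) {
            if (rule == SepRule::Heed)
                break;
            if (rule == SepRule::IgnoreSame && in[i] != c)
                break;
            ++i;
        }
    }

    tokens.push_back(token);
    return tokens;
}
```

In this sketch, "a,,b" with the separator ',' gives three tokens (the middle one empty) under Heed and two tokens under IgnoreSame. With both ',' and ' ' as separators, "a, b" still gives three tokens under IgnoreSame, because the two separators differ, but only two under IgnoreAll.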

Constructor & Destructor Documentation

◆ StreamTokenizer()

Common::StreamTokenizer::StreamTokenizer ( ConsecutiveSeparatorRule conSepRule = kRuleHeed )

Definition at line 33 of file streamtokenizer.cpp.

Member Function Documentation

◆ addChunkEnd()

void Common::StreamTokenizer::addChunkEnd ( uint32 c )

Add a character marking the end of a chunk.

A chunk end is essentially a higher-order separator. Parsing tokens will stop at chunk end characters and will not move past them. Only a call to nextChunk() will move past a chunk end character.
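
The idea of a chunk end as a higher-order separator can be sketched on a plain string. The function splitChunks below is a hypothetical illustration, not the StreamTokenizer API: it groups tokens into chunks in one pass, whereas the real class leaves it to the caller to step over each chunk end with nextChunk().

```cpp
#include <string>
#include <vector>

// Split `in` into chunks at `chunkEnd`, and each chunk into tokens at `sep`.
// Every chunk starts with one (possibly empty) token.
std::vector<std::vector<std::string>> splitChunks(const std::string &in,
                                                  char sep, char chunkEnd) {
    std::vector<std::vector<std::string>> chunks;
    chunks.push_back(std::vector<std::string>(1));

    for (char c : in) {
        if (c == chunkEnd)
            chunks.push_back(std::vector<std::string>(1)); // next chunk
        else if (c == sep)
            chunks.back().emplace_back();                  // next token
        else
            chunks.back().back() += c;                     // grow the token
    }

    return chunks;
}
```

With ' ' as the separator and '\n' as the chunk end, the text "a b\nc d" yields two chunks of two tokens each, which is how a line-oriented file can be read row by row.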

Definition at line 56 of file streamtokenizer.cpp.

References _chunkEnds, _ignores, _quotes, _separators, and isIn().

Referenced by Aurora::VISFile::load(), Sound::XACTWaveBank_ASCII::load(), Aurora::LYTFile::load(), Graphics::Aurora::Model_NWN::ParserContext::ParserContext(), and Aurora::TwoDAFile::read2a().

◆ addIgnore()

void Common::StreamTokenizer::addIgnore ( uint32 c )

Add a character to ignore.

A character that is ignored will never be added to the token. For example, with the ignore character '#' and the separator character ',', the string "fo#o,#bar" will be split into two tokens: "foo" and "bar".
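
The effect of an ignore character can be reproduced with a short sketch on a plain string. The function splitIgnoring is a hypothetical helper for illustration only and handles a single separator and a single ignore character.

```cpp
#include <string>
#include <vector>

// Split `in` at `sep`, dropping every occurrence of `ignore`.
std::vector<std::string> splitIgnoring(const std::string &in,
                                       char sep, char ignore) {
    std::vector<std::string> tokens(1);

    for (char c : in) {
        if (c == ignore)
            continue;              // never becomes part of a token
        if (c == sep)
            tokens.emplace_back(); // start the next token
        else
            tokens.back() += c;
    }

    return tokens;
}
```

Calling splitIgnoring("fo#o,#bar", ',', '#') in this sketch produces the tokens "foo" and "bar".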

Definition at line 62 of file streamtokenizer.cpp.

References _chunkEnds, _ignores, _quotes, _separators, and isIn().

Referenced by Aurora::VISFile::load(), Sound::XACTWaveBank_ASCII::load(), Aurora::LYTFile::load(), Graphics::Aurora::Model_NWN::ParserContext::ParserContext(), and Aurora::TwoDAFile::read2a().

◆ addQuote()

void Common::StreamTokenizer::addQuote ( uint32 c )

Add a character able to enclose (quote) separators and chunk ends.

For example, with the quote character '\'' and separator character ',', the string "foo\',\'bar,foo" will be split into two tokens: "foo,bar" and "foo".

Every quote character is handled as if it's the same! So with the quote characters '\'' and '"', the string "foo\',\"bar,foo" will also yield the two tokens "foo,bar" and "foo".
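
A minimal sketch of this quoting behaviour on a plain string follows. The function splitQuoted is hypothetical, and it assumes that quote characters only toggle a "quoted" state and are not copied into the token.

```cpp
#include <string>
#include <vector>

// Split `in` at `sep`, except where the separator is enclosed by any of the
// characters in `quotes`. All quote characters are treated as the same.
std::vector<std::string> splitQuoted(const std::string &in, char sep,
                                     const std::string &quotes) {
    std::vector<std::string> tokens(1);
    bool inQuote = false;

    for (char c : in) {
        if (quotes.find(c) != std::string::npos) {
            inQuote = !inQuote;    // any quote character opens or closes
            continue;
        }
        if (c == sep && !inQuote)
            tokens.emplace_back(); // unquoted separator: next token
        else
            tokens.back() += c;
    }

    return tokens;
}
```

In this sketch the first comma of foo','bar,foo is quoted and stays in the token, so the result is "foo,bar" followed by "foo". Opening with ' and closing with " gives the same result, since the two quote characters are interchangeable.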

Definition at line 50 of file streamtokenizer.cpp.

References _chunkEnds, _ignores, _quotes, _separators, and isIn().

Referenced by Aurora::TwoDAFile::read2a().

◆ addSeparator()

void Common::StreamTokenizer::addSeparator ( uint32 c )

Add a character on which to split tokens.

For example, with the separator character ',', the string "foo,bar" will be split into two tokens: "foo" and "bar".

Several different characters can act as separator characters at the same time.

The ConsecutiveSeparatorRule value signals how consecutive separator characters are handled.

Definition at line 44 of file streamtokenizer.cpp.

References _chunkEnds, _ignores, _quotes, _separators, and isIn().

Referenced by Aurora::VISFile::load(), Sound::XACTWaveBank_ASCII::load(), Aurora::LYTFile::load(), Graphics::Aurora::Model_NWN::ParserContext::ParserContext(), Aurora::TwoDAFile::read2a(), Aurora::TwoDAFile::readHeaders2b(), Aurora::TwoDAFile::readRows2b(), and Aurora::TwoDAFile::skipRowNames2b().

◆ findFirstToken()

void Common::StreamTokenizer::findFirstToken ( SeekableReadStream & stream )

Find the first token character, skipping past separators.

Position the stream at the first character that is neither a separator nor an ignored character. This is useful if the first token of a chunk might be indented with separator characters.
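
The positioning rule can be sketched for an in-memory string. The function firstTokenPos is a hypothetical helper, not part of the class; it returns an index instead of seeking a stream, and returns the string length when no token character exists.

```cpp
#include <cstddef>
#include <string>

// Index of the first character that is neither a separator nor ignored,
// or in.size() if there is no such character.
std::size_t firstTokenPos(const std::string &in, const std::string &seps,
                          const std::string &ignores) {
    const std::size_t pos = in.find_first_not_of(seps + ignores);
    return pos == std::string::npos ? in.size() : pos;
}
```

For a line indented with two spaces and a tab, with space and tab as separators, the sketch returns index 3, the position of the first token character.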

Definition at line 213 of file streamtokenizer.cpp.

References _ignores, _separators, isIn(), Common::ReadStream::kEOF, Common::SeekableReadStream::kOriginCurrent, Common::ReadStream::readChar(), and Common::SeekableReadStream::seek().

Referenced by Aurora::TwoDAFile::readRows2a().

◆ getToken()

UString Common::StreamTokenizer::getToken ( SeekableReadStream & stream )

Parse a token out of the stream.

Go through the stream, character by character, collecting characters for a token. Collection will stop on any of these conditions:

We reached the end of the stream
We reached a separator character
We reached a chunk end character

When we find a separator character, the stream will be positioned after this character (potentially skipping over following separators depending on the ConsecutiveSeparatorRule value).

When we find a chunk end character, the stream will be positioned before this character. Only a call to nextChunk() will move the stream past it.
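
The two positioning rules can be made concrete with a small sketch. Cursor and readToken are hypothetical stand-ins for SeekableReadStream and getToken(); the sketch handles one separator and one chunk end character and leaves out quotes, ignores, and the consecutive separator rules.

```cpp
#include <cstddef>
#include <string>

// Hypothetical in-memory replacement for a seekable stream.
struct Cursor {
    std::string data;
    std::size_t pos;
};

// Collect one token, stopping at the end of the data, at a separator,
// or at a chunk end character.
std::string readToken(Cursor &cur, char sep, char chunkEnd) {
    std::string token;

    while (cur.pos < cur.data.size()) {
        const char c = cur.data[cur.pos];
        if (c == chunkEnd)
            break;      // cursor stays before the chunk end
        ++cur.pos;
        if (c == sep)
            break;      // cursor now sits after the separator
        token += c;
    }

    return token;
}
```

Reading from "ab,cd\nef" with ',' as separator and '\n' as chunk end, the first call returns "ab" and leaves the cursor at index 3, just past the comma. The second call returns "cd" and leaves the cursor at index 5, on the newline. Any further call returns an empty token without moving, which mirrors why nextChunk() is needed to proceed.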

Definition at line 68 of file streamtokenizer.cpp.

References _chunkEnds, _conSepRule, _ignores, _quotes, _separators, Common::UString::end(), Common::UString::findFirst(), isIn(), Common::ReadStream::kEOF, Common::SeekableReadStream::kOriginCurrent, kRuleHeed, kRuleIgnoreSame, Common::ReadStream::readChar(), Common::SeekableReadStream::seek(), and Common::UString::truncate().

Referenced by getTokens(), Aurora::TwoDAFile::readHeaders2b(), Aurora::TwoDAFile::readRows2b(), and skipToken().

◆ getTokens()

size_t Common::StreamTokenizer::getTokens ( SeekableReadStream & stream, std::vector< UString > & list, size_t min = 0, size_t max = SIZE_MAX, const UString & def = "" )

Parse tokens out of the stream.

This method calls getToken() repeatedly and collects all tokens into a list.

Parameters

stream	The stream to parse out of.
list	The list to parse into.
min	Minimum number of tokens to parse.
max	Maximum number of tokens to parse.
def	Non-existing tokens are assigned this value.
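
How min, max, and def might interact can be sketched as follows. The function collectTokens is hypothetical and takes already-parsed tokens instead of a stream. It assumes that parsing stops after max tokens, that the list is padded with def up to min entries, and that the return value counts only tokens that actually existed; the return value in particular is an assumption, since it is not described here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Copy at most `max` tokens from `parsed` into `list`, then pad `list`
// with `def` until it holds at least `min` entries.
// Returns the number of tokens taken from `parsed` (assumed semantics).
std::size_t collectTokens(const std::vector<std::string> &parsed,
                          std::vector<std::string> &list,
                          std::size_t min, std::size_t max,
                          const std::string &def) {
    list.clear();

    std::size_t real = 0;
    while (real < parsed.size() && real < max)
        list.push_back(parsed[real++]);

    while (list.size() < min)
        list.push_back(def);

    return real;
}
```

With two parsed tokens, min = 3 and def = "-", the sketch fills the list with the two tokens plus one "-" and returns 2. With three parsed tokens and max = 2, only the first two are collected.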