xoreos  0.0.5
Common::StreamTokenizer Class Reference

Tokenizes a stream.

#include <streamtokenizer.h>

Public Types

enum  ConsecutiveSeparatorRule { kRuleIgnoreSame, kRuleIgnoreAll, kRuleHeed }
 What to do when consecutive separators are found.
 

Public Member Functions

 StreamTokenizer (ConsecutiveSeparatorRule conSepRule=kRuleHeed)
 
void addSeparator (uint32 c)
 Add a character at which to split tokens.
 
void addChunkEnd (uint32 c)
 Add a character marking the end of a chunk.
 
void addQuote (uint32 c)
 Add a character able to enclose (quote) separators and chunk ends.
 
void addIgnore (uint32 c)
 Add a character to ignore.
 
UString getToken (SeekableReadStream &stream)
 Parse a token out of the stream.
 
size_t getTokens (SeekableReadStream &stream, std::vector< UString > &list, size_t min=0, size_t max=SIZE_MAX, const UString &def="")
 Parse tokens out of the stream.
 
void findFirstToken (SeekableReadStream &stream)
 Find the first token character, skipping past separators.
 
void skipToken (SeekableReadStream &stream, size_t n=1)
 Skip a number of tokens.
 
void skipChunk (SeekableReadStream &stream)
 Skip to the end of the chunk.
 
void nextChunk (SeekableReadStream &stream)
 Skip past end of chunk characters.
 

Private Member Functions

bool isChunkEnd (SeekableReadStream &stream)
 

Static Private Member Functions

static bool isIn (uint32 c, const std::list< uint32 > &list)
 

Private Attributes

ConsecutiveSeparatorRule _conSepRule
 
std::list< uint32 > _separators
 
std::list< uint32 > _quotes
 
std::list< uint32 > _chunkEnds
 
std::list< uint32 > _ignores
 

Detailed Description

Tokenizes a stream.

Note
Only works with clean (non-extended ASCII) and UTF-8 streams right now.

Definition at line 42 of file streamtokenizer.h.

Member Enumeration Documentation

◆ ConsecutiveSeparatorRule

What to do when consecutive separators are found.

Enumerator
kRuleIgnoreSame 

Ignore the repeated separator, but only if it's the same.

kRuleIgnoreAll 

Ignore all repeated separators.

kRuleHeed 

Heed each separator.
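
The difference between the three rules is easiest to see on a string with a run of separators. The following is a minimal stand-alone sketch, not xoreos code: it works on a std::string instead of a SeekableReadStream, and the names Rule and splitWithRule are hypothetical. It models the wording of the enumerators above; the real getToken() may treat edge cases such as leading or mixed separators differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for ConsecutiveSeparatorRule (toy model, not xoreos code).
enum class Rule { IgnoreSame, IgnoreAll, Heed };

// Split `in` on any character in `seps`, handling runs of separators per `rule`.
std::vector<std::string> splitWithRule(const std::string &in,
                                       const std::string &seps, Rule rule) {
	assert(!seps.empty());

	std::vector<std::string> tokens;
	std::string token;

	std::size_t i = 0;
	while (i < in.size()) {
		const char c = in[i++];

		if (seps.find(c) == std::string::npos) {
			token += c; // ordinary character
			continue;
		}

		// A separator ends the current token (which may be empty).
		tokens.push_back(token);
		token.clear();

		// Decide how many of the following separators to swallow.
		while (i < in.size()) {
			const bool same   = (in[i] == c);
			const bool anySep = (seps.find(in[i]) != std::string::npos);

			if      (rule == Rule::IgnoreSame && same)   ++i;
			else if (rule == Rule::IgnoreAll  && anySep) ++i;
			else break;
		}
	}

	tokens.push_back(token); // whatever is left at the end of the input
	return tokens;
}
```

In this model, with the separators ',' and ';', the string "a,,b" yields "a", "" and "b" under Rule::Heed, but only "a" and "b" under the two ignoring rules. The string "a,;b" keeps its empty middle token under Rule::IgnoreSame, because the two separators differ, and loses it under Rule::IgnoreAll.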

Definition at line 45 of file streamtokenizer.h.

Constructor & Destructor Documentation

◆ StreamTokenizer()

Common::StreamTokenizer::StreamTokenizer ( ConsecutiveSeparatorRule  conSepRule = kRuleHeed)

Definition at line 33 of file streamtokenizer.cpp.

Member Function Documentation

◆ addChunkEnd()

void Common::StreamTokenizer::addChunkEnd ( uint32  c)

Add a character marking the end of a chunk.

A chunk end is essentially a higher-order separator. Parsing tokens will stop at chunk end characters and will not move past them. Only a call to nextChunk() will move past a chunk end character.
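
One way to picture a chunk end as a higher-order separator is a two-level split: chunk ends divide the input into chunks, and separators divide each chunk into tokens. The sketch below is a simplified stand-alone model on a std::string, not the xoreos implementation; the function name splitChunks is hypothetical, and the requirement that the two characters differ is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model: split `in` into chunks at `chunkEnd`, and each chunk into tokens at `sep`.
std::vector<std::vector<std::string>> splitChunks(const std::string &in,
                                                  char sep, char chunkEnd) {
	assert(sep != chunkEnd); // assumption of this sketch

	std::vector<std::vector<std::string>> chunks(1);
	std::string token;

	for (const char c : in) {
		if (c == sep || c == chunkEnd) {
			// Both characters end the current token...
			chunks.back().push_back(token);
			token.clear();

			// ...but only a chunk end also starts a new chunk.
			if (c == chunkEnd)
				chunks.emplace_back();
		} else {
			token += c;
		}
	}

	chunks.back().push_back(token); // last token of the last chunk
	return chunks;
}
```

With a space as the separator and a newline as the chunk end, the two-line input "a b\nc d" comes back as two chunks of two tokens each, which is the shape a line-oriented text format would typically want.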

Definition at line 56 of file streamtokenizer.cpp.

References _chunkEnds, _ignores, _quotes, _separators, and isIn().

Referenced by Aurora::VISFile::load(), Sound::XACTWaveBank_ASCII::load(), Aurora::LYTFile::load(), Graphics::Aurora::Model_NWN::ParserContext::ParserContext(), and Aurora::TwoDAFile::read2a().

◆ addIgnore()

void Common::StreamTokenizer::addIgnore ( uint32  c)

Add a character to ignore.

A character that is ignored will never be added to the token. For example, with the ignore character '#' and the separator character ',', the string "fo#o,#bar" will be split into two tokens: "foo" and "bar".
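
The '#' example can be reproduced with a few lines of stand-alone C++. This is a sketch of the described behaviour on a std::string, not the xoreos implementation; the name splitIgnoring is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model: split `in` on `sep`, dropping every occurrence of `ignore`.
std::vector<std::string> splitIgnoring(const std::string &in, char sep, char ignore) {
	assert(sep != ignore); // assumption of this sketch

	std::vector<std::string> tokens(1);
	for (const char c : in) {
		if (c == ignore)
			continue;              // never becomes part of a token
		if (c == sep)
			tokens.emplace_back(); // start the next token
		else
			tokens.back() += c;
	}
	return tokens;
}
```

Calling splitIgnoring("fo#o,#bar", ',', '#') returns the two tokens "foo" and "bar".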

Definition at line 62 of file streamtokenizer.cpp.

References _chunkEnds, _ignores, _quotes, _separators, and isIn().

Referenced by Aurora::VISFile::load(), Sound::XACTWaveBank_ASCII::load(), Aurora::LYTFile::load(), Graphics::Aurora::Model_NWN::ParserContext::ParserContext(), and Aurora::TwoDAFile::read2a().

◆ addQuote()

void Common::StreamTokenizer::addQuote ( uint32  c)

Add a character able to enclose (quote) separators and chunk ends.

For example, with the quote character '\'' and separator character ',', the string "foo\',\'bar,foo" will be split into two tokens: "foo,bar" and "foo".

Every quote character is handled as if it's the same! So with the quote characters '\'' and '"', the string "foo\',\"bar,foo" will also yield the two tokens "foo,bar" and "foo".
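
Quoting can be modelled as a single on/off flag that every quote character toggles, which is also why different quote characters are interchangeable. The sketch below is a stand-alone illustration on a std::string, not the xoreos implementation; the name splitQuoted is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model: split `in` on `sep`, except where `sep` is enclosed by characters from `quotes`.
std::vector<std::string> splitQuoted(const std::string &in, char sep,
                                     const std::string &quotes) {
	assert(quotes.find(sep) == std::string::npos); // assumption of this sketch

	std::vector<std::string> tokens(1);
	bool inQuote = false;

	for (const char c : in) {
		if (quotes.find(c) != std::string::npos) {
			inQuote = !inQuote; // any quote character toggles the same state
			continue;           // the quote itself is not part of the token
		}

		if (c == sep && !inQuote)
			tokens.emplace_back(); // unquoted separator: next token
		else
			tokens.back() += c;    // quoted separators are kept verbatim
	}
	return tokens;
}
```

Both splitQuoted("foo',\'bar,foo", ',', "'") and splitQuoted("foo',\"bar,foo", ',', "'\"") return "foo,bar" followed by "foo": the first comma is quoted and stays in the token, the second one splits.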

Definition at line 50 of file streamtokenizer.cpp.

References _chunkEnds, _ignores, _quotes, _separators, and isIn().

Referenced by Aurora::TwoDAFile::read2a().

◆ addSeparator()

void Common::StreamTokenizer::addSeparator ( uint32  c)

Add a character at which to split tokens.

For example, with the separator character ',', the string "foo,bar" will be split into two tokens: "foo" and "bar".

Several different characters can act as separator characters at the same time.

The ConsecutiveSeparatorRule value signals how consecutive separator characters are handled.

Definition at line 44 of file streamtokenizer.cpp.

References _chunkEnds, _ignores, _quotes, _separators, and isIn().

Referenced by Aurora::VISFile::load(), Sound::XACTWaveBank_ASCII::load(), Aurora::LYTFile::load(), Graphics::Aurora::Model_NWN::ParserContext::ParserContext(), Aurora::TwoDAFile::read2a(), Aurora::TwoDAFile::readHeaders2b(), Aurora::TwoDAFile::readRows2b(), and Aurora::TwoDAFile::skipRowNames2b().

◆ findFirstToken()

void Common::StreamTokenizer::findFirstToken ( SeekableReadStream &  stream)

Find the first token character, skipping past separators.

Position the stream at the first character that is neither a separator nor an ignored character. This is useful if the first token of a chunk might be indented with separator characters.
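
As an illustration, the skipping can be modelled on a std::string with an index in place of the stream position. This is a minimal sketch, not the xoreos implementation; the free function and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model: return the index of the first character at or after `pos` that is
// neither in `seps` nor in `ignores`, or in.size() if there is none.
std::size_t findFirstToken(const std::string &in, std::size_t pos,
                           const std::string &seps, const std::string &ignores) {
	assert(pos <= in.size());

	while (pos < in.size() &&
	       (seps.find(in[pos])    != std::string::npos ||
	        ignores.find(in[pos]) != std::string::npos))
		++pos;

	return pos;
}
```

For the indented line "  \tfoo bar" with space and tab as separators, the returned index is 3, the position of the 'f'.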

Definition at line 213 of file streamtokenizer.cpp.

References _ignores, _separators, isIn(), Common::ReadStream::kEOF, Common::SeekableReadStream::kOriginCurrent, Common::ReadStream::readChar(), and Common::SeekableReadStream::seek().

Referenced by Aurora::TwoDAFile::readRows2a().

◆ getToken()

UString Common::StreamTokenizer::getToken ( SeekableReadStream &  stream)

Parse a token out of the stream.

Go through the stream, character by character, collecting characters for a token. Collection will stop on any of these conditions:

  • We reached the end of the stream
  • We reached a separator character
  • We reached a chunk end character

When we find a separator character, the stream will be positioned after this character (potentially skipping over following separators depending on the ConsecutiveSeparatorRule value).

When we find a chunk end character, the stream will be positioned before this character. Only a call to nextChunk() will move the stream past it.
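
The asymmetry between the two stop positions can be sketched as follows. This is a simplified stand-alone model, not the xoreos implementation: it uses a std::string with an index `pos` in place of the stream, handles neither quotes nor ignored characters, and behaves like kRuleHeed in that it never skips additional separators. The free function and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model of the stop conditions; `pos` stands in for the stream position.
std::string getToken(const std::string &in, std::size_t &pos,
                     const std::string &seps, const std::string &chunkEnds) {
	assert(pos <= in.size());

	std::string token;
	while (pos < in.size()) {
		const char c = in[pos];

		if (chunkEnds.find(c) != std::string::npos)
			break; // chunk end: stop before it, without consuming it

		++pos;     // every other character is consumed

		if (seps.find(c) != std::string::npos)
			break; // separator: stop after it

		token += c;
	}

	return token;  // also reached when the input is exhausted
}
```

With a space as the separator and a newline as the chunk end, parsing "a b\nc" from position 0 returns "a" and leaves the position at 2 (after the space), then returns "b" and leaves it at 3 (on the newline). Any further call returns an empty token without moving, which is why the chunk has to be left explicitly.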

Definition at line 68 of file streamtokenizer.cpp.

References _chunkEnds, _conSepRule, _ignores, _quotes, _separators, Common::UString::end(), Common::UString::findFirst(), isIn(), Common::ReadStream::kEOF, Common::SeekableReadStream::kOriginCurrent, kRuleHeed, kRuleIgnoreSame, Common::ReadStream::readChar(), Common::SeekableReadStream::seek(), and Common::UString::truncate().

Referenced by getTokens(), Aurora::TwoDAFile::readHeaders2b(), Aurora::TwoDAFile::readRows2b(), and skipToken().

◆ getTokens()

size_t Common::StreamTokenizer::getTokens ( SeekableReadStream &  stream,
std::vector< UString > &  list,
size_t  min = 0,
size_t  max = SIZE_MAX,
const UString &  def = "" 
)

Parse tokens out of the stream.

This method calls getToken() repeatedly and collects all tokens into a list.

Parameters
    stream  The stream to parse out of.
    list    The list to parse into.
    min     Minimum number of tokens to parse.
    max     Maximum number of tokens to parse.
    def     Non-existing tokens are assigned this value.

Returns
    The number of existing tokens parsed.
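
How min, max and def interact with the return value can be sketched without a stream. In the stand-alone model below, which is not the xoreos implementation, the hypothetical parameter `available` stands in for the tokens that repeated getToken() calls would produce. Clearing `list` first and requiring min <= max are assumptions of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model: `available` stands in for the tokens that repeated getToken()
// calls would yield before the end of the chunk or stream.
std::size_t collectTokens(const std::vector<std::string> &available,
                          std::vector<std::string> &list,
                          std::size_t min, std::size_t max,
                          const std::string &def) {
	assert(min <= max); // assumption of this sketch

	list.clear();       // assumption: the output list starts out empty

	// Take at most `max` real tokens.
	std::size_t real = 0;
	while (real < available.size() && real < max)
		list.push_back(available[real++]);

	// Pad with the default value until `min` entries exist.
	while (list.size() < min)
		list.push_back(def);

	return real;        // padding is not counted
}
```

With two available tokens, min = 3 and def = "x", the list ends up holding three entries while the return value is 2, so a caller can tell real tokens from defaults by comparing the return value with list.size().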

Definition at line 189 of file streamtokenizer.cpp.

References _conSepRule, Common::UString::empty(), getToken(), isChunkEnd(), and kRuleIgnoreAll.

Referenced by Sound::getFirst(), Aurora::VISFile::load(), Sound::XACTWaveBank_ASCII::load(), Aurora::LYTFile::load(), Graphics::Aurora::ModelNode_NWN_ASCII::load(), Graphics::Aurora::Model_NWN::loadASCII(), Graphics::Aurora::ModelNode_NWN_ASCII::readConstraints(), Aurora::TwoDAFile::readDefault2a(), Graphics::Aurora::ModelNode_NWN_ASCII::readFaces(), Aurora::TwoDAFile::readHeaders2a(), Aurora::TwoDAFile::readRows2a(), Graphics::Aurora::ModelNode_NWN_ASCII::readTCoords(), Graphics::Aurora::ModelNode_NWN_ASCII::readVCoords(), Graphics::Aurora::ModelNode_NWN_ASCII::readWeights(), and Graphics::Aurora::Model_NWN::skipAnimASCII().

◆ isChunkEnd()

bool Common::StreamTokenizer::isChunkEnd ( SeekableReadStream &  stream)
private

Definition at line 251 of file streamtokenizer.cpp.

References _chunkEnds, isIn(), Common::ReadStream::kEOF, Common::SeekableReadStream::kOriginCurrent, Common::ReadStream::readChar(), and Common::SeekableReadStream::seek().

Referenced by getTokens().

◆ isIn()

bool Common::StreamTokenizer::isIn ( uint32  c,
const std::list< uint32 > &  list 
)
static private

Definition at line 36 of file streamtokenizer.cpp.

Referenced by addChunkEnd(), addIgnore(), addQuote(), addSeparator(), findFirstToken(), getToken(), isChunkEnd(), nextChunk(), and skipChunk().

◆ nextChunk()

void Common::StreamTokenizer::nextChunk ( SeekableReadStream &  stream)

Skip past end of chunk characters.

◆ skipChunk()

void Common::StreamTokenizer::skipChunk ( SeekableReadStream &  stream)

Skip to the end of the chunk.

The stream will be positioned before the next chunk end character.
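
The relationship between skipChunk() and nextChunk() can be sketched on a std::string with an index in place of the stream position. This is a stand-alone model, not the xoreos implementation, and the free functions are hypothetical. Whether the real nextChunk() consumes one or several consecutive chunk end characters is not stated here; the sketch consumes exactly one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model: advance `pos` to the next chunk end character, or to the end of the input.
void skipChunk(const std::string &in, std::size_t &pos, const std::string &chunkEnds) {
	assert(pos <= in.size());

	while (pos < in.size() && chunkEnds.find(in[pos]) == std::string::npos)
		++pos;
}

// Toy model: skip the rest of the chunk, then step past one chunk end character.
void nextChunk(const std::string &in, std::size_t &pos, const std::string &chunkEnds) {
	skipChunk(in, pos, chunkEnds);

	if (pos < in.size())
		++pos; // assumption: exactly one chunk end character is consumed
}
```

On the input "a b\nc d" with a newline as the chunk end, skipChunk() starting at position 1 stops at position 3, on the newline. A following nextChunk() moves to position 4, the 'c' that starts the second chunk.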

Definition at line 228 of file streamtokenizer.cpp.

References _chunkEnds, isIn(), Common::ReadStream::kEOF, Common::SeekableReadStream::kOriginCurrent, Common::ReadStream::readChar(), and Common::SeekableReadStream::seek().

Referenced by nextChunk().

◆ skipToken()

void Common::StreamTokenizer::skipToken ( SeekableReadStream &  stream,
size_t  n = 1 
)

Skip a number of tokens.

Definition at line 223 of file streamtokenizer.cpp.

References getToken().

Referenced by Aurora::TwoDAFile::readRows2a(), and Aurora::TwoDAFile::skipRowNames2b().

Member Data Documentation

◆ _chunkEnds

std::list<uint32> Common::StreamTokenizer::_chunkEnds
private

◆ _conSepRule

ConsecutiveSeparatorRule Common::StreamTokenizer::_conSepRule
private

Definition at line 156 of file streamtokenizer.h.

Referenced by getToken(), and getTokens().

◆ _ignores

std::list<uint32> Common::StreamTokenizer::_ignores
private

◆ _quotes

std::list<uint32> Common::StreamTokenizer::_quotes
private

Definition at line 159 of file streamtokenizer.h.

Referenced by addChunkEnd(), addIgnore(), addQuote(), addSeparator(), and getToken().

◆ _separators

std::list<uint32> Common::StreamTokenizer::_separators
private

The documentation for this class was generated from the following files: