User Guide

Incremental Processing
Serialization
Tree processing

Trial.Protocol can process JSON incrementally or via serialization. Incremental processing handles JSON one token at a time, whereas serialization processes the entire JSON buffer in a single operation. A token is a single element in the JSON grammar, such as a number, a string, or the beginning or end of an array.

Incremental processing has a lower-level interface than serialization or document processing. Although it is more laborious to use, it covers more use cases and can be more efficient. Some examples of these use cases are:

The input and output serialization archives are built on top of the incremental parser and generator.
Certain operations on JSON buffers do not require that individual JSON elements, such as JSON escaped strings, are converted into in-memory data structures. Examples are searching for data in a JSON file and pretty-printing the content of a JSON file.
We can build other parser types from an incremental parser, such as the push parser in the tutorials.

Incremental Processing

Incremental processing is a low-level API which regards JSON as a sequence of tokens to be processed one by one.

Token

The JSON parser and generator use tokens to identify data types as well as errors. All token-related types are located in the trial::protocol::json::token namespace, which we will simply refer to as token below.

	Note
	Tokens are located in the `<trial/protocol/json/token.hpp>` header.

Token constants

A token is represented by the token::code enumeration type with a constant for each possible token or error state. This means that each error is represented by its own enumerator constant.

Symbols

Working directly with token::code can be tedious. For example, to check whether an error occurred, you have to compare the current token against each of the numerous error constants. The token::code enumerators have therefore been grouped into a more convenient enumeration type called token::symbol, which is better suited for normal operation.

All token::code error constants have been grouped into the single token::symbol::error constant, and we can now check for errors with a single comparison.
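The grouping can be pictured with a small self-contained sketch. The enumerations below are hypothetical, cut-down stand-ins written for this illustration, not the library's definitions. Only the idea of folding several error codes into one symbol is taken from the text above.

```cpp
#include <cassert>

namespace sketch
{

// Hypothetical, reduced stand-in for token::code
enum class code
{
    integer,
    string,
    end,
    error_unexpected_token,
    error_invalid_key,
    error_invalid_value
};

// Hypothetical, reduced stand-in for token::symbol
enum class symbol
{
    integer,
    string,
    end,
    error
};

// Fold every error code into the single symbol::error constant
inline symbol to_symbol(code value)
{
    switch (value)
    {
    case code::integer: return symbol::integer;
    case code::string:  return symbol::string;
    case code::end:     return symbol::end;
    default:            return symbol::error;
    }
}

// With symbols, error detection is a single comparison
inline bool failed(code value)
{
    return to_symbol(value) == symbol::error;
}

inline void symbol_example()
{
    assert(failed(code::error_invalid_key));
    assert(!failed(code::integer));
}

} // namespace sketch
```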

Table 1.1. JSON symbol constants

`token::symbol`	Description
`boolean`	True or false.
`integer`	Integer number.
`number`	Floating-point number.
`string`	String value.
`null`	No data.
`begin_array`	Start of an array.
`end_array`	End of an array.
`begin_object`	Start of an associative array.
`end_object`	End of an associative array.
`separator`	A context-specific separator.
`end`	End of input or output buffer.
`error`	Erroneous format.

The symbol type will be the preferred manner to use tokens in the examples throughout this documentation. In fact, we are not even going to describe the token::code enumerator constants here,^[6] because we are only interested in the subset that contains the error codes and they are described in the section on errors.

`token::category`	Description
`data`	Data tokens have a value associated with them, whose content can be retrieved. Examples of data tokens are booleans, numbers, and strings.
`structural`	Structural tokens wrap containers and separate items.
`nullable`	The nullable token is a special case, because it can represent either a data token without an associated value or a structural token without an associated container, such as a missing integer or a missing array. The nullable token is typeless.
`status`	A status token indicates a condition other than data or structure, such as the end of the buffer or an error.

`token::symbol`	`token::category`
`boolean`	`data`
`integer`	`data`
`number`	`data`
`string`	`data`
`null`	`nullable`
`begin_array`	`structural`
`end_array`	`structural`
`begin_object`	`structural`
`end_object`	`structural`
`separator`	`structural`
`end`	`status`
`error`	`status`
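The relation between symbols and categories in the table above can be written as a plain lookup function. This is a minimal sketch using hypothetical enumerations that mirror the table rows. It is not the library's implementation.

```cpp
#include <cassert>

namespace sketch
{

// Hypothetical enumerations mirroring the symbol and category tables
enum class symbol
{
    boolean, integer, number, string, null,
    begin_array, end_array, begin_object, end_object,
    separator, end, error
};

enum class category { data, structural, nullable, status };

// Map each symbol to its category, row by row
inline category to_category(symbol value)
{
    switch (value)
    {
    case symbol::boolean:
    case symbol::integer:
    case symbol::number:
    case symbol::string:
        return category::data;
    case symbol::null:
        return category::nullable;
    case symbol::begin_array:
    case symbol::end_array:
    case symbol::begin_object:
    case symbol::end_object:
    case symbol::separator:
        return category::structural;
    case symbol::end:
    case symbol::error:
        return category::status;
    }
    return category::status; // Not reached for valid enumerators
}

inline void category_example()
{
    assert(to_category(symbol::string) == category::data);
    assert(to_category(symbol::null) == category::nullable);
}

} // namespace sketch
```

Such a mapping lets application code ask a coarse question first, for example whether the current token carries a value at all, before looking at the exact symbol.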

Error

The <system_error> framework is used for error codes and exceptions.

	Note
	Error codes and utilities are located in the `<trial/protocol/json/error.hpp>` header.

Error codes

Trial.Protocol defines its own json::error_category with associated error enumerator constants.

Normally, an error_code of the current error can be obtained via an error() member function.

std::string input = "illegal";
json::reader reader(input);

assert(reader.symbol() == json::symbol::error);
assert(reader.error() == json::invalid_value);

The conversion from a token::code error constant to an error_code can also be done manually with json::to_errc() and json::make_error_code().

std::string input = "illegal";
json::reader reader(input);

assert(reader.symbol() == json::symbol::error);
assert(reader.code() == json::code::error_invalid_value);

json::errc ec = json::to_errc(reader.code());
assert(ec == json::invalid_value);

auto error = json::make_error_code(ec);
assert(error == json::invalid_value);

The following error codes exist.

Table 1.4. Error codes

`json::errc`	Description
`unexpected_token`	An unexpected token is encountered in the input.
`invalid_key`	An associative array key is not in a valid format.
`invalid_value`	The content is not in a valid format.
`incompatible_type`	Conversion between two incompatible types failed.
`unbalanced_end_array`	Encountered an end array token without a corresponding begin array token.
`unbalanced_end_object`	Encountered an end object token without a corresponding begin object token.
`expected_end_array`	Encountered an end array token outside an array.
`expected_end_object`	Encountered an end object token outside an associative array.

Exception

Conversion errors will result in json::error exceptions being thrown. json::error inherits from std::system_error, which contains a std::error_code.
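For readers unfamiliar with the <system_error> framework, the following sketch shows how an error enumeration, an error category, and an exception type usually fit together. Everything in the sketch namespace is hypothetical and written for this illustration. Two enumerator names are borrowed from the table above, while no_error and the message texts are assumptions.

```cpp
#include <cassert>
#include <string>
#include <system_error>
#include <type_traits>

namespace sketch
{

// Hypothetical error constants
enum errc
{
    no_error = 0,
    unexpected_token,
    invalid_value
};

class category_type : public std::error_category
{
public:
    const char *name() const noexcept override { return "sketch.json"; }

    std::string message(int value) const override
    {
        switch (value)
        {
        case no_error:         return "no error";
        case unexpected_token: return "unexpected token";
        case invalid_value:    return "invalid value";
        default:               return "unknown error";
        }
    }
};

// Single shared category instance
inline const std::error_category& category()
{
    static category_type instance;
    return instance;
}

// Found by argument-dependent lookup when errc converts to std::error_code
inline std::error_code make_error_code(errc value)
{
    return std::error_code(static_cast<int>(value), category());
}

// Exception that carries the error code via std::system_error
class error : public std::system_error
{
public:
    explicit error(std::error_code ec) : std::system_error(ec) {}
};

} // namespace sketch

namespace std
{
// Opt errc into implicit conversion to std::error_code
template <>
struct is_error_code_enum<sketch::errc> : public true_type {};
} // namespace std

inline void error_example()
{
    std::error_code ec = sketch::make_error_code(sketch::invalid_value);
    assert(ec == sketch::invalid_value); // Enumerator converts implicitly
    try
    {
        throw sketch::error(ec);
    }
    catch (const std::system_error& ex)
    {
        assert(ex.code() == sketch::invalid_value);
    }
}
```

The specialization of std::is_error_code_enum is what allows an enumerator to be compared directly against an error_code, as in the assertions shown earlier in this section.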

Reader

Reader is an incremental parser (also called a pull parser) that transforms the JSON input into a sequence of C++ tokens. This transformation is done piecemeal, which means that the reader will stay at the current token until explicitly instructed to parse the next token.

Based on the Iterator design pattern, reader parses just enough of the input to identify a single token. The reader provides various accessors that can be used to examine or convert the current token.

Table 1.5. Reader Accessors

Reader member function	Description
`token::code::value code()`	Returns the current token.
`token::symbol::value symbol()`	Returns the symbol of the current token.
`token::category::value category()`	Returns the category of the current token.
`size_type level()`	Returns the current level of nested containers. Levels start at zero for the outermost level.
`error_code error()`	Returns the current error code.
`const view_type& literal()`	Returns a view of the raw input of the current value.
`T value<T>()`	Returns the current value. The raw input is converted into the requested value type.

No data is converted until explicitly requested with reader::value<T>(). The return type of reader::value<T>() must be specified as a template argument. It can be a boolean, a number, or a string as described below. The requested type must be compatible with the current token as returned by reader::symbol(); otherwise a run-time error will be raised.

Note

Requesting the value of an incompatible type will result in a run-time error. For example, attempting to read a string as an integer:

assert(reader.symbol() == json::symbol::string);
int number = reader.value<int>(); // Throws exception

Note

Requesting an unsupported type will result in a compile-time error. For example, attempting to read a user-defined struct:

struct dummy {};
dummy d = reader.value<dummy>(); // Causes compilation error

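The unconverted textual data of the current token can be obtained with reader::literal(). This can be useful when displaying errors. The result of reader::literal() differs from that of reader::value<std::string>(): for a string token, the literal is the raw input including the surrounding quotes and escape sequences, whereas the value is the unquoted and unescaped string.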

When you are done with the current token, advance to the next token with reader::next(). This function returns a bool, which is true unless either an error or the end of the input was encountered. Whitespace and separators are skipped.

Errors in the input are identified with an error token, and the current error can be obtained with reader::error().
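The pull-style control flow described above can be illustrated without the library. The toy_reader below is a hypothetical, heavily reduced stand-in that only understands brackets, commas, whitespace, and unsigned integers. Its member names loosely follow the accessor table, but it is a sketch of the pattern, not the library's reader.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

namespace sketch
{

enum class symbol_type { integer, begin_array, end_array, end, error };

// Toy pull parser: parses just enough input to identify one token
class toy_reader
{
public:
    explicit toy_reader(const std::string& text)
        : input(text), position(0), length(0), current(symbol_type::end)
    {
        next(); // Position the reader at the first token
    }

    symbol_type symbol() const { return current; }
    std::string literal() const { return input.substr(position, length); }

    // Advance to the next token; false at end of input or on error
    bool next()
    {
        position += length;
        length = 0;
        // Skip whitespace and separators
        while (position < input.size()
               && (std::isspace(static_cast<unsigned char>(input[position]))
                   || input[position] == ','))
            ++position;
        if (position == input.size())
        {
            current = symbol_type::end;
            return false;
        }
        const char ch = input[position];
        if (ch == '[') { current = symbol_type::begin_array; length = 1; }
        else if (ch == ']') { current = symbol_type::end_array; length = 1; }
        else if (std::isdigit(static_cast<unsigned char>(ch)))
        {
            while (position + length < input.size()
                   && std::isdigit(static_cast<unsigned char>(input[position + length])))
                ++length;
            current = symbol_type::integer;
        }
        else
        {
            current = symbol_type::error;
            return false;
        }
        return true;
    }

private:
    std::string input;
    std::size_t position;
    std::size_t length;
    symbol_type current;
};

// Typical pull loop: examine the current token, then request the next one
inline int sum_integers(const std::string& text)
{
    int sum = 0;
    toy_reader reader(text);
    do
    {
        if (reader.symbol() == symbol_type::integer)
            sum += std::stoi(reader.literal());
    } while (reader.next());
    return sum;
}

inline void reader_example()
{
    assert(sum_integers("[1, 2, 39]") == 42);
}

} // namespace sketch
```

The do-while form reflects that the reader is already positioned at a token after construction, so the current token is examined before next() is called.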

Boolean

Boolean values are indicated by the json::symbol::boolean token. The value is requested by reader::value<bool>().

std::string input = "true";
json::reader reader(input);

assert(reader.symbol() == json::symbol::boolean);
assert(reader.literal() == "true");
assert(reader.value<bool>());

Number

While JSON does not distinguish between integer and floating-point numbers, C++ does make this distinction and therefore Trial.Protocol does too. However, integers can be read as floating-point numbers, and floating-point numbers can be read as integers.

	Note
	Reading a floating-point number as an integer will round the number, so it may result in loss of information.

Numbers are identified as integers if they consist of digits only.

std::string input = "42";
json::reader reader(input);

assert(reader.symbol() == json::symbol::integer);
assert(reader.literal() == "42");
assert(reader.value<int>() == 42);

Numbers are detected as floating-point if they contain a decimal point or an exponent.

std::string input = "3.1415";
json::reader reader(input);

assert(reader.symbol() == json::symbol::number);
assert(reader.literal() == "3.1415");
assert(reader.value<double>() == 3.1415);
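The detection rule stated above can be expressed as a one-line predicate. The helper name is_floating_literal is hypothetical, and the sketch assumes that its argument is already a syntactically valid JSON number.

```cpp
#include <cassert>
#include <string>

namespace sketch
{

// A numeric literal is treated as floating-point if it contains a decimal
// point or an exponent marker. The input is assumed to be a valid JSON number.
inline bool is_floating_literal(const std::string& literal)
{
    return literal.find_first_of(".eE") != std::string::npos;
}

inline void number_example()
{
    assert(!is_floating_literal("42"));
    assert(is_floating_literal("3.1415"));
}

} // namespace sketch
```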

Integers can be read as floating-point numbers as well.

std::string input = "42";
json::reader reader(input);

assert(reader.symbol() == json::symbol::integer);
assert(reader.value<double>() == 42.0);

Floating-point numbers can also be read as integers. The number will be rounded to the nearest integer.

std::string input = "3.1415";
json::reader reader(input);

assert(reader.symbol() == json::symbol::number);
assert(reader.value<int>() == 3);

Any additional constraints must be enforced by the application layer. For instance, if we have a protocol with a size field, then logically this field cannot be negative or a fraction, even though JSON numbers allow this.
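An application-level check for such a size field might look as follows. This is a sketch only: the function name checked_size, the choice of exceptions, and the upper limit are assumptions made for the illustration, and the value is assumed to have been read as a double beforehand.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>

namespace sketch
{

// Validate a size field that has already been read as a double.
// Rejects negative, non-finite, oversized, and fractional values.
inline std::size_t checked_size(double value)
{
    const double max_size = 4294967295.0; // Arbitrary limit assumed for this sketch
    if (!std::isfinite(value) || value < 0.0 || value > max_size)
        throw std::out_of_range("size is out of range");
    if (std::floor(value) != value)
        throw std::invalid_argument("size must be a whole number");
    return static_cast<std::size_t>(value);
}

inline void size_example()
{
    assert(checked_size(42.0) == 42u);
}

} // namespace sketch
```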

String

Strings are identified with the json::symbol::string token, and are converted into UTF-8 encoded strings with reader::value<std::string>().

std::string input = "\"alpha\\n\"";
json::reader reader(input);

assert(reader.symbol() == json::symbol::string);
assert(reader.literal() == "\"alpha\\n\"");
assert(reader.value<std::string>() == "alpha\n");
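The difference between the literal and the value in the example above comes down to removing the quotes and resolving escape sequences. The following sketch shows that step in isolation. The function decode is hypothetical, handles only the simple two-character escapes, and deliberately omits \uXXXX sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

namespace sketch
{

// Strip the surrounding quotes from a JSON string literal and resolve
// simple escape sequences. Unicode escapes are not handled in this sketch.
inline std::string decode(const std::string& literal)
{
    if (literal.size() < 2 || literal.front() != '"' || literal.back() != '"')
        throw std::invalid_argument("not a quoted string");
    std::string result;
    for (std::size_t i = 1; i + 1 < literal.size(); ++i)
    {
        char ch = literal[i];
        if (ch == '\\')
        {
            // The escaped character must come before the closing quote
            if (++i + 1 >= literal.size())
                throw std::invalid_argument("dangling escape");
            switch (literal[i])
            {
            case 'n': ch = '\n'; break;
            case 't': ch = '\t'; break;
            case 'r': ch = '\r'; break;
            case 'b': ch = '\b'; break;
            case 'f': ch = '\f'; break;
            case '"': case '\\': case '/': ch = literal[i]; break;
            default: throw std::invalid_argument("unsupported escape");
            }
        }
        result += ch;
    }
    return result;
}

inline void decode_example()
{
    assert(decode("\"alpha\\n\"") == "alpha\n");
}

} // namespace sketch
```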

Null

Null indicates the absence of a value, although it is encoded explicitly in the JSON format as the null literal string.

std::string input = "null";
json::reader reader(input);

assert(reader.symbol() == json::symbol::null);

Array

Arrays are delimited by a begin-token and an end-token. Array members are comma-separated. Arrays can contain any JSON type, including a nested array or a nested associative array.

std::string input = "[42]";
json::reader reader(input);

assert(reader.symbol() == json::symbol::begin_array);

reader.next();

assert(reader.symbol() == json::symbol::integer);
assert(reader.value<int>() == 42);

reader.next();

assert(reader.symbol() == json::symbol::end_array);

Associative array

An associative array is called a JSON object, which as a first approximation can be thought of as a std::map in C++.

Writer

Writer is an incremental generator that outputs C++ data types in a JSON format. The output is generated piece by piece as C++ data types are inserted. The writer keeps track of the context and inserts the appropriate separators between values where needed.

Table 1.6. Writer Accessors

Writer member function	Description
`size_type level()`	Returns the current level of nested containers.
`error_code error()`	Returns the current error code.
`size_type literal(const view_type&)`	Write a literal value directly into the JSON output without formatting it. Returns the number of characters written. Returns zero if an error occurred.
`size_type value<T>()`	Write a formatted tag into the JSON output. Returns the number of characters written. Returns zero if an error occurred.
`size_type value(T)`	Write a formatted value into the JSON output. Returns the number of characters written. Returns zero if an error occurred.

Values are formatted and written into the JSON output with writer::value(T). The parameter T can be a boolean, a number, or a string. Writing a nullable value or the opening and closing brackets of containers is done by passing a special tag as the template argument to the parameter-less overload writer::value<T>().

Literal values can also be inserted unconverted into the JSON output with writer::literal(). This can be useful for adding whitespace, but special care must be taken not to violate the JSON format.

As writer has been designed for wire protocols, it does not insert whitespace into the output^[7].

Note

The following examples assume that you have included the following header files:

#include <cassert>
#include <sstream>
#include <trial/protocol/buffer/ostream.hpp>
#include <trial/protocol/json/writer.hpp>

Boolean

Boolean values are output via writer::value(bool).

std::ostringstream result;
json::writer writer(result);

writer.value(true); // Write boolean value
assert(result.str() == "true");

Number

Numbers can either be integer values or floating-point values.

String

Strings are written by passing a std::string or a string literal to writer::value(T). All strings will be quoted in the JSON output, and special characters will be escaped. Strings must be UTF-8 encoded.

std::ostringstream result;
json::writer writer(result);

writer.value("alpha"); // Write string
assert(result.str() == "\"alpha\"");
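To make the quoting and escaping step concrete, here is a minimal stand-alone sketch. The function quote is hypothetical and covers only the common two-character escapes. Other control characters, which JSON requires to be written as \uXXXX sequences, are passed through unchanged in this sketch.

```cpp
#include <cassert>
#include <string>

namespace sketch
{

// Wrap text in quotes and escape the most common special characters
inline std::string quote(const std::string& text)
{
    std::string result = "\"";
    for (char ch : text)
    {
        switch (ch)
        {
        case '"':  result += "\\\""; break;
        case '\\': result += "\\\\"; break;
        case '\n': result += "\\n";  break;
        case '\t': result += "\\t";  break;
        case '\r': result += "\\r";  break;
        case '\b': result += "\\b";  break;
        case '\f': result += "\\f";  break;
        default:   result += ch;     break; // Other control characters not handled
        }
    }
    result += '"';
    return result;
}

inline void quote_example()
{
    assert(quote("alpha") == "\"alpha\"");
}

} // namespace sketch
```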

Null

Nullable values are output with writer::value<json::token::null>().

std::ostringstream result;
json::writer writer(result);

writer.value<json::token::null>(); // Write nullable value
assert(result.str() == "null");

Array

An array is initiated by passing the json::begin_array tag to writer::value(T), and terminated by passing the json::end_array tag. These tags must be properly balanced; otherwise an error will be raised.

Value separators are automatically inserted between values.

std::ostringstream result;
json::writer writer(result);

writer.value(json::begin_array); // Write beginning of array
assert(result.str() == "[");

writer.value(42); // Write number
assert(result.str() == "[42");

writer.value(43); // Write number
assert(result.str() == "[42,43");

writer.value(json::end_array); // Write ending of array
assert(result.str() == "[42,43]");

Associative array

Name separators are automatically inserted between the key and the value, and value separators are automatically inserted between key-value pairs.
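How a generator can insert both kinds of separators automatically is sketched below. The toy_writer class is hypothetical and written only for this illustration: it tracks the nesting context on a stack and counts the items written in each container. It performs no validation and no string escaping, and it is not the library's writer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch
{

// Toy generator that inserts ',' and ':' based on the current context
class toy_writer
{
public:
    void begin_array()  { separate(); output += '['; stack.push_back(frame{false, 0}); }
    void end_array()    { if (!stack.empty()) stack.pop_back(); output += ']'; }
    void begin_object() { separate(); output += '{'; stack.push_back(frame{true, 0}); }
    void end_object()   { if (!stack.empty()) stack.pop_back(); output += '}'; }

    void value(int number) { separate(); output += std::to_string(number); }
    void value(const std::string& text) { separate(); output += '"' + text + '"'; }

    const std::string& str() const { return output; }

private:
    struct frame
    {
        bool is_object;    // True for associative arrays
        std::size_t count; // Items written so far in this container
    };

    // Emit the separator required before the next item, if any
    void separate()
    {
        if (stack.empty())
            return;
        frame& top = stack.back();
        if (top.count > 0)
        {
            // In an object, an odd count means a key was just written
            const bool after_key = top.is_object && (top.count % 2 == 1);
            output += after_key ? ':' : ',';
        }
        ++top.count;
    }

    std::string output;
    std::vector<frame> stack;
};

inline void writer_example()
{
    toy_writer writer;
    writer.begin_object();
    writer.value("alpha"); // Key
    writer.value(1);       // Value
    writer.end_object();
    assert(writer.str() == "{\"alpha\":1}");
}

} // namespace sketch
```

In this sketch keys and values are written with the same value() call, and the writer decides from the item count whether a name separator or a value separator is due.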

^[6] The description of all token::code enumerator constants can be found in the reference documentation.

^[7] See example/json/pretty_printer for an example of how to produce an indented output.