Changes to PORTABLE_STORAGE.md

* More information about array entries (especially nesting)
* Varint encoding examples
* Expanded string and integer encoding information
Jeffrey 2022-04-23 14:15:28 -05:00 committed by Jeffrey Ryan
parent 34941ac3e1
commit fc9b77d855
2 changed files with 78 additions and 36 deletions


@ -15,15 +15,19 @@ documentation. Unfortunately, whilst the rest of the library is fairly
straightforward to decipher, the Portable Storage is less so. Hence this
document.
## String and Integer Encoding
### Integers
With few exceptions, integers in the epee portable storage format are
serialized in little-endian byte order.
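To make the byte order concrete, here is a minimal C sketch that writes and reads an `n`-byte unsigned integer with the least significant byte first. `put_le` and `get_le` are hypothetical helper names used only for illustration. They are not functions from epee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the low `n` bytes of `v` to `out`,
 * least significant byte first (little-endian). */
void put_le(uint8_t *out, uint64_t v, size_t n)
{
    assert(n <= 8);
    for (size_t i = 0; i < n; ++i)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical helper: read an `n`-byte little-endian unsigned integer. */
uint64_t get_le(const uint8_t *in, size_t n)
{
    assert(n <= 8);
    uint64_t v = 0;
    for (size_t i = 0; i < n; ++i)
        v |= (uint64_t)in[i] << (8 * i);
    return v;
}
```

For instance, `put_le(buf, 0x01011101, 4)` produces `01 11 01 01`, which are the first four bytes of the header described later.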
### Varints
Varints pack integers in a portable, space-optimized way. A varint is stored
as a little-endian integer whose lowest 2 bits record how many bytes it
occupies. The value itself is held in the remaining bits. This means the
largest integer that can be packed into 1 byte is 63 (6 bits).
#### Byte Sizes
| Lowest 2 bits | Encoded size | Value range |
|---------------|---------------|-----------------------------------|
@ -32,20 +36,47 @@ integer that can be packed into 1 byte is 63 (6 bits).
| b10 | 4 bytes | 16384 to 1073741823 |
| b11 | 8 bytes | 1073741824 to 4611686018427387903 |
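The size rules above can be sketched as a C encoder. This is a minimal illustration, not epee's own code, and `varint_encode` and `VARINT_MAX` are hypothetical names. The 1-byte (`b00`) and 2-byte (`b01`) cases are inferred from two sources: the 63 limit stated above, and the 2-byte encoding of 101 in the examples that follow. The encoder shifts the value left by two bits, stores the size marker in the freed low bits, and writes the result little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: largest value a varint can hold (2^62 - 1). */
#define VARINT_MAX 4611686018427387903ULL

/* Hypothetical encoder. Writes `v` to `out` (room for 8 bytes needed) and
 * returns the number of bytes written, or 0 if `v` exceeds VARINT_MAX. */
size_t varint_encode(uint64_t v, uint8_t *out)
{
    assert(out != NULL);
    uint64_t marker;                        /* value of the lowest 2 bits */
    if (v <= 63)
        marker = 0;                         /* b00: 1 byte  */
    else if (v <= 16383)
        marker = 1;                         /* b01: 2 bytes */
    else if (v <= 1073741823)
        marker = 2;                         /* b10: 4 bytes */
    else if (v <= VARINT_MAX)
        marker = 3;                         /* b11: 8 bytes */
    else
        return 0;

    size_t size = (size_t)1 << marker;      /* 1, 2, 4 or 8 */
    uint64_t packed = (v << 2) | marker;    /* value above, marker below */
    for (size_t i = 0; i < size; ++i)       /* little-endian output */
        out[i] = (uint8_t)(packed >> (8 * i));
    return size;
}
```

Calling `varint_encode(17000, buf)` returns 4 and fills `buf` with `A2 09 01 00`, which matches the example table.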
#### Representations of Example Values
| Value | Byte Representation (hex) |
|----------------------|---------------------------|
| 0 | 00 |
| 7 | 1C |
| 101 | 95 01 |
| 17,000 | A2 09 01 00 |
| 7,942,319,744 | 03 BA 98 65 07 00 00 00 |
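Decoding reverses the process. A decoder reads the size marker from the low 2 bits of the first byte, assembles that many bytes as a little-endian integer, and shifts the result right by two. The following is a minimal C sketch. `varint_decode` is a hypothetical name, and the sketch signals a too-short buffer by returning 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder. Reads a varint from `in` (`len` bytes available),
 * stores the value in *out and returns the bytes consumed, or returns 0
 * if the buffer is too short. */
size_t varint_decode(const uint8_t *in, size_t len, uint64_t *out)
{
    assert(in != NULL && out != NULL);
    if (len == 0)
        return 0;

    size_t size = (size_t)1 << (in[0] & 0x03);  /* b00..b11 -> 1, 2, 4, 8 */
    if (len < size)
        return 0;

    uint64_t packed = 0;
    for (size_t i = 0; i < size; ++i)           /* little-endian input */
        packed |= (uint64_t)in[i] << (8 * i);
    *out = packed >> 2;                         /* drop the size marker */
    return size;
}
```

Given the bytes `03 BA 98 65 07 00 00 00` from the last row, it yields 7,942,319,744 and reports 8 bytes consumed.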
### Strings
These are simply varint length-prefixed char strings without a null
terminator (though one can always add one if desired). No particular
character encoding is enforced. In fact, binary blobs are often stored as
these strings. This type should not be confused with section keys, which are
restricted to a maximum length of 255 bytes and do not use varints to encode
the length.
"Howdy" => 14 48 6F 77 64 79
### Section Keys
These are similar to strings, except that they are limited to 255 bytes and
are prefixed with a single length byte (rather than a varint).
"Howdy" => 05 48 6F 77 64 79
## Binary Format Specification
### Header
Serialized data must always begin with the following header, with each
`UInt32` stored little-endian:
| Field | Type | Value |
|------------------|----------|------------|
| Signature Part A | UInt32 | 0x01011101 |
| Signature Part B | UInt32 | 0x01020101 |
| Version | UInt8 | 0x01 |
In total, the 9-byte header looks like this (in hex): `01 11 01 01 01 01 02 01 01`
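The following minimal C sketch writes and checks the header using the values from the table. The macro and function names (`PS_SIGNATURE_A`, `PS_SIGNATURE_B`, `PS_VERSION`, `PS_HEADER_SIZE`, `write_header`, `check_header`) are hypothetical. Because the check compares all 9 bytes, it also rejects any version other than `0x01`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from the header table; the macro names are hypothetical. */
#define PS_SIGNATURE_A 0x01011101u
#define PS_SIGNATURE_B 0x01020101u
#define PS_VERSION     0x01u
#define PS_HEADER_SIZE 9

/* Write a UInt32 in little-endian byte order. */
static void put_u32_le(uint8_t *out, uint32_t v)
{
    for (int i = 0; i < 4; ++i)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Write the 9-byte header into `out`. */
void write_header(uint8_t out[PS_HEADER_SIZE])
{
    put_u32_le(out, PS_SIGNATURE_A);
    put_u32_le(out + 4, PS_SIGNATURE_B);
    out[8] = PS_VERSION;
}

/* Return 1 if `in` (of length `len`) starts with the expected header. */
int check_header(const uint8_t *in, size_t len)
{
    assert(in != NULL);
    uint8_t expected[PS_HEADER_SIZE];
    if (len < PS_HEADER_SIZE)
        return 0;
    write_header(expected);
    return memcmp(in, expected, PS_HEADER_SIZE) == 0;
}
```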
### Section
@ -63,18 +94,12 @@ Which is followed by the section's name-value [entries](#Entry) sequentially:
| Entry | Type |
|-------------------|-----------------------|
| Name | section key |
| Type | byte |
| Count<sup>1</sup> | varint |
| Value(s) | (type-dependent data) |
<sup>1</sup> This is only present if the entry type has the array flag set
(see below).
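The entry layout can be illustrated by serializing one non-array entry in C. This sketch assumes a UInt32 value written as 4 little-endian bytes, consistent with the integer rule earlier, and it uses type code 6 from the entry type list. The function `put_uint32_entry`, the entry name `height` and the value 2,000,000 are hypothetical. No count field is written because the array flag is not set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SERIALIZE_TYPE_UINT32 6   /* from the entry type list */

/* Hypothetical helper: write one non-array UInt32 entry (section key,
 * type byte, 4-byte little-endian value). Returns bytes written, or 0 if
 * the name is longer than 255 bytes. */
size_t put_uint32_entry(const char *name, uint32_t value, uint8_t *out)
{
    assert(name != NULL && out != NULL);
    size_t len = strlen(name);
    if (len > 255)
        return 0;

    size_t pos = 0;
    out[pos++] = (uint8_t)len;                 /* Name: section key */
    memcpy(out + pos, name, len);
    pos += len;
    out[pos++] = SERIALIZE_TYPE_UINT32;        /* Type */
    for (int i = 0; i < 4; ++i)                /* Value, little-endian */
        out[pos++] = (uint8_t)(value >> (8 * i));
    return pos;
}
```

`put_uint32_entry("height", 2000000, buf)` produces 12 bytes: `06 68 65 69 67 68 74 06 80 84 1E 00`.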
#### Entry types
@ -90,7 +115,7 @@ The types defined are:
#define SERIALIZE_TYPE_UINT32 6
#define SERIALIZE_TYPE_UINT16 7
#define SERIALIZE_TYPE_UINT8 8
#define SERIALIZE_TYPE_DOUBLE 9
#define SERIALIZE_TYPE_STRING 10
#define SERIALIZE_TYPE_BOOL 11
#define SERIALIZE_TYPE_OBJECT 12
@ -103,11 +128,14 @@ The entry type can be bitwise OR'ed with a flag:
#define SERIALIZE_FLAG_ARRAY 0x80
```
This signals there are multiple *values* for the entry. Since only one bit is
reserved for marking an array, nested arrays cannot be represented directly.
However, you can place each inner array inside its own section and give the
outer array the type `SERIALIZE_TYPE_OBJECT | SERIALIZE_FLAG_ARRAY`.
Immediately following the type byte is a varint specifying the length of the
array. All the elements are then serialized in sequence, with no padding and
without any per-element type information. For example:
<p style="padding-left:1em; font:italic larger serif">type, count,
value<sub>1</sub>, value<sub>2</sub>,..., value<sub>n</sub></p>
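To illustrate the array layout, here is a C sketch that serializes the `array_of_bools` entry from the overall example. It assumes each bool occupies one byte (`00` or `01`), which this section does not state. `put_bool_array_entry` and `put_varint` are hypothetical names. `SERIALIZE_TYPE_BOOL` (11) OR'ed with `SERIALIZE_FLAG_ARRAY` (0x80) gives the type byte `0x8B`, and the count 4 encodes as the 1-byte varint `0x10`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SERIALIZE_TYPE_BOOL  11
#define SERIALIZE_FLAG_ARRAY 0x80

/* Hypothetical helper: write `v` as a varint, following the size table. */
static size_t put_varint(uint64_t v, uint8_t *out)
{
    assert(v <= 4611686018427387903ULL);    /* largest 8-byte varint */
    uint64_t marker = v <= 63 ? 0 : v <= 16383 ? 1 : v <= 1073741823 ? 2 : 3;
    size_t size = (size_t)1 << marker;      /* 1, 2, 4 or 8 bytes */
    uint64_t packed = (v << 2) | marker;
    for (size_t i = 0; i < size; ++i)
        out[i] = (uint8_t)(packed >> (8 * i));
    return size;
}

/* Hypothetical helper: write a named array-of-bools entry: key, flagged
 * type byte, varint count, then the elements back to back with no
 * per-element type bytes. Assumes one byte per bool. Returns bytes
 * written, or 0 if the name is longer than 255 bytes. */
size_t put_bool_array_entry(const char *name, const bool *vals,
                            size_t count, uint8_t *out)
{
    size_t len = strlen(name);
    if (len > 255)
        return 0;

    size_t pos = 0;
    out[pos++] = (uint8_t)len;                               /* key length */
    memcpy(out + pos, name, len);
    pos += len;
    out[pos++] = SERIALIZE_TYPE_BOOL | SERIALIZE_FLAG_ARRAY; /* 0x8B */
    pos += put_varint((uint64_t)count, out + pos);           /* count */
    for (size_t i = 0; i < count; ++i)                       /* values */
        out[pos++] = vals[i] ? 0x01 : 0x00;
    return pos;
}
```

For `[true, false, true, true]` it writes `0E`, then the 14 key bytes, then `8B 10 01 00 01 01`. The entry is 21 bytes in total.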
#### Entry values
@ -123,18 +151,32 @@ Note, I have not yet seen the type `SERIALIZE_TYPE_ARRAY` in use. My assumption
is this would be used for *untyped* arrays and so subsequent entries could be of
any type.
### Overall example
Let's put it all together and see what an entire object would look like serialized. To represent our data, let's create a JSON object (since it's a format
that most will be familiar with):
```json
{
"short_quote": "Give me liberty or give me death!",
"long_quote": "Monero is more than just a technology. It's also what the technology stands for.",
"signed_32bit_int": 20140418,
"array_of_bools": [true, false, true, true],
"nested_section": {
"double": -6.9,
"unsigned_64bit_int": 11111111111111111111
}
}
```
This would translate to:
![Epee binary storage format example](/docs/images/storage_binary_example.png)
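As a partial cross-check of the overall example, the C sketch below serializes only the two string entries from the JSON object. It does not reproduce the whole figure. The surrounding section, the other entries and the entry ordering are left out, and `put_string_entry` and `put_varint` are hypothetical names. The two quotes show both varint sizes for string lengths. The 33-byte short quote gets a 1-byte length (`84`), while the 80-byte long quote needs 2 bytes (`41 01`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SERIALIZE_TYPE_STRING 10   /* from the entry type list */

/* Hypothetical helper: write `v` as a varint, following the size table. */
static size_t put_varint(uint64_t v, uint8_t *out)
{
    assert(v <= 4611686018427387903ULL);    /* largest 8-byte varint */
    uint64_t marker = v <= 63 ? 0 : v <= 16383 ? 1 : v <= 1073741823 ? 2 : 3;
    size_t size = (size_t)1 << marker;      /* 1, 2, 4 or 8 bytes */
    uint64_t packed = (v << 2) | marker;
    for (size_t i = 0; i < size; ++i)
        out[i] = (uint8_t)(packed >> (8 * i));
    return size;
}

/* Hypothetical helper: write one string entry (section key, type byte,
 * varint length, raw bytes). Returns bytes written, or 0 if the name is
 * longer than 255 bytes. */
size_t put_string_entry(const char *name, const char *value, uint8_t *out)
{
    size_t name_len = strlen(name);
    size_t value_len = strlen(value);
    if (name_len > 255)
        return 0;

    size_t pos = 0;
    out[pos++] = (uint8_t)name_len;                  /* key length byte */
    memcpy(out + pos, name, name_len);
    pos += name_len;
    out[pos++] = SERIALIZE_TYPE_STRING;              /* type byte 0x0A */
    pos += put_varint((uint64_t)value_len, out + pos);
    memcpy(out + pos, value, value_len);
    return pos + value_len;
}
```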
## Monero specifics
### Entry values
#### Hashes, Keys, Blobs
These are stored as strings, `SERIALIZE_TYPE_STRING`.
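A Monero hash or key is 32 bytes. Storing one as a `SERIALIZE_TYPE_STRING` value therefore costs a single length byte, because 32 falls in the 1-byte varint range and (32 << 2) | 0b00 = 0x80. That `0x80` is a length prefix inside the value, not a type byte, so it is unrelated to `SERIALIZE_FLAG_ARRAY`. The sketch below writes that form. Its names (`HASH_SIZE`, `put_hash_as_string`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_SIZE 32   /* Monero hashes and keys are 32 bytes */

/* Hypothetical helper: serialize a 32-byte hash or key as a string value.
 * 32 <= 63, so the varint length is one byte: (32 << 2) | 0 = 0x80. */
size_t put_hash_as_string(const uint8_t hash[HASH_SIZE],
                          uint8_t out[1 + HASH_SIZE])
{
    assert(hash != NULL && out != NULL);
    out[0] = (uint8_t)(HASH_SIZE << 2);   /* 1-byte varint, size bits b00 */
    memcpy(out + 1, hash, HASH_SIZE);
    return 1 + HASH_SIZE;
}
```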
