Borsh, binary serializer for security-critical projects.
Borsh stands for Binary Object Representation Serializer for Hashing. It is meant for security-critical projects because it prioritizes consistency, safety, and speed, and comes with a strict specification. It optimizes for the following qualities, in decreasing order of priority:
- Consistency
  - Consistency means there is a bijective mapping between objects and their binary representations: no two binary representations deserialize into the same object. This is extremely useful for applications that compute hashes over the binary representation;
- Safety
  - Borsh implementations use safe coding practices. In Rust, Borsh uses almost exclusively safe code, with one exception that guards against exhaustion attacks;
- Specification
  - Borsh comes with a full specification that can be used to write implementations in other languages;
- Speed
  - In Rust, Borsh achieves high performance by opting out of Serde, which makes it faster than bincode in some cases and also reduces code size.
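The consistency property can be illustrated on the decoding side with a minimal sketch. The function names below are hypothetical and use only the standard library; they are not the borsh crate's API. The idea is that a strict decoder rejects every byte sequence that is not the one canonical encoding of a value, so two different inputs can never decode to the same `bool`.

```rust
/// Encoder for a one-byte boolean: `false` is 0, `true` is 1.
/// Hypothetical helper, written for this illustration only.
fn encode_bool(value: bool) -> Vec<u8> {
    vec![value as u8]
}

/// Strict decoder: accepts exactly one byte, and only the values 0 and 1.
/// Anything else (other byte values, missing or trailing bytes) is an error,
/// so each `bool` has a single valid binary representation.
fn decode_bool_strict(bytes: &[u8]) -> Result<bool, String> {
    match bytes {
        [0] => Ok(false),
        [1] => Ok(true),
        [other] => Err(format!("invalid bool byte: {}", other)),
        _ => Err(format!("expected exactly 1 byte, got {}", bytes.len())),
    }
}
```

With these helpers, `decode_bool_strict(&[2])` and `decode_bool_strict(&[1, 0])` both return an error instead of being leniently read as `true`.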
Example
    use borsh::{BorshSerialize, BorshDeserialize};

    #[derive(BorshSerialize, BorshDeserialize, PartialEq, Debug)]
    struct A {
        x: u64,
        y: String,
    }

    #[test]
    fn test_simple_struct() {
        let a = A {
            x: 3301,
            y: "liber primus".to_string(),
        };
        // Serialize by reference so `a` can still be compared afterwards.
        let encoded_a = borsh::to_vec(&a).unwrap();
        // The turbofish names the type to deserialize into.
        let decoded_a = borsh::from_slice::<A>(&encoded_a).unwrap();
        assert_eq!(a, decoded_a);
    }
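To make the byte layout of this example concrete, here is a hand-rolled sketch that encodes the same two fields using only the standard library. The function `encode_a_by_hand` is hypothetical and is not how the borsh crate is implemented; it simply applies the rules from the specification (little-endian integers, u32 length prefix for strings).

```rust
/// Hand-written encoding of a value shaped like `A { x: u64, y: String }`.
/// Illustration only; in practice the derive macros generate this logic.
fn encode_a_by_hand(x: u64, y: &str) -> Vec<u8> {
    let mut out = Vec::new();
    // Field `x`: 8 bytes, little-endian.
    out.extend_from_slice(&x.to_le_bytes());
    // Field `y`: byte length as a little-endian u32, then the UTF-8 bytes.
    out.extend_from_slice(&(y.len() as u32).to_le_bytes());
    out.extend_from_slice(y.as_bytes());
    out
}
```

For `x = 3301` and `y = "liber primus"` this sketch produces 24 bytes: 8 for the integer (starting with `0xE5, 0x0C`), 4 for the length prefix (`12, 0, 0, 0`), and 12 for the string contents. Fields appear in declaration order with no names or type tags in between.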
Opting out of Serde allows Borsh to have some features that are currently not available for Serde-compatible serializers. Currently we support two features: borsh_init and borsh_skip (the former is not available in Serde). See https://github.com/nearprotocol/borsh
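The following sketch shows, by hand, what these two features are understood to mean. It rests on assumptions not stated above: that a skipped field is left out of the serialized bytes and filled with its default value on deserialization, and that the init feature runs a user-supplied method right after deserialization. The `Cart` type and its methods are hypothetical and use only the standard library, not the borsh attributes themselves.

```rust
/// Hypothetical struct: `total` plays the role of a skipped field,
/// and `init` plays the role of the post-deserialization hook.
#[derive(Debug, PartialEq)]
struct Cart {
    prices: Vec<u32>,
    total: u64, // derived data, never written to the byte stream
}

impl Cart {
    /// Recomputes derived state; meant to run right after decoding.
    fn init(&mut self) {
        self.total = self.prices.iter().map(|p| *p as u64).sum();
    }

    /// Writes only `prices`: a u32 count, then little-endian u32 values.
    fn encode(&self) -> Vec<u8> {
        let mut out = (self.prices.len() as u32).to_le_bytes().to_vec();
        for price in &self.prices {
            out.extend_from_slice(&price.to_le_bytes());
        }
        out
    }

    /// Reads `prices`, fills the skipped field with its default, calls `init`.
    fn decode(bytes: &[u8]) -> Option<Cart> {
        if bytes.len() < 4 {
            return None;
        }
        let count = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
        let body = &bytes[4..];
        // Reject inputs with missing or trailing bytes.
        if Some(body.len()) != count.checked_mul(4) {
            return None;
        }
        let prices = body
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        let mut cart = Cart { prices, total: u64::default() };
        cart.init();
        Some(cart)
    }
}
```

In this sketch the derived `total` costs no bytes on the wire and cannot disagree with `prices` after a round trip, because it is always recomputed.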
Benchmarks
We measured the following benchmarks on the objects that blockchain projects care about most: blocks, block headers, transactions, and accounts. We took the object structures from the nearprotocol blockchain and used Criterion to build the following graphs. The benchmarks were run on a Google Cloud n1-standard-2 instance (2 vCPUs, 7.5 GB memory). Note that size only roughly corresponds to serialization complexity, which is why the graphs are not smooth.
See complete report here.
Specification
In short, Borsh is a non-self-describing binary serialization format. It is designed to serialize any object to a canonical and deterministic sequence of bytes.
General principles:
- integers are little endian;
- sizes of dynamic containers are written before values as u32;
- all unordered containers (hashmap/hashset) are ordered lexicographically by key (with ties broken by value);
- structs are serialized in the order of fields in the struct;
- enums are serialized using a u8 for the enum ordinal, followed by the data inside the enum value (if present).
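The principles above can be sketched with a few hand-written encoders. These use only the standard library; the function names and the `Shape` enum are hypothetical, chosen for illustration, and are not part of any Borsh implementation.

```rust
use std::collections::HashMap;

/// Dynamic container: u32 element count (little-endian), then each element.
fn encode_vec_u16(values: &[u16]) -> Vec<u8> {
    let mut out = (values.len() as u32).to_le_bytes().to_vec();
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Unordered container: entries are sorted by key before writing, so the
/// output does not depend on the map's internal iteration order.
fn encode_map_u8_u8(map: &HashMap<u8, u8>) -> Vec<u8> {
    let mut entries: Vec<(u8, u8)> = map.iter().map(|(k, v)| (*k, *v)).collect();
    entries.sort();
    let mut out = (entries.len() as u32).to_le_bytes().to_vec();
    for (k, v) in entries {
        out.push(k);
        out.push(v);
    }
    out
}

/// Hypothetical enum used only for this illustration.
enum Shape {
    Empty,
    Square(u32),
}

/// Enum: u8 ordinal of the variant, then the variant's data if it has any.
fn encode_shape(shape: &Shape) -> Vec<u8> {
    match shape {
        Shape::Empty => vec![0],
        Shape::Square(side) => {
            let mut out = vec![1];
            out.extend_from_slice(&side.to_le_bytes());
            out
        }
    }
}
```

Two maps holding the same entries, inserted in different orders, encode to identical bytes here, which is what makes the output suitable for hashing.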
Informal type | Rust EBNF * | Pseudocode |
---|---|---|
Integers | integer_type: ["u8" \| "u16" \| "u32" \| "u64" \| "u128" \| "i8" \| "i16" \| "i32" \| "i64" \| "i128" ] | little_endian(x) |
Floats | float_type: ["f32" \| "f64" ] | err_if_nan(x) little_endian(x as integer_type) |
Unit | unit_type: "()" | We do not write anything |
Bool | boolean_type: "bool" | if x { repr(1 as u8) } else { repr(0 as u8) } |
Fixed sized arrays | array_type: '[' ident ';' literal ']' | for el in x { repr(el as ident) } |
Dynamic sized array | vec_type: "Vec<" ident '>' | repr(x.len() as u32) for el in x { repr(el as ident) } |
Struct | struct_type: "struct" ident fields | repr(fields) |
Fields | fields: [named_fields \| unnamed_fields] | |
Named fields | named_fields: '{' ident_field0 ':' ident_type0 ',' ident_field1 ':' ident_type1 ',' ... '}' | repr(ident_field0 as ident_type0) repr(ident_field1 as ident_type1) ... |
Unnamed fields | unnamed_fields: '(' ident_type0 ',' ident_type1 ',' ... ')' | repr(x.0 as ident_type0) repr(x.1 as ident_type1) ... |
Enum | enum: 'enum' ident '{' variant0 ',' variant1 ',' ... '}' variant: ident [ fields ] ? | Suppose X is the number of the variant that the enum takes. repr(X as u8) repr(x.X as fieldsX) |
HashMap | hashmap: "HashMap<" ident0 ',' ident1 ">" | repr(x.len() as u32) for (k, v) in x.sorted_by_key() { repr(k as ident0) repr(v as ident1) } |
HashSet | hashset: "HashSet<" ident ">" | repr(x.len() as u32) for el in x.sorted() { repr(el as ident) } |
Option | option_type: "Option<" ident '>' | if x.is_some() { repr(1 as u8) repr(x.unwrap() as ident) } else { repr(0 as u8) } |
Result | result_type: "Result<" ident0 ',' ident1 '>' | if x.is_ok() { repr(1 as u8) repr(x.unwrap() as ident0) } else { repr(0 as u8) repr(x.unwrap_err() as ident1) } |
String | string_type: "String" | encoded = utf8_encoding(x) as Vec<u8> repr(encoded.len() as u32) repr(encoded as Vec<u8>) |
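A few rows of the table translate directly into standard-library Rust. The sketch below is one possible reading of the pseudocode for floats, Option, Result and String; the function names are hypothetical, and the concrete element types (`u16`, `u8`) are chosen only to keep the example short.

```rust
/// Floats: reject NaN, then write the IEEE 754 bit pattern little-endian.
fn encode_f64(x: f64) -> Result<Vec<u8>, String> {
    if x.is_nan() {
        return Err("NaN cannot be serialized".to_string());
    }
    Ok(x.to_bits().to_le_bytes().to_vec())
}

/// String: byte length of the UTF-8 encoding as u32, then the bytes.
fn encode_string(s: &str) -> Vec<u8> {
    let mut out = (s.len() as u32).to_le_bytes().to_vec();
    out.extend_from_slice(s.as_bytes());
    out
}

/// Option: tag byte 1 followed by the value, or tag byte 0 alone.
fn encode_option_u16(x: Option<u16>) -> Vec<u8> {
    match x {
        Some(v) => {
            let mut out = vec![1];
            out.extend_from_slice(&v.to_le_bytes());
            out
        }
        None => vec![0],
    }
}

/// Result: tag byte 1 then the Ok value, or tag byte 0 then the Err value.
fn encode_result(x: &Result<u8, String>) -> Vec<u8> {
    match x {
        Ok(v) => vec![1, *v],
        Err(e) => {
            let mut out = vec![0];
            out.extend(encode_string(e));
            out
        }
    }
}
```

Note that the string length prefix counts bytes, not characters: `encode_string("é")` writes a length of 2 because `é` takes two bytes in UTF-8.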
Borsch or Borscht is an extremely tasty sour soup common in Eastern Europe and Northern Asia. The primary ingredients are beetroots or tomatoes that give the dish its distinctive red color.
The similarity between the name of the serializer and the fact that many members of the development team are extreme fans of this savory dish is entirely coincidental.