Effective serde
By Writing Less Rust Code

Topics On
The Rust Programming Language
Edition 2018

Daniel Joseph Pezely

17 April 2019

The following was presented at the Vancouver Rust meeting on 17 April 2019.

This covers a powerful library for the Rust programming language, whereby paradoxically one’s software benefits by writing less code overall.

*   *   *

“My deep hierarchy of data structures is too complicated for auto-conversion.”

–someone not using serde

Contents:

  1. The Way Of Serde
  2. A Realistic Example – minimalist data files
  3. Simple Hierarchy of Enums – simple tricks
  4. Untagged Enums – “… indistinguishable from magic”
  5. Renaming Variants – Pretty JSON and prettier Rust
  6. Error Handling – using ? early and often
  7. Flattening – but still writing less code
  8. Asymmetric JSON – populate Rust fields only when JSON is non-null

Of course, all this applies to far more than just JSON, but JSON is easier for presentation purposes here and is likely familiar to a general programming audience.

Take time to read serde.rs entirely
before jumping into API docs at crates.io/crates/serde

You’ll find it time well-invested!

(Spoilers: it’s resolved entirely at compile-time, and without run-time “reflection” mechanisms.)

1. The Way Of Serde

Let serde give you superpowers by relying upon:

I. Decorate structs & enums with attributes
II. Write methods of auto-convert traits
III. Coalesce errors via ? operator
IV. Bonus: Deep or mixed structures? Easy!

I. Decorate structs with attributes

Attributes in Rust are like decorators in Python. These are compiler directives for code-generation and related capabilities. The syntax is a hash symbol (#) followed by a clause within square brackets.

See serde.rs/attributes.html

II. Write methods of auto-convert traits

If writing code handling common patterns: that’s probably the wrong approach!

If writing code to handle name or value conversions: that’s probably the wrong approach!

If checking for existence of nulls or special values: that’s probably the wrong approach!

III. Coalesce errors via ? operator

Make aggressive use of ? operator; e.g., use Result and ErrorKind together

Implement the From trait for each error type you need to convert (the matching Into comes for free). The compiler reveals exactly what you need, so this becomes fairly straightforward plug-and-chug.

A common Rust idiom, not just a serde thing, is using the question-mark operator.

IV. Deep or mixed structures? Easy!

Populate nested enums and their variants from a flattened set. The one requirement is that each value maps to exactly one variant of exactly one enum. Then, the nested enums may be resolved by decorating the outer enum with a single attribute.

Ingest minimal data file structures to well-defined structures in Rust. For example, JSON without naming each structural component, where keys contain data (NOT name of struct).

Thus, have your idiomatic Rust cake and eat minimalist data files too!

(For non-native English speakers, “wanting to have your cake and eat it too” simply indicates the impression of a paradox. For those using serde, however, there is no paradox at all.)

2. A Realistic Example

  1. Each entry may have multiple categories
  2. Given as a flattened set in JSON
  3. Expand to well-defined structs in Rust

Unpacking Minimalist JSON:

{
  "energy-preferences": {
    "2000s": ["solar", "wind"],
    "1900s": ["kerosene", "soy", "peanut", "petroleum"],
    "1800s": ["wind", "whale", "seal", "kerosene"]
  }
}

Notable:

Starting from the top: serde can handle various naming conventions such as snake_case, camelCase, PascalCase, kebab-case, etc.

use serde::{Deserialize, Serialize};
use std::collections::HashMap;

#[derive(Serialize, Deserialize, Debug)]
#[serde(rename_all = "kebab-case")]
struct EnergyPreferenceHistory {
    energy_preferences: EnergyPreferences,
}

#[derive(Serialize, Deserialize, Debug)]
struct EnergyPreferences(HashMap<Century, Vec<EnergySources>>);

See serde.rs/attributes.html and particularly, serde.rs/container-attrs.html

Avoid merging concepts in an enum. For instance, avoid the following.

enum EnergySources {  // Don't mix categories like this!
    Solar,
    Wind,
    // ...
    Kerosene,
    Petroleum,
    // ...
    PeanutOil,
    SoyOil,
    // ...
    SealBlubber,
    WhaleBlubber,
    // ... 
}

Grouping them by category instead would be more idiomatic Rust.

3. Simple Hierarchy Of Enums

Continuing from previous example:

enum EnergySources {
    Sustainable(Inexhaustible),
    Animal(Blubber),
    Vegetable(Crop),
    Mineral(Fossil),
}

enum Inexhaustible { Solar, Wind, /* ... */ }

enum Blubber { Seal, Whale, /* ... */ }

enum Crop { Peanut, Soy, /* ... */ }

enum Fossil { Kerosene, Petroleum, /* ... */ }

This is more idiomatic Rust, but our data file doesn’t look anything like this… Fear not!

(As an aside, focus on the Rust code, not precision of the categories above. For instance, pulp or pellets made from trees or other vegetable matter are all ignored here yet were in common usage during the late Nineteenth and early Twentieth Century within North America. Other divisions or categories might be better, such as petrochemical, oleochemical, etc. Or rendered, cultivated, extracted, etc.)

4. Untagged Enums

Decorate With Attributes. Attributes are a feature of the Rust language and used extensively for fine-tuning how serde and serde_json behave.

Continuing from previous example:

#[derive(Serialize, Deserialize, Debug)]
#[serde(untagged)]                 // <-- Unflatten from compact JSON
enum EnergySources {
    Sustainable(Inexhaustible),
    Animal(Blubber),
    Vegetable(Crop),
    Mineral(Fossil),
}

See “Untagged” section in serde.rs/enum-representations.html

5. Renaming Variants

For both pretty JSON and prettier Rust, use attributes to control naming of a field or variant. Then, one context gets to use a name that makes the most sense there and perhaps an entirely different name for the other context.

#[derive(Serialize, Deserialize, Debug, PartialEq, Eq, Hash)]
enum Century {
    #[serde(rename = "1800s")]
    NineteenthCentury,

    #[serde(rename = "1900s")]
    TwentiethCentury,

    #[serde(rename = "2000s")]
    TwentyfirstCentury
}

Each has its preferred naming convention. Rust code gets idiomatic mixed case naming for enum variants, and JSON uses a more terse mnemonic for readability there.

Note the additional derived traits: PartialEq, Eq, Hash. These allow Century to serve as the key of a hash table. Sorting, or use as a key in a tree structure such as BTreeMap, would also require PartialOrd and Ord.
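As a sketch of what each derive buys: Eq and Hash are what a HashMap key needs, while ordering comes from PartialOrd and Ord. The two ordering derives below are an addition for this illustration (the Century above derives only PartialEq, Eq, Hash), the serde attributes are left out to keep the block standard-library only, and the helper function names are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

// PartialOrd and Ord are additions for this sketch; serde attributes omitted.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
enum Century {
    NineteenthCentury,
    TwentiethCentury,
    TwentyfirstCentury,
}

/// Eq + Hash are sufficient for use as a HashMap key.
fn as_hash_map() -> HashMap<Century, Vec<&'static str>> {
    let mut prefs = HashMap::new();
    prefs.insert(Century::TwentyfirstCentury, vec!["solar", "wind"]);
    prefs.insert(Century::TwentiethCentury, vec!["kerosene", "petroleum"]);
    prefs.insert(Century::NineteenthCentury, vec!["wind", "whale"]);
    prefs
}

/// Ord is what lets a BTreeMap keep its keys sorted; a derived Ord
/// follows the order in which the variants are declared.
fn sorted_keys(prefs: &HashMap<Century, Vec<&'static str>>) -> Vec<Century> {
    let ordered: BTreeMap<Century, &Vec<&'static str>> =
        prefs.iter().map(|(century, sources)| (*century, sources)).collect();
    ordered.keys().copied().collect()
}
```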

6. Error Handling

Use the question mark operator, ?, early and often. This operator expands to roughly a match on a Result: an Ok is unwrapped to its value, while an Err is converted via From into the function’s error type and returned immediately.

As of Rust 1.26 (May 2018), its use is allowed within the main function as well.

fn main() -> Result<(), ErrorKind> {
    let json_string = fs::read_to_string("energy.json")?;

    let sources: EnergyPreferenceHistory =
        serde_json::de::from_str(&json_string)?;

    println!("{:#?}", sources);
    Ok(())
}

Note the two uses of the question mark ? operator above.

Implementing just the above, the compiler helpfully tells you exactly which impl From methods to add.
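To see why the compiler asks for impl From, it helps to write out by hand roughly what a single ? does. The sketch below uses only the standard library; the one-variant ErrorKind and both function names are hypothetical stand-ins, and the match is an approximation of the expansion rather than the compiler's exact output.

```rust
use std::num::ParseIntError;

// Hypothetical, cut-down error type for this illustration.
#[derive(Debug, PartialEq)]
enum ErrorKind {
    BadNumber,
}

// Without this impl, the ? below fails to compile, and the error
// message names exactly this missing conversion.
impl From<ParseIntError> for ErrorKind {
    fn from(_err: ParseIntError) -> ErrorKind {
        ErrorKind::BadNumber
    }
}

/// The idiomatic form.
fn double_with_question_mark(text: &str) -> Result<i32, ErrorKind> {
    let n: i32 = text.trim().parse()?;
    Ok(n * 2)
}

/// Roughly what the ? above expands to.
fn double_with_match(text: &str) -> Result<i32, ErrorKind> {
    let n: i32 = match text.trim().parse::<i32>() {
        Ok(value) => value,
        // Convert the error via From, then return early.
        Err(err) => return Err(From::from(err)),
    };
    Ok(n * 2)
}
```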

As an example ErrorKind for use with the Result type, continuing from the previous example:

#[derive(Debug)]
enum ErrorKind {
    BadJson,
    NoJson,
    NoFilePath,
    Unknown,
}

Implementing From methods for use with the ? operator is aided by the compiler because it helpfully informs which methods are missing.

As an exercise, comment-out all impl From and see how the compiler indicates exactly what needs to be written. Then, it’s a matter of taste regarding how deep you go into addressing each particular error to your own ErrorKind.

There’s lots to love about Rust!

impl From<serde_json::Error> for ErrorKind {

    fn from(err: serde_json::Error) -> ErrorKind {
        use serde_json::error::Category;

        match err.classify() {

            Category::Io => {
                println!("Serde JSON IO-error: {:?}", &err);
                ErrorKind::NoJson
            }

            Category::Syntax | Category::Data | Category::Eof => {
                println!("Serde JSON error: {:?} {:?}",
                         err.classify(), &err);
                ErrorKind::BadJson
            }

        }
    }
}
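The main function shown earlier also applies ? to fs::read_to_string, which fails with std::io::Error, so a second conversion is needed before it compiles. A minimal sketch follows. The mapping of NotFound to NoFilePath and of everything else to Unknown is an assumption made here, not something the talk specifies; read_text is a hypothetical helper, and ErrorKind is repeated (with PartialEq added) so the block stands alone.

```rust
use std::fs;
use std::io;

// Same variants as the talk's ErrorKind; PartialEq added for comparisons.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum ErrorKind {
    BadJson,
    NoJson,
    NoFilePath,
    Unknown,
}

impl From<io::Error> for ErrorKind {
    fn from(err: io::Error) -> ErrorKind {
        // io::ErrorKind is the standard library's own classification,
        // distinct from the ErrorKind enum defined above.
        match err.kind() {
            io::ErrorKind::NotFound => ErrorKind::NoFilePath,
            _ => {
                eprintln!("IO error: {:?}", err);
                ErrorKind::Unknown
            }
        }
    }
}

/// Hypothetical helper: the ? here relies on the impl above.
fn read_text(path: &str) -> Result<String, ErrorKind> {
    let text = fs::read_to_string(path)?;
    Ok(text)
}
```

How finely to map each io::ErrorKind onto your own variants is, again, a matter of taste.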

7. Flattening

Another powerful feature of serde that lets you write less code, called flattening, transparently hides one layer of nesting. A common use case is to eliminate a wrapper struct or the name of an inner hash-table.

For instance:

#[derive(Serialize, Deserialize)]
struct CatalogueEntry {
    id: u64,

    #[serde(flatten)]                     // <-- Field Attribute
    description: HashMap<String, String>,
}

The above Rust code would ultimately produce the following JSON representation:

{
  "id":     1234,
  "size":   "bigger than a house",
  "weight": "less than an airplane"
}

Note that all fields get rendered to same level within JSON.

See serde.rs/field-attrs.html.

For writing the preceding item to a JSON file (wrapped in a one-element list, and without the pretty-printing), the corresponding Rust code would be:

fn populate_catalogue() -> Result<(), ErrorKind> {
    let id = 1234;

    let mut description = HashMap::new();
    description.insert("size".to_string(),
                       "bigger than a house".to_string());
    description.insert("weight".to_string(),
                       "less than an airplane".to_string());

    let catalogue = vec![CatalogueEntry{id, description}];

    fs::write("foo.json", serde_json::to_string(&catalogue)?)?;
    Ok(())
}

There’s nothing special here, because serde already implements its traits for standard collections such as Vec and HashMap. You just derive the traits on your own types.

8. Asymmetric JSON

Populating fields only when non-null makes for smaller data files, so there’s less to store, less to send over the network, less for a human to read, etc.

#[derive(Deserialize, Debug)]
struct Thing {
    pub keyword: String,

    #[serde(default="Vec::new")]   // <-- constructor
    pub attributes: Vec<String>,
}

When the key is absent from the JSON, this yields an empty Vec rather than a Vec holding an empty string, and it gets done without wrapping the value with Option.

In other words, serde simply Does The Right Thing for you.

Finally

As incentive to read serde.rs documentation, be especially certain to see the section, Borrowing data in a derived impl.

When data has already been loaded and memory allocated: let your deserialized structs track only references.

Copyright © 2019 Daniel Joseph Pezely
May be licensed via Creative Commons Attribution.