What makes Fastify highly performant part 1: JSON serialization

At Indy, we use Fastify as the framework for our backend servers. Fastify is a Node.js framework for building web servers. According to its own benchmarks, it outperforms most of the common Node.js frameworks. Some may argue that such benchmarks are of limited relevance because real performance depends on usage. That may be true, but it’s not our topic today. Instead, I’m going to look at some of the interesting design decisions that make the framework fast. This is the first article in a series, and it focuses on JSON serialization.

Since JSON is by far the dominant format for HTTP request and response bodies, JSON serialization can have a big impact on the performance of a web server.

The native function: JSON.stringify

JavaScript has a native function for JSON serialization: JSON.stringify. But because JavaScript values are dynamically typed, it has to perform many checks at runtime to determine how to serialize each property of its input according to its type, which makes it hard to optimize.
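To make those runtime checks concrete, here is a small sketch of the decisions JSON.stringify has to make value by value. The sample object and its field names are made up for illustration.

```javascript
// Hypothetical sample object: each property needs a different treatment,
// which JSON.stringify can only discover by inspecting the value at runtime.
const sample = {
  name: 'Foo',             // string: quoted and escaped
  age: 20,                 // finite number: written as-is
  ratio: NaN,              // non-finite number: becomes null
  nickname: undefined,     // undefined: the key is dropped entirely
  createdAt: new Date(0),  // object with a toJSON() method: that method is called
  tags: ['a', 'b'],        // array: each element is inspected in turn
};

const output = JSON.stringify(sample);
console.log(output);
// {"name":"Foo","age":20,"ratio":null,"createdAt":"1970-01-01T00:00:00.000Z","tags":["a","b"]}
```

None of these branches can be skipped, because the function knows nothing about its input until it looks at it.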

How Fastify does it

So here comes the interesting part: how does Fastify achieve significantly better JSON serialization performance than JSON.stringify? The key is JSON Schema. We know that part of the cost of JSON.stringify comes from dynamic type checks at runtime, so what if we knew beforehand the structure of the object we need to serialize (its keys and the type of each value)? We could build a function from the JSON Schema and save all the time spent checking types at runtime. That’s exactly what fast-json-stringify does. It is the library that Fastify uses internally for JSON serialization: it takes a JSON Schema object as its argument and generates a custom stringify function for later use.

Dive into the code

Let’s run a simple experiment to see how stringify functions are generated in fast-json-stringify.

Imagine that we want to serialize the following JSON structure:

{
	"firstName": "Foo",
	"lastName": "Bar",
	"age": 20
}

First of all, we need a JSON Schema that defines the structure, for example:

const schema = {
	"title": "Person schema",
	"type": "object",
	"properties": {
		"firstName": {
			"type": "string"
		},
		"lastName": {
			"type": "string"
		},
		"age": {
			"type": "integer",
			"minimum": 0
		}
	},
	"required": ["firstName", "lastName"]
}

The next step is to build a function, based on the JSON Schema, that serializes our object to a string. Since we have the schema, we can simply loop over its properties and convert each value to a string according to its type. For example, firstName is of type string, so we need to wrap its value in double quotes, and the result will be "firstName": "${obj.firstName}".

function build(schema) {
  return function(obj) {
    let json = '{';
    Object.keys(schema.properties).forEach((key, i, array) => {
      json += `"${key}":`;
      const type = schema.properties[key].type;
      switch (type) {
        case 'string':
          json += `"${obj[key]}"`;
          break;
        case 'integer':
          json += '' + obj[key];
          break;
        default:
          throw new Error(`${type} unsupported`);
      }
      if (i < array.length - 1) {
        json += ',';
      }
    });
    json += '}';
    return json;
  }
}

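// serialization
const obj = { firstName: 'Foo', lastName: 'Bar', age: 20 };
const customStringify = build(schema);
customStringify(obj); // '{"firstName":"Foo","lastName":"Bar","age":20}'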

In a benchmark, it’s over 30% faster than JSON.stringify:

JSON.stringify x 4,586,512 ops/sec ±0.42% (99 runs sampled)
custom-json-stringify x 6,236,179 ops/sec ±0.51% (90 runs sampled)
Fastest is custom-json-stringify
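The harness behind these numbers is not shown. A rough way to reproduce the comparison is sketched below, assuming a recent Node.js where performance.now() is available as a global. The names buildClosure and opsPerSec are hypothetical, and a hand-rolled timing loop is much cruder than a dedicated benchmarking library, so the absolute figures will differ from the ones above.

```javascript
// Compact variant of the closure-based build() from above, limited to the
// 'string' and 'integer' types used by the example schema.
function buildClosure(schema) {
  return function (obj) {
    let json = '{';
    Object.keys(schema.properties).forEach((key, i, array) => {
      const type = schema.properties[key].type;
      json += `"${key}":`;
      if (type === 'string') json += `"${obj[key]}"`;
      else if (type === 'integer') json += '' + obj[key];
      else throw new Error(`${type} unsupported`);
      if (i < array.length - 1) json += ',';
    });
    return json + '}';
  };
}

// Hypothetical helper: call fn repeatedly and report calls per second.
function opsPerSec(fn, iterations = 200000) {
  let sink = 0; // consume every result so the calls cannot be optimized away
  const start = performance.now();
  for (let i = 0; i < iterations; i++) sink += fn().length;
  const elapsedMs = performance.now() - start;
  return { ops: Math.round(iterations / (elapsedMs / 1000)), sink };
}

const schema = {
  type: 'object',
  properties: {
    firstName: { type: 'string' },
    lastName: { type: 'string' },
    age: { type: 'integer' },
  },
};
const person = { firstName: 'Foo', lastName: 'Bar', age: 20 };
const customStringify = buildClosure(schema);

console.log('JSON.stringify  ', opsPerSec(() => JSON.stringify(person)).ops, 'ops/sec');
console.log('custom stringify', opsPerSec(() => customStringify(person)).ops, 'ops/sec');
```

Summing the length of every result is a cheap way to keep the engine from discarding the work being measured.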

The function above saves some type checking at runtime, but it’s still not optimal. Every time we call the stringify function, we have to iterate over the properties of the JSON Schema and work out how to serialize each value. The schema does not change once it is set, so can we do that iteration once, before the stringify function is ever called? What we need is a “generated” function rather than a closure that has access to the JSON Schema. JavaScript offers a way to build code dynamically and execute it as a function: new Function(). Here is the improved version:

function build(schema) {
  let code = `
    'use strict'
    let json = '{'
  `

  Object.keys(schema.properties).forEach((key, i, array) => {
    code += `
      json += '"${key}":'
    `

    const type = schema.properties[key].type
    switch (type) {
      case 'string':
        code += `
          json += '"' + obj.${key} + '"'
        `
        break;
      case 'integer':
        code += `
          json += '' + obj.${key}
        `
        break;
      default:
        throw new Error(`${type} unsupported`)
    }

    if (i < array.length - 1) {
      code += "json += ','"
    }
  })

  code += `
    json += '}'
    return json
  `
  return new Function('obj', code)
}

The generated function looks like this:

function stringify(obj) {
  let json = '{'
  json += '"firstName":'
  json += '"' + obj.firstName + '"'
  json += ','
  json += '"lastName":'
  json += '"' + obj.lastName + '"'
  json += ','
  json += '"age":'
  json += '' + obj.age
  json += '}'
  return json
}
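One way to check this yourself is to print the source of the generated function and compare its output with JSON.stringify. The sketch below uses a compact variant of the build() function above rather than the exact code, and the person object is just sample data.

```javascript
// Compact variant of the new Function()-based build() from above.
function build(schema) {
  const keys = Object.keys(schema.properties);
  let code = "'use strict'\nlet json = '{'\n";
  keys.forEach((key, i) => {
    const type = schema.properties[key].type;
    code += `json += '"${key}":'\n`;
    if (type === 'string') code += `json += '"' + obj.${key} + '"'\n`;
    else if (type === 'integer') code += `json += '' + obj.${key}\n`;
    else throw new Error(`${type} unsupported`);
    if (i < keys.length - 1) code += `json += ','\n`;
  });
  code += "json += '}'\nreturn json\n";
  return new Function('obj', code);
}

const schema = {
  type: 'object',
  properties: {
    firstName: { type: 'string' },
    lastName: { type: 'string' },
    age: { type: 'integer' },
  },
};
const stringify = build(schema);

// Function.prototype.toString() returns the source text of the generated
// function, so we can see that only string concatenation is left.
console.log(stringify.toString());

const person = { firstName: 'Foo', lastName: 'Bar', age: 20 };
console.log(stringify(person));                            // {"firstName":"Foo","lastName":"Bar","age":20}
console.log(stringify(person) === JSON.stringify(person)); // true
```

The outputs only match for simple values like these: this sketch, like the code above, does not escape quotes or control characters inside strings.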

In the end, calling the stringify function comes down to a few string concatenations. The benchmark shows that the performance is significantly boosted:

JSON.stringify x 4,653,809 ops/sec ±0.28% (97 runs sampled)
fast-json-stringify x 1,032,584,240 ops/sec ±0.23% (100 runs sampled)
Fastest is fast-json-stringify

Conclusion

The code above is from the first commit of fast-json-stringify. Of course, the actual library is more complex than that. For example, it needs to validate the input to make sure that it has the correct structure, and it should deal with circular references. It also has to sanitize the JSON Schema for security reasons: as we saw above, it uses new Function() internally, so it is safer never to pass it an unknown or user-generated schema.
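To see why the schema matters for security, here is a minimal sketch of what can go wrong when property names are pasted into generated code unchecked. It uses a compact variant of the build() function above. The checkKeys option and the SAFE_KEY pattern are hypothetical and only illustrate the idea; they are not how fast-json-stringify sanitizes schemas.

```javascript
// Deliberately strict, hypothetical allow-list: plain ASCII identifiers only.
const SAFE_KEY = /^[A-Za-z_$][A-Za-z0-9_$]*$/;

// Compact variant of the new Function()-based build(), with an optional key check.
function build(schema, { checkKeys = false } = {}) {
  const keys = Object.keys(schema.properties);
  let code = "'use strict'\nlet json = '{'\n";
  keys.forEach((key, i) => {
    if (checkKeys && !SAFE_KEY.test(key)) {
      throw new Error(`unsafe property name: ${key}`);
    }
    const type = schema.properties[key].type;
    code += `json += '"${key}":'\n`;
    if (type === 'string') code += `json += '"' + obj.${key} + '"'\n`;
    else if (type === 'integer') code += `json += '' + obj.${key}\n`;
    else throw new Error(`${type} unsupported`);
    if (i < keys.length - 1) code += `json += ','\n`;
  });
  code += "json += '}'\nreturn json\n";
  return new Function('obj', code);
}

// A schema whose property name smuggles an extra statement into the code.
const hostileSchema = {
  type: 'object',
  properties: {
    'name; globalThis.injected = 1; obj.name': { type: 'string' },
  },
};

// Without the check, the smuggled statement runs when the function is called.
build(hostileSchema)({ name: 'Foo' });
console.log(globalThis.injected); // 1

// With the check, the schema is rejected before any code is generated.
let rejected = false;
try {
  build(hostileSchema, { checkKeys: true });
} catch (err) {
  rejected = true;
}
console.log(rejected); // true
```

Here the injected statement only sets a global flag, but the same hole would let a hostile schema run arbitrary code in the server process.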

The idea behind the library is simple yet clever: it moves the analysis from runtime to “compile time” to gain performance.
