Tracking configuration updates over time

When we think of user configuration, we usually represent it as a single, global value. For example, a user has one email address and one preferred display language. Either can be updated at any time, replacing the old value. When we need to send an email, we just look up the latest values; we don’t need to know what the previous ones were.

However, we may need to know what these configurations were at any given time. At Indy, our users may change their fiscal preferences depending on their obligations across the years. Current user settings may not be the same as they were 6 months ago, but we still have to know and support previous configurations in case the user needs to amend their previous declarations.

This requirement first arose for VAT (value added tax) declarations: depending on your profits and your personal choices, you may or may not be required to declare VAT. If you are, you can declare it monthly, quarterly or yearly. This has various implications across the application: some tax calculations and forms need to know whether the user is liable to VAT. At first, we naively stored this information like any other basic configuration value, overwriting the previous value at each update. As a result, when a user needed to adjust a previous tax declaration, say for fiscal year 2019, and the VAT configuration had been updated in the meantime for fiscal year 2020, we mistakenly applied the new value to 2019.

We needed a way to store what we call “historized configuration values”, so that we can answer questions like these:

“Was the user liable to VAT on September 17th, 2020?”
“Was the user liable to VAT at least once in 2021?”
“Since when has the user been liable to VAT?”
“Over what period does the user have to make monthly VAT declarations?”

ℹ️ As we use MongoDB for our app database, we use the term “document”. It can be read as “row” in a SQL database context.

The naive solution

The first “obvious” solution that comes to mind is to store a configuration timeline for each user: an array of objects containing configuration values and their validity timeframes (start and end dates). At signup, we create a single document for the user with an initial start date, for example in 1970, and no end date, as this is the user’s current configuration. When the user makes an update, they have to provide the date from which the change is effective. We create a new document whose start date is the provided effective date, and update the previous document by setting its end date to the new configuration’s start date. Later on, when we need the configuration at a given date, we just look for the configuration document whose time range contains it.

But this solution has a major flaw: we need to take great care when updating our configuration documents, ensuring that the timeline stays complete, with each configuration’s end date being the start date of the following one. As configuration updates can be retroactive, we may even have to delete whole documents if the new one completely overlaps their time range. So at each configuration update, we have to perform up to three different types of operation, any of which can damage timeline integrity if done incorrectly:

Insertion for the new configuration document
Updates for previous documents, updating their timeframe to ensure a gapless timeline
Deletion of “shadowed” configuration documents when their period is overridden by the new one

Here is a little diagram illustrating an update scenario: if we create the new orange configuration starting before the blue configuration’s end date, we have to update blue’s end date and delete the green and red documents, as they are completely overwritten by orange.
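To make the three operations concrete, here is a minimal in-memory sketch of such an update. It is an illustration, not our original code: the `NaiveConfig` shape and the `applyNaiveUpdate` name are hypothetical, and it assumes a gapless timeline sorted by start date where the new configuration is open-ended.

```typescript
// Hypothetical shape of one document in the naive timeline.
interface NaiveConfig<V> {
  value: V;
  start: Date;
  end: Date | null; // null means "still current"
}

// Applies an update effective from `start` and returns the new timeline.
// Against a real database, each step would be a separate write operation.
function applyNaiveUpdate<V>(
  timeline: NaiveConfig<V>[],
  value: V,
  start: Date
): NaiveConfig<V>[] {
  const kept = timeline
    // Deletion: documents starting at or after the new start date are shadowed.
    .filter((doc) => doc.start.getTime() < start.getTime())
    // Update: the document overlapping the new start date is truncated.
    .map((doc) =>
      doc.end === null || doc.end.getTime() > start.getTime()
        ? { ...doc, end: start }
        : doc
    );
  // Insertion: the new document becomes the open-ended current configuration.
  return [...kept, { value, start, end: null }];
}
```

In the scenario above, inserting orange removes green and red and shortens blue. If any one of these writes fails or is computed incorrectly, the timeline ends up with a gap or an overlap.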

On the other hand, accessing configuration values is trivial: a simple database request with date filters. But we preferred to go with another solution that makes configuration access slightly more complex, in exchange for data integrity and consistency guarantees.

The immutable solution

Our final solution was heavily inspired by Martin Fowler’s Temporal Property article. He describes how we can hold configuration values in what he calls Value objects, each with an Effectivity property describing when the value is effective. We combined his approach and terminology with an event sourcing pattern. We wanted each configuration update to require a single document insertion in our database, without any update to previous documents. This way, we ensure data consistency and minimise the risk of errors corrupting user configurations.

We ended up with this kind of configuration document:

{
  "_id": "random_id",
  "user_id": "user_id",

  // The actual configuration payload
  "configuration": {
    "vat_selected": "vat_ht",
    "vat_frequency": "monthly"
  },

  // Auto generated value, being this document creation date
  "known_at": {
    "$date": "2020-01-20T15:37:14.547Z"
  },

  // User controlled value, telling when this configuration starts being active
  "effective_date": {
    "$date": "2000-01-01T15:37:14.547Z"
  },

  // *Optional* user controlled value, telling when this configuration will expire and not be active anymore
  "end_date": {
    "$date": "2021-01-01T15:37:14.547Z"
  }
}

Reading a configuration value at a given date consists of finding the most recent document (greatest known_at date) whose effective date is before the given date and whose end date, if it exists, is after the given date. This does the job for our simplest requirement, “I want to know what the configuration is at this specific point in time”, but the other requirements are a bit tougher to meet if we work directly with raw documents.
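The lookup rule above can be sketched as a small pure function. This is only an illustration: the `findConfigurationAt` name is hypothetical, the field names follow the example document with dates already parsed into `Date` objects, and treating the effective date as inclusive and the end date as exclusive is an assumption on our part.

```typescript
// Temporal fields taken from the example document, as parsed Date objects.
interface TemporalDocument {
  known_at: Date;
  effective_date: Date;
  end_date?: Date;
}

// Returns the most recently known document that is effective at `date`,
// or undefined when no document covers that date.
function findConfigurationAt<T extends TemporalDocument>(
  documents: T[],
  date: Date
): T | undefined {
  const t = date.getTime();
  return documents
    // Keep documents whose [effective_date, end_date) range contains the date.
    .filter(
      (doc) =>
        doc.effective_date.getTime() <= t &&
        (doc.end_date === undefined || t < doc.end_date.getTime())
    )
    // Among those, the greatest known_at wins.
    .reduce<T | undefined>(
      (latest, doc) =>
        latest === undefined ||
        doc.known_at.getTime() > latest.known_at.getTime()
          ? doc
          : latest,
      undefined
    );
}
```

This answers the point-in-time question only. Questions about ranges (“at least once in 2021”, “since when”) would require reasoning about how raw documents shadow each other.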

Instead, we create an intermediate representation that looks like the “naive timeline” we talked about earlier, which is way easier to work with.

Temporal documents are represented here as colored lines (blue, green and red), each color being a single value, valid for the given time range. All of them have an effective date, being the dot at the beginning, and an optional end date, being the dot at the end.

From this raw representation of our configuration documents, we can construct a simpler, cleaner timeline (the colored rectangles at the bottom). From here, we can meet all of our requirements, and easily work through time and configurations! And as icing on the cake, we can even reconstruct this final timeline as it was in the past: we only need to filter out documents that were created after a given date.
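One possible way to flatten raw documents into such a timeline is sketched below. It is not the code of our package: `buildTimeline`, the `asOf` parameter and the `TimelineItem` fields are assumptions made for illustration. The idea is to cut time at every effective and end date, pick the most recently known document for each slice, and merge adjacent slices won by the same document.

```typescript
interface TemporalDocument {
  known_at: Date;
  effective_date: Date;
  end_date?: Date;
}

// Assumed shape of one rectangle of the final timeline.
interface TimelineItem<T extends TemporalDocument> {
  document: T;
  start: Date;
  end: Date | undefined; // undefined means open-ended
}

function buildTimeline<T extends TemporalDocument>(
  documents: T[],
  asOf?: Date // optional: rebuild the timeline as it was known at this date
): TimelineItem<T>[] {
  const known =
    asOf === undefined
      ? documents
      : documents.filter((d) => d.known_at.getTime() <= asOf.getTime());

  // Every effective or end date is a point where the winner may change.
  const instants = new Set<number>();
  for (const doc of known) {
    instants.add(doc.effective_date.getTime());
    if (doc.end_date !== undefined) instants.add(doc.end_date.getTime());
  }
  const boundaries = Array.from(instants).sort((a, b) => a - b);

  const timeline: TimelineItem<T>[] = [];
  boundaries.forEach((from, i) => {
    const to: number | undefined = boundaries[i + 1]; // undefined after the last boundary
    // Winner of the slice starting at `from`: covering document with the greatest known_at.
    const winner: T | undefined = known
      .filter(
        (d) =>
          d.effective_date.getTime() <= from &&
          (d.end_date === undefined || from < d.end_date.getTime())
      )
      .sort((a, b) => b.known_at.getTime() - a.known_at.getTime())[0];
    if (winner === undefined) return; // gap: no document covers this slice

    const end = to === undefined ? undefined : new Date(to);
    const previous: TimelineItem<T> | undefined = timeline[timeline.length - 1];
    if (
      previous !== undefined &&
      previous.document === winner &&
      previous.end?.getTime() === from
    ) {
      previous.end = end; // same winner as the adjacent slice: extend it
    } else {
      timeline.push({ document: winner, start: new Date(from), end });
    }
  });
  return timeline;
}
```

Because the function only reads its input, replaying it with an earlier `asOf` date yields the timeline as it looked back then, without touching stored data.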

The implementation

We have multiple micro-services that need to access configuration histories, so we made a standalone internal npm package.

It is a collection of pure functions, initialized with an array of raw configuration documents. To stay interoperable with any type of configuration, it relies only on the presence of an effective date, an optional end date and a first known date (known_at in the document above) on each document (the TemporalDocument type). So it can work with VAT configurations or anything else. Once instantiated, you get a HistoryService that holds the final timeline as its internal representation and exposes only a set of generic functions that operate on it:

export interface HistoryService<T extends TemporalDocument> {
  /**
   * Effective configuration at current date.
   * Returns `undefined` if no document is effective at that date.
   */
  getCurrentConfiguration(): TimelineItem<T> | undefined;

  /**
   * Effective configuration at given date.
   * Returns `undefined` if no document is effective at that date.
   */
  getConfigurationAtDate({ date }: { date: Date }): TimelineItem<T> | undefined;

  /**
   * An array of effective documents on given date range.
   * Returns an empty array if there are no effective documents on the provided date range.
   */
  getConfigurationsOnDateRange({
    startDate,
    endDate,
  }: {
    startDate: Date;
    endDate: Date;
  }): TimelineItem<T>[];
}
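As a rough idea of what sits behind this interface, the three functions can be written as simple filters over an already-built timeline. The sketch below is an assumption, not the package source: `createHistoryService`, the injectable `now` clock and the `TimelineItem` fields are hypothetical, and ranges are treated as start-inclusive and end-exclusive.

```typescript
interface TemporalDocument {
  known_at: Date;
  effective_date: Date;
  end_date?: Date;
}

// Assumed shape of a timeline entry.
interface TimelineItem<T extends TemporalDocument> {
  document: T;
  start: Date;
  end: Date | undefined; // undefined means open-ended
}

function createHistoryService<T extends TemporalDocument>(
  timeline: TimelineItem<T>[],
  now: () => Date = () => new Date() // injectable clock, handy for tests
) {
  // True when the item's [start, end) range contains the date.
  const covers = (item: TimelineItem<T>, date: Date): boolean =>
    item.start.getTime() <= date.getTime() &&
    (item.end === undefined || date.getTime() < item.end.getTime());

  const getConfigurationAtDate = ({ date }: { date: Date }) =>
    timeline.find((item) => covers(item, date));

  return {
    getCurrentConfiguration: () => getConfigurationAtDate({ date: now() }),
    getConfigurationAtDate,
    // Standard interval-overlap test between the item and the requested range.
    getConfigurationsOnDateRange: ({
      startDate,
      endDate,
    }: {
      startDate: Date;
      endDate: Date;
    }) =>
      timeline.filter(
        (item) =>
          item.start.getTime() < endDate.getTime() &&
          (item.end === undefined || startDate.getTime() < item.end.getTime())
      ),
  };
}
```

Since the timeline has no overlaps, every read is a plain scan, and no knowledge of `known_at` precedence is needed at this level.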

For our specific use case of VAT configuration, we created a VatHistoryService that wraps our low-level HistoryService, exposing only VAT-oriented functionality, ready to be used where needed in the app.
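To give an idea of what such a wrapper can look like, here is a minimal sketch. Everything in it is hypothetical: the function and method names, the `VatTimelineItem` shape, and the `isLiable` predicate, which is passed in because the rule that decides liability from `vat_selected` is not described here. The frequency values are the three mentioned earlier in this article.

```typescript
// Payload fields from the example document; vat_selected is kept as a plain string.
interface VatConfiguration {
  vat_selected: string;
  vat_frequency: "monthly" | "quarterly" | "yearly";
}

// Assumed shape of a timeline entry carrying a VAT document.
interface VatTimelineItem {
  document: { configuration: VatConfiguration };
  start: Date;
  end: Date | undefined;
}

// Only the part of HistoryService that this wrapper needs.
interface VatHistorySource {
  getConfigurationAtDate(args: { date: Date }): VatTimelineItem | undefined;
  getConfigurationsOnDateRange(args: {
    startDate: Date;
    endDate: Date;
  }): VatTimelineItem[];
}

function createVatHistoryService(
  source: VatHistorySource,
  isLiable: (configuration: VatConfiguration) => boolean // hypothetical business rule
) {
  return {
    // "Was the user liable to VAT on this date?"
    isLiableAtDate(date: Date): boolean {
      const item = source.getConfigurationAtDate({ date });
      return item !== undefined && isLiable(item.document.configuration);
    },
    // "Was the user liable to VAT at least once during this year?" (UTC year bounds)
    wasLiableDuringYear(year: number): boolean {
      return source
        .getConfigurationsOnDateRange({
          startDate: new Date(Date.UTC(year, 0, 1)),
          endDate: new Date(Date.UTC(year + 1, 0, 1)),
        })
        .some((item) => isLiable(item.document.configuration));
    },
    // Declaration frequency at a date, or undefined when not liable.
    getFrequencyAtDate(date: Date): VatConfiguration["vat_frequency"] | undefined {
      const item = source.getConfigurationAtDate({ date });
      return item !== undefined && isLiable(item.document.configuration)
        ? item.document.configuration.vat_frequency
        : undefined;
    },
  };
}
```

Callers then ask domain questions such as `wasLiableDuringYear(2021)` without knowing anything about timelines or temporal documents.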

Wrapping up

It’s been almost a year since we deployed this solution for our users, and we are satisfied with the results. We no longer have issues with stale configurations or data integrity.

However, this posed some challenges. For instance, at release time, we had to initialize the configuration of every existing user based only on what we knew about them at that point. There was no way to know what their previous configurations were, and some users had not even completed their account setup by the time of initialization. Keep this in mind if you adopt this kind of solution on an existing system: there is no guarantee that stored past configurations are correct for users who signed up before historization was set up.

Also, it is worth noting that this design makes configuration updates trivial, at the expense of data readability and accessibility. Specific developments have to be made to visualize configuration history (for debugging and customer support purposes, for example). But we believe that as long as our data is safely stored and updated, this is a tradeoff we are willing to make. In case of a bug, it is way easier to fix pure read-only functions than having to fix database data.