Cheap Cache
When you build on top of an API, you have two sets of state to manage: client-side and server-side. When things change on the server, you want to see those changes reflected on the client. This isn't always easy, and not every out-of-the-box library will work for your exact situation. This post describes one way I've handled it for a large, complicated interface. There are many ideas out there on how to set this up, a lot of them borrowed from how database replication works. If you're running into the same sorts of performance issues I was hitting, maybe you'll find some inspiration in how I've done things.
Why do we need a cache on the client-side that mirrors the server state?
It used to be that, even with client-side templating, you would only render the data on-demand. When a user clicked one thing, you'd show a loader, fetch the data from an API, then render the new markup. This had its problems, but it was mostly manageable and users were happy just to not have the entire page reload.
These days are different. We have reactive interfaces now, and those are backed by full state stores. Frameworks like Vue and React sit right on top of your data layer, ready to re-render markup as soon as the data changes. Most state management tools rely on a data layer that is essentially one large nested JavaScript object. That's, um, fine. Well, until it isn't. For complicated interfaces that are constantly toggling many different things and triggering many partial view changes, this can lead to really poor performance.
To get around the giant state object as a data layer, you can use the "offline" browser APIs to store data locally on your user's machine. You don't need to go so far as making the content available offline, but browsers have a built-in database, IndexedDB, and with a little finagling we can make a cache backed by a remote API.
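For a feel of the raw API involved, here's a minimal sketch of opening an IndexedDB database with a records object store keyed by id (the database and store names are just illustrations):

// open (and create, on first run) a local database for our cache.
const openDatabase = () => new Promise((resolve, reject) => {
  const request = indexedDB.open('cheap-cache', 1)
  request.onupgradeneeded = () => {
    // runs on first open (or a version bump): create our object store
    request.result.createObjectStore('records', { keyPath: 'id' })
  }
  request.onsuccess = () => resolve(request.result)
  request.onerror = () => reject(request.error)
})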
Event Sourcing
One of the neatest concepts in distributed systems is event sourcing. There's a talk by Martin Kleppmann that makes a really compelling case for storing events rather than data in our databases. We keep track of all the changes made to the data anyway, and those events are just as important as the final data that results from applying them. So instead of storing data, we just hold onto the events. Then if anyone needs to keep a mirror of the data, they can tail that stream of events and compute the data on their own. This is some really powerful stuff.
There are logistical hurdles like holding gigantic transaction logs of events rather than the computed data, but there are solutions for all that. The other thing that's difficult is getting the events right in the first place, deciding what to include in the transaction and how big it needs to be.
Cheap Event Sourcing
For our case, we're not going to do full-fledged event sourcing. We're storing documents, not events, in our database. And that's fine: we can still access documents directly without having to compute state by replaying events, and it's more familiar for those not used to thinking in events. However, we are still going to make events an important component of our API. This way, clients listening at the other end can pick up on partial changes rather than having to re-query all the time.
We're going to keep the events simple and not really think of them as partial changes to the data. Instead, we'll just treat them as entire documents (like you'd store in a NoSQL database). We'll have a very limited set of actions that can happen to each document: CREATE_RECORD, UPDATE_RECORD, DELETE_RECORD. It wouldn't be hard to get more granular and have an event for each attribute on the document (or "record" or whatever). You just have to consider who's listening at the other end and whether they need to keep track of partial changes. They might be happy to replace the entire document when even one attribute changes. There are other considerations too, like having too much chatter on the network.
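For illustration, a hypothetical attribute-level delta (which we won't use here) might look something like:

{
  "action": "RECORD_ATTRIBUTE_UPDATED",
  "delta": {"id": "7EvuKyhAdeAT", "attribute": "name", "value": "Record One, Renamed"}
}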
We're not doing event sourcing, and that's what makes this cache cheap. There's no worry about snapshots or replaying events. We've taken some shortcuts. This obviously gives up some of the power of events, like pre-filtering the stream to the events relevant to your service and ignoring the rest. It also makes it difficult to create "materialized views". For our case that's fine, even preferred. We're just trying to keep our client-side cache updated.
Server API
There are two things we need our API to provide:
- The full index, the entirety of server-side state, possibly pre-filtered for our specific client's needs.
- Latest deltas, a partial list of the last things that have changed since we last pulled content.
These are really the only two things our caching application is worried about. The first time our client-side application runs, we can get the full index of all the necessary content. We put that in the cache and then wait for any incoming changes.
Let's say we have core database documents that look like this:
struct Uid(String);

struct Record {
    id: Uid,
    name: String,
}
Here is how we define the deltas. As mentioned before, our events are fairly chunky; we're not tracking the nitty-gritty details of attribute changes within a record. The key is that our delta list needs to be finite and approachable by our cache-management logic. Each change will need to be handled individually by the cache.
pub enum Delta {
    RecordCreated(Record),
    RecordUpdated(Record),
    RecordDeleted(Uid),
}
Index Response
We'll have a /content/getIndex route as part of our API. With no parameters, it just returns everything in the database, the full set to be cached.
{
  "latestDelta": "2021-09-03T12:06:27.142045346Z|2|LPcamK",
  "records": [
    {"id": "7EvuKyhAdeAT", "name": "Record One"},
    {"id": "Ux8Zx3QowpRd", "name": "Record Two"}
  ]
}
The list of records is simple enough and can safely be dumped in as the entire contents of your cache.
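As a rough sketch, dumping that full set into the cache might look like this, using the raw IndexedDB API against an open database handle (like the one from the earlier sketch):

// replace the cache contents with a fresh full index
const replaceAll = (db, records) => new Promise((resolve, reject) => {
  const tx = db.transaction('records', 'readwrite')
  const store = tx.objectStore('records')
  store.clear()
  for (const record of records) store.put(record)
  tx.oncomplete = () => resolve()
  tx.onerror = () => reject(tx.error)
})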
Deltas Response
The latestDelta value is a sort of cursor or marker that keeps track of the last change that happened to produce this exact set of records. It's just the ID of the last delta. Next time we make a request to get the index (like on a page reload), we can pass along this latestDelta value to tell the server what the last change we know about is. If it's been a long time and many changes have happened, the server can just return the full index again. But if it's only been a short while, the server can give us just the last few changes that happened.
So let's make a request to the full index again:
/content/getIndex?latestDelta="2021-09-03T12:06:27.142045346Z|2|LPcamK"
{
  "latestDelta": "2021-09-03T12:27:53.954377922Z|2|7xxnSL",
  "deltas": [
    {
      "id": "2021-09-03T12:27:53.954377922Z|2|7xxnSL",
      "action": "RECORD_CREATED",
      "delta": {
        "id": "6NFIA16wzwMK",
        "name": "Record Three"
      }
    }
  ]
}
We provided the last delta we knew about. It wasn't from that long ago, but a single change did happen while we weren't looking: a third record was created. To update our cache, we just need to add this record. Then we're all synced with the server and that's it. We can keep playing this game of calling the /content/getIndex route over and over and we'll be more or less in sync.
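Putting that together, a single sync pass might look something like this rough sketch. fetchJson is an assumed fetch-and-parse helper, replaceAll is the sketch from earlier, and applyDelta maps the action names from our Delta enum onto the cache:

// one sync pass: ask the server what changed since our cursor
async function sync (db, latestDelta) {
  const url = '/content/getIndex?latestDelta=' + encodeURIComponent(latestDelta)
  const response = await fetchJson(url)
  if (response.records) {
    // we were too far behind, so the server sent the full index again
    await replaceAll(db, response.records)
  } else {
    for (const delta of response.deltas) {
      await applyDelta(db, delta)
    }
  }
  return response.latestDelta // our new cursor
}

// apply one delta to the cache, treating created and updated the same
async function applyDelta (db, { action, delta }) {
  const tx = db.transaction('records', 'readwrite')
  const store = tx.objectStore('records')
  if (action === 'RECORD_DELETED') store.delete(delta) // delta is just the id
  else store.put(delta) // the whole document replaces whatever was there
  return new Promise((resolve, reject) => {
    tx.oncomplete = resolve
    tx.onerror = () => reject(tx.error)
  })
}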
Change Requests
If our client-side application is also making changes over the API, we'll need the responses from those requests to fit into this same scheme of keeping the cache updated. We shouldn't have to poll the /content/getIndex route again after each change we make. Instead, every change request we send over the API just responds with that same list of deltas:
/record/create?name="Record+Four"&latestDelta="2021-09-03T12:27:53.954377922Z|2|7xxnSL"
{
  "latestDelta": "2021-09-03T12:28:35.812854133Z|2|q-z4k0",
  "deltas": [
    {
      "id": "2021-09-03T12:28:35.812854133Z|2|q-z4k0",
      "action": "RECORD_CREATED",
      "delta": {
        "id": "aD4qPw9crB2N",
        "name": "Record Four"
      }
    }
  ]
}
And now we don't really have to do anything special. After handling the deltas response from the /content/getIndex route, we've already supported this situation. We just do the same thing we did before to update our cache and we're done.
What's neat about this approach is that if there were other changes in the meantime, possibly from other users in the same system, we'll automatically pick up their changes from this one request as well. We also aren't fussing around with different API responses; there are just the two: the full index or a partial list of deltas.
Server Implementation of Deltas
The tricky part of all this is how the server deals with deltas. Each API operation must keep track of everything it has changed and store that information in the database. This can get awkward if your operation handler isn't making all its changes in one place.
As you can see, the delta IDs are more complex than the record IDs. They are made up of a timestamp, an increment (for when multiple deltas are created around the same time), and a random element for extra conflict handling. Unlike record IDs, we want our delta IDs to be sortable.
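The IDs in the examples above look server-generated with nanosecond precision, but here's a rough JavaScript sketch just to illustrate the shape:

// sortable delta IDs: a timestamp, then a counter for IDs minted in the
// same millisecond, then a random suffix for conflict handling
let lastTimestamp = ''
let increment = 0

function nextDeltaId () {
  const timestamp = new Date().toISOString()
  increment = timestamp === lastTimestamp ? increment + 1 : 0
  lastTimestamp = timestamp
  const random = Math.random().toString(36).slice(2, 8)
  return `${timestamp}|${increment}|${random}`
}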
All in all, it can be freeing to have only one type of server response for all API operations.
Client-side Utilities
I like to really compartmentalize the different elements used for managing this cache.
Utility | Description |
---|---|
Database | Some sort of wrapper around IndexedDB. This provides a layer for the rest of your application to save and retrieve content from the local database. It might be useful to use this layer as a way to pre-optimize the queries you'll need by inserting content already sorted into materialized collections. |
API Sender | The job of this utility is to send requests to the server, parse the responses, and update the database. It also makes sure to emit an event for every database change (see the sketch after this table). |
Event Bus | Events are key to sending update information to all the stores. Especially when working with SharedWorkers, you'll want to make sure you have a solid understanding of how events flow through your application. |
Store | Stores provide a particular slice into the local database. They are meant to be used in reactive situations. They listen for events about database changes relevant to the content in the store and then update their subscribers with the latest from the database. |
Store Subscriber | Subscribers live in the view layer and are hooked into the reactive rendering mechanisms. When a store has new data, these subscribers re-render their markup immediately. |
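As a minimal sketch, the API sender's core loop might look like the following. The fetchJson helper, the db methods, and the exact event names are assumptions for illustration:

import {db, events, fetchJson} from './local'

// every response carries deltas, so one code path keeps the cache fresh
export async function send (path, params = {}) {
  // include our cursor so the server knows what we already have
  params.latestDelta = await db.getLatestDelta()
  const query = new URLSearchParams(params).toString()
  const response = await fetchJson(`${path}?${query}`)
  for (const delta of response.deltas ?? []) {
    // save the change locally, then tell every store about it
    await db.applyDelta(delta)
    events.emit(`DELTA_${delta.action}`, delta.delta)
  }
  await db.setLatestDelta(response.latestDelta)
  return response
}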
This architecture gives us one very important thing: nothing is held in-memory. That means subscribing to a store is cheap. You can build out very complicated interfaces that are all still reactive, but not held back by keeping all the state in one large object (or web of objects).
The magic to all this is the store, and that really only works because of how our API is set up to use these deltas in all situations. We aren't going to go into detail about the other utilities.
Stores
You might have a view component that is rendering information about a single record. We can create a store that is dedicated to keeping information up-to-date about that record.
import {Store, events, api} from './local'
export default class SingleRecord extends Store {
  // our store is parameterized so we can fine-tune what
  // events and data changes we care about
  constructor ({ recordId }) {
    // the key function of the store is to fetch data from the
    // database. we pass a function to the parent that will
    // get called whenever this store is triggered
    super(async ({ db }) => {
      const record = await db.recordsById.get(recordId)
      // do any special processing of the record
      return record
    })
    // note: instance properties can only be set after super()
    this.recordId = recordId
    // we need to trigger our store whenever certain relevant
    // events occur.
    events.on([
      events.DELTA_RECORD_UPDATED,
      events.DELTA_RECORD_DELETED,
    ], (e) => {
      // we're getting alerted for changes to _any_ record,
      // but we only want to trigger updates when _our_ record changes
      const { id } = e.detail
      if (id === recordId) {
        // this is a parent method that re-runs our fetch function
        // and then notifies all of our subscribers
        this.trigger(e)
      }
    })
  }

  // We can provide some shortcuts here since this store is already
  // available to our subscribers
  setName (name) {
    // we don't need to handle the response or await the promise.
    // the API's job is to save all that to the database, then
    // emit the right events.
    // our store is already handling the events, so we're all set.
    api.record.setName(this.recordId, name)
  }
}
Your view component would subscribe to the store and re-render as necessary.
import {ref} from 'vue'
import {SingleRecord} from '../stores'

// the details are different per view library.
// here's what it would look like with Vue (inside <script setup>)
const props = defineProps({
  recordId: {type: String, required: true},
})

// our reactive variable
const record = ref({})

const store = new SingleRecord({ recordId: props.recordId })
store.subscribe(updatedRecord => {
  // update the reactive variable, which triggers view changes
  record.value = updatedRecord
})

// we can send API requests from here too
const onNameChange = newName => {
  // no need to await or handle the response:
  // it's all handled by the subscription
  store.setName(newName)
}
The base Store class is mostly a subscription manager.
import {db, uid} from './local'
export default class Store {
  constructor (fetch) {
    this.subscriptions = new Map()
    this.fetch = fetch
  }

  subscribe (cb) {
    const key = uid()
    this.subscriptions.set(key, cb)
    // provide the initial value from the store
    // (queueMicrotask, since setImmediate isn't available in browsers)
    queueMicrotask(async () => {
      const newState = await this.fetch({ db })
      cb(newState)
    })
    // return an unsubscribe method
    return () => this.subscriptions.delete(key)
  }

  // re-send updated state to all subscribers
  async trigger (event) {
    const newState = await this.fetch({ db })
    this.subscriptions.forEach(sub => sub(newState))
  }
}
There is more you can do here like pausing/resuming updates, handling errors, and deferring state updates until after initial setup. The principles are the same.
Cheap Cache FTW
These are very simple mechanisms: events and callbacks, reading from the database on change, handling API requests, and so on. It's not overly complicated. We're glossing over some of the trickier situations like providing user feedback and handling data conflicts. Still, we have very clear data flow and state management to lean on.
The benefit is that we have a cache of the content that is reactive and ready to be plugged into our view layer. We can build an entire state management system that is completely isolated from whatever view library we use. API changes stay simple and consistent, and can be added indefinitely without upsetting the underlying system.