What is the outbox pattern?
It is when a request handler or event processor performs its database transaction but does not publish the resulting event directly afterwards. Instead, during the same database transaction, one or more records are inserted into a dedicated table, the outbox table, which holds the records you want to publish to a specific event stream.
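As a minimal sketch of the idea, here is the write side using Python's built-in `sqlite3` module. The `orders` and `outbox` tables, their columns, and the `place_order` function are hypothetical names chosen for illustration. The key point is that the entity row and the outbox row are written inside one transaction, and nothing is sent to a broker at this stage.

```python
import json
import sqlite3

# Hypothetical schema: an "orders" entity table plus an "outbox" table.
SCHEMA = """
CREATE TABLE orders (
    id       TEXT PRIMARY KEY,
    customer TEXT NOT NULL,
    amount   INTEGER NOT NULL
);
CREATE TABLE outbox (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    stream     TEXT NOT NULL,   -- the event stream the record should be published to
    event_type TEXT NOT NULL,
    payload    TEXT NOT NULL
);
"""

def place_order(conn: sqlite3.Connection, order_id: str, customer: str, amount: int) -> None:
    # `with conn:` commits if the block succeeds and rolls back if it raises,
    # so the entity row and the outbox row are written in one transaction.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)",
            (order_id, customer, amount),
        )
        # Instead of publishing to a broker here, record the event in the outbox.
        conn.execute(
            "INSERT INTO outbox (stream, event_type, payload) VALUES (?, ?, ?)",
            (
                "orders",
                "OrderPlaced",
                json.dumps({"order_id": order_id, "customer": customer, "amount": amount}),
            ),
        )

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
place_order(conn, "o-1", "alice", 42)
print(conn.execute("SELECT stream, event_type, payload FROM outbox").fetchall())
```

A separate process (described later in this post) is then responsible for reading the outbox and publishing its records.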
Why does the outbox approach guarantee at-least-once event publishing?
Because database transactions are atomic, the records are inserted into the outbox as part of the same transaction that performs the operations against the main entity tables.
When the transaction finishes, either all of the operations have succeeded or none of them have, ensuring data integrity: an event is recorded if and only if the state change it describes was committed. The outbox record then stays in the table until it has been published, so publishing can be retried after a failure.
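To illustrate the all-or-nothing behaviour, the sketch below (again Python's `sqlite3`, with hypothetical table names and a simulated failure) raises an error after both inserts but before the commit. The rollback removes the order row and its outbox row together, so no orphaned event is left behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id TEXT PRIMARY KEY, amount INTEGER NOT NULL);
CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, event_type TEXT NOT NULL);
""")

def place_order(order_id: str, amount: int) -> None:
    with conn:  # commit on success, roll back on any exception
        conn.execute("INSERT INTO orders (id, amount) VALUES (?, ?)", (order_id, amount))
        conn.execute("INSERT INTO outbox (event_type) VALUES ('OrderPlaced')")
        # Simulated failure after both inserts but before the commit.
        if amount <= 0:
            raise ValueError("invalid amount")

place_order("o-1", 10)
try:
    place_order("o-2", -5)
except ValueError:
    pass  # the transaction for "o-2" was rolled back

def count(table: str) -> int:
    # Table names are fixed literals here, so the f-string is safe.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

print(count("orders"), count("outbox"))  # 1 1
```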
Be careful with at-least-once publishing: why idempotent operations are important
However, at-least-once publishing means the publisher may publish the same event more than once, so downstream processors can receive duplicates.
Downstream consumers therefore need to be idempotent: they should either detect and skip events they have already processed, or be designed so that processing an event again has no harmful side effects.
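A minimal sketch of the first option, a consumer that remembers which event IDs it has already applied. The `Event` and `BalanceConsumer` types are hypothetical, and the sketch assumes every event carries a unique ID (for example, the outbox sequence ID).

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Event:
    event_id: int  # assumed unique per event, e.g. the outbox sequence id
    account: str
    amount: int

@dataclass
class BalanceConsumer:
    balances: dict[str, int] = field(default_factory=dict)
    processed_ids: set[int] = field(default_factory=set)

    def handle(self, event: Event) -> None:
        # Skip events that have already been applied: a redelivered duplicate is a no-op.
        if event.event_id in self.processed_ids:
            return
        self.balances[event.account] = self.balances.get(event.account, 0) + event.amount
        self.processed_ids.add(event.event_id)

consumer = BalanceConsumer()
deposit = Event(event_id=1, account="alice", amount=100)
consumer.handle(deposit)
consumer.handle(deposit)  # duplicate delivery caused by at-least-once publishing
print(consumer.balances)  # {'alice': 100}
```

In a real service the set of processed IDs would live in durable storage and ideally be updated in the same transaction as the side effect; an in-memory set is used here only to keep the sketch short.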
The outbox pattern allows the denormalisation of events
Another benefit of the outbox pattern is that the outbox record does not need to be a one-to-one mapping of an entity table. Instead, the event can consolidate values from an object’s related rows across multiple tables.
Denormalising the event’s data can improve the performance of downstream consumers that need the context of the event, because it reduces or even eliminates the need to look up related data or consume multiple event streams to get the complete picture.
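As a sketch of a denormalised event, the hypothetical schema below spreads an order across `customers`, `orders` and `order_lines`. Inside the same transaction, the write joins those tables and stores one self-describing payload in the outbox, so a consumer does not need to look up the customer or the order lines itself. Table names and payload shape are assumptions for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL REFERENCES customers(id));
CREATE TABLE order_lines (order_id INTEGER NOT NULL REFERENCES orders(id), sku TEXT NOT NULL, qty INTEGER NOT NULL);
CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, event_type TEXT NOT NULL, payload TEXT NOT NULL);
INSERT INTO customers VALUES (7, 'Alice', 'alice@example.com');
""")

def place_order(order_id: int, customer_id: int, lines: list[tuple[str, int]]) -> None:
    with conn:  # entity rows and the outbox record commit (or roll back) together
        conn.execute("INSERT INTO orders (id, customer_id) VALUES (?, ?)", (order_id, customer_id))
        conn.executemany(
            "INSERT INTO order_lines (order_id, sku, qty) VALUES (?, ?, ?)",
            [(order_id, sku, qty) for sku, qty in lines],
        )
        # Consolidate data from three tables into a single event payload.
        name, email = conn.execute(
            "SELECT c.name, c.email FROM orders o JOIN customers c ON c.id = o.customer_id "
            "WHERE o.id = ?",
            (order_id,),
        ).fetchone()
        line_rows = conn.execute(
            "SELECT sku, qty FROM order_lines WHERE order_id = ? ORDER BY sku", (order_id,)
        ).fetchall()
        payload = {
            "order_id": order_id,
            "customer": {"id": customer_id, "name": name, "email": email},
            "lines": [{"sku": sku, "qty": qty} for sku, qty in line_rows],
        }
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderPlaced", json.dumps(payload)),
        )

place_order(1, 7, [("SKU-B", 1), ("SKU-A", 2)])
print(conn.execute("SELECT payload FROM outbox").fetchone()[0])
```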
Ensure you can order the events when publishing
When it comes to publishing events, the order in which they are published to an event stream matters, so each outbox record should be stored with:
- an incrementing sequence ID,
- a creation timestamp, or both.
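A minimal sketch of an outbox table that stores both, using `sqlite3` with hypothetical column names. The publisher reads records ordered by the sequence ID, and the timestamp is kept alongside for diagnostics or as a secondary key.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE outbox (
    seq        INTEGER PRIMARY KEY AUTOINCREMENT,  -- incrementing sequence id
    created_at TEXT NOT NULL,                      -- creation timestamp (UTC, ISO 8601)
    event_type TEXT NOT NULL
)
""")

def append(event_type: str) -> None:
    with conn:
        conn.execute(
            "INSERT INTO outbox (created_at, event_type) VALUES (?, ?)",
            (datetime.now(timezone.utc).isoformat(), event_type),
        )

for event_type in ("OrderPlaced", "OrderPaid", "OrderShipped"):
    append(event_type)

# The publisher reads records in sequence order.
events = [row[0] for row in conn.execute("SELECT event_type FROM outbox ORDER BY seq")]
print(events)  # ['OrderPlaced', 'OrderPaid', 'OrderShipped']
```

Timestamps can collide or drift between machines, which is one reason to prefer a sequence ID as the primary ordering key. Note also that in databases with concurrent writers, a transaction holding a lower sequence value can commit after one holding a higher value, so a publisher reading by sequence may need to allow for that.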
The outbox pattern will add load to the database
Also take into consideration that an outbox table increases the amount of data stored in the database, and the service publishing the events adds further read and write load to it.
How do you liberate events from the database with the outbox pattern?
OK, great, so how do we get the events out of the database?
Well, there are a couple of approaches.
The most reliable and efficient way to do this is to use a technique called Change Data Capture (CDC).
Most databases write every insert, update and delete operation to a transaction log (an append-only log), which the database uses to recover after a failure.
CDC tools read this log and turn the changes it records, such as new rows in the outbox table, into events.
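To show the shape of the idea, here is a toy sketch that models the transaction log as a Python list. Real CDC tools read each database's own log format, so this is not how any particular tool works; the `LogEntry` fields and the `OutboxLogTailer` class are hypothetical. The reader remembers its position in the log, skips entries it has already seen, and publishes only inserts into the outbox table.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class LogEntry:
    lsn: int     # position in the log (hypothetical "log sequence number")
    op: str      # "insert", "update" or "delete"
    table: str
    row: dict

class OutboxLogTailer:
    """Toy CDC reader: tails an append-only log and publishes new outbox rows."""

    def __init__(self, publish: Callable[[dict], None]) -> None:
        self.publish = publish
        self.offset = 0  # last log position processed; a real tool persists this

    def read_new(self, log: list[LogEntry]) -> None:
        for entry in log:
            if entry.lsn <= self.offset:
                continue  # already processed on an earlier read
            if entry.op == "insert" and entry.table == "outbox":
                self.publish(entry.row)
            # If the process crashed after publish() but before this line,
            # the entry would be published again: at-least-once delivery.
            self.offset = entry.lsn

published: list[dict] = []
log = [
    LogEntry(1, "insert", "orders", {"id": "o-1"}),
    LogEntry(2, "insert", "outbox", {"event_type": "OrderPlaced", "order_id": "o-1"}),
    LogEntry(3, "update", "orders", {"id": "o-1", "status": "paid"}),
]
tailer = OutboxLogTailer(published.append)
tailer.read_new(log)
tailer.read_new(log)  # re-reading the same log publishes nothing new
print(published)  # [{'event_type': 'OrderPlaced', 'order_id': 'o-1'}]
```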
Some of the tools you can use depending on your platform are:
- Apache Kafka Connect
- AWS Database Migration Service (DMS)
How do cloud-native databases decrease the cost of entry for event-driven architecture?
Suppose you are running a cloud-native database.
For example, with some AWS databases, such as Amazon DynamoDB via DynamoDB Streams, you can trigger Lambda functions after the database operation completes, so you don’t have to build and run your own publishing service.
How to liberate events from legacy systems by polling the database table
If you can’t use the methods I have described, you can develop a service that polls the outbox table for new records and then publishes them as events.
I’ve done this several times, and it is complex to scale. At the end of the day you are simply polling the database, so it becomes a balancing act of choosing a polling interval so that a new poll never starts before the previous one has finished; otherwise you risk publishing the same events multiple times.
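A minimal sketch of a polling publisher using `sqlite3`, assuming a hypothetical `published_at` column that stays NULL until a record has been sent. Polls run strictly one after another, so a new poll cannot start while the previous one is still running.

```python
import sqlite3
import time
from typing import Callable

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE outbox (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    payload      TEXT NOT NULL,
    published_at TEXT            -- NULL until the record has been published
)
""")
conn.executemany("INSERT INTO outbox (payload) VALUES (?)", [("e1",), ("e2",), ("e3",)])
conn.commit()

def poll_once(publish: Callable[[str], None], batch_size: int = 100) -> int:
    """Publish one batch of unpublished records in sequence order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published_at IS NULL ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, payload in rows:
        publish(payload)
        # If the process dies between publish() and this update, the record is
        # sent again on the next poll: at-least-once, not exactly-once.
        with conn:
            conn.execute(
                "UPDATE outbox SET published_at = datetime('now') WHERE id = ?", (row_id,)
            )
    return len(rows)

def run(publish: Callable[[str], None], interval_seconds: float, max_polls: int) -> None:
    # Sequential loop: the next poll only starts after the previous one has finished.
    for _ in range(max_polls):
        poll_once(publish)
        time.sleep(interval_seconds)

sent: list[str] = []
run(sent.append, interval_seconds=0.01, max_polls=2)
print(sent)  # ['e1', 'e2', 'e3']
```

Scaling this out to several poller instances is where it gets harder: without some form of locking or partitioning, two instances can pick up the same unpublished rows.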
Outbox pattern summary
- Storing events in the database first ensures at-least-once publishing.
- The outbox approach gives you more control over what gets published.