When adopting a microservices architecture, a traditional relational database might not meet every requirement. Different types of services are better served by different types of databases. The following are points that need to be considered.
When an organization moves to a microservices architecture from an existing monolithic application built on a SQL database, the existing database running in the data center may continue to be used with some process updates. For a mission-critical application, it is still recommended to use a commercial database like Oracle or MS SQL Server. The real problems with these commercial databases are the associated costs (licenses, hardware, maintenance, etc.) and the provisioning process. Oracle is the industry's single largest database vendor, a position that served it well in the days before cloud and open source, but it is now under pressure from both.
When you are starting a microservice that needs to be delivered within two weeks, you cannot wait three months for an Oracle database instance to be provisioned. When moving to the cloud, open-source SQL databases are more flexible, as vendors have fully automated provisioning for the popular ones. Most cloud providers have multiple database offerings, and users can create an instance on their own from the GUI console. This is called database as a service.
Another issue with a commercial database like Oracle is how heavyweight it is. Supporting thousands of connections would be very costly on Oracle, yet easy on MySQL. When you split a big monolithic app into 100 pieces, each piece might need 10 connections, which means 1,000 connections in total rather than the 10 connections of the original monolith.
In-Memory Data Grid
Given the scalability and latency requirements of microservices, we have to make heavy use of an In-Memory Data Grid or distributed cache. When event sourcing and CQRS are used, the query-side service should be built on top of a distributed cache if possible for read-heavy services. The cache is constantly updated by the events sent from the command side, so we don't need to worry about whether the data in the cache is stale. Given that we have an event store which is the source of truth, we don't need to worry too much about losing the distributed cache: the aggregates can always be regenerated by replaying the events. Several In-Memory Data Grid products can be used, depending on the requirements.
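To make the replay idea concrete, here is a minimal sketch with a hypothetical `Event` record and a plain `Map` standing in for the distributed cache (this is not the light-eventuate-4j API): rebuilding the query-side state is simply a fold over the event store.

```java
import java.util.*;

public class EventReplay {
    // A hypothetical event: an account credited or debited by an amount.
    record Event(String accountId, long amount) {}

    // The event store is the source of truth; the cache is derived state,
    // so it can be rebuilt from scratch at any time by replaying events.
    static Map<String, Long> replay(List<Event> eventStore) {
        Map<String, Long> cache = new HashMap<>();
        for (Event e : eventStore) {
            cache.merge(e.accountId(), e.amount(), Long::sum);
        }
        return cache;
    }

    public static void main(String[] args) {
        List<Event> store = List.of(
            new Event("a1", 100),
            new Event("a1", -30),
            new Event("a2", 50)
        );
        Map<String, Long> cache = replay(store);
        System.out.println(cache.get("a1")); // 70
        System.out.println(cache.get("a2")); // 50
    }
}
```

If the cache node dies, nothing is lost: the same `replay` runs again over the event store and produces identical state.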
If the distributed cache is not available, we can always put the cached data into a document database so that it can be loaded quickly by key. With SSDs now commonplace, key lookups in document databases can be very fast.
One of the most popular document databases is MongoDB.
The document database is good at handling JSON documents, but sometimes we need to handle key/value data directly. This calls for a K/V database like RocksDB.
Both document and key/value databases support looking up a value by key, but they are not very good at handling relationships between entities. Graph databases like Neo4j are more efficient for these use cases.
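To see why, consider a two-hop query such as "friends of friends". The sketch below uses a plain in-memory adjacency map (hypothetical data, not the Neo4j API) to show the shape of the query; in a K/V or document store each hop becomes a separate key lookup, while a graph database traverses the edges natively.

```java
import java.util.*;

public class FriendsOfFriends {
    // Each getOrDefault call here corresponds to one key lookup in a
    // K/V store; a two-hop query needs one lookup per visited node.
    static Set<String> friendsOfFriends(Map<String, List<String>> edges, String start) {
        Set<String> result = new TreeSet<>();
        for (String friend : edges.getOrDefault(start, List.of())) {    // hop 1
            for (String fof : edges.getOrDefault(friend, List.of())) {  // hop 2
                if (!fof.equals(start)) result.add(fof);
            }
        }
        // Exclude direct friends so only the second hop remains.
        result.removeAll(edges.getOrDefault(start, List.of()));
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = Map.of(
            "alice", List.of("bob"),
            "bob", List.of("alice", "carol"),
            "carol", List.of("bob")
        );
        System.out.println(friendsOfFriends(edges, "alice")); // [carol]
    }
}
```

As the hop count grows, the number of key lookups grows with the fan-out, which is exactly where a graph database pays off.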
There are also multi-model databases that support K/V, document, and graph in a single instance, and they are getting more popular these days. ArangoDB and OrientDB are two examples.
The event-based frameworks light-eventuate-4j, light-tram-4j, and light-saga-4j require an event or message store in a database that supports transaction log tailing in order to publish events to the Kafka message broker. Currently, we have only tested MySQL and MariaDB, and we have built a CDC server to support these two databases. Oracle Streams is under consideration, but there is more work to be done.
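The pattern behind this is often called the transactional outbox: the event is written in the same local database transaction as the business update, and a CDC process tails the transaction log (or polls the outbox table) and publishes each event to Kafka. The sketch below uses in-memory lists as stand-ins for the outbox table and the Kafka topic; the names are illustrative, not the light-eventuate-4j API.

```java
import java.util.*;

public class OutboxSketch {
    // Simplified stand-ins for the outbox table and the Kafka topic.
    static final List<String> outboxTable = new ArrayList<>();
    static final List<String> kafkaTopic = new ArrayList<>();

    // Command side: in a real database, the order row insert and the
    // event insert would share one local ACID transaction, so the event
    // is recorded if and only if the state change commits.
    static void createOrder(String orderId) {
        // ... insert the order row here, in the same transaction ...
        outboxTable.add("OrderCreated:" + orderId);
    }

    // CDC side: tail the transaction log (or poll the outbox table)
    // and publish each new event to the message broker.
    static void cdcPoll() {
        for (String event : outboxTable) kafkaTopic.add(event);
        outboxTable.clear();
    }

    public static void main(String[] args) {
        createOrder("42");
        cdcPoll();
        System.out.println(kafkaTopic); // [OrderCreated:42]
    }
}
```

This is why the store must support log tailing: the publisher never needs a distributed transaction across the database and Kafka.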
For metrics, we need a time-series database. Currently, we support InfluxDB and Prometheus. There are other databases, like KairosDB and OpenTSDB, that we plan to support in the future. KairosDB is based on Cassandra, so it makes sense if you already run Cassandra. OpenTSDB is based on Hadoop/HBase, so it is a natural choice if Hadoop is already used in your organization.
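For example, a metric point written to InfluxDB is expressed in its line protocol: a measurement name, optional tags, one or more fields, and a timestamp. The sketch below builds one such point as a string; the measurement and tag names are made up for illustration.

```java
public class LineProtocol {
    // Build one InfluxDB line-protocol point:
    // measurement,tag=value field=value timestamp-in-nanoseconds
    static String point(String measurement, String service, double latencyMs, long tsNanos) {
        return String.format("%s,service=%s value=%s %d",
                measurement, service, latencyMs, tsNanos);
    }

    public static void main(String[] args) {
        System.out.println(point("request_latency", "order-service", 12.5, 1500000000000000000L));
        // request_latency,service=order-service value=12.5 1500000000000000000
    }
}
```

Tags are indexed and good for dimensions like service name; fields hold the measured values themselves.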