Data Replication Strategies
This blog will try to explain about the different approaches which can be used for the data replication from Master DB (Maria DB) to the salve Maria DB.
Replication is having a number of benefits, some of them includes:
- Scalability: By having one or more slave servers, reads can be spread over multiple servers, reducing the load on the master. The most common scenario for a high-read, low-write environment is to have one master, where all the writes occur, replicating to multiple slaves, which handle most of the reads.
- Low hardware : Having a slave system db can provide the freedom to play with that data without affecting master database. We can have all the reporting etc done on this particular database. Hence, having a very high hardware requirement on the master database can be avoided.
- Data analysis: Analyzing data may have too much of an impact on a master server, and this can similarly be handled on a slave server, while the master continues unaffected by the extra load.
- Backup assistance: Backups can more easily be run if a server is not actively changing the data. A common scenario is to replicate the data to slave, which is then disconnected from the master with the data in a stable state. Backup is then performed from this server.
- Distribution of data: Instead of being connected to a remote master, it’s possible to replicate the data locally and work from this data instead.
Approach 1 : Maria DB to Maria DB replication using Master slave configuration.
Replication is a feature allowing the contents of one or more master servers to be mirrored on one or more slave servers.
We can exert control over which data to replicate. All databases, one or more databases, or tables within a database can all be selectively replicated as well.
The main mechanism used in replication is the binary log. If binary logging is enabled, all updates to the database (data manipulation and data definition) are written into the binary log as binlog events. Slaves read the binary log from each master in order to access the data to replicate.
In this particular approach, masters and slaves do not need to be in constant communication with each other. We can take the servers offline or disconnect from the network, and when they come back, replication will continue where it left off. The slave in this case will be in access read only mode, the data copying from Master to slave will be on real time without any lag. Hence, on the BI and reporting front, we can have real time BI capabilities.
Approach 2 : Usage of Third party tools
The second approach which can be used for data replication from Master DB to Slave DB is by the usage of DB replication third party tools. One of the example which could be cited here is a tool like Tungsten. Tungsten is an open source third party data replicator tool. It not only works on MySQL but also on Maria DB. Tungsten can help solve problems like promoting masters easily from pools of slaves, replicating data between different database versions, replicating efficiently across sites, building complex topologies, and parallelizing data flow between servers. Tungsten Replicator runs equally well in cloud as well as locally hosted environments. Tungsten Replicator users range from tiny start-ups to the largest web properties on the planet.
Approach 3: using ETL. CDC implement, free version
The third approach which could be followed is to use ETL. We are having open source ETL options which includes Talend Open Studio or Kettle – Pentaho Data Integrator.
For having the master slave databases configuration implemented, we would be implementing CDC (change data capture). By CDC we can identify, capture and deliver only the changes which are made to the master database. Frequency of the same can be set. Change Data Capture (CDC) is based on Publisher/Subscriber principle. The publisher captures the change data and makes it available to the subscribers. Implementing CDC will also help in reducing the workload on ETL since we would be sharing the information of only the changed data.
The advantages of implementing via ETL is security can be implemented like data encryption, while copying data we can have options like we can select which all tables data we want to replicate, how much amount of data we can to replicate, rules can be implemented while replicating etc.