Introduction
- A secondary server taking over quickly while primary server fails to serve is called High Availability.
- Several database machines serving same data is called as Load Balancing.
This synchronization problem is solved by many solutions:
One Server to modify data & other servers to perform read operations
Using one server (Master or Primary server) as write only and other servers (Standby or Secondary server) as read only servers which track changes in the primary server.
- Primary or Master server
- Secondary or Standby server
- Warm standyby server (cannot be connected until it is promoted to a master server)
- Hot standyby server (serves read only queries)
Data modification is not considered as committed until all the servers have committed the transaction
This solution guarantees:
- fail over (There is no loss of data)
- all load balanced servers will retrieve consistent data no matter whatever the serve is queried
Comparison of different solutions
Shared Disk Failover (sharing the hardware functionality)
Uses single disk array that is shared by multiple servers. As soon as primary server server fails, secondary server will be able to mount and start.
Is advantageous as there:
Uses single disk array that is shared by multiple servers. As soon as primary server server fails, secondary server will be able to mount and start.
Is advantageous as there:
- is no overhead of synchronization of the primary server with the secondary server.
- and allows rapid failover with no data loss.
Also has some serious limitations since:
- If shared disk array fails or corrupts all the servers becomes non-functional.
- and standby server should never access the shared storage when primary server is accessing
File System Replication (file system replication Eg: DRBD)
has restriction that, writes to standby server must be done in the same order of the data changes in primary server to ensure consistency to the standby server.
Write-Ahead Log shipping
Synchronously or Asynchronously warm and hot standby servers are kept updated by reading stream of WAL records and is achieved by
either:
- File based log shipping
- Streaming replication
- or combination of both
This is done on entire database
Logical Replication
Logical replication constructs a stream of logical data modifications (at table level is also possible) from the WAL of the server and sends to the other servers (can be from master to standby or standby to master).
Trigger-Based Master-Standyby Replication (Updates standby servers asynchronously in batches) Eg: Slony-I
This is 2 step process
Asynchronously sends all the data modification queries to the standby servers as a batch with help of which standby servers can answer read-only queries while the primary server is running. This case may lead to possible data loss during failover.
Statement-Based Replication Middleware (Each server operated independently)
A program intercepts every SQL Query and broadcasts the same to all the servers to execute. This can cause data inconsistency if the queries contains functions like CURRENT_TIMESTAMP(), random(), sequences which can have different value on different servers etc. Care must be taken that transaction either commits or rollbacks on all the servers, perhaps using the 2 phase commit (PREPARE TRANSACTION and COMMIT TRANSACTION)
This is 2 step process
- Master-Standby replication setup server asynchronously sends all the modification queries to the master server.
- Master server asynchronously sends the data to the standby server.
Asynchronously sends all the data modification queries to the standby servers as a batch with help of which standby servers can answer read-only queries while the primary server is running. This case may lead to possible data loss during failover.
Statement-Based Replication Middleware (Each server operated independently)
A program intercepts every SQL Query and broadcasts the same to all the servers to execute. This can cause data inconsistency if the queries contains functions like CURRENT_TIMESTAMP(), random(), sequences which can have different value on different servers etc. Care must be taken that transaction either commits or rollbacks on all the servers, perhaps using the 2 phase commit (PREPARE TRANSACTION and COMMIT TRANSACTION)
- One solution for this would be that middleware should query server dependent functions from the one server and make to use same in all other servers.
Asynchronous Multimaster Replication
Each server works independently and periodically communicates each other and resolve the conflicting transactions. These conflicting transactions can be resolved by user or using some rules like Bucardo.
Synchronous Multimaster Replicatiom
Each server can accept the write requests and modified data is sent to all other servers. Since, data is sent to all other servers, we won't face any issue with non-deterministic functions like random() etc. Heavy write activity may lead to excessive locking leading to the poor performance. However, read operations take advantage since we need not partition the workload and the any server can serve the read request. The problem with write overhead is reduced by using shared disk.
This replication should be implemented in the application using 2 phase commit transactions since, PostgreSQL doesn't implicit;y provide us the functionality.
References
https://www.postgresql.org/docs/current/static/high-availability.html
No comments:
Post a Comment