Author | Wu Xia, Senior Engineer, Tencent Cloud TDSQL
In recent years, national regulators have issued a series of policy documents encouraging the exploration of safe, controllable fintech products and strengthening information-security construction in banking. Many domestic financial, government, and enterprise institutions have therefore begun transforming their existing IT systems, and demand for localized databases keeps growing. Tencent's self-developed financial-grade distributed database TDSQL now serves more than 600 financial, government, and enterprise users. As a financial-grade domestic database, TDSQL not only fully meets national requirements for financial security and controllability, but also solves problems that traditional financial databases could previously address only by purchasing high-end equipment or piling up resources.
Why do financial institutions choose a localized database?
Beyond policy guidance, a major reason financial institutions choose localized databases is to reduce costs and increase efficiency. Traditional centralized databases are expensive and hard to scale, relying on stacked resources to guarantee availability and scalability, and this approach is under growing pressure. Safe, controllable domestic distributed databases offer high scalability, high performance, and high availability, meeting the needs of online, high-frequency, multi-dimensional, high-concurrency scenarios and helping financial institutions break through technical bottlenecks. On cost, domestic distributed databases have a clear advantage. Take TDSQL as an example: at the hardware level it runs entirely on x86 servers instead of the mainframes and minicomputers required by traditional commercial databases, so costs drop sharply:
How can financial data be smoothly migrated to a localized database?
In financial business scenarios, database migration and upgrade, data distribution, and data backup are essential basic capabilities of a database system. Migration/upgrade and data distribution are the foundation of data decoupling and aggregation. For example, in the head-office-and-branch architecture common in the insurance industry, multiple sub-databases must synchronize business data to a central database in real time for summary and query; in a bank's core transaction system, transaction data must be synchronized in real time to analysis subsystems for reporting, batch jobs, and other operational tasks. Data backup is the cornerstone of data security and the lifeline of financial business data. As a financial-grade database product, TDSQL has distilled its experience in data migration, distribution, synchronization, and backup into the TDSQL-MULTISRCSYNC solution, offering a highly available, highly reliable reference for the localized transformation of database systems.
1. TDSQL multi-source synchronization: data synchronization between TDSQL and MySQL, Oracle, and PostgreSQL
TDSQL is a distributed database product built by Tencent, featuring strong consistency, high availability, a globally deployable architecture, distributed horizontal scaling, high performance, and enterprise-grade security, along with supporting facilities such as intelligent DBA, automated operations, and monitoring and alerting. TDSQL multi-source synchronization (TDSQL-MULTISRCSYNC) is a high-performance, highly consistent data-distribution service that supports multiple heterogeneous data platforms. It can synchronize and distribute data in real time from TDSQL as the source to MySQL, Oracle, PostgreSQL, and message queues, and it also supports TDSQL as the target, synchronizing MySQL or Oracle data into TDSQL in real time. Deployment is flexible, with support for one-to-many, many-to-one, and other replication topologies. As a typical log-based CDC (change data capture) replication system, the multi-source synchronization module has the following architecture:
As the figure above shows, the system can be roughly divided into three parts: producer, store, and consumer.
producer: the incremental-log acquisition module. It parses the source's incremental data-change log, encapsulates each captured change into a JSON message body, and sends it to the message queue. When the source is MySQL or TDSQL, the incremental log is the binlog, which must be in row format with full row images. When the source is Oracle, the producer captures incremental data from Oracle's materialized view logs, then encapsulates and delivers it. The producer publishes to the message queue in at-least-once mode: every message is guaranteed to appear in the queue at least once, but duplicate messages are possible.
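The producer's two responsibilities can be sketched as follows. This is a minimal Python illustration, not the actual TDSQL implementation: the event fields and the `queue_send` callable are assumed names standing in for the real binlog parser and message-queue client.

```python
import json

def encode_binlog_event(event):
    """Encapsulate a parsed row-format binlog event as a JSON message body.

    `event` is assumed to be a dict already parsed from the binlog, e.g.
    {"op": "update", "db": "trade", "table": "orders", "old": {...}, "new": {...}}.
    These field names are illustrative only.
    """
    return json.dumps(event, sort_keys=True).encode("utf-8")

def publish_at_least_once(queue_send, message, max_retries=5):
    """At-least-once delivery: retry until the queue accepts the message.

    Retrying after an ambiguous failure may re-send a message the queue had
    in fact accepted, so duplicates can appear in the queue -- the consumer's
    idempotent replay logic is what makes this safe.
    """
    for _ in range(max_retries):
        try:
            queue_send(message)   # hypothetical send call; returns only on ack
            return True
        except IOError:
            continue              # ambiguous failure: re-send (may duplicate)
    raise RuntimeError("could not deliver message after %d attempts" % max_retries)
```

The key design point is visible in `publish_at_least_once`: the producer never drops a change on an ambiguous failure, it re-sends, and deduplication is deliberately pushed downstream to the consumer.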
store: the message queue. Because database logs must be applied in order, every topic here uses exactly one partition, guaranteeing ordered consumption.
consumer: the log consumption and replay module, which consumes CDC messages from the queue and replays them to the target instance according to the configuration. Because the producer uses at-least-once delivery, the consumer implements idempotent logic to guarantee that data is replayed correctly.
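The consumer side of the pipeline reduces to a small loop. Again this is only a sketch under assumptions: `poll`, `replay`, and `ack` are hypothetical callables standing in for the message-queue client and the target-database writer described above.

```python
import json

def consume_loop(poll, replay, ack):
    """Consumer skeleton: poll CDC messages in order, replay, then acknowledge.

    Because the producer is at-least-once, `replay` must be idempotent:
    applying the same message twice must leave the target in the same state
    as applying it once.
    """
    while True:
        msg = poll()
        if msg is None:
            break                # queue drained (termination for this demo)
        event = json.loads(msg)  # messages are JSON-encoded change events
        replay(event)            # idempotent apply to the target instance
        ack(msg)                 # acknowledge only after a successful replay
```

Acknowledging only after replay means a crash between `replay` and `ack` causes a redelivery, which the idempotent `replay` absorbs harmlessly.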
2. TDSQL's high performance, high consistency, and high availability
High performance guarantee: row-based hash concurrency strategy
Financial business scenarios often demand high data freshness, which puts correspondingly high demands on synchronization performance. To meet them, the consumer uses a row-based hash concurrency strategy to replay events in parallel. Take binlog messages as an example. The database writes row changes into the binlog in transaction-commit order, so replaying events serially, in the order they appear in the binlog file, always leaves the source and target instances in consistent states. Serial replay is slow, however: a large transaction such as a batch update easily creates a large synchronization lag, which is unacceptable when data freshness matters. To increase concurrency, the consumer hashes each row record by table name and primary-key value and routes the message to the synchronization thread selected by the hash. Does such out-of-order replay cause data inconsistency? No: although the global order of messages is broken, all operations on the same row land in the same thread and stay ordered. As long as the sequence of changes to each individual row is preserved, the final data is consistent. The process is shown in the figure below:
Consistency guarantee: idempotent fault tolerance for row-format binlog events
There are two motivations for implementing idempotent logic:
1. Because the producer publishes messages in at-least-once mode, the consumer must be able to handle duplicate messages.
2. With idempotent logic in place, data restoration becomes easier: there is no need to record mirror points during synchronization, which simplifies operations and maintenance.

The design principle of the idempotent logic is to make the target instance match the intent of each binlog event. For an insert event, the intent is that a record identified by the new value exists in the database. For an update event, the intent is that no record identified by the old value remains, only the record identified by the new value. A delete event likewise requires that the target database contain no record identified by the old value. For insert, update, and delete, the idempotent logic is therefore as follows:
As the figure shows, when a primary-key conflict occurs, an insert is transformed into delete + insert so that the insert ultimately succeeds. (In the figure, an affected-row count below 0 indicates a SQL execution error, and a count of 0 indicates a primary-key conflict.)

For update, the idempotent handling ensures that only the record generated by the new value can exist in the database.

For delete, an affected-row count greater than 0 means success and below 0 means failure. A count of 0 means the full row image may not have matched, so the consumer deletes again by primary key alone: since the intent of a delete is simply that the target no longer contain the row identified by the message's primary key, a primary-key delete is sufficient. If that second delete affects more than 0 rows, it succeeded; if it affects 0 rows, the target never had the row in the first place, which is the same end state. Either way, the idempotent delete is complete.
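The intent-based rules above can be condensed into a small sketch. To keep it self-contained, the target instance is modeled as an in-memory `{primary_key: row}` dict rather than real SQL; the event shape is an assumption, and real TDSQL logic of course operates on affected-row counts from actual statements.

```python
def replay_idempotent(table, event):
    """Idempotently replay one row-change event against `table` (a dict
    standing in for the target instance). What must hold afterwards is the
    event's *intent*, regardless of how many times the event is applied.
    """
    op, pk = event["op"], event["pk"]
    if op == "insert":
        # Intent: a record identified by the new value exists.
        # On a primary-key conflict this becomes delete + insert.
        table.pop(pk, None)
        table[pk] = event["new"]
    elif op == "update":
        # Intent: only the record identified by the new value exists;
        # any record identified by the old value is removed.
        table.pop(event.get("old_pk", pk), None)
        table[pk] = event["new"]
    elif op == "delete":
        # Intent: the row identified by this primary key is gone.
        # Deleting an already-absent row is a no-op -- same end state.
        table.pop(pk, None)
    return table
```

Applying any event a second time changes nothing, which is exactly the property that lets the producer stay at-least-once.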
High availability guarantee: multi-machine disaster recovery protection
This synchronization service must itself be highly available, in two senses:
1. Disaster tolerance: if a machine suffers an unrecoverable failure, the consumer services running on it must be detected promptly and automatically migrated and switched to other machines;
2. Routine scaling: the service must handle its own vertical and horizontal scaling needs, and more generally must remain highly available through all kinds of disasters and routine operations-and-maintenance actions.

For database migration and synchronization scenarios, TDSQL multi-source synchronization provides a multi-machine disaster-recovery mechanism. For consumer high availability, the consumer service itself is stateless and all tasks are issued through MetaCluster; synchronization services are deployed across machines, and disaster recovery is implemented by the manager process. If an entire machine loses power, the consumers on it are no longer alive, so their liveness records in MetaCluster lapse. The manager nodes on other machines perceive this, conclude that those consumers are dead, take over their tasks, and re-launch the services on their own machines.

The second requirement is mutual exclusion: the same data-synchronization link must never be running on two machines at once. The high-availability mechanism enforces this through unique identifiers and the current dispatch node, so that when a synchronization task is re-launched it drifts to a different machine while remaining mutually exclusive. This multi-machine disaster-recovery protection is coordinated through MetaCluster, monitoring, and services such as the manager, ensuring that in a machine-level disaster (or any other disaster) tasks are successfully migrated to surviving nodes within about ten seconds.
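The takeover-plus-mutual-exclusion behavior described above resembles a lease protocol, and can be sketched as one. This is a toy model under assumptions: `MetaClusterStub` is an in-memory stand-in for the real MetaCluster, and the lease/TTL mechanics are illustrative, not TDSQL's actual protocol.

```python
import time

class MetaClusterStub:
    """In-memory stand-in for the metadata cluster, holding per-task leases."""
    def __init__(self):
        self.leases = {}   # task_id -> (owner, expiry_timestamp)

    def try_acquire(self, task_id, owner, ttl, now=None):
        """Atomically claim a task lease. At most one machine can hold a live
        lease per task -- the mutual-exclusion guarantee that the same sync
        link never runs on two machines at once."""
        now = time.time() if now is None else now
        current = self.leases.get(task_id)
        if current is not None and current[1] > now and current[0] != owner:
            return False          # another machine holds a live lease
        self.leases[task_id] = (owner, now + ttl)
        return True

def manager_scan(meta, my_host, task_ids, ttl=10, now=None):
    """Manager loop body: (re)claim every task whose lease has lapsed, e.g.
    because the owning machine lost power and stopped renewing; return the
    tasks this host should now be running."""
    return [t for t in task_ids if meta.try_acquire(t, my_host, ttl, now)]
```

A live consumer renews its lease by re-calling `try_acquire` before the TTL expires; a dead machine stops renewing, so within roughly one TTL another manager's scan claims the task, matching the roughly-ten-second takeover described above.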
3. Many large financial institutions have completed migration to domestic databases
Over the past few years, the TDSQL multi-source synchronization solution has helped many financial, government, and enterprise organizations migrate to domestic databases efficiently and smoothly.
Real-time database summary of insurance policy information
With multi-source synchronization, users can transform their services into a distributed architecture, migrating a single instance to a distributed deployment through real-time synchronization. Insurance customers, for example, run multiple sub-databases, shards, or per-business logical divisions, and use this TDSQL service to synchronize all of that data into a consolidated database. A large Internet insurance company used the TDSQL multi-source synchronous migration solution to build a many-to-one topology, efficiently and reliably aggregating the company's multiple sets of business data into one consolidated database for report analysis and extraction over the overall data.
Synchronizing between database instances to build a data warehouse
After replacing its core database with TDSQL, a large consumer-finance company used TDSQL's multi-source synchronization service to synchronize between different database instances. This serves two business purposes. The first is risk control: sending incremental data to a downstream message queue improves the accuracy and efficiency of intelligent risk control. The second is the data warehouse: the user continuously feeds data from the real-time production databases into the data warehouse through the multi-source synchronization system, enabling OLAP workloads and providing intelligent decision-making analysis for the business.
Real-time backup of heterogeneous database of core transaction system
In 2019, Zhangjiagang Bank's new-generation core business system, built on TDSQL, drew industry-wide attention after its successful launch. Migrating the bank's traditional core-system database to Tencent Cloud TDSQL delivered a generational leap in cost reduction and efficiency. This migration of a highly sensitive, financial-grade business system thoroughly validated TDSQL's performance and its secure migration strategy. In Zhangjiagang Bank's migration practice, the core transaction cluster runs on TDSQL, and the TDSQL multi-source synchronization solution writes both stock and incremental data to a backup data center over the internal LAN while guaranteeing that source and target remain fully consistent. If the core transaction system were ever to encounter a low-probability, unrecoverable disaster, all transaction services could be switched to the Oracle deployment in the backup data center within a short time. This served as the safety net for migrating the bank's traditional core-system database and ultimately ensured a smooth migration.
Database migration business cutover
Database migration touches a large volume of core data, so it must be both fast and stable. The multi-source synchronization service is a built-in feature of TDSQL. Take the migration of a provincial radio and television bureau as an example. The TDSQL multi-source synchronization migration service restructured the business-system migration process: migration preparation, migration assessment, plan design, resource preparation and database transformation, migration implementation, and result verification took only 30 days in total, with the most critical phases, resource preparation and database transformation, taking just 7 days. The customer's business-system database was migrated from Oracle to TDSQL, whose performance meets the customer's existing business load, and the data-integrity guarantees during the migration provided a solid data foundation for subsequent operation and maintenance of the new business system.
Cross-city and cross-data center disaster recovery
Take an Internet life-insurance company as an example. The user's TDSQL instance on the public cloud is its core production environment, and a TDSQL system is also deployed on-premises. Through multi-source synchronization, the user keeps the cloud production environment and the on-premises environment synchronized for disaster recovery, effectively achieving cross-city, cross-data-center data disaster recovery.
With its "three highs" of high consistency, high performance, and high availability, TDSQL multi-source synchronization gives users financial-grade capabilities for data decoupling, distribution, migration, and synchronization, and weaves those capabilities into the digital upgrade of their services. As a core module of the TDSQL product and service system, multi-source synchronization is both a key bridge function and a service that helps generate business value. In database localization it serves users across the whole journey, from distributed transformation, migration, and backup to subsequent synchronization and distribution in production operations. This is a product advantage TDSQL has earned through years of practice.