Query optimization of distributed databases, characteristics of distributed databases

5 answers

Anonymous user 2024-02-06

Query optimization in a distributed database means choosing a query execution plan, and the algorithm that implements each relational operator, when a distributed query is executed. The algorithms used depend on the system environment, which is usually classified as either a long-distance WAN or a high-speed LAN; the main difference between the two is network bandwidth. For unary operators, the query optimization methods of a centralized database can be used.

For binary operators, communication cost must be considered because data has to be transferred between sites. Common execution strategies for join operations in distributed queries include:

1) Semi-join method: uses the semi-join operation to rewrite the join as R ⋈ S = (R ⋉ S) ⋈ S. Assume relations R and S are stored at Site 1 and Site 2 respectively. First, S is projected on the join attribute and the result is sent to Site 1. Site 1 then joins R with that projection, which yields the semi-join R ⋉ S. Finally, that result is sent to Site 2 and joined with S.

This approach reduces network communication cost when performing joins and is mainly suited to long-distance WANs with low bandwidth.

2) Enumeration method: enumerates the physical execution plans of a relational operator and selects the execution algorithm by comparing the costs of those plans. For the join operator, the physical execution plans include nested-loop join, hash join, and merge join.

The enumeration method is mainly suited to high-speed LAN environments, where disk I/O dominates the cost.
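
The semi-join strategy in item 1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: relations are lists of dicts held in memory, the join attribute is a hypothetical column named "k", and "shipping" data between sites is modeled by counting items rather than by real network calls.

```python
# Hypothetical relations: each row is a dict; "k" is the assumed join attribute.
def semi_join_strategy(r_rows, s_rows, key):
    """Compute R join S as (R semijoin S) join S and count what crosses the network."""
    # Step 1 (Site 2): project S on the join attribute; the distinct values go to Site 1.
    s_keys = {row[key] for row in s_rows}

    # Step 2 (Site 1): keep only the R rows that have a partner in S (R semijoin S).
    r_reduced = [row for row in r_rows if row[key] in s_keys]

    # Step 3 (Site 2): receive the reduced R and join it with S.
    s_by_key = {}
    for row in s_rows:
        s_by_key.setdefault(row[key], []).append(row)
    joined = [{**r, **s} for r in r_reduced for s in s_by_key[r[key]]]

    shipped = {"keys_to_site1": len(s_keys), "rows_to_site2": len(r_reduced)}
    return joined, shipped


R = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}, {"k": 3, "a": "z"}]
S = [{"k": 2, "b": "p"}, {"k": 3, "b": "q"}, {"k": 3, "b": "r"}]
result, shipped = semi_join_strategy(R, S, "k")
```

Only the distinct join values of S and the matching rows of R are transferred, instead of a whole relation. The saving is largest when few rows of R have a partner in S.
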

Anonymous user 2024-02-05

With the hardware fixed, queries can be optimized by tuning the SQL. TiDB can currently also increase computing power by scaling TiKV nodes horizontally.

Horizontal elastic scaling.

By simply adding new nodes, you can scale TiDB horizontally and expand throughput or storage on demand, which makes it easier to handle high-concurrency and massive-data scenarios.

Anonymous user 2024-02-04

Characteristics of Distributed Databases:

1. Distribution transparency.

Data independence is one of the main goals of the database approach. Distribution transparency means that users do not need to care about how data is logically partitioned, where it is physically located, how the consistency of duplicate copies (redundant data) is maintained, or which data model the database at a local site supports.

The advantages of distribution transparency are obvious. With it, a user's application is written as if the data were not distributed. When data is moved from one site to another, the application does not need to be rewritten.

Likewise, adding duplicate copies of some data does not require rewriting the application. The system stores the information about data distribution in a data dictionary, and it interprets, transforms, and transmits a user's request for non-local data according to that dictionary.
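
As a rough illustration of how a data dictionary hides location from the application, here is a minimal Python sketch. The class names `DataDictionary` and `Cluster`, the fragment name "orders", and the site names are all hypothetical; a real system would also handle partitioning, replicas, and network transfer.

```python
# Hypothetical data dictionary: fragment name -> site that currently stores it.
class DataDictionary:
    def __init__(self):
        self._location = {}

    def place(self, fragment, site):
        self._location[fragment] = site

    def locate(self, fragment):
        return self._location[fragment]


class Cluster:
    def __init__(self, site_names):
        self.dictionary = DataDictionary()
        self.sites = {name: {} for name in site_names}

    def store(self, fragment, rows, site):
        self.sites[site][fragment] = rows
        self.dictionary.place(fragment, site)

    def move(self, fragment, new_site):
        # Relocate the data and update the dictionary; callers are unaffected.
        old_site = self.dictionary.locate(fragment)
        self.sites[new_site][fragment] = self.sites[old_site].pop(fragment)
        self.dictionary.place(fragment, new_site)

    def read(self, fragment):
        # The application names only the fragment; the system resolves the site.
        return self.sites[self.dictionary.locate(fragment)][fragment]


cluster = Cluster(["site1", "site2"])
cluster.store("orders", [("o1", 10)], "site1")
before = cluster.read("orders")
cluster.move("orders", "site2")
after = cluster.read("orders")
```

The application code, `cluster.read("orders")`, is identical before and after the move, which is the point of distribution transparency.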

2. Replication transparency.

Users do not need to worry about how the database is replicated across the nodes of the network; the system updates replicated data automatically. In a distributed database system, data from one site can be copied to other sites for storage. An application can then use its local copy to complete a distributed operation locally, which avoids transferring data over the network and improves the system's operation and query efficiency.

However, an update to replicated data must be applied to every copy.
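
The trade-off can be seen in a small Python sketch: reads touch one local copy, while a write has to touch all of them. The class `ReplicatedItem` and the site names are hypothetical, and the model ignores failures and concurrency, which a real replication protocol must handle.

```python
class ReplicatedItem:
    """One logical item with a copy at every listed site (hypothetical model)."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}

    def read(self, local_site):
        # Reads are served from the local copy; no network transfer is needed.
        return self.copies[local_site]

    def write(self, value):
        # An update must reach every copy; returns how many copies were touched.
        for site in self.copies:
            self.copies[site] = value
        return len(self.copies)


item = ReplicatedItem(["site1", "site2", "site3"], value=100)
touched = item.write(120)
```

With three copies, one logical update becomes three physical updates, so more replicas make reads cheaper and writes more expensive.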

3. Easy scalability.

In most network environments, a single database server will eventually be insufficient. If the server software supports transparent horizontal scaling, then multiple servers can be added to further distribute data and share processing tasks.

Key advantages:

1) A flexible architecture.

2) Suits organizations with distributed management and control.

3) Good cost-effectiveness.

4) High reliability and good availability.

5) Fast response times for local applications.

6) Good scalability and easy integration with existing systems.

Anonymous user 2024-02-03

A distributed database is a single logical database whose physical data is geographically distributed across a computer network and managed by multiple database management systems, which together form a distributed database management system.

In a distributed database management system, a user on any computer accesses the database without noticing that the data being used is not physically stored on that computer but is transmitted by the distributed database system from other machines over the network.

As a result, every user sees a unified conceptual schema.

The main characteristics of a distributed database system are:

1) High reliability. When one machine in the system fails, the whole system does not go down. Once the fault is fixed, the distributed database system can recover the parts of the database that were modified during the failure.

2) A dispersed workload, so that heavy processing is shared evenly.

3) Easy expansion of the system.

The distributed database system is the product of the combination of computer communication and database technology, and is one of the most representative development directions of database technology.

Anonymous user 2024-02-02

A distributed database system (DDBS) consists of a distributed database management system (DDBMS) and a distributed database (DDB). In a distributed database system, an application can manipulate the database transparently, even though the data may be stored in different local databases, managed by different DBMSs, run on different machines, supported by different operating systems, and connected by different communication networks.

A distributed database is logically a unified whole but is physically stored on different nodes. An application can access geographically distributed databases through a network connection. The distribution shows in the fact that the data is not stored at a single site or, more precisely, not on the storage devices of a single computer. That is the difference from a centralized database.

From the user's point of view, a distributed database system is logically the same as a centralized one: users can execute global applications at any site, as if the data were stored on one computer and managed by a single database management system (DBMS), and they notice no difference.

The distributed database system developed from the centralized database system and is the product of combining computer technology with network technology. It suits organizations whose units are scattered: each department stores its commonly used data locally and uses it locally, which improves response speed and reduces communication costs. Compared with centralized database systems, distributed database systems are more scalable, and they improve reliability by adding an appropriate amount of data redundancy.

In a centralized database, minimizing redundancy is one of the goals of the system. Redundant data wastes storage space and easily causes inconsistency between copies, and the system has to pay a maintenance cost to keep the data consistent. Redundancy is reduced through data sharing. In a distributed database, however, redundant data is added deliberately and multiple copies of the same data are stored at different sites, for two reasons:

1) Improved reliability and availability. When a site fails, the system can operate on the same copy at another site, so the whole system is not paralyzed by a single failure.

2) Improved performance. The system can operate on the copy of the data closest to the user, which reduces communication cost and improves the performance of the entire system.
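
Both reasons come down to choosing which copy to use. A minimal Python sketch, assuming each replica has a hypothetical numeric "distance" from the user and a known up/down status (the function name `pick_replica` and the site names are illustrative only):

```python
def pick_replica(replicas, is_up):
    """Return the closest site that is up.

    replicas: list of (site, distance) pairs; distance is a hypothetical cost.
    is_up: dict mapping site -> bool.
    """
    live = [(distance, site) for site, distance in replicas if is_up.get(site, False)]
    if not live:
        raise RuntimeError("no replica available")
    # Tuples compare by distance first, so min() yields the nearest live site.
    return min(live)[1]


replicas = [("site1", 1), ("site2", 5), ("site3", 9)]
normal = pick_replica(replicas, {"site1": True, "site2": True, "site3": True})
failover = pick_replica(replicas, {"site1": False, "site2": True, "site3": True})
```

In the normal case the nearest copy ("site1") is used; when that site is down, the request falls back to the next nearest copy ("site2") instead of failing.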
