-
The dimensional model, advocated by Ralph Kimball, a leading figure in data warehousing, is built around the needs of analysis and decision-making. Because the model serves analytical needs, it focuses on letting users answer analytical questions quickly while still responding well to large, complex queries.
Dimensional modeling is key to the success of a data warehouse or business intelligence project. Whether data volumes grow from gigabytes to terabytes or petabytes, data presentation succeeds only when it is built on simplicity. Dimensional modeling constantly asks how to provide that simplicity: it is driven by the business and aims at user comprehension and query performance.
Dimensional modeling is a method applied specifically to modeling analytical databases, data warehouses, and data marts. A data mart can be thought of as a small data warehouse. Dimensional modeling guides how tables are built in the data warehouse.
Dimensional modeling uses two types of tables: fact tables and dimension tables.
Fact table: Records data that necessarily exists as a result of business activity. Collected log files and order tables, for example, can serve as fact tables.
Characteristics: A fact table is largely a collection of foreign keys, each of which references the primary key of a record in a dimension table. The facts exist objectively, and the subject area determines which data needs to be used.
Dimension table: A dimension is a perspective from which data is analyzed, and a dimension table is built around a suitable perspective for analyzing a problem, such as time, region, terminal, or user.
Three schemas of dimensional modeling:
Star schema: The simplest and most commonly used. It is centered on the fact table, and all dimension tables connect directly to the fact table.
Snowflake schema: Dimension tables can themselves have further dimension tables, which makes the schema harder to maintain.
Constellation schema: Built on multiple fact tables that share dimension information, i.e., some dimension tables are shared between fact tables.
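As a rough illustration of the star shape described above, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (fact_orders, dim_date, dim_region) and the sample rows are assumptions made up for this example, not part of any particular warehouse.

```python
import sqlite3

# Hypothetical star schema: one fact table (fact_orders) surrounded by
# two dimension tables (dim_date, dim_region). All names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date   (date_key INTEGER PRIMARY KEY, calendar_date TEXT, month TEXT);
CREATE TABLE dim_region (region_key INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE fact_orders (
    date_key   INTEGER REFERENCES dim_date(date_key),
    region_key INTEGER REFERENCES dim_region(region_key),
    amount     REAL
);
INSERT INTO dim_date VALUES (1, '2024-01-01', '2024-01'), (2, '2024-02-01', '2024-02');
INSERT INTO dim_region VALUES (1, 'North'), (2, 'South');
INSERT INTO fact_orders VALUES (1, 1, 100.0), (1, 2, 50.0), (2, 1, 25.0);
""")

# Every dimension joins directly to the fact table (the star shape).
rows = conn.execute("""
    SELECT d.month, r.region_name, SUM(f.amount)
    FROM fact_orders AS f
    JOIN dim_date   AS d ON d.date_key = f.date_key
    JOIN dim_region AS r ON r.region_key = f.region_key
    GROUP BY d.month, r.region_name
    ORDER BY d.month, r.region_name
""").fetchall()
print(rows)
```

In a snowflake schema, dim_region might in turn reference a separate country table. In a constellation schema, a second fact table would reuse dim_date and dim_region.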
-
Dimensional modeling is a logical design approach that structures data by dividing the objective world into measures and context. Measures are usually numeric and take the form of facts. They are surrounded by context, which is intuitively divided into independent logical blocks called dimensions. This is very different from entity-relationship modeling, an application-oriented design technique that follows third normal form (3NF) and aims to eliminate data redundancy.
Dimensional modeling, by contrast, is an analysis-oriented, denormalized design technique that accepts more data redundancy in order to improve query performance.
-
No redundancy. Convenience.
-
1. Gather business requirements and data realities.
Before starting dimensional modeling, you need to understand both the requirements of the business and the realities of the source data. Requirements are identified by talking with business representatives to understand their objectives based on key performance indicators, compelling business issues, decision-making processes, and supporting analytic needs. Data realities are uncovered by talking with the developers of the source systems and by high-level data profiling to assess whether the data is feasible to access.
2. Collaborative dimensional modeling workshops.
The dimensional model should be designed together with business representatives through a series of highly interactive workshops.
3. Four-step dimensional design process.
1. Select the business process.
A business process is an operational activity performed by the organization. Business process events generate or capture performance metrics, which translate into facts in a fact table. Most fact tables focus on the results of a single business process.
Choosing the process is important because it defines a specific design target and allows the granularity, dimensions, and facts to be declared. Each business process corresponds to a row in the enterprise data warehouse bus matrix.
2. Declare the granularity.
Declaring the granularity (the grain) is a pivotal step in dimensional design. The granularity must be declared before choosing dimensions or facts, because every candidate dimension or fact must be consistent with it. Enforcing this consistency across the whole design is key to the performance and ease of use of BI applications.
Atomic granularity is the lowest level at which data is captured by a given business process. It is best to start with atomic-granularity data, because it withstands unpredictable user queries. Create separate physical tables for different fact table granularities, and do not mix multiple granularities in the same fact table.
3. Identify the dimensions.
Dimensions provide the who, what, where, when, why, and how context surrounding a business process event. Dimension tables contain the descriptive attributes that BI applications use to filter and group facts. With a firm grasp of the fact table's granularity, all the possible dimensions can be identified.
When joined to a given fact table row, a dimension should always be single-valued.
4. Identify the facts.
Facts are the measurements that result from a business process event and are almost always numeric. A fact table row has a one-to-one relationship with a measurement event as described by the fact table's granularity, so a fact table corresponds to a physically observable event. Within a fact table, only facts consistent with the declared granularity are allowed.
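One way to make the declared granularity concrete is to check that no two fact rows share the same grain key. The sketch below assumes a hypothetical order-line fact table whose grain is one row per (order_id, line_no); the field names and sample rows are invented for illustration.

```python
from collections import Counter

# Hypothetical fact rows; the declared grain is one row per order line,
# identified by (order_id, line_no). Field names are illustrative.
GRAIN = ("order_id", "line_no")

fact_rows = [
    {"order_id": 1, "line_no": 1, "product_key": 10, "quantity": 2},
    {"order_id": 1, "line_no": 2, "product_key": 11, "quantity": 1},
    {"order_id": 2, "line_no": 1, "product_key": 10, "quantity": 5},
]

def grain_violations(rows, grain):
    """Return grain-key values that appear on more than one row."""
    counts = Counter(tuple(row[col] for col in grain) for row in rows)
    return [key for key, n in counts.items() if n > 1]

clean = grain_violations(fact_rows, GRAIN)
print(clean)

# A row loaded twice shows up as a repeated grain key.
duplicated = fact_rows + [dict(fact_rows[0])]
dirty = grain_violations(duplicated, GRAIN)
print(dirty)
```

A check like this could run after each load. A non-empty result suggests either a duplicate load or rows of a different granularity mixed into the table.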
-
Dimensions and values are closely related in modeling. A value, which can be of any numeric type, is commonly used to represent an attribute or feature. Combining these values with other attributes or features forms a multidimensional array or data structure, whose axes are called dimensions.
Dimensions describe the attributes or characteristics of a data element, while values are the numbers that correspond to those attributes or characteristics. Dimension attributes are typically discrete, whereas the values themselves are typically continuous. For example, in a student's score, the score is a numeric value and the student is a discrete dimension attribute.
In modeling, dimensions and values usually correspond one-to-one: a dimension of a data element corresponds to a numeric value, and multidimensional arrays of these dimensions and values can form many different data models. In the star schema of a data warehouse, for example, dimensions typically refer to entities in the business domain, while the numeric values are the measures associated with those entities.
Data analysis and data mining often require multidimensional analysis. By analyzing the relationships between different dimensions and values, we can uncover correlations and regularities in the data, understand it better, and make better decisions.
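Continuing the student score example, the sketch below shows the basic move of multidimensional analysis: summarizing one numeric value along different discrete dimensions. The records, the "subject" dimension, and the helper name average_by are assumptions introduced for illustration.

```python
from collections import defaultdict

# Hypothetical score records: "student" and "subject" are discrete
# dimensions, "score" is the numeric value (measure).
records = [
    {"student": "Ann", "subject": "math", "score": 90.0},
    {"student": "Ann", "subject": "physics", "score": 80.0},
    {"student": "Bob", "subject": "math", "score": 70.0},
    {"student": "Bob", "subject": "physics", "score": 60.0},
]

def average_by(rows, dimension, measure="score"):
    """Average the measure for each distinct value of one dimension."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        bucket = totals[row[dimension]]
        bucket[0] += row[measure]
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

by_student = average_by(records, "student")
by_subject = average_by(records, "subject")
print(by_student)
print(by_subject)
```

The same measure gives different views depending on which dimension it is grouped by.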
-
The benefits of dimensional modeling:
a) Dimensional modeling is a predictable, standard framework. It allows database systems and end-user query tools to make strong assumptions about the data, which mainly benefits presentation and performance. Data products built on it later perform well.
b) The star-join framework can withstand unpredictable changes in user behavior. It is easy to switch between queries on different dimensions.
c) It is highly extensible, accommodating unpredictable new data sources and new design decisions. New analytical dimensions and facts can be added easily without changing the granularity of the model, and without reloading data or recoding applications. Good extensibility means that all existing applications continue to run without producing different results.
-
a) Definition and scope of the data warehouse project.
b) Project readiness assessment.
c) Business justification.
a) Business requirements collection.
b) Review of business requirements.
c) Data audit.
Dimensional modeling.
Dimensional modeling is a logical design technique that seeks to present data in an intuitive, standard framework that allows high-performance access. The dimensional model is a design technique for databases that are delivered quickly to end users.
Define the data warehouse bus structure.
a) Business-driven dimensional modeling.
b) Data warehouse bus matrix.
c) Conformed dimensions.
d) Conformed facts.
Conformed dimensions and conformed facts are the "bus" of a data warehouse.
e) Single-source data marts.
Examples: purchase orders, shipments, payments.
Each covers a single type of transaction.
f) Multiple-source data marts.
Example: customer profitability, where a legacy source describing revenue must be combined with a legacy source describing costs.
Each spans multiple types of transaction.
Start building data marts with the single-source ones.
g) Transaction data marts.
h) Periodic snapshot data marts.
i) Accumulating snapshot data marts.
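A bus matrix can be sketched as a mapping from business processes (rows) to the dimensions they use (columns). The example below reuses the purchase order, shipment, and payment processes named above. The dimension names and the helper conformed_candidates are assumptions for illustration only.

```python
# Hypothetical bus matrix: rows are business processes, columns are the
# dimensions each process uses. Dimension names are illustrative.
bus_matrix = {
    "purchase_order": {"date", "supplier", "product"},
    "shipment":       {"date", "product", "warehouse"},
    "payment":        {"date", "supplier"},
}

def conformed_candidates(matrix):
    """Dimensions used by more than one process, with the processes using them."""
    usage = {}
    for process, dimensions in matrix.items():
        for dimension in dimensions:
            usage.setdefault(dimension, []).append(process)
    return {dim: sorted(procs) for dim, procs in usage.items() if len(procs) > 1}

shared = conformed_candidates(bus_matrix)
for dimension in sorted(shared):
    print(dimension, "->", shared[dimension])
```

Dimensions that appear in several rows are the ones that need a single conformed definition, since they are what lets separate data marts be combined.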
Define the high-level logical data model diagram.
The design process of a dimensional model.
a) Select the business process.
b) Declare the granularity.
c) Choose the dimensions.
d) Identify the facts.
Source-to-target data mapping (ETL rule definition).
a) Dimension table mapping.
b) Fact table mapping.
Produce documentation.
a) Data warehouse bus structure documentation.
b) High-level data model documentation.
c) Data model and ETL design documents.
Create a physical data model.
a) Select a data modeling tool.
b) Design of physical data structures.
Make an initial indexing plan.
a) Create an index for the fact table.
b) Create an index for the dimension table.
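An initial indexing plan might look like the sketch below, written with Python's built-in sqlite3 module. Which columns to index is an assumption here (fact table foreign keys, plus a frequently filtered dimension attribute). The real choice depends on the database engine and the query workload, and the table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales (
    date_key    INTEGER,
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);
-- Fact table: index the dimension foreign keys used in joins and filters.
CREATE INDEX idx_fact_sales_date    ON fact_sales(date_key);
CREATE INDEX idx_fact_sales_product ON fact_sales(product_key);
-- Dimension table: index an attribute that queries commonly filter on.
CREATE INDEX idx_dim_product_name ON dim_product(product_name);
""")

# List the indexes that were created explicitly above.
index_names = sorted(
    name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND name LIKE 'idx%'"
    )
)
print(index_names)
```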
Design and create a database instance.
a) Save the database creation script and parameter file.
b) Create a physical storage structure.
Produce documentation.
a) Data model design documentation.
b) Database creation script documentation.
c) Database initialization script documentation.
d) Add the relevant design content to the data model and ETL design document.
Dimension table load design.
Fact table load design.
Aggregate table and OLAP (online analytical processing) cube load design.
Data warehouse operations and automation.
Data warehouse operations run regularly and unattended.
Produce documentation.
a) Project development documentation.
b) Add the relevant design content to the data model and ETL design document.
Data mart definition.
a) Dimension definitions.
b) Measure group definitions.
c) Calculated member definitions.
d) Perspective definitions. A perspective is a subset of the cube, defined by grouping multiple measure groups according to the needs of the user's application.
Produce documentation.
a) OLAP cube project development documentation.
b) OLAP cube business description document.
-
Principle 1: Load detailed atomic data into the dimensional structure.
Dimensional models should be populated with the most basic atomic data to support the unpredictable filtering and grouping that user queries require. Users usually do not want to look at one record at a time, but you cannot predict which data they will want to filter out and which they will want to display. If only aggregated data is available, you have already fixed the usage patterns of the data, and users run into obstacles as soon as they want to dig deeper. Atomic data can of course be supplemented by summary dimensional models, but business users cannot work from aggregated data alone. They need the detailed data to answer their ever-changing questions.
Principle 2: Build a dimensional model around the business process.
Business processes are the activities performed by an organization. They represent measurable events, such as placing an order or settling an account. A business process usually captures or generates unique performance metrics associated with each event. These metrics are converted into facts, and each business process is represented by a single atomic fact table. In addition to single-process fact tables, consolidated fact tables are sometimes created by merging several process fact tables into one. A consolidated fact table is a good supplement to the single-process fact tables, but it cannot replace them.
Principle 3: Make sure that each fact table has an associated date dimension table.
The measurable events described in Principle 2 always carry a date stamp, so every fact table should have at least one foreign key to a date dimension table whose granularity is a single day. The date dimension holds calendar attributes as well as nonstandard characteristics of the event date, such as fiscal month and company holiday indicators. A fact table sometimes has multiple date foreign keys.
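A day-grain date dimension of the kind Principle 3 describes can be generated in a few lines. The sketch below is only an illustration: the holiday list, the April fiscal year start, and the column names are all assumptions, not taken from any real calendar.

```python
from datetime import date, timedelta

# Hypothetical company holidays and fiscal calendar (fiscal year starts in April).
COMPANY_HOLIDAYS = {date(2024, 1, 1)}
FISCAL_YEAR_START_MONTH = 4

def build_date_dimension(start, end):
    """One row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),  # e.g. 20240101
            "calendar_date": day.isoformat(),
            "iso_weekday": day.isoweekday(),          # 1 = Monday ... 7 = Sunday
            "calendar_month": day.month,
            "fiscal_month": (day.month - FISCAL_YEAR_START_MONTH) % 12 + 1,
            "is_company_holiday": day in COMPANY_HOLIDAYS,
        })
        day += timedelta(days=1)
    return rows

dim_date = build_date_dimension(date(2024, 1, 1), date(2024, 1, 3))
for row in dim_date:
    print(row)
```

A fact table would then store date_key as a foreign key. A fact with several dates (for example order date and ship date) would carry one such key per date.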