-
Four characteristics of big data.
1. Massiveness.
For example, a recent IDC report states that the world's data volume will expand by a factor of 50 by 2020. The scale of big data is an ever-changing indicator, with the size of a single dataset currently ranging from tens of terabytes to several petabytes. To put that in perspective, storing 1 petabyte of data would require twenty thousand PCs with 50 GB hard drives.
In addition, all sorts of unexpected sources can generate data.
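The storage figure quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal (SI) units, where 1 PB = 10^15 bytes and 1 GB = 10^9 bytes, and the helper name `drives_needed` is made up for illustration.

```python
# Decimal (SI) units, as commonly used for drive capacities (an assumption).
GB = 10**9
PB = 10**15


def drives_needed(total_bytes: int, drive_bytes: int) -> int:
    """Number of drives required to hold total_bytes, rounding up."""
    return -(-total_bytes // drive_bytes)  # ceiling division on integers


pcs = drives_needed(1 * PB, 50 * GB)
print(pcs)  # 20000
```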
2. Diversity.
The increase in data diversity is mainly due to new types of multi-structured data, as well as data types including web logs, social networks, internet searches, cell phone call logs, and sensor networks.
3. High speed.
Velocity describes the speed at which data is created and moved. In the era of high-speed networks, real-time data streams built on fast processors and servers optimized for software performance have become common. Businesses need to know not only how to create data quickly, but also how to process, analyze, and return it to users quickly enough to meet their real-time needs.
4. Volatility.
Big data has a multi-layered structure, which means that big data can take on a variety of forms and types. Compared with traditional business data, big data is irregular and ambiguous, which makes it difficult or even impossible to use traditional application software for analysis. Traditional business data has evolved over time to have a standard format that can be recognized by standard business intelligence software.
Today's challenge is to process and extract value from complex data presented in all its forms.
-
1. Descriptive thinking.
This means turning structured or unstructured data into objective measures. Big data thinking involves many human factors, and these can also be analyzed as data. Take the study of consumer behavior: it can be quantitative or non-quantitative, and descriptive thinking should cover all aspects of it. For example, shopping malls continuously collect data from customers connected to their LAN to understand how customers are distributed and what they consume. Consumers get one-stop shopping, dining, leisure, and entertainment, and the user experience improves greatly. In large scenic spots or amusement parks, big data can likewise support better visitor management.
2. Relevance thinking.
This is the study of correlations between data. In studying consumer or user behavior, the different pieces of data, large and small, are to some extent intrinsically related. The results of big data analysis make it possible to build better data models, which can be used to study consumer preferences and behaviors, and correlation research can in turn better support decisions. In the modern logistics industry, for example, consumers' purchase behavior and habits (such as routes and reviews) are used to anticipate their next purchases, so goods are stocked in advance in separate warehouses. Once a consumer places an order online, it can be delivered as soon as possible, which greatly improves the user experience. The product recommendation function that is so important to e-commerce also depends on relevance thinking: after browsing a page or shopping, we often receive recommendations for similar items. Not every recommendation leads to a purchase, but recommendations are still effective.
3. Strategic thinking.
As big data is analyzed continuously, enterprises can adjust their marketing strategies according to the results, which is the main purpose of big data marketing.
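The product-recommendation idea mentioned under relevance thinking can be illustrated with a very small co-purchase counter. This is a sketch only, not how any particular e-commerce platform works: the basket data, the function names, and the choice of raw co-occurrence counts as the measure of correlation are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchase_counts(baskets):
    """Count how often each pair of items appears in the same basket."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        # set() so an item repeated in one basket is only counted once
        for a, b in permutations(set(basket), 2):
            co_counts[a][b] += 1
    return co_counts


def recommend(co_counts, item, top_n=2):
    """Return the items most often bought together with `item`."""
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(co_counts[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical purchase history: one list per order.
baskets = [
    ["tent", "sleeping bag", "stove"],
    ["tent", "sleeping bag"],
    ["tent", "stove"],
    ["sleeping bag", "pillow"],
]
co_counts = build_co_purchase_counts(baskets)
print(recommend(co_counts, "tent"))  # ['sleeping bag', 'stove']
```

Raw counts favor popular items. A real system would normally normalize them, but the correlation-driven principle is the same.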
-
The industry usually uses the four Vs (i.e., volume, variety, value, and velocity) to summarize the characteristics of big data.
The first is the huge volume of data (volume). To date, all printed material produced by humans amounts to 200 PB (1 PB = 2^10 TB), while everything that humans have said throughout history amounts to about 5 EB (1 EB = 2^10 PB). Currently, the capacity of a typical personal computer hard disk is on the order of terabytes, and the data volume of some large enterprises is close to the exabyte level.
The second is the variety of data types (variety). This diversity allows data to be divided into structured and unstructured data.
Compared with the structured, mainly text-based data of the past, which was easy to store, there is a growing amount of unstructured data, including web logs.
The third is low value density (value). Value density is inversely proportional to the total amount of data. For example, in one hour of continuous, uninterrupted monitoring, the useful data may amount to only one or two seconds.
How to complete the value "purification" of data more quickly through powerful machine algorithms has become an urgent problem to be solved in the context of big data.
Fourth, the processing speed is fast (velocity). This is the most striking feature that distinguishes big data from traditional data mining. According to IDC's Digital Universe report, global data usage is expected to reach by 2020. In the face of such a large amount of data, the efficiency of processing data is the life of an enterprise.
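The low value density example above (a couple of useful seconds in an hour of monitoring) can be put into numbers. This is a sketch that assumes exactly 2 useful seconds out of 3,600. The function name is illustrative.

```python
def value_density(useful_seconds: float, total_seconds: float) -> float:
    """Fraction of the recorded data that is actually useful."""
    return useful_seconds / total_seconds


# Assumed figures: 2 useful seconds in 1 hour of continuous monitoring.
density = value_density(2, 60 * 60)
print(f"{density:.4%}")  # 0.0556%
```

In other words, under these assumptions more than 99.9% of what is stored carries no direct value, which is why fast automated "purification" matters.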
-
Summary. First, there is no essential difference between the two in terms of analytical methods.
The core work of data analysis is the analysis, thinking and interpretation of data indicators, and the amount of data that the human brain can take in is extremely limited. Therefore, whether it is "traditional data analysis" or "big data analysis", the raw data must be statistically processed according to the analysis approach to obtain summary results for analysis. The two are similar in this process. The only difference is the processing method, which depends on the size of the raw data.
Second, there is a big difference in the focus of the use of statistical knowledge.
"Traditional data analysis" uses statistical knowledge that revolves around the question of whether conclusions about the real world can be extrapolated from a small sample of data. "Big data analysis" mainly uses various types of full data (not sampled data) to design statistical schemes and obtain detailed and confident statistical conclusions.
Third, there is an essential difference between the two in terms of their relationship with machine learning models.
In "traditional data analysis", machine learning models are mostly used as black-box tools to assist in analyzing data. "Big data analysis" more often combines the two closely, and it produces not only an evaluation of the analysis results, but also a follow-up.
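The second point, sampling versus full data, can be illustrated with the Python standard library. This is a sketch under stated assumptions: the "population" is a synthetic list of numbers, the sample size of 100 is arbitrary, and the interval uses the usual normal approximation (1.96 standard errors).

```python
import random
import statistics

# Synthetic "full data": every value is available, so the mean is exact.
population = list(range(1, 10_001))
full_mean = statistics.fmean(population)

# "Traditional" approach: estimate the same quantity from a small sample
# and attach an approximate 95% confidence interval to the estimate.
rng = random.Random(0)  # fixed seed so the run is repeatable
sample = rng.sample(population, 100)
sample_mean = statistics.fmean(sample)
std_error = statistics.stdev(sample) / len(sample) ** 0.5
low = sample_mean - 1.96 * std_error
high = sample_mean + 1.96 * std_error

print(f"full data mean:  {full_mean}")
print(f"sample estimate: {sample_mean:.1f} (95% CI {low:.1f} to {high:.1f})")
```

With the full data the statistical effort goes into designing the computation. With a sample it goes into judging how far the estimate can be trusted.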
One difference between big data analysis and traditional data analysis is the diversity of the data.
This diversity is manifested in structured data, semi-structured data, and unstructured data.
-
This data diversity includes the following:
2. Data formats: Big data solutions need to support a variety of data formats, including structured data, semi-structured data, and unstructured data.
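To make the three format categories concrete, here is a small sketch that reads one example of each with the Python standard library. The sample records and field names are invented for illustration. Real pipelines would read from files or streams.

```python
import csv
import io
import json
import re

# Structured: fixed columns, every row has the same fields.
structured = "order_id,amount\n1001,25.50\n1002,13.00\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total = sum(float(row["amount"]) for row in rows)

# Semi-structured: self-describing, but fields may be nested or missing.
semi_structured = '{"user": "u42", "event": "click", "tags": ["sale", "mobile"]}'
event = json.loads(semi_structured)
tag_count = len(event.get("tags", []))

# Unstructured: free text, so any structure has to be extracted.
unstructured = "Great tent, but delivery took 9 days. Would order again."
numbers = [int(n) for n in re.findall(r"\d+", unstructured)]

print(total, tag_count, numbers)  # 38.5 2 [9]
```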
-
The main characteristics of big data are as follows:
1. Large volume: The most significant feature of big data is the huge amount of data. With the development of information technology, various sensors, devices, and Internet applications have generated massive amounts of data, including structured data (such as database records) and unstructured data (such as text, images, audio, and video).
2. Fast speed: Big data is generated and moves very quickly. Data sources produce and transmit data at a high rate, and it needs to be processed and analyzed in real time or near real time.
3. Diversity: Big data contains a variety of types and formats of data. In addition to traditional structured data, it also includes unstructured and semi-structured data, such as text, images, video, audio, logs, and geolocation data.
These diverse data types provide richer information and a more comprehensive analytical perspective.
4. Authenticity: Big data is often obtained from the real world in real time, so it is both authentic and current. This data comes from a variety of sources, including social media, sensors, and transactions, and reflects real behaviors, opinions, and events.
5. Low value density: There is a lot of noise, redundancy, and useless information in big data. Compared with traditional data, big data has a lower value density and requires effective data cleaning, processing, and analysis to extract meaningful and valuable information.
6. Complexity: Big data often has a high degree of complexity, involving multi-dimensional data, multi-variable relationships and complex data structures. Processing and analyzing big data requires the use of complex algorithms, tools, and techniques in areas such as statistics, machine learning, data mining, and more.
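The data cleaning mentioned under low value density can be sketched as a simple filter that drops duplicates and incomplete records. The record layout (`user`, `amount`, `ts`) and the rules for what counts as noise are assumptions chosen for illustration.

```python
def clean(records):
    """Drop incomplete, invalid, and duplicate records; normalize the rest."""
    seen = set()
    cleaned = []
    for rec in records:
        user = (rec.get("user") or "").strip()
        amount = rec.get("amount")
        # Noise: missing user, missing amount, or non-positive amount.
        if not user or amount is None or amount <= 0:
            continue
        key = (user, amount, rec.get("ts"))
        if key in seen:  # redundancy: exact duplicate of an earlier record
            continue
        seen.add(key)
        cleaned.append({"user": user, "amount": amount, "ts": rec.get("ts")})
    return cleaned


# Hypothetical raw input with typical problems.
raw = [
    {"user": "u1", "amount": 20.0, "ts": 1},
    {"user": "u1", "amount": 20.0, "ts": 1},   # exact duplicate
    {"user": "", "amount": 5.0, "ts": 2},      # missing user
    {"user": "u2", "amount": None, "ts": 3},   # missing amount
    {"user": " u3 ", "amount": 7.5, "ts": 4},  # stray whitespace
]
cleaned = clean(raw)
print(len(cleaned), "of", len(raw), "records kept")  # 2 of 5 records kept
```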
The role of big data
1. Improve decision-making and strategy: Big data can provide comprehensive, accurate, and real-time information to help businesses and organizations make more informed decisions and develop more effective strategies. Through the analysis of big data, hidden patterns, trends, and associations can be discovered, and insights into market demand, consumer behavior, and competitive dynamics can be gained to guide business development and resource allocation.
2. Improve the quality of products and services: Big data can help enterprises understand customer needs, preferences and feedback, so as to improve product design, development and marketing strategies. By analyzing user data, usage behavior, and feedback, we can optimize product features, enhance user experience, and provide customized products and services based on individual needs.
-
What is big data? In fact, it is very simple: big data is a huge amount of data, and that data comes from what is generated at every moment around the world. In the era of big data, even a small piece of data may produce incredible value. Big data has four characteristics:
volume, variety, velocity, and value, generally called the 4Vs.
The so-called 4V specifically refers to the following four points:
1 Volume. The characteristics of big data are first embodied in "big". In the early MP3 era, a small MB-level MP3 player could meet many people's needs, but over time storage units have grown from GB to TB, and now to the PB and EB level. With the rapid development of information technology, data has begun to grow explosively.
Social networks (Weibo, Twitter, Facebook), mobile networks, and various smart tools and service tools have all become sources of data. Nearly 400 million members of one network generate about 20 TB of commodity transaction data every day; Facebook's approximately 1 billion users generate more than 300 TB of log data every day. There is an urgent need for intelligent algorithms, powerful data processing platforms, and new data processing technologies to count, analyze, and process such large-scale data in real time.
2 Variety. The wide range of data sources determines the diversity of big data forms. Any form of data can be useful. The most widely used application is the recommendation system: platforms such as NetEase Cloud and Toutiao analyze users' log data in order to recommend more of what each user likes.
Log data is clearly structured. Other data, such as audio and video, is not clearly structured. Its causal relationships are weak, so it needs to be manually annotated.
3 High speed. Big data is generated very quickly and is mainly transmitted via the Internet. Everyone is inseparable from the Internet in their lives, which means that every day individuals are providing a large amount of information to big data.
And these data need to be processed in a timely manner, because it is not cost-effective to spend a lot of capital storing historical data that has little effect. A platform may keep only the data from the past few days or a month. Older data must be cleaned up in time, otherwise the cost is too great. Because of this, big data places very strict requirements on processing speed: a large share of server resources is used to process and compute data, and many platforms need to achieve real-time analysis. Data is generated all the time, and whoever is faster has an advantage.
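The retention idea described above, keeping only recent data and discarding the rest as new data arrives, can be sketched with a time-based sliding window. The class name, the 7-day retention period, and the running total are illustrative assumptions. The sketch also assumes events arrive in timestamp order.

```python
from collections import deque


class RetentionWindow:
    """Keep only events newer than a retention period, with a running total."""

    def __init__(self, retention_seconds: int):
        self.retention_seconds = retention_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp: int, value: float) -> None:
        """Record a new event, then evict anything that has expired."""
        self.events.append((timestamp, value))
        self.total += value
        self._evict(now=timestamp)

    def _evict(self, now: int) -> None:
        cutoff = now - self.retention_seconds
        # Events are in time order, so expired ones are always at the left.
        while self.events and self.events[0][0] <= cutoff:
            _, old_value = self.events.popleft()
            self.total -= old_value


DAY = 24 * 60 * 60
window = RetentionWindow(retention_seconds=7 * DAY)
window.add(0 * DAY, 10.0)
window.add(3 * DAY, 5.0)
window.add(8 * DAY, 2.0)  # the day-0 event is now older than 7 days
print(len(window.events), window.total)  # 2 7.0
```

Because the total is updated incrementally, each new event is handled in amortized constant time, which is the kind of property real-time analysis depends on.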