-
There are many ways to collect data. Here are some common ones:
1. Manual acquisition: Extract the required data by hand, by browsing web pages, copying and pasting, and so on. This method is suitable when the amount of data is small and the collection frequency is low.
2. Web crawler: A program written in a programming language that automatically accesses web pages and extracts data by simulating browser behavior. This method is suitable for large-scale data collection and frequent updates.
3. API interface: This method is suitable when you need specific data and the data source provides an API.
4. Database queries: For data already stored in a database, you can query the database to obtain what you need. This method is suitable when the data you need has already been collected.
5. Data subscription: Some ** and applications provide data subscription services. Users subscribe to the data they are interested in, and the data is pushed to them automatically when it is updated. This method is suitable when data must be acquired in real time.
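As a minimal sketch of the database query method (item 4), assuming the data already sits in a SQLite table. The table name `readings`, its columns, and the function name are hypothetical and chosen only for illustration:

```python
import sqlite3

def query_readings_at_least(conn, min_value):
    """Return (sensor, value) rows whose value is at least min_value."""
    cur = conn.execute(
        "SELECT sensor, value FROM readings WHERE value >= ? ORDER BY value",
        (min_value,),  # parameter binding avoids building SQL by string pasting
    )
    return cur.fetchall()

# Build a small in-memory database so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.5), ("b", 3.0), ("c", 4.5)],
)
rows = query_readings_at_least(conn, 2.0)
print(rows)
```

A real system would connect to its existing database instead of creating one in memory; the query pattern stays the same.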
Octopus Collector is a full-featured, easy-to-use Internet data collector with broad coverage that can help users collect data quickly and efficiently. To learn more about data collection methods and techniques, see the Tutorial & Help section of the Octopus Collector official website.
-
1. Data collection.
Collection methods differ according to the type of data being collected. The main ones are sensor collection, crawlers, manual entry, import, and interfaces.
2. Basic methods of data collection:
1) Sensor monitoring data: data gathered through sensors, which is what the now widely used term "Internet of Things" refers to, for example readings from a temperature and humidity sensor.
2) News and information data from the Internet: this can be collected by writing a web crawler. After setting up the data sources, crawl the data in a targeted manner.
3) Manual entry: enter existing data into the system through the system's entry page.
4) Import: for existing batches of structured data, you can develop an import tool to load them into the system.
5) API interface: data from other systems can be collected into this system through an API.
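The import approach in 4) can be sketched roughly as follows. This is only an illustration, assuming the structured data arrives as CSV text; the column names and the rule that every required field must be non-empty are hypothetical:

```python
import csv
import io

def import_rows(csv_text, required_fields):
    """Parse CSV text and return (valid_rows, rejected_rows)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    valid, rejected = [], []
    for row in reader:
        # Accept a row only if every required field is present and non-empty.
        if all((row.get(field) or "").strip() for field in required_fields):
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected

# Hypothetical sample batch; the second row is missing its name.
sample = "id,name,temp\n1,probe-a,21.5\n2,,19.0\n3,probe-c,22.1\n"
valid, rejected = import_rows(sample, ["id", "name", "temp"])
print(len(valid), "rows imported,", len(rejected), "rejected")
```

Keeping rejected rows separate, instead of silently dropping them, makes it easier to report back what failed to import.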
-
Common data collection methods include questionnaire surveys, consulting reference materials, field investigation, and experiments.
1. Questionnaire survey: The questionnaire survey is the most commonly used method of data collection, because its cost is relatively low and the information obtained is more comprehensive.
2. Consulting reference materials: This is the oldest way of collecting data. You can get the data you want by consulting books, records, and other materials.
3. Field investigation: Field investigation means going to a designated place to do research, that is, carrying out a direct and detailed on-site investigation in order to understand the truth of a matter and how the situation developed.
4. Experiment: The advantage of experimental data collection is that the data is highly accurate. The disadvantage is high uncertainty, both in how long the experiment takes and in its results.
-
Data collection methods fall into two categories: online collection and offline collection. The following is a brief introduction to each method and its related technologies.
1. Online collection.
1) Open data.
Open data refers to the data that is open to everyone on the Internet, including data that is open to specific industries, data that is publicly available at all levels, and data related to content on web pages.
To obtain open data, we can use crawler technology. Here is a brief introduction.
Crawler technology lets developers collect relevant data on the Internet automatically and systematically. Crawlers are not the producers of content but the porters of content. Learning materials about crawler technology are abundant on the Internet, so I won't cover them here. What we want to talk about is the legal safety of crawling: we must abide by the relevant laws and never cross the red lines.
a. Personal information, commercial secrets, and state secrets are the red lines of data crawling.
b. Abide by professional ethics, control the frequency of crawler visits, and do not interfere with the normal business activities of the site being crawled.
c. Abide by the robots protocol, which states what may and may not be crawled.
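Points b and c can be checked in code before any page is requested. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `example-bot` user agent, and the example.com URLs are all made up for illustration, and no network request is made:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed locally.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

def build_parser(robots_text):
    """Create a RobotFileParser from robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)
agent = "example-bot"

# Point c: ask whether each URL may be crawled at all.
allowed = parser.can_fetch(agent, "https://example.com/news/1.html")
blocked = parser.can_fetch(agent, "https://example.com/private/a.html")

# Point b: the site's requested pause between visits, in seconds (None if unset).
delay = parser.crawl_delay(agent)

print(allowed, blocked, delay)
```

A crawler would typically skip any URL for which `can_fetch` returns False and sleep for at least `delay` seconds between requests to the same site.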
2) Third-party platform data.
For example, if a developer wants to obtain financial data, then besides using crawler technology, the data can be retrieved through the API provided by a third-party platform.
I once received the following task: find every road section in a city where motor vehicles are prohibited from turning left, turning right, or making a U-turn. When accurate data is not available, we can use the API of a map open platform such as AutoNavi to set start and end points at an intersection, then compare the route-planning distances for driving and for walking to infer whether left turns, right turns, or U-turns are prohibited there. Each function has service documentation explaining how to use it, and you can open ** to try it if you are interested.
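The comparison step described above can be sketched as a simple heuristic. This is only an illustration: the function name, the thresholds, and the sample distances are assumptions, and the distances would in practice come from the map platform's route-planning API rather than being typed in:

```python
def turn_probably_restricted(driving_m, walking_m, ratio=2.0, slack_m=100.0):
    """Guess whether a turn is prohibited for motor vehicles.

    If the planned driving route between two points at an intersection is
    much longer than the walking route, the driver was probably forced to
    detour, which suggests the direct turn is not allowed.
    Both thresholds are hypothetical and would need tuning on real data.
    """
    much_longer = driving_m > walking_m * ratio
    real_detour = (driving_m - walking_m) > slack_m
    return much_longer and real_detour

# Made-up distances in meters for two start/end pairs at an intersection.
print(turn_probably_restricted(driving_m=1200.0, walking_m=80.0))  # long detour
print(turn_probably_restricted(driving_m=90.0, walking_m=80.0))    # nearly equal
```

Requiring both a ratio and an absolute difference avoids flagging very short routes, where a few meters of difference can produce a large ratio.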
3) Physical data.
Physical data is the data users generate in the physical world, for example through the various sensors in a mobile phone while it is in use. The fingerprint sensor records the user's fingerprint for unlocking the phone or making payments, and the gyroscope measures angular velocity based on the principle of conservation of angular momentum, for navigation and similar functions.
Beyond such everyday applications, a large amount of physical data exists in traditional manufacturing industries, where data is generally collected in the following ways:
Sensors:
As mentioned above, a mobile phone contains many types of sensors. Traditional manufacturing also uses many types, covering different categories of industrial sensors such as light-sensitive, gas-sensitive, force-sensitive, magnetic-sensitive, and sound-sensitive sensors.
-
1. Survey method
Survey methods are generally divided into two categories: census and sample survey.
2. Observation
The observation method conducts research by attending meetings, going on site, taking part in production and operations, sampling on site, and observing and accurately recording on site (including surveying and mapping, audio recording, video, photography, transcription, etc.). It mainly covers two aspects: the observation of human behavior and the observation of objective things.
The observation method is widely used, often in combination with the inquiry method and the collection of physical items, to improve the reliability of the information collected.
3. Literature search
Literature search is the process of retrieving the required information from a vast body of literature. It is divided into manual search and computer search.
By nature, data is divided into:
Positional, such as various coordinate data;
Qualitative, data that represents the properties of things (settlements, rivers, roads, etc.);
Quantitative, data that reflects the quantitative characteristics of things, such as geometric quantities like length, area, and volume, or physical quantities like weight and velocity;
Temporal, data that reflects the time characteristics of things, such as year, month, day, hour, minute, and second.
According to the form of expression, data is divided into:
Numerical data, such as various statistical or measurement data. Numerical data is discrete within an interval.
Analog data, which is described by continuous functions and refers to physical quantities that change continuously within an interval, such as changes in sound volume and temperature. It can be divided into graphic data (such as points, lines, and surfaces), symbol data, text data, and image data.
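To illustrate the difference between the two forms, the sketch below samples a continuous quantity at fixed intervals, which turns it into discrete numerical data. The temperature curve is a made-up smooth function, not measured data:

```python
import math

def sample_signal(signal, start, stop, step):
    """Sample a continuous function at fixed intervals.

    Returns a list of (t, value) pairs from start to stop inclusive.
    """
    count = int(round((stop - start) / step))
    return [(start + i * step, signal(start + i * step)) for i in range(count + 1)]

def temperature(t_hours):
    # Hypothetical daily temperature curve in degrees Celsius.
    return 20.0 + 5.0 * math.sin(2 * math.pi * t_hours / 24.0)

# The continuous curve becomes five discrete readings, one every 6 hours.
samples = sample_signal(temperature, 0.0, 24.0, 6.0)
rounded = [(t, round(v, 1)) for t, v in samples]
print(rounded)
```

The smaller the step, the more closely the discrete readings follow the continuous quantity, at the cost of more data to store.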
-
There are two kinds. One is collection, such as crawlers, sensors, and logs, where the objective world generates information and data. The other is transportation, such as batch movement and real-time movement, which is a purely technical problem.
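The two transportation styles can be contrasted with a small sketch. The function names and the sample log lines are hypothetical; a real pipeline would move data between systems instead of between in-memory lists:

```python
def move_in_batches(records, batch_size):
    """Batch movement: yield the records in fixed-size groups."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

def move_in_real_time(source, deliver):
    """Real-time movement: pass each record to deliver() as soon as it arrives."""
    count = 0
    for record in source:
        deliver(record)
        count += 1
    return count

# Made-up log lines standing in for collected data.
log_lines = ["line-%d" % i for i in range(5)]

batches = list(move_in_batches(log_lines, 2))
print(batches)

received = []
moved = move_in_real_time(iter(log_lines), received.append)
print(moved, received)
```

Batch movement trades latency for fewer, larger transfers, while real-time movement delivers each record immediately.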
-
For example, suppose you do quantitative investment: you use big data to predict future volatility, and you buy and sell based on the result. You can currently get all the historical data. Can you build a highly accurate data analysis system from this data alone?
In fact, if you only have historical data, you still can't understand why large fluctuations occurred. For example, there may have been an outbreak of SARS, or a war in a certain area. The impact of such major social events on the ** is also huge.
Therefore, we need to remember that the trend of a data series is affected by multiple dimensions. We need to collect as many data dimensions as possible through multi-source collection while also ensuring data quality, in order to obtain high-quality data mining results.
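As a minimal sketch of combining two sources, the code below joins a price history with an event list by date, so that each record carries both dimensions. All dates, prices, and event labels are invented for illustration only:

```python
def merge_by_date(prices, events):
    """Attach any known event label to each daily price record.

    prices: dict mapping date string -> closing price
    events: dict mapping date string -> event description
    """
    merged = []
    for date in sorted(prices):
        merged.append({
            "date": date,
            "close": prices[date],
            "event": events.get(date),  # None when no event is known
        })
    return merged

# Hypothetical data from two separate sources.
prices = {"2020-01-02": 100.0, "2020-01-03": 92.5, "2020-01-06": 93.0}
events = {"2020-01-03": "major news event"}

merged = merge_by_date(prices, events)
for row in merged:
    print(row)
```

With the event column in place, a sharp price move can be examined next to what happened that day, which historical prices alone cannot show.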
-
Common methods of data collection include direct observation, interviews, correspondence, online surveys, and satellite remote sensing.
1. Direct observation.
A method in which investigators go to the scene to observe, measure, and record the subjects in order to obtain information. The investigator does not control or interfere with the observed events or actions, and can obtain information without the subject being aware of it.
2. Interview method.
A survey in which only one respondent at a time is asked about specific questions. It is suitable for more private matters, such as personal privacy, or for more sensitive issues.
3. Correspondence method.
The survey organizer (e.g. the statistical department) mails the questionnaire or sends it electronically to respondents, who fill it out and return it. This is also known as a mail questionnaire. The survey subjects are not limited by geographic area and the survey cost is low, but the process is slower and the response rate is lower.
Types of direct observation
The direct observation method can be divided into open observation and covert observation. In open observation, the investigator's presence at the place of investigation is disclosed, that is, the person being investigated knows that someone is observing his or her words and actions.
In covert observation, the subject is not aware that his or her actions are being observed and recorded. In most cases, both are direct, first-hand research methods. Store operators often need to understand how competitors operate in order to know both themselves and their opponents in the market and to compete from a position of initiative. However, openly conducting an investigation at a competitor's store will attract the other party's attention.
Covert observation can therefore be used as a direct method of collecting information about competitors. If a company sends market researchers to a competitor's store posing as customers, it can learn about the competitor's product range, furnishings and layout, in-store activities, sales staff service, and other information.
Data acquisition refers to the process of automatically collecting information from analog and digital units under test, such as sensors and other devices. A data acquisition system combines computer-based measurement hardware and software products into a flexible, user-defined measurement system.
3. Configure collection rules. You can use the intelligent recognition function to let Octopus automatically identify the data structure of the e-commerce ** page, or set the collection rules manually.
For most manufacturing enterprises, automatic data acquisition from measuring instruments has long been troublesome. Even when an instrument has an RS232, RS485, or similar interface, operators still take measurements while recording them on paper by hand, and later key them into a PC for processing. This is laborious, cannot guarantee data accuracy, and often means the data reaching management is already a day or two out of date. For on-site defective-product information and related output data, achieving efficient, simple, real-time data collection is a major problem.
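Once an instrument's serial output is read by a program instead of copied by hand, each received frame must be parsed into a number. The sketch below shows only that parsing step. The frame layout (sign, decimal value, unit, line ending) is a hypothetical example, since real instruments each define their own format in their manuals, and reading the bytes from the port would need a serial library that is not shown here:

```python
import re

# Hypothetical ASCII frame layout: sign, decimal value, optional space, unit.
FRAME_RE = re.compile(r"^(?P<sign>[+-])(?P<value>\d+\.\d+)\s*(?P<unit>[A-Za-z]+)$")

def parse_frame(raw):
    """Parse one raw frame into (value, unit), or return None if malformed."""
    text = raw.decode("ascii", errors="replace").strip()
    match = FRAME_RE.match(text)
    if match is None:
        return None  # reject corrupted frames instead of guessing
    value = float(match.group("value"))
    if match.group("sign") == "-":
        value = -value
    return value, match.group("unit")

# Example frames as they might arrive from the port (made up).
print(parse_frame(b"+012.34 mm\r\n"))
print(parse_frame(b"-000.50 mm\r\n"))
print(parse_frame(b"garbage"))
```

Rejecting malformed frames rather than recording them addresses the accuracy concern raised above, because transmission errors do not end up in the data.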
There are many web collection tools, but most are fairly hard to use, and people who cannot program will probably struggle with them. The recently released Octopus Collector is quite simple: it takes only a few mouse clicks.