-
The Bayesian inference used in naïve Bayesian classification rests on the identity p(a)*p(b|a) = p(b)*p(a|b).
Bayes' theorem was developed by the British mathematician Thomas Bayes (1702-1761) to describe the relationship between two conditional probabilities, such as p(a|b) and p(b|a).
By the multiplication rule it follows immediately that p(a,b) = p(a)*p(b|a) = p(b)*p(a|b). The above formula can be rearranged as:
p(b|a) = p(a|b)*p(b) / p(a). Bayes' theorem can thus be derived from conditional probability. Let A and B be two events; a conditional probability is the probability that one event occurs given that another event has already occurred.
In symbols, p(a|b) is the probability of event a occurring given that event b occurs; conversely, p(b|a) is the probability of event b occurring given that event a occurs. The probability that events A and B both occur equals the probability of A multiplied by the conditional probability of B given A, and it also equals the probability of B multiplied by the conditional probability of A given B.
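A quick numeric illustration with made-up values: if p(a) = 0.3, p(b) = 0.6 and p(b|a) = 0.8, then p(a,b) = 0.3*0.8 = 0.24, and p(a|b) = p(b|a)*p(a) / p(b) = 0.24 / 0.6 = 0.4.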
An example with coins:
Probability theory likes to use coins as examples, so here we also take a coin example, borrowing a visual illustration published in Nature Methods. We have two "fair" coins, each with a 50% probability of landing heads when tossed, i.e. p(h) = 50%. In this case, the joint probability of choosing a particular coin c and getting a particular outcome, heads h, is the product of their individual probabilities, p(c,h) = p(c)*p(h).
If we swap one of the coins for a biased coin cb that comes up heads 75% of the time, then the coin choice and the outcome heads are no longer independent events. The relationship between the two events can be expressed with the conditional probability introduced above: p(h|cb) = 75%.
p(cb) is our "guess" about the probability that the chosen coin is the biased one before it is tossed, that is, the prior probability; and p(cb|h) is the revised "guess" about the probability that the coin is biased after the toss comes up heads, i.e. the posterior probability. Here p(h|cb) = 0.75, the prior p(cb) = 0.5 (each coin is equally likely to be chosen), and p(h) = p(h|c)*p(c) + p(h|cb)*p(cb) = 0.5*0.5 + 0.75*0.5 = 0.625.
According to the Bayesian formula, p(cb|h) = p(h|cb)*p(cb) / p(h) = 0.75*0.5 / 0.625 = 0.6.
From the above, a single coin toss takes us from the prior probability to the posterior probability. If the coin tossing continues, we accumulate more and more data; if the next toss is also heads (strengthening the suspicion that the coin is biased), we can use the first posterior probability as the new prior and apply Bayes' formula again to compute a new posterior.
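As a minimal sketch of this repeated updating, here is a short Python snippet assuming the same two-coin setup (fair coin with p(h) = 0.5, biased coin with p(h) = 0.75, prior p(cb) = 0.5); the toss sequence and names are invented for illustration:

```python
# Sequential Bayesian updating for the two-coin example.
# Assumed setup (from the text): one fair coin with P(heads) = 0.5,
# one biased coin with P(heads) = 0.75, and prior P(biased) = 0.5.

def update(prior_biased, outcome):
    """Return the posterior P(biased | outcome) given the current prior."""
    p_h_biased = 0.75            # P(heads | biased coin)
    p_h_fair = 0.50              # P(heads | fair coin)
    if outcome == "h":
        lik_biased, lik_fair = p_h_biased, p_h_fair
    else:                        # tails
        lik_biased, lik_fair = 1 - p_h_biased, 1 - p_h_fair
    evidence = lik_biased * prior_biased + lik_fair * (1 - prior_biased)
    return lik_biased * prior_biased / evidence

prior = 0.5                                # initial "guess" before any toss
for toss in ["h", "h", "t", "h"]:          # an invented sequence of outcomes
    prior = update(prior, toss)            # previous posterior becomes the new prior
    print(f"after '{toss}': P(biased) = {prior:.3f}")
```

After the first "h" the posterior is 0.6, matching the calculation above; each subsequent toss reuses the previous posterior as the new prior.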
-
The reasoning accounts of the Bayesian model mainly include: heuristic strategy theory, the natural sampling space hypothesis, frequency effect theory, and sampling processing theory.
Bayesian reasoning is an inductive reasoning method discovered by the British clergyman Thomas Bayes, and many later researchers continually improved Bayesian methods in their views, methods and theory, eventually forming an influential school of statistics that broke the dominance of classical statistics. Bayesian reasoning is a new method of reasoning developed on the basis of classical statistical inductive reasoning, namely estimation and hypothesis testing.
Compared with classical statistical inductive reasoning, Bayesian reasoning draws conclusions not only from the currently observed sample information but also from the relevant past experience and knowledge of the person making the inference. As a method of reasoning, Bayesian reasoning is an extension of Bayes' theorem in probability theory.
Study Overview:
Kahneman and Tversky opened up an important field of study in probabilistic reasoning. Their research in the early 1970s first found that people's intuitive probability judgments do not follow the Bayesian principle: the base-rate information in a problem is often ignored, and the judgment is based mainly on the hit-rate (diagnostic) information.
In one of their classic studies, participants were told that of 100 people, 70 were lawyers and 30 were engineers, and that one person had been selected from them at random. When the person's personality traits were described as fitting an engineer, participants judged the probability that the person was an engineer to be high, relying almost entirely on the description. Clearly, the participants ignored the base rate of engineers being only 30%.
Later, they used a variety of problems to verify this base-rate neglect, such as asking participants to solve the following taxi problem: 85% of the taxis in a city belong to the Green company and 15% to the Blue company. A taxi was involved in a hit-and-run incident, and according to an eyewitness the car involved belonged to the Blue company; the reliability of the witness is 80%. Question: what is the probability that the car that caused the accident was a Blue taxi?
Most participants judged the probability to be 80%, but when the base rate is taken into account it should be about 41%.
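For reference, the 41% figure follows from plugging the problem's numbers into Bayes' formula:
p(blue | witness says blue) = p(says blue | blue)*p(blue) / [p(says blue | blue)*p(blue) + p(says blue | green)*p(green)]
= (0.80*0.15) / (0.80*0.15 + 0.20*0.85) = 0.12 / 0.29 ≈ 0.41.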
-
A simple derivation of the Bayesian formula:
The "naivety" of naive Bayes lies in the assumption that the feature values are conditionally independent of each other given the class, so the naive Bayes formula takes the form p(vj | a1, a2, ..., an) ∝ p(vj) * ∏i p(ai | vj).
Learning & Classification Algorithms:
1) Compute the prior probabilities and the conditional probabilities from the training data.
Laplace smoothing: to avoid zero probabilities for feature values that never occur with a given class, a constant λ (often 1) is commonly added to every count, giving p(ai = a | vj) = (count(ai = a, vj) + λ) / (count(vj) + Si*λ), where Si is the number of distinct values that feature ai can take.
2) For a new sample vector, compute the posterior probability p for each class and, by the maximum a posteriori rule, take the class with the largest p as the predicted label.
The advantages of naive Bayes are that it works well on small-scale data and handles multi-class problems; the disadvantages are that it is sensitive to the form of the input data and that the assumed independence of the feature values is hard to guarantee in practice.
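A minimal sketch of steps 1) and 2) for categorical features with Laplace smoothing (λ = 1); the tiny weather-style dataset and all names are invented purely for illustration, not taken from the text:

```python
from collections import Counter, defaultdict

# Minimal categorical naive Bayes with Laplace smoothing (lambda = 1).
# The tiny "weather"-style dataset below is invented purely for illustration.
X = [["sunny", "hot"], ["sunny", "mild"], ["rain", "mild"],
     ["rain", "cool"], ["overcast", "hot"]]
y = ["no", "no", "yes", "yes", "yes"]

lam = 1.0
classes = Counter(y)                         # class counts, used for the priors
n_features = len(X[0])
counts = defaultdict(Counter)                # counts[(j, class)][value] = frequency
values = [set() for _ in range(n_features)]  # distinct values seen for each feature

# Step 1: collect the counts behind the prior and conditional probabilities.
for xi, yi in zip(X, y):
    for j, v in enumerate(xi):
        counts[(j, yi)][v] += 1
        values[j].add(v)

def predict(x):
    """Step 2: return the class with the largest (unnormalized) posterior."""
    total = sum(classes.values())
    best_class, best_score = None, -1.0
    for c, nc in classes.items():
        score = nc / total                               # prior p(vj)
        for j, v in enumerate(x):
            num = counts[(j, c)][v] + lam                # Laplace-smoothed count
            den = nc + lam * len(values[j])              # Si possible values of feature j
            score *= num / den                           # conditional p(ai | vj)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

print(predict(["sunny", "cool"]))   # prints the predicted label ("no" with this toy data)
```

In practice the product of many small probabilities is usually computed as a sum of logarithms to avoid numerical underflow.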
-
vmap=arg max p( vj | a1,a2...an)
where vj ranges over the set V.
Here vmap is the most likely target value given the example, and a1, ..., an are the attribute values of the example. vmap is the target value whose posterior probability, computed below, is the highest, hence the arg max.
Applying the Bayesian formula to p( vj | a1, a2, ..., an ) gives:
vmap= arg max p(a1,a2....an | vj ) p( vj ) / p (a1,a2...an)
The denominator p(a1, a2, ..., an) is the same for every target value vj, so dividing by it does not change which vj attains the maximum, and it can be dropped from the comparison:
vmap= arg max p(a1,a2....an | vj ) p( vj )
The naïve Bayesian classifier then rests on a simple assumption: the attributes are conditionally independent of each other given the target value. In other words, given the target value of an instance, the probability of observing the combination a1, a2, ..., an is exactly the product of the probabilities of the individual attributes:
p(a1, a2, ..., an | vj ) = ∏i p( ai | vj )
Naive Bayesian classifier: vnb = arg max p( vj ) ∏i p( ai | vj )
Here vj ∈ {yes, no}, corresponding to a weather (play-tennis style) example.
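Written out schematically for that two-class weather example (assuming the usual attributes outlook, temperature, humidity and wind, which are given here only for illustration):
vnb = arg max over vj ∈ {yes, no} of p(vj) * p(outlook | vj) * p(temperature | vj) * p(humidity | vj) * p(wind | vj)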
There is a basic tool in Bayesian statistics called the Bayesian formula, also known as the Bayesian rule; although it is a mathematical formula, its principle can be understood without working through the numbers.
1. Start from the known expressions for the class-conditional probability densities and the prior probabilities.
2. Use the Bayesian formula to convert these into posterior probabilities.
3. Make the classification decision according to the size of the posterior probabilities.
1. Whether prior information is used.
Since the design and production of products have a certain continuity, there is often a good deal of related product information, i.e. prior information, available. Bayesian statistics holds that using this prior information can not only reduce the required sample size but also, in many cases, improve statistical precision; the classical school of statistics ignores this information.