
Naive Bayes problem applied to text

Assume that you are using a Naïve Bayes classifier to classify documents into two classes, Sports and Health. Assume that only $5$ words are used in your model, and denote these 5 features as $w_1, w_2, w_3, w_4$ and $w_5$.

$$p(w_1 | \text{Sports}) = 0.3$$
$$p(w_2 | \text{Sports}) = 0.2$$
$$p(w_3 | \text{Sports}) = 0.05$$
$$p(w_4 | \text{Sports}) = 0.4$$
$$p(w_5 | \text{Sports}) = 0.05$$

$$p(w_1 | \text{Health}) = 0.05$$
$$p(w_2 | \text{Health}) = 0.3$$
$$p(w_3 | \text{Health}) = 0.5$$
$$p(w_4 | \text{Health}) = 0.1$$
$$p(w_5 | \text{Health}) = 0.05$$

$$p(\text{Sports}) = \frac{\text{number of Sports documents}}{\text{total number of documents}} = 0.65$$
$$p(\text{Health}) = \frac{\text{number of Health documents}}{\text{total number of documents}} = 0.35$$

Compute $p(\text{Sports}|w_1,w_2)$ and $p(\text{Health}|w_1,w_2)$. Show the derivation of your answer step by step. Based on the computed probabilities, which category (Health vs. Sports) do you think this document belongs to?

Can anyone help me understand how to solve this problem?

The Naive Bayes assumption, that the words are conditionally independent given the class, gives

$$P(w_1, w_2, \ldots, w_5 | y) = \prod_{i=1}^{5}P(w_i|y)$$

where $y \in \{\text{Sports}, \text{Health}\}$. Applying Bayes' rule to the two observed words then gives

$$P(\text{Sports}|w_1, w_2) = \frac{P(w_1, w_2 | \text{Sports})P(\text{Sports})}{P(w_1, w_2)} = \frac{P(w_1 | \text{Sports})P(w_2 | \text{Sports})P(\text{Sports})}{P(w_1, w_2)}$$

$$P(\text{Health}|w_1, w_2) = \frac{P(w_1, w_2 | \text{Health})P(\text{Health})}{P(w_1, w_2)} = \frac{P(w_1 | \text{Health})P(w_2 | \text{Health})P(\text{Health})}{P(w_1, w_2)}$$

where, by the law of total probability,

$$\begin{align} P(w_1, w_2) &= P(w_1, w_2 | \text{Sports})P(\text{Sports}) + P(w_1, w_2 | \text{Health})P(\text{Health}) \\ &= P(w_1 | \text{Sports})P(w_2 | \text{Sports})P(\text{Sports}) + P(w_1 | \text{Health})P(w_2 | \text{Health})P(\text{Health})\end{align}$$

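Plug in the values given in the question and take it from here. The class with the larger posterior is the predicted category.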
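If you want to check your arithmetic, here is a minimal Python sketch of the same calculation. The probabilities are the ones from the question; the variable names (`prior`, `likelihood`, `joint`, `evidence`, `posterior`) are just my own choices for illustration.

```python
# Class priors and the per-word likelihoods needed for w1 and w2,
# copied from the question.
prior = {"Sports": 0.65, "Health": 0.35}
likelihood = {
    "Sports": {"w1": 0.3, "w2": 0.2},
    "Health": {"w1": 0.05, "w2": 0.3},
}

# Numerator of Bayes' rule for each class: P(w1|y) * P(w2|y) * P(y).
joint = {y: likelihood[y]["w1"] * likelihood[y]["w2"] * prior[y] for y in prior}

# Denominator P(w1, w2), obtained by summing the numerators over both classes.
evidence = sum(joint.values())

# Posterior P(y | w1, w2) for each class.
posterior = {y: joint[y] / evidence for y in prior}

for y, p in posterior.items():
    print(f"P({y} | w1, w2) = {p:.4f}")
```

The numerators come out to $0.3 \cdot 0.2 \cdot 0.65 = 0.039$ for Sports and $0.05 \cdot 0.3 \cdot 0.35 = 0.00525$ for Health, so the Sports posterior is $0.039 / 0.04425 \approx 0.88$ and the document would be classified as Sports.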

**Edit**: Note that, in general, when all five words are observed,

$$P(y|w_1, w_2, \ldots, w_5) = \frac{P(y)\prod_{i=1}^{5}P(w_i|y)}{P(w_1, w_2, \ldots, w_5)}$$

where $y \in \{\text{Sports}, \text{Health}\}$.
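The general formula can be sketched in Python as a small function that handles any set of observed words. This is only an illustration under the question's numbers: `PRIOR`, `LIKELIHOOD` and `naive_bayes_posterior` are hypothetical names, and words are identified by their 1-based index.

```python
from math import prod

# Priors and likelihoods from the question; list position i-1 holds P(w_i | y).
PRIOR = {"Sports": 0.65, "Health": 0.35}
LIKELIHOOD = {
    "Sports": [0.3, 0.2, 0.05, 0.4, 0.05],
    "Health": [0.05, 0.3, 0.5, 0.1, 0.05],
}


def naive_bayes_posterior(observed):
    """Return P(y | observed words) for each class y.

    `observed` is an iterable of 1-based word indices, e.g. [1, 2]
    for the words w1 and w2.
    """
    observed = list(observed)
    # Numerator for each class: P(y) times the product of P(w_i | y).
    joint = {
        y: PRIOR[y] * prod(LIKELIHOOD[y][i - 1] for i in observed)
        for y in PRIOR
    }
    # Denominator: the same quantity summed over all classes.
    evidence = sum(joint.values())
    return {y: value / evidence for y, value in joint.items()}


# The case asked about in the question, then the case with all five words.
print(naive_bayes_posterior([1, 2]))
print(naive_bayes_posterior([1, 2, 3, 4, 5]))
```

Because the denominator is the same for both classes, comparing the numerators alone is enough if you only need the predicted class rather than the probabilities.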
