Naive Bayes Classification explained with Python

Machine Learning is a vast area of Computer Science concerned with designing algorithms that form good models of the world around us or, more precisely, of the data coming from that world.

Within Machine Learning many tasks are – or can be reformulated as – classification tasks.

In classification tasks we are trying to produce a model which captures the relationship between the input data $X$ and the class $C$ each input belongs to. This model is formed from the feature values of the input data. For example, a dataset may contain datapoints belonging to the classes Apples, Pears and Oranges, and based on the features of the datapoints (weight, color, size etc.) we try to predict the class.

We need some amount of training data to train the Classifier, i.e. to form a correct model of the data. We can then use the trained Classifier to classify new data. If the training dataset is chosen correctly, the Classifier should predict the classes of new data with an accuracy similar to the one it achieves on the training examples.


After construction, such a Classifier could for example tell us that a document containing the words “Bose-Einstein condensate” should be categorized as a Physics article, while a document containing the words “Arbitrage” and “Hedging” should be categorized as a Finance article.
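As a rough illustration of such a document Classifier, here is a minimal sketch of a word-count (multinomial) Naive Bayes model with add-one smoothing, written with the standard library only. The four training sentences, the function names `train` and `predict`, and the choice of smoothing are assumptions made for this example; they are not taken from a real dataset or from the original post.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set, invented purely for illustration.
TRAIN = [
    ("bose einstein condensate forms at low temperature", "Physics"),
    ("quantum condensate and particle temperature", "Physics"),
    ("arbitrage and hedging reduce portfolio risk", "Finance"),
    ("hedging strategies in the bond market", "Finance"),
]


def train(docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(text, model):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # log P(class)
        log_p = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            if word not in vocab:
                continue  # ignore words never seen during training
            # log P(word | class) with add-one (Laplace) smoothing
            count = word_counts[label][word] + 1
            log_p += math.log(count / (total_words + len(vocab)))
        scores[label] = log_p
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict("bose einstein condensate", model))
print(predict("arbitrage hedging", model))
```

The scores are summed in log space so that multiplying many small probabilities does not underflow; with a realistic corpus the same structure applies, only the counts get larger.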

Another Classifier (whose dataset is illustrated below) could tell whether or not a person makes more than 50K, based on features such as Age, Education, Marital Status, Occupation etc.


As we can see, there is an input dataset $X$ which corresponds to an ‘output’ $Y$. The dataset $X$ contains $m$ input examples $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$, and each input example has $n$ feature values $x_1, x_2, \ldots, x_n$ (here $n = 7$).
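To make the notation concrete, the sketch below stores a tiny dataset as a list of rows. All values are made up for illustration, and because the text names only four of the seven features, the remaining three are given placeholder names (`feature_5` to `feature_7`) rather than guessed.

```python
# Feature names: the first four come from the text, the last three are placeholders.
feature_names = ["Age", "Education", "Marital Status", "Occupation",
                 "feature_5", "feature_6", "feature_7"]

# Hypothetical input examples x^(1), x^(2), x^(3); values are invented.
X = [
    [39, "Bachelors", "Single", "Clerical", 0, 0, 0],
    [52, "Doctorate", "Married", "Manager", 1, 0, 1],
    [28, "High school", "Married", "Sales", 0, 1, 0],
]

# Hypothetical outputs y^(1), y^(2), y^(3), one per input example.
Y = ["<=50K", ">50K", "<=50K"]

m = len(X)         # number of input examples
n = len(X[0])      # number of feature values per example

x_2 = X[1]         # the second example x^(2) (Python indexes from 0)
x_2_1 = X[1][0]    # its first feature value, here the Age

print(m, n, x_2_1)
```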

There are three popular Classifiers within Machine Learning, which use three different mathematical approaches to classify data:

  • Naive Bayes, which uses a statistical (Bayesian) approach,
  • Logistic Regression, which uses a functional approach and
  • Support Vector Machines, which use a geometrical approach.

We have previously looked at Logistic Regression. Here we will see the theory behind the Naive Bayes Classifier together with its implementation in Python.



Bayesian machine learning

In essence, Bayesian means probabilistic. The specific term exists because there are two approaches to probability. Bayesians think of it as a measure of belief, so that probability is subjective and refers to the future.

Frequentists have a different view: they use probability to refer to past events – in this way it’s objective and doesn’t depend on one’s beliefs. The name comes from the method – for example: we tossed a coin 100 times, it came up heads 53 times, so the frequency/probability of heads is 0.53.
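The two views can be compared on the coin example with a short sketch. The frequentist estimate is just the observed frequency. For the Bayesian side, the code assumes a Beta prior over the probability of heads (a common textbook choice, not something the text specifies), in which case the posterior mean has a closed form; the function name `posterior_mean` and the prior parameters `a` and `b` are introduced here for illustration.

```python
heads, tosses = 53, 100

# Frequentist estimate: the observed relative frequency of heads.
frequentist_estimate = heads / tosses


def posterior_mean(heads, tosses, a=1.0, b=1.0):
    """Mean of the Beta(a + heads, b + tails) posterior.

    Assumes a Beta(a, b) prior on the probability of heads;
    a = b = 1 is the uniform prior (no initial preference).
    """
    tails = tosses - heads
    return (a + heads) / (a + b + heads + tails)


# Uniform prior: the data dominate, the result stays close to 0.53.
uniform_prior_estimate = posterior_mean(heads, tosses)

# Strong prior belief in a fair coin (a = b = 50) pulls the estimate toward 0.5.
fair_prior_estimate = posterior_mean(heads, tosses, a=50.0, b=50.0)

print(frequentist_estimate, uniform_prior_estimate, fair_prior_estimate)
```

The point of the comparison is that the Bayesian answer depends on the prior belief as well as on the 100 tosses, while the frequentist answer depends on the tosses alone.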

For a thorough investigation of this topic and more, refer to Jake VanderPlas’ Frequentism and Bayesianism series of articles.

Bayesian Statistics explained to Beginners in Simple English

Bayesian Statistics remains incomprehensible to many analysts. Amazed by the incredible power of machine learning, a lot of us have become unfaithful to statistics. Our focus has narrowed down to exploring machine learning. Isn’t it true?

We fail to understand that machine learning is only one way to solve real world problems. In several situations, it does not help us solve business problems, even though there is data involved in these problems. To say the least, knowledge of statistics will allow you to work on complex analytical problems, irrespective of the size of data.

In the 1760s, Thomas Bayes’ work introducing ‘Bayes Theorem’ was published posthumously. Even centuries later, the importance of ‘Bayesian Statistics’ hasn’t faded away. In fact, today this topic is taught in great depth in some of the world’s leading universities.

With this in mind, I’ve created this beginner’s guide to Bayesian Statistics. I’ve tried to explain the concepts in a simple manner with examples. Prior knowledge of basic probability & statistics is desirable. By the end of this article, you will have a concrete understanding of Bayesian Statistics and its associated concepts.