**By Aisha Javed**

**Unfolding Naive Bayes from Scratch! Take-2**

In my previous blog post, **Unfolding Naive Bayes from Scratch! Take-1**, I tried to decode the rocket science behind the working of the Naive Bayes (NB) ML algorithm, and after going through its algorithmic insights, you too must have realized that it’s quite a painless algorithm. In this blog post, we will walk through its complete step-by-step Pythonic implementation (using basic Python only), and it will be quite evident *how easy it is to code NB from scratch* and that NB is *not that naive at classifying!*

**Who’s the Target Audience? ML Beginners**

I have always wanted to decipher ML for absolute beginners, and as the saying goes, if you can’t explain it, you probably didn’t understand it. So this blog post, too, is especially intended for *ML beginners looking for humanistic ML resources that go in depth without the gibberish jargon of those creepy Greek mathematical formulas* (honestly, that scary-looking math never made any sense to me either!).

**Outcome of this Tutorial — A Hands-On Pythonic Implementation of NB**

As mentioned above, the outcome is a complete walk-through of a Pythonic implementation of NB.

*Once you reach the end of this blog post, you will be about 90% done with understanding and implementing NB, and only the remaining 10% will be needed to master it from an application point of view!*

**ML Little Birdy Transforming from Zero to Hero !!!**

### Defining The Roadmap

**Milestone # 1: Data Preprocessing Function**

**Milestone # 2: Implementation of NaiveBayes Class - Defining Functions for Training & Testing**

**Milestone # 3: Training NB Model on Training Dataset**

**Milestone # 4: Testing Using Trained NB Model**

**Milestone # 5: Proving that the Code for NaiveBayes Class is Absolutely Generic!**

Before we begin writing code for Naive Bayes in python, I assume you are familiar with:

- Python Lists
- Numpy & just a tad bit of vectorized code
- Dictionaries
- Regex
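As a warm-up for Milestone # 1, here is a minimal sketch of what a regex-based preprocessing function could look like. The function name `preprocess_string` and the exact cleaning rules (keep letters only, collapse whitespace, lowercase) are assumptions chosen for illustration, not necessarily the steps used in the notebook.

```python
import re


def preprocess_string(text):
    """Clean one raw text example (hypothetical helper, rules are assumed)."""
    # Replace every run of non-letter characters with a single space
    cleaned = re.sub(r"[^a-zA-Z]+", " ", text)
    # Collapse any remaining whitespace runs and trim both ends
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Lowercase so that "Free" and "free" count as the same word
    return cleaned.lower()


print(preprocess_string("Hello, World!! 123 NB is   fun."))
```

Cleaning the text first means a later split on spaces yields plain lowercase words, which keeps the word counts consistent between training and testing.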

*Let’s begin with the Pythonic implementation!*

**Defining the NaiveBayes Class**

There are four functions defined in the NaiveBayes class:

1. **def** addToBow(self, example, dict_index)
2. **def** train(self, dataset, labels)
3. **def** getExampleProb(self, test_example)
4. **def** test(self, test_set)

The code is organized around two major functions, i.e., the train and test functions. Once you understand the statements inside these two functions, you will know what the code is actually doing and in what order the other two functions are called.

1. Training function that trains the NB model:
   **def** train(self, dataset, labels)
2. Testing function that predicts class labels for the given test examples:
   **def** test(self, test_set)

The other two functions are defined to supplement these two major functions:

1. BoW function that supplements the training function. It is called by the train function. It simply splits the given example using space as a tokenizer and adds every tokenized word to its corresponding BoW:
   **def** addToBow(self, example, dict_index)
2. Probability function that supplements the test function. It is called by the test function. It estimates the probability of the given test example so that it can be classified for a class label:
   **def** getExampleProb(self, test_example)
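To make the calling order concrete, here is a minimal sketch of a class with these four functions. Only the function names and signatures come from the list above. Everything else is an assumption for illustration: the constructor argument `unique_classes`, the attribute names, the add-one (Laplace) smoothing, the use of log probabilities, and plain Python lists in place of NumPy to keep the sketch dependency-free. It also assumes every class appears at least once in the training labels.

```python
import math
from collections import defaultdict


class NaiveBayes:
    def __init__(self, unique_classes):
        # unique_classes: the distinct class labels (assumed constructor argument)
        self.classes = list(unique_classes)

    def addToBow(self, example, dict_index):
        # Split on whitespace and count each word in the BoW of class dict_index
        for token in example.split():
            self.bow_dicts[dict_index][token] += 1

    def train(self, dataset, labels):
        labels = list(labels)
        # One bag of words (word -> count) per class
        self.bow_dicts = [defaultdict(int) for _ in self.classes]
        for example, label in zip(dataset, labels):
            self.addToBow(example, self.classes.index(label))
        # Prior of a class = fraction of training examples carrying that label
        self.priors = [labels.count(c) / len(labels) for c in self.classes]
        # Vocabulary size and per-class word totals, needed for add-one smoothing
        vocab = set()
        for bow in self.bow_dicts:
            vocab.update(bow)
        self.vocab_size = len(vocab)
        self.totals = [sum(bow.values()) for bow in self.bow_dicts]

    def getExampleProb(self, test_example):
        # Returns one log-score per class: log prior + sum of log word likelihoods
        scores = []
        for i, bow in enumerate(self.bow_dicts):
            score = math.log(self.priors[i])
            denominator = self.totals[i] + self.vocab_size
            for token in test_example.split():
                score += math.log((bow.get(token, 0) + 1) / denominator)
            scores.append(score)
        return scores

    def test(self, test_set):
        # Predict the class with the highest log-score for every test example
        predictions = []
        for example in test_set:
            scores = self.getExampleProb(example)
            predictions.append(self.classes[scores.index(max(scores))])
        return predictions


# Toy data, made up purely for illustration
train_docs = ["free prize win money", "win free cash now",
              "meeting schedule today", "project meeting notes"]
train_labels = ["spam", "spam", "ham", "ham"]

nb = NaiveBayes(unique_classes=["ham", "spam"])
nb.train(train_docs, train_labels)
predictions = nb.test(["free money now", "notes for the meeting"])
print(predictions)
```

Note how train only ever calls addToBow and test only ever calls getExampleProb, which is the order described above. Summing logs instead of multiplying raw probabilities is a common way to avoid numeric underflow on longer examples.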

You can view the above code in this **Jupyter Notebook** too.

It is much easier to organize and reuse the code if we define an NB class rather than use the traditional structured programming approach. That’s the reason for defining an NB class with all its relevant functions inside it.

We don’t just want to write code; rather, we want to write beautiful, neat and clean, handy, and reusable code. Yes, that’s right - we want to possess all the traits that a good
