2023 In this problem we will attempt to identify spam or ham SMS messages using the naïve Bayes | Assignment Collections

Computer Science 2023 Naïve Bayes Classification


In this problem, we will attempt to identify spam or ham SMS messages using the naïve Bayes model. You will need to download the SMS dataset: http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection. Python users: scikit-learn is not allowed.

Consider an SMS message as a (case-insensitive) sequence of words $(X_1, \dots, X_T)$. Ignore all other punctuation. Under the naïve Bayes assumption, the probability of the words in each message factors as:

$$P(x_{1:T} \mid y) = \prod_{i=1}^{T} P(x_i \mid y). \tag{3}$$
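As an illustration of the message representation described above, here is a minimal Python sketch of turning a raw SMS into a lowercase word sequence. The function name `tokenize` and the exact definition of a "word" (a run of letters, digits, or apostrophes) are assumptions made for this example; the problem statement only requires case-insensitivity and ignoring punctuation.

```python
import re

def tokenize(message: str) -> list[str]:
    """Lowercase the message and split it into words, dropping punctuation."""
    # Assumption: a word is a maximal run of letters, digits, or apostrophes.
    return re.findall(r"[a-z0-9']+", message.lower())

# Made-up message, not taken from the dataset.
tokens = tokenize("WINNER!! You have won a FREE prize, call now.")
print(tokens)
# ['winner', 'you', 'have', 'won', 'a', 'free', 'prize', 'call', 'now']
```

Other tokenization choices (for example, dropping digits) are equally compatible with the problem text and would change the word counts slightly.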

When estimated from a dataset $D$ with a pseudo-count prior of $\alpha$, the model parameters are:

$$\hat{P}(x_i \mid y) = \frac{\mathrm{Count}_D(x_i, y) + \alpha}{\mathrm{Count}_D(y) + N\alpha}, \tag{4}$$

where $\mathrm{Count}_D(x_i, y)$ is the number of occurrences of word $x_i$ in messages with label $y$ (spam or ham) in our sample $D$; $\mathrm{Count}_D(y)$ is the total number of words in messages with label $y$ in $D$; and $N$ is the total number of dictionary words (including words not seen in $D$). Let us use $N = 20{,}000$ and $\alpha = 0.1$ in our experiments.
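The smoothed estimate in equation (4) can be sketched in Python as follows. This is a minimal illustration, not the provided nbayes scripts: the helper names `fit_counts` and `word_prob` and the toy messages are assumptions made for the example, while N = 20,000 and α = 0.1 come from the problem statement.

```python
from collections import Counter

N = 20_000   # dictionary size, from the problem statement
ALPHA = 0.1  # pseudo-count, from the problem statement

def fit_counts(messages):
    """messages: iterable of (label, list_of_words) pairs.

    Returns (word_counts, total_words), where word_counts[label][word] is
    Count_D(word, label) and total_words[label] is Count_D(label).
    """
    word_counts = {}
    total_words = Counter()
    for label, words in messages:
        word_counts.setdefault(label, Counter()).update(words)
        total_words[label] += len(words)
    return word_counts, total_words

def word_prob(word, label, word_counts, total_words, alpha=ALPHA, n=N):
    """Smoothed estimate of P(word | label), following equation (4)."""
    # A Counter returns 0 for unseen words, so they get probability
    # alpha / (Count_D(label) + n * alpha) rather than zero.
    return (word_counts[label][word] + alpha) / (total_words[label] + n * alpha)

# Toy data, made up for illustration (not taken from the dataset).
toy = [("spam", ["free", "prize", "free"]), ("ham", ["see", "you", "soon"])]
wc, tw = fit_counts(toy)
p_free_spam = word_prob("free", "spam", wc, tw)  # (2 + 0.1) / (3 + 2000)
p_free_ham = word_prob("free", "ham", wc, tw)    # (0 + 0.1) / (3 + 2000)
```

Because of the pseudo-count, a word never seen with a given label still receives a small nonzero probability, which keeps a single unseen word from driving the whole product in equation (3) to zero.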

Note that the classes are heavily imbalanced: the dataset contains 747 spam messages and 4827 ham messages. A trivial classifier that predicts ham for every message already reaches about 86.6% accuracy (4827 out of 5574), so accuracy alone is not a good measure of the classifier's performance.

Instead of relying on accuracy alone, we can use a confusion matrix to examine the performance of our model. The confusion matrix is laid out as follows:

                                  True condition
                                  Positive          Negative
Predicted condition   Positive    True positive     False positive
                      Negative    False negative    True negative

Other important performance measurements are precision, recall, and F-score, defined as:

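$$\text{precision} = \frac{\text{true positive}}{\text{true positive} + \text{false positive}} \tag{5}$$

$$\text{recall} = \frac{\text{true positive}}{\text{true positive} + \text{false negative}} \tag{6}$$

$$\text{F-score} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{7}$$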
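The confusion matrix and equations (5) to (7) can be computed with a few lines of Python. The sketch below is one possible implementation: the function names are hypothetical, spam is treated as the positive class, and returning 0.0 when a denominator is zero is a convention assumed here, not something the problem statement specifies.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Return (tp, fp, fn, tn) for the given positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall_f(tp, fp, fn):
    """Precision, recall, and F-score as in equations (5) to (7)."""
    # Assumed convention: a measure with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score

# Made-up labels for illustration.
y_true = ["spam", "spam", "ham", "ham", "ham"]
y_pred = ["spam", "ham", "spam", "ham", "ham"]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)                  # 1 1 1 2
print(precision_recall_f(tp, fp, fn))  # (0.5, 0.5, 0.5)
```

Note that a classifier predicting ham for every message has no true positives, so under this convention its precision, recall, and F-score are all 0.0 despite its high accuracy.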

(a) Randomly split the messages into a training set $D_1$ (80% of messages) and a testing set $D_2$ (20% of messages). Calculate the testing accuracy, confusion matrix, precision, recall, and F-score of the naïve Bayes classifier in determining whether a message is spam or ham. Submit your source code. Note: Let's assume that spam is the positive class. (20 points)

(b) How does changing $\alpha$ affect the classifier's performance? Using the random split above, evaluate the training and test accuracy and F-score for different values of $\alpha$. Use $\alpha = 2^i$ for $i = -5, \dots, 0$. Create two plots: the first for accuracy and the second for F-score. In each plot, the x-axis represents $i$ and the y-axis represents the performance measure (accuracy or F-score). Each plot contains two lines: one for the training measure and one for the test measure. Submit your source code. (20 points)
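To make the role of α in part (b) concrete, the sketch below enumerates the six values of α and shows how each one changes the smoothed probability that equation (4) assigns to a word never seen with a given label. The value of `words_in_class` is a hypothetical word count chosen only for illustration, and `unseen_word_prob` is a helper name introduced here.

```python
N = 20_000  # dictionary size, from the problem statement

# alpha = 2^i for i = -5, ..., 0
exponents = list(range(-5, 1))
alphas = [2.0 ** i for i in exponents]

def unseen_word_prob(alpha, words_in_class, n=N):
    """Equation (4) for a word whose count with this label is zero."""
    return alpha / (words_in_class + n * alpha)

words_in_class = 15_000  # hypothetical Count_D(y), for illustration only
probs = [unseen_word_prob(a, words_in_class) for a in alphas]

for i, alpha, p in zip(exponents, alphas, probs):
    print(f"i = {i:2d}  alpha = {alpha:.5f}  P(unseen word | y) = {p:.3e}")
```

Larger values of α shift more probability mass toward unseen words and flatten the estimated distribution. Training and test performance can respond differently to this, which is what the two curves in each plot are meant to show.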

Hints: There are scripts in both Octave (nbayes.m) and Julia (nbayes.jl) for counting occurrences of words in spam/ham messages. A few notes: (i) Octave can make use of Java data structures (e.g., the hashmap); this can be easier and more efficient than using Octave structures. (ii) Julia has its own data structure library, which includes many standard data structures, e.g. Dict for a hashmap. (iii) $\log(xy) = \log x + \log y$.
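Hint (iii) matters because the product in equation (3) multiplies many small probabilities and can underflow to zero in floating point. The Python sketch below illustrates this with made-up numbers (the per-word probability and message length are hypothetical) and shows the equivalent sum of logarithms; `log_likelihood` is a helper name introduced for this example.

```python
import math

p = 0.001  # hypothetical per-word probability
T = 200    # hypothetical message length in words

# Direct product: 0.001^200 = 1e-600, far below the smallest positive double.
direct = 1.0
for _ in range(T):
    direct *= p

# Same quantity in log space: log(p^T) = T * log(p), which stays finite.
in_logs = T * math.log(p)

def log_likelihood(word_probs):
    """log of the product of word_probs, computed as a sum of logs."""
    return sum(math.log(q) for q in word_probs)

print(direct)   # 0.0
print(in_logs)  # a finite negative number
```

Since the logarithm is monotonic, comparing the log scores of spam and ham selects the same label as comparing the original products would, without the underflow.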

 
