# Create a tokenized document-term matrix, a confusion matrix, and word frequency in Python

In Python, compute the following in the code structure provided below.

(A) Document Term Matrix. Define a function called compute_dtm as follows:

- Take a list of documents, docs, as a parameter.
- Tokenize each document into lower-cased words without any leading or trailing punctuation.
- Let words denote the list of unique words in docs.
- Compute dtm, a 2-dimensional array created from the documents as follows:
  - Each row (i) represents a document.
  - Each column (j) represents a unique word in words.
  - Each cell (i, j) is the count of word j in document i, or 0 if word j does not appear in document i.
- Return dtm and words.
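One possible way to fill in part (A) is sketched below. It is a minimal sketch under a few assumptions that the task does not state: tokens are split on whitespace, punctuation is whatever `string.punctuation` contains, tokens that become empty after stripping are dropped, and the vocabulary is sorted alphabetically so that the column order is deterministic. The sample documents are made up for illustration.

```python
import string

import numpy as np


def compute_dtm(docs):
    # Lower-case, split on whitespace, strip leading/trailing punctuation,
    # and drop tokens that end up empty (for example a lone "-").
    tokenized = [
        [tok for tok in (w.strip(string.punctuation) for w in doc.lower().split()) if tok]
        for doc in docs
    ]

    # Assumption: sort the vocabulary so the column order is reproducible.
    words = sorted({tok for toks in tokenized for tok in toks})
    column = {word: j for j, word in enumerate(words)}

    # Rows are documents, columns are words; cells start at 0.
    dtm = np.zeros((len(docs), len(words)), dtype=int)
    for i, toks in enumerate(tokenized):
        for tok in toks:
            dtm[i, column[tok]] += 1
    return dtm, words


# Made-up sample documents, for illustration only.
docs = ["Hello, world!", "Hello again: hello."]
dtm, words = compute_dtm(docs)
print(words)
print(dtm)
```

Note that `str.strip` only removes punctuation at the two ends of a token, so internal characters such as the apostrophe in "don't" are kept, which matches the "leading and trailing" wording of the task.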

(B) Performance Analysis. Suppose your machine learning model returns a one-dimensional array of probabilities as the output.

1. Write a function “performance_analysis” (named evaluate_performance in the code structure below) to do the following:
   - Take three input parameters: a probability array, a ground-truth label array, and a threshold th.
   - If a probability > th, the prediction is positive; otherwise, it is negative.
   - Compare the predictions with the ground-truth labels to calculate the confusion matrix as shown in the figure, where:
     - True Positives (TP): the number of correct positive predictions.
     - False Positives (FP): the number of positive predictions that are actually negatives.
     - True Negatives (TN): the number of correct negative predictions.
     - False Negatives (FN): the number of negative predictions that are actually positives.
   - Calculate precision as TP/(TP+FP) and recall as TP/(TP+FN).
   - Return the confusion matrix, precision, and recall.
2. Call this function with th set to 0.5, and print out the confusion matrix, precision, and recall.
3. Call this function with th varying from 0.05 to 1 in increments of 0.05. Plot a line chart of precision and recall against th, and observe how they change.
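The steps in part (B) could be sketched roughly as follows. Several details here are assumptions, because the referenced figure is not available: the confusion matrix is laid out as [[TN, FP], [FN, TP]] (rows are actual labels, columns are predictions), labels are 0/1, and precision or recall is reported as NaN when its denominator is zero. The helpers `sweep_thresholds` and `plot_precision_recall` and the sample data are hypothetical names and values introduced only for this illustration.

```python
import numpy as np


def evaluate_performance(prob, truth, th):
    prob = np.asarray(prob, dtype=float)
    truth = np.asarray(truth).astype(bool)
    pred = prob > th  # strictly greater than th means "positive"

    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    tn = int(np.sum(~pred & ~truth))
    fn = int(np.sum(~pred & truth))

    # Assumed layout: rows = actual (neg, pos), columns = predicted (neg, pos).
    conf = np.array([[tn, fp], [fn, tp]])
    # Assumption: report NaN instead of raising when a denominator is zero.
    prec = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    rec = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    return conf, prec, rec


def sweep_thresholds(prob, truth):
    # 0.05, 0.10, ..., 1.00; linspace avoids float drift from repeated addition.
    ths = np.linspace(0.05, 1.0, 20)
    results = [evaluate_performance(prob, truth, th) for th in ths]
    precs = [r[1] for r in results]
    recs = [r[2] for r in results]
    return ths, precs, recs


def plot_precision_recall(prob, truth):
    # Imported here so the rest of the sketch runs without matplotlib.
    from matplotlib import pyplot as plt

    ths, precs, recs = sweep_thresholds(prob, truth)
    plt.plot(ths, precs, label="precision")
    plt.plot(ths, recs, label="recall")
    plt.xlabel("threshold")
    plt.ylabel("score")
    plt.legend()
    plt.show()


# Made-up probabilities and labels, for illustration only.
prob = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
truth = [1, 1, 0, 1, 0, 0]
conf, prec, rec = evaluate_performance(prob, truth, 0.5)
print(conf)
print(prec, rec)
```

Calling `plot_precision_recall(prob, truth)` would draw the line chart. In general, recall cannot increase as th rises, because fewer items are predicted positive, while precision usually (but not always) increases until no positive predictions remain.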

(C) Define a class called DTM as follows:

- A list of documents (docs) is passed to initialize a DTM object. The __init__ function creates two attributes:
  - an attribute called words, which saves a list of unique words in the documents;
  - an attribute called dtm, which saves the document-term matrix returned by calling the function defined in part (A).
- This class contains two methods:
  - max_word_freq(): returns the word with the maximum total count in the entire corpus.
  - max_word_df(): returns the word with the largest document frequency, i.e. the word that appears in the most documents.
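A minimal sketch of the class in part (C) might look like this. To keep the example self-contained it repeats a compact version of compute_dtm, with the same assumptions as before (whitespace tokenization, `string.punctuation`, sorted vocabulary). Ties are resolved by taking the first word in column order, which is an assumption, since the task does not say how to break ties. The sample corpus is made up.

```python
import string

import numpy as np


def compute_dtm(docs):
    # Compact version of part (A): tokenize, build vocabulary, count.
    tokenized = [
        [t for t in (w.strip(string.punctuation) for w in d.lower().split()) if t]
        for d in docs
    ]
    words = sorted({t for toks in tokenized for t in toks})
    column = {w: j for j, w in enumerate(words)}
    dtm = np.zeros((len(docs), len(words)), dtype=int)
    for i, toks in enumerate(tokenized):
        for t in toks:
            dtm[i, column[t]] += 1
    return dtm, words


class DTM(object):
    def __init__(self, docs):
        self.dtm, self.words = compute_dtm(docs)

    def max_word_freq(self):
        # Column sums give each word's total count across the corpus.
        totals = self.dtm.sum(axis=0)
        return self.words[int(np.argmax(totals))]

    def max_word_df(self):
        # Count, per column, the documents in which the word occurs at least once.
        doc_freq = (self.dtm > 0).sum(axis=0)
        return self.words[int(np.argmax(doc_freq))]


# Made-up corpus: "a" has the highest total count, "b" appears in every document.
corpus = DTM(["a a a a b", "b c", "b d"])
print(corpus.max_word_freq())
print(corpus.max_word_df())
```

The example corpus is chosen so that the two methods disagree: a word can dominate the total count by repeating inside one document while another word appears in more documents. An empty corpus is not handled here; `np.argmax` raises an error on an empty array.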

import numpy as np
import pandas as pd
import string
from matplotlib import pyplot as plt

#A

def compute_dtm(docs):
    dtm, words = None, None
    return dtm, words

#B

def evaluate_performance(prob, truth, th):
    conf, prec, rec = None, None, None
    return conf, prec, rec

#C

class DTM(object):
    pass
