In Python, compute the following in the code structure provided below.

(A) Document-Term Matrix. Define a function called compute_dtm as follows:
- Take a list of documents, docs, as a parameter.
- Tokenize each document into lower-cased words with leading and trailing punctuation removed.
- Let words denote the list of unique words in docs.
- Compute dtm, a 2-dimensional array built from the documents as follows: each row i represents a document, each column j represents a unique word in words, and each cell (i, j) holds the count of word j in document i (0 if word j does not appear in document i).
- Return dtm and words.
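One way the steps in (A) might be sketched is shown below. Several choices here are assumptions rather than part of the assignment: tokens are split on whitespace, punctuation is whatever string.punctuation contains, tokens that become empty after stripping are dropped, and words is sorted alphabetically so the column order is deterministic.

```python
import string

import numpy as np


def compute_dtm(docs):
    # Lower-case each document, split on whitespace (an assumption), and
    # strip leading/trailing punctuation from every token.
    tokenized = [
        [tok for tok in (w.strip(string.punctuation) for w in doc.lower().split()) if tok]
        for doc in docs
    ]

    # Unique words; sorting is an assumption made only for a stable column order.
    words = sorted({tok for doc in tokenized for tok in doc})
    index = {w: j for j, w in enumerate(words)}

    # Rows are documents, columns are words; cells default to 0.
    dtm = np.zeros((len(docs), len(words)), dtype=int)
    for i, doc in enumerate(tokenized):
        for tok in doc:
            dtm[i, index[tok]] += 1

    return dtm, words


# Hypothetical two-document corpus, used only for illustration.
docs = ["The cat sat.", "The dog; the cat!"]
dtm, words = compute_dtm(docs)
print(words)
print(dtm)
```

Internal punctuation (for example the apostrophe in "don't") is left untouched by this sketch, since the task only mentions leading and trailing punctuation.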

(B) Performance Analysis. Suppose your machine learning model returns a one-dimensional array of probabilities as its output.
1. Write a function "performance_analysis" (named evaluate_performance in the code structure below) to do the following:
- Take three input parameters: a probability array, a ground-truth label array, and a threshold th.
- If a probability > th, the prediction is positive; otherwise, it is negative.
- Compare the predictions with the ground-truth labels to calculate the confusion matrix as shown in the figure, where:
  True Positives (TP): the number of correct positive predictions.
  False Positives (FP): the number of positive predictions that are actually negatives.
  True Negatives (TN): the number of correct negative predictions.
  False Negatives (FN): the number of negative predictions that are actually positives.
- Calculate precision as TP/(TP+FP) and recall as TP/(TP+FN).
- Return the confusion matrix, precision, and recall.
2. Call this function with th set to 0.5 and print out the confusion matrix, precision, and recall.
3. Call this function with th varying from 0.05 to 1 in increments of 0.05. Plot a line chart of precision and recall against th, and observe how both change as th increases.
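A minimal sketch of part (B) follows, under stated assumptions: labels are encoded as 0/1, the confusion matrix is laid out as [[TN, FP], [FN, TP]] (the figure is not reproduced here, so the intended layout is a guess), and precision or recall is reported as NaN when its denominator is zero. The helper names sweep_thresholds and plot_sweep are hypothetical, and the function keeps the name used in the text (the skeleton calls it evaluate_performance).

```python
import numpy as np


def performance_analysis(prob, truth, th):
    prob = np.asarray(prob, dtype=float)
    truth = np.asarray(truth).astype(bool)  # assumes 0/1 labels
    pred = prob > th                        # strictly greater than th => positive

    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    tn = int(np.sum(~pred & ~truth))
    fn = int(np.sum(~pred & truth))

    # Assumed layout: rows are actual (neg, pos), columns are predicted (neg, pos).
    conf = np.array([[tn, fp], [fn, tp]])

    # NaN when undefined, e.g. no positive predictions at a high threshold.
    prec = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    rec = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    return conf, prec, rec


def sweep_thresholds(prob, truth):
    # 20 thresholds: 0.05, 0.10, ..., 1.00
    ths = np.linspace(0.05, 1.0, 20)
    results = [performance_analysis(prob, truth, th) for th in ths]
    return ths, [r[1] for r in results], [r[2] for r in results]


def plot_sweep(prob, truth):
    # Imported here so the rest of the sketch runs without a display backend.
    from matplotlib import pyplot as plt

    ths, precs, recs = sweep_thresholds(prob, truth)
    plt.plot(ths, precs, label="precision")
    plt.plot(ths, recs, label="recall")
    plt.xlabel("threshold (th)")
    plt.ylabel("score")
    plt.legend()
    plt.show()


# Made-up probabilities and labels, for illustration only.
prob = [0.9, 0.8, 0.4, 0.6, 0.2, 0.7]
truth = [1, 1, 1, 0, 0, 0]
conf, prec, rec = performance_analysis(prob, truth, 0.5)
print(conf)
print(prec, rec)
```

Calling plot_sweep(prob, truth) draws the line chart for step 3. Recall cannot increase as th grows, because fewer items are predicted positive; precision often rises with th but is not guaranteed to.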

(C) Define a class called DTM as follows:
- A list of documents (docs) is passed to initialize a DTM object.
- The __init__ function creates two attributes: words, which saves the list of unique words in the documents, and dtm, which saves the document-term matrix returned by calling the function defined in part (A).
- The class contains two methods:
  max_word_freq(): returns the word with the maximum total count in the entire corpus.
  max_word_df(): returns the word with the largest document frequency, i.e. the word that appears in the most documents.
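The class in (C) could be sketched roughly as below. To keep the example self-contained it includes a compact stand-in for compute_dtm with the same assumptions as before (whitespace tokenization, string.punctuation, sorted vocabulary); ties are broken by taking the first maximum, which the assignment does not specify.

```python
import string

import numpy as np


def compute_dtm(docs):
    # Compact stand-in for the function from part (A); details are assumptions.
    tokenized = [
        [t for t in (w.strip(string.punctuation) for w in d.lower().split()) if t]
        for d in docs
    ]
    words = sorted({t for doc in tokenized for t in doc})
    index = {w: j for j, w in enumerate(words)}
    dtm = np.zeros((len(docs), len(words)), dtype=int)
    for i, doc in enumerate(tokenized):
        for t in doc:
            dtm[i, index[t]] += 1
    return dtm, words


class DTM(object):
    def __init__(self, docs):
        # Store the document-term matrix and its column vocabulary.
        self.dtm, self.words = compute_dtm(docs)

    def max_word_freq(self):
        # Column sums give each word's total count across the corpus.
        totals = self.dtm.sum(axis=0)
        return self.words[int(np.argmax(totals))]

    def max_word_df(self):
        # A word's document frequency is the number of rows with a nonzero count.
        doc_freq = (self.dtm > 0).sum(axis=0)
        return self.words[int(np.argmax(doc_freq))]


# Hypothetical corpus chosen so the two methods give different answers.
corpus = DTM(["the the the the cat", "a cat", "cat dog"])
print(corpus.max_word_freq())  # "the": 4 occurrences in total
print(corpus.max_word_df())    # "cat": appears in all 3 documents
```

The example shows why the two methods differ: a word repeated many times in one document can dominate the total count without being the most widespread.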

# Structure of your solution:

import numpy as np
import pandas as pd
import string
from matplotlib import pyplot as plt


def compute_dtm(docs):
    dtm, words = None, None

    # add your code here

    return dtm, words


def evaluate_performance(prob, truth, th):
    conf, prec, rec = None, None, None

    # add your code here

    return conf, prec, rec


class DTM(object):
    # add your code here
    pass
