Create a tokenized document-term matrix, a confusion matrix, and word frequencies in Python

In Python, compute the following in the code structure provided below.

(A) Document-Term Matrix. Define a function called compute_dtm as follows: Take a list of documents (docs) as a parameter. Tokenize each document into lower-cased words, stripping any leading and trailing punctuation from each word. Let words denote the list of unique words in docs. Compute dtm, a 2-dimensional array built from the documents as follows: Each row i represents a document. Each column j represents a unique word in words. Each cell (i, j) holds the count of word j in document i, or 0 if word j does not appear in document i. Return dtm and words.
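The steps in (A) can be sketched roughly as follows. This is one possible reading of the task, not a reference solution: it assumes that tokens are separated by whitespace, that tokens consisting only of punctuation are dropped, and that words is sorted alphabetically so the column order is reproducible. None of these choices is fixed by the assignment text, and the example documents are made up.

```python
import string

import numpy as np


def compute_dtm(docs):
    # Tokenize: lower-case, split on whitespace (an assumption),
    # then strip leading/trailing punctuation from every token.
    tokenized = []
    for doc in docs:
        tokens = [tok.strip(string.punctuation) for tok in doc.lower().split()]
        # Drop tokens that were nothing but punctuation (an assumption).
        tokenized.append([tok for tok in tokens if tok])

    # Unique words; sorted here only to make the column order deterministic.
    words = sorted({tok for tokens in tokenized for tok in tokens})
    index = {word: j for j, word in enumerate(words)}

    # Rows are documents, columns are words; cells start at 0.
    dtm = np.zeros((len(docs), len(words)), dtype=int)
    for i, tokens in enumerate(tokenized):
        for tok in tokens:
            dtm[i, index[tok]] += 1
    return dtm, words


# Made-up example documents, for illustration only.
docs = ["Hello, world!", "hello there; hello again."]
dtm, words = compute_dtm(docs)
print(words)
print(dtm)
```

With these two documents, "hello" is counted once in the first row and twice in the second, and every word missing from a document keeps its initial 0.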

(B) Performance Analysis. Suppose your machine learning model returns a one-dimensional array of probabilities as the output.
1. Write a function “performance_analysis” (named evaluate_performance in the code structure below) to do the following: Take three input parameters: a probability array, a ground-truth label array, and a threshold th. If a probability > th, the prediction is positive; otherwise, it is negative. Compare the predictions with the ground-truth labels to calculate the confusion matrix as shown in the figure, where: True Positives (TP): the number of correct positive predictions. False Positives (FP): the number of positive predictions that are actually negatives. True Negatives (TN): the number of correct negative predictions. False Negatives (FN): the number of negative predictions that are actually positives. Calculate precision as TP/(TP+FP) and recall as TP/(TP+FN). Return the confusion matrix, precision, and recall.
2. Call this function with th set to 0.5 and print out the confusion matrix, precision, and recall.
3. Call this function with th varying from 0.05 to 1 in steps of 0.05. Plot a line chart of precision and recall against th, and observe how both change as th increases.
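A minimal sketch of (B) might look like the block below. Several things here are assumptions rather than part of the assignment: the confusion-matrix layout (the figure is not available, so rows are taken as actual and columns as predicted), returning NaN when a denominator is zero (for example, precision at th = 1, where nothing is predicted positive), the helper names sweep_thresholds and plot_precision_recall, and the example data.

```python
import numpy as np


def evaluate_performance(prob, truth, th):
    prob = np.asarray(prob, dtype=float)
    truth = np.asarray(truth).astype(bool)
    pred = prob > th  # strictly greater than th counts as positive

    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    tn = int(np.sum(~pred & ~truth))
    fn = int(np.sum(~pred & truth))

    # Assumed layout (the original figure is not available):
    # rows = actual (positive, negative), columns = predicted (positive, negative).
    conf = np.array([[tp, fn], [fp, tn]])

    # NaN when the denominator is zero (a choice made for this sketch).
    prec = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    rec = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    return conf, prec, rec


def sweep_thresholds(prob, truth):
    # Hypothetical helper: thresholds 0.05, 0.10, ..., 1.00.
    ths = np.arange(1, 21) * 0.05
    results = [evaluate_performance(prob, truth, th) for th in ths]
    precs = [r[1] for r in results]
    recs = [r[2] for r in results]
    return ths, precs, recs


def plot_precision_recall(prob, truth):
    # Hypothetical helper; matplotlib is imported here so that the
    # rest of the sketch runs even where matplotlib is not installed.
    from matplotlib import pyplot as plt

    ths, precs, recs = sweep_thresholds(prob, truth)
    plt.plot(ths, precs, label="precision")
    plt.plot(ths, recs, label="recall")
    plt.xlabel("threshold")
    plt.legend()
    plt.show()


# Made-up probabilities and labels, for illustration only.
prob = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
truth = [1, 1, 0, 1, 0, 0]
conf, prec, rec = evaluate_performance(prob, truth, 0.5)
print(conf)
print(prec, rec)
```

Calling plot_precision_recall(prob, truth) draws the line chart for step 3. On typical data, recall can only fall or stay level as th rises, because fewer items are predicted positive, while precision often rises but is not guaranteed to.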

(C) Define a class called DTM as follows: A list of documents (docs) is passed to initialize a DTM object. The __init__ function creates two attributes: an attribute called words, which saves the list of unique words in the documents, and an attribute called dtm, which saves the document-term matrix returned by calling the function defined in part (A). The class contains two methods: max_word_freq(), which returns the word with the maximum total count in the entire corpus, and max_word_df(), which returns the word with the largest document frequency, i.e. the word that appears in the most documents.
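One way the class in (C) could be put together is sketched below. To keep the block self-contained it repeats a compact compute_dtm in the spirit of part (A), under the same assumptions (whitespace tokenization, alphabetically sorted words). Ties are resolved by taking the first word in column order, which the assignment does not specify, and the example corpus is made up.

```python
import string

import numpy as np


def compute_dtm(docs):
    # Compact version of the function described in part (A).
    tokenized = [
        [t for t in (tok.strip(string.punctuation) for tok in doc.lower().split()) if t]
        for doc in docs
    ]
    words = sorted({tok for tokens in tokenized for tok in tokens})
    index = {word: j for j, word in enumerate(words)}
    dtm = np.zeros((len(docs), len(words)), dtype=int)
    for i, tokens in enumerate(tokenized):
        for tok in tokens:
            dtm[i, index[tok]] += 1
    return dtm, words


class DTM(object):
    def __init__(self, docs):
        # Store the matrix and the vocabulary built from docs.
        self.dtm, self.words = compute_dtm(docs)

    def max_word_freq(self):
        # Column sums give each word's total count over the corpus.
        totals = self.dtm.sum(axis=0)
        return self.words[int(np.argmax(totals))]  # first word wins on ties

    def max_word_df(self):
        # Count, per word, the documents in which it occurs at least once.
        doc_freq = (self.dtm > 0).sum(axis=0)
        return self.words[int(np.argmax(doc_freq))]  # first word wins on ties


# Made-up corpus, for illustration only.
corpus = DTM(["the cat sat", "the cat cat cat", "a dog and the bird"])
print(corpus.max_word_freq())
print(corpus.max_word_df())
```

The example is chosen so the two methods disagree: "cat" has the highest total count (4), but "the" appears in all three documents and so has the largest document frequency.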

# Structure of your solution:

import numpy as np
import pandas as pd
import string
from matplotlib import pyplot as plt


def compute_dtm(docs):
    dtm, words = None, None

    # add your code here

    return dtm, words


def evaluate_performance(prob, truth, th):
    conf, prec, rec = None, None, None

    # add your code here

    return conf, prec, rec


class DTM(object):
    # add your code here
    pass
