1) (35 points) Consider a corpus that contains the five documents in Table 1. You may use Python for this question; if you do, submit your Python code as well.

Table 1

Doc1: Decide which attribute the decision tree algorithm would choose.

Doc2: A decision tree is a classification algorithm that is widely used in machine learning.

Doc3: Making a decision to put a tree is very difficult due to lack of power for the decision

Doc4: Language decision varies from person to person and time to time.

Doc5: Decision trees are different from binary trees or binary search trees.

a) Build a term-document matrix based on the raw count of each term for the corpus in Table 1, after removing stopwords and lemmatizing the sentences. Use only nouns and verbs to build the term-document matrix.
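A raw-count term-document matrix can be sketched as follows, assuming the stopword removal, lemmatization, and noun/verb filtering have already been done. The function name `build_count_matrix` and the toy token lists are illustrative assumptions; the tokens are hand-written and are not the output of an actual tagger on Table 1.

```python
from collections import Counter


def build_count_matrix(docs):
    """Build a raw-count term-document matrix.

    docs maps a document name to its list of preprocessed tokens.
    Returns (terms, doc_names, matrix), where matrix[i][j] is the number
    of times terms[i] occurs in doc_names[j].
    """
    doc_names = list(docs)
    counts = {name: Counter(tokens) for name, tokens in docs.items()}
    # Vocabulary: every distinct term in the corpus, sorted for a stable row order.
    terms = sorted(set().union(*(c.keys() for c in counts.values())))
    # A Counter returns 0 for terms that a document does not contain.
    matrix = [[counts[name][term] for name in doc_names] for term in terms]
    return terms, doc_names, matrix


# Hypothetical, hand-written tokens used only to exercise the function.
toy_docs = {
    "DocA": ["decision", "tree", "choose", "tree"],
    "DocB": ["decision", "vary", "person"],
}

terms, doc_names, matrix = build_count_matrix(toy_docs)
for term, row in zip(terms, matrix):
    print(f"{term:<10}", row)
```

Rows are terms and columns are documents here; transposing the matrix gives the document-term layout if that is preferred.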

b) Build a term-document matrix based on the tf-idf of each term for the corpus in Table 1, after removing stopwords and lemmatizing the sentences. Use only nouns and verbs to build the term-document matrix.

Show the procedure you used to calculate tf-idf.
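One common way to compute tf-idf is sketched below. The question does not specify a formula, so this sketch assumes tf is the raw count and idf is log10(N / df), where N is the number of documents and df is the number of documents containing the term. Other variants (log-scaled tf, smoothed idf, natural logarithm) give different numbers, so state whichever definition you use. The function name `tf_idf_matrix` and the toy counts are illustrative assumptions.

```python
import math


def tf_idf_matrix(count_matrix):
    """Convert a raw-count term-document matrix (rows = terms) to tf-idf.

    Assumed weighting: tf = raw count, idf = log10(N / df).
    """
    n_docs = len(count_matrix[0]) if count_matrix else 0
    weighted = []
    for row in count_matrix:
        # Document frequency: number of documents in which the term appears.
        df = sum(1 for count in row if count > 0)
        idf = math.log10(n_docs / df) if df else 0.0
        weighted.append([count * idf for count in row])
    return weighted


# Toy counts for three terms across two documents (hypothetical values).
counts = [
    [1, 0],  # term appears in one document
    [1, 1],  # term appears in every document, so idf = log10(2/2) = 0
    [2, 0],  # term appears twice in one document
]

weights = tf_idf_matrix(counts)
for row in weights:
    print([round(w, 4) for w in row])
```

Under this definition a term that occurs in every document gets a weight of zero everywhere, which is worth noting when showing the procedure.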

Use the stopwords provided by NLTK, listed here:

{‘of’, ‘against’, ‘ll’, ‘they’, ‘aren’, ‘our’, ‘that’, ‘shouldn’, ‘only’, ‘shan’, ‘o’, “isn’t”, ‘been’, “weren’t”, “you’ve”, ‘myself’, ‘as’, ‘once’, ‘my’, ‘both’, ‘too’, ‘be’, ‘should’, ‘hadn’, ‘in’, ‘does’, “you’ll”, ‘during’, ‘herself’, ‘will’, ‘any’, ‘was’, ‘how’, ‘which’, “didn’t”, ‘but’, ‘had’, ‘more’, ‘needn’, ‘further’, ‘whom’, ‘mustn’, ‘no’, ‘did’, “aren’t”, ‘or’, ‘on’, ‘down’, ‘them’, ‘to’, ‘same’, “shouldn’t”, “should’ve”, “mightn’t”, “it’s”, ‘between’, ‘before’, ‘he’, ‘here’, “hadn’t”, ‘have’, ‘if’, “you’re”, ‘haven’, ‘under’, ‘nor’, ‘t’, ‘can’, ‘re’, ‘it’, ‘y’, ‘where’, ‘then’, ‘she’, ‘own’, ‘hers’, ‘is’, ‘isn’, ‘each’, ‘don’, ‘now’, ‘by’, ‘than’, “hasn’t”, ‘his’, ‘who’, ‘above’, ‘this’, “mustn’t”, ‘their’, “couldn’t”, ‘there’, ‘couldn’, ‘over’, “you’d”, ‘m’, ‘doing’, ‘when’, ‘into’, ‘i’, ‘other’, ‘a’, ‘ours’, ‘because’, ‘we’, ‘an’, ‘weren’, ‘most’, ‘for’, ‘wasn’, “won’t”, ‘up’, “shan’t”, ‘while’, ‘your’, ‘am’, ‘through’, ‘after’, “don’t”, ‘theirs’, ‘ain’, ‘him’, ‘having’, ‘until’, ‘those’, ‘yourself’, ‘off’, ‘just’, ‘below’, ‘didn’, “wouldn’t”, “that’ll”, ‘out’, ‘mightn’, ‘ma’, ‘wouldn’, ‘such’, ‘won’, ‘all’, ‘the’, ‘has’, ‘ourselves’, ‘doesn’, ‘some’, ‘few’, ‘these’, ‘and’, “needn’t”, “doesn’t”, ‘what’, ‘with’, ‘very’, ‘himself’, ‘do’, ‘again’, ‘d’, ‘yours’, ‘are’, “wasn’t”, ‘not’, ‘being’, ‘were’, ‘from’, ‘me’, ‘ve’, ‘why’, ‘itself’, ‘s’, ‘so’, ‘hasn’, ‘her’, “she’s”, ‘you’, “haven’t”, ‘themselves’, ‘its’, ‘at’, ‘yourselves’, ‘about’}
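The preprocessing described in parts (a) and (b) could be sketched with NLTK as follows. This is one possible ordering (tag the full sentence first, then filter and lemmatize), not a prescribed one. The helper names `penn_to_wordnet` and `preprocess` are illustrative assumptions, and `preprocess` needs NLTK installed along with its tokenizer, part-of-speech tagger, and WordNet data.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None to drop the token."""
    if tag.startswith("NN"):
        return "n"  # nouns: NN, NNS, NNP, NNPS
    if tag.startswith("VB"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    return None


def preprocess(text, stop_words):
    """Return lowercase lemmas of the nouns and verbs in text, minus stopwords."""
    # Imported lazily so penn_to_wordnet can be used without NLTK installed.
    import nltk
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    # Tag the whole sentence before filtering so the tagger sees full context.
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    lemmas = []
    for word, tag in tagged:
        pos = penn_to_wordnet(tag)
        lowered = word.lower()
        # Drop non-noun/verb tokens, punctuation, and stopwords.
        if pos is None or not lowered.isalpha() or lowered in stop_words:
            continue
        lemmas.append(lemmatizer.lemmatize(lowered, pos))
    return lemmas
```

Calling `preprocess(doc_text, stop_words)` on each document, with `stop_words` set to the list above, yields the token lists from which the count matrix is built. The exact tokens depend on the tagger's output, so check them by hand.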
