Project 3 – Ensemble Methods and Unsupervised Learning

In this project you will explore some techniques in unsupervised learning
as well as ensemble methods. It is important to realize that understanding
an algorithm or technique requires understanding how it behaves under a
variety of circumstances. You will go through the process of choosing and
exploring two classification datasets, tuning the algorithms you have
learned about, writing a thorough analysis of your findings, and presenting
your findings. The most crucial part of this assignment is the analysis and
your ability to explain and justify your results.

I. Choosing Datasets

The first task in this assignment is choosing two interesting classification
datasets; these can be binary or multiclass. The features can be of any
type, and it is recommended that you choose datasets with diverse feature
sets. I don’t care where you get the data from. You can download some,
take some from your own research, or make some up on your own. What I
do care about is that the datasets must be interesting. They should
contain a decent number of features and a sufficiently large number of
examples. Do not choose an “easy” dataset, but don’t go crazy
trying to find the perfect one either. Your two datasets should also differ in some
way such that you can compare and contrast your results between the
two. You should also be following standard machine learning practice by
splitting your dataset into training and testing, and only touching the
testing dataset at the very end when you are ready to report results.
(Cross-validation is highly recommended.)
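
As a rough illustration of the practice described above, the sketch below holds out a test set once, does all model selection with cross-validation on the training portion, and scores the test set a single time at the end. It assumes scikit-learn is installed; the breast cancer dataset, the random forest, the 80/20 split, and the 5 folds are placeholder choices for illustration, not requirements of the assignment.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder data: substitute your own feature matrix X and labels y.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set once, stratified so class proportions are preserved.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# All model selection happens on the training portion via 5-fold CV.
model = RandomForestClassifier(n_estimators=100, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Only at the very end: fit on all training data and score the test set once.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Fixing `random_state` keeps the split reproducible, which helps when main.py has to regenerate the same figures.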

II. Coding (10%)

After choosing your datasets, you will write code to apply
the machine learning algorithms you have learned about. Your code must be
written in Python, but you may use any libraries that have already implemented
the machine learning algorithms (e.g., scikit-learn). You are not expected to code
the algorithms from scratch, and in fact I would highly discourage it. What you
may not do is copy code from the internet. Below are the analyses you are
required to run.

1) Run K-means and Hierarchical Clustering on your datasets and analyze
what you observe.

2) Run two dimensionality reduction algorithms (PCA and t-SNE) on your
datasets. Observe and analyze the results.

3) Re-run K-means and Hierarchical Clustering on your dimensionality-reduced
datasets and compare the results to part (1).

4) Tune and train two ensemble models (AdaBoost and Random Forests) on
both your original and dimensionality-reduced datasets. Compare and
analyze the results.
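
Steps (1) through (3) could be sketched roughly as follows, assuming scikit-learn. The digits dataset, the 500-sample subsample, the 10 PCA components, the t-SNE perplexity, and the helper name `cluster_scores` are all illustrative assumptions. The class labels are used only to score the clusterings afterwards (here with the adjusted Rand index); the clustering itself never sees them.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Placeholder data, subsampled so t-SNE finishes quickly.
X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]
X = StandardScaler().fit_transform(X)
n_clusters = len(set(y))


def cluster_scores(features, labels, k):
    """Adjusted Rand index of K-means and Ward hierarchical clustering."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    hc = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(features)
    return {
        "kmeans": adjusted_rand_score(labels, km),
        "hierarchical": adjusted_rand_score(labels, hc),
    }


# The same clustering routine is applied to each representation of the data.
views = {
    "original": X,
    "pca": PCA(n_components=10, random_state=0).fit_transform(X),
    "tsne": TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X),
}
results = {name: cluster_scores(feats, y, n_clusters) for name, feats in views.items()}
for name, scores in results.items():
    print(name, scores)
```

The adjusted Rand index is only one possible yardstick. A label-free measure such as the silhouette score, or simply plotting the 2-D embeddings colored by cluster, may suit the analysis better, and clusters found in a t-SNE embedding should be interpreted with care because t-SNE does not preserve global distances.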

Your code does not have to be pretty or well written. However, it must be written
in Python, and I must be able to run one script (main.py) that will produce all the
results and figures in your report.
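
For step (4), one possible shape for the tuning code inside main.py is sketched below, assuming scikit-learn. The dataset, the small parameter grids, the 5 PCA components, and the 3 folds are illustrative assumptions only. PCA is placed inside the pipeline so it is fit on training folds alone; t-SNE is left out here because scikit-learn's `TSNE` has no `transform` method for unseen data, so using it as a preprocessing step for a classifier needs separate thought.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data: substitute your own datasets.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Small illustrative grids; real tuning would likely search wider ranges.
models = {
    "adaboost": (
        AdaBoostClassifier(random_state=0),
        {"clf__n_estimators": [50, 100], "clf__learning_rate": [0.5, 1.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"clf__n_estimators": [50, 100], "clf__max_depth": [None, 5]},
    ),
}
# "passthrough" skips the reduction step, giving the original feature space.
reducers = {"original": "passthrough", "pca": PCA(n_components=5)}

results = {}
for model_name, (clf, grid) in models.items():
    for reducer_name, reducer in reducers.items():
        pipe = Pipeline(
            [("scale", StandardScaler()), ("reduce", reducer), ("clf", clf)]
        )
        # Hyperparameters are chosen by cross-validation on the training set.
        search = GridSearchCV(pipe, grid, cv=3).fit(X_train, y_train)
        results[(model_name, reducer_name)] = {
            "best_params": search.best_params_,
            "cv_accuracy": search.best_score_,
            "test_accuracy": search.score(X_test, y_test),
        }

for key, value in results.items():
    print(key, value)
```

Keeping every model and representation in one `results` dictionary makes it straightforward to build the comparison tables and plots for the report from a single run.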

III. Report (80%)

You will then produce a report describing and analyzing your methods and
results. Here you will describe the datasets you have chosen and why they are
interesting. You will then provide an analysis of how the different machine
learning algorithms performed on each dataset. The report is limited to a
maximum of 10 pages. Plots and figures are highly recommended. It is up to you
how you wish to demonstrate your understanding of the machine learning
algorithms you have explored, but below I have listed some potential ideas for
analysis and items you may wish to include in the report.

• A description of your two datasets and why you feel that they are interesting.

• Hypotheses on how you believe the learning algorithms will perform on each
dataset and why.

• How you dealt with different features in your datasets, missing data, and
different scalings.

• Training and testing error rates you obtained for your various learning
algorithms (some sort of cross-validation is highly recommended)

• The effect of hyperparameters on performance

• Comparing and contrasting results between datasets

• Comparing and contrasting results between learning algorithms

• Training and testing error rates as a function of training dataset size

• Timing analysis of how long it takes to train/test each algorithm

• Conclusions

• Ideas for future analyses

• What you may have done differently

• References
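
Two of the ideas listed above, error rates as a function of training set size and a timing analysis, could be sketched along these lines, assuming scikit-learn and NumPy. The dataset, the model, the five training sizes, and the 5 folds are placeholder choices; `X` and `y` here stand in for the training portion of your data.

```python
import time

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

# Placeholder data: in practice, pass only your training split here.
X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=0)

# Cross-validated scores at 20%, 40%, ..., 100% of the available training data.
sizes, train_scores, val_scores = learning_curve(
    model,
    X,
    y,
    train_sizes=np.linspace(0.2, 1.0, 5),
    cv=5,
    shuffle=True,
    random_state=0,
)
train_error = 1.0 - train_scores.mean(axis=1)
val_error = 1.0 - val_scores.mean(axis=1)
for n, tr, va in zip(sizes, train_error, val_error):
    print(f"n={n:4d}  train error={tr:.3f}  validation error={va:.3f}")

# Wall-clock timing of a single fit and a single predict call.
start = time.perf_counter()
model.fit(X, y)
fit_seconds = time.perf_counter() - start

start = time.perf_counter()
model.predict(X)
predict_seconds = time.perf_counter() - start
print(f"fit: {fit_seconds:.3f}s  predict: {predict_seconds:.3f}s")
```

Single timings are noisy, so averaging over several repetitions would give more trustworthy numbers for the report.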

You are NOT being graded on how well the algorithms perform on your datasets.
What is most important is WHY. You should be explaining and justifying all of
your figures and results, and demonstrating that you understand the intricate
details of the machine learning process and the machine learning algorithms you
are using.

IV. Presentation (10%)

Finally, you will give a presentation of your results lasting at most 7 minutes (you will be
cut off exactly at the 7-minute mark). In this presentation you will describe your
datasets, your methods, and any interesting results you found.

What to turn in?

Below is a list of items you will be required to turn in via Canvas. Please make
sure all documents are named as described below.

• report.pdf – Your report in PDF format, 10 pages maximum. Do not use a very
small or very large font. No specific formatting is required, but use common sense.

• presentation.pptx or presentation.key – Your presentation slides as either a
PowerPoint or Keynote document.

• code.zip – A zip file with all of the code you have written. Within the folder
there should be a file called README.txt that contains instructions on how to run
your code, and a Python file called main.py that will produce all figures and
plots in your report/presentation. I should be able to reproduce your results
easily.

• data.zip – A zip file that contains the two datasets you have chosen.

Grading

You are being scored on your analysis more than anything else. Roughly
speaking, implementing everything and getting it to run is worth very little for
this assignment. Of course, analysis without proof of working code
makes the analysis suspect. The key thing is that your explanations should be
both thorough and concise, and your analysis should prove to me you have a
deep understanding of the machine learning process and the machine learning
algorithms you are using.
