
PART 1: CLASSIFICATION 20 marks
1. This part of the assignment is concerned with the file:
/KDrive/SEH/SCSIT/Students/Courses/COSC2111/DataMining/data/other/bank-balanced.csv.
There is a description of the data in the file bank-names.txt in the same directory.
2. Run the following classifiers, with the default parameters, on this data: ZeroR,
OneR, J48, IBk and construct a table of the training and cross-validation errors.
You can get the training error by selecting “Use training set” as the test option.
What do you conclude from these results?
Run No   Classifier   Parameters   Training Error   Cross-valid Error   Over-Fitting
1        ZeroR        None         30.0%            30.0%               None
3. Using the J48 classifier, can you find a combination of the C and M parameter
values that minimizes the amount of overfitting? Include the results of your best
five runs, including the parameter values, in your table of results.
4. Reset J48 parameters to their default values. What is the effect of lowering the
number of examples in the training set? Include your runs in your table of results.
5. Using the IBk classifier, can you find the value of k that minimizes the amount
of overfitting? Include your runs in your table of results.
6. Try a number of other classifiers. Aside from ZeroR, which classifiers are best
and worst in terms of predictive accuracy? Include 5 runs in your table of results.
7. What are the implications of the above range of accuracies for developing a bank
application using classification techniques?
8. Compare the accuracy of ZeroR, OneR and J48. What do you conclude?
9. What golden nuggets did you find, if any?
10. [OPTIONAL] Use an attribute selection algorithm to get a reduced attribute set.
How does the accuracy on the reduced set compare with the accuracy on the full
set?
Data Mining 1 25-Jul-2016
Submit: Up to two pages that describe what you did for each of the above questions
and your results and conclusions.
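The training-error and cross-validation-error columns of the results table can be illustrated outside Weka with a rough sketch. Everything below is an assumption made for illustration: the data is made up (30 "yes" and 70 "no" rows with a feature unrelated to the class), `zero_r` and `one_nn` are simplified stand-ins for ZeroR and IBk with k=1, and the cross-validation is plain k-fold rather than the stratified version Weka uses. The overfitting column is taken here as cross-validation error minus training error.

```python
import random
from collections import Counter

def zero_r(train):
    """Baseline in the spirit of ZeroR: always predict the majority class."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: majority

def one_nn(train):
    """1-nearest neighbour on one numeric feature (stand-in for IBk with k=1)."""
    return lambda x: min(train, key=lambda row: abs(row[0] - x))[1]

def error_rate(model, rows):
    """Fraction of rows the model labels incorrectly."""
    return sum(1 for x, y in rows if model(x) != y) / len(rows)

def cross_val_error(learner, rows, folds=10, seed=1):
    """Plain (unstratified) k-fold cross-validation error."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    wrong = 0
    for i in range(folds):
        test = shuffled[i::folds]  # every folds-th row, starting at i
        train = [r for j, r in enumerate(shuffled) if j % folds != i]
        model = learner(train)
        wrong += sum(1 for x, y in test if model(x) != y)
    return wrong / len(shuffled)

# Hypothetical data: 30 "yes" and 70 "no" rows; the feature carries no signal.
rng = random.Random(0)
rows = [(rng.random(), "yes" if i < 30 else "no") for i in range(100)]

results = {}
for name, learner in [("ZeroR-like", zero_r), ("1-NN", one_nn)]:
    train_err = error_rate(learner(rows), rows)
    cv_err = cross_val_error(learner, rows)
    results[name] = (train_err, cv_err, cv_err - train_err)
    print(f"{name:10s} train={train_err:.1%} cv={cv_err:.1%} gap={cv_err - train_err:.1%}")
```

The 1-NN stand-in memorises the training set, so its training error is zero while its cross-validation error is not; the majority-class baseline shows no such gap. That difference is the kind of evidence the overfitting column is meant to capture.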

PART 2: NUMERIC PREDICTION 10 marks
1. Numeric Prediction of the Balance attribute in the bank data of part 1.
2. Run the following classifiers, with default parameters, on this data: ZeroR, M5P,
IBk and construct a table of the training and cross-validation errors. You may
want to turn on “Output Predictions” to get a better sense of the magnitude of
the error on each example. What do you conclude from these results?
3. Explore different parameter settings for M5P and IBk. Which values give the
best performance in terms of predictive accuracy and overfitting? Include the
results of the best five runs in your table of results.
4. Investigate three other classifiers for numeric prediction and their associated
parameters. Include your best five runs in your table of results. Which classifier
gives the best performance in terms of predictive accuracy and overfitting?
5. What golden nuggets did you find, if any?
Submit: Up to one page that describes what you did for each of the above questions
and your results and conclusions.
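For numeric prediction the error is no longer a percentage of misclassified rows but a magnitude. A minimal sketch of two common measures, mean absolute error and root mean squared error, is given below. The balance values are invented for illustration, and the predictor is a simple "always predict the mean" baseline, which is how a ZeroR-style model is assumed to behave on a numeric target.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average size of the prediction error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: penalises large errors more heavily."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Hypothetical balances; the last value is a deliberate outlier.
balances = [100.0, 200.0, 300.0, 400.0, 5000.0]

# Mean-only baseline: every prediction is the mean of the training targets.
mean_balance = sum(balances) / len(balances)
predictions = [mean_balance] * len(balances)

baseline_mae = mae(balances, predictions)
baseline_rmse = rmse(balances, predictions)
print(f"mean={mean_balance:.1f} MAE={baseline_mae:.1f} RMSE={baseline_rmse:.1f}")
```

The single large balance pulls the root mean squared error well above the mean absolute error, which is one reason inspecting per-example predictions can be more informative than a single summary figure.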

PART 3: CLUSTERING 10 marks
1. Clustering of the bank data of part 1.
For this part use only the attributes Age, Marital, Education and Balance.
2. Run the Kmeans clustering algorithm on this data for the following values of K:
1,2,3,4,5,10,20. Analyse the resulting clusters. What do you conclude?
3. Choose a value of K and run the algorithm with different seeds. What is the
effect of changing the seed?
4. Run the EM algorithm on this data with the default parameters and describe the
output.
5. The EM algorithm can be quite sensitive to whether the data is normalized or
not. Use the weka normalize filter (Preprocess -> Filter -> unsupervised
-> normalize) to normalize the numeric attributes. What difference does
this make to the clustering runs?
6. The algorithm can be quite sensitive to the values of minLogLikelihoodImprovementCV,
minStdDev and minLogLikelihoodImprovementIterating. Explore the effect
of changing these values. What do you conclude?
7. How many clusters do you think are in the data? Give an English language
description of one of them.
8. Compare the use of Kmeans and EM for clustering tasks. Which do you think is
best? Why?
9. What golden nuggets did you find, if any?
Submit: Up to one page that describes what you did for each of the above questions
and your results and conclusions.
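Why normalization can change a clustering is easiest to see for a distance-based method such as Kmeans. The sketch below is an illustration only: the four (Age, Balance) rows are invented, `min_max_normalize` is a hypothetical helper that rescales each column to the range [0, 1], and the nearest-neighbour lookup stands in for the distance comparisons a clusterer makes. It does not reproduce the EM algorithm itself.

```python
import math

def min_max_normalize(rows):
    """Rescale every column to [0, 1] using that column's min and max."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def nearest(rows, index):
    """Index of the row closest (Euclidean distance) to rows[index]."""
    others = [i for i in range(len(rows)) if i != index]
    return min(others, key=lambda i: math.dist(rows[index], rows[i]))

# Hypothetical (Age, Balance) rows.
people = [(25, 1000), (65, 1200), (27, 2000), (70, 9000)]

raw_neighbour = nearest(people, 0)
scaled_neighbour = nearest(min_max_normalize(people), 0)
print("nearest to row 0, raw data:       row", raw_neighbour)
print("nearest to row 0, normalized data: row", scaled_neighbour)
```

On the raw values the Balance column dominates the distance, so the 25-year-old is matched with a 65-year-old who has a similar balance. After rescaling, Age carries comparable weight and the 27-year-old becomes the nearest row.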

PART 4: ASSOCIATION FINDING 10 marks
1. The files supermarket1.arff and supermarket2.arff in the folder
/KDrive/SEH/SCSIT/Students/Courses/COSC2111/DataMining/data/arff
contain the same details of shopping transactions represented in two different
ways. You can use a text viewer to look at the files.
2. What is the difference in representations?
3. Load the file supermarket1.arff into weka and run the Apriori algorithm on
this data. You will need to restrict the number of attributes and/or the number
of examples. What significant associations can you find?
4. Explore different possibilities of the metric type and associated parameters. What
do you find?
5. Load the file supermarket2.arff into weka and run the Apriori algorithm on
this data. What do you find?
6. Explore different possibilities of the metric type and associated parameters. What
do you find?
7. Try the other associators. What are the differences to Apriori?
8. What golden nuggets did you find, if any?
9. [OPTIONAL] Can you find any meaningful associations in the bank data?
Submit: Up to one page that describes what you did for each of the above questions
and your results and conclusions.
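The metric types mentioned above rank the same rule in different ways. As a rough sketch, the four measures usually offered for association rules (confidence, lift, leverage and conviction) can be computed by hand on a tiny example. The transactions below are invented and the helper names are hypothetical; the formulas are the standard textbook definitions.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def rule_metrics(transactions, antecedent, consequent):
    """Standard quality measures for the rule antecedent -> consequent."""
    s_a = support(transactions, antecedent)
    s_b = support(transactions, consequent)
    s_ab = support(transactions, set(antecedent) | set(consequent))
    confidence = s_ab / s_a
    # Conviction is unbounded when the rule never fails.
    conviction = float("inf") if confidence == 1 else (1 - s_b) / (1 - confidence)
    return {
        "support": s_ab,
        "confidence": confidence,
        "lift": confidence / s_b,
        "leverage": s_ab - s_a * s_b,
        "conviction": conviction,
    }

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

metrics = rule_metrics(baskets, {"bread"}, {"milk"})
for name, value in metrics.items():
    print(f"{name:10s} {value:.4f}")
```

In this toy data the rule bread -> milk has a fairly high confidence, yet its lift is below 1 and its leverage is negative, because milk is already in most baskets. A rule can therefore look strong under one metric and uninteresting under another.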
