Sunday, 10 March 2019

10/3/2019 COMP 809/10 Data Mining and Machine Learning Assignment


OK, I'm going to get a bit ranty to start. I moved to AUT to work with IRASR, become a professional astronomer, and perhaps one day get a chance to work with the SKA, the biggest radio telescope human beings have ever made.


Inspired by movies like Contact and 2001: A Space Odyssey, this young man decided to offer himself at the altar of the Astrophysical Sciences.

I'm not here to read papers about data mining in the banking sector, loan eligibility assessment, telecom trouble ticketing, or ML for self-driving cars.

Honestly I thought I had left planet Earth behind already.

Kind of feels like that.
But here I am, doing a series of assignments for COMP 809/10 Data Mining and Machine Learning, which require me to read a bunch of papers on DM and ML, extract the crucial information from them, and put it in a two-page report per paper.

Here is the list of papers:


  1. Coussement, Kristof & Lessmann, Stefan & Verstraeten, Geert. (2016). A comparative analysis of data preparation algorithms for customer churn prediction: A case study in the telecommunication industry. Decision Support Systems. 95. 10.1016/j.dss.2016.11.007.
  2. Gerritsen, R. (1999). Assessing loan risks: a data mining case study.
  3. Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, "End to End Learning for Self-Driving Cars," 2016.
  4. M. R. Kraft, K. C. Desouza and I. Androwich, "Data mining in healthcare information systems: case study of a veterans' administration spinal cord injury population," Proceedings of the 36th Annual Hawaii International Conference on System Sciences, Big Island, HI, USA, 2003.
  5. Nuno Carneiro, Gonçalo Figueira, Miguel Costa, A data mining based system for credit-card fraud detection in e-tail, Decision Support Systems, Volume 95, 2017, Pages 91-101, ISSN 0167-9236.
  6. Aslıhan Dursun, Meltem Caber, Using data mining techniques for profiling profitable hotel customers: An application of RFM analysis, Tourism Management Perspectives, Volume 18, 2016, Pages 153-160, ISSN 2211-9736.
  7. Y. Temprado, F. J. Molinero, C. Garcia and J. Gomez, "Knowledge Discovery from Trouble Ticketing Reports in a Large Telecommunication Company," 2008 International Conference on Computational Intelligence for Modelling Control & Automation, Vienna, 2008, pp. 37-42.
    doi: 10.1109/CIMCA.2008.116
  8. C. Leon, F. Biscarri, I. Monedero, J. I. Guerrero, J. Biscarri and R. Millan, "Variability and Trend-Based Generalized Rule Induction Model to NTL Detection in Power Companies," in IEEE Transactions on Power Systems, vol. 26, no. 4, pp. 1798-1807, Nov. 2011.
  9. R. M. Gardner, J. Bieker and S. Elwell, "Solving tough semiconductor manufacturing problems using data mining," 2000 IEEE/SEMI Advanced Semiconductor Manufacturing Conference and Workshop. ASMC 2000 (Cat. No.00CH37072), Boston, MA, USA, 2000.

I have to choose 3 of the 9 listed papers, one of which will be article [3] on self-driving cars, and extract the following information from each to put in a 2-page report.

Background Information: Who was doing the data mining, and at which organization, e.g. Google DeepMind, Nvidia Corporation, or Université Catholique de Lille.

Target Application: What problem were they trying to solve? Usually the problem comes from situations people face every day: a bank deciding whether or not to issue a loan, power companies dealing with non-technical losses, a factory chasing semiconductor manufacturing problems, or a telecommunications operator determining its churn rate.

These are all situations relevant to everyday life where data mining and machine learning can make a difference if applied properly. (We'll have to read the papers to find out.)

Description of the Data Collected: What data features/feature vectors did the paper look at? In other words, the columns of data that they collected.

Identification of Algorithm: Decision Tree, k-Nearest Neighbour, k-Means Clustering, CNN, and so on. This is where we describe the machine learning model used. (No details required.)

Pre-processing Method: How was the data pre-processed before being fed to the model? Massaging, cleaning, and filling in the blanks. Data munging is the word. How was the data munged?

Potential/Actual Outcome: What was the expected result, and what was the actual result? What did they expect to see in the model output?

Organizational Benefit: What positive outcome was gained from data mining? Here is where we have to go into the nitty-gritty details: the accuracy of the results and the potential/actual benefits gained in terms of dollars and time saved.

Some critical thinking is involved here, because your own reflection on the level of success achieved must be included. Argue the case. With your knowledge of data mining and machine learning, how could it have been done differently? Would you do it the same way, or tweak the model in a certain way to get better results?
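
To make a few of those headings concrete for myself, here is a minimal sketch in Python of what "munging", "algorithm" and "accuracy" can look like on a toy table. Everything in it is an assumption for illustration: the feature names, the numbers, and the labels are made up and come from none of the nine papers, and the model is a bare-bones 1-nearest-neighbour, not what any of the authors used.

```python
# Toy illustration only: rows, column meanings, and labels are invented,
# not taken from any of the nine papers.
from statistics import mean


def fill_blanks(rows):
    """Munging step: replace missing values (None) with that column's mean."""
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        [col_means[i] if v is None else v for i, v in enumerate(row)]
        for row in rows
    ]


def nearest_neighbour(train_x, train_y, point):
    """Algorithm step: predict the label of the closest training row (k = 1)."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    best = min(range(len(train_x)),
               key=lambda i: squared_distance(train_x[i], point))
    return train_y[best]


def accuracy(predicted, actual):
    """Outcome step: fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical columns: [monthly_bill, support_calls]; label: did the customer churn?
raw_train = [[20.0, 1], [25.0, None], [80.0, 7], [None, 9]]
train_labels = ["stay", "stay", "churn", "churn"]
train = fill_blanks(raw_train)

test = [[22.0, 2], [85.0, 8]]
test_labels = ["stay", "churn"]

predictions = [nearest_neighbour(train, train_labels, row) for row in test]
print(predictions, accuracy(predictions, test_labels))
```

On this tiny made-up table both test rows are classified correctly, which says nothing about real data. The point is only that each report heading maps to a step: the columns are the data description, `fill_blanks` is the pre-processing, `nearest_neighbour` is the algorithm, and `accuracy` feeds the outcome and benefit discussion.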



Now all we have to do is await the rise of our robotic Overlords

And that's my assignment. It's not worth much, and there's no hands-on exercise in it, unlike the subsequent Part 2 and unlike my other class, ENSE 807 Digital Signal Processing. But it is due in 3 weeks. I have about 2 weeks to pen most of it down.

I'll keep you updated if I find anything interesting.

Sincerely yours,
SofOfTerra92

