Data Mining and Big Data Analytics

Course Level: 
Master’s
Doctoral
Campus: 
Vienna
Course Open to: 
Students on-site
Academic Year: 
2020-2021
Term: 
Winter
US Credits: 
2
ECTS Credits: 
4
Course Code: 
DNDS 6005
Course Description: 

Data mining and big data analytics are core subjects in data science that aim to develop methods for examining large, multivariate datasets. Their common purpose is to uncover hidden patterns, unknown correlations and other information that supports better decisions. In this course we will introduce methods of data acquisition and the concepts of data mining, machine learning and big data analytics. We will cover the key data mining methods of clustering, classification and pattern mining, together with practical tools for their execution. We will also demonstrate the application of these tools to real datasets, to show how they can help us analyse the digital traces of human activities at societal scale and understand and forecast many complex socio-economic phenomena. The course takes a hands-on approach, with homework, practical classes and the development of a project. Students are free to work in whichever programming language or network software they feel most comfortable with. However, all examples and sample code in class will be provided in Python and Jupyter notebooks, so the use of Python is strongly encouraged.

You need to be proficient with Python to take this course – read the “Prerequisites” section below. Basic programming skills and basic skills in statistics and linear algebra are required.

Learning activities and teaching methods

The course alternates between lectures and practical sessions in order to develop skills in data management and in the application of data mining techniques. More specifically, there are two hands-on sessions during the course. In addition, students need to complete homework, special assignments and a final project.

Learning Outcomes: 

The aim of the course is to provide a basic but comprehensive introduction to data mining. By the end of the course students will be able to:

  • Design basic data collection strategies and obtain data from a number of open data sources
  • Choose the right algorithms for data science problems
  • Demonstrate knowledge of statistical data analysis techniques used in decision making
  • Apply principles of Data Science to the analysis of large-scale problems
  • Implement and use data mining software to solve real-world problems

What you will NOT learn in this course: This course is about the methods and algorithms used to find information in data. It will not provide you with advanced coding or data visualization skills, nor with training in data handling and database management. To learn to code, consider attending DNDS 6288 Scientific Python. To learn to visualize data, consider attending DNDS 6002 Data and Network Visualization.

Assessment: 

Students are expected to attend lectures and hands-on sessions, to hand in 1 to 3 assignments during the course and to develop a project during the entire term.

Grading:

  • Attendance at classes and hands-on sessions: 30% of the final grade
  • Assignments: 30% of the final grade
  • Final project: 40% of the final grade

Prerequisites: 

This course focuses on data mining and big data analytics. As such, we use a programming language, Python, to solve real-world learning problems and extract knowledge from real datasets. Since we need to pick one programming language for the course, we require students to demonstrate proficiency in Python before the course starts, in one of the following ways:

a) Have passed the course DNDS 6288 Scientific Python.

b) Complete an online course (MOOC) on programming with Python and show the certificate. I recommend the course on Codecademy, but other courses are also fine. Please bring the syllabus of the course together with the certificate.

c) Show and discuss a project you developed in Python. Projects written by someone else (from the web, a friend or previous students) will not be considered.

If you use option b) or c) and there is a waiting list for the course, you must show the certificate or the project before the beginning of the term to hold a place among the regular attendees. If there is no waiting list, it is fine to provide the certificate or show your project before the course begins. However, the instructor takes no responsibility if you do not satisfy the prerequisite and need to drop the course.