Data mining and big data analytics are core subjects in data science that aim to develop methods for examining large, multivariate datasets. Their common purpose is to uncover hidden patterns, unknown correlations and other information that supports better decisions. In this course we will introduce methods of data acquisition and the concepts of data mining, machine learning and big data analytics. We will cover the key data mining methods of clustering, classification and pattern mining, together with practical tools for their execution. We will also demonstrate the application of these tools to real datasets, to show how they can help us analyse the digital traces of human activities at societal scale and understand and forecast many complex socio-economic phenomena. The course takes a hands-on approach, with homework assignments, practical classes and the development of a project. Students are free to work in whichever programming language or network software they feel most comfortable with. However, all in-class examples and sample code will be provided in Python and Jupyter notebooks, so the use of Python is strongly encouraged.
The aim of the course is to provide a basic but comprehensive introduction to data mining. By the end of the course students will be able to:
• Design basic data collection strategies and obtain data from a number of open data sources;
• Choose appropriate algorithms for a given data science problem;
• Demonstrate knowledge of statistical data analysis techniques used in decision making;
• Apply principles of data science to the analysis of large-scale problems;
• Implement and use data mining software to solve real-world problems.
Students are expected to attend lectures and hands-on sessions, to hand in one to three assignments during the course, and to develop a project over the entire term.
Grading:
• Attendance at classes and hands-on sessions: 30% of the final grade
• Assignments: 30% of the final grade
• Final project: 40% of the final grade
Prerequisite: completion of the course on Data Analysis in Python is required.