The increasing volume and the nature of big datasets in business, economics, and the social and political sciences call for more sophisticated mathematical and data mining tools. The complex systems monitored by big databases are successfully described in terms of networks. In this course we will present and discuss mathematical and data mining tools used to characterize large empirical or model networks. Large datasets will be investigated computationally, and the limits of the algorithms used will be discussed. The statistical validity of the observed results will be assessed and, when possible, quantitatively evaluated. Besides the mathematical theory, the course takes a practical approach, with homework and the development of a project. The project will involve data analysis and computer simulations in Python (e.g., with Jupyter notebooks).
Tentative course topics
1 Overview of the course
Basic concepts that will be treated in the course. Summary of the prerequisite course (Statistical Methods in Network Science 1): network science, probability, and statistics.
2 Sampling and estimation in networks
Basic elements of statistical sampling theory. Induced and incident subgraph sampling. Star and snowball sampling. Estimation of original network graphs. Examples from social networks, Internet, and species detection.
3 Statistical paradoxes on networks
Friendship and generalised friendship paradox (in coauthorship, social networks, and happiness studies). Majority illusion. Simpson’s paradox.
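As an informal illustration of the friendship paradox, the sketch below compares the mean degree of a graph with the mean degree of a node's neighbors. The toy edge list and the function names (`build_adjacency`, `friendship_paradox`) are assumptions made for this example, not course material.

```python
# Hypothetical toy graph: a hub (node 0) with four neighbors, plus one extra edge.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]


def build_adjacency(edge_list):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def friendship_paradox(adj):
    """Return (mean degree, mean over nodes of the average neighbor degree)."""
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    mean_degree = sum(degree.values()) / len(degree)
    neighbor_means = [
        sum(degree[n] for n in nbrs) / len(nbrs) for nbrs in adj.values()
    ]
    mean_friend_degree = sum(neighbor_means) / len(neighbor_means)
    return mean_degree, mean_friend_degree


adj = build_adjacency(edges)
mean_degree, mean_friend_degree = friendship_paradox(adj)
print(mean_degree, mean_friend_degree)
```

On this toy graph the mean degree is 2.0 while the mean neighbor degree is 3.1: on average, a node's friends have more friends than the node itself, because the hub is counted once per neighbor.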
4 Motif detection
Motifs in a directed graph. Basic motifs and triad census. Efficient algorithms to find motifs. Optimality of the triangle-finding algorithm. Shuffling methods in directed networks. Application to financial crises and human mobility.
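A minimal brute-force sketch of motif counting in a small directed graph is given here for orientation. It counts two three-node patterns, the feed-forward loop and the directed 3-cycle, as (not necessarily induced) subgraph matches. The toy edge set and function names are assumptions for this example; the exhaustive search over ordered triples is cubic in the number of nodes and is not one of the efficient algorithms referred to above.

```python
from itertools import permutations

# Hypothetical toy directed graph, given as a set of (source, target) pairs.
edges = {(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 2)}
nodes = {n for edge in edges for n in edge}


def count_feed_forward_loops(nodes, edges):
    """Count ordered triples (a, b, c) with a->b, b->c and a->c."""
    return sum(
        1
        for a, b, c in permutations(nodes, 3)
        if (a, b) in edges and (b, c) in edges and (a, c) in edges
    )


def count_directed_3cycles(nodes, edges):
    """Count cycles a->b->c->a; each cycle matches three rotations."""
    hits = sum(
        1
        for a, b, c in permutations(nodes, 3)
        if (a, b) in edges and (b, c) in edges and (c, a) in edges
    )
    return hits // 3


n_ffl = count_feed_forward_loops(nodes, edges)
n_cycles = count_directed_3cycles(nodes, edges)
print(n_ffl, n_cycles)
```

To judge whether such counts are unusually high or low, they would be compared with the counts obtained from shuffled versions of the same network.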
5 Signed networks and structural balance
Signed networks. Structural balance and triadic closure. Differences between positive and negative tie networks. Shuffling methods and null models. Applications to social, alliance, and biological networks.
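Structural balance at the level of triangles can be sketched as follows: a triangle is taken to be balanced when the product of its three edge signs is positive. The signed toy graph and the function name `triangle_balance` are assumptions made for this illustration.

```python
from itertools import combinations

# Hypothetical signed graph: +1 marks a positive tie, -1 a negative tie.
signs = {
    frozenset((0, 1)): +1,
    frozenset((0, 2)): +1,
    frozenset((1, 2)): +1,
    frozenset((0, 3)): -1,
    frozenset((1, 3)): -1,
    frozenset((2, 3)): +1,
}


def triangle_balance(signs):
    """Split all triangles into balanced and unbalanced ones."""
    nodes = sorted({n for pair in signs for n in pair})
    balanced, unbalanced = [], []
    for a, b, c in combinations(nodes, 3):
        pairs = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(p in signs for p in pairs):  # the three nodes form a triangle
            product = signs[pairs[0]] * signs[pairs[1]] * signs[pairs[2]]
            if product > 0:
                balanced.append((a, b, c))
            else:
                unbalanced.append((a, b, c))
    return balanced, unbalanced


balanced, unbalanced = triangle_balance(signs)
print(balanced, unbalanced)
```

In this example the triangles (0, 1, 2) and (0, 1, 3) are balanced, while (0, 2, 3) and (1, 2, 3) each contain exactly one negative tie and are therefore unbalanced.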
6 Recommendation systems and link prediction
Problem description and evaluation metrics. Local similarity indices. Global similarity indices. Applications of link prediction in online social networks. Classification of partially labeled nodes.
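For orientation, three commonly used local similarity indices (common neighbors, Jaccard, and Adamic-Adar) can be sketched in plain Python. The toy adjacency map and the helper `rank_candidate_links` are assumptions for this example; the sketch ranks currently unconnected node pairs by a chosen score.

```python
import math
from itertools import combinations

# Hypothetical undirected toy graph as an adjacency map.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the joint neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Shared neighbors weighted by 1 / log(degree); a shared neighbor
    always has degree >= 2, so the logarithm is never zero."""
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


def rank_candidate_links(adj, score):
    """Rank unconnected pairs from most to least similar."""
    candidates = [
        (u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]
    ]
    return sorted(candidates, key=lambda pair: score(adj, *pair), reverse=True)


ranking = rank_candidate_links(adj, adamic_adar)
print(ranking)
```

Here the pair ("a", "d") is ranked first under all three indices because it shares two neighbors, while ("a", "e") shares none and is ranked last.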
7 Applications of maximum matching
Examples of maximum matching in shareability networks, network control theory, and organ exchange.
8 Spreading dynamics on networks
Epidemic spreading. Diffusion of ideas and innovations. Opinion formation, consensus and polarization. Role of network structure in spreading dynamics. Data-driven examples.
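A minimal sketch of epidemic spreading on a network is a discrete-time stochastic SIR model. The ring-shaped toy network, the parameter values, and the function name `simulate_sir` are all assumptions chosen for illustration; the update rule shown is one of several common variants.

```python
import random


def simulate_sir(adj, seed_node, beta, gamma, rng, max_steps=1000):
    """Discrete-time SIR: each step, every infected node infects each
    susceptible neighbor with probability beta, then recovers with
    probability gamma. Returns a list of (S, I, R) counts per step."""
    status = {node: "S" for node in adj}
    status[seed_node] = "I"

    def counts(state):
        values = list(state.values())
        return values.count("S"), values.count("I"), values.count("R")

    history = [counts(status)]
    for _ in range(max_steps):
        infected = [n for n, s in status.items() if s == "I"]
        if not infected:
            break  # the epidemic has died out
        new_status = dict(status)
        for node in infected:
            for nbr in adj[node]:
                if status[nbr] == "S" and rng.random() < beta:
                    new_status[nbr] = "I"
            if rng.random() < gamma:
                new_status[node] = "R"
        status = new_status
        history.append(counts(status))
    return history


# Hypothetical toy network: a ring of 10 nodes.
n = 10
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
history = simulate_sir(adj, seed_node=0, beta=0.4, gamma=0.5, rng=random.Random(1))
print(history[-1])
```

Replacing the ring with a network that has hubs or shortcuts, and repeating the simulation over many random seeds, is one simple way to explore how network structure affects the final outbreak size.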
9 Guest lecture
Selected topic in statistical methods in network science by a guest lecturer. To be announced.
10 Project presentations by students
- Newman. Networks, 2nd edition. Oxford University Press, 2018.
- Latora, Nicosia, Russo. Complex Networks. Cambridge University Press, 2017.
- Kolaczyk. Statistical Analysis of Network Data. Springer, 2010.
- Easley, Kleinberg. Networks, Crowds, and Markets. Cambridge University Press, 2010.
- Leskovec, Rajaraman, Ullman. Mining of Massive Datasets. Cambridge University Press.
A list of papers and online resources will be provided during class.
Further information, such as the course website, assessment deadlines, office hours, and contact details, will be given during the course. The instructor reserves the right to modify this syllabus as deemed necessary at any time during the term. Any modifications to the syllabus will be discussed with students during a class period. Students are responsible for information given in class.