Data Management with Python

Course Description: 

Instructor: Anikó Hannák (hannaka@ceu.edu, office hours: Tuesdays 4:00pm-5:30pm by appointment)
Credits: 2 (4 ECTS)
Term: Winter 2017-2018

This course will provide a comprehensive introduction to programming with Python, starting from the basics. Beyond building confidence in using Python, the class will focus on solving problems in data processing and analysis. Additionally, we will discuss which types of problems Python is the right choice for, and how students can further extend their knowledge after the class. The overarching goal is to equip students with enough programming experience to start working in any area of computation- and data-intensive research.

The course will run as a mix of theoretical classes and hands-on sessions, organized into 12 classes of 100 minutes each. Use of a computer will be required during most of the lectures; students can use their own laptops or the facilities provided by CEU. No prior programming experience is required. However, each class builds on the knowledge and skills from previous ones, so it is important to attend all classes. Extra exercises will be provided between classes in the form of homework. The course can accommodate a maximum of 30 students.

Course Instructor: Dr. Anikó Hannák, hannaka@ceu.edu

Teaching Assistant: Anna May, May_Anna@student.ceu.edu. Please send her all inquiries about scheduling, classrooms, and technical requests.

Tentative calendar

Week 1

  • Why Python? Notebook, basic usage, variables
  • Arrays, if/else, logical operators, loops

Week 2

  • File operations, string operations, parsing
  • Working with various data structures

Week 3

  • Functions, libraries, complex exercises
  • Statistics with NumPy

Week 4

  • Web scraping, using APIs
  • Storing data (CSV, JSON, HTML) and parsing

Week 5

  • Data cleaning using Pandas, data transformations
  • Data analysis using Pandas

Week 6

  • Data visualization using Matplotlib
  • Additional useful and fun Python tools

Suggested reading

  • Online resources and documentation provided during classes
  • Mark Lutz, Learning Python, O’Reilly (2013)
  • Also available for free online
  • Bill Lubanovic, Introducing Python, O’Reilly (2014)
  • Wes McKinney, Python for Data Analysis, O’Reilly (2013)

Further information, such as the course website, assessment deadlines, office hours, and contact details, will be given during the course. The instructor reserves the right to modify this syllabus as deemed necessary at any time during the term. Any modifications to the syllabus will be discussed with students during a class period. Students are responsible for information given in class.

Learning Outcomes: 

By the end of the course, students will have experience with techniques that are vital to effective data management:

  • The basic syntax and use of Python as a data analysis tool, including writing and executing scripts to automate common tasks, using the IPython interpreter for interactive exploration of data and code, and using the Jupyter notebook to share and collaborate.
  • Loading data from a variety of common formats
  • Manipulating data efficiently with Pandas
  • Basic web scraping
  • Use of web APIs
  • Use of specialized Python packages, such as data visualization libraries.

Assessment:

  • Students shall not miss more than 2 classes. Missing more than 2 classes will result in an administrative fail grade. (If you have a major impediment, please contact the Instructor.)
  • To pass, students need at least 50% of the overall grade. The grade will be based 80% on homework and 20% on class participation.

Prerequisites:

No prerequisites.