Text Analysis Across Disciplines - Data Collection and Curation

Course Description: 

This is a University-wide Course and is open to all CEU students.

In the era of pervasive and networked computing, data curation is an increasingly important practice. It might be defined as “the active and on-going management of data through its lifecycle of interest and usefulness.” The practices and consequences of data curation ramify throughout every stage of assembling and using data sets (digital, textual, visual, or otherwise). Accordingly, this course aims to create a dialogue among the creators, users, and subjects of data, and to offer a critical humanistic and/or social science perspective on the creation and use of the data that drives so much discovery today.

Because of the drive toward interdisciplinary methodologies, the traditional distinction between how we train as social scientists and how we train as humanists is beginning to disappear. Several questions arise that cut across disciplines: What do we consider to be ‘data’? How do we collect it as researchers? And how do we curate it as scholars? These questions can be asked at any stage of research, and of traditional print-based scholarship as well as of digital humanities projects and computation-based scholarship more broadly. This course will focus on the task of turning raw materials – both analog and digital – into a usable research dataset, surveying how this works across disciplines, and asking critical questions about the move toward data-based research in both the humanities and the social sciences.

The broad and compelling insight that motivates the current course is that data itself is richly contextualized both at the site of its creation and the site of its use, and these social, cultural, and personal contexts are essential to understanding the forms which it takes, the uses to which it can be put, and the costs and benefits of making decisions based on data analysis.

Learning Outcomes: 

By the end of this course, students will be able to:

  • categorize and critique data-driven research in their respective fields
  • scan and OCR a range of difficult materials (manuscript or older typeset materials)
  • distinguish between different metadata standards and protocols
  • compare current data privacy protocols in different parts of the world
  • identify basic principles of database design, both standard and relational
  • mark up a document with TEI or alternative markup schemes
  • conduct preliminary data analysis on a curated dataset
  • ask research questions of curated data

(1) Presence and participation (30% of the final grade). In addition to attending regularly, students are expected to contribute actively to class and online discussions, and to notify the instructors of any expected absence well ahead of time.

(2) Profile: Data in your Discipline (25% of the final grade). This will be a short sketch (2-3 pages, or the equivalent in graphic form) of how the concept of data was introduced into your discipline and how it has changed over time. Social and cultural contexts will be key to shaping this narrative, as will any connections you can draw to larger disciplinary trends. Due Nov 1.

(3) Final Project: Dataset with Research Agenda (45% of the final grade). Over the second half of the semester, students will identify a set of materials relevant to their area of interest, collect and curate a sample dataset, and propose a project that would use this dataset with a clear research agenda. Every project should, by definition, be interdisciplinary; we will workshop ideas in class and discuss how this could work. The final project may also outline possible methodologies for analyzing the data, as well as the project's relevance and potential impact on the field. On the last day of class, we will workshop and discuss each student’s dataset; full proposals are due at the end of the semester.
