Syllabus

Course details

Course description

Data are increasingly prevalent in politics, yet using data to learn about the events of the world is complex and challenging. In this class, we will primarily focus on two components of this process. The first part of the course will focus on R programming skills relevant for data collection and data preprocessing, which often take the lion’s share of time when working with datasets. The second part will be devoted to statistical modeling and inference and will expand your methodological knowledge in regression analysis beyond linear regression. In particular, we will cover some generalized linear models, which will allow you to model binary and count dependent variables. You will also learn the tools to visualize the results of your statistical analysis and communicate them in a meaningful and easily interpretable way.

In this course, you will use the free and open-source statistical programming language R, one of the most popular and in-demand statistical programming languages, along with other professional tools for data analysis, such as RStudio, git, and GitHub.

Learning objectives

By the end of this course, you should be able to:

  • use the statistical programming language R for data wrangling and analysis
  • translate your social science theory into a statistical model
  • use directed acyclic graphs (DAGs) to build causal models and use them when building statistical models
  • create meaningful visualizations to communicate insights from statistical analysis
  • describe your analysis in research papers
  • use Quarto (and RMarkdown) to write reproducible reports
  • use git and GitHub for version control and collaboration

Course materials

There is no single textbook for this class, and I will be assigning chapters and videos from various resources. All of the material will be available to you via this website. I will sometimes ask you to watch videos rather than read book chapters. Yet the most important part of the class will be coding and scripts, which will be made available to you via GitHub.

There will be both required and suggested material. I expect everyone to be familiar with the required texts and videos, while the suggested ones are there in case you want to dive deeper into a particular topic.

Books & Videos

We will primarily use the books listed below. Most of the time these would be suggested rather than required readings:

While not the direct focus of this course, visualization is an essential part of working with data. Should you require more resources on the topic beyond the ones I provide you directly in class, you are welcome to look at these sources:

  • Kieran Healy. Data Visualization: A Practical Introduction (Princeton, NJ: Princeton University Press, 2018), http://socviz.co/.
  • Claus O. Wilke. Fundamentals of Data Visualization (O’Reilly Media, 2019), https://clauswilke.com/dataviz/.
  • Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen. ggplot2: Elegant Graphics for Data Analysis, 3rd edition, in progress (Springer, 2021), https://ggplot2-book.org/.

Software

We will learn data analysis by doing data analysis. We will use the open-source (and free!) statistical programming language called R. R is very popular in data science and statistics, it makes beautiful graphics, and it is great fun. You will use RStudio as the main program to access R. Think of R as an engine and RStudio as a car dashboard—R handles all the calculations and the actual statistics, while RStudio provides a nice interface for running R code.

To make our workflow with R more efficient, we will also be using a version control tool called git and its online hosting service GitHub. In a nutshell, think of this as Google Docs, but for code and on steroids. I will use GitHub to distribute material for our sessions, hand out and collect homework assignments, and provide you with solutions and feedback on your homework when necessary.

You can find instructions for installing R, RStudio, and git here.

Getting Help

Most of you will need help at some point. I want to make sure you can recognize when that is without getting too frustrated, and that you feel comfortable seeking help when you need it. Programming-related issues deserve a special mention: I encourage you to try searching for an answer online on your own first. Learning to google R and git errors efficiently is an essential skill for anyone who codes, and you should try to learn it as early as possible.

Classes

If you have a question during the class, don’t hesitate to ask it! There are likely other students with the same question, so by asking you will create a learning opportunity for everyone.

Office hours

If for whatever reason you’d prefer to come and talk in person rather than post on the GitHub discussions, you can do so during office hours. Considering the deadline for homework is on Monday, I will make sure to keep the slot on Mondays and Wednesdays, 15:30-17:00, for office hours. This is, however, not the only time I am available. You can always pick from the time slots that work for me in that particular week and come on a different day or at a different time. You can come in groups if you want. Just note that you’ll need to let me know you’re coming at least 6 hours in advance.

You are also welcome to seek help from our TA, David. As with my office hours, just pick from the time slots that are available on this page.

Slack

Have a question that can’t wait for office hours? Prefer to write out your question in detail rather than asking in person? Our Slack workspace is the best venue for these. Once you join the GitHub organization for this course, you will have access to the discussion tool. There is a chance another student has already asked a similar question, so please check the other posts in the channel before adding a new one. If you know the answer to a question that is posted, I encourage you to respond and if someone’s response was helpful, you can mark it as an answer!

Email

Please refrain from emailing any course content questions (those should go to Slack), and only use email for questions about personal matters that may not be appropriate for the public course forum (e.g., illness, accommodations, etc.). For such matters, you may email me at semenova@uni-mannheim.de. Barring extenuating circumstances, I will respond to emails within 48 hours Monday - Friday. Response time may be slower for emails sent Friday evening - Sunday.

Online Resources

Computer programming can be frustrating because computers are stupid, and error messages are not always helpful. Fortunately, help is out there. StackOverflow is a Q&A site dedicated to helping people with their programming problems. RStudio Community is a similar concept, but tailored for people using RStudio (which applies to us). If you see these websites when you are googling your errors, they will quite likely contain some helpful info.

Searching for help with R on Google can sometimes be tricky because the program name is basically a single letter. Google is generally smart enough to figure out what you mean when you search for “r scatterplot”, but if it does struggle, try searching for “rstats” instead (e.g. “rstats scatterplot”).

Course structure

In-person classes

We meet weekly from 13:45–15:15 on Wednesdays in B317 in A5, 6. There are no PCs in the room, so to follow along during the labs you will need to bring a laptop. Alternatively, you could work from a tablet with a cloud version of RStudio, but this would be less convenient.

I expect you to come to the sessions prepared, which means having completed everything that is listed for the respective week in the Material section of the website. We will have a short lecture part and then move to a code-along part and other activities.

Homework Assignments

The only way to learn programming is through practice, which means you will have coding homework assignments. These problem sets will not be long but will be frequent, so you will need to submit them weekly. To pass, you need to get more than half of the points for that assignment.

Your first few assignments will be individual as it’s important that everyone gets comfortable with the workflow. As we progress during the semester, we will switch to group assignments. You can work in groups of 2-3 students, but if you prefer, you can continue working on your own.

I will distribute homework assignments via GitHub on Wednesdays (usually by 18:00). You will have until next Tuesday, 22:00 to complete the problem set. You will need to upload your solutions to GitHub by the deadline.

After the deadline, I will provide you with suggested solutions and it will be your job to go over them and compare your answers by that week’s Friday (also via GitHub). The TA and I will randomly select a few assignments each week to verify that you did not over- or under-evaluate yourself. In case of more than a 2-point difference between our and your evaluations, that homework assignment will be granted zero points. You are more than welcome to ask me questions if you are not sure when evaluating yourselves.

This approach allows you to get feedback quickly and lets me be sure that you have seen and understood the correct solutions without spending time on them in class.

Content-wise, the problem sets will be primarily coding tasks, but they will also include theoretical questions related to readings and material we cover in class.

Attendance and participation

Attendance and participation are important to your success in this course. You’re expected to come to class each Wednesday prepared—having read the material and watched the videos—and ready to discuss the content and work with R.

However, if you are sick, please stay home and, if you can, let me know you’re not coming (so I can adjust the activities in class if needed).

Since active participation is listed as a coursework requirement, I will take attendance at some point during the session.

Evaluation and Grading policy

Final Paper

Grades for this class are intended to assess your learning about (1) data and code, (2) scientific and critical reasoning, and (3) statistical reasoning. Following university regulations, 100% of your grade will be determined by a final paper. Make sure to sign up for the examination via Portal2 in addition to registering for the class during the semester (otherwise, I cannot submit your grade).

Your final project will be a data essay: you will get raw dataset(s), a research question, and one or two hypotheses based on a short theoretical story. You will need to select a research design to test the hypotheses and thus answer the research question to the best of your ability with the data provided. Essentially, you will be doing the same thing as writing your own paper, but you skip the part of developing a theory and focus on translating and testing it. The project must be completed individually, without any cooperation. You will have until December 16, 2023 to complete the data essay.[1]

The data essay is designed to primarily evaluate the skills gained in this class, hence the focus on data analysis and presentation. If you are in your final semesters and feel like you would benefit more from working on a paper of your own, you are welcome to discuss this option with me before November 15, 2023, so we can agree on an option that would be a relevant evaluation for this class but also helpful for you. Please be aware that this option potentially takes more time and effort, so take this into account when making the decision.

I can only evaluate the final papers if you pass all the coursework requirements during the lecture period. This means passing every homework assignment (aka problem set):

Homework assignments

You need to pass all of the homework assignments. Please note that handing in the assignment is necessary but not sufficient for passing. Should the homework be below the standard of passing, please reach out to me right away so we can go over the assignment together and clarify all the issues. We will use a point scale for evaluation and you will need to have more than half of the total points for that homework to pass. For group assignments, each student is expected to make contributions (aka commits) to the homework repository, but all team members who contributed to the repository receive the same point score.

As an extra incentive, the student(s) with the highest score for the problem sets at the end of the course will get a final grade improvement of 0.3 points. There are no adverse effects for others, so you can still get a 1.0 for the class without the bonus. See Late work and extensions for more info.

Course policies

Be nice. Be honest. Don’t cheat.

Late work and extensions

The due dates for assignments are there to help you keep up with the course material and to ensure that you can get feedback in a timely manner. Since I will be providing solutions for everyone after the deadline, this eliminates the possibility of individual extensions. If you require accommodations or extensions, please reach out to me in advance.

For the final project, the deadline is a hard one and as a rule no late work will be accepted.

Collaboration

Collaboration and consultation with your classmates will help you learn. I encourage good-natured collaboration on assignments, but you must turn in your own work (or the work of your group). For individual assignments, you are welcome to discuss the assignment with classmates at a high level (e.g., discuss the best way to approach a problem, which functions are useful for accomplishing a particular task, etc.). However, you may not directly share answers to homework questions (including any code) with anyone other than myself. Plagiarism is easy to spot (I know because I have spotted it in the past) and will not be tolerated. Assignment sheets may contain additional guidance on acceptable forms of collaboration, and I am more than happy to answer your questions on the topic.

Sharing and reusing code

I am well aware that a huge volume of code is available on the web to solve any number of problems. Unless I explicitly tell you not to use something, the course’s policy is that you may make use of any online resources (e.g. RStudio Community, StackOverflow) but you must explicitly cite where you obtained any code you directly use (or use as inspiration). Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism. On individual assignments you may not directly share code with another student in this class, and on team assignments you may not directly share code with another team in this class.

Syllabus Change

This syllabus reflects a plan for the semester. Deviations may become necessary as the semester progresses. Except for changes that substantially affect the implementation of the evaluation (grading) statement, this syllabus and our schedule are a guide for the course and are subject to change with advance notice.

Recording course content

I kindly ask you to refrain from recording (including audio) during our class. For one, your classmates may speak up without having consented to being recorded, and recording them violates their privacy. If you feel that you need to record the lecture part of the sessions using audio or video devices, you must get permission from me ahead of time, and these recordings should be used for personal study only, not for distribution.

Computer emergencies

I am not sympathetic about computer emergencies. In general, you should learn to keep your work in such a way that it does not matter if your computer explodes. If it does explode, you will lose only the work done after your last sync. You can then restart your work on a public computer, like the ones at the uni. For this course in particular, because I expect you to use the cloud service GitHub in your workflow and because Posit.cloud (a cloud version of the software we will be using) is available, a broken computer will not be a valid reason to ask for an extension.

Accessibility and accommodations

It is my goal that this class is an accessible and welcoming experience for all students, including those with disabilities that may impact learning in this class. If you anticipate or experience academic barriers based on your disability (including mental health, chronic or temporary medical conditions), please let me know immediately so that we can privately discuss options. After registration, make arrangements with me as soon as possible to discuss your accommodations so that they may be implemented in a timely fashion. Please consult the University website for more information on the topic.

Use of generative artificial intelligence (AI)

You should treat generative AI, such as ChatGPT, the same as other online resources. There are two guiding principles that govern how you can use AI in this course:[2] (1) Cognitive dimension: Working with AI should not reduce your ability to think clearly. We will practice using AI to facilitate, rather than hinder, learning. (2) Ethical dimension: Students using AI should be transparent about their use and make sure it aligns with academic integrity.

  • AI tools for code: You may make use of the technology for coding examples on assignments; if you do so, you must explicitly cite where you obtained the code. Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism. You may use these guidelines for citing AI-generated content: https://guides.lib.monash.edu/c.php?g=219786&p=6972087.
  • AI tools for narrative: Unless instructed otherwise, you may not use generative AI to write narrative on assignments. In general, you may use generative AI as a resource as you complete assignments but not to answer the exercises for you. You are ultimately responsible for the work you turn in; it should reflect your understanding of the course content.

Foods and drinks in class

I am totally fine with you bringing drinks, especially caffeinated ones, to class. However, please do not use class time to have snacks.

Student feedback

Once we are halfway through the class, I will ask you for some feedback regarding the class. We will discuss your responses and I will try to incorporate this feedback for the second part of the class. At the end of the course, you will receive the official evaluations administered by the University, and I would appreciate you filling those in. Apart from these opportunities, you are always welcome to share your concerns or feedback any time during the course, in person during office hours or anonymously via the pop-up form on the course website.

Acknowledgements

This course benefited greatly from other publicly available course resources, in particular Gov50 by Matt Blackwell (Harvard University), Advanced Data Visualisation by Mine Çetinkaya-Rundel (Duke University), Data Visualization by Andrew Heiss (Georgia State University), and Quantitative Methods in Political Science by Thomas Gschwend (University of Mannheim).

Footnotes

  1. Once we have the exam schedule, we can revisit the deadline issue. Either way, you will have at least 10 days to complete the assignment, and everyone will need to start working on it at the same time.

  2. These guiding principles are based on Course Policies related to ChatGPT and other AI Tools developed by Joel Gladd, Ph.D.