CS 860 - Algorithms for Private Data Analysis - Fall 2022

Course Description

This course is on algorithms for differentially private analysis of data. As necessitated by the nature of differential privacy, this course will be theoretically and mathematically based. References to practice will be provided as relevant, especially towards the end of the course. Prerequisites include an undergraduate understanding of algorithms, comfort with probability, and mathematical maturity.

Logistics

This course will consist of two parts. The first third of the course will be a series of lectures covering the basics of differential privacy. The remainder of the course will consist of paper reading and discussions.

Lecture videos for a previous offering of this course are available here. The lecture portion of our course will cover roughly the first seven lectures from that offering. You are still expected to attend class meetings, both during the lecture portion and especially during the discussion portion.

Lectures

All videos and notes are from the prior course offering, but should still be relevant.

Lecture 1 (released 9/8/2022): Some Attempts at Data Privacy. Video in two parts; lecture notes and written notes (PDF).
Lecture 2 (released 9/13/2022): Reconstruction Attacks. Video in three parts; lecture notes and written notes (PDF).
Lecture 3 (released 9/15/2022): Continue Reconstruction Attacks, start Intro to Differential Privacy. Video in two parts; lecture notes and written notes (PDF).
Lecture 4 (released 9/20/2022): Continue Intro to Differential Privacy.
Lecture 5 (released 9/22/2022): Start Intro to Differential Privacy, Part 2. Video in two parts; lecture notes and written notes (PDF).
Lecture 6 (released 9/27/2022): Finish Intro to Differential Privacy Part 2, Start Approximate Differential Privacy. Video in two parts; lecture notes and written notes (PDF).
Lecture 7 (released 9/29/2022): Continue Approximate Differential Privacy.
Lecture 8 (released 10/4/2022): Finish Approximate Differential Privacy, Start Exponential Mechanism. Video in two parts; lecture notes and written notes (PDF).
Lecture 9 (released 10/4/2022): Finish Exponential Mechanism.
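
For a concrete feel of the mechanisms treated in the introductory lectures, here is a minimal sketch of the classical Laplace mechanism applied to a counting query. It is an illustration only, not course material: the dataset, the predicate, and the choice of epsilon below are made up for the example.

    import numpy as np

    def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
        # Adding Laplace(sensitivity / epsilon) noise to a query with L1
        # sensitivity `sensitivity` satisfies epsilon-differential privacy.
        if rng is None:
            rng = np.random.default_rng()
        return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    # Hypothetical counting query: how many records have age >= 40?
    ages = [23, 45, 31, 52, 60, 29]               # made-up data
    true_count = sum(1 for a in ages if a >= 40)  # changing one person changes this by at most 1
    private_count = laplace_mechanism(true_count, sensitivity=1, epsilon=0.5)
    print(private_count)                          # a noisy, non-integer answer

Because the counting query has L1 sensitivity 1, Laplace noise of scale sensitivity/epsilon = 2 suffices for 0.5-differential privacy; a smaller epsilon means more noise and a stronger privacy guarantee.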

Paper Discussions

Meeting 10 (10/18/22)
  Paper 1: Smooth Sensitivity and Sampling in Private Data Analysis, by Nissim, Raskhodnikova, and Smith. Presenter: A.
  Paper 2: A Simple and Practical Algorithm for Differentially Private Data Release, by Hardt, Ligett, and McSherry. Presenter: R.

Meeting 11 (10/20/22)
  Paper 1: On Significance of the Least Significant Bits For Differential Privacy, by Mironov. Presenter: V.
  Paper 2: Preserving Statistical Validity in Adaptive Data Analysis, by Dwork, Feldman, Hardt, Pitassi, Reingold, and Roth. Presenter: A.

Meeting 12 (10/27/22)
  Paper 1: The matrix mechanism: optimizing linear counting queries under differential privacy, by Li, Miklau, Hay, McGregor, and Rastogi. Presenter: SK.
  Paper 2: Communication-Efficient Learning of Deep Networks from Decentralized Data, by McMahan, Moore, Ramage, Hampson, and Agüera y Arcas. Presenter: C.

Meeting 13 (11/01/22)
  Paper 1: Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds, by Bun and Steinke. Presenter: SK.
  Paper 2: Deep Learning with Differential Privacy, by Abadi, Chu, Goodfellow, McMahan, Mironov, Talwar, and Zhang. Presenter: R.

Meeting 14 (11/03/22)
  Paper 1: Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data, by Papernot, Abadi, Erlingsson, Goodfellow, and Talwar. Presenter: S.
  Paper 2: Exposed! A Survey of Attacks on Private Data, by Dwork, Smith, Steinke, and Ullman. Presenter: V.

Meeting 15 (11/08/22)
  Paper 1: Prochlo: Strong Privacy for Analytics in the Crowd, by Bittau, Erlingsson, Maniatis, Mironov, Raghunathan, Lie, Rudominer, Kode, Tinnes, and Seefeld. Presenter: V.
  Paper 2: Private PAC learning implies finite Littlestone dimension, by Alon, Livni, Malliaris, and Moran. Presenter: S.
  Additional reading: Local Differential Privacy.

Meeting 16 (11/10/22)
  Paper 1: Distributed Differential Privacy via Shuffling, by Cheu, Smith, Ullman, Zeber, and Zhilyaev. Presenter: C.
  Paper 2: Gaussian Differential Privacy, by Dong, Roth, and Su. Presenter: SK.

Meeting 17 (11/15/22)
  Paper 1: LinkedIn's Audience Engagements API: A Privacy Preserving Data Analytics System at Scale, by Rogers, Subramaniam, Peng, Durfee, Lee, Kancha, Sahay, and Ahammad. Presenter: SS.
  Paper 2: An Equivalence Between Private Classification and Online Prediction, by Bun, Livni, and Moran. Presenter: SK.

Meeting 18 (11/17/22)
  Paper 1: CoinPress: Practical Private Mean and Covariance Estimation, by Biswas, Dong, Kamath, and Ullman. Presenter: A.
  Paper 2: Auditing Differentially Private Machine Learning: How Private is Private SGD?, by Jagielski, Ullman, and Oprea. Presenter: S.

Meeting 19 (11/22/22)
  Paper 1: Differentially Private Learning Needs Better Features (or Much More Data), by Tramèr and Boneh. Presenter: SS.
  Paper 2: Practical and Private (Deep) Learning without Sampling or Shuffling, by Kairouz, McMahan, Song, Thakkar, Thakurta, and Xu. Presenter: C.
  Additional reading: Federated Learning with Formal Differential Privacy Guarantees.

Meeting 20 (11/24/22)
  Paper 1: Large Language Models Can Be Strong Differentially Private Learners, by Li, Tramèr, Liang, and Hashimoto; and Differentially Private Fine-tuning of Language Models, by Yu, Naik, Backurs, Gopi, Inan, Kamath, Kulkarni, Lee, Manoel, Wutschitz, Yekhanin, and Zhang. Presenter: C.
  Paper 2: Membership Inference Attacks From First Principles, by Carlini, Chien, Nasr, Song, Terzis, and Tramèr. Presenter: S.

Meeting 21 (11/29/22)
  Paper 1: What Does it Mean for a Language Model to Preserve Privacy?, by Brown, Lee, Mireshghallah, Shokri, and Tramèr. Presenter: R.
  Paper 2: Attacks on Deidentification's Defenses, by Cohen. Presenter: SS.

Meeting 22 (12/01/22)
  Paper 1: The 2020 Census Disclosure Avoidance System TopDown Algorithm, by Abowd, Ashmead, Cumings-Menon, Garfinkel, Heineck, Heiss, Johns, Kifer, Leclerc, Machanavajjhala, Moran, Sexton, Spence, and Zhuravlev. Presenter: V.
  Paper 2: Considerations for Differentially Private Learning with Large-Scale Public Pretraining (forthcoming). Presenter: SS.

Assignment

Problem Set, due by 11:59 PM Waterloo time on October 24.

Books and References

Don't buy a textbook for this course. We'll be drawing on a variety of resources instead. Also check out the Resources page on DifferentialPrivacy.org.

Grading

Grades for this course will be based on the assignment, the paper presentations and discussions, and the final project, as described in the sections below.

Paper Presentation and Discussions

The format of these paper discussions (as well as the text below) is based on a course by Florian Kerschbaum. Logistics may change slightly as we approach the relevant portion of the course. The majority of credit for this portion will be based on presenting the paper, with the rest assigned to your reviews and participation in discussions.

There are two or three papers assigned to each meeting, selected from the course topics. All students must read all of the papers before class. Each student must also submit a review of each paper several days before class. The review must contain a short summary of the paper; its related work and context in the scientific literature; its strong points, i.e., its contributions to the advancement of knowledge; and its weak points, i.e., where further research is necessary. It is good to phrase weak points as questions or hypotheses. Please give enough context to make your questions accessible to other students.

Each student responsible for presenting a paper must read all reviews and prepare a presentation that structures the discussion of the paper. This includes providing the necessary background and context for the paper; it may be necessary to read more than just the assigned paper when presenting. The presenter should assume that every student has read the paper and is familiar with its content, i.e., they should not repeat the paper's contributions verbatim. The presenter will then organize the reviews into main contributions and questions, answer the questions for which the necessary background is available, and lead a discussion of the remaining ones with the help of the instructor. Each paper should get a 20-minute presentation of the reviews, followed by another 20 minutes of discussion.

Final Project

For your final project, you should do something cool related to differential privacy. As a very rough guideline, the goal should be to produce original research that would be hypothetically publishable at a reasonable conference, or at least a workshop (say, e.g., TPDP -- note that the polish of results at such workshops ranges from very preliminary to essentially conference papers). That said, publishability is not perfectly aligned with my expectations: not everything cool is publishable, and not everything publishable is cool.

You are allowed to work in groups of up to three.

There are three deliverables for this project. Credit is only assigned for the last one; the first two are just to make sure things are going OK. All are due by email to the instructor (to either his MIT or Waterloo address).