CS 860 - Algorithms for Private Data Analysis - Fall 2022

Course Description

This course is on algorithms for differentially private analysis of data. As necessitated by the nature of differential privacy, this course will be theoretically and mathematically based. References to practice will be provided as relevant, especially towards the end of the course. Prerequisites include an undergraduate understanding of algorithms, comfort with probability, and mathematical maturity.

Logistics

This course will consist of two parts. The first third of the course will be a series of lectures covering the basics of differential privacy. The remainder of the course will consist of paper reading and discussions.

Lecture videos for a previous offering of this course are available here. The lecture portion of our course will cover roughly the first seven lectures from that offering. You are still expected to attend class meetings, both during the lecture portion and especially during the discussion portion.

Lectures

All videos and notes are from the prior course offering, but should still be relevant.

Lecture 1 (released 9/8/2022): Some Attempts at Data Privacy. Video in two parts; lecture notes and written notes (PDF).
Lecture 2 (released 9/13/2022): Reconstruction Attacks. Video in three parts; lecture notes and written notes (PDF).
Lecture 3 (released 9/15/2022): Continue Reconstruction Attacks, start Intro to Differential Privacy. Video in two parts; lecture notes and written notes (PDF).
Lecture 4 (released 9/20/2022): Continue Intro to Differential Privacy.
Lecture 5 (released 9/22/2022): Start Intro to Differential Privacy, Part 2. Video in two parts; lecture notes and written notes (PDF).
Lecture 6 (released 9/27/2022): Finish Intro to Differential Privacy Part 2, Start Approximate Differential Privacy. Video in two parts; lecture notes and written notes (PDF).
Lecture 7 (released 9/29/2022): Continue Approximate Differential Privacy.
Lecture 8 (released 10/4/2022): Finish Approximate Differential Privacy, Start Exponential Mechanism. Video in two parts; lecture notes and written notes (PDF).
Lecture 9 (released 10/4/2022): Finish Exponential Mechanism.
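
For a concrete feel of the mechanisms treated in the introductory lectures, here is a minimal sketch of the classical Laplace mechanism applied to a counting query. It is an illustration only, not course material: the dataset, the predicate, and the choice of epsilon below are made up for the example.

    import numpy as np

    def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
        # Adding Laplace(sensitivity / epsilon) noise to a query with L1
        # sensitivity `sensitivity` satisfies epsilon-differential privacy.
        if rng is None:
            rng = np.random.default_rng()
        return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    # Hypothetical counting query: how many records have age >= 40?
    ages = [23, 45, 31, 52, 60, 29]               # made-up data
    true_count = sum(1 for a in ages if a >= 40)  # changing one person changes this by at most 1
    private_count = laplace_mechanism(true_count, sensitivity=1, epsilon=0.5)
    print(private_count)                          # a noisy, non-integer answer

Because the counting query has L1 sensitivity 1, Laplace noise of scale sensitivity/epsilon = 2 suffices for 0.5-differential privacy; a smaller epsilon means more noise and a stronger privacy guarantee.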

Paper Discussions

Meeting 10 (10/18/22)
  Paper 1: Smooth Sensitivity and Sampling in Private Data Analysis, by Nissim, Raskhodnikova, and Smith. Presenter: A.
  Paper 2: A Simple and Practical Algorithm for Differentially Private Data Release, by Hardt, Ligett, and McSherry. Presenter: R.

Meeting 11 (10/20/22)
  Paper 1: On Significance of the Least Significant Bits For Differential Privacy, by Mironov. Presenter: V.
  Paper 2: Preserving Statistical Validity in Adaptive Data Analysis, by Dwork, Feldman, Hardt, Pitassi, Reingold, and Roth. Presenter: A.

Meeting 12 (10/27/22)
  Paper 1: The matrix mechanism: optimizing linear counting queries under differential privacy, by Li, Miklau, Hay, McGregor, and Rastogi. Presenter: SK.
  Paper 2: Communication-Efficient Learning of Deep Networks from Decentralized Data, by McMahan, Moore, Ramage, Hampson, and Agüera y Arcas. Presenter: C.

Meeting 13 (11/01/22)
  Paper 1: Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds, by Bun and Steinke. Presenter: SK.
  Paper 2: Deep Learning with Differential Privacy, by Abadi, Chu, Goodfellow, McMahan, Mironov, Talwar, and Zhang. Presenter: R.

Meeting 14 (11/03/22)
  Paper 1: Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data, by Papernot, Abadi, Erlingsson, Goodfellow, and Talwar. Presenter: S.
  Paper 2: Exposed! A Survey of Attacks on Private Data, by Dwork, Smith, Steinke, and Ullman. Presenter: V.

Meeting 15 (11/08/22)
  Paper 1: Prochlo: Strong Privacy for Analytics in the Crowd, by Bittau, Erlingsson, Maniatis, Mironov, Raghunathan, Lie, Rudominer, Kode, Tinnes, and Seefeld. Presenter: V.
  Paper 2: Private PAC learning implies finite Littlestone dimension, by Alon, Livni, Malliaris, and Moran. Presenter: S.
  Additional reading: Local Differential Privacy.

Meeting 16 (11/10/22)
  Paper 1: Distributed Differential Privacy via Shuffling, by Cheu, Smith, Ullman, Zeber, and Zhilyaev. Presenter: C.
  Paper 2: Gaussian Differential Privacy, by Dong, Roth, and Su. Presenter: SK.

Meeting 17 (11/15/22)
  Paper 1: LinkedIn's Audience Engagements API: A Privacy Preserving Data Analytics System at Scale, by Rogers, Subramaniam, Peng, Durfee, Lee, Kancha, Sahay, and Ahammad. Presenter: SS.
  Paper 2: An Equivalence Between Private Classification and Online Prediction, by Bun, Livni, and Moran. Presenter: SK.

Meeting 18 (11/17/22)
  Paper 1: CoinPress: Practical Private Mean and Covariance Estimation, by Biswas, Dong, Kamath, and Ullman. Presenter: A.
  Paper 2: Auditing Differentially Private Machine Learning: How Private is Private SGD?, by Jagielski, Ullman, and Oprea. Presenter: S.

Meeting 19 (11/22/22)
  Paper 1: Differentially Private Learning Needs Better Features (or Much More Data), by Tramèr and Boneh. Presenter: SS.
  Paper 2: Practical and Private (Deep) Learning without Sampling or Shuffling, by Kairouz, McMahan, Song, Thakkar, Thakurta, and Xu. Presenter: C.
  Additional reading: Federated Learning with Formal Differential Privacy Guarantees.

Meeting 20 (11/24/22)
  Paper 1: Large Language Models Can Be Strong Differentially Private Learners, by Li, Tramèr, Liang, and Hashimoto; and Differentially Private Fine-tuning of Language Models, by Yu, Naik, Backurs, Gopi, Inan, Kamath, Kulkarni, Lee, Manoel, Wutschitz, Yekhanin, and Zhang. Presenter: C.
  Paper 2: Membership Inference Attacks From First Principles, by Carlini, Chien, Nasr, Song, Terzis, and Tramèr. Presenter: S.

Meeting 21 (11/29/22)
  Paper 1: What Does it Mean for a Language Model to Preserve Privacy?, by Brown, Lee, Mireshghallah, Shokri, and Tramèr. Presenter: R.
  Paper 2: Attacks on Deidentification's Defenses, by Cohen. Presenter: SS.

Meeting 22 (12/01/22)
  Paper 1: The 2020 Census Disclosure Avoidance System TopDown Algorithm, by Abowd, Ashmead, Cumings-Menon, Garfinkel, Heineck, Heiss, Johns, Kifer, Leclerc, Machanavajjhala, Moran, Sexton, Spence, and Zhuravlev. Presenter: V.
  Paper 2: Considerations for Differentially Private Learning with Large-Scale Public Pretraining (forthcoming). Presenter: SS.

Assignment

Problem Set, due by 11:59 PM Waterloo time on October 24.

Books and References

Don't buy a textbook for this course. We'll be drawing on a variety of resources instead. Also check out the Resources page on DifferentialPrivacy.org.

Grading

Grades for this course will be based on the assignment, the paper presentations and discussions, and the final project, as described in the sections below.

Paper Presentation and Discussions

The format of these paper discussions (as well as the text below) is based on a course by Florian Kerschbaum. Logistics may change slightly as we approach the relevant portion of the course. The majority of credit for this portion will be based on presenting the paper, with the rest assigned to your reviews and participation in discussions.

There are two or three papers assigned to each meeting, selected from the course topics. All students must read all of the papers before class. Each student must also submit a review of each paper several days before class. The review must contain a short summary of the paper; its related work and context in the scientific literature; its strong points, i.e., its contributions to the advancement of knowledge; and its weak points, i.e., where further research is necessary. It is good to phrase weak points as questions or hypotheses. Please give enough context to make your questions accessible to other students.

Each student responsible for presenting a paper must read all reviews and prepare a presentation that structures the discussion of the paper. This includes providing the necessary background and context for the paper; it may be necessary to read more than just the assigned paper when presenting. The presenter should assume that every student has read the paper and is familiar with its content, i.e., they should not repeat the paper's contributions verbatim. The presenter will then organize the reviews into main contributions and questions, answer the questions for which the necessary background is available, and lead a discussion of the remaining ones with the help of the instructor. Each paper should get a 20-minute presentation of the reviews, followed by another 20 minutes of discussion.

Final Project

For your final project, you should do something cool related to differential privacy. As a very rough guideline, the goal should be to produce original research that would be hypothetically publishable at a reasonable conference, or at least a workshop (say, e.g., TPDP -- note that the polish of results at such workshops ranges from very preliminary to essentially conference papers). That said, publishability is not perfectly aligned with my expectations: not everything cool is publishable, and not everything publishable is cool.

You are allowed to work in groups of up to three.

There are three deliverables for this project. Credit is only assigned for the last one; the first two are just to make sure things are going OK. All are due by email to the instructor (to either his MIT or Waterloo address).