# Information

The second Sublinear Algorithms Day will take place at MIT on Friday, April 10, 2015.

This event will bring together researchers from academic institutions in the northeast for a day of interaction and discussion. It will feature talks by experts in the areas of sublinear and streaming algorithms, as well as a poster session for up-and-coming researchers to showcase their work.

Wondering what sublinear algorithms are? See this short description by Ronitt Rubinfeld.

• When?
Friday, April 10, 2015.
• Where?
Ray and Maria Stata Center, 32 Vassar Street, Cambridge MA. See Google Maps for directions.
All talks will be in the Patil/Kiva Seminar Room (32-G449), and the poster session will be in the Stata Center lobby.
• What?
See the schedule and the list of speakers below.
• Do I need to register?
Please fill out this form by Tuesday, April 7. It's not required, but it would help with organization.
• What's next?
Try some open problems: sublinear.info.

# Poster Session

For the first time this year, we will have a poster session! We strongly encourage young researchers to present work related to sublinear algorithms.

The list of accepted posters is available here! Even though the deadline has passed, we still have space for a few more posters; please follow the directions below if you wish to submit.

If you wish to present a poster, please send an email to g@csail.mit.edu with the subject "Sublinear Day Poster Submission: <your_name>" by Friday, March 20, 2015. Please include a title, list of authors, and abstract for your poster, and if available, a link to the corresponding paper. Notification of acceptance will be sent by Friday, March 27, 2015.

# Schedule

All talks will be in the Stata Center's Patil/Kiva Seminar Room (32-G449), and the poster session will be in the Stata Center lobby.
• 09:00 AM – 09:25 AM

Coffee Break

• 09:25 AM – 09:30 AM

Opening Remarks

• 09:30 AM – 10:15 AM
Robert Krauthgamer
The Sketching Complexity of Graph Cuts
We study the problem of sketching an input graph $$G$$ on $$n$$ vertices, so that given (only) the sketch, one can estimate the weight of any cut in the graph within a small approximation factor $$1+\epsilon$$. The classic cut-sparsifier construction of Benczur and Karger (STOC 1996) implies a sketch of size $$\tilde O(n/\epsilon^2)$$ [this notation hides logarithmic factors].

We show that the dependence on $$\epsilon$$ can be brought down to linear, at the price of achieving a slightly relaxed guarantee. Specifically, we design a randomized scheme that produces from $$G$$ a sketch of size $$\tilde O(n/\epsilon)$$ bits, from which the weight of any cut $$(S,\bar S)$$ can be reported, with high probability, within factor $$1+\epsilon$$. We also demonstrate some applications where this "for each" guarantee is indeed useful.

We further prove that our relaxation is necessary. Specifically, a sketch that can $$(1+\epsilon)$$-approximate the weight of all cuts in the graph simultaneously (a so-called "for all" guarantee instead of "for each") must be of size $$\Omega(n/\epsilon^2)$$ bits.

Joint work with Alexandr Andoni and David Woodruff.
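
For intuition, here is a minimal Python sketch of the "for each" flavor of guarantee via plain uniform edge sampling. This is not the Benczúr-Karger construction or the scheme from the talk (both sample far more carefully); it only illustrates why a single cut fixed in advance is preserved: the sampled, reweighted graph matches its weight in expectation and concentrates around it.

```python
import random

def sample_sketch(edges, p):
    """Keep each weighted edge independently with probability p and
    reweight survivors by 1/p, so any single fixed cut keeps its
    weight in expectation and concentrates around it."""
    return [(u, v, w / p) for (u, v, w) in edges if random.random() < p]

def cut_weight(edges, S):
    """Total weight of edges crossing the cut (S, complement of S)."""
    return sum(w for (u, v, w) in edges if (u in S) != (v in S))

random.seed(0)
n = 60
edges = [(u, v, 1.0) for u in range(n) for v in range(u + 1, n)
         if random.random() < 0.3]
S = set(range(n // 2))              # one cut, fixed before sampling
sketch = sample_sketch(edges, p=0.25)
print("true cut weight:     ", cut_weight(edges, S))
print("estimate from sketch:", cut_weight(sketch, S))
```
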
• 10:25 AM – 11:10 AM
Jelani Nelson
Time Lower Bounds for Nonadaptive Turnstile Streaming Algorithms
We say a turnstile streaming algorithm is "non-adaptive" if the memory cells it probes when receiving an update to coordinate $$x_i$$ only depend on $$i$$ as well as random coins flipped before seeing the stream (e.g. to determine hash functions). All known turnstile streaming algorithms for non-promise problems are non-adaptive -- in fact, they are linear sketches.

We give the first space/update time tradeoff lower bounds that hold against non-adaptive turnstile streaming algorithms. Our lower bounds hold for classically studied problems such as heavy hitters, point query, moment estimation, and entropy estimation. In the case of deterministic L1 point query, known algorithms show that our lower bounds are tight for constant error parameter $$\epsilon$$.

Joint work with Kasper Green Larsen and Nguyễn Lê Huy. Full version at http://arxiv.org/abs/1407.2151.
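
To make the nonadaptivity concrete, here is a toy CountSketch-style point-query sketch in Python: the cells an update touches are determined entirely by the coordinate and by salts drawn before the stream begins. All names are illustrative, and Python's built-in hash stands in for a proper pairwise-independent hash family.

```python
import random
from statistics import median

class CountSketch:
    """A nonadaptive turnstile sketch: the cells an update to
    coordinate i touches depend only on i and on randomness fixed
    before the stream (the salts), never on prior updates."""
    def __init__(self, rows=7, width=256, seed=0):
        rng = random.Random(seed)          # coins flipped up front
        self.salts = [rng.randrange(1 << 30) for _ in range(rows)]
        self.table = [[0] * width for _ in range(rows)]
        self.width = width

    def _cell_sign(self, salt, i):
        h = hash((salt, i))
        return h % self.width, 1 if (h >> 20) & 1 else -1

    def update(self, i, delta):            # turnstile: delta may be negative
        for r, salt in enumerate(self.salts):
            c, s = self._cell_sign(salt, i)
            self.table[r][c] += s * delta

    def query(self, i):                    # point query: estimate x_i
        ests = []
        for r, salt in enumerate(self.salts):
            c, s = self._cell_sign(salt, i)
            ests.append(s * self.table[r][c])
        return median(ests)

sk = CountSketch()
for i, d in [(7, 3), (42, 5), (7, -4), (9, 2)]:
    sk.update(i, d)
print(sk.query(7), sk.query(42), sk.query(9))   # -1 5 2 (up to collisions)
```
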
• 11:20 AM – 12:05 PM
Shubhangi Saraf
A Survey of Some Recent Results for LDCs and LCCs
Locally decodable codes (and the closely related locally correctable codes) are error-correcting codes that admit efficient decoding algorithms: they give a method to encode $$k$$-bit messages into $$n$$-bit codewords such that, even after a constant fraction of the bits of the codeword get corrupted, any bit of the original message (or of the original codeword, for LCCs) can be recovered by looking at only a sublinear or even just constant number of bits of the corrupted codeword. The tradeoff between the rate of a code and the locality/efficiency of its decoding algorithms has been studied extensively in the past decade, with numerous applications to complexity theory and pseudorandomness. In this talk I will survey some recent and not-so-recent results (both constructions and lower bounds) for locally decodable and locally correctable codes. I will also highlight some of the most interesting challenges that remain.
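
As a concrete (if rate-inefficient) example of local decoding, here is the classic 2-query Hadamard code in Python: each message bit can be recovered from a corrupted codeword by XORing just two codeword positions. This is a standard textbook construction, not one of the recent codes from the talk.

```python
import random

def hadamard_encode(msg, k):
    """Hadamard code: position x stores the inner product <msg, x>
    mod 2 for every x in {0,1}^k, so k message bits become 2^k
    codeword bits (constant queries, but very low rate)."""
    return [bin(msg & x).count("1") % 2 for x in range(1 << k)]

def local_decode(codeword, i, k):
    """Recover message bit i from 2 queries: C[r] ^ C[r ^ e_i]
    equals <msg, e_i> whenever both random positions are clean,
    which happens with probability at least 1 - 2*delta."""
    r = random.randrange(1 << k)
    return codeword[r] ^ codeword[r ^ (1 << i)]

random.seed(1)
k, msg = 10, 0b1011001110
cw = hadamard_encode(msg, k)
for pos in random.sample(range(1 << k), (1 << k) // 10):  # corrupt 10%
    cw[pos] ^= 1
print([local_decode(cw, i, k) for i in range(k)])  # usually matches:
print([(msg >> i) & 1 for i in range(k)])
```
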
• 12:15 PM – 2:00 PM

Lunch Break

• 2:00 PM – 2:45 PM
Paul Valiant
Instance-Optimal Testing and Learning
We consider two fundamental statistical tasks given samples from an unknown distribution: hypothesis testing against a fixed distribution, and learning the unknown distribution, where the goal in both cases is to use as few samples as possible. For both problems, we introduce the notion of an "instance-optimal" algorithm, which performs as well as the best conceivable algorithm tailored to the distribution in question. In the first setting of hypothesis testing, this means essentially designing a hypothesis tester that makes optimal use of its hypothesis; the second setting, more mysteriously, demands an algorithm that must depend optimally on the very distribution it seeks to learn.

We introduce instance-optimal algorithms for both settings, which take very different forms: our hypothesis tester can be viewed as a nuanced adaptation of Pearson's chi-squared test, though with an analysis that takes us through a novel "automated inequality prover"; our distribution learner leverages the "unseen" histogram estimator that had previously been introduced to estimate entropy and related properties of distributions. In general, this instance-optimal perspective has revealed surprising new structure in classic problems, and promises a theoretically principled approach to designing algorithms that compete with the most finely-tuned practical heuristics. The two examples we present seem just the tip of the iceberg for this instance-optimal style of analysis.
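
For flavor, here is the plain Pearson chi-squared statistic in Python, the classical test that the instance-optimal tester refines. The talk's actual statistic is a more nuanced variant with a different analysis, so treat this only as the baseline it adapts; all names below are illustrative.

```python
import random
from collections import Counter

def chi_squared(samples, p):
    """Pearson's identity-test statistic against hypothesis p:
    sum_i (N_i - m*p_i)^2 / (m*p_i) over the domain, where N_i
    are empirical counts from m samples."""
    m, counts = len(samples), Counter(samples)
    return sum((counts[i] - m * pi) ** 2 / (m * pi)
               for i, pi in enumerate(p) if pi > 0)

random.seed(0)
p = [0.5, 0.25, 0.125, 0.125]                    # the fixed hypothesis
same = random.choices(range(4), weights=p, k=5000)          # drawn from p
far = random.choices(range(4), weights=[0.25] * 4, k=5000)  # far from p
print(chi_squared(same, p))   # small: consistent with the hypothesis
print(chi_squared(far, p))    # large: reject
```
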
• 2:55 PM – 3:40 PM
Costis Daskalakis
Learning and Testing Poisson Binomial Distributions
It is well-known that $$\Theta(1/\varepsilon^2)$$ coin tosses are necessary and sufficient to estimate the bias of a coin. But how many samples are needed to learn the distribution of the sum of $$n$$ independent Bernoullis? I will show that the answer is still $$O(1/\varepsilon^2)$$, independent of $$n$$. I will argue that the celebrated Berry-Esseen theorem and its variants fall short of characterizing the general structure of PBDs, and offer stronger finitary central limit theorems, which tightly characterize their structure and have the aforementioned implications for learning. I will then use these structural results to obtain optimal testers for distinguishing whether an unknown distribution is a PBD or far from all PBDs.
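
As a quick sanity check of the opening claim, here is a small Python simulation: with roughly $$1/\varepsilon^2$$ tosses, the empirical bias of a coin lands within $$\varepsilon$$ of the truth in the vast majority of runs. The function name and parameters are illustrative, not from the talk.

```python
import random

def bias_within_eps(p_true, eps, trials=1000, seed=0):
    """Fraction of runs in which the empirical mean of about
    1/eps^2 coin tosses lands within eps of the true bias; a
    Chernoff bound guarantees this fraction is high."""
    rng = random.Random(seed)
    n = int(1 / eps ** 2)
    hits = 0
    for _ in range(trials):
        est = sum(rng.random() < p_true for _ in range(n)) / n
        hits += abs(est - p_true) <= eps
    return hits / trials

print(bias_within_eps(p_true=0.3, eps=0.1))   # roughly 0.97
```
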
• 3:50 PM – 5:30 PM

Poster Session (with coffee and snacks)
List of accepted posters

# Speakers

• #### Costis Daskalakis

Costis Daskalakis received his PhD at UC Berkeley under the supervision of Christos Papadimitriou, and is currently an Associate Professor at MIT. His research focuses on algorithmic game theory, computational biology, and applied probability.

• #### Robert Krauthgamer

Robert Krauthgamer received his PhD at the Weizmann Institute of Science under the supervision of Uriel Feige, and is now a Professor there. His research focuses on analysis of algorithms, including algorithms for data analysis and massive datasets.

• #### Jelani Nelson

Jelani Nelson received his PhD at MIT under the supervision of Erik Demaine and Piotr Indyk, and is currently an Assistant Professor at Harvard University. His research focuses on the development of efficient algorithms for massive datasets, particularly streaming algorithms.

• #### Shubhangi Saraf

Shubhangi Saraf received her PhD at MIT under the supervision of Madhu Sudan, and is currently an Assistant Professor at Rutgers University. Her research focuses on complexity theory and property testing.

• #### Paul Valiant

Paul Valiant received his PhD at MIT under the supervision of Silvio Micali, and is currently an Assistant Professor at Brown University. His research focuses on the theory behind big data, scientific computing, and evolution.

# Organizers and Support

This event is organized by Gautam Kamath, with guidance and support from Grigory Yaroslavtsev, Piotr Indyk, and Costis Daskalakis. Thanks to Ilya Razenshteyn for suggestions related to the poster session and website design, and Clément Canonne for help carrying a preposterous amount of coffee.