Dec 6, 2021
What advantage do sequential procedures provide over batch algorithms for testing properties of unknown distributions? Focusing on the problem of testing whether two distributions 饾挓_1 and 饾挓_2 on {1,..., n} are equal or 系-far, we give several answers to this question. We show that for a small alphabet size n, there is a sequential algorithm that outperforms any batch algorithm by a factor of at least 4 in terms sample complexity. For a general alphabet size n, we give a sequential algorithm that uses no more samples than its batch counterpart, and possibly fewer if the actual distance between 饾挓_1 and 饾挓_2 is larger than 系. As a corollary, letting 系 go to 0, we obtain a sequential algorithm for testing closeness (with no a priori bound on the distance between 饾挓_1 and 饾挓_2) with a sample complexity 饾挭虄(n^2/3/TV(饾挓_1, 饾挓_2)^4/3): this improves over the 饾挭虄(n/log n/TV(饾挓_1, 饾挓_2)^2) tester of [Daskalakis and Kawase 2017] and is optimal up to multiplicative constants. We also establish limitations of sequential algorithms for the problem of testing closeness: they can improve the worst case number of samples by at most a constant factor.
Neural Information Processing Systems (NeurIPS) is a multi-track machine learning and computational neuroscience conference that includes invited talks, demonstrations, symposia and oral and poster presentations of refereed papers. Following the conference, there are workshops which provide a less formal setting.
Professional recording and live streaming, delivered globally.
Presentations on similar topic, category or speaker