References

Graham Cormode; Ke Yi

Published online by Cambridge University Press: 22 October 2020

Graham Cormode: University of Warwick
Ke Yi: Hong Kong University of Science and Technology

Type: Chapter
Information: Small Summaries for Big Data, pp. 253–266

DOI: https://doi.org/10.1017/9781108769938

Publisher: Cambridge University Press

Print publication year: 2020

References

Achlioptas, D. Database-friendly random projections. In ACM Symposium on Principles of Database Systems, pages 274–281, 2001.

Agarwal, P. K., Cormode, G., Huang, Z., Phillips, J., Wei, Z., and Yi, K. Mergeable summaries. ACM Transactions on Database Systems, 38(4), 2013.

Agarwal, P. K., Har-Peled, S., and Varadarajan, K. R. Approximating extent measures of points. Journal of the ACM, 51:606–635, 2004.

Agarwal, P. K. and Sharathkumar, R. Streaming algorithms for extent problems in high dimensions. In ACM-SIAM Symposium on Discrete Algorithms, pages 1481–1489, 2010.

Agarwal, P. K. and Yu, H. A space-optimal data-stream algorithm for coresets in the plane. In Symposium on Computational Geometry, pages 1–10, 2007.

Aggarwal, C. C. On biased reservoir sampling in the presence of stream evolution. In International Conference on Very Large Data Bases, pages 607–618, 2006.

Ahn, K. J., Guha, S., and McGregor, A. Analyzing graph structure via linear measurements. In ACM-SIAM Symposium on Discrete Algorithms, pages 459–467, 2012.

Ailon, N. and Chazelle, B. Approximate nearest neighbors and the fast Johnson–Lindenstrauss transform. SIAM Journal on Computing, 39(1):302–322, 2009.

Alon, N., Gibbons, P., Matias, Y., and Szegedy, M. Tracking join and self-join sizes in limited storage. In ACM Symposium on Principles of Database Systems, pages 10–20, 1999.

Alon, N., Matias, Y., and Szegedy, M. The space complexity of approximating the frequency moments. In ACM Symposium on Theory of Computing, pages 20–29, 1996.

Alon, N., Matias, Y., and Szegedy, M. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58:137–147, 1999.

Anderson, D., Bevan, P., Lang, K., Liberty, E., Rhodes, L., and Thaler, J. A high-performance algorithm for identifying frequent items in data streams. In Internet Measurement Conference, pages 268–282, 2017.

Andoni, A. and Indyk, P. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In IEEE Conference on Foundations of Computer Science, pages 459–468, 2006.

Andoni, A., Indyk, P., and Razenshteyn, I. Approximate nearest neighbor search in high dimensions. https://arxiv.org/abs/1806.09823, 2018.

Andoni, A. and Nguyên, H. L. Width of points in the streaming model. ACM Transactions on Algorithms, 12(1):5:1–5:10, 2016.

Aronov, B., Ezra, E., and Sharir, M. Small-size ε-nets for axis-parallel rectangles and boxes. SIAM Journal on Computing, 39(7):3248–3282, 2010.

Arya, S., Mount, D., Netanyahu, N. S., Silverman, R., and Wu, A. Y. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. Journal of the ACM, 45(6):891–923, 1998.

Bansal, N. Constructive algorithms for discrepancy minimization. In IEEE Conference on Foundations of Computer Science, 2010.

Bar-Yossef, Z., Jayram, T., Kumar, R., Sivakumar, D., and Trevisan, L. Counting distinct elements in a data stream. In Proceedings of RANDOM 2002, pages 1–10, 2002.

Bar-Yossef, Z., Kumar, R., and Sivakumar, D. Reductions in streaming algorithms, with an application to counting triangles in graphs. In ACM-SIAM Symposium on Discrete Algorithms, pages 623–632, 2002.

Barkay, N., Porat, E., and Shalem, B. Feasible sampling of non-strict turnstile data streams. In Fundamentals of Computation Theory, pages 48–59, September 2013.

Ben-Basat, R., Einziger, G., Friedman, R., and Kassner, Y. Heavy hitters in streams and sliding windows. In IEEE INFOCOM, pages 1–9, 2016.

Ben-Basat, R., Einziger, G., Friedman, R., and Kassner, Y. Optimal elephant flow detection. In IEEE INFOCOM, pages 1–9, 2017.

Ben-Basat, R., Einziger, G., and Friedman, R. Fast flow volume estimation. In Proceedings of the International Conference on Distributed Computing and Networking, pages 44:1–44:10, 2018.

Bentley, J. L. and Saxe, J. B. Decomposable searching problems I: static-to-dynamic transformation. Journal of Algorithms, 1:301–358, 1980.

Berinde, R., Cormode, G., Indyk, P., and Strauss, M. Space-optimal heavy hitters with strong error bounds. In ACM Symposium on Principles of Database Systems, pages 157–166, 2009.

Beyer, K. S., Haas, P. J., Reinwald, B., Sismanis, Y., and Gemulla, R. On synopses for distinct-value estimation under multiset operations. In ACM SIGMOD International Conference on Management of Data, pages 199–210, 2007.

Bianchi, G., Duffy, K., Leith, D. J., and Shneer, V. Modeling conservative updates in multi-hash approximate count sketches. In 24th International Teletraffic Congress, pages 1–8, 2012.

Błasiok, J. Optimal streaming and tracking distinct elements with high probability. In ACM-SIAM Symposium on Discrete Algorithms, pages 2432–2448, 2018.

Bledaite, L. Count-min sketches in real data applications. https://skillsmatter.com/skillscasts/6844-count-min-sketch-in-real-data-applications, 2015.

Bloom, B. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, July 1970.

Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. Learnability and the Vapnik–Chervonenkis dimension. Journal of the ACM, 36:929–965, 1989.

Bollobás, B. Extremal Graph Theory. Academic Press, 1978.

Bose, P., Kranakis, E., Morin, P., and Tang, Y. Bounds for frequency estimation of packet streams. In SIROCCO, pages 33–42, 2003.

Boyer, R. S. and Moore, J. S. A fast majority vote algorithm. Technical Report ICSCA-CMP-32, Institute for Computer Science, University of Texas, Feb. 1981.

Braverman, V., Frahling, G., Lang, H., Sohler, C., and Yang, L. F. Clustering high dimensional dynamic data streams. In Proceedings of the 34th International Conference on Machine Learning, pages 576–585, 2017.

Braverman, V. and Ostrovsky, R. Smooth histograms for sliding windows. In IEEE Conference on Foundations of Computer Science, pages 283–293, 2007.

Braverman, V. and Ostrovsky, R. Zero-one frequency laws. In ACM Symposium on Theory of Computing, pages 281–290, 2010.

Braverman, V., Ostrovsky, R., and Vilenchik, D. How hard is counting triangles in the streaming model? In International Colloquium on Automata, Languages and Programming, pages 244–254, 2013.

Broder, A. Z. and Mitzenmacher, M. Network applications of Bloom filters: a survey. Internet Mathematics, 1(4):485–509, 2004.

Bădoiu, M. and Clarkson, K. L. Smaller core-sets for balls. In ACM-SIAM Symposium on Discrete Algorithms, pages 801–802, 2003.

Bădoiu, M. and Clarkson, K. L. Optimal core-sets for balls. Computational Geometry: Theory and Applications, 40(1):14–22, 2008.

Bădoiu, M., Har-Peled, S., and Indyk, P. Approximate clustering via core-sets. In ACM Symposium on Theory of Computing, pages 250–257, 2002.

Buriol, L. S., Frahling, G., Leonardi, S., Marchetti-Spaccamela, A., and Sohler, C. Counting triangles in data streams. In ACM Symposium on Principles of Database Systems, pages 253–262, 2006.

Cai, D., Mitzenmacher, M., and Adams, R. P. A Bayesian nonparametric view on count-min sketch. In Advances in Neural Information Processing Systems, pages 8782–8791, 2018.

Carter, J. L. and Wegman, M. N. Universal classes of hash functions. Journal of Computer and System Sciences, 18(2):143–154, 1979.

Chambers, J., Mallows, C., and Stuck, B. A method for simulating stable random variables. Journal of the American Statistical Association, 71(354):340–344, 1976.

Chan, T. M. Faster core-set constructions and data-stream algorithms in fixed dimensions. Computational Geometry: Theory and Applications, 35:20–35, 2006.

Chan, T. M. and Pathak, V. Streaming and dynamic algorithms for minimum enclosing balls in high dimensions. In International Symposium on Algorithms and Data Structures, pages 195–206, 2011.

Chandra, K. View counting at reddit. https://redditblog.com/2017/05/24/view-counting-at-reddit/, 2017.

Charikar, M., Chen, K., and Farach-Colton, M. Finding frequent items in data streams. In Proceedings of the International Colloquium on Automata, Languages and Programming, pages 693–703, 2002.

Charikar, M., O’Callaghan, L., and Panigrahy, R. Better streaming algorithms for clustering problems. In Proceedings of the 35th Annual ACM Symposium on Theory of Computing, pages 30–39, 2003.

Charikar, M. S. Similarity estimation techniques from rounding algorithms. In ACM Symposium on Theory of Computing, pages 380–388, 2002.

Chen, K. and Rao, S. An improved frequent items algorithm with applications to web caching. Technical Report UCB/CSD-05-1383, EECS Department, University of California, Berkeley, 2005.

Clarkson, K. L. and Woodruff, D. P. Numerical linear algebra in the streaming model. In ACM Symposium on Theory of Computing, pages 205–214, 2009.

Cohen, E. Size-estimation framework with applications to transitive closure and reachability. Journal of Computer and System Sciences, 55(3):441–453, 1997.

Cohen, E. All-distances sketches, revisited: HIP estimators for massive graphs analysis. IEEE Transactions on Knowledge and Data Engineering, 27(9):2320–2334, 2015.

Cohen, E., Duffield, N., Kaplan, H., Lund, C., and Thorup, M. Efficient stream sampling for variance-optimal estimation of subset sums. SIAM Journal on Computing, 40(5):1402–1431, 2011.

Cohen, E. and Strauss, M. Maintaining time-decaying stream aggregates. In ACM Symposium on Principles of Database Systems, pages 223–233, 2003.

Cohen, S. and Matias, Y. Spectral Bloom filters. In ACM SIGMOD International Conference on Management of Data, pages 241–252, 2003.

Considine, J., Hadjieleftheriou, M., Li, F., Byers, J. W., and Kollios, G. Robust approximate aggregation in sensor data management systems. ACM Transactions on Database Systems, 34(1):6:1–6:35, 2009.

Coppersmith, D. and Kumar, R. An improved data stream algorithm for frequency moments. In ACM-SIAM Symposium on Discrete Algorithms, pages 151–156, 2004.

Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Algorithms, 3rd edition. MIT Press, 2009.

Cormode, G., Datar, M., Indyk, P., and Muthukrishnan, S. Comparing data streams using Hamming norms. IEEE Transactions on Knowledge and Data Engineering, 15(3):529–541, 2003.

Cormode, G. and Firmani, D. On unifying the space of ℓ₀-sampling algorithms. In Algorithm Engineering and Experiments, pages 163–172, 2013.

Cormode, G. and Garofalakis, M. Sketching streams through the net: distributed approximate query tracking. In International Conference on Very Large Data Bases, pages 13–24, 2005.

Cormode, G., Garofalakis, M., and Sacharidis, D. Fast approximate wavelet tracking on streams. In International Conference on Extending Database Technology, pages 4–22, 2006.

Cormode, G. and Hadjieleftheriou, M. Finding frequent items in data streams. In International Conference on Very Large Data Bases, pages 1530–1541, 2008.

Cormode, G. and Jowhari, H. A second look at counting triangles in graph streams (corrected). Theoretical Computer Science, 683:22–30, 2017.

Cormode, G. and Jowhari, H. Lp samplers and their applications: a survey. ACM Computing Surveys, pages 16:1–16:3, 2019.

Cormode, G., Korn, F., Muthukrishnan, S., and Srivastava, D. Space- and time-efficient deterministic algorithms for biased quantiles over data streams. In ACM Symposium on Principles of Database Systems, pages 263–272, 2006.

Cormode, G., Korn, F., and Tirthapura, S. Exponentially decayed aggregates on data streams. In IEEE International Conference on Data Engineering, pages 1379–1381, 2008.

Cormode, G. and Muthukrishnan, S. Improved data stream summary: the Count-Min sketch and its applications. Technical Report 2003-20, DIMACS, 2003.

Cormode, G. and Muthukrishnan, S. What’s hot and what’s not: tracking most frequent items dynamically. In ACM Symposium on Principles of Database Systems, pages 296–306, 2003.

Cormode, G. and Muthukrishnan, S. What’s new: finding significant differences in network data streams. In Proceedings of IEEE INFOCOM, pages 1534–1545, 2004.

Cormode, G. and Muthukrishnan, S. An improved data stream summary: the Count-Min sketch and its applications. Journal of Algorithms, 55(1):58–75, 2005.

Cormode, G. and Muthukrishnan, S. Space efficient mining of multigraph streams. In ACM Symposium on Principles of Database Systems, pages 271–282, 2005.

Cormode, G., Muthukrishnan, S., and Rozenbaum, I. Summarizing and mining inverse distributions on data streams via dynamic inverse sampling. In International Conference on Very Large Data Bases, pages 25–36, 2005.

Cormode, G., Tirthapura, S., and Xu, B. Time-decaying sketches for sensor data aggregation. In ACM Conference on Principles of Distributed Computing, pages 215–224, 2007.

Das, S., Antony, S., Agrawal, D., and Abbadi, A. E. CoTS: a scalable framework for parallelizing frequency counting over data streams. In IEEE International Conference on Data Engineering, pages 1323–1326, 2009.

Dasgupta, A., Lang, K. J., Rhodes, L., and Thaler, J. A framework for estimating stream expression cardinalities. In International Conference on Database Theory, pages 6:1–6:17, 2016.

Dasgupta, S. and Gupta, A. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms, 22(1):60–65, 2003.

Datar, M., Gionis, A., Indyk, P., and Motwani, R. Maintaining stream statistics over sliding windows. In ACM-SIAM Symposium on Discrete Algorithms, pages 635–644, 2002.

Datar, M., Immorlica, N., Indyk, P., and Mirrokni, V. S. Locality-sensitive hashing scheme based on p-stable distributions. In Symposium on Computational Geometry, pages 253–262, 2004.

Demaine, E., López-Ortiz, A., and Munro, J. I. Frequency estimation of internet packet streams with limited space. In European Symposium on Algorithms (ESA), pages 348–360, 2002.

Deng, F. and Rafiei, D. New estimation algorithms for streaming data: Count-Min can do more. Unpublished manuscript.

Dietzfelbinger, M., Goerdt, A., Mitzenmacher, M., Montanari, A., Pagh, R., and Rink, M. Tight thresholds for cuckoo hashing via XORSAT. In International Colloquium on Automata, Languages and Programming, pages 213–225, 2010.

Dobra, A. and Rusu, F. Sketches for size of join estimation. ACM Transactions on Database Systems, 33(3):15:1–15:46, 2008.

Donoho, D. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, April 2006.

Drineas, P., Magdon-Ismail, M., Mahoney, M. W., and Woodruff, D. P. Fast approximation of matrix coherence and statistical leverage. Journal of Machine Learning Research, 13:3475–3506, 2012.

Duffield, N., Lund, C., and Thorup, M. Estimating flow distributions from sampled flow statistics. In ACM SIGCOMM, pages 325–336, 2003.

Duffield, N., Lund, C., and Thorup, M. Priority sampling for estimation of arbitrary subset sums. Journal of the ACM, 54(6):32, 2007.

Durand, M. and Flajolet, P. Loglog counting of large cardinalities (extended abstract). In European Symposium on Algorithms, pages 605–617, 2003.

Durstenfeld, R. Algorithm 235: random permutation. Communications of the ACM, 7(7):420, 1964.

Eğecioğlu, O. and Kalantari, B. Approximating the diameter of a set of points in the Euclidean space. Information Processing Letters, 32:205–211, 1989.

Einziger, G. and Friedman, R. A formal analysis of conservative update based approximate counting. In International Conference on Computing, Networking and Communications, pages 255–259, 2015.

Elkin, M. Streaming and fully dynamic centralized algorithms for constructing and maintaining sparse spanners. ACM Transactions on Algorithms, 7(2):20, 2011.

Eppstein, D. and Goodrich, M. T. Straggler identification in round-trip data streams via Newton’s identities and invertible Bloom filters. IEEE Transactions on Knowledge and Data Engineering, 23(2):297–306, 2011.

Erlingsson, Ú., Pihur, V., and Korolova, A. RAPPOR: randomized aggregatable privacy-preserving ordinal response. In Computer and Communications Security, pages 1054–1067, 2014.

Estan, C. and Varghese, G. New directions in traffic measurement and accounting. In ACM SIGCOMM, volume 32(4) of Computer Communication Review, pages 323–338, 2002.

Fan, L., Cao, P., Almeida, J., and Broder, A. Summary cache: a scalable wide-area web cache sharing protocol. In ACM SIGCOMM, pages 254–265, 1998.

Feigenbaum, J., Kannan, S., McGregor, A., Suri, S., and Zhang, J. Graph distances in the streaming model: the value of space. In ACM-SIAM Symposium on Discrete Algorithms, pages 745–754, 2005.

Felber, D. and Ostrovsky, R. A randomized online quantile summary in O(1/ε log(1/ε)) words. In APPROX-RANDOM, pages 775–785, 2015.

Flajolet, P. Approximate counting: a detailed analysis. BIT, 25:113–134, 1985.

Flajolet, P., Fusy, E., Gandouet, O., and Meunier, F. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. In Analysis of Algorithms, pages 127–146, 2007.

Flajolet, P. and Martin, G. N. Probabilistic counting. In IEEE Conference on Foundations of Computer Science, pages 76–82, 1983.

Flajolet, P. and Martin, G. N. Probabilistic counting algorithms for database applications. Journal of Computer and System Sciences, 31:182–209, 1985.

Frahling, G., Indyk, P., and Sohler, C. Sampling in dynamic data streams and applications. In Symposium on Computational Geometry, pages 142–149, June 2005.

Ganguly, S. Counting distinct items over update streams. In International Symposium on Algorithms and Computation, pages 505–514, 2005.

Ghashami, M., Liberty, E., Phillips, J. M., and Woodruff, D. P. Frequent directions: simple and deterministic matrix sketching. SIAM Journal on Computing, 45(5):1762–1792, 2016.

Ghashami, M. and Phillips, J. M. Relative errors for deterministic low-rank matrix approximations. In ACM-SIAM Symposium on Discrete Algorithms, pages 707–717, 2014.

Giannopoulos, P., Knauer, C., Wahlstrom, M., and Werner, D. Hardness of discrepancy computation and ε-net verification in high dimension. Journal of Complexity, 28(2):162–176, 2012.

Gibbons, P. and Tirthapura, S. Estimating simple functions on the union of data streams. In ACM Symposium on Parallel Algorithms and Architectures, pages 281–290, 2001.

Gilbert, A., Guha, S., Indyk, P., Kotidis, Y., Muthukrishnan, S., and Strauss, M. Fast, small-space algorithms for approximate histogram maintenance. In ACM Symposium on Theory of Computing, pages 389–398, 2002.

Gilbert, A., Guha, S., Indyk, P., Muthukrishnan, S., and Strauss, M. Near-optimal sparse Fourier representation via sampling. In ACM Symposium on Theory of Computing, pages 152–161, 2002.

Gilbert, A., Kotidis, Y., Muthukrishnan, S., and Strauss, M. Surfing wavelets on streams: one-pass summaries for approximate aggregate queries. IEEE Transactions on Knowledge and Data Engineering, 15(3):541–554, 2003.

Gilbert, A. C. and Indyk, P. Sparse recovery using sparse matrices. Proceedings of the IEEE, 98(6):937–947, 2010.

Gilbert, A. C., Kotidis, Y., Muthukrishnan, S., and Strauss, M. J. How to summarize the universe: dynamic maintenance of quantiles. In International Conference on Very Large Data Bases, pages 454–465, 2002.

Gionis, A., Indyk, P., and Motwani, R. Similarity search in high dimensions via hashing. In International Conference on Very Large Data Bases, pages 518–529, 1999.

Goel, A., Indyk, P., and Varadarajan, K. Reductions among high dimensional proximity problems. In ACM-SIAM Symposium on Discrete Algorithms, pages 769–778, 2001.

Golab, L. and Özsu, M. T. Issues in data stream management. SIGMOD Record, 32(2):5–14, June 2003.

Goodrich, M. T. and Mitzenmacher, M. Invertible Bloom lookup tables. In Annual Allerton Conference on Communication, Control, and Computing, pages 792–799, 2011.

Greenwald, M. and Khanna, S. Space-efficient online computation of quantile summaries. In ACM SIGMOD International Conference on Management of Data, pages 58–66, 2001.

Greenwald, M. and Khanna, S. Power-conserving computation of order-statistics over sensor networks. In ACM Symposium on Principles of Database Systems, pages 275–285, 2004.

Gronemeier, A. and Sauerhoff, M. Applying approximate counting for computing the frequency moments of long data streams. Theory of Computing Systems, 44(3):332–348, 2009.

Guha, S. Tight results for clustering and summarizing data streams. In International Conference on Database Theory, pages 268–275, 2009.

Guha, S., Meyerson, A., Mishra, N., Motwani, R., and O’Callaghan, L. Clustering data streams: theory and practice. IEEE Transactions on Knowledge and Data Engineering, 15(3):515–528, 2003.

Hall, A., Bachmann, O., Büssow, R., Ganceanu, S., and Nunkesser, M. Processing a trillion cells per mouse click. PVLDB, 5(11):1436–1446, 2012.

Har-Peled, S., Indyk, P., and Motwani, R. Approximate nearest neighbor: towards removing the curse of dimensionality. Theory of Computing, 8:321–350, 2012.

Hassanieh, H., Indyk, P., Katabi, D., and Price, E. Simple and practical algorithm for sparse Fourier transform. In ACM-SIAM Symposium on Discrete Algorithms, pages 1183–1194, 2012.

Haussler, D. and Welzl, E. Epsilon-nets and simplex range queries. Discrete and Computational Geometry, 2:127–151, 1987.

Heule, S., Nunkesser, M., and Hall, A. HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm. In International Conference on Extending Database Technology, pages 683–692, 2013.

Huang, Z., Wang, L., Yi, K., and Liu, Y. Sampling based algorithms for quantile computation in sensor networks. In ACM SIGMOD International Conference on Management of Data, pages 745–756, 2011.

Huang, Z. and Yi, K. The communication complexity of distributed epsilon-approximations. In IEEE Conference on Foundations of Computer Science, pages 591–600, 2014.

Huang, Z., Yi, K., Liu, Y., and Chen, G. Optimal sampling algorithms for frequency estimation in distributed data. In IEEE INFOCOM, pages 1997–2005, 2011.

Hung, R. Y. S. and Ting, H. F. An Ω(1/ε log 1/ε) space lower bound for finding ε-approximate quantiles in a data stream. In Proceedings of the 4th International Conference on Frontiers in Algorithmics, pages 89–100, 2010.

Indyk, P. A small approximately min-wise independent family of hash functions. Journal of Algorithms, 38(1):84–90, 2001.

Indyk, P. Stable distributions, pseudorandom generators, embeddings and data stream computation. Journal of the ACM, 53(3):307–323, 2006.

Indyk, P., Matoušek, J., and Sidiropoulos, A. Low-distortion embeddings of finite metric spaces. In Toth, C. D., O’Rourke, J., and Goodman, J. E., eds., Handbook of Discrete and Computational Geometry, 3rd edition, pages 211–231. CRC Press, 2017.

Indyk, P. and Motwani, R. Approximate nearest neighbors: towards removing the curse of dimensionality. In ACM Symposium on Theory of Computing, pages 604–613, 1998.

Ivkin, N., Liberty, E., Lang, K., Karnin, Z., and Braverman, V. Streaming quantiles algorithms with small space and update time. arXiv CoRR abs/1907.00236, 2019.

Jayaram, R. and Woodruff, D. P. Perfect Lp sampling in a data stream. In IEEE Conference on Foundations of Computer Science, pages 544–555, 2018.

Jayram, T. S. Information complexity: a tutorial. In ACM Symposium on Principles of Database Systems, pages 159–168, 2010.

Jayram, T. S., Kumar, R., and Sivakumar, D. The one-way communication complexity of gap Hamming distance. www.madalgo.au.dk/img/SumSchoo2007 Lecture20slides/Bibliography/p14 Jayram 07 Manusc ghd.pdf, 2007.

Jayram, T. S. and Woodruff, D. P. The data stream space complexity of cascaded norms. In IEEE Conference on Foundations of Computer Science, pages 765–774, 2009.

Jayram, T. S. and Woodruff, D. P. Optimal bounds for Johnson–Lindenstrauss transforms and streaming problems with low error. In ACM-SIAM Symposium on Discrete Algorithms, pages 1–10, 2011.

Jin, C., Qian, W., Sha, C., Yu, J. X., and Zhou, A. Dynamically maintaining frequent items over a data stream. In CIKM, pages 287–294, 2003.

Johnson, W. and Lindenstrauss, J. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189–206, 1984.

Jowhari, H. and Ghodsi, M. New streaming algorithms for counting triangles in graphs. In International Conference on Computing and Combinatorics, pages 710–716, 2005.

Jowhari, H., Saglam, M., and Tardos, G. Tight bounds for Lp samplers, finding duplicates in streams, and related problems. In ACM Symposium on Principles of Database Systems, pages 49–58, 2011.

Kalyanasundaram, B. and Schnitger, G. The probabilistic communication complexity of set intersection. SIAM Journal on Discrete Mathematics, 5(4):545–557, 1992.

Kane, D. M. and Nelson, J. Sparser Johnson–Lindenstrauss transforms. In ACM-SIAM Symposium on Discrete Algorithms, pages 1195–1206, 2012.

Kane, D. M., Nelson, J., and Woodruff, D. P. An optimal algorithm for the distinct elements problem. In ACM Symposium on Principles of Database Systems, pages 41–52, 2010.

Kapron, B. M., King, V., and Mountjoy, B. Dynamic graph connectivity in poly-logarithmic worst case time. In ACM-SIAM Symposium on Discrete Algorithms, pages 1131–1142, 2013.

Karnin, Z., Lang, K., and Liberty, E. Optimal quantile approximation in streams. In IEEE Conference on Foundations of Computer Science, pages 41–52, 2016.

Karp, R., Papadimitriou, C., and Shenker, S. A simple algorithm for finding frequent elements in sets and bags. ACM Transactions on Database Systems, 28:51–55, 2003.

Karp, R. M. and Rabin, M. O. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249–260, 1987.

Kirsch, A. and Mitzenmacher, M. Less hashing, same performance: building a better Bloom filter. In European Symposium on Algorithms (ESA), pages 456–467, 2006.

Knuth, D. E. The Art of Computer Programming, Vol. 1, Fundamental Algorithms. Addison-Wesley, 2nd edition, 1998.

Knuth, D. E. The Art of Computer Programming, Vol. 2, Seminumerical Algorithms. Addison-Wesley, 2nd edition, 1998.

Kollios, G., Byers, J., Considine, J., Hadjieleftheriou, M., and Li, F. Robust aggregation in sensor networks. IEEE Data Engineering Bulletin, 28(1), March 2005.

Komlós, J., Pach, J., and Woeginger, G. Almost tight bounds for ε-nets. Discrete and Computational Geometry, 7:163–173, 1992.

Kumar, P., Mitchell, J. S. B., and Yildirim, E. A. Approximate minimum enclosing balls in high dimensions using core-sets. ACM Journal of Experimental Algorithmics, 8, 2003.

Kushilevitz, E. and Nisan, N. Communication Complexity. Cambridge University Press, 1997.

Kushilevitz, E., Ostrovsky, R., and Rabani, Y. Efficient search for approximate nearest neighbor in high dimensional spaces. In ACM Symposium on Theory of Computing, pages 614–623, 1998.

Lang, K. J. Back to the future: an even more nearly optimal cardinality estimation algorithm. Technical report, arXiv, 2017.

Larsen, K. G., Nelson, J., Nguyen, H. L., and Thorup, M. Heavy hitters via cluster-preserving clustering. In IEEE Conference on Foundations of Computer Science, pages 61–70, 2016.

Lee, G. M., Liu, H., Yoon, Y., and Zhang, Y. Improving sketch reconstruction accuracy using linear least squares method. In Internet Measurement Conference, pages 273–278, 2005.

Lee, L. and Ting, H. A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In ACM Symposium on Principles of Database Systems, pages 290–297, 2006.

Li, P. Very sparse stable random projections, estimators and tail bounds for stable random projections. Technical Report cs.DS/0611114, arXiv, 2006.

Li, Y., Long, P., and Srinivasan, A.. Improved bounds on the sample complexity of learning. Journal of Computer and System Sciences, 62(3):516–527, 2001.Google Scholar

Liberty, E.. Simple and deterministic matrix sketching. In ACM SIGKDD, pages 581–588, 2013.Google Scholar

Lipton, R. J.. Fingerprinting sets. Technical Report CS-TR-212-89, Princeton, 1989.Google Scholar

Lu, Y., Montanari, A., Dharmapurikar, S., Kabbani, A., and Prabhakar, B.. Counter braids: a novel counter architecture for per-flow measurement. In ACM SIGMETRICS, pages 121–132, 2008.Google Scholar

Lumbroso, J. O.. How Flajolet processed streams with coin flips. Technical Report 1805.00612, ArXiV, 2018.Google Scholar

Manjhi, A., Shkapenyuk, V., Dhamdhere, K., and Olston, C.. Finding (recently) frequent items in distributed data streams. In IEEE International Conference on Data Engineering, pages 767–778, 2005.Google Scholar

Manku, G. S., Rajagopalan, S., and Lindsay, B. G. Approximate medians and other quantiles in one pass and with limited memory. In ACM SIGMOD International Conference on Management of Data, pages 426–435, 1998.

Manku, G. S., Rajagopalan, S., and Lindsay, B. G. Random sampling techniques for space efficient online computation of order statistics of large datasets. In ACM SIGMOD International Conference on Management of Data, pages 251–262, 1999.

Matoušek, J. Tight upper bounds for the discrepancy of halfspaces. Discrete and Computational Geometry, 13:593–601, 1995.

McGregor, A., Vorotnikova, S., and Vu, H. T. Better algorithms for counting triangles in data streams. In ACM Symposium on Principles of Database Systems, pages 401–411, 2016.

McIlroy, D. Development of a spelling list. Technical report, Bell Labs, 1982.

Metwally, A., Agrawal, D., and Abbadi, A. E. An integrated efficient solution for computing frequent and top-k elements in data streams. ACM Transactions on Database Systems, 31(3):1095–1133, 2006.

Misra, J. and Gries, D. Finding repeated elements. Science of Computer Programming, 2:143–152, 1982.

Mitzenmacher, M. Bloom Filters, pages 252–255. Springer, 2009.

Mitzenmacher, M. and Upfal, E. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.

Mitzenmacher, M. and Varghese, G. Biff (Bloom filter) codes: fast error correction for large data sets. In IEEE International Symposium on Information Theory, pages 483–487, 2012.

Molloy, M. Cores in random hypergraphs and Boolean formulas. Random Structures and Algorithms, 27(1):124–135, 2005.

Monemizadeh, M. and Woodruff, D. P. 1-pass relative-error Lp-sampling with applications. In ACM-SIAM Symposium on Discrete Algorithms, pages 1143–1160, 2010.

Morris, R. Counting large numbers of events in small registers. Communications of the ACM, 21(10):840–842, 1978.

Moser, S. and Chen, P. N. A Student’s Guide to Coding and Information Theory. Cambridge University Press, 2012.

Motwani, R. and Raghavan, P. Randomized Algorithms. Cambridge University Press, 1995.

Mount, D. and Arya, S. ANN: library for approximate nearest neighbor searching. Technical report, University of Maryland, 2010.

Munro, J. I. and Paterson, M. S. Selection and sorting with limited storage. Theoretical Computer Science, 12:315–323, 1980.

Nelson, J. and Nguyen, H. L. OSNAP: faster numerical linear algebra algorithms via sparser subspace embeddings. In IEEE Conference on Foundations of Computer Science, pages 117–126, 2013.

Nelson, J. and Nguyen, H. L. Sparsity lower bounds for dimensionality reducing maps. In ACM Symposium on Theory of Computing, pages 101–110, 2013.

Nelson, J. and Woodruff, D. Fast Manhattan sketches in data streams. In ACM Symposium on Principles of Database Systems, pages 99–110, 2010.

O’Donnell, R., Wu, Y., and Zhou, Y. Optimal lower bounds for locality-sensitive hashing (except when q is tiny). ACM Transactions on Computation Theory, 6(1):5, 2014.

Pach, J. and Tardos, G. Tight lower bounds for the size of epsilon-nets. Journal of the American Mathematical Society, 26:645–658, 2013.

Pagh, R. Compressed matrix multiplication. In ITCS, pages 442–451, 2012.

Pavan, A., Tangwongsan, K., Tirthapura, S., and Wu, K. Counting and sampling triangles from a graph stream. PVLDB, 6(14):1870–1881, 2013.

Pavan, A. and Tirthapura, S. Range-efficient counting of distinct elements in a massive data stream. SIAM Journal on Computing, 37(2):359–379, 2007.

Pham, N. and Pagh, R. Fast and scalable polynomial kernels via explicit feature maps. In ACM SIGKDD, pages 239–247, 2013.

Pike, R., Dorward, S., Griesemer, R., and Quinlan, S. Interpreting the data: parallel analysis with Sawzall. Scientific Programming (special issue on Dynamic Grids and Worldwide Computing), 13(4):277–298, 2005.

Razborov, A. A. On the distributional complexity of disjointness. Theoretical Computer Science, 106(2):385–390, 1992.

Sarlós, T. Improved approximation algorithms for large matrices via random projections. In IEEE Conference on Foundations of Computer Science, pages 143–152, 2006.

Särndal, C.-E., Swensson, B., and Wretman, J. Model Assisted Survey Sampling. Springer, 1992.

Schechter, S. E., Herley, C., and Mitzenmacher, M. Popularity is everything: a new approach to protecting passwords from statistical-guessing attacks. In 5th USENIX Workshop on Hot Topics in Security, pages 1–8, 2010.

Schmidt, J. P., Siegel, A., and Srinivasan, A. Chernoff–Hoeffding bounds for applications with limited independence. In ACM-SIAM Symposium on Discrete Algorithms, pages 331–340, 1993.

Schweller, R., Li, Z., Chen, Y., Gao, Y., Gupta, A., Zhang, Y., Dinda, P. A., Kao, M.-Y., and Memik, G. Reversible sketches: enabling monitoring and analysis over high-speed data streams. IEEE/ACM Transactions on Networking, 15(5):1059–1072, 2007.

Shi, Q., Petterson, J., Dror, G., Langford, J., Smola, A. J., and Vishwanathan, S. V. N. Hash kernels for structured data. Journal of Machine Learning Research, 10:2615–2637, 2009.

Shrivastava, N., Buragohain, C., Agrawal, D., and Suri, S. Medians and beyond: new aggregation techniques for sensor networks. In ACM SenSys, pages 239–249, 2004.

Simpson, O., Seshadhri, C., and McGregor, A. Catching the head, tail, and everything in between: a streaming algorithm for the degree distribution. In IEEE International Conference on Data Mining, pages 979–984, 2015.

Srinivasan, A. Improving the discrepancy bound for sparse matrices: better approximations for sparse lattice approximation problems. In ACM-SIAM Symposium on Discrete Algorithms, pages 692–701, 1997.

Suri, S., Tóth, C. D., and Zhou, Y. Range counting over multidimensional data streams. Discrete and Computational Geometry, 36(4):633–655, 2006.

Szegedy, M. The DLT priority sampling is essentially optimal. In ACM Symposium on Theory of Computing, pages 150–158, 2006.

Szegedy, M. and Thorup, M. On the variance of subset sum estimation. In European Symposium on Algorithms, pages 75–86, 2007.

Talagrand, M. Sharper bounds for Gaussian and empirical processes. The Annals of Probability, 22(1):28–76, 1994.

Team, D. P. Learning with privacy at scale. Apple Machine Learning Journal, 1(8):1–25, December 2017.

Thorup, M. Even strongly universal hashing is pretty fast. In ACM-SIAM Symposium on Discrete Algorithms, pages 496–497, 2000.

Thorup, M. Equivalence between priority queues and sorting. Journal of the ACM, 54(6):1–27, 2007.

Thorup, M. and Zhang, Y. Tabulation based 4-universal hashing with applications to second moment estimation. In ACM-SIAM Symposium on Discrete Algorithms, pages 615–624, 2004.

Ting, D. Count-min: optimal estimation and tight error bounds using empirical error distributions. In ACM SIGKDD, pages 2319–2328, 2018.

Tirthapura, S. and Woodruff, D. P. Rectangle-efficient aggregation in spatial data streams. In ACM Symposium on Principles of Database Systems, pages 283–294, 2012.

Tirthapura, S. and Woodruff, D. P. A general method for estimating correlated aggregates over a data stream. Algorithmica, 73(2):235–260, 2015.

Tridgell, A. and Mackerras, P. The rsync algorithm. Technical Report TR-CS-96-05, Department of Computer Science, The Australian National University, 1996.

Tsang, I. W., Kwok, J. T., and Cheung, P.-M. Core vector machines: fast SVM training on very large data sets. Journal of Machine Learning Research, 6:363–392, 2005.

Vapnik, V. N. and Chervonenkis, A. Y. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 16:264–280, 1971.

Venkataraman, S., Song, D. X., Gibbons, P. B., and Blum, A. New streaming algorithms for fast detection of superspreaders. In Network and Distributed System Security Symposium, pages 149–166, 2005.

Vitter, J. S. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11(1):37–57, March 1985.

Wang, J., Liu, W., Kumar, S., and Chang, S.-F. Learning to hash for indexing big data: a survey. Proceedings of the IEEE, 104(1):34–57, 2016.

Wang, L., Luo, G., Yi, K., and Cormode, G. Quantiles over data streams: an experimental study. In ACM SIGMOD International Conference on Management of Data, pages 737–748, 2013.

Whang, K. Y., Vander-Zanden, B. T., and Taylor, H. M. A linear-time probabilistic counting algorithm for database applications. ACM Transactions on Database Systems, 15(2):208–229, 1990.

Woodruff, D. Optimal space lower bounds for all frequency moments. In ACM-SIAM Symposium on Discrete Algorithms, pages 167–175, 2004.

Woodruff, D. P. Low rank approximation lower bounds in row-update streams. In Advances in Neural Information Processing Systems, pages 1781–1789, 2014.

Woodruff, D. P. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10(1–2):1–157, October 2014.

Woodruff, D. P. and Zhang, Q. Tight bounds for distributed functional monitoring. In ACM Symposium on Theory of Computing, pages 941–960, 2012.

Woodruff, D. P. and Zhang, Q. Subspace embeddings and ℓp-regression using exponential random variables. In Conference on Learning Theory, pages 546–567, 2013.

Yu, H., Agarwal, P. K., Poreddy, R., and Varadarajan, K. R. Practical methods for shape fitting and kinetic data structures using coresets. Algorithmica, 52(3):378–402, 2008.

Zarrabi-Zadeh, H. An almost space-optimal streaming algorithm for coresets in fixed dimensions. Algorithmica, 60(1):46–59, 2011.

Zhang, Q., Pell, J., Canino-Koning, R., Howe, A. C., and Brown, C. T. These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure. PLoS ONE, 9(7):1–13, July 2014.

Zhang, Y., Singh, S., Sen, S., Duffield, N., and Lund, C. Online identification of hierarchical heavy hitters: algorithms, evaluation and applications. In Internet Measurement Conference, pages 101–114, 2004.

Zhao, Q., Ogihara, M., Wang, H., and Xu, J. Finding global icebergs over distributed data sets. In ACM Symposium on Principles of Database Systems, pages 298–307, 2006.

Zolotarev, V. M. One dimensional stable distributions, volume 65 of Translations of Mathematical Monographs. American Mathematical Society, 1983.
