
References

Published online by Cambridge University Press: 22 October 2020

Graham Cormode, University of Warwick
Ke Yi, Hong Kong University of Science and Technology

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2020


Achlioptas, D. Database-friendly random projections. In ACM Symposium on Principles of Database Systems, pages 274–281, 2001.
Agarwal, P. K., Cormode, G., Huang, Z., Phillips, J., Wei, Z., and Yi, K. Mergeable summaries. ACM Transactions on Database Systems, 38(4), 2013.
Agarwal, P. K., Har-Peled, S., and Varadarajan, K. R. Approximating extent measures of points. Journal of the ACM, 51:606–635, 2004.
Agarwal, P. K. and Sharathkumar, R. Streaming algorithms for extent problems in high dimensions. In ACM-SIAM Symposium on Discrete Algorithms, pages 1481–1489, 2010.
Agarwal, P. K. and Yu, H. A space-optimal data-stream algorithm for coresets in the plane. In Symposium on Computational Geometry, pages 1–10, 2007.
Aggarwal, C. C. On biased reservoir sampling in the presence of stream evolution. In International Conference on Very Large Data Bases, pages 607–618, 2006.
Ahn, K. J., Guha, S., and McGregor, A. Analyzing graph structure via linear measurements. In ACM-SIAM Symposium on Discrete Algorithms, pages 459–467, 2012.
Ailon, N. and Chazelle, B. Approximate nearest neighbors and the fast Johnson–Lindenstrauss transform. SIAM Journal on Computing, 39(1):302–322, 2009.
Alon, N., Gibbons, P., Matias, Y., and Szegedy, M. Tracking join and self-join sizes in limited storage. In ACM Symposium on Principles of Database Systems, pages 10–20, 1999.
Alon, N., Matias, Y., and Szegedy, M. The space complexity of approximating the frequency moments. In ACM Symposium on Theory of Computing, pages 20–29, 1996.
Alon, N., Matias, Y., and Szegedy, M. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58:137–147, 1999.
Anderson, D., Bevan, P., Lang, K., Liberty, E., Rhodes, L., and Thaler, J. A high-performance algorithm for identifying frequent items in data streams. In Internet Measurement Conference, pages 268–282, 2017.
Andoni, A. and Indyk, P. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In IEEE Conference on Foundations of Computer Science, pages 459–468, 2006.
Andoni, A., Indyk, P., and Razenshteyn, I. Approximate nearest neighbor search in high dimensions. https://arxiv.org/abs/1806.09823, 2018.
Andoni, A. and Nguyên, H. L. Width of points in the streaming model. ACM Transactions on Algorithms, 12(1):5:1–5:10, 2016.
Aronov, B., Ezra, E., and Sharir, M. Small-size ε-nets for axis-parallel rectangles and boxes. SIAM Journal on Computing, 39(7):3248–3282, 2010.
Arya, S., Mount, D., Netanyahu, N. S., Silverman, R., and Wu, A. Y. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. Journal of the ACM, 45(6):891–923, 1998.
Bansal, N. Constructive algorithms for discrepancy minimization. In IEEE Conference on Foundations of Computer Science, 2010.
Bar-Yossef, Z., Jayram, T., Kumar, R., Sivakumar, D., and Trevisan, L. Counting distinct elements in a data stream. In Proceedings of RANDOM 2002, pages 1–10, 2002.
Bar-Yossef, Z., Kumar, R., and Sivakumar, D. Reductions in streaming algorithms, with an application to counting triangles in graphs. In ACM-SIAM Symposium on Discrete Algorithms, pages 623–632, 2002.
Barkay, N., Porat, E., and Shalem, B. Feasible sampling of non-strict turnstile data streams. In Fundamentals of Computation Theory, pages 48–59, September 2013.
Ben-Basat, R., Einziger, G., Friedman, R., and Kassner, Y. Heavy hitters in streams and sliding windows. In IEEE INFOCOM, pages 1–9, 2016.
Ben-Basat, R., Einziger, G., Friedman, R., and Kassner, Y. Optimal elephant flow detection. In IEEE INFOCOM, pages 1–9, 2017.
Ben-Basat, R., Einziger, G., and Friedman, R. Fast flow volume estimation. In Proceedings of the International Conference on Distributed Computing and Networking, pages 44:1–44:10, 2018.
Bentley, J. L. and Saxe, J. B. Decomposable searching problems I: static-to-dynamic transformation. Journal of Algorithms, 1:301–358, 1980.
Berinde, R., Cormode, G., Indyk, P., and Strauss, M. Space-optimal heavy hitters with strong error bounds. In ACM Symposium on Principles of Database Systems, pages 157–166, 2009.
Beyer, K. S., Haas, P. J., Reinwald, B., Sismanis, Y., and Gemulla, R. On synopses for distinct-value estimation under multiset operations. In ACM SIGMOD International Conference on Management of Data, pages 199–210, 2007.
Bianchi, G., Duffy, K., Leith, D. J., and Shneer, V. Modeling conservative updates in multi-hash approximate count sketches. In 24th International Teletraffic Congress, pages 1–8, 2012.
Błasiok, J. Optimal streaming and tracking distinct elements with high probability. In ACM-SIAM Symposium on Discrete Algorithms, pages 2432–2448, 2018.
Bledaite, L. Count-min sketches in real data applications. https://skillsmatter.com/skillscasts/6844-count-min-sketch-in-real-data-applications, 2015.
Bloom, B. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, July 1970.
Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. Learnability and the Vapnik–Chervonenkis dimension. Journal of the ACM, 36:929–965, 1989.
Bollobás, B. Extremal Graph Theory. Academic Press, 1978.
Bose, P., Kranakis, E., Morin, P., and Tang, Y. Bounds for frequency estimation of packet streams. In SIROCCO, pages 33–42, 2003.
Boyer, B. and Moore, J. A fast majority vote algorithm. Technical Report ICSCA-CMP-32, Institute for Computer Science, University of Texas, Feb. 1981.
Braverman, V., Frahling, G., Lang, H., Sohler, C., and Yang, L. F. Clustering high dimensional dynamic data streams. In the 34th International Conference on Machine Learning, pages 576–585, 2017.
Braverman, V. and Ostrovsky, R. Smooth histograms for sliding windows. In IEEE Conference on Foundations of Computer Science, pages 283–293, 2007.
Braverman, V. and Ostrovsky, R. Zero-one frequency laws. In ACM Symposium on Theory of Computing, pages 281–290, 2010.
Braverman, V., Ostrovsky, R., and Vilenchik, D. How hard is counting triangles in the streaming model? In International Colloquium on Automata, Languages and Programming, pages 244–254, 2013.
Broder, A. Z. and Mitzenmacher, M. Network applications of Bloom filters: a survey. Internet Mathematics, 1(4):485–509, 2004.
Bădoiu, M. and Clarkson, K. L. Smaller core-sets for balls. In ACM-SIAM Symposium on Discrete Algorithms, pages 801–802, 2003.
Bădoiu, M. and Clarkson, K. L. Optimal core-sets for balls. Computational Geometry: Theory and Applications, 40(1):14–22, 2008.
Bădoiu, M., Har-Peled, S., and Indyk, P. Approximate clustering via core-sets. In ACM Symposium on Theory of Computing, pages 250–257, 2002.
Buriol, L. S., Frahling, G., Leonardi, S., Marchetti-Spaccamela, A., and Sohler, C. Counting triangles in data streams. In ACM Symposium on Principles of Database Systems, pages 253–262, 2006.
Cai, D., Mitzenmacher, M., and Adams, R. P. A Bayesian nonparametric view on count-min sketch. In Advances in Neural Information Processing Systems, pages 8782–8791, 2018.
Carter, J. L. and Wegman, M. N. Universal classes of hash functions. Journal of Computer and System Sciences, 18(2):143–154, 1979.
Chambers, J., Mallows, C., and Stuck, B. A method for simulating stable random variables. Journal of the American Statistical Association, 71(354):340–344, 1976.
Chan, T. M. Faster core-set constructions and data-stream algorithms in fixed dimensions. Computational Geometry: Theory and Applications, 35:20–35, 2006.
Chan, T. M. and Pathak, V. Streaming and dynamic algorithms for minimum enclosing balls in high dimensions. In International Symposium on Algorithms and Data Structures, pages 195–206, 2011.
Chandra, K. View counting at reddit. https://redditblog.com/2017/05/24/view-counting-at-reddit/, 2017.
Charikar, M., Chen, K., and Farach-Colton, M. Finding frequent items in data streams. In Proceedings of the International Colloquium on Automata, Languages and Programming, pages 693–703, 2002.
Charikar, M., O’Callaghan, L., and Panigrahy, R. Better streaming algorithms for clustering problems. In Proceedings of the 35th Annual ACM Symposium on Theory of Computing, pages 30–39, 2003.
Charikar, M. S. Similarity estimation techniques from rounding algorithms. In ACM Symposium on Theory of Computing, pages 380–388, 2002.
Chen, K. and Rao, S. An improved frequent items algorithm with applications to web caching. Technical Report UCB/CSD-05-1383, EECS Department, University of California, Berkeley, 2005.
Clarkson, K. L. and Woodruff, D. P. Numerical linear algebra in the streaming model. In ACM Symposium on Theory of Computing, pages 205–214, 2009.
Cohen, E. Size-estimation framework with applications to transitive closure and reachability. Journal of Computer and System Sciences, 55(3):441–453, 1997.
Cohen, E. All-distances sketches, revisited: HIP estimators for massive graphs analysis. IEEE Transactions on Knowledge and Data Engineering, 27(9):2320–2334, 2015.
Cohen, E., Duffield, N., Kaplan, H., Lund, C., and Thorup, M. Efficient stream sampling for variance-optimal estimation of subset sums. SIAM Journal on Computing, 40(5):1402–1431, 2011.
Cohen, E. and Strauss, M. Maintaining time-decaying stream aggregates. In ACM Symposium on Principles of Database Systems, pages 223–233, 2003.
Cohen, S. and Matias, Y. Spectral Bloom filters. In ACM SIGMOD International Conference on Management of Data, pages 241–252, 2003.
Considine, J., Hadjieleftheriou, M., Li, F., Byers, J. W., and Kollios, G. Robust approximate aggregation in sensor data management systems. ACM Transactions on Database Systems, 34(1):6:1–6:35, 2009.
Coppersmith, D. and Kumar, R. An improved data stream algorithm for frequency moments. In ACM-SIAM Symposium on Discrete Algorithms, pages 151–156, 2004.
Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Algorithms, 3rd edition. MIT Press, 2009.
Cormode, G., Datar, M., Indyk, P., and Muthukrishnan, S. Comparing data streams using Hamming norms. IEEE Transactions on Knowledge and Data Engineering, 15(3):529–541, 2003.
Cormode, G. and Firmani, D. On unifying the space of ℓ0-sampling algorithms. In Algorithm Engineering and Experiments, pages 163–172, 2013.
Cormode, G. and Garofalakis, M. Sketching streams through the net: distributed approximate query tracking. In International Conference on Very Large Data Bases, pages 13–24, 2005.
Cormode, G., Garofalakis, M., and Sacharidis, D. Fast approximate wavelet tracking on streams. In International Conference on Extending Database Technology, pages 4–22, 2006.
Cormode, G. and Hadjieleftheriou, M. Finding frequent items in data streams. In International Conference on Very Large Data Bases, pages 1530–1541, 2008.
Cormode, G. and Jowhari, H. A second look at counting triangles in graph streams (corrected). Theoretical Computer Science, 683:22–30, 2017.
Cormode, G. and Jowhari, H. Lp samplers and their applications: a survey. ACM Computing Surveys, pages 16:1–16:3, 2019.
Cormode, G., Korn, F., Muthukrishnan, S., and Srivastava, D. Space- and time-efficient deterministic algorithms for biased quantiles over data streams. In ACM Symposium on Principles of Database Systems, pages 263–272, 2006.
Cormode, G., Korn, F., and Tirthapura, S. Exponentially decayed aggregates on data streams. In IEEE International Conference on Data Engineering, pages 1379–1381, 2008.
Cormode, G. and Muthukrishnan, S. Improved data stream summary: the Count-Min sketch and its applications. Technical Report 2003-20, DIMACS, 2003.
Cormode, G. and Muthukrishnan, S. What’s hot and what’s not: tracking most frequent items dynamically. In ACM Symposium on Principles of Database Systems, pages 296–306, 2003.
Cormode, G. and Muthukrishnan, S. What’s new: finding significant differences in network data streams. In Proceedings of IEEE Infocom, pages 1534–1545, 2004.
Cormode, G. and Muthukrishnan, S. An improved data stream summary: the Count-Min sketch and its applications. Journal of Algorithms, 55(1):58–75, 2005.
Cormode, G. and Muthukrishnan, S. Space efficient mining of multigraph streams. In ACM Symposium on Principles of Database Systems, pages 271–282, 2005.
Cormode, G., Muthukrishnan, S., and Rozenbaum, I. Summarizing and mining inverse distributions on data streams via dynamic inverse sampling. In International Conference on Very Large Data Bases, pages 25–36, 2005.
Cormode, G., Tirthapura, S., and Xu, B. Time-decaying sketches for sensor data aggregation. In ACM Conference on Principles of Distributed Computing, pages 215–224, 2007.
Das, S., Antony, S., Agrawal, D., and Abbadi, A. E. CoTS: a scalable framework for parallelizing frequency counting over data streams. In IEEE International Conference on Data Engineering, pages 1323–1326, 2009.
Dasgupta, A., Lang, K. J., Rhodes, L., and Thaler, J. A framework for estimating stream expression cardinalities. In International Conference on Database Theory, pages 6:1–6:17, 2016.
Dasgupta, S. and Gupta, A. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms, 22(1):60–65, 2003.
Datar, M., Gionis, A., Indyk, P., and Motwani, R. Maintaining stream statistics over sliding windows. In ACM-SIAM Symposium on Discrete Algorithms, pages 635–644, 2002.
Datar, M., Immorlica, N., Indyk, P., and Mirrokni, V. S. Locality-sensitive hashing scheme based on p-stable distributions. In Symposium on Computational Geometry, pages 253–262, 2004.
Demaine, E., López-Ortiz, A., and Munro, J. I. Frequency estimation of internet packet streams with limited space. In European Symposium on Algorithms (ESA), pages 348–360, 2002.
Deng, F. and Rafiei, D. New estimation algorithms for streaming data: Count-Min can do more. Unpublished manuscript.
Dietzfelbinger, M., Goerdt, A., Mitzenmacher, M., Montanari, A., Pagh, R., and Rink, M. Tight thresholds for cuckoo hashing via XORSAT. In International Colloquium on Automata, Languages and Programming, pages 213–225, 2010.
Dobra, A. and Rusu, F. Sketches for size of join estimation. ACM Transactions on Database Systems, 33(3):15:1–15:46, 2008.
Donoho, D. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, April 2006.
Drineas, P., Magdon-Ismail, M., Mahoney, M. W., and Woodruff, D. P. Fast approximation of matrix coherence and statistical leverage. Journal of Machine Learning Research, 13:3475–3506, 2012.
Duffield, N., Lund, C., and Thorup, M. Estimating flow distributions from sampled flow statistics. In ACM SIGCOMM, pages 325–336, 2003.
Duffield, N., Lund, C., and Thorup, M. Priority sampling for estimation of arbitrary subset sums. Journal of the ACM, 54(6):32, 2007.
Durand, M. and Flajolet, P. Loglog counting of large cardinalities (extended abstract). In European Symposium on Algorithms, pages 605–617, 2003.
Durstenfeld, R. Algorithm 235: random permutation. Communications of the ACM, 7(7):420, 1964.
Eğecioğlu, O. and Kalantari, B. Approximating the diameter of a set of points in the Euclidean space. Information Processing Letters, 32:205–211, 1989.
Einziger, G. and Friedman, R. A formal analysis of conservative update based approximate counting. In International Conference on Computing, Networking and Communications, pages 255–259, 2015.
Elkin, M. Streaming and fully dynamic centralized algorithms for constructing and maintaining sparse spanners. ACM Transactions on Algorithms, 7(2):20, 2011.
Eppstein, D. and Goodrich, M. T. Straggler identification in round-trip data streams via Newton’s identities and invertible Bloom filters. IEEE Transactions on Knowledge and Data Engineering, 23(2):297–306, 2011.
Erlingsson, Ú., Pihur, V., and Korolova, A. RAPPOR: randomized aggregatable privacy-preserving ordinal response. In Computer and Communications Security, pages 1054–1067, 2014.
Estan, C. and Varghese, G. New directions in traffic measurement and accounting. In ACM SIGCOMM, volume 32(4) of Computer Communication Review, pages 323–338, 2002.
Fan, L., Cao, P., Almeida, J., and Broder, A. Summary cache: a scalable wide-area web cache sharing protocol. In ACM SIGCOMM, pages 254–265, 1998.
Feigenbaum, J., Kannan, S., McGregor, A., Suri, S., and Zhang, J. Graph distances in the streaming model: the value of space. In ACM-SIAM Symposium on Discrete Algorithms, pages 745–754, 2005.
Felber, D. and Ostrovsky, R. A randomized online quantile summary in O(1/ϵ log(1/ϵ)) words. In APPROX-RANDOM, pages 775–785, 2015.
Flajolet, P. Approximate counting: a detailed analysis. BIT, 25:113–134, 1985.
Flajolet, P., Fusy, E., Gandouet, O., and Meunier, F. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. In Analysis of Algorithms, pages 127–146, 2007.
Flajolet, P. and Martin, G. N. Probabilistic counting. In IEEE Conference on Foundations of Computer Science, pages 76–82, 1983.
Flajolet, P. and Martin, G. N. Probabilistic counting algorithms for database applications. Journal of Computer and System Sciences, 31:182–209, 1985.
Frahling, G., Indyk, P., and Sohler, C. Sampling in dynamic data streams and applications. In Symposium on Computational Geometry, pages 142–149, June 2005.
Ganguly, S. Counting distinct items over update streams. In International Symposium on Algorithms and Computation, pages 505–514, 2005.
Ghashami, M., Liberty, E., Phillips, J. M., and Woodruff, D. P. Frequent directions: simple and deterministic matrix sketching. SIAM Journal on Computing, 45(5):1762–1792, 2016.
Ghashami, M. and Phillips, J. M. Relative errors for deterministic low-rank matrix approximations. In ACM-SIAM Symposium on Discrete Algorithms, pages 707–717, 2014.
Giannopoulos, P., Knauer, C., Wahlström, M., and Werner, D. Hardness of discrepancy computation and ε-net verification in high dimension. Journal of Complexity, 28(2):162–176, 2012.
Gibbons, P. and Tirthapura, S. Estimating simple functions on the union of data streams. In ACM Symposium on Parallel Algorithms and Architectures, pages 281–290, 2001.
Gilbert, A., Guha, S., Indyk, P., Kotidis, Y., Muthukrishnan, S., and Strauss, M. Fast, small-space algorithms for approximate histogram maintenance. In ACM Symposium on Theory of Computing, pages 389–398, 2002.
Gilbert, A., Guha, S., Indyk, P., Muthukrishnan, S., and Strauss, M. Near-optimal sparse Fourier representation via sampling. In ACM Symposium on Theory of Computing, pages 152–161, 2002.
Gilbert, A., Kotidis, Y., Muthukrishnan, S., and Strauss, M. Surfing wavelets on streams: one-pass summaries for approximate aggregate queries. IEEE Transactions on Knowledge and Data Engineering, 15(3):541–554, 2003.
Gilbert, A. C. and Indyk, P. Sparse recovery using sparse matrices. Proceedings of the IEEE, 98(6):937–947, 2010.
Gilbert, A. C., Kotidis, Y., Muthukrishnan, S., and Strauss, M. J. How to summarize the universe: dynamic maintenance of quantiles. In International Conference on Very Large Data Bases, pages 454–465, 2002.
Gionis, A., Indyk, P., and Motwani, R. Similarity search in high dimensions via hashing. In International Conference on Very Large Data Bases, pages 518–529, 1999.
Goel, A., Indyk, P., and Varadarajan, K. Reductions among high dimensional proximity problems. In ACM-SIAM Symposium on Discrete Algorithms, pages 769–778, 2001.
Golab, L. and Özsu, M. T. Issues in data stream management. SIGMOD Record, 32(2):5–14, June 2003.
Goodrich, M. T. and Mitzenmacher, M. Invertible Bloom lookup tables. In Annual Allerton Conference on Communication, Control, and Computing, pages 792–799, 2011.
Greenwald, M. and Khanna, S. Space-efficient online computation of quantile summaries. In ACM SIGMOD International Conference on Management of Data, pages 58–66, 2001.
Greenwald, M. and Khanna, S. Power-conserving computation of order-statistics over sensor networks. In ACM Symposium on Principles of Database Systems, pages 275–285, 2004.
Gronemeier, A. and Sauerhoff, M. Applying approximate counting for computing the frequency moments of long data streams. Theory of Computing Systems, 44(3):332–348, 2009.
Guha, S. Tight results for clustering and summarizing data streams. In International Conference on Database Theory, pages 268–275, 2009.
Guha, S., Meyerson, A., Mishra, N., Motwani, R., and O’Callaghan, L. Clustering data streams: theory and practice. IEEE Transactions on Knowledge and Data Engineering, 15(3):515–528, 2003.
Hall, A., Bachmann, O., Büssow, R., Ganceanu, S., and Nunkesser, M. Processing a trillion cells per mouse click. PVLDB, 5(11):1436–1446, 2012.
Har-Peled, S., Indyk, P., and Motwani, R. Approximate nearest neighbor: towards removing the curse of dimensionality. Theory of Computing, 8:321–350, 2012.
Hassanieh, H., Indyk, P., Katabi, D., and Price, E. Simple and practical algorithm for sparse Fourier transform. In ACM-SIAM Symposium on Discrete Algorithms, pages 1183–1194, 2012.
Haussler, D. and Welzl, E. Epsilon-nets and simplex range queries. Discrete and Computational Geometry, 2:127–151, 1987.
Heule, S., Nunkesser, M., and Hall, A. HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm. In International Conference on Extending Database Technology, pages 683–692, 2013.
Huang, Z., Wang, L., Yi, K., and Liu, Y. Sampling based algorithms for quantile computation in sensor networks. In ACM SIGMOD International Conference on Management of Data, pages 745–756, 2011.
Huang, Z. and Yi, K. The communication complexity of distributed epsilon-approximations. In IEEE Conference on Foundations of Computer Science, pages 591–600, 2014.
Huang, Z., Yi, K., Liu, Y., and Chen, G. Optimal sampling algorithms for frequency estimation in distributed data. In IEEE INFOCOM, pages 1997–2005, 2011.
Hung, R. Y. S. and Ting, H. F. An Ω(1/ϵ log 1/ϵ) space lower bound for finding ϵ-approximate quantiles in a data stream. In Proceedings of the 4th International Conference on Frontiers in Algorithmics, pages 89–100, 2010.
Indyk, P. A small approximately min-wise independent family of hash functions. Journal of Algorithms, 38(1):84–90, 2001.
Indyk, P. Stable distributions, pseudorandom generators, embeddings and data stream computation. Journal of the ACM, 53(3):307–323, 2006.
Indyk, P., Matoušek, J., and Sidiropoulos, A. Low-distortion embeddings of finite metric spaces. In Toth, C. D., O’Rourke, J., and Goodman, J. E., eds., Handbook of Discrete and Computational Geometry, 3rd edition, pages 211–231. CRC Press, 2017.
Indyk, P. and Motwani, R. Approximate nearest neighbors: towards removing the curse of dimensionality. In ACM Symposium on Theory of Computing, pages 604–613, 1998.
Ivkin, N., Liberty, E., Lang, K., Karnin, Z., and Braverman, V. Streaming quantiles algorithms with small space and update time. arXiv CoRR abs/1907.00236, 2019.
Jayaram, R. and Woodruff, D. P. Perfect Lp sampling in a data stream. In IEEE Conference on Foundations of Computer Science, pages 544–555, 2018.
Jayram, T. S. Information complexity: a tutorial. In ACM Symposium on Principles of Database Systems, pages 159–168, 2010.
Jayram, T. S., Kumar, R., and Sivakumar, D. The one-way communication complexity of gap Hamming distance. www.madalgo.au.dk/img/SumSchoo2007 Lecture20slides/Bibliography/p14 Jayram 07 Manusc ghd.pdf, 2007.
Jayram, T. S. and Woodruff, D. P. The data stream space complexity of cascaded norms. In IEEE Conference on Foundations of Computer Science, pages 765–774, 2009.
Jayram, T. S. and Woodruff, D. P. Optimal bounds for Johnson–Lindenstrauss transforms and streaming problems with low error. In ACM-SIAM Symposium on Discrete Algorithms, pages 1–10, 2011.
Jin, C., Qian, W., Sha, C., Yu, J. X., and Zhou, A. Dynamically maintaining frequent items over a data stream. In CIKM, pages 287–294, 2003.
Johnson, W. and Lindenstrauss, J. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189–206, 1984.
Jowhari, H. and Ghodsi, M. New streaming algorithms for counting triangles in graphs. In International Conference on Computing and Combinatorics, pages 710–716, 2005.
Jowhari, H., Saglam, M., and Tardos, G. Tight bounds for Lp samplers, finding duplicates in streams, and related problems. In ACM Symposium on Principles of Database Systems, pages 49–58, 2011.
Kalyanasundaram, B. and Schnitger, G. The probabilistic communication complexity of set intersection. SIAM Journal on Discrete Mathematics, 5(4):545–557, 1992.
Kane, D. M. and Nelson, J. Sparser Johnson–Lindenstrauss transforms. In ACM-SIAM Symposium on Discrete Algorithms, pages 1195–1206, 2012.
Kane, D. M., Nelson, J., and Woodruff, D. P. An optimal algorithm for the distinct elements problem. In ACM Symposium on Principles of Database Systems, pages 41–52, 2010.
Kapron, B. M., King, V., and Mountjoy, B. Dynamic graph connectivity in poly-logarithmic worst case time. In ACM-SIAM Symposium on Discrete Algorithms, pages 1131–1142, 2013.
Karnin, Z., Lang, K., and Liberty, E. Optimal quantile approximation in streams. In IEEE Conference on Foundations of Computer Science, pages 41–52, 2016.
Karp, R., Papadimitriou, C., and Shenker, S. A simple algorithm for finding frequent elements in sets and bags. ACM Transactions on Database Systems, 28:51–55, 2003.
Karp, R. M. and Rabin, M. O. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249–260, 1987.
Kirsch, A. and Mitzenmacher, M. Less hashing, same performance: building a better Bloom filter. In European Symposium on Algorithms (ESA), pages 456–467, 2006.
Knuth, D. E. The Art of Computer Programming, Vol. 1, Fundamental Algorithms. Addison-Wesley, 2nd edition, 1998.
Knuth, D. E.. The Art of Computer Programming, Vol. 2, Seminumerical Algorithms. Addison-Wesley, 2nd edition, 1998.Google Scholar
Kollios, G., Byers, J., Considine, J., Hadjieleftheriou, M., and Li, F.. Robust aggregation in sensor networks. IEEE Data Engineering Bulletin, 28(1), March 2005.Google Scholar
Komlós, J., Pach, J., and Woeginger, G.. Almost tight bounds for ε-nets. Discrete and Computational Geometry, 7:163173, 1992.Google Scholar
Kumar, P., Mitchell, J. S. B., and Yildirim, E. A.. Approximate minimum enclosing balls in high dimensions using core-sets. ACM Journal of Experimental Algorithmics, 8, 2003.Google Scholar
Kushilevitz, E. and Nisan, N.. Communication Complexity. Cambridge University Press, 1997.Google Scholar
Kushilevitz, E., Ostrovsky, R., and Rabani, Y.. Efficient search for approximate nearest neighbor in high dimensional spaces. In ACM Symposium on Theory of Computing, pages 614–623, 1998.Google Scholar
Lang, K. J.. Back to the future: an even more nearly optimal cardinality estimation algorithm. Technical report, ArXiV, 2017.Google Scholar
Larsen, K. G., Nelson, J., Nguyen, H. L., and Thorup, M.. Heavy hitters via cluster-preserving clustering. In IEEE Conference on Foundations of Computer Science, pages 61–70, 2016.Google Scholar
Lee, G. M., Liu, H., Yoon, Y., and Zhang, Y.. Improving sketch reconstruction accuracy using linear least squares method. In Internet Measurement Conference, pages 273–278, 2005.Google Scholar
Lee, L. and Ting, H.. A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In ACM Symposium on Principles of Database Systems, pages 290–297, 2006.Google Scholar
Li, P.. Very sparse stable random projections, estimators and tail bounds for stable random projections. Technical Report cs.DS/0611114, ArXiV, 2006.Google Scholar
Li, Y., Long, P., and Srinivasan, A.. Improved bounds on the sample complexity of learning. Journal of Computer and System Sciences, 62(3):516527, 2001.Google Scholar
Liberty, E.. Simple and deterministic matrix sketching. In ACM SIGKDD, pages 581–588, 2013.Google Scholar
Lipton, R. J.. Fingerprinting sets. Technical Report CS-TR-212-89, Princeton, 1989.Google Scholar
Lu, Y., Montanari, A., Dharmapurikar, S., Kabbani, A., and Prabhakar, B.. Counter braids: a novel counter architecture for per-flow measurement. In ACM SIGMETRICS, pages 121–132, 2008.Google Scholar
Lumbroso, J. O.. How Flajolet processed streams with coin flips. Technical Report 1805.00612, ArXiV, 2018.Google Scholar
Manjhi, A., Shkapenyuk, V., Dhamdhere, K., and Olston, C.. Finding (recently) frequent items in distributed data streams. In IEEE International Conference on Data Engineering, pages 767–778, 2005.Google Scholar
Manku, G. S., Rajagopalan, S., and Lindsay, B. G.. Approximate medians and other quantiles in one pass and with limited memory. In ACM SIGMOD International Conference on Management of Data, pages 426–435, 1998.Google Scholar
Manku, G. S., Rajagopalan, S., and Lindsay, B. G.. Random sampling techniques for space efficient online computation of order statistics of large datasets. In ACM SIGMOD International Conference on Management of Data, pages 251–262, 1999.Google Scholar
Matoušek, J. Tight upper bounds for the discrepancy of halfspaces. Discrete and Computational Geometry, 13:593–601, 1995.
McGregor, A., Vorotnikova, S., and Vu, H. T. Better algorithms for counting triangles in data streams. In ACM Symposium on Principles of Database Systems, pages 401–411, 2016.
McIlroy, D. Development of a spelling list. Technical report, Bell Labs, 1982.
Metwally, A., Agrawal, D., and Abbadi, A. E. An integrated efficient solution for computing frequent and top-k elements in data streams. ACM Transactions on Database Systems, 31(3):1095–1133, 2006.
Misra, J. and Gries, D. Finding repeated elements. Science of Computer Programming, 2:143–152, 1982.
Mitzenmacher, M. Bloom Filters, pages 252–255. Springer, 2009.
Mitzenmacher, M. and Upfal, E. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
Mitzenmacher, M. and Varghese, G. Biff (Bloom filter) codes: fast error correction for large data sets. In IEEE International Symposium on Information Theory, pages 483–487, 2012.
Molloy, M. Cores in random hypergraphs and Boolean formulas. Random Structures and Algorithms, 27(1):124–135, 2005.
Monemizadeh, M. and Woodruff, D. P. 1-pass relative-error Lp-sampling with applications. In ACM-SIAM Symposium on Discrete Algorithms, pages 1143–1160, 2010.
Morris, R. Counting large numbers of events in small registers. Communications of the ACM, 21(10):840–842, 1978.
Moser, S. and Chen, P. N. A Student’s Guide to Coding and Information Theory. Cambridge University Press, 2012.
Motwani, R. and Raghavan, P. Randomized Algorithms. Cambridge University Press, 1995.
Mount, D. and Arya, S. ANN: library for approximate nearest neighbor searching. Technical report, University of Maryland, 2010.
Munro, J. I. and Paterson, M. S. Selection and sorting with limited storage. Theoretical Computer Science, 12:315–323, 1980.
Nelson, J. and Nguyen, H. L. OSNAP: faster numerical linear algebra algorithms via sparser subspace embeddings. In IEEE Conference on Foundations of Computer Science, pages 117–126, 2013.
Nelson, J. and Nguyen, H. L. Sparsity lower bounds for dimensionality reducing maps. In ACM Symposium on Theory of Computing, pages 101–110, 2013.
Nelson, J. and Woodruff, D. Fast Manhattan sketches in data streams. In ACM Symposium on Principles of Database Systems, pages 99–110, 2010.
O’Donnell, R., Wu, Y., and Zhou, Y. Optimal lower bounds for locality-sensitive hashing (except when q is tiny). ACM Transactions on Computation Theory, 6(1):5, 2014.
Pach, J. and Tardos, G. Tight lower bounds for the size of epsilon-nets. Journal of the American Mathematical Society, 26:645–658, 2013.
Pagh, R. Compressed matrix multiplication. In ITCS, pages 442–451, 2012.
Pavan, A., Tangwongsan, K., Tirthapura, S., and Wu, K. Counting and sampling triangles from a graph stream. PVLDB, 6(14):1870–1881, 2013.
Pavan, A. and Tirthapura, S. Range-efficient counting of distinct elements in a massive data stream. SIAM Journal on Computing, 37(2):359–379, 2007.
Pham, N. and Pagh, R. Fast and scalable polynomial kernels via explicit feature maps. In ACM SIGKDD, pages 239–247, 2013.
Pike, R., Dorward, S., Griesemer, R., and Quinlan, S. Interpreting the data: parallel analysis with Sawzall. Scientific Programming (special issue on Dynamic Grids and Worldwide Computing), 13(4):277–298, 2005.
Razborov, A. A. On the distributional complexity of disjointness. Theoretical Computer Science, 106(2):385–390, 1992.
Sarlós, T. Improved approximation algorithms for large matrices via random projections. In IEEE Conference on Foundations of Computer Science, pages 143–152, 2006.
Särndal, C.-E., Swensson, B., and Wretman, J. Model Assisted Survey Sampling. Springer, 1992.
Schechter, S. E., Herley, C., and Mitzenmacher, M. Popularity is everything: a new approach to protecting passwords from statistical-guessing attacks. In 5th USENIX Workshop on Hot Topics in Security, pages 1–8, 2010.
Schmidt, J. P., Siegel, A., and Srinivasan, A. Chernoff–Hoeffding bounds for applications with limited independence. In ACM-SIAM Symposium on Discrete Algorithms, pages 331–340, 1993.
Schweller, R., Li, Z., Chen, Y., Gao, Y., Gupta, A., Zhang, Y., Dinda, P. A., Kao, M.-Y., and Memik, G. Reversible sketches: enabling monitoring and analysis over high-speed data streams. IEEE/ACM Transactions on Networking, 15(5):1059–1072, 2007.
Shi, Q., Petterson, J., Dror, G., Langford, J., Smola, A. J., and Vishwanathan, S. V. N. Hash kernels for structured data. Journal of Machine Learning Research, 10:2615–2637, 2009.
Shrivastava, N., Buragohain, C., Agrawal, D., and Suri, S. Medians and beyond: new aggregation techniques for sensor networks. In ACM SenSys, pages 239–249, 2004.
Simpson, O., Seshadhri, C., and McGregor, A. Catching the head, tail, and everything in between: a streaming algorithm for the degree distribution. In IEEE International Conference on Data Mining, pages 979–984, 2015.
Srinivasan, A. Improving the discrepancy bound for sparse matrices: better approximations for sparse lattice approximation problems. In ACM-SIAM Symposium on Discrete Algorithms, pages 692–701, 1997.
Suri, S., Tóth, C. D., and Zhou, Y. Range counting over multidimensional data streams. Discrete and Computational Geometry, 36(4):633–655, 2006.
Szegedy, M. The DLT priority sampling is essentially optimal. In ACM Symposium on Theory of Computing, pages 150–158, 2006.
Szegedy, M. and Thorup, M. On the variance of subset sum estimation. In European Symposium on Algorithms, pages 75–86, 2007.
Talagrand, M. Sharper bounds for Gaussian and empirical processes. The Annals of Probability, 22(1):28–76, 1994.
Team, D. P. Learning with privacy at scale. Apple Machine Learning Journal, 1(8):1–25, December 2017.
Thorup, M. Even strongly universal hashing is pretty fast. In ACM-SIAM Symposium on Discrete Algorithms, pages 496–497, 2000.
Thorup, M. Equivalence between priority queues and sorting. Journal of the ACM, 54(6):1–27, 2007.
Thorup, M. and Zhang, Y. Tabulation based 4-universal hashing with applications to second moment estimation. In ACM-SIAM Symposium on Discrete Algorithms, pages 615–624, 2004.
Ting, D. Count-min: optimal estimation and tight error bounds using empirical error distributions. In ACM SIGKDD, pages 2319–2328, 2018.
Tirthapura, S. and Woodruff, D. P. Rectangle-efficient aggregation in spatial data streams. In ACM Symposium on Principles of Database Systems, pages 283–294, 2012.
Tirthapura, S. and Woodruff, D. P. A general method for estimating correlated aggregates over a data stream. Algorithmica, 73(2):235–260, 2015.
Tridgell, A. and Mackerras, P. The rsync algorithm. Technical Report TR-CS-96-05, Department of Computer Science, The Australian National University, 1996.
Tsang, I. W., Kwok, J. T., and Cheung, P.-M. Core vector machines: fast SVM training on very large data sets. Journal of Machine Learning Research, 6:363–392, 2005.
Vapnik, V. N. and Chervonenkis, A. Y. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 16:264–280, 1971.
Venkataraman, S., Song, D. X., Gibbons, P. B., and Blum, A. New streaming algorithms for fast detection of superspreaders. In Network and Distributed System Security Symposium, pages 149–166, 2005.
Vitter, J. S. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11(1):37–57, March 1985.
Wang, J., Liu, W., Kumar, S., and Chang, S.-F. Learning to hash for indexing big data: a survey. Proceedings of the IEEE, 104(1):34–57, 2016.
Wang, L., Luo, G., Yi, K., and Cormode, G. Quantiles over data streams: an experimental study. In ACM SIGMOD International Conference on Management of Data, pages 737–748, 2013.
Whang, K. Y., Vander-Zanden, B. T., and Taylor, H. M. A linear-time probabilistic counting algorithm for database applications. ACM Transactions on Database Systems, 15(2):208, 1990.
Woodruff, D. Optimal space lower bounds for all frequency moments. In ACM-SIAM Symposium on Discrete Algorithms, pages 167–175, 2004.
Woodruff, D. P. Low rank approximation lower bounds in row-update streams. In Advances in Neural Information Processing Systems, pages 1781–1789, 2014.
Woodruff, D. P. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10(1–2):1–157, October 2014.
Woodruff, D. P. and Zhang, Q. Tight bounds for distributed functional monitoring. In ACM Symposium on Theory of Computing, pages 941–960, 2012.
Woodruff, D. P. and Zhang, Q. Subspace embeddings and ℓp-regression using exponential random variables. In Conference on Learning Theory, pages 546–567, 2013.
Yu, H., Agarwal, P. K., Poreddy, R., and Varadarajan, K. R. Practical methods for shape fitting and kinetic data structures using coresets. Algorithmica, 52(3):378–402, 2008.
Zarrabi-Zadeh, H. An almost space-optimal streaming algorithm for coresets in fixed dimensions. Algorithmica, 60(1):46–59, 2011.
Zhang, Q., Pell, J., Canino-Koning, R., Howe, A. C., and Brown, C. T. These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure. PLoS ONE, 9(7):1–13, July 2014.
Zhang, Y., Singh, S., Sen, S., Duffield, N., and Lund, C. Online identification of hierarchical heavy hitters: algorithms, evaluation and applications. In Internet Measurement Conference, pages 101–114, 2004.
Zhao, Q., Ogihara, M., Wang, H., and Xu, J. Finding global icebergs over distributed data sets. In ACM Symposium on Principles of Database Systems, pages 298–307, 2006.
Zolotarev, V. M. One dimensional stable distributions, volume 65 of Translations of Mathematical Monographs. American Mathematical Society, 1983.

  • References
  • Graham Cormode, University of Warwick, Ke Yi, Hong Kong University of Science and Technology
  • Book: Small Summaries for Big Data
  • Online publication: 22 October 2020