
15 - Parallel Graph-Based Semi-Supervised Learning

from Part Three - Alternative Learning Settings

Published online by Cambridge University Press: 05 February 2012

Jeff Bilmes (University of Washington)
Amarnag Subramanya (Google Research, Mountain View, CA, USA)
Ron Bekkerman (LinkedIn Corporation, Mountain View, California)
Mikhail Bilenko (Microsoft Research, Redmond, Washington)
John Langford (Yahoo! Research, New York)

Summary

Semi-supervised learning (SSL) is the process of training decision functions using small amounts of labeled data together with relatively large amounts of unlabeled data. In many applications, annotating training data is time consuming and error prone. Speech recognition is a typical example: producing an accurate system requires large amounts of meticulously annotated speech data (Evermann et al., 2005). In the case of document classification for internet search, it is not even feasible to accurately annotate a relatively large number of web pages for all categories of potential interest. SSL is therefore a useful technique in many machine learning applications, because only relatively small amounts of the available data need be annotated. SSL is related to the problem of transductive learning (Vapnik, 1998). In general, a learner is transductive if it is designed for prediction only on a closed dataset, where the test set is revealed at training time. In practice, however, transductive learners can be modified to handle unseen data (Sindhwani, Niyogi, and Belkin, 2005; Zhu, 2005a). Chapter 25 of Chapelle, Scholkopf, and Zien (2007) gives a full discussion of the relationship between SSL and transductive learning. In this chapter, SSL refers to the semi-supervised transductive classification problem.

Let x ∈ X denote the input to the decision function (classifier) f, and let y ∈ Y denote its output label; that is, f : X → Y. In most cases f(x) = argmax_{y ∈ Y} p(y | x).
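This maximum-a-posteriori decision rule can be sketched in a few lines. The posterior table below is invented purely for illustration (it does not come from any model in the chapter):

```python
import numpy as np

# Hypothetical posteriors p(y|x) for a 3-class problem over 4 inputs;
# each row sums to 1 and is made up solely for this sketch.
posteriors = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
    [0.4, 0.4, 0.2],
])

def decide(p_y_given_x):
    """Decision rule f(x) = argmax_y p(y|x); ties go to the lowest index."""
    return int(np.argmax(p_y_given_x))

labels = [decide(row) for row in posteriors]
print(labels)  # -> [0, 1, 2, 0]
```

Note that for the last row the two leading classes tie at 0.4, and `np.argmax` breaks the tie by returning the first maximal index.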

Type: Chapter
Book: Scaling Up Machine Learning: Parallel and Distributed Approaches, pp. 307–330
Publisher: Cambridge University Press
Print publication year: 2011


References

Alexandrescu, A., and Kirchhoff, K. 2007. Graph-Based Learning for Statistical Machine Translation. In: Proceedings of the Human Language Technologies Conference (HLT-NAACL).
Arya, S., and Mount, D. M. 1993. Approximate Nearest Neighbor Queries in Fixed Dimensions. In: ACM-SIAM Symposium on Discrete Algorithms (SODA).
Arya, S., Mount, D. M., Netanyahu, N. S., Silverman, R., and Wu, A. 1998. An Optimal Algorithm for Approximate Nearest Neighbor Searching. Journal of the ACM.
Balcan, M.-F., and Blum, A. 2005. A PAC-Style Model for Learning from Labeled and Unlabeled Data. Pages 111–126 of: COLT.
Baluja, S., Seth, R., Sivakumar, D., Jing, Y., Yagnik, J., Kumar, S., Ravichandran, D., and Aly, M. 2008. Video Suggestion and Discovery for YouTube: Taking Random Walks through the View Graph. Pages 895–904 of: Proceedings of the 17th International Conference on World Wide Web. ACM.
Belkin, M., Niyogi, P., and Sindhwani, V. 2005. On Manifold Regularization. In: Proceedings of the Conference on Artificial Intelligence and Statistics (AISTATS).
Bengio, Y., Delalleau, O., and Roux, N. L. 2007. Label Propagation and Quadratic Criterion. In: Semi-Supervised Learning. Cambridge, MA: MIT Press.
Bertsekas, D. 1999. Nonlinear Programming. Athena Scientific.
Bie, T. D., and Cristianini, N. 2003. Convex Methods for Transduction. Pages 73–80 of: Advances in Neural Information Processing Systems 16. Cambridge, MA: MIT Press.
Bilmes, J. A. 1998. A Gentle Tutorial on the EM Algorithm and Its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical Report ICSI-TR-97-021. University of California, Berkeley.
Bishop, C. (ed). 1995. Neural Networks for Pattern Recognition. New York: Oxford University Press.
Blitzer, J., and Zhu, J. 2008. ACL 2008 Tutorial on Semi-supervised Learning. http://ssl-acl08.wikidot.com/.
Blum, A., and Chawla, S. 2001. Learning from Labeled and Unlabeled Data Using Graph Mincuts. Pages 19–26 of: Proceedings of the 18th International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann.
Chapelle, O., Scholkopf, B., and Zien, A. 2007. Semi-Supervised Learning. Cambridge, MA: MIT Press.
Collobert, R., Sinz, F., Weston, J., Bottou, L., and Joachims, T. 2006. Large Scale Transductive SVMs. Journal of Machine Learning Research.
Corduneanu, A., and Jaakkola, T. 2003. On Information Regularization. In: Uncertainty in Artificial Intelligence.
Delalleau, O., Bengio, Y., and Roux, N. L. 2005. Efficient Non-parametric Function Induction in Semi-Supervised Learning. In: Proceedings of the Conference on Artificial Intelligence and Statistics (AISTATS).
Dempster, A. P., Laird, N. M., Rubin, D. B., et al. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1), 1–38.
Deshmukh, N., Ganapathiraju, A., Gleeson, A., Hamaker, J., and Picone, J. 1998 (November). Resegmentation of Switchboard. Pages 1543–1546 of: Proceedings of the International Conference on Spoken Language Processing.
Evermann, G., Chan, H. Y., Gales, M. J. F., Jia, B., Mrva, D., Woodland, P. C., and Yu, K. 2005. Training LVCSR Systems on Thousands of Hours of Data. In: Proceedings of ICASSP.
Frey, B. J., and Dueck, D. 2007. Clustering by Passing Messages between Data Points. Science, 315(5814), 972.
Friedman, J. H., Bentley, J. L., and Finkel, R. A. 1977. An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Transactions on Mathematical Software, 3.
Garcke, J., and Griebel, M. 2005. Semi-supervised Learning with Sparse Grids. In: Proceedings of the 22nd ICML Workshop on Learning with Partially Classified Training Data.
Godfrey, J., Holliman, E., and McDaniel, J. 1992 (March). SWITCHBOARD: Telephone Speech Corpus for Research and Development. Pages 517–520 of: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1.
Goldman, S., and Zhou, Y. 2000. Enhancing Supervised Learning with Unlabeled Data. Pages 327–334 of: Proceedings of the 17th International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann.
Greenberg, S. 1995. The Switchboard Transcription Project. Technical Report, The Johns Hopkins University (CLSP) Summer Research Workshop.
Greenberg, S., Hollenback, J., and Ellis, D. 1996. Insights into Spoken Language Gleaned from Phonetic Transcription of the Switchboard Corpus. Pages 24–27 of: ICSLP.
Haffari, G. R., and Sarkar, A. 2007. Analysis of Semi-supervised Learning with the Yarowsky Algorithm. In: UAI.
Hosmer, D. W. 1973. A Comparison of Iterative Maximum Likelihood Estimates of the Parameters of a Mixture of Two Normal Distributions under Three Different Types of Sample. Biometrics.
Huang, X., Acero, A., and Hon, H. 2001. Spoken Language Processing. Englewood Cliffs, NJ: Prentice-Hall.
Jebara, T., Wang, J., and Chang, S. F. 2009. Graph Construction and b-Matching for Semi-supervised Learning. In: International Conference on Machine Learning.
Joachims, T. 2003. Transductive Learning via Spectral Graph Partitioning. In: Proceedings of the International Conference on Machine Learning (ICML).
Karlen, M., Weston, J., Erkan, A., and Collobert, R. 2008. Large Scale Manifold Transduction. In: International Conference on Machine Learning, ICML.
Lawrence, N. D., and Jordan, M. I. 2005. Semi-supervised Learning via Gaussian Processes. In: Neural Information Processing Systems.
Malkin, J., Subramanya, A., and Bilmes, J. A. 2009 (September). On the Semi-Supervised Learning of Multi-Layered Perceptrons. In: Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH).
McLachlan, G. J., and Ganesalingam, S. 1982. Updating a Discriminant Function on the Basis of Unclassified Data. Communications in Statistics: Simulation and Computation.
Nadler, B., Srebro, N., and Zhou, X. 2010. Statistical Analysis of Semi-supervised Learning: The Limit of Infinite Unlabelled Data. In: Advances in Neural Information Processing Systems (NIPS).
Ng, A., and Jordan, M. 2002. On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes. In: Advances in Neural Information Processing Systems (NIPS).
Nigam, K. 2001. Using Unlabeled Data to Improve Text Classification. Ph.D. thesis, CMU.
Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco, CA: Morgan Kaufmann.
Scudder, H. J. 1965. Probability of Error of Some Adaptive Pattern-Recognition Machines. IEEE Transactions on Information Theory, 11.
Seeger, M. 2000. Learning with Labeled and Unlabeled Data. Technical Report, University of Edinburgh, UK.
Shi, J., and Malik, J. 2000. Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Sindhwani, V., and Selvaraj, S. K. 2006. Large Scale Semi-Supervised Linear SVMs. In: SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR.
Sindhwani, V., Niyogi, P., and Belkin, M. 2005. Beyond the Point Cloud: From Transductive to Semi-supervised Learning. In: Proceedings of the International Conference on Machine Learning (ICML).
Subramanya, A., and Bilmes, J. 2008. Soft-Supervised Text Classification. In: EMNLP.
Subramanya, A., and Bilmes, J. 2009a. Entropic Regularization in Non-parametric Graph-Based Learning. In: NIPS.
Subramanya, A., and Bilmes, J. 2009b. The Semi-supervised Switchboard Transcription Project. In: Interspeech.
Subramanya, A., and Bilmes, J. 2011. Semi-Supervised Learning with Measure Propagation. Journal of Machine Learning Research.
Subramanya, A., Bartels, C., Bilmes, J., and Nguyen, P. 2007. Uncertainty in Training Large Vocabulary Speech Recognizers. In: Proceedings of the IEEE Workshop on Speech Recognition and Understanding.
Szummer, M., and Jaakkola, T. 2001. Partially Labeled Classification with Markov Random Walks. In: Advances in Neural Information Processing Systems, vol. 14.
Talukdar, P. P., and Crammer, K. 2009. New Regularized Algorithms for Transductive Learning. In: European Conference on Machine Learning (ECML-PKDD).
Tomkins, A. 2008. Keynote Speech. CIKM Workshop on Search and Social Media.
Tsang, I. W., and Kwok, J. T. 2006. Large-Scale Sparsified Manifold Regularization. In: Advances in Neural Information Processing Systems (NIPS) 19.
Tsuda, K. 2005. Propagating Distributions on a Hypergraph by Dual Information Regularization. In: Proceedings of the 22nd International Conference on Machine Learning.
Vapnik, V. 1998. Statistical Learning Theory. New York: Wiley.
Vazirani, V. V. 2001. Approximation Algorithms. New York: Springer.
Wang, F., and Zhang, C. 2006. Label Propagation through Linear Neighborhoods. Pages 985–992 of: Proceedings of the 23rd International Conference on Machine Learning. New York: ACM.
White, N. 1986. Theory of Matroids. Cambridge University Press.
Woess, W. 2000. Random Walks on Infinite Graphs and Groups. Cambridge Tracts in Mathematics 138. New York: Cambridge University Press.
Zhu, X. 2005a. Semi-Supervised Learning Literature Survey. Technical Report 1530. Computer Sciences, University of Wisconsin–Madison.
Zhu, X. 2005b. Semi-Supervised Learning with Graphs. Ph.D. thesis, Carnegie Mellon University.
Zhu, X., and Ghahramani, Z. 2002a. Learning from Labeled and Unlabeled Data with Label Propagation. Technical Report, Carnegie Mellon University.
Zhu, X., and Ghahramani, Z. 2002b. Towards Semi-supervised Classification with Markov Random Fields. Technical Report CMU-CALD-02-106. Carnegie Mellon University.
Zhu, X., and Goldberg, A. B. 2009. Introduction to Semi-supervised Learning. Morgan & Claypool.
Zhu, X., Ghahramani, Z., and Lafferty, J. 2003. Semi-supervised Learning using Gaussian Fields and Harmonic Functions. In: Proceedings of the International Conference on Machine Learning (ICML).
