
Bi-cross validation of spectral clustering hyperparameters

Published online by Cambridge University Press: 24 April 2020

Sioan Zohar*
Affiliation:
Photon Data and Controls Systems, Linac Coherent Light Source, SLAC National Accelerator Laboratory, 2575 Sand Hill Rd, Menlo Park, California 94025, USA
Chun Hong Yoon
Affiliation:
Photon Data and Controls Systems, Linac Coherent Light Source, SLAC National Accelerator Laboratory, 2575 Sand Hill Rd, Menlo Park, California 94025, USA
*Author to whom correspondence should be addressed. Electronic mail: zohar.sioan@gmail.com

Abstract

One challenge impeding the analysis of terabyte-scale X-ray scattering data from the Linac Coherent Light Source (LCLS) is determining the number of clusters required to run traditional clustering algorithms. Here, we demonstrate that previous work using bi-cross validation to determine the number of singular vectors maps directly onto the spectral clustering problem of estimating both the number of clusters and the hyperparameter values. Applying this method to LCLS X-ray scattering data enables the identification of dropped shots without manually setting boundaries on detector fluence and provides a path toward identifying rare and anomalous events.
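
The bi-cross validation idea the abstract builds on can be illustrated with a minimal sketch. It assumes the generic hold-out scheme for choosing an SVD rank: a block of rows and columns is held out, a rank-k approximation is fitted to the fully retained block, and the held-out block is predicted from it. The function names (`bcv_error`, `bcv_rank`), the 2 x 2 fold layout, and the synthetic low-rank matrix are hypothetical choices made for illustration. How the authors carry this over to spectral clustering hyperparameters is not described in the abstract and is not reproduced here.

```python
import numpy as np


def bcv_error(X, held_rows, held_cols, k):
    """Squared error of predicting the held-out block from a rank-k fit.

    held_rows and held_cols are boolean masks over rows and columns of X.
    """
    A = X[np.ix_(held_rows, held_cols)]      # held-out block
    B = X[np.ix_(held_rows, ~held_cols)]     # held-out rows, retained columns
    C = X[np.ix_(~held_rows, held_cols)]     # retained rows, held-out columns
    D = X[np.ix_(~held_rows, ~held_cols)]    # fully retained block
    if k == 0:
        # A rank-0 fit predicts zeros for the held-out block.
        return float(np.sum(A ** 2))
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    # Pseudo-inverse of the rank-k truncation of D.
    D_k_pinv = (Vt[:k].T / s[:k]) @ U[:, :k].T
    A_hat = B @ D_k_pinv @ C
    return float(np.sum((A - A_hat) ** 2))


def bcv_rank(X, max_rank, n_folds=2, seed=0):
    """Sum hold-out errors over all row/column fold pairs for k = 0..max_rank."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    # Random, roughly balanced fold assignments for rows and columns.
    row_fold = rng.permutation(n_rows) % n_folds
    col_fold = rng.permutation(n_cols) % n_folds
    errors = np.zeros(max_rank + 1)
    for i in range(n_folds):
        for j in range(n_folds):
            for k in range(max_rank + 1):
                errors[k] += bcv_error(X, row_fold == i, col_fold == j, k)
    return int(np.argmin(errors)), errors


# Synthetic example (an assumption, not LCLS data): rank-3 signal plus noise.
rng = np.random.default_rng(1)
true_rank = 3
signal = rng.normal(size=(120, true_rank)) @ rng.normal(size=(true_rank, 90))
X = signal + 0.05 * rng.normal(size=signal.shape)

k_hat, errors = bcv_rank(X, max_rank=8)
print("selected rank:", k_hat)
print("hold-out error per rank:", np.round(errors, 2))
```

In this sketch the hold-out error drops sharply once k reaches the number of signal components; beyond that point additional components fit noise, so the selected rank should be read as an estimate rather than an exact answer.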

Type
Proceedings Paper
Copyright
Copyright © International Centre for Diffraction Data 2020
