Cryogenic electron microscopy (cryo-EM) is one of the most important research methods in current structural biology. Compared with traditional X-ray crystallography and nuclear magnetic resonance (NMR), which place demanding requirements on biomolecular samples, cryo-EM offers a simpler and more intuitive route and has made it possible to solve the structures of many proteins that are difficult to crystallize. This major scientific breakthrough was recognized with the 2017 Nobel Prize in Chemistry. Since the structure of the transient receptor potential ion channel TRPV1 was solved at a near-atomic resolution of 3.4 Å in 2013, scientists have solved a large number of biological macromolecular structures at very high resolution, rapidly advancing structural biology. Because biological samples are usually imaged at low dose and the noise in electron microscope data is complex, the signal-to-noise ratio is often very low even for data collected with a direct detector device (DDD) camera. Reconstructing each macromolecular structure also requires a large number of high-quality two-dimensional cryo-EM single-particle images, so selecting images often takes experienced researchers a long time. To address this problem, scientists have developed a variety of automated programs for image screening. Among them, the Auto-pick semi-automatic single-particle selection in RELION, developed by Sjors Scheres and his team at the MRC Laboratory of Molecular Biology in the UK, uses manually selected particles as training samples and is widely used, and the automated single-particle selection program PIXER of Zhang et al. achieves results as good as those of RELION.
As machine learning has matured, scientists have developed a variety of artificial-intelligence techniques for single-particle image recognition, but for cryo-EM images the accuracy of single-particle recognition is still low. In response to these problems, this paper combines a restricted Boltzmann machine (RBM) with a bilateral filter (BF). By improving the quality of single-particle images and automating their classification, the method improves the efficiency of recognizing two-dimensional single-particle images and offers a new approach to the subsequent screening of cryo-EM single particles.
2. Method and Principle
2.1. Preprocessing of cryo-EM single-particle images
2.1.1. Bilateral filtering
Cryo-EM single-particle images usually have a low signal-to-noise ratio (Fig. 1(a)), so training a restricted Boltzmann neural network on them directly does not give reliable recognition. If a higher threshold is set, the number of projection images at different angles available for three-dimensional reconstruction is reduced; conversely, a lower threshold admits a large number of false-positive impurity images, which introduce calculation errors into the two-dimensional classification of single-particle images and into the three-dimensional reconstruction. To solve this problem, we introduce a bilateral filter to preprocess the images. The bilateral filter accurately retains the edge feature information of the single particles in the image while effectively filtering out most of the noise, which matches the requirements placed on single-particle image information (Fig. 1(b)).
Fig. 1. (a) Cryo-EM single-particle image of transient receptor potential cation channel subfamily V member 1 (TRPV1); (b) TRPV1 cryo-EM single-particle image after bilateral filtering; (c) TRPV1 cryo-EM single-particle image after bilateral filtering and histogram equalization
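The edge-preserving behaviour described in Section 2.1.1 can be illustrated with a minimal NumPy sketch of a brute-force bilateral filter. The paper does not give its implementation or parameter values, so the function name `bilateral_filter` and the parameters `radius`, `sigma_space` and `sigma_range` are assumptions chosen only for illustration.

```python
import numpy as np


def bilateral_filter(image, radius=2, sigma_space=2.0, sigma_range=0.1):
    """Brute-force bilateral filter for a 2D greyscale image.

    All parameter values are illustrative assumptions, not the paper's settings.
    """
    image = np.asarray(image, dtype=np.float64)
    padded = np.pad(image, radius, mode="reflect")
    height, width = image.shape
    weighted_sum = np.zeros_like(image)
    weight_total = np.zeros_like(image)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbour at offset (dy, dx), taken for every pixel at once.
            shifted = padded[radius + dy:radius + dy + height,
                             radius + dx:radius + dx + width]
            # Spatial weight: nearby pixels count more.
            spatial = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_space ** 2))
            # Range weight: pixels with similar grey value count more,
            # which is what keeps sharp edges from being blurred.
            tonal = np.exp(-((shifted - image) ** 2) / (2.0 * sigma_range ** 2))
            weight = spatial * tonal
            weighted_sum += weight * shifted
            weight_total += weight
    # The centre pixel always contributes weight 1, so the divisor is never zero.
    return weighted_sum / weight_total
```

With a large `sigma_range` the range weight approaches 1 everywhere and the filter degenerates to an ordinary Gaussian blur; a smaller value keeps particle edges at the cost of removing less noise.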
2.1.2. Histogram equalization
Because the grayscale values in a single-particle image are concentrated in a narrow range, conventional linear contrast adjustment causes the image to lose more of its detail features. To solve this problem, we use histogram equalization (Fig. 1(c)). This technique enhances the contrast of the image while maintaining its original brightness, which improves the accuracy of feature recognition in single-particle images.
2.2. Restricted Boltzmann machine network structure
The restricted Boltzmann machine is a model designed around an energy-minimizing state. It contains a visible layer and a hidden layer; units within a layer are not connected to each other, while the two layers are fully connected. The visible layer is usually the data input layer, and the hidden layer can be regarded as a feature extraction layer. The training objective of the whole network is to bring the network parameters to a stable state, that is, to minimize the system energy. When images are used as training samples, each pixel of the image corresponds to one unit of the visible layer, the units of the hidden layer can be regarded as abstract features of the visible input units, and the weight matrix connects the visible and hidden layers. The weight matrix is solved numerically by passing a large number of input samples forward and backward through the network over many iterations, and by using the difference between the results of these passes to correct the weights until the reconstruction error of the whole network levels off (see Fig. 2). At that point, every training sample can be reconstructed from the hidden-layer features and the trained weight matrix, and construction of the Boltzmann network is complete.
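The forward and backward passes described above can be sketched with one-step contrastive divergence (CD-1), a common way to train a restricted Boltzmann machine. The paper does not name its training rule, so CD-1 is an assumption here, as are the class name `RBM`, the helper `train`, the learning rate and the initialization.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Binary restricted Boltzmann machine (illustrative sketch only)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_visible = np.zeros(n_visible)
        self.b_hidden = np.zeros(n_hidden)

    def hidden_probs(self, v):
        # Forward pass: visible layer to hidden layer.
        return sigmoid(v @ self.W + self.b_hidden)

    def visible_probs(self, h):
        # Backward pass: hidden layer back to visible layer.
        return sigmoid(h @ self.W.T + self.b_visible)

    def cd1_step(self, v0, learning_rate=0.1):
        """One CD-1 update on a batch; returns the mean squared reconstruction error."""
        h0_prob = self.hidden_probs(v0)
        h0 = (self.rng.random(h0_prob.shape) < h0_prob).astype(float)
        v1 = self.visible_probs(h0)
        h1_prob = self.hidden_probs(v1)
        n = v0.shape[0]
        # The difference between data-driven and reconstruction-driven
        # statistics is what corrects the weights and biases.
        self.W += learning_rate * (v0.T @ h0_prob - v1.T @ h1_prob) / n
        self.b_visible += learning_rate * (v0 - v1).mean(axis=0)
        self.b_hidden += learning_rate * (h0_prob - h1_prob).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))


def train(rbm, data, epochs=200, learning_rate=0.1):
    """Run full-batch CD-1 and return the reconstruction error per epoch."""
    return [rbm.cd1_step(data, learning_rate) for _ in range(epochs)]
```

For images, each row of `data` would be a flattened particle image scaled to [0, 1]. Plotting the list returned by `train` gives a reconstruction-error curve of the kind the text refers to, and training can stop once that curve flattens.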
Fig. 3. Data classification process of the supervised restricted Boltzmann machine
Restricted Boltzmann networks can identify and classify images through supervised or unsupervised learning. With supervised learning the network converges faster and the results are more accurate. The principle is to add label units to the input units (see Fig. 3). When network convergence is computed, the energy state of the whole network under each label is considered, each label corresponding to a category. After training, for each new sample image the network energy is computed separately under each label, and the sample is assigned to the label for which the energy of the whole network is lowest, thereby achieving supervised classification. To improve the recognition accuracy, this paper trains the neural network by multi-label supervised learning on a small number of high-confidence samples and removes false-positive or impurity images from the cryo-EM single-particle images, thereby improving the accuracy of the data required to build the three-dimensional model.
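The lowest-energy label assignment can be sketched as follows. The sketch assumes that the label is appended to the pixel vector as one-hot visible units and that the free energy of the binary RBM is used as the energy score; the function names `free_energy` and `classify` are hypothetical, and the paper's exact energy computation is not specified.

```python
import numpy as np


def free_energy(v, W, b_visible, b_hidden):
    """Free energy of visible vectors (one per row) under a binary RBM."""
    hidden_input = v @ W + b_hidden
    # logaddexp(0, x) = log(1 + exp(x)), the hidden units summed out analytically.
    return -(v @ b_visible) - np.logaddexp(0.0, hidden_input).sum(axis=-1)


def classify(pixels, n_labels, W, b_visible, b_hidden):
    """Try every one-hot label next to the pixels and keep the lowest-energy one."""
    candidates = np.hstack([np.tile(pixels, (n_labels, 1)), np.eye(n_labels)])
    energies = free_energy(candidates, W, b_visible, b_hidden)
    return int(np.argmin(energies)), energies
```

Here `W` has one row per visible unit, pixel units first and label units last. Returning the per-label energies alongside the chosen label makes it possible to reject low-margin samples instead of forcing every image into a class.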
3. Results and Discussion
3.1. Optimization of cryo-EM single-particle images by bilateral filtering and histogram equalization
To verify that bilateral filtering and histogram equalization enhance the information in cryo-EM single-particle images, we used the method of Yao et al. for generating simulated cryo-EM single-particle image data, and tested the original images and the preprocessed images at the same signal-to-noise ratio. Generating simulated single-particle cryo-EM images first requires downloading the relevant PDB file. The Xmipp software package is then used: xmipp_phantom_transform performs center correction and xmipp_volume_from_pdb converts the PDB file into an electron density map. The resulting electron density map is passed to xmipp_angular_project_library to generate the projection files, and finally noise of different intensities is added as needed. At a signal-to-noise ratio of 0.0625 (see Fig. 4), unprocessed images classified by the trained RBM network reach an accuracy of only 68.52%, whereas preprocessed images reach 99.95% under the same conditions, far higher than for the unprocessed images. At lower signal-to-noise ratios, the recognition accuracy of the RBM network on unprocessed data is lower still.
Fig. 4. (a) Simulated single-particle projection image of SpCas9, signal-to-noise ratio 0.0625; (b) preprocessed simulated single-particle projection image of SpCas9, signal-to-noise ratio 0.0625
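Two steps of this test, adding noise at a chosen signal-to-noise ratio and equalizing the histogram, can be sketched in NumPy. The sketch assumes that the signal-to-noise ratio is the ratio of signal variance to noise variance and that the noise is additive Gaussian; neither is stated in the paper. Plain global histogram equalization is used, since the text does not say which variant was applied. The function names are hypothetical.

```python
import numpy as np


def add_gaussian_noise(projection, snr, rng):
    """Add zero-mean Gaussian noise with expected var(signal) / var(noise) = snr.

    The variance-ratio definition of SNR is an assumption.
    """
    noise_std = np.sqrt(projection.var() / snr)
    return projection + rng.normal(0.0, noise_std, projection.shape)


def equalize_histogram(image, n_bins=256):
    """Global histogram equalization; output grey values lie in [0, 1]."""
    hist, bin_edges = np.histogram(image, bins=n_bins)
    # Normalised cumulative histogram acts as the grey-value mapping.
    cdf = np.cumsum(hist).astype(np.float64)
    cdf /= cdf[-1]
    bin_centres = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    return np.interp(image, bin_centres, cdf)
```

Calling `add_gaussian_noise(projection, 0.0625, np.random.default_rng(0))` gives a noisy image at the signal-to-noise ratio used in Fig. 4; the filtered image would then be passed through `equalize_histogram` before being fed to the network.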
3.2. Recognition of simulated SpCas9 single-particle images
Because the signal-to-noise ratio of real cryo-EM single-particle images may be even lower, we tested the recognition accuracy of the Boltzmann network at different signal-to-noise ratios. We used simulated structural projection images of the SpCas9 protein at different signal-to-noise ratios, together with blank noise images generated under equivalent conditions, as the training set, and then tested on an equal quantity of simulated data as the test set to verify the effectiveness of the method. The simulated images were generated in the same way as in Section 3.1. We obtained the following results:
Table. Recognition accuracy for simulated SpCas9 single-particle images and noise images at different signal-to-noise ratios
Recognition accuracy decreases as the signal-to-noise ratio falls. At a signal-to-noise ratio of 0.0312 the recognition accuracy is 98.14%; at 0.0156 it is 95.42%; and in the extreme case of 0.0078 it is still 92.46%, even though at that point the particles in images without preprocessing can no longer be recognized by the naked eye.
Fig. 5. (a) Simulated single-particle projection images of SpCas9, signal-to-noise ratio from 0.0625 to 0.0078; (b) preprocessed simulated single-particle projection images of SpCas9, signal-to-noise ratio from 0.0625 to 0.0078
3.3. Single-particle recognition and three-dimensional reconstruction of TRPV1 cryo-EM data
After testing on the simulated data set, we carried out further experiments with real cryo-EM data. We selected the original TRPV1 cryo-EM images with accession number 10005 from the Electron Microscopy Public Image Archive (EMPIAR) and used the PARSED tool for the initial automatic particle picking, obtaining 147,256 single-particle images from 871 micrographs as the data set to be classified. In addition, we randomly selected 100 original cryo-EM micrographs and used RELION for single-particle screening and two-dimensional classification. Single-particle images in the top few classes by confidence were taken as the positive training set, while images in low-ranked classes that were clearly bubbles or impurities were taken as the negative training set. During construction of the training set, the quality of individual single-particle images is difficult to evaluate independently, and no clear standard for classifying them has yet been established. Because RELION is one of the most widely used software packages and its classification has relatively high confidence, it can serve as a reference for the training samples. The selected training set contains 10,418 single-particle images in total: 7771 positive samples (74.59%) and 2647 negative samples (25.41%). The RBM network was then trained on this training set and, once trained, used to classify the initially picked particles, further achieving precise recognition of high-confidence single-particle images.
After recognition and classification by the RBM network, 32,403 positive single-particle images were obtained. Because RELION could not perform reconstruction from the individually selected single-particle images, we used the two-dimensional classification function of the cryoSPARC platform for further screening, and again kept only the 20 classes with the highest confidence in the two-dimensional classification (see Fig. 6(a)).
Fig. 6. (a) Two-dimensional classification of TRPV1 single-particle image projections: the top 20 classes from the cryoSPARC cryo-EM image processing platform; (b) histogram of the probability distribution of single-particle images in their best two-dimensional class; (c) projection angle distribution of TRPV1 single-particle images
During reconstruction of the three-dimensional structure of single particles, how continuously the number of single-particle projection images is distributed across the different angles directly affects the resolution of the final three-dimensional model. In Fig. 6(c), the color scale indicates how many projections fall at each angle, with blue representing fewer. Over the continuous interval, the angular distribution of the single-particle images is essentially continuous, which indicates that the single-particle projection images selected by this method are evenly distributed and that there are no angles at which particles cannot be identified.
Figure. (a-d) Projections of the three-dimensional reconstruction model of TRPV1; (e) GSFSC resolution curve
4. Conclusion
To verify the application of bilateral filtering, histogram equalization and the restricted Boltzmann machine to cryo-EM single-particle image recognition, this paper first verified the validity of the method on simulated data under high-noise conditions. On this basis, the method was then applied to real TRPV1 cryo-EM single-particle image data, and its feasibility was verified through further three-dimensional reconstruction. Compared with the 35,645 non-MotionCorr single-particle images obtained in earlier work, and even though some of the original cryo-EM image files could not be downloaded completely, leaving data missing, the final reconstruction yielded a three-dimensional structure with a resolution of 3.63 Å. The method described here shows that combining techniques from computer graphics and machine learning can effectively identify and screen cryo-EM single-particle images, greatly improving work efficiency while maintaining a certain level of accuracy, and it provides a new approach to processing and optimizing two-dimensional cryo-EM images.
Funding
This work was supported by the National Natural Science Foundation of China (No. 31971377).
References
Cheng, Y. (2018) Single-Particle Cryo-EM: How Did It Get Here and Where Will It Go. Science, 361, 876-880. https://doi.org/10.1126/science.aat4346
https://doi.org/10.1038/nature12822
Huai, C., Li, G., Yao, R., Zhang, Y., Cao, M., Kong, L., Jia, C., Yuan, H., Chen, H. and Lu, D. (2017) Structural Insights into DNA Cleavage Activation of CRISPR-Cas9 System. Nature Communications, 8, 1375.
Bai, R., Yan, C., Wan, R., Lei, J. and Shi, Y. (2017) Structure of the Post-Catalytic Spliceosome from Saccharomyces cerevisiae. Cell, 171, 1589-1598.