Thank you for visiting You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • View all journals
  • My Account Login
  • Explore content
  • About the journal
  • Publish with us
  • Sign up for alerts
  • Open access
  • Published: 17 May 2019

Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks

  • Felipe Oviedo   ORCID: 1   na1 ,
  • Zekun Ren 2   na1 ,
  • Shijing Sun 1 ,
  • Charles Settens 1 ,
  • Zhe Liu 1 ,
  • Noor Titan Putri Hartono   ORCID: 1 ,
  • Savitha Ramasamy 3 ,
  • Brian L. DeCost 4 ,
  • Siyu I. P. Tian   ORCID: 1 ,
  • Giuseppe Romano 1 ,
  • Aaron Gilad Kusne 4 &
  • Tonio Buonassisi 1 , 2  

npj Computational Materials volume  5 , Article number:  60 ( 2019 ) Cite this article

22k Accesses

166 Citations

16 Altmetric

Metrics details

  • Characterization and analytical techniques
  • Computational methods
  • Synthesis and processing

X-ray diffraction (XRD) data acquisition and analysis is among the most time-consuming steps in the development cycle of novel thin-film materials. We propose a machine learning-enabled approach to predict crystallographic dimensionality and space group from a limited number of thin-film XRD patterns. We overcome the scarce data problem intrinsic to novel materials development by coupling a supervised machine learning approach with a model-agnostic, physics-informed data augmentation strategy using simulated data from the Inorganic Crystal Structure Database (ICSD) and experimental data. As a test case, 115 thin-film metal-halides spanning three dimensionalities and seven space groups are synthesized and classified. After testing various algorithms, we develop and implement an all convolutional neural network, with cross-validated accuracies for dimensionality and space group classification of 93 and 89%, respectively. We propose average class activation maps, computed from a global average pooling layer, to allow high model interpretability by human experimentalists, elucidating the root causes of misclassification. Finally, we systematically evaluate the maximum XRD pattern step size (data acquisition rate) before loss of predictive accuracy occurs, and determine it to be 0.16° 2 θ , which enables an XRD pattern to be obtained and classified in 5.5 min or less.


High-throughput material synthesis and rapid characterization are necessary ingredients for inverse design and accelerated material discovery. 1 , 2 X-ray diffraction (XRD) is a workhorse technique to determine crystallography and phase information, including lattice parameters, crystal symmetry, phase composition, density, space group, and dimensionality. 3 This is achieved by comparing XRD patterns of candidate materials with the measured or simulated XRD patterns of known materials. 4 Despite its indispensable utility, XRD is a common bottleneck in materials characterization and screening loops: 1 hour is typically required for XRD data acquisition with high angular resolution, and another 1–2 hours are typically required for Rietveld refinement by an expert crystallographer, assuming the possible crystalline phases are known. It is widely recognized that machine learning methods have potential to accelerate this process; however, practical implementations have thus far focused on well-established materials, 5 , 6 , 7 require combinatorial datasets spanning among various phases, 8 , 9 or require large datasets, 5 , 10 whereas material screening using the inverse design paradigm often involves less-studied materials, spanning multiple classes of different material/phase compositions, and smaller prototype datasets.

Typically, experimental XRD pattern data are analyzed by obtaining descriptors such as peak shape, height, and position. Matching descriptors of the test pattern to known XRD patterns in crystalline databases allows the identification of the compound of interest. 4 Refinement methods such as Rietveld refinement and Pawley refinement have been used for decades to analyze experimental XRD patterns. 4 For novel compounds in thin-film form, however, the use of Rietveld refinement is limited due to the lack of reference patterns in material databases, as well as unknown film textures. The direct-space method, statistical methods, and the growth of single crystals have been used to obtain crystal symmetry information for novel materials, 7 , 11 , 12 , 13 , 14 but the significant iteration time, feature engineering, human expertise, and knowledge of specific material required makes these methods impractical for high-throughput experimentation, where sample characterization rates are of the order of one material per minute or faster, 2 , 15 explored over various material families.

An alternative approach consists in using machine learning methods to obtain more robust spectral descriptors and quickly classify crystalline structure based on peak location and shape in the XRD pattern. Breakthrough methods have been developed for the similar problem of phase attribution in combinatorial alloys, 16 , 17 but only few studies have been developed for solution-processed material screening, such as perovskite screening, where phase attribution is usually not as important as correct classification of materials into groups according to crystal parameters. The most successful methods 10 , 18 for material screening use convolutional neural networks (CNNs) trained with hundreds of thousands of XRD powder patterns simulated with data from the Inorganic Crystalline Structure Database (ICSD). Further CNN and other deep learning algorithms have been employed to obtain crystalline information for other kinds of diffraction data. 19 , 20 , 21 In a couple studies, noise-based data augmentation, a common technique of image preprocessing for machine learning, has been used to avoid overfitting in a broader kind of X-ray characterization problems 20 , 22 , 23 and more broadly in other fields such as Tramission Electron Microscopy imaging; 21 however, the augmentation procedure has not been based in physical knowledge of actual experimental samples. Furthermore, the best-performing machine learning methods developed up-to-date for XRD analysis do not allow any kind of interpretation by the experimentalists, 10 hindering improvements of experimental design.

While similar approaches produce good results for crystal structure classification, we have found that applying them to high-throughput characterization of novel solution-processed compounds is generally not practical, given the limited access to large datasets of clean, preprocessed, relevant, XRD spectra. Furthermore, most materials of interest developed in high-throughput synthesis loops are thin-film materials. The preferred orientation of the crystalline planes in thin-films causes their experimental XRD patterns to differ from the thousands of simulated XRD powder patterns available in most databases. 24 , 25 Thin-film compounds usually will present spectrum shifting and periodic scaling of peaks in preferred orientations, reducing the accuracy of machine learning models trained with powder data, 8 , 9 , 26 even in the cases when noise-based data augmentation techniques are used. 18

Considering these challenges, we propose a supervised machine learning framework for rapid crystal structure identification of novel materials from thin-film XRD measurements. For this work, we created a library of 164 XRD patterns of thin-film halide materials extracted from the >100,000 compounds available in the ICSD 27 ; these 164 XRD patterns include lead halide perovskite 28 , 29 and lead-free perovskites-inspired materials. 30 These XRD patterns were manually classified among different crystal dimensionalities using ICSD information. Based on this small dataset of relevant XRD powder patterns extracted from the ICSD and an additional 115 experimental XRD patterns, we propose a model-agnostic, physics-informed data augmentation to generate a suitable and robust training dataset for thin-film materials, and subsequently test the space group and dimensionality classification accuracy of multiple machine learning algorithms. A one-dimensional (1D) implementation of an “ all convolutional neural network ” (a-CNN) 31 is proposed, implemented and identified as the most accurate and interpretable classifier for this problem. We propose a new way to use class activation maps (CAMs), 32 computed from the weight distribution of a global average pooling layer and adapted to the context of our problem, to provide interpretability of root causes of classification success or failure to the experimentalist. Subsequently, the effect of the augmented dataset size and the XRD pattern granularity is investigated. Our proposed methodology could be applied to other crystal descriptors of thin-film materials, such as lattice parameters or atomic coordinates, as long as labeled information is available.

Our contributions can be summarized as: (a) development of physics-informed data augmentation for thin-film XRD, which successfully addresses the sparse/scarce data problem, breaching the gap between the thousands of XRD patterns in crystalline databases and real thin-film materials, (b) development of highly interpretable, highly accurate, a-CNN for XRD material screening, (c) proposal of average class activation maps as a feasible interpretability tool in CNNs trained on spectral data.

Results and discussion

Framework for rapid xrd classification.

The framework developed for rapid classification of XRD thin-film patterns according to crystal descriptors is shown in Fig. 1a . The methodology makes use of both experimental and simulated XRD patterns to train a machine learning classification algorithm. A simulated dataset is defined by extracting crystal structure information from the ICSD. The experimental dataset consists of a set of synthesized samples, which are manually labeled for training and testing purposes. The datasets are subjected to data augmentation based on the three spectral transformations shown in Fig. 2 .

figure 1

General framework and a-CNN architecture. a Schematic of our X-ray diffraction data classification framework, with physics-informed data augmentation. b Schematic of the best-performing algorithm, our all convolutional neural network

figure 2

Physics-informed data augmentation: schematic of the physics-informed data augmentation algorithm, which accounts for the particularities of thin-film XRD spectra, as described by the subsequent transformations stated in Eqs. 1 – 3

The crystal descriptors of interest, space group and crystal dimensionality , are chosen because of their importance for material screening in accelerated material development. In many inorganic material systems, the crystalline dimensionality—i.e., a generalization of the crystalline symmetry into 0-dimensional (0D), 1D, 2D, or 3D symmetry— constitutes a figure of merit for experimental material screening as it correlates with observed charge-transport properties. 33 In perovskites and perovskites-inspired materials, for instance, 3D crystalline structures have been shown to have good carrier-transport properties for solar cells and light-emitting diode applications, 33 , 34 whereas 3D–2D mixtures have been found to have greater stability in lead halide perovskites than pure-phase 3D crystals. 35 With further detail, the space group number describes the standardized symmetry group of a configuration in space, classifying crystal symmetries into 230 groups. Identifying the space group number of a sample provides crystal information beyond dimensionality, including atomic bonding angles and relative distances, which are believed to be of importance for predicting material properties. 36 In this specific study, the framework relies on the relation between XRD patterns and the crystal descriptors of interest. For example, among perovskite-inspired materials for photovoltaic applications, 3D cubic lead halide perovskites of multiple compositions show distinct features in their XRD patterns compared with 2D layered bismuth perovskites. 37 , 38

Typically, the powder XRD pattern is used to identify space group through Rietveld refinement, but the compression of crystalline 3D crystallographic information into a 1D diffraction pattern causes the space group to be impossible to determine unambiguously in certain low-symmetry phases, independently of the measurement technique. 39 In this work, the space groups of interest are able to be determined from XRD information only.

To better account for noise measurement and the physical difference between randomly oriented powder patterns and experimental thin-film patterns, the patterns were subjected to a process of data augmentation based on domain knowledge. Subsequently, both augmented experimental and simulated XRD pattern datasets are used for testing, training and cross-validation of machine learning algorithms.

Figure 1b shows the architecture of the final a-CNN, which is proposed and identified as the best-performing machine learning algorithm in subsequent sections.

Experimental measurement and labeling of XRD patterns

The experimental dataset consists of 75 XRD patterns for dimensionality classification, summarized in Supplementary Table S1 , and 88 XRD patterns for space group classification, summarized in Supplementary Table S2 . A total of 115 unique labeled XRD patterns are considered among both datasets. For this work, perovskites-inspired 3D materials based on lead halide perovskites (space group \(Pm\bar 3m\) ), tin halide perovskite ( I 4/ mcm ), cesium silver bismuth bromide double perovskite ( \(Fm\bar 3m\) ), bismuth and antimony halide 2D ( \(P\bar 3m1\) , \({\it Pc}\) , \({P2}_{1}{\!/\!a}\) ), and 0D ( P 6 3 / mmc ) perovskite-inspired materials are synthesized and used as training and testing dataset. The details of the synthesis and characterization methodology are described in great detail in our experimental study. 40 Supplementary Fig. S1 shows the t-Distributed Stochastic Neighbor Embedding representation of the XRD patterns labeled with dimensionality and space group, providing evidence of the complexity of the classification problem.

In this case, Rietveld refinement is restricted due to the unknown preferred orientation and texture of the thin-film samples. In consequence, the XRD patterns are subjected to peak indexing and the dimensionality and space group are labeled based on ICSD data.

Data mining and simulation of XRD patterns

The simulated training dataset consists of 164 compounds extracted from ICSD with a similar composition, expected crystal symmetry, and space group as the synthesized materials of interest. All the possible single, double, ternary, and quaternary combinations of the elements of interest were extracted during database mining. The compositions of all the materials of interest along with labeled dimensionality and space group information is available in Supplementary Table S3 . The fundamental crystal descriptors extracted from the material database are used to simulate XRD powder patterns with random crystalline orientations, as explained in the Methods section.

Data augmentation based on domain knowledge

Experimental thin-film XRD patterns vary greatly compared to simulated, idealized, randomly oriented XRD patterns. Due to expansions and contractions in the crystalline lattice, XRD peaks shift along the 2 ϴ axis according to the specific size and location of the different elements present in a compound, while maintaining similar periodicity based on crystal space group. 3 , 8 , 26 , 41 In addition, for thin-film samples, the XRD pattern can be shifted due to strain in the film induced during the fabrication process. 42 Polycrystalline thin-films are also known to have preferred orientations along certain crystallographic planes. The preferred orientation is influenced by the crystal growth process and the growth substrate, 24 and is common for most solution-processing and vapor-deposition fabrication methods. Ideal random powders contain multiple grains without any preferred global orientations, thus all crystallographic orientations are represented evenly in the peak intensity and periodicity of the XRD pattern. As a consequence of their preferred orientations along crystallographic planes, thin-film XRD relative peak intensities are scaled up periodically in the preferred plane orientation, and scaled down periodically or even eliminated in the non-preferred orientations.

To increase the size and robustness of the limited training dataset and to account for these fundamental differences between real thin-films and simulated XRD powder patterns, we perform a three-step data augmentation procedure based on physical domain knowledge:

( 1 ) Peak scaling, ( 2 ) Peak elimination, and ( 3 ) Pattern shifting. These transformation are described in detail in Methods. Figure 2 summarizes the data augmentation steps and its effects on a representative pattern. Given the hyperparameters S and ε in Eqs. 1 – 3 in the Methods section, 2000 patterns are augmented from the simulated dataset, and 2000 patterns are augmented from experimentally measured spectra.

We choose to perform physics-informed data augmentation instead of explicit regularization for the following reasons: ( 1 ) data augmentation has been found to be more robust at avoiding overfitting than explicit regularization when using convolutional neural networks, 43 ( 2 ) data augmentation is model-agnostic, allowing our approach to successfully bridge the gap between experimental XRD patterns and thousands of XRD patterns available in databases without depending on a specific model that might not generalize well in all cases ( no-free-lunch theorem ), and ( 3 ) physics-informed data augmentation allows high interpretability, and is found to be more robust than traditional noise-based data augmentation approaches (Supplementary Table S6 ).

Classification results and a-CNN

Preprocessed, augmented experimental data, and augmented simulated data are fed into various supervised machine learning algorithms for training and testing purposes. The best-performing algorithm is evaluated. The XRD patterns are classified into three crystal dimensionalities (0D, 2D, and 3D) and seven space groups ( \(Pm\bar 3m\) , I 4/ mcm , \(Fm\bar 3m,P\bar 3m1\) , \({Pc}\) , P 2 1 / a , and P 6 3 / mmc ).

For this purpose, we represent the XRD pattern as either a vector or a time series. For each kind of data representation, different classification algorithms are considered. Using a vector representation of the XRD pattern, the following classification methods are tested: Naive Bayes, k -Nearest Neighbors, Logistic Regression, Random Forest, Decision Trees, Support Vector Machine, Gradient Boosting Decision Trees, a Fully Connected Deep Neural Network, and an a-CNN with a global pooling layer. 44 , 45 , 46 The XRD patterns are also analyzed as a time series with a normalized Dynamic Time Warping (DTW) distance metric 47 combined with a k -Nearest Neighbors classification algorithm, which was found in literature as the most adequate metric for measuring similarity among metal-alloy XRD spectra. 7 , 26

The problem of novel material development is inherently a multi-class classification problem, in which the classes for training and testing purposes can often be imbalanced as some material families are better characterized than others (e.g., lead-based perovskites are better represented in material databases than newer lead-free perovskites). 38 Common metrics for binary classification such as accuracy might not be the most adequate in this context. 45 The final choice for adequate metrics depends on the relative importance of false positive and false negatives in minority and majority classes, according to the goals of the experimentalist. For method development in this work, we consider the following metrics: subset accuracy , defined as the number of correctly classified patterns among all test patterns, and F 1 score. F 1 score in this problem can be interpreted as the weighted harmonic mean of precision and recall; the closer it is to 1.0, the higher the classifier’s precision and recall. 45 Intuitively, precision is the ability of the classifier not to produce a false positive, whereas recall is the ability of a classifier to find all the true positives. An F 1 metric is calculated for each class label, and it is combined into an overall score by taking either the micro or the macro average of the individual scores. The macro average calculates the mean of the metrics of all the individual classes, hence treating all classes equally. The micro average adds the individual contribution of all samples to compute the overall metric.

When there is class imbalance, accuracy and F 1 micro score characterize the classifier’s performance over all classes, whereas F 1 macro emphasizes the accuracy on infrequent classes. 45 Thus, a natural choice for high-throughput experiments across multiple material classes seems to be accuracy/F 1 micro score, except in those cases when we are especially interested in analyzing an infrequent material class, being F 1 macro a more representative metric in that case. In this work, we choose to report the classification accuracy and both F 1 micro and F 1 macro , whereas recall and precision results are included in the Supplementary Information (Supplementary Tables S4 and S5 ).

We measure the performance of the dimensionality and space group classification methods based on three different approaches of splitting the training and testing datasets:

Case 1: Exclusively simulated XRD patterns are used for testing and training. Fivefold cross-validation is performed.

Case 2: The simulated XRD patterns are used for training, and the experimental patterns for known materials are used for testing.

Case 3: All of the simulated data and 80% of the experimental data are used for training, and 20% of the experimental data are used for testing. Fivefold cross-validation is performed.

Each one of the training/testing cases mentioned earlier are tested for crystal dimensionality and space group prediction accuracy and micro / macro F 1 score. The results are reported in Table 1 . In each cell, the crystal dimensionality classification metric is reported first, followed by the metric for space group classification. Case 1, presenting fivefold cross-validation results of the simulated dataset, has the highest accuracy as it does not predict any experimental data and thus is free of experimental errors for both crystal descriptors. Case 2 performs the experimental prediction solely based on simulated patterns, thus having the lowest accuracy. Finally, Case 3 has a significant higher accuracy than Case 2 for both crystal dimensionality and space group prediction. F 1 scores follow these trends as well.

In general, the model’s accuracy and F 1 score is lower for space group classification than that for crystal dimensionality classification. This discrepancy is caused by the lower number of per-class labeled examples for space group classification compared to crystal dimensionality classes. Class imbalance can also systematically affect the training performance of the classifier, to avoid this issue, we performed an oversampling test with synthetic training data according to, 48 , 49 and observed little discrepancy of accuracy between the balanced and imbalanced datasets after fivefold cross-validation.

The use of experimental data as part of the training set increases the model’s accuracy and robustness. This fact can be explained by the high variability of experimental thin-film XRD patterns, even after data preprocessing. The relatively high accuracy with the relatively small number of experimental samples (on the order of 10–10 2 ) confirms the potential of our data augmentation strategy to yield high predictive accuracies even with small datasets. Supplementary Table S6 in the Supplementary Information compares our strategy with traditional noise-based augmentation approaches, and shows an average increase of classification accuracy of >12% absolute.

Naturally, the F 1 macro score is systematically lower than the F 1 micro score, reflecting the impact of misclassification of those dimensionality and space group classes with less training examples. However, the F 1 macro score is still fairly high for most classifiers. This fact reflects the importance of adequate experimental design to achieve good generalization among classes.

For all three test cases, the a-CNN classifier performs better than any other classification technique. The 1D a-CNN architecture implemented is composed of three 1D convolutional layers, with 32 filters each, and strides and kernel sizes of 8, 5, and 3 units, respectively. The activation function between layers is ReLu . A global average pooling layer 50 (acting as a weak regularizer) and a final dense layer with softmax activation is used. The loss function minimized is binary cross entropy. We use early stopping with a batch size of 128 during training, and use the Adam optimizer algorithm to minimize the loss function. The CNN is implemented in Keras 2.2.1 with the Tensorflow background. Figure 1b contains a schematic of the proposed a-CNN architecture.

Our a-CNN architecture, in contrast with other CNNs, does not have max pooling layers between convolutional layers, and also lacks a set of dense layers in the final softmax classification layer. These modifications, in contrast with the architectures used in, 10 , 18 significantly reduces the number of parameters in the neural network, allows faster and simpler training, and are less prone to overfitting. Another advantage of our implementation is the possibility to extract CAMs from the weight distribution of the global average pooling layer. This characteristic of the network, properly adapted to our problem, allows us to visualize how the classified XRD patterns are mapped to the weight distribution at the end of the a-CNN. The results are further discussed in the interpretability subsection.

The a-CNN trained after data augmentation has an accuracy of >93 and 89% for crystal dimensionality and space group classifications, respectively. As far as we know, the accuracy is among the highest described in literature for space group classification algorithms, comparable to those trained with thousands of ICSD patterns and manual labeling by human experts, 10 , 51 and is also comparable to similar approaches in other kinds of diffraction data. 8 , 19 The neural network seems to be the most adequate method for high-throughput synthesis and characterization loops, as it also performs relatively well in terms of algorithm speed and in conditions of class imbalance. In the future, our methodology can be extended to other materials systems, and may include other crystal descriptors as predicted outputs, such as lattice parameters and atomic coordinates.

Furthermore, the a-CNN performs better than the traditional k -nearest neighbors method using DTW. In our test case and dataset, the differences between thin-film and powder spectra seem not to be captured properly by DTW alone. Arguably, DTW could be more useful if a larger XRD thin-film pattern dataset is available for k -Nearest Neighbors classification, or if it exists greater similarity between XRD patterns of the same class, allowing the DTW warping path to be better captured within the DTW window under consideration. 26 CNNs have been found to perform better than DTW for classification of time series, which is consistent with our results. 52

Effect of augmented dataset size

The size of the dataset is critical for obtaining a high accuracy and F 1 score. To explore the effect of augmented dataset size, the a-CNN accuracy was computed for various combinations of augmented experimental patterns (i.e., number of augmented XRD spectra originated from the 88 measured spectra, varying S and ε in Eqs. 1 – 3 ) and augmented simulated dataset sizes (i.e., number of augmented XRD spectra originating from the 164 simulated ICSD spectra). Figure 3 a, b summarizes this sensitivity analysis for Case 3 training/testing conditions for dimensionality and space group classification. Twenty different fivefold cross-validation runs were performed to calculate the 1-standard deviation error bars for each data point.

figure 3

Accuracy of the a-CNN model with augmented data: line plot showing mean Case 3 a-CNN accuracy, as a function of the number of augmented patterns (Methods, Eqs. 1 – 3 ) included in the training set, for ( a ) dimensionality classification and ( b ) space group classification. The x axis shows augmented experimental data (based on the original experimental XRD patterns), and the legend shows simulated data (based on the 164 simulated powder diffraction patterns obtained from the ICSD). The error bars correspond to one standard deviation from the mean

In general, as the size of the experimental and augmented datasets increase, the mean accuracy quickly approaches the asymptotic accuracy reported in Table 1 . This trend reaffirms and quantifies the importance of data augmentation for the predictive accuracy of our model.

The critical augmented dataset size seems to be around 700 augmented spectra. The model’s accuracy is more sensitive to the augmented experimental dataset size, likely because most of the dataset variance comes from the experimental XRD patterns. The data augmentation of the simulated dataset causes the accuracy to grow monotonically in Fig. 3b ; however, this trend is not satisfied in the case of Fig. 3a , where no augmented simulated data seems on average to perform the best. We hypothesize that augmenting simulated data could actually introduce excessive noise to the model, hampering classification when the number of possible classes is small.

Figure 3 illustrates that if no data augmentation is used (i.e., the origin, 0, 0), the predictive accuracy could be below 50% for space group and below 70% for dimensionality. Our physics-informed data augmentation directly increase accuracy by up to 23% in the case of dimensionality and 19% in the case of space group classification. This result reinforces the need for data augmentation for sparse/scarce datasets, as is typical with early-stage material development.

Impact of data coarsening

To evaluate the trade-offs between accuracy and XRD acquisition speed, we investigate how data coarsening of the XRD pattern impacts the accuracy of ML algorithm prediction. In Fig. 4 , we report Case 3 accuracy with increasing 2 ϴ angle step size. The baseline step size of the 2 ϴ scan in our XRD patterns is 0.04°. Data coarsening is performed by selectively removing the data with increasing step sizes and rerunning the augmentation and classification algorithms. For crystal dimensionality and space group classification, the highest accuracies are achieved at 0.04–0.08°, whereas 85%+ accuracy is achieved when the 2 ϴ step size is 0.16° or less for both cases. Using the larger step size, the XRD pattern acquisition time can be reduced by 75%, allowing the full spectra to be measured and classified in <5.5 min with our tool setup.

figure 4

Reducing XRD pattern acquisition time: simulation of the trade-off between XRD pattern acquisition rate and predictive accuracy. Accuracies for crystal dimensionality and space group predictions are estimated by coarsening the XRD spectrum 2 ϴ step size for Case 3 conditions. The error bars correspond to one standard deviation from the mean

Interpretability using CAMs

Class activation maps (CAM) are representations of the weights in the last layer of a CNN, before performing classification. A CAM for a certain class and pattern indicates the main discriminative regions (in our case, peak and series of peaks in the XRD pattern) that the network uses to identify that class. 32 Details of the CAM computation are included in the Methods section.

A similar approach has been followed to interpret and improve object recognition in images and videos using 2D CNNs. 32 A single pattern or image produces a unique CAM revealing the main discriminative features. In addition, in the context of our problem, we propose to generalize CAMs to all training samples within a class by averaging over each of the computed CAM weights for all training samples within a class, as explained in the Methods section. This averaging procedure is justified as the location and periodicity of discriminative peaks within a class varies only slightly along all labeled samples in the training set, and can be extended to many spectral measurement techniques. This average CAM allows to visualize the main discriminative features of an XRD pattern that were used to classify all the training data belonging to a certain class.

By comparing the CAM for a single pattern with the average CAM for given class, we can identify the root causes of correct and incorrect classification by the a-CNN. Figure 5 illustrates this procedure for XRD patterns of space group classes Class 2 ( P 2 1 / a ) and Class 6 ( \(Pm\bar 3m\) ). Figure 5a, b shows the average CAM maps of Class 6 and Class 2 space groups. Figure 5c shows the CAM of an individual, correctly classified XRD pattern. If we compare the individual CAM with Class 6 Average CAM, we can see that the neural network is identifying the same reference pattern as most of Class 6 samples, which translates into classification as Class 6. In contrast, Fig. 5d shows an incorrectly classified XRD pattern, which was determined to belong to Class 6 by the CNN, when in reality it belongs to Class 2. A comparison of the individual CAM with the average CAMs of classes 6 and 2 reveals that the misclassified CAM is more similar to Fig. 5a than Fig. 5b . A closer look to the misclassified patterns, shows that the periodicity of peaks before 30° and the relative lack of peaks between 30° and 50° (likely caused by mixed phases), are causing the misclassification.

figure 5

Class activation maps for misclassification interpretability: class activation maps (CAMs) generated by the a-CNN architecture, representing space group classification of Class 2 ( P 2 1 / a ) and Class 6 ( \(Pm\bar 3m\) ). ( a , b ) correspond to maps generated by averaging all training samples in a certain class, whereas ( c , d ) correspond to the CAMs of correctly classified and incorrectly classified individual patterns, respectively. The correctly predicted pattern of ( c ) is explained by the similarity of its CAM to the average CAM of Class 6; whereas, the incorrectly prediction ( d ) can be explained by comparing its CAM with the average CAMs of classes 2 and 6

Several representative misclassification cases are identified and included in the Section VII of Supplementary Information. In our work, by comparing the CAM of wrongly classified cases with the average CAM of all training data within certain class, we conclude there are three main causes for failed classification. The first cause is the mixture of phases in the sample, which increases the chance of the CAM of a particular pattern differing from the average class CAM. The second cause is lack of XRD patterns in the per-class training data. We only have <5 patterns in certain space group classes, thus the average CAM of minority classes has only a limited number of discriminative features compared to majority classes, increasing the testing error. The third cause of misclassification is missing peaks, or too few peaks, present in the XRD pattern; because of the lack of enough discriminative features, the CAM of the incorrectly labeled pattern could be ambiguously similar to two or more average class CAMs, causing the misclassification.

After the experimentalist identifies these root causes of error, they can be mitigated by increasing the number of experimental training points for certain classes or increasing the phase purity of the material coming from the synthesis and solution-processing procedure.

Summary of contributions

In this work, we develop a supervised machine learning framework to screen novel materials based on the analysis of their XRD spectra. The framework is designed specifically for cases when only sparse datasets are available, e.g., early-stage high-throughput material development and discovery loops. Specifically, we propose a physics-informed data augmentation method that extends small, targeted experimental and simulated datasets, and captures the possible differences between simulated XRD powder patterns and experimental thin-film XRD patterns. A few thousand augmented spectra are found to increase our classification accuracy from <60 to 93% for dimensionality and 89% for space group. The F 1 macro score is also over 0.85 for various algorithms, reflecting the model’s capacity to deal with significantly imbalanced classes.

When trained with both augmented simulated and experimental XRD spectra, a-CNNs are found to have the highest accuracy/ F 1 macro score among the many supervised machine learning methods studied. Our proposed a-CNN architecture allows high performance and interpretably through CAMs. The use of our proposed average CAMs, allow to identify the root cause of misclassification, and allow the design of a robust experiment. Furthermore, we find that the neural network model tolerates coarsening of the training data, providing future opportunities for online learning, i.e., the on-the-fly adaptive adjustment of XRD measurement parameters by taking feedback from machine learning algorithms. 15

Our approach can be extended to XRD-based high-throughput screening of any kind of thin-film materials, beyond perovskite and perovskite-inspired materials. Since most material databases lack information for various kinds of novel materials, the data augmentation approach to tackle data scarcity can be broadly applied. The underlying difference between XRD patterns of thin-films and powders is common to a broad range of materials and is commonly seen in most thin-film characterization experiments. The high interpretability of our approach could allow future work in semi-supervised or active learning, allowing the CAM maps to guide manual XRD refinement or actual XRD experiments. The framework may also be extended beyond XRD classification, to any spectrum containing information-rich features that require classification (XPS, XRF, PL, mass spectroscopy, etc.). The advantages of our approach include: ( 1 ) interpretable error analysis; ( 2 ) a data augmentation strategy that enables fast and accurate classification, even with small and imbalanced datasets.

Measurement and preprocessing of experimental XRD patterns

The XRD patterns for each sample are obtained by using a parallel beam, X-ray powder diffraction Rigaku SmartLab system 53 with 2ϴ angle from 5 to 60 ° with a step size of 0.04 ° . The tool is configured in a symmetric setup. We preprocess the raw XRD patterns to reduce the experimental noise and the background signal. For this purpose, the background signal is estimated and subtracted along the 2ϴ axis, and the spectrum is smoothed conserving the peak width and relative peak size applying the Savitzky-Golay filter. 54 A representative example of a preprocessed and raw XRD spectra is included in Supplementary Fig. S2 .

Simulation of powder XRD patterns

The powder XRD simulations are carried out with Panalytical Highscore v4.7 software based on the Rietveld algorithm implementation by Hill and Howard. 55 , 56 The XRD crystal is assumed to have fully random grain orientations. The unit cell lattice parameters, atomic coordinates, atomic displacement parameters, and space group information are considered for the structure factor calculation in the Rietveld model.

Physics-informed data augmentation

Suppose we describe the series of peaks in an XRD pattern by a discrete function \(\left( {2\theta } \right):I \to {\Bbb R}^ +\) , which maps a set of discrete angles I to positive real numbers \({\mathbb{R}^+}\) corresponding to peak intensities. We augment the data through the following sequential process of transformations \(\Phi _{S,c}^1,\Phi _S^2\) , and \(\Phi _\varepsilon ^3\) :

Random peak scaling is applied periodically along the 2 ϴ axis to account for different thin-film preferred orientations. A subset of random peaks at periodic angles S is scaled by factor c , such that

Random peak elimination (with a different randomly selected S than Eq. 1 ) is applied periodically along the 2 ϴ axis, to account for different thin-film preferred orientations, such that

Pattern shifting by small random value ε along the 2 ϴ direction to allow for different material compositions and film strain conditions, such that

We perform global average pooling in the last convolutional layer. The pooling results are defined as \(\mathop {\sum }\limits_x f_k\left( x \right)\) where f k ( x ) is the activation of unit k at convolution location x . The pooling results \(\mathop {\sum }\limits_x f_k\left( x \right)\) are subsequently fed into the softmax classifier. The class confidence score, thus can be computed as:

We define \(M_C\left( x \right) = \mathop {\sum}\nolimits_k {w_k^c} f_k\left( x \right)\) , as the CAM for class C , where \(w_k^C\) is the weight of unit k of the last convolution layer of class C. Therefore, Sc can be rewritten as:

M C ( x ) directly indicates the importance of the activation at the discrete location x for a given XRD pattern in class C .

In the context of XRD and other spectral measurements, the location of relevant intensity peaks is likely to be displaced only little from sample to sample within a certain class. Thus, the CAMs for all samples within a class are similar, and could be averaged to obtain the discriminative features during activation. Thus, we can define average CAMs for class C as \(\bar M_C\left( x \right)\) :

where n corresponds to the training patterns labeled as class C.

Data availability

The experimental dataset analyzed during the current study is available in the following GitHub repository: [ ]. The simulated and labeled data that support the findings of this study is available from the Inorganic Crystalline Structure Database (ICSD), but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of ICSD.

Code availability

The codes used for preprocessing, data augmentation, and classification are available in the GitHub repository [ ]. The classification algorithms are implemented using scikit-learn 0.20 45 and the fastdtw python library. The a-CNN is implemented using Keras with the Tensorflow background.

Tabor, A., Roch, D. & Saikin, L. Lawrence Berkeley National Laboratory recent work title accelerating the discovery of materials for clean energy in the era of smart automation. Nat Rev Mater . (2018).

Article   CAS   Google Scholar  

Correa-Baena, J.-P. et al. Accelerating materials development via automation, machine learning, and high-performance computing. Joule 2 , 1410–1420 (2018).

Dinnebier, R. E. Powder Diffraction: Theory and Practice . (RSC Publ, Cambridge, 2009).

Rietveld, H. M. A profile refinement method for nuclear and magnetic structures. J. Appl. Crystallogr. 2 , 65–71 (1969).

Carr, D. A., Lach-hab, M., Yang, S., Vaisman, I. I. & Blaisten-Barojas, E. Machine learning approach for structure-based zeolite classification. Microporous Mesoporous Mater. 117 , 339–349 (2009).

Baumes, L. A., Moliner, M., Nicoloyannis, N. & Corma, A. A reliable methodology for high throughput identification of a mixture of crystallographic phases from powder X-ray diffraction data. CrystEngComm 10 , 1321–1324 (2008).

Baumes, L. A., Moliner, M. & Corma, A. Design of a full-profile-matching solution for high-throughput analysis of multiphase samples through powder X-ray diffraction. Chem. - A Eur. J. 15 , 4258–4269 (2009).

Stanev, V. et al. Unsupervised phase mapping of X-ray diffraction data by nonnegative matrix factorization integrated with custom clustering. npj Comput. Mater . 4 , 43 (2018).

Kusne, A. G., Keller, D., Anderson, A., Zaban, A. & Takeuchi, I. High-throughput determination of structural phase diagram and constituent phases using GRENDEL. Nanotechnology 26 , 444002 (2015).

Park, W. B. et al. Classification of crystal structure using a convolutional neural network. IUCrJ . 4 , 486–494 (2017).

Park, W. B., Singh, S. P., Yoon, C. & Sohn, K. S. Combinatorial chemistry of oxynitride phosphors and discovery of a novel phosphor for use in light emitting diodes, Ca 1.5 Ba 0.5 Si 5 N 6 O 3 :Eu 2+ . J. Mater. Chem. C. 1 , 1832–1839 (2013).

Rybakov, V. B., Babaev, E. V., Pasichnichenko, K. Y. & Sonneveld, E. J. X-ray mapping in heterocyclic design: VI. X-ray diffraction study of 3-(isonicotinoyl)-2-oxooxazolo[3,2-a]pyridine and the product of its hydrolysis. Crystallogr. Rep. 47 , 473–477 (2002).

Google Scholar  

Hirosaki, N., Takeda, T., Funahashi, S. & Xie, R. J. Discovery of new nitridosilicate phosphors for solid state lighting by the single-particle-diagnosis approach. Chem. Mater. 26 , 4280–4288 (2014).

Suram, S. K. et al. Automated phase mapping with AgileFD and its application to light absorber discovery in the V-Mn-Nb oxide system. ACS Comb. Sci. 19 , 37–46 (2017).

Ren, F. et al. Accelerated discovery of metallic glasses through iteration of machine learning and high-throughput experiments. Sci. Adv . 4 , eaaq1566 (2018).

Article   Google Scholar  

Bunn, J. K. et al. Generalized machine learning technique for automatic phase attribution in time variant high-throughput experimental studies. J. Mater. Res. 30 , 879–889 (2015).

Kusne, A. G. et al. On-the-fly machine-learning for high-throughput experiments: search for rare-earth-free permanent magnets. Sci. Rep. 4 , 1–7 (2014).

Vecsei, P. M. et al. Neural network-based classification of crystal symmetries from X-ray diffraction patterns. arXiv preprint arXiv:1812.05625 (2018).

Ziletti, A., Kumar, D., Scheffler, M. & Ghiringhelli, L. M. Insightful classification of crystal structures using deep learning. Nat. Commun. 9 , 1–10 (2018).

Ke, T. W. et al. A convolutional neural network-based screening tool for X-ray serial crystallography. J. Synchrotron Radiat. 25 , 655–670 (2018).

Ziatdinov, M. et al. Deep learning of atomically resolved scanning transmission electron microscopy images: chemical identification and tracking local transformations. ACS Nano. 11 , 12742–12752 (2017).

Le Bras, R. et al. A computational challenge problem in materials discovery: synthetic problem generator and real-world datasets. Proceedings of the Twenty-Eighth AAAI Conference of Artifical Intelligence System , pp. 438–443 (2014).

LeBras, R. et al. Constraint reasoning and kernel clustering for pattern decomposition with scaling. in Principles and Practice of Constraint Programming – CP 2011 (ed. Lee, J.) 508–522 (Springer, Berlin, Heidelberg, 2011).

Järvinen, M. Application of symmetrized harmonics expansion to correction of the preferred orientation effect. J. Appl. Crystallogr. 26 , 525–531 (1993).

Fewster, P. F., Langford, J. I. & Fewster, P. F. Reports on progress in physics related content X-ray analysis of thin films and multilayers. 59 , 11 (1996).

Iwasaki, Y., Kusne, A. G. & Takeuchi, I. Comparison of dissimilarity measures for cluster analysis of X-ray diffraction data from combinatorial libraries. npj Comput. Mater. 3 , 1–8 (2017).

Belkly, A., Helderman, M., Karen, V. L. & Ulkch, P. New developments in the Inorganic Crystal Structure Database (ICSD): accessibility in support of materials research and design. Acta Crystallogr. Sect. B Struct. Sci. 58 , 364–369 (2002).

Eperon, G. E. et al. Formamidinium lead trihalide: a broadly tunable perovskite for efficient planar heterojunction solar cells. Energy Environ. Sci. 7 , 982–988 (2014).

Lee, M. M., Teuscher, J., Miyasaka, T., Murakami, T. N. & Snaith, H. J. Efficient hybrid solar cells based on meso-superstructured organometal halide perovskites. Sci. (80-.) 338 , 643–647 (2012).

Hoye, R. L. Z. et al. Perovskite-inspired photovoltaic materials: toward best practices in materials characterization and calculations. Chem. Mater. 29 , 1964–1988 (2017).

Springenberg, J. T. et al. Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806. (2014).

Zhou, B. et al. Learning deep features for discriminative localization. Proceedings of the IEEE conference on computer vision and pattern recognition . (2016).

Etgar, L. The merit of perovskite’s dimensionality; Can this replace the 3D halide perovskite? Energy Environ. Sci. 11 , 234–242 (2018).

Xiao, Z., Meng, W., Wang, J., Mitzi, D. B. & Yan, Y. Searching for promising new perovskite-based photovoltaic absorbers: the importance of electronic dimensionality. Mater. Horiz . (2017).

Zhang, T., Long, M., Liu, P., Xie, W. & Xu, J.-B. Stable and efficient 3D-2D perovskite-perovskite planar heterojunction solar cell without organic hole transport layer. Joule (2018).

Kurchin, R. C., Gorai, P., Buonassisi, T. & Stevanović, V. Structural and chemical features giving rise to defect tolerance of binary semiconductors. Chem. Mater. 30 , 5583–5592 (2018).

Baikie, T. et al. Synthesis and crystal chemistry of the hybrid perovskite (CH3NH3)PbI3 for solid-state sensitised solar cell applications. J. Mater. Chem. A 1 , 5628–5641 (2013).

Sun, S. et al. Synthesis, crystal structure, and properties of a perovskite-related bismuth phase, (NH 4 ) 3 Bi 2 I 9 . APL Mater. 4 , 031101 (2016).

Coelho, A. A. TOPAS-Academic, Version 6: technical reference. (Coelho Software. Brisbane, Australia, 2016).

Shijing Sun, et al. Accelerating photovoltaic materials development via high-throughput experiments and machine-learning-assisted diagnosis. Joule 2 , 1410–1420 (2018).

Ermon, S. et al. Pattern decomposition with complex combinatorial constraints: application to materials discovery. AAAI'15 Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence . (2014).

Zhao, J. et al. Strained hybrid perovskite thin films and their impact on the intrinsic stability of perovskite solar cells. Sci. Adv. 3 , eaao5616 (2017).

Hernández-García, A. & König, P. Data augmentation instead of explicit regularization. arXiv preprint arXiv:1806.03852 (2018).

Goodfellow, I., Bengio, Y., Courville, A. & Bengio, Y. Deep Learning . 1 (MIT Press, Cambridge, 2016).

Pedregosa, F. et al. Scikit-learn: machine learning in {P}ython. J. Mach. Learn. Res. 12 , 2825–2830 (2011).

Friedman, J., Hastie, T. & Tibshirani, R. The Elements of Statistical Learning . 1 (Springer Series in Statistics, New York, NY, USA, 2001).

Salvador, S. & Chan, P. FastDTW: toward accurate dynamic time warping in linear time and space. Time 11 , 70–80 (2004).

Haixiang, G. et al. Learning from class-imbalanced data: review of methods and applications. Expert Syst. Appl. 73 , 220–239 (2017).

Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16 , 321–357 (2002).

Lin, M., Chen, Q. & Yan, S. Network in network. arXiv preprint arXiv:1312.4400. (2013).

Yoon, C. H. et al. Unsupervised classification of single-particle X-ray diffraction snapshots by spectral clustering. Opt. Express 19 , 16542 (2011).

Wang, Z., Yan, W. & Oates, T. Time series classification from scratch with deep neural networks: a strong baseline. Proc. Int. Jt. Conf. Neural Netw. 2017 , 1578–1585 (2017).

Kobayashi, S. & Inaba, K. X-ray thin-film measurement techniques. Mass Spectrosc. Equipped a Ski. Interface 28 , 8 (2012).

CAS   Google Scholar  

Press, W. H. & Teukolsky, S. A. Savitzky-Golay smoothing filters. Comput. Phys. 4 , 669 (1990).

Hill, R. J. & Howard, C. J. Quantitative phase analysis from neutron powder diffraction data using the Rietveld method. J. Appl. Crystallogr. 20 , 467–474 (1987).

Degen, T., Sadki, M., Bron, E., König, U. & Nénert, G. The HighScore suite. Powder Diffr. 29 , S13–S18 (2014).

Download references


We thank FengXia Wei (A*STAR), Jason Hattrick-Simpers (NIST), Max Hutchinson (Citrine Informatics), and Zachary T. Trautt (NIST) for helpful suggestions on our analysis. Prof. Juan Pablo Correa Baena at Georgia Tech for sharing various experimental XRD patterns. This work was supported by a TOTAL SA research grant funded through MITei (supporting the experimental XRD), the National Research Foundation (NRF), Singapore through the Singapore Massachusetts Institute of Technology (MIT) Alliance for Research and Technology’s Low Energy Electronic Systems research program (supporting the machine learning algorithm development), the Accelerated Materials Development for Manufacturing Program at A*STAR via the AME Programmatic Fund by the Agency for Science, Technology and Research under Grant No. A1898b0043 (for ML algorithm error analysis), and by the U.S. Department of Energy under the Photovoltaic Research and Development program under Award DE-EE0007535 (for code framework development). This work made use of the CMSE at MIT, which is supported by NSF award DMR-0819762.

Author information

These authors contributed equally: Felipe Oviedo, Zekun Ren

Authors and Affiliations

Massachusetts Institute of Technology, Cambridge, MA, 02139, USA

Felipe Oviedo, Shijing Sun, Charles Settens, Zhe Liu, Noor Titan Putri Hartono, Siyu I. P. Tian, Giuseppe Romano & Tonio Buonassisi

Singapore-MIT Alliance for Research and Technology, Singapore, 138602, Singapore

Zekun Ren & Tonio Buonassisi

Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore, 138632, Singapore

Savitha Ramasamy

National Institute of Standards and Technology, MS 8520, Gaithersburg, MD, 20899, USA

Brian L. DeCost & Aaron Gilad Kusne

You can also search for this author in PubMed   Google Scholar


F.O. and Z.R. contributed equally for this work. F.O., Z.R., S.S., and T.B. conceived of the research. C.S., F.O., and S.S. prepared the simulated XRD data. Z.R., F.O., C.S., B.L.D.C., and A.G.K. developed data augmentation. S.S., N.T.P.H., and C.S. collected the XRD data. F.O., Z.R., and G.R. developed and tested ML algorithms, with key intellectual contributions from R.S., Z.L., B.L.D.C., and A.G.K. F.O. and Z.R. implemented and tuned the final a-CNN architecture and adapted class activation maps to the context of the problem. F.O., Z.R., S.I.P.T., and T.B. wrote the manuscript, with input from all co-authors.

Corresponding authors

Correspondence to Felipe Oviedo , Zekun Ren or Tonio Buonassisi .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information, rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit .

Reprints and permissions

About this article

Cite this article.

Oviedo, F., Ren, Z., Sun, S. et al. Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks. npj Comput Mater 5 , 60 (2019).

Download citation

Received : 01 December 2018

Accepted : 25 April 2019

Published : 17 May 2019


Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

This article is cited by

Adaptively driven x-ray diffraction guided by machine learning for autonomous phase identification.

  • Nathan J. Szymanski
  • Christopher J. Bartel
  • Gerbrand Ceder

npj Computational Materials (2023)

Knowledge-integrated machine learning for materials: lessons from gameplaying and robotics

  • Kedar Hippalgaonkar
  • Qianxiao Li
  • Tonio Buonassisi

Nature Reviews Materials (2023)

Gas phase multicomponent detection and analysis combining broadband dual-frequency comb absorption spectroscopy and deep learning

Communications Engineering (2023)

Automated classification of big X-ray diffraction data using deep learning models

  • Jerardo E. Salgado
  • Samuel Lerman
  • Niaz Abdolrahim

Creation of crystal structure reproducing X-ray diffraction pattern without using database

  • Seiji Kajita

Quick links

  • Explore articles by subject
  • Guide to authors
  • Editorial policies

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

research paper on xrd

X-ray diffraction: instrumentation and applications


  • 1 a SCIENT by CROMATEC_PLUS SRL, Research Center for Instrumental Analysis , Bucharest , Romania.
  • PMID: 25831472
  • DOI: 10.1080/10408347.2014.949616

X-ray diffraction (XRD) is a powerful nondestructive technique for characterizing crystalline materials. It provides information on structures, phases, preferred crystal orientations (texture), and other structural parameters, such as average grain size, crystallinity, strain, and crystal defects. X-ray diffraction peaks are produced by constructive interference of a monochromatic beam of X-rays scattered at specific angles from each set of lattice planes in a sample. The peak intensities are determined by the distribution of atoms within the lattice. Consequently, the X-ray diffraction pattern is the fingerprint of periodic atomic arrangements in a given material. This review summarizes the scientific trends associated with the rapid development of the technique of X-ray diffraction over the past five years pertaining to the fields of pharmaceuticals, forensic science, geological applications, microelectronics, and glass manufacturing, as well as in corrosion analysis.

Keywords: X-ray diffraction; applications; theory.

Publication types

  • X-Ray Diffraction / instrumentation*
  • X-Ray Diffraction / methods*

Improved cellulose X-ray diffraction analysis using Fourier series modeling

  • Original Research
  • Published: 24 May 2020
  • Volume 27 , pages 5563–5579, ( 2020 )
  • Wenqing Yao 1   nAff2 ,
  • Yuanyuan Weng 3 &
  • Jeffery M. Catchmark 3  

6249 Accesses

119 Citations

Explore all metrics

This paper addresses two fundamental issues in the peak deconvolution method of cellulose XRD data analysis: there is no standard model for amorphous cellulose and common peak functions such as Gauss, Lorentz and Voigt functions do not fit the amorphous profile well. It first examines the effects of ball milling on three types of cellulose and results show that ball milling transforms all samples into a highly amorphous phase exhibiting nearly identical powder X-ray diffraction (XRD) profiles. It is hypothesized that short range order within a glucose unit and between adjacent units survives ball milling and generates the characteristic amorphous XRD profiles. This agrees well with cellulose I d-spacing measurements and oligosaccharide XRD analysis. The amorphous XRD profile is modeled using a Fourier series equation where the coefficients are determined using the nonlinear least squares method. A new peak deconvolution method then is proposed to analyze cellulose XRD data with the amorphous Fourier model function in conjunction with standard Voigt functions representing the crystalline peaks. The impact of background subtraction method has also been assessed. Analysis of several cellulose samples was then performed and compared to the conventional peak deconvolution methods with common peak fitting functions and background subtraction approach. Results suggest that prior peak deconvolution methods overestimate cellulose crystallinity.

Graphic abstract

research paper on xrd

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price includes VAT (Russian Federation)

Instant access to the full article PDF.

Rent this article via DeepDyve

Institutional subscriptions

research paper on xrd

Ahvenainen P, Kontro I, Svedström K (2016) Comparison of sample crystallinity determination methods by X-ray diffraction for challenging cellulose I materials. Cellulose 23:1–14

Article   Google Scholar  

Astley OM, Chanliaud E, Donald AM, Gidley MJ (2001) Structure of acetobacter cellulose composites in the hydrated state. Int J Biol Macromol 29:193–202.

Article   CAS   PubMed   Google Scholar  

Azubuike CP, Rodríguez H, Okhamafe AO, Rogers RD (2012) Physicochemical properties of maize cob cellulose powders reconstituted from ionic liquid solution. Cellulose 19:425–433.

Article   CAS   Google Scholar  

Bansal P (2011) Computational and experimental investigation of the enzymatic hydrolysis of cellulose. Georgia Institue of Technology, Atlanta

Google Scholar  

Bansal P, Hall M, Realff MJ et al (2010) Multivariate statistical analysis of X-ray data from cellulose: a new method to determine degree of crystallinity and predict hydrolysis rates. Bioresour Technol 101:4461–4471.

Barnette AL, Lee C, Bradley LC et al (2012) Quantification of crystalline cellulose in lignocellulosic biomass using sum frequency generation (SFG) vibration spectroscopy and comparison with other analytical methods. Carbohydr Polym 89:802–809.

Bates S, Zografi G, Engers D et al (2006) Analysis of amorphous and nanocrystalline solids from their X-Ray diffraction patterns. Pharm Res 23:2333–2349.

Buerger M, Klein GE (1945) Correction of X-ray diffraction Intensities for Lorentz and Polarization Factors. J Appl Phys 16:408–418

Ciolacu D, Ciolacu F, Popa VI (2011) Amorphous cellulose—structure and characterization. Cellul Chem Technol 45:13

CAS   Google Scholar  

Cullity BD (1978) Elements of x-ray diffraction. Addison-Wesley Pub. Co., Reading

Delhez R, De Keijser TH, Mittemeijer EJ (1982) Determination of crystallite size and lattice distortions through X-ray diffraction line profile analysis. Fresenius Z Für Anal Chem 312:1–16

Delhez R, de Keijser TH, Langford JI, Louër D, Mittemeijer EJ, Sonneveld, EJ (1993) Crystal imperfection broadening and peak shape in the Rietveld method. In: Young RA (ed) The Rietveld Method. Oxford University Press, p 132

Dinnebier RE (2008) Powder diffraction: theory and practice. Royal Society of Chemistry, Cambridge

Book   Google Scholar  

Dumitriu S (2004) Polysaccharides: structural diversity and functional versatility, 2nd edn. CRC Press, Boca Raton

Fang L, Catchmark JM (2014) Structure characterization of native cellulose during dehydration and rehydration. Cellulose 21:3951–3963.

Fawcett TG, Crowder CE, Kabekkodu SN et al (2013) Reference materials for the study of polymorphism and crystallinity in cellulosics. Powder Diffr 28:18–31

Fink H-P, Philipp B, Paul D et al (1987) The structure of amorphous cellulose as revealed by wide-angle X-ray scattering. Polymer 28:1265–1270.

Fourier J (1822) Theorie analytique de la chaleur, par M. Fourier. chez Firmin Didot, pere et fils

French AD (2014) Idealized powder diffraction patterns for cellulose polymorphs. Cellulose 21:885–896.

French AD, Santiago Cintrón M (2013) Cellulose polymorphy, crystallite size, and the Segal Crystallinity Index. Cellulose 20:583–588.

Garvey CJ, Parker IH, Simon GP (2005) On the interpretation of X-Ray diffraction powder patterns in terms of the nanostructure of cellulose I fibres. Macromol Chem Phys 206:1568–1575.

Guo J, Catchmark JM (2012) Surface area and porosity of acid hydrolyzed cellulose nanowhiskers and cellulose produced by Gluconacetobacter xylinus. Carbohydr Polym 87:1026–1037.

He J, Cui S, Wang S (2008) Preparation and crystalline analysis of high-grade bamboo dissolving pulp for cellulose acetate. J Appl Polym Sci 107:1029–1038.

Hermans PH, Weidinger A (1946) On the diffusely diffracted radiation in the X-ray diagrams of cellulose fibres. Recl Trav Chim Pays-Bas 65:620–623.

Hult E-L, Iversen T, Sugiyama J (2003) Characterization of the supermolecular structure of cellulose in wood pulp fibres. Cellulose 10:103–110.

Ioelovich M, Leykin A, Figovsky O (2010) Study of cellulose paracrystallinity. BioResources 5:1393–1407.

Jmol (2016) An open-source Java viewer for chemical structures in 3D. . Accessed 25 June 2016

Ju X, Bowden M, Brown EE, Zhang X (2015) An improved X-ray diffraction method for cellulose crystallinity measurement. Carbohydr Polym 123:476–481.

Klug HP, Alexander LE (1974) X-Ray Diffraction procedures: for polycrystalline and amorphous materials, 2nd edn. Wiley, New York

Kroon-Batenburg LM, Kruiskamp PH, Vliegenthart JF, Kroon J (1997) Estimation of the persistence length of polymers by MD simulations on small fragments in solution. Application to cellulose. J Phys Chem B 101:8454–8459

Langford JI, Wilson AJC (1978) Scherrer after sixty years: a survey and some new results in the determination of crystallite size. J Appl Crystallogr 11:102–113

Lanson B (1997) Decomposition of experimental X-ray diffraction patterns (profile fitting); a convenient way to study clay minerals. Clays Clay Miner 45:132–146

Ling Z, Wang T, Makarem M, Santiago Cintrón M, Cheng HN, Kang X, Bacher M, Potthast A, Rosenau T, King H, Delhom CD, Nam S, Edwards JV, Kim SH, Xu F, French AD (2019) Effects of ball milling on the structure of cotton cellulose. Cellulose 26:305–328

Luca Lutterotti MAUD (2020) Material analysis using diffraction. . Accessed 14 Feb 2020

Madsen IC, Scarlett NVY, Kern A (2011) Description and survey of methodologies for the determination of amorphous content via X-ray powder diffraction. Z Für Krist Cryst Mater 226:944–955

McCusker LB, Von Dreele RB, Cox DE et al (1999) Rietveld refinement guidelines. J Appl Crystallogr 32:36–50

Mittemeijer EJ, Welzel U (2008) The “state of the art” of the diffraction analysis of crystallite size and lattice strain. Z Für Krist 223:552–560

Nada Stubičar IŠ (1998) An X-Ray Diffraction study of the crystalline to amorphous phase change in cellulose during high-energy dry ball milling. Holzforschung 52:455–458.

Nelson ML, O’Connor RT (1964) Relation of certain infrared bands to cellulose crystallinity and crystal latticed type. Part I. Spectra of lattice types I, II, III and of amorphous cellulose. J Appl Polym Sci 8:1311–1324.

Park S, Baker JO, Himmel ME et al (2010) Cellulose crystallinity index: measurement techniques and their impact on interpreting cellulase performance. Biotechnol Biofuels 3:10.

Article   CAS   PubMed   PubMed Central   Google Scholar  

Park S, Johnson DK, Ishizawa CI et al (2009) Measuring the crystallinity index of cellulose by solid state 13C nuclear magnetic resonance. Cellulose 16:641–647.

Pecharsky V, Zavalij P (2008) Fundamentals of powder diffraction and structural characterization of materials, 2nd edn. Springer, Berlin

Pires L, de Figueiredo F, Ferreira F (2014) The Rietveld method as a tool to quantify the amorphous amount of microcrystalline cellulose. J Pharm Sci 103:1394.

Rodgers JL, Nicewander WA (1988) Thirteen ways to look at the correlation coefficient. Am Stat 42:59–66.

Schwanninger M, Rodrigues JC, Pereira H, Hinterstoisser B (2004) Effects of short-time vibratory ball milling on the shape of FT-IR spectra of wood and cellulose. Vib Spectrosc 36:23–40.

Segal L, Creely JJ, Martin AE, Conrad CM (1959) An empirical method for estimating the degree of crystallinity of native cellulose using the X-Ray diffractometer. Text Res J 29:786–794.

Stankovic L, Dakovic M, Thayaparan T (2013) Time-frequency signal analysis with applications. Artech House Publishers, Norwood, MA

Suortti P (1993) Bragg reflection profile shape in X-ray powder diffraction patterns. In: Young RA (ed) The Rietveld method. Cambridge University Press, Cambridge, pp 167–185

Swenson HA, Schmitt CA, Thompson NS (1965) Comparison of the configuration of β-1,4 linked hexosans and wood xylan pentosans by means of the eizner-ptitsyn viscosity equations. J Polym Sci Part C Polym Symp 11:243–252.

Taylor LS, Zografi G (1998) The quantitative analysis of crystallinity using FT-Raman spectroscopy. Pharm Res 15:755–761.

Teeäär R, Serimaa R, Paakkarl T (1987) Crystallinity of cellulose, as determined by CP/MAS NMR and XRD methods. Polym Bull 17:231–237.

Wada M, Okano T, Sugiyama J (1997) Synchrotron-radiated X-ray and neutron diffraction study of native. Cellulose 4:221–232.

Ward K (1950) Crystallinity of cellulose and its significance for the fiber properties. Text Res J 20:363–372.

Warren BE (1990) X-Ray diffraction, Reprint edn. Dover Publications, New York

Watanabe K, Tabuchi M, Morinaga Y, Yoshinaga F (1998) Structural features and properties of bacterial cellulose produced in agitated culture. Cellulose 5:187–200.

Young RA (ed) (1995) The Rietveld Method. In: OUP/International Union of Crystallography|International Union of Crystallography Monographs on Crystallography, pp. 5

Zhang Y, Inouye H, Crowley M et al (2016) Diffraction pattern simulation of cellulose fibrils using distributed and quantized pair distances. J Appl Crystallogr 49:2244–2248.

Zhang S, Winter WT, Stipanovic AJ (2005) Water-activated cellulose-based electrorheological fluids. Cellulose 12:135–144.

Download references


The research work in this paper is supported by the Center for Lignocellulose Structure and Formation, an Energy Frontier Research Center funded by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences under Award Number DE-SC0001090, and by the National Institute of Food and Agriculture, U.S. Department of Agriculture, Pennsylvania Agricultural Experiment Station 4602 under Accession Number 1009850. We thank two colleagues at Penn State University, Liza Wilson from Department of Biology for the training of ball milling and Nichole Wonderling from Materials Research Institute for the training and discussion on XRD analysis.

Author information

Wenqing Yao

Present address: Instrumentation and Control Division, Beijing Institute of Nuclear Engineering, 117 West 3rd Ring Road North, Haidian District, Beijing, China

Authors and Affiliations

Department of Mechanical and Nuclear Engineering, The Pennsylvania State University, 226 Agricultural Engineering, University Park, PA, 16802, USA

Department of Agricultural and Biological Engineering, The Pennsylvania State University, 226 Agricultural Engineering, University Park, PA, 16802, USA

Yuanyuan Weng & Jeffery M. Catchmark

You can also search for this author in PubMed   Google Scholar

Corresponding author

Correspondence to Jeffery M. Catchmark .

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (XLSX 150 kb)

Rights and permissions.

Reprints and permissions

About this article

Cite this article.

Yao, W., Weng, Y. & Catchmark, J.M. Improved cellulose X-ray diffraction analysis using Fourier series modeling. Cellulose 27 , 5563–5579 (2020).

Download citation

Published : 24 May 2020

Issue Date : July 2020


Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Amorphous cellulose
  • Ball milling
  • Crystallinity index
  • Crystal size
  • Fourier series
  • Peak deconvolution
  • X-ray diffraction
  • Find a journal
  • Publish with us
  • Track your research

Journals Logo

1. Introduction

2. analysis of simulated data from the icsd and csd, 3. perspectives on automated analysis, 4. conclusions, 5. code and data availablility and supporting information, supporting information.

research paper on xrd

research papers \(\def\hfill{\hskip 5em}\def\hfil{\hskip 3em}\def\eqno#1{\hfil {#1}}\)

Open Access

Automated prediction of lattice parameters from X-ray powder diffraction patterns

a Materials Science and Engineering, Stanford University, Stanford, CA94305, USA, and b SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA * Correspondence e-mail: [email protected] , [email protected]

A key step in the analysis of powder X-ray diffraction (PXRD) data is the accurate determination of unit-cell lattice parameters. This step often requires significant human intervention and is a bottleneck that hinders efforts towards automated analysis. This work develops a series of one-dimensional convolutional neural networks (1D-CNNs) trained to provide lattice parameter estimates for each crystal system. A mean absolute percentage error of approximately 10% is achieved for each crystal system, which corresponds to a 100- to 1000-fold reduction in lattice parameter search space volume. The models learn from nearly one million crystal structures contained within the Inorganic Crystal Structure Database and the Cambridge Structural Database and, due to the nature of these two complimentary databases, the models generalize well across chemistries. A key component of this work is a systematic analysis of the effect of different realistic experimental non-idealities on model performance. It is found that the addition of impurity phases, baseline noise and peak broadening present the greatest challenges to learning, while zero-offset error and random intensity modulations have little effect. However, appropriate data modification schemes can be used to bolster model performance and yield reasonable predictions, even for data which simulate realistic experimental non-idealities. In order to obtain accurate results, a new approach is introduced which uses the initial machine learning estimates with existing iterative whole-pattern refinement schemes to tackle automated unit-cell solution.

Keywords: analysis automation ; machine learning ; powder diffraction ; indexing .

In this work, we focus on the problem of automatic analysis of powder X-ray diffraction (PXRD) data using a combination of machine learning (ML) and classical pattern-fitting approaches. A PXRD pattern is the result of three-dimensional atomic structure information condensed to a single dimension. The observed peaks are determined by the unit-cell structure (lattice parameters), symmetry constraints (space group) and atomic positions within the unit cell. The goal of a PXRD experiment, then, is the determination of these parameters. This experiment is particularly well suited to ML approaches, as the data can be readily simulated from known parameters, large databases of previously solved structures are available and patterns can be simulated on the basis of purely hypothetical materials.

(i) Stability to noise. ML methods learn patterns that are characteristic of particular lattice parameters. This opens the possibility of leveraging prior knowledge of crystal systems to condition predictions on noisy, overlapped and multiple-phase data.

(ii) Speed. The prediction process can be run in real time during experimental data acquisition, as inference with pre-trained models is faster than the time required to acquire data.

2.1. ICSD and CSD combined data

2.1.1. training and testing on modified data.

In both cases, random intensity modulation and zero shifting clearly have little effect on the model prediction. Here, the performance, as quantified by MAPE, is similar to the prediction on perfectly clean data. The results for the intensity modulation experiment indicate that the models are correctly learning the physics that lattice parameter prediction should be dependent on peak location and not intensity.

We note that the method for modifying the intensities does not entirely capture the process of preferred orientation effects in PXRD patterns. It is possible that a more physically realistic model for intensity modulation, which is dependent on Miller indices, might be detrimental to the ML performance. However, we can be confident that, at least for small modulations in intensity, the ML performance should be relatively stable. Although outside the scope of the present work, to study the effect of strong orientation effects it will be important to incorporate a more realistic physical model.

Overall, the results of this analysis of possible experimental modifications show that it is essential to incorporate appropriate modified data into ML training sets. Furthermore, proper modification strategies can substantially recover lost predictive power.

2.2. Percentage within bound metric

3.1. volumes of parameter search space, 3.2. whole-pattern fitting using ml initial guess, 3.2.1. example 1: case study of a high-symmetry system.

In this example, while ML+ Lp-Search yields a reduction in search space and is faster than default Lp-Search , the corresponding volume reduction and speedup are modest. Furthermore, in both cases, the minimizations converge to the true answer in every minimization. This is an unsurprising result as the cubic system is generally easy to index since the search space is one dimensional.

3.2.2. Example 2: case study of a low-symmetry system

In this case, ML+ Lp-Search is much faster than default Lp-Search and converges every time. As expected, the tighter the ML bound, the faster the ML+ Lp-Search method reaches convergence. This speedup can be directly attributed to the large reduction in search space using our initial ML estimates. Note that the ML was useful for this problem even in the absence of predictions for α , β and γ . We expect that future work on accurate prediction of lattice angle parameters will further significantly accelerate the method.

3.2.3. Example 3: case study of a dominant zone system

3.3. quantifying necessary bounds for lp-search.

Our results indicate that ML predictions within 1–5% of the true lattice parameters are likely to converge automatically in a short number of Lp-Search iterations. The intuition for choosing a small number of Lp-Search iterations was to formulate the problem as an ML-guided local optimization problem which can be run on a single local CPU. However, it is certainly possible to use a greater number of minimizations. For a larger number of minimizations, we expect that the probability of convergence will increase monotonically for every bound threshold. Such an approach would enable more successful attempts for lower thresholds. Nevertheless, we hope that this analysis will be helpful in setting a concrete target for improvements on the approach presented here. For fast high-throughput experiments, we expect fully automated methods such as ML+ Lp-Search to be quite valuable for indexing data with a single dominant phase.

3.4. Application to synchrotron data

The ability to refine a unit cell without human supervision will help drive future work in the optimization of materials. In this work, we help build the framework necessary to realize this goal. By training deep convolutional neural networks on nearly a million unique PXRD patterns drawn from across the chemical spectrum, we are able to provide estimates of unit-cell lattice parameters for each crystal system. In doing so, we analyze key experimental non-idealities that might affect ML predictions and conclude that the presence of multiple phases, baseline noise and peak broadening are particularly damaging. Incorporating these experimental conditions into the training is absolutely necessary and, in many cases, can improve model prediction and stability.

Finally, we have demonstrated that the initial parameter estimations can lead to a substantial reduction in search space volume around the true lattice parameters. We believe that these results, by themselves, are useful to the powder diffraction community due to their ease of integration with conventional indexing techniques such as singular value decomposition and the dichotomy method. Furthermore, we demonstrate that these estimates can be directly passed to whole-pattern refinement schemes in order to solve the unit cell for cases automatically. Future work on the angle prediction problem will probably increase the success rate for lower-symmetry materials. The significant reduction in search space volume enabled by lattice parameter prediction brings a corresponding acceleration of such whole-pattern refinement schemes. Full solutions are achievable on timescales suitable for feedback into the experimental process. Accelerated and fully automated analysis pipelines are a prerequisite for Bayesian optimization or reinforcement learning approaches, which will allow for the exploration of high-dimensional and complex materials parameter spaces becoming common for operando or in situ experimentation.

Models for all modification experiments, the SSRL Beamline 2-1 data set and scripts to generate Lp-Search input files can be accessed at . Please contact the ICSD and CSD for access to the simulated structures described in this paper.

Additional analysis, methods and data. DOI:


SRC gratefully acknowledges very helpful discussions with Alan Coelho regarding the Lp-Search algorithm.

Funding information

Use of the Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, is supported by the US Department of Energy, Office of Science, Office of Basic Energy Sciences under contract No. DE-AC02-76SF00515.

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence , which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.

  • Deans' InSPIRe Seed Funding Program
  • Instrumentation List
  • Facilities Locations
  • Higher Education Rates
  • Government and Commercial
  • Submit Publications
  • Become a User
  • Become a Client
  • Reserve an Instrument
  • BNRF Instruments
  • BNRF Location
  • Mass Spectroscopy
  • CRF Instruments
  • CRF Location
  • EMF Instruments
  • EMF Location
  • MCF Instruments
  • MCF Location
  • Other WVU Cores
  • Different SEM manufacturers to visit WVU
  • New Keyence VHX-7000N digital microscope in the SRF cleanroom
  • X-ray Photoelectron Spectroscopy (XPS) 101
  • Keyence: New Elemental Analysis for Material Characterization

X-ray Diffraction (XRD) 101

  • Where to Start
  • Proposal Text for PIs (DOCX)
  • Core Info sheets
  • Aamer Mahmood
  • Marcela Redigolo
  • Trieu Nguyen

Thursday, December 21, 2023

Wednesday, FEB. 07, 2024, 10 AM - 12 PM (EST)

Dr. Qiang Wang


  1. How To Analyse XRD Data / Plot / Graph in Research Paper? Experimental Paper Skills

    research paper on xrd

  2. (PDF) XRD for science and technology

    research paper on xrd

  3. XRD peak analysis of: (a,b) TEOS and (c,d) TMMS dip coated paper

    research paper on xrd

  4. X-ray diffraction (XRD) of chromatography paper Whatman no. 1 and

    research paper on xrd

  5. X-ray diffraction (XRD) analysis of A)

    research paper on xrd

  6. XRD patterns of a paper matrices immobilized with α-Fe2O3 NPs and b

    research paper on xrd


  1. ZOMRA(A.L.A ft EL CASTRO ) Kids / لمحة REACTION 🔥🔥🔥

  2. Cotton vs Paper Air Filters

  3. Devra Dhodi Chatna Ba Arkestra

  4. PSA Complete Solved Past Paper 2023

  5. another great song

  6. Crack the LSAT: All You Need to Know About the Law Exam in 2023


  1. (PDF) X-Ray Diffraction: Instrumentation and Applications

    X-ray diffraction (XRD) is a powerful nondestructive technique for characterizing crystalline materials. It provides information on structures, phases, preferred crystal orientations (texture),...

  2. X-Ray Diffraction

    XRD is the principal technique used to investigate crystal structures in composite materials in a nondestructive way: as known, different crystal phases can lead to different material properties. From:Nanofiber Composites for Biomedical Applications, 2017 Related terms: Nuclear Magnetic Resonance Nanoparticle Scanning Electron Microscopy

  3. (PDF) XRD for science and technology

    Feb 1977 J MOL STRUCT W.E.S. View Current applications of X-ray powder diffraction in the pharmaceutical industry Article Sep 2012 L.J. Chyall View X-ray diffraction studies and...

  4. X-Ray Diffraction: Instrumentation and Applications

    X-ray diffraction (XRD) is a powerful nondestructive technique for characterizing crystalline materials. It provides information on structures, phases, preferred crystal orientations (texture), and other structural parameters, such as average grain size, crystallinity, strain, and crystal defects.

  5. X-ray diffraction

    X-ray diffraction articles from across Nature Portfolio. X-ray crystallography is a technique used to reveal how the atoms of crystalline solids are arranged, relying upon the diffraction of X-ray ...

  6. (PDF) XRD and Materials Analysis

    ... The tiniest FA particles, which are one to ten microns in size and correspond to a tenth of the largest FA particle, are available. The mineralogy of materials was investigated by the authors...

  7. Fast and interpretable classification of small X-ray diffraction

    High-throughput material synthesis and rapid characterization are necessary ingredients for inverse design and accelerated material discovery. 1, 2 X-ray diffraction (XRD) is a workhorse...

  8. X-ray diffraction: instrumentation and applications

    X-ray diffraction (XRD) is a powerful nondestructive technique for characterizing crystalline materials. It provides information on structures, phases, preferred crystal orientations (texture), and other structural parameters, such as average grain size, crystallinity, strain, and crystal defects. X-ray diffraction peaks are produced by ...

  9. Structural characterization of polycrystalline thin films by X-ray

    X-ray diffraction (XRD) techniques are powerful, non-destructive characterization tool with minimal sample preparation. XRD provides the first information about the materials phases, crystalline structure, average crystallite size, micro and macro strain, orientation parameter, texture coefficient, degree of crystallinity, crystal defects etc. XRD analysis provides information about the bulk ...

  10. X-Ray Diffraction (XRD)

    A comparison of the brightness of modern sources of X-rays where (a) shows the spectral brightness of some third-generation synchrotron sources and laboratory-used X-ray tube sources.The vertical dashes denote the brightness of the characteristic radiation lines of standard rotating-anode tubes (Cu Kα and Mo Kα) of various powers; the squares indicate the brightness of the characteristic ...

  11. A review of basic crystallography and x-ray diffraction applications

    1 Introduction Atoms in materials are arranged into crystal structures and microstructures [ 1 ]. Periodic arrangement of atoms depends strongly on external factors such as temperature, pressure, and cooling rate during solidification.

  12. X Ray Diffraction Analysis

    X-Ray diffraction analysis (XRD) is a nondestructive technique that provides detailed information about the crystallographic structure, chemical composition, and physical properties of a material [48]. It is based on the constructive interference of monochromatic X-rays and a crystalline sample.

  13. Tutorial on Powder X‐ray Diffraction for Characterizing Nanoscale Materials

    P owder X-ray diffraction (XRD) is a common character- ization technique for nanoscale materials. Analysis of a sample by powder XRD provides important information that is complementary to various microscopic and spectro-scopic methods, such as phase identi cation, sample purity, fi crystallite size, and, in some cases, morphology.

  14. Applied Sciences

    X-ray diffraction (XRD) is a proven, powerful technique for determining the phase composition, structure, and microstructural features of crystalline materials. The use of machine learning (ML) techniques applied to crystalline materials research has increased significantly over the last decade.

  15. Machine Learning Automated Analysis of Enormous Synchrotron X-ray

    X-ray diffraction (XRD) data analysis can be a time-consuming and laborious task. Deep neural network (DNN) based models trained with synthetic XRD patterns have been proven to be a highly efficient, accurate, and automated method for analyzing common XRD data collected from solid samples in ambient environments. However, it remains unclear whether synthetic XRD-based models can be effective ...

  16. Tutorial on Powder X-ray Diffraction for Characterizing Nanoscale

    Powder X-ray diffraction (XRD) is a common characterization technique for nanoscale materials. Analysis of a sample by powder XRD provides important information that is complementary to various microscopic and spectroscopic methods, such as phase identification, sample purity, crystallite size, and, in some cases, morphology.

  17. DeepXRD, a Deep Learning Model for Predicting XRD spectrum from

    The Altmetric Attention Score is a quantitative measure of the attention that a research article has received online. Clicking on the donut icon will load a page at with additional details about the score and the social media presence for the given article. ... Herein we propose a deep learning algorithm for predicting the XRD ...

  18. Improved cellulose X-ray diffraction analysis using Fourier series

    Abstract This paper addresses two fundamental issues in the peak deconvolution method of cellulose XRD data analysis: there is no standard model for amorphous cellulose and common peak functions such as Gauss, Lorentz and Voigt functions do not fit the amorphous profile well. It first examines the effects of ball milling on three types of cellulose and results show that ball milling transforms ...

  19. Minerals

    X-ray diffraction (XRD) is an important and widely used material characterization technique. With the recent development in material science technology and understanding, various new materials are being developed, which requires upgrading the existing analytical techniques such that emerging intricate problems can be solved.

  20. The Characterization of Papers by X-Ray Diffraction (XRD): Measurement

    The Characterization of Papers by X-Ray Diffraction (XRD): Measurement of Cellulose Crystallinity and Determination of Mineral Composition - ScienceDirect Article preview Abstract References (39) Cited by (53) Journal of the Forensic Science Society Volume 23, Issue 4, October 1983, Pages 313-321 Original Papers

  21. The Characterization of Papers by X-Ray Diffraction (XRD): Measurement

    J.F.S.S. ORIGINAL PAPERS The Characterization of Papers by X-Ray Diffraction (XRD): Measurement of Cellulose Crystallinity and Determination of Mineral Composition H. A. FONER Geological Survey of Israel, 30 Malkhe Tisrael St., Jerusalem 95 501, Israel and N. ADAN Criminal Identification Department, Israel Police Headquarters, Jerusalem, Israel ...

  22. Automated prediction of lattice parameters from X-ray powder

    A key step in the analysis of powder X-ray diffraction (PXRD) data is the accurate determination of unit-cell lattice parameters. This step often requires significant human intervention and is a bottleneck that hinders efforts towards automated analysis.

  23. PDF X-Ray Powder Diffraction Method Development and Validation for the

    X-Ray Powder Diffraction (XRD) is a technique used in forensic science to analyze various types of trace evidence, such as glass, paint and drugs. XRD has been shown to be a ... This paper will describe the method development for the use of this instrument to distinguish a counterfeit from an authentic product.

  24. Shared Research Facilities

    X-ray diffraction (XRD) is a versatile and powerful technique used for the characterization of crystalline materials, offering insights into their crystallographic structure, chemical composition, and physical properties at the atomic or molecular level. ... then at the Material Science Division at Argonne National Laboratory (ANL) until 2016 ...

  25. Influence of polymer on fouling and heat transfer on metal surfaces

    This paper investigated the characteristics of polymer-containing wastewater fouling and the heat flux reduction caused by fouling. Then, the mechanism of HPAM affecting the fouling deposition rate was discussed by XRD and conductivity analysis. ... steel. This paper provides new insights into the study of anti-fouling research in oilfield heat ...