Principal Component Analysis

Principal Component Analysis (PCA) is performed using the Singular Value Decomposition (SVD) algorithm. Note that constant variables are automatically excluded from the PC calculation. The significance of a PC can be evaluated by different criteria, e.g. the average eigenvalue criterion: a PC is considered significant if its eigenvalue is greater than the average eigenvalue, which equals 1 when PCA is performed on the correlation matrix (autoscaled data).
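
The calculation described above can be illustrated with a short Python sketch. It is not Dragon's own code: the function names pca_by_svd and significant_components are hypothetical, NumPy is assumed to be available, and the sample standard deviation (n - 1 denominator) is an assumption. The sketch autoscales the data, drops constant variables, obtains the eigenvalues from the singular values, and applies the average eigenvalue criterion with a threshold of 1.

```python
import numpy as np

def pca_by_svd(X):
    """PCA of the correlation matrix, computed by SVD of the autoscaled data.

    X has one row per molecule and one column per variable.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    std = X.std(axis=0, ddof=1)          # assumption: sample standard deviation
    keep = std > 0                       # constant variables are excluded
    Z = (X[:, keep] - X[:, keep].mean(axis=0)) / std[keep]   # autoscaling
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigenvalues = s**2 / (n - 1)         # eigenvalues of the correlation matrix
    scores = U * s                       # coordinates of the molecules on the PCs
    loadings = Vt.T                      # one column per PC
    return eigenvalues, scores, loadings, keep

def significant_components(eigenvalues, threshold=1.0):
    """Average eigenvalue criterion: the average eigenvalue of a correlation matrix is 1."""
    return np.asarray(eigenvalues) > threshold
```

Because the eigenvalues of a correlation matrix sum to the number of retained variables, their average is 1, which is why the threshold is fixed rather than recomputed from the returned values.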

Note that PCs can be calculated only if at least 5 molecules and 2 variables have been loaded. Moreover, the maximum number of variables for PCA is fixed at 1000.

The labels for PCs are automatically generated by the program and have the form PCxx, where xx is the rank of the principal component in order of significance. For example, PC01 is the first principal component of the selected variables.
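
As an illustration of this naming scheme, labels of this form could be generated as follows. The function name pc_labels is hypothetical, and the two-digit zero padding is inferred from the PC01 example rather than taken from the program itself.

```python
def pc_labels(n_components):
    """Return labels PC01, PC02, ... for the first n_components components."""
    # Assumption: the rank is zero-padded to two digits, as in "PC01".
    return [f"PC{rank:02d}" for rank in range(1, n_components + 1)]
```

For example, pc_labels(3) returns the labels PC01, PC02 and PC03.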

The Principal Component Analysis window can be opened by clicking 'Principal Component Analysis' in the 'Analysis' menu or by clicking the 'PCA' icon in the Dragon main window. This window enables the user to perform PCA on the calculated molecular descriptors.

selection of descriptors for calculating PCA
analysis of PCA results

Note that:

PCA is performed by the Singular Value Decomposition (SVD) algorithm on the correlation matrix; this corresponds to autoscaling the data;
molecular descriptors with standard deviation lower than 0.0001 are automatically excluded from the PCA calculation;
the maximum number of retained principal components is 20;
PCA is calculated only if at least 5 molecules have been loaded and only for molecules containing hydrogen atoms;
PCA can be calculated on a maximum of 1000 descriptors;
when calculating PCA on data with missing values, Dragon substitutes the missing values with the mean values of the corresponding descriptors.
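
The numerical rules listed above can be combined into one Python sketch. This is not Dragon's implementation: the function name run_pca and the constant names are hypothetical, and missing values are assumed to be encoded as NaN. The order of the steps (mean substitution before the standard deviation filter, the 1000-descriptor limit checked on the input) and the use of the sample standard deviation are also assumptions. The hydrogen atom requirement concerns molecular structures and is not represented here.

```python
import numpy as np

MIN_MOLECULES = 5        # minimum number of molecules
MAX_DESCRIPTORS = 1000   # maximum number of descriptors
MAX_COMPONENTS = 20      # maximum number of retained principal components
MIN_STD = 1e-4           # descriptors with a lower standard deviation are excluded

def run_pca(X):
    """PCA on a molecules x descriptors matrix; missing values are NaN."""
    X = np.asarray(X, dtype=float)
    n_molecules, n_descriptors = X.shape
    if n_molecules < MIN_MOLECULES:
        raise ValueError("at least 5 molecules are required")
    if n_descriptors > MAX_DESCRIPTORS:
        raise ValueError("at most 1000 descriptors are allowed")

    # Substitute each missing value with the mean of its descriptor.
    column_means = np.nanmean(X, axis=0)
    X = np.where(np.isnan(X), column_means, X)

    # Exclude descriptors that are (almost) constant.
    std = X.std(axis=0, ddof=1)          # assumption: sample standard deviation
    keep = std >= MIN_STD
    if not keep.any():
        raise ValueError("no descriptor has enough variance")

    # Autoscale, so that the SVD works on the correlation matrix.
    Z = (X[:, keep] - X[:, keep].mean(axis=0)) / std[keep]
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)

    # Retain at most MAX_COMPONENTS principal components.
    k = min(MAX_COMPONENTS, s.size)
    eigenvalues = s[:k] ** 2 / (n_molecules - 1)
    scores = U[:, :k] * s[:k]
    loadings = Vt[:k].T
    return eigenvalues, scores, loadings, keep
```

A descriptor whose values are all missing would have an undefined mean; the sketch does not handle that case.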