Export options

The 'Options' tab of the 'Export descriptors' window lets the user set several options for the output file or files.

 

[Screenshot: the 'Options' tab of the 'Export descriptors' window]

 

File options

If the user decides to generate several output files, one for each selected descriptor block or sub-block, the program generates the output file names automatically from a format specified by the user. This format can combine free text with tags placed at any position within the file name. The tags identify the corresponding block or sub-block in the name of each output file.

 

Tags for file name format for blocks

%b: encodes the descriptor block ID;
%n: encodes the descriptor block name.

 

Tags for file name format for sub-blocks

%b: encodes the descriptor block ID;
%s: encodes the descriptor sub-block ID;
%n: encodes the descriptor block name;
%m: encodes the descriptor sub-block name.

 

The Dragon default formats are:

%b - %n.txt (for blocks)
%b-%s - %n - %m.txt (for sub-blocks)

 

With the default formats, Dragon generates output file names such as 3-2 - Topological indices - Distance-based indices.txt, a file that collects only the selected molecular descriptors of sub-block no. 2 (Distance-based indices) of block no. 3 (Topological indices).
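The tag substitution described above can be sketched in a few lines of Python. This is only an illustration of how the tags map to file names, not Dragon's own code; the function name expand_name_format and its parameters are hypothetical.

```python
def expand_name_format(fmt, block_id, block_name, sub_id=None, sub_name=None):
    """Expand %b, %n, %s and %m tags in an output file name format.

    Hypothetical helper: a sketch of the substitution, not Dragon's code.
    """
    values = {"b": str(block_id), "n": str(block_name)}
    if sub_id is not None:
        # Sub-block tags are only available when a sub-block is given.
        values["s"] = str(sub_id)
        values["m"] = str(sub_name)

    out = []
    i = 0
    # Single left-to-right pass, so text inserted for one tag is never
    # re-scanned for further tags.
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt) and fmt[i + 1] in values:
            out.append(values[fmt[i + 1]])
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)


# Default formats quoted in this section
print(expand_name_format("%b - %n.txt", 3, "Topological indices"))
print(expand_name_format("%b-%s - %n - %m.txt", 3, "Topological indices",
                         2, "Distance-based indices"))
```

Unknown tags, and sub-block tags used in a block-level format, are left unchanged in this sketch; how Dragon treats them is not stated here.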

 

Note that if no tags have been specified, Dragon automatically saves the files according to its default name formats. Moreover, if the option 'One separate file for each block (sub-block)' has been selected and the specified name includes no tags for block and sub-block recognition, only one file will be created, containing the descriptors of the last selected block (sub-block).
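Before exporting, a custom format can be checked for tags that distinguish one output file from another. The check below is a sketch under an assumption that is stricter than the text above: for sub-blocks it requires both a block tag and a sub-block tag, since sub-block IDs may repeat across blocks. The function name name_format_is_distinct is hypothetical.

```python
def name_format_is_distinct(fmt, per_sub_block=False):
    """Return True if the format contains tags that vary between output files.

    Hypothetical pre-export check, not part of Dragon.
    """
    has_block_tag = "%b" in fmt or "%n" in fmt
    if not per_sub_block:
        return has_block_tag
    # Assumption: a sub-block is only identified unambiguously when both
    # its block and the sub-block itself appear in the name.
    has_sub_tag = "%s" in fmt or "%m" in fmt
    return has_block_tag and has_sub_tag


for fmt in ("%b - %n.txt", "descriptors.txt"):
    if not name_format_is_distinct(fmt):
        print(f"'{fmt}' gives every block the same file name")
```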

 

The checkbox 'Save only data matrix' exports only the numerical results, that is, the values of the molecular descriptors without molecule and descriptor labels. If this checkbox is enabled, the descriptor labels can also be exported to a separate text file by checking the checkbox 'Save labels on separate file'.

 

Exclusion rules

Non-informative descriptors can be excluded from the output file. To exclude descriptors from the saving procedure, the user can select one or more of the following options:

Exclude descriptors with constant values: if checked, descriptors with all values equal will not be saved;
Exclude descriptors with constant and near-constant variables: if checked, descriptors with all values equal and descriptors with only one value different from the remaining ones will not be saved;
Exclude descriptors with standard deviation less than: if checked, descriptors with standard deviation less than a threshold (default 0.0001) will not be saved;
Exclude descriptors with at least one missing value: if checked, descriptors that were not calculated for at least one molecule will not be saved;
Exclude descriptors with all missing values: if checked, descriptors that could not be calculated for any of the molecules will not be saved;
Exclude descriptors with (abs) pair correlation larger than or equal to: if checked, one of the two descriptors in each pair with an absolute correlation coefficient equal to or larger than the selected threshold value is excluded. The threshold can be input by the user (default 0.95). For each pair of correlated descriptors, the one showing the largest pair correlation with all the other descriptors is automatically selected for exclusion, in an iterative way.
Apply exclusion options also to external variables: if checked, all the selected exclusion rules will also be applied to external variables. This option is enabled only if external variables have been previously loaded.

 

The list of excluded variables is automatically stored in the 'Log' tab of the 'Status window'.

 

Note that:

When saving molecular descriptors, consider that constant and near-constant descriptors carry little or no information and are therefore not useful for QSAR or similarity/diversity analysis. Because the pair correlation criterion removes variables with redundant information, it may be useful when the descriptor set must be reduced. The pair correlation criterion can, however, be time-consuming.
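One possible reading of the pair correlation criterion is sketched below in Python. It is an interpretation, not Dragon's documented algorithm. It assumes that the most correlated pair is handled first, that "largest pair correlation with all the other descriptors" means the largest mean absolute Pearson correlation with the remaining descriptors, and that columns contain no missing values. The function names are hypothetical. Computing all pairwise correlations grows quadratically with the number of descriptors, which is consistent with the remark on computing time.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for constant columns; treated as 0 in this sketch
    return sxy / math.sqrt(sxx * syy)


def exclude_correlated(table, threshold=0.95):
    """Iteratively drop one descriptor from each highly correlated pair.

    table: dict of descriptor name -> list of values (no missing values).
    Returns (kept, excluded) as lists of names.
    """
    names = list(table)
    corr = {(a, b): abs(pearson(table[a], table[b]))
            for a in names for b in names if a != b}
    kept, excluded = list(names), []

    def mean_corr(d):
        others = [o for o in kept if o != d]
        return sum(corr[(d, o)] for o in others) / len(others)

    while True:
        pairs = [(a, b) for i, a in enumerate(kept) for b in kept[i + 1:]
                 if corr[(a, b)] >= threshold]
        if not pairs:
            break
        # Assumption: handle the most correlated remaining pair first.
        a, b = max(pairs, key=lambda p: corr[p])
        drop = a if mean_corr(a) >= mean_corr(b) else b
        kept.remove(drop)
        excluded.append(drop)
    return kept, excluded


table = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10.5],
    "z": [5, 1, 4, 2, 3],
}
kept, excluded = exclude_correlated(table, threshold=0.95)
print("kept:", kept, "excluded:", excluded)
```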

 

Missing values

The code used to represent missing values in the output files must be defined in the 'General' tab of the 'General settings' menu, which can be accessed by clicking 'Settings' in the main menu bar.