Re-Alignment of Sutherland data sets

A re-alignment test was performed on the aligned ligands of the Sutherland data set
collection (1). For that purpose, we applied random translations and rotations to the molecular
structures of all eight data sets. Subsequently, the molecules of each data set were re-aligned
with COSMOsim3D, using the same superposition templates as Sutherland.
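The perturbation step can be sketched in Python as follows. This is a minimal illustration, not the protocol used in the study: it assumes each molecule is a plain list of (x, y, z) atom coordinates in Å, that the random rotation is applied about the molecule's centroid, and that each translation component is drawn uniformly from a hypothetical range `max_shift`. All function names are hypothetical. A uniformly distributed rotation is obtained from a random unit quaternion (Shoemake's method).

```python
import math
import random

Point = tuple[float, float, float]


def random_rotation_matrix(rng: random.Random) -> list[list[float]]:
    """Uniformly random 3x3 rotation matrix from a random unit quaternion (Shoemake's method)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    w = math.sqrt(1.0 - u1) * math.sin(2.0 * math.pi * u2)
    x = math.sqrt(1.0 - u1) * math.cos(2.0 * math.pi * u2)
    y = math.sqrt(u1) * math.sin(2.0 * math.pi * u3)
    z = math.sqrt(u1) * math.cos(2.0 * math.pi * u3)
    # Standard conversion of the unit quaternion (w, x, y, z) to a rotation matrix.
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def random_rigid_perturbation(coords: list[Point], max_shift: float, rng: random.Random) -> list[Point]:
    """Rotate a molecule randomly about its centroid, then shift it by a random vector.

    max_shift (hypothetical parameter, in Å) bounds each translation component.
    """
    n = len(coords)
    c = [sum(p[i] for p in coords) / n for i in range(3)]
    r = random_rotation_matrix(rng)
    t = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
    moved = []
    for p in coords:
        d = [p[i] - c[i] for i in range(3)]  # position relative to the centroid
        moved.append(tuple(sum(r[i][j] * d[j] for j in range(3)) + c[i] + t[i] for i in range(3)))
    return moved


# Example: perturb a hypothetical three-atom fragment (coordinates in Å).
rng = random.Random(42)
fragment = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3)]
perturbed = random_rigid_perturbation(fragment, max_shift=5.0, rng=rng)
```

Because the transformation is rigid, all interatomic distances are preserved; only the position and orientation of each molecule change, so the aligner has to recover the superposition without any help from the starting pose.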
While the alignments reported by Sutherland were compiled in a supervised fashion, the
COSMOsim3D alignment was fully automated. To assess its quality, the COSMOsim3D alignment
was used to build 3D-QSAR models with COSMOsar3D. The overall performance of the 3D-QSAR
models based on the COSMOsim3D alignment is almost identical to that of the models based on
the supervised alignment: the overall standard deviation of the test set residuals increases by
0.02, while the other three performance indicators differ by 0.01 from those obtained with the
original alignment. Thus, there is no significant loss of performance when the fully automated
COSMOsim3D alignment is used instead of the supervised alignment.
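The comparison of test set residuals can be expressed in a few lines of Python. This sketch assumes the indicator is the sample standard deviation of the residuals (observed minus predicted activity); the study may define it differently (for example, as a root-mean-square error), and the function names are hypothetical.

```python
from statistics import stdev


def residual_sd(observed: list[float], predicted: list[float]) -> float:
    """Sample standard deviation of the residuals (observed - predicted).

    Assumed definition of the indicator; needs at least two test compounds.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return stdev([o - p for o, p in zip(observed, predicted)])


def sd_change(observed: list[float], pred_reference: list[float], pred_new: list[float]) -> float:
    """Change in residual SD when switching from the reference alignment to a new one.

    A positive value means a larger scatter of test set residuals with the new alignment.
    """
    return residual_sd(observed, pred_new) - residual_sd(observed, pred_reference)
```

Applied to the test set predictions of models built on the supervised alignment (reference) and on the COSMOsim3D alignment (new), `sd_change` yields the kind of difference reported above for the standard deviation of the test set residuals.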
(1) Sutherland, J. J.; O’Brien, L. A.; Weaver, D. F. J. Med. Chem. 2004, 47, 5541-5554.
[Figure: original Sutherland alignment vs. COSMOsim3D alignment, one panel per data set]
ACE: 114 compounds (76 training, 38 test)
AChE: 111 compounds (74 training, 37 test)
BZR: 163 compounds (98 training, 49 test, 16 inactive)
COX2: 322 compounds (188 training, 94 test, 40 inactive)
Original Sutherland alignment compared to the unsupervised COSMOsim3D alignment for the ACE, AChE,
BZR and COX2 data sets. Alignment was done with a grid size of 1 Å and 200 random starts. Molecules
were rendered with PyMOL.
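The idea of "200 random starts" can be illustrated with a generic multi-start rigid-body search, sketched below in Python. This is not COSMOsim3D's algorithm: its similarity function is not reproduced, and a sum of Gaussian atom-pair overlaps stands in for it; the 1 Å grid is not modeled; and the greedy local search, the function names, and the parameters `sigma`, `steps` and `step` are all hypothetical. Each start draws a uniformly random orientation, superimposes the centroids of molecule and template, refines the pose locally, and the best-scoring pose over all starts is kept.

```python
import math
import random

# Hypothetical sketch: generic multi-start alignment, NOT COSMOsim3D's scoring or optimizer.
Point = tuple[float, float, float]
Quat = tuple[float, float, float, float]
Pose = tuple[Quat, Point]  # (unit quaternion w, x, y, z; translation in Å)


def normalize(q: Quat) -> Quat:
    n = math.sqrt(sum(c * c for c in q))
    return (q[0] / n, q[1] / n, q[2] / n, q[3] / n)


def quat_to_matrix(q: Quat) -> list[list[float]]:
    w, x, y, z = normalize(q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def centroid(coords: list[Point]) -> Point:
    n = len(coords)
    return (sum(p[0] for p in coords) / n, sum(p[1] for p in coords) / n, sum(p[2] for p in coords) / n)


def transform(coords: list[Point], pose: Pose) -> list[Point]:
    """Rotate about the molecule's centroid, then translate by the pose vector."""
    q, t = pose
    r = quat_to_matrix(q)
    c = centroid(coords)
    out = []
    for p in coords:
        d = (p[0] - c[0], p[1] - c[1], p[2] - c[2])
        out.append(tuple(sum(r[i][j] * d[j] for j in range(3)) + c[i] + t[i] for i in range(3)))
    return out


def overlap(a: list[Point], b: list[Point], sigma: float = 1.0) -> float:
    """Stand-in similarity score: sum of Gaussian kernels over all atom pairs."""
    return sum(math.exp(-math.dist(p, q) ** 2 / (2 * sigma**2)) for p in a for q in b)


def local_search(mol, template, pose, rng, steps=100, step=0.1):
    """Greedy stochastic hill climbing: keep a random pose change only if the score improves."""
    best, best_score = pose, overlap(transform(mol, pose), template)
    for _ in range(steps):
        q, t = best
        cand = (normalize(tuple(c + rng.gauss(0.0, step) for c in q)),
                tuple(c + rng.gauss(0.0, step) for c in t))
        score = overlap(transform(mol, cand), template)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score


def multi_start_align(mol, template, n_starts=200, seed=0):
    """Return (aligned coordinates, score) of the best pose found over n_starts random starts."""
    if n_starts < 1:
        raise ValueError("n_starts must be at least 1")
    rng = random.Random(seed)
    mc, tc = centroid(mol), centroid(template)
    shift = (tc[0] - mc[0], tc[1] - mc[1], tc[2] - mc[2])  # superimpose centroids at each start
    best_pose, best_score = None, -math.inf
    for _ in range(n_starts):
        # A normalized 4D Gaussian vector is a uniformly random unit quaternion (rotation).
        q0 = normalize(tuple(rng.gauss(0.0, 1.0) for _ in range(4)))
        pose, score = local_search(mol, template, (q0, shift), rng)
        if score > best_score:
            best_pose, best_score = pose, score
    return transform(mol, best_pose), best_score
```

For example, `aligned, score = multi_start_align(ligand_coords, template_coords, n_starts=200)` would return the best pose found over 200 starts, where `ligand_coords` and `template_coords` are hypothetical coordinate lists. More starts lower the risk that the local search ends in a poor local optimum, at the cost of proportionally more score evaluations.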
[Figure: original Sutherland alignment vs. COSMOsim3D alignment, one panel per data set]
DHFR: 397 compounds (237 training, 124 test, 36 inactive)
GPB: 66 compounds (44 training, 22 test)
THERM: 76 compounds (51 training, 25 test)
THR: 88 compounds (59 training, 29 test)
Original Sutherland alignment compared to the unsupervised COSMOsim3D alignment for the DHFR, GPB,
THERM and THR data sets. Alignment was done with a grid size of 1 Å and 200 random starts. Molecules
were rendered with PyMOL.