Sunday, December 3, 2017

TS_conf_search: Conformer search for transition states

Here's som prototype code for performing a conformer search for a transition state.  The idea is that you have found a TS for small model of the system and now you want to attach floppy ligands, generate different conformers for these ligands, and MMFF minimise them while keeping the core (model) frozen.  Each conformer is then used as an initial guess for a TS search, perhaps preceded by a constrained minimisation using the QM method.

Usage
To use the program type "python TS_conf_search ts.sdf small_model.sdf  5".

ts.sdf is an sdf file that contains a guess structure of the TS you want to do a conformational search for.  The coordinates of the small model-part of the structures does not have to match that of the small model and it doesn't even have to be very TS-like but it has reflect the bonding in the small model. Alternatively you can provide a SMILES string of the molecule.

small_model.sdf is an sdf file with with the optimized coordinates for the small model TS.  You can use any name for the file as long as the extension is sdf. TS_conf will only use the coordinates of the non-hydrogen atoms, unless the word "keepHs" is part of the file name.  If you use keepHs then the hydrogen atoms have to have corresponding  hydrogen atoms in the big model (see examples below)

5 is the number conformers you want to generate and you can use any integer you want here.  The default is 20.

What TS_conf_search does
TS_conf_search uses the ConstrainedEmbed function implemented in RDKit, which generates a conformer of a molecule, finds the atoms in the molecule corresponding to the small model, and then performs a FF minimization in which the Cartesian coordinates of these atoms are constrained to match the Cartesian coordinates of the small model.  If the small model contains two separate molecules then intramolecular energy terms are ignored.

sdf files
sdf files contain the Cartesian coordinates of the atoms but also explicitly defines the bond orders (single, double, etc) and any formal charges.  It is important the the bond orders in the ts.sdf and small_model.sdf files match. sdf files are basically the same as mol files, but if your program generates mol files be sure to change the file extension to sdf.  You can use OpenBabel to generate sdf files from nearly any other format (however, see next section).



Examples (and when to use keepHs)
The GitHub repository contains two Diels-Alder examples, where the small models are the PM3 optimized transition states for R = H, and the big model is R = CH2CH2CH2CH3.

I created the small model sdf files by converting the GAMESS output files to sdf using Open Babel.  However, the bond order didn't correspond to that shown in the picture above, so I manually edited the bond orders in the sdf file and remved the "RAD line" that OpenBabel created.

For Diels_Alder1 we type 'python TS_conf_search.py "Diels_Alder1;ClC=C.C=C(CCCC)C=C" Diels_Alder1_small.sdf 5' i.e. we build the TS guess from the SMILES string of the molecule.

For Diels_Alder2 we type "python TS_conf_search.py Diels_Alder2.sdf Diels_Alder2_small_keepHs.sdf 5". Here I got Diels_Alder2.sdf by loading the small model in to Avogadro, adding the CH2CH2CH2CH3 chain in some arbitrary conformation, and minimising the  structure.

In the second example, I use keepHs because the dienophile (ethene) only has two heavy atoms, which is not enough to define the orientation of the plane of the molecule relative to the diene. Because I use keepHs I have removed the hydrogen corresponding to the R group, so that all the hydrogens in the small model can be matched to hydrogens in the large model.

While Diels-Alder2 provides one example where keepHs is useful.  Another is where hydrogens atoms are an integral part of the mechanism, such as in hydrogen or proton transfer TSs.

Results


I used each of the geometries as a starting guess for a TS search using GAMESS where I calculated the Hessian to start with (hess=calc) and after every fifth step (ihrep=5).

In the first example none of the five runs found the correct TS, while for the latter example all 5 worked. When I re-ran the first example with python TS_conf_search.py "Diels_Alder1;ClC=C.C=C(CCCC)C=C" Diels_Alder1_small_keepHs.sdf 5 it worked fine.

Not using keepHs has the advantage that one doesn't have to make different small model sdf files when different hydrogens are replaced by substituents. But in this case an extra constrained minimisation step appears to be needed to get the hydrogen atoms placed correctly.



This work is licensed under a Creative Commons Attribution 3.0 Unported License.

No comments: