Friday, March 22, 2013

Manuscript review: In Silico Screening of 393 Mutants Facilitates Enzyme Engineering of Amidase Activity in CalB

Martin Hediger's paper was submitted to PLoS ONE on February 20th. Three reviews came back yesterday and can be found below.

General comment: Given limited computational resources, one can either approximately evaluate many mutants or rigorously evaluate a few. Many studies have already done the latter, so we chose the former. However, Reviewers 1 and 2 want us to do both.

Here are my immediate reactions to specific issues:

Reviewer #1

1. "-) enzymatic efficiency is characterized by kcat/KM; authors only consider kcat"

From our previous paper: "... like in most computational studies of enzyme catalysis, substrate binding-affinity is not considered."

2. "-) lots of arbitrary manual input is required, e.g. certain atoms are fixed upon observing large motions, certain barriers are discarded due to perceived shape of barrier, etc. These observations are clearly a sign that the procedure is not robust."

See point 5.

3. "-) authors do not show that calculations indeed identified transition states. A vibrational analysis should be performed, and it should be shown that there's only 1 negative eigenvalue."

Adiabatic mapping, a common tool in QM/MM studies of enzyme catalysis, only produces estimates of the TS structure. The highest point on the reaction profile is not a stationary point, so a vibrational analysis is not valid.

4. "-) no error analysis is presented. Given that PM6 produces large errors in calculated barrier heights, the authors should ensure that the conclusions are not due to computational artifacts"

We compare to experiment, the gold standard in science.

5. "-) no longe range effects are considered, no dynamics are considered. QM/MM literature shows that these are essential for obtaining proper barriers."

From our previous paper: "In order to make the method computationally feasible, relatively approximate treatments of the wave function, structural model, dynamics and reaction path are used. Given this and the automated setup of calculations, some inaccurate results will be unavoidable. However, the intend of the method is similar to experimental high through-put screens of enzyme activity where, for example, negative results may result from issues unrelated to the intrinsic activity of the enzyme such as imperfections in the activity assay, low expression yield, protein aggregation, etc. Just like its experimental counterpart our technique is intended to identify potentially interesting mutants for further study."


6. "The introduction is incredibly short (1 paragraph), and clearly not sufficient to summarize current efforts in the field (notably high level QM and QM/MM approaches). The authors could also use this space to contrast their approach (published in [1]) with other methods."

I don't think there are other computational high throughput methods like ours.


Reviewer #2
1. "The choice of this initial set is not clear to me from the manuscript (e.g. why only P38H and H neutral? Why not P38F, why A282G and not A281G, although both are in a position within the alcohol binding pocket to interact with the aromatic amide moiety of the potential substrate?)"

2. "I therefore miss the evaluation of e.g. position T40, S47, N106, T138, V190, L277, A281 which are located in direct contact with the substrate (<5A from the inhibitor HEE in 1LBS). For a systematic in sillico screening study these residues need to be included."

3. "Most puzzling to me is the choice of the combinatorial set L, consisting of six residues (G38, T103, W104, A141, I189, L278) which is assumed to contribute strongest to increased activity. How can this assumption rationalized, to reproduce how the authors selected this set of mutations. Is the described computational method used to predict this positions and respective substitutions?"


4. "Also the selection criteria of the experimentally investigated benchmarking set S is not clearly described in the manuscript. Here more details are needed to follow the authors strategy and to be able to transfer the described strategy to other enzymes."

Points 1-4: In this study we have automated the construction of mutants, not the selection of mutants.  The selection of single mutants is still done heuristically as is the case for nearly all rational enzyme design and this step must still be done by experts for each new protein.  However, once the selection is done our method can be used to efficiently screen these mutants and construct hundreds of combination mutants.


The selection criteria are described on page 4: "The point mutations are selected based on different design principles. These are either introduction of structural rearrangements in the active site to change the binding site properties of the active site (residues P38, G39, G41, T42, T103) [1], introduction of space to accommodate the substrate (W104, L278, A282, I285, V286), introduction of dipolar interactions between the enzyme and the substrate (A132, A141, I189) [33] or reduction of polarity in the active site (D223)."


5. "The accuracy of the presented in sillico screening method should be discussed more appropriately, given the fact that by plotting the data in Table 1 a correlation coeficciant (r2) of only 0.0015 between experimental kcat and activity is determined. Such a plot should be included in the paper, since it is crucial for the interpretation."

As we mention on page 5: "Given the approximations introduced to make the method sufficiently efficient, it is noted that the intent of the method is not a quantitative ranking of the reaction barriers, but to identify promising mutants for, and to eliminate non-promising mutants from, experimental consideration. Therefore only qualitative changes in overall activity are considered."

6. "For the use of semiempirical methods in the prediction of protein ligand binding energies it was recently reported, that the proper inclusion of solvation and dispersion correction can indeed increase accuracy (e.g. ...). This issue should be considered in the discussion, since the data is rather qualitative and considering the inherent error of the used method the overall activity could not correctly predicted for Dataset S."

See response to point 5.

7. "The assigned barrier cutoff of 12.5 kcal/mol is not well rationalized to hold as a general criteria for a potentially improving mutant"

As we note on page 5: "We note that defining the cutoff is done purely for a post hoc comparison of experimental and computed data. When using the computed barriers to identify promising experimental mutants, one simply chooses the N mutants with the lowest barriers, where N is the number of mutants affordable to do experimentally (e.g. 20 in the discussion of set L)."
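The selection rule in that quote is easy to state in code. Here is a minimal Python sketch of "choose the N mutants with the lowest barriers". The function name, the mutant labels and the barrier values are all made up for illustration and are not results from the paper.

```python
def pick_lowest_barriers(barriers, n):
    """Return the n (mutant, barrier) pairs with the lowest computed barriers."""
    ranked = sorted(barriers.items(), key=lambda item: item[1])
    return ranked[:n]


# Placeholder barriers in kcal/mol, invented purely for illustration.
computed_barriers = {
    "mutant_A": 13.1,
    "mutant_B": 9.4,
    "mutant_C": 17.8,
    "mutant_D": 11.0,
    "mutant_E": 10.2,
}

# N is whatever number of mutants one can afford to make experimentally.
for name, barrier in pick_lowest_barriers(computed_barriers, n=3):
    print(f"{name}: {barrier:.1f} kcal/mol")
```

Note that no cutoff value appears anywhere in this selection: only the ranking and the experimental budget N matter.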

8. "Without at least some of them, the predictive power of the presented approach cannot be evaluated clearly, given the non-correlation of the data in Set S. The detailed analysis of Set L in terms of barrier heights is not insightful at the present state of the study, since the experimental data is missing and most barriers are in the grey zone between 10 and 15 kcal/mol which could be either active or non-active according to the results in Set S."

The analysis highlights the fact that single mutants with high barriers should not necessarily be excluded when making multiple-mutants. Since testing all possible combinations of single mutants is computationally intractable this is an important design consideration.

9. "Overall the manuscript would benefit from more discussions about the accuracy and the aim of the method compared with current state of the art methods to predict enzyme activity and conformational space of protein mutants."

See response to point 5.

10. "For the use as in sillico screening method a more complete calculation than only the presented limited set of single mutants is needed." 

See response to points 1-4.

11. "Additional experimental determination of the activities of at least some of the predicted mutants in Set L would enhance the impact of the paper."

Additional experiments are not practically possible at present, and "impact" is not a review consideration for PLoS ONE.


Reviewer #3

1. "A minor point is that some of the references might be lacking information (e.g. refs 27 and 28 do not look complete as written)." 

Check that

2. "Also, the authors might want to comment a little more on how amidase activity is achieved. There is wide debate about amidase versus esterase activity, and, while this is not the focus here, comparison with enzymes such as fatty acid amide hydrolase, which has been the subject of much modelling work, could be useful." 

Have to think about that.


----
PONE-D-13-07851
In Silico Screening of 393 Mutants Facilitates Enzyme Engineering of Amidase Activity in CalB
PLOS ONE

Dear Dr. Jensen,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit, but is not suitable for publication as it currently stands. Therefore, my decision is "Major Revision." 

Two of the three reviewers have raised strong concerns on the quality of the paper and in particular about the method adopted that should be carefully and extensively addressed before the manuscript can be considered for publication.

We encourage you to submit your revision within forty-five days of the date of this decision. 

When your files are ready, please submit your revision by logging on to http://pone.edmgr.com/ and following the Submissions Needing Revision link. Do not submit a revised manuscript as a new submission. Before uploading, you should proofread your manuscript very closely for mistakes and grammatical errors. Should your manuscript be accepted for publication, you may not have another chance to make corrections as we do not offer pre-publication proofs.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. 

Please also include a rebuttal letter that responds to each point brought up by the academic editor and reviewer(s). This letter should be uploaded as a Response to Reviewers file.

In addition, please provide a marked-up copy of the changes made from the previous article file as a Manuscript with Tracked Changes file. This can be done using 'track changes' in programs such as MS Word and/or highlighting any changes in the new document. 

If you choose not to submit a revision, please notify us. 

Yours sincerely, 

xxx
Academic Editor
PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. We note from the title page of your manuscript that one or more of the authors are employed by a commercial company (Novozymes A/S). 

Please respond in the cover letter to declare the affiliation(s) to this company in the Competing Interests section of the online manuscript form, along with any other relevant declarations relating to employment, consultancy, patents, products in development or marketed products etc. If true, you should also confirm in the competing interests section that this does not alter your adherence to all the PLOS ONE policies on sharing data and materials, as detailed online in our guide for authors http://www.PLOSone.org/static/editorial.action#competing by including the following statement: "This does not alter our adherence to all the PLOS ONE policies on sharing data and materials." Please note that we cannot proceed with consideration of your article until this has been declared. 

We can make any changes on your behalf.

Please be assured that it is the standard PLOS ONE policy to ask authors to declare any potential competing interests, for the purposes of transparency. This declaration does not affect the review process. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://www.PLOSone.org/static/editorial.action#competing

2. We note you include figure legends for figures 1-5. However, you have labelled two of your figures as 'Figure 4'. Please update one of these to 'Figure 5', so that it matches your figure legends.

3. Could you please remove Supplementary Figures 1, 2 and 3 from your manuscript file. You have correctly uploaded them as separate files, with the file type of Supporting Information. The supplementary figure legend should remain in the main manuscript file.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

Reviewer #2: Partly

Reviewer #3: Yes

Please explain (optional).

Reviewer #1: -) enzymatic efficiency is characterized by kcat/KM; authors only consider kcat
-) lots of arbitrary manual input is required, e.g. certain atoms are fixed upon observing large motions, certain barriers are discarded due to perceived shape of barrier, etc. These observations are clearly a sign that the procedure is not robust.
-) authors do not show that calculations indeed identified transition states. A vibrational analysis should be performed, and it should be shown that there's only 1 negative eigenvalue.
-) no error analysis is presented. Given that PM6 produces large errors in calculated barrier heights, the authors should ensure that the conclusions are not due to computational artifacts
-) no longe range effects are considered, no dynamics are considered. QM/MM literature shows that these are essential for obtaining proper barriers.

Reviewer #2: In the manuscript entitled In silico screening of 393 mutants facilitates enzyme engineering of amidase activity in CalB by Martin R. Hediger, Luca De Vico, Allan Svendsen, Werner Besenmatter and Jan H. Jensen the authors present a combined experimental and computational study to validate their previously reported semiempirical Computational Methodology to Screen Activities of Enzyme Variants (arXiv:1203.2950v3 [physics.chem-ph] 5 Oct 2012).
In the manuscript the authors present a fast and efficient computational strategy to generate a promising CalB-variant for enhanced amide hydrolysis, a promiscuous activity of CalB where the WT shows only low activity, with computational times less than 12hrs per mutant.
The main hypothesis of this and the presceeding paper, is that the enzymatic activity is represented by kcat under substrate saturating conditions and a minimized local cluster model within 8A of the catalytic Ser 105 in vacuum is representative for calculating the barrier at PM6//MOZYME level for amide hydrolysis. As the most critical part of such a cluster model is the boundary treatment towards the solvent, which was not represented as continuum model (COSMO) due to computational limits in Mopac2009, some residues (S50, P133, Q156, L277, P280) at the surface have to be fixed during the calculations to obtain stable local minima.
The choice of point mutations (given in table 2) to be investigated was restricted to 14 residues in potential Van der Waals contact or adjecent to this substrate binding residues, which is plausible given their main hypothesis. To limit the combinatorial complexity they investigated only a single exchange or a restricted set for residues W104, A141 and I189. The choice of this initial set is not clear to me from the manuscript (e.g. why only P38H and H neutral? Why not P38F, why A282G and not A281G, although both are in a position within the alcohol binding pocket to interact with the aromatic amide moiety of the potential substrate?) and with the presented methodology it should be possible to screen the first and second shell more thouroughly for interesting point mutants. If I list all residues within 7A of the inhibitor in the underlying X-ray structure 1lbs, at least 31 residues of potential interest are found. I therefore miss the evaluation of e.g. position T40, S47, N106, T138, V190, L277, A281 which are located in direct contact with the substrate (<5A from the inhibitor HEE in 1LBS). For a systematic in sillico screening study these residues need to be included.
Most puzzling to me is the choice of the combinatorial set L, consisting of six residues (G38, T103, W104, A141, I189, L278) which is assumed to contribute strongest to increased activity. How can this assumption rationalized, to reproduce how the authors selected this set of mutations. Is the described computational method used to predict this positions and respective substitutions? 
Having selected a focused mutation subset, the key finding is that within this set of 424 hypothetical variants with up to fourfold mutants the reaction barriers can be semi-automatically derived, performing the mutation using PyMOL Mutagenesis Wizard. By using this computational prescreening with their previously developed PM6//MOZYME method and identified 278 mutants showing regular barriers lower than 19 kcal/mol. From this set the 20 mutants with lowest barriers are selected for experimental verification and the results are shown in table 5. Up to now the strategy is very convincing, although the selection of the combinatorial set L is not conclusive and reproducible described. Also the selection criteria of the experimentally investigated benchmarking set S is not clearly described in the manuscript. Here more details are needed to follow the authors strategy and to be able to transfer the described strategy to other enzymes.
The key results for set S given in table 1 are interpreted rather optimistic. In the Set S only 3 mutants indeed show a calculated barrier comparable to the WT (I would not call 7.3 lower than WT (7.5), compared to the experimental barrier of about 19kcal/mol (arXiv:1203.2950v3 [physics.chem-ph] 5 Oct 2012) without given a conclusive proof that the computational precision of PM6//MOZYME is within chemical accuracy (<1kcal/mol) as it can be reached using state of the art full QM/MM treatment up to CCSDT-level of theory. (High-accuracy computation of reaction barriers in enzymes Claeyssens, Frederik; Harvey, Jeremy N.; Manby, Frederick R.; et al. ANGEWANDTE CHEMIE-INTERNATIONAL EDITION Volume: 45 Issue: 41 Pages: 6856-6859 DOI: 10.1002/anie.200602711 2006 ) The accuracy of the presented in sillico screening method should be discussed more appropriately, given the fact that by plotting the data in Table 1 a correlation coeficciant (r2) of only 0.0015 between experimental kcat and activity is determined. Such a plot should be included in the paper, since it is crucial for the interpretation. For the use of semiempirical methods in the prediction of protein ligand binding energies it was recently reported, that the proper inclusion of solvation and dispersion correction can indeed increase accuracy (e.g. Advanced Corrections of Hydrogen Bonding and Dispersion for Semiempirical Quantum Mechanical Methods Rezac, J ; Hobza, P JOURNAL OF CHEMICAL THEORY AND COMPUTATION Volume: 8 Issue: 1 Pages: 141-151 DOI: 10.1021/ct200751e 2012 and A Semiempirical Approach to Ligand-Binding Affinities: Dependence on the Hamiltonian and Corrections Mikulskis, P; Genheden, S ; Wichmann, K ; Ryde, U JOURNAL OF COMPUTATIONAL CHEMISTRY Volume: 33 Issue: 12 Pages: 1179-1189 DOI: 10.1002/jcc.22949 2012. 
This issue should be considered in the discussion, since the data is rather qualitative and considering the inherent error of the used method the overall activity could not correctly predicted for Dataset S. The assigned barrier cutoff of 12.5 kcal/mol is not well rationalized to hold as a general criteria for a potentially improving mutant and in Table 1 7 out of 10 mutants with lower activity than WT have a calculated barrier of < 12.9kcal/mol. 
On the other hand, predictive in sillico screening methods using QM calculations are a very promising tool to reduce the experimental screening effort in the future and the presented approach can be used in principle to identify variants showing up to one order of magnitude increased activity. According to the difficulty to predict enzyme activities in sillico (e.g. Evaluation and ranking of enzyme designs ; Gert Kiss, Daniela Röthlisberger, David Baker and KN Houk Protein Sci. 2010 September; 19(9): 1760-1773, ) , the presented results are very promising. Interestingly the investigated double mutants show cooperative effects and promising mutants cannot be predicted on single mutant data alone. Unfortunately no experimental activities are given for the 20 most promising candidates given in Table 5. Without at least some of them, the predictive power of the presented approach cannot be evaluated clearly, given the non-correlation of the data in Set S. The detailed analysis of Set L in terms of barrier heights is not insightful at the present state of the study, since the experimental data is missing and most barriers are in the grey zone between 10 and 15kcal/mol which could be either active or non-active according to the results in Set S. 
Overall the manuscript would benefit from more discussions about the accuracy and the aim of the method compared with current state of the art methods to predict enzyme activity and conformational space of protein mutants. For the use as in sillico screening method a more complete calculation than only the presented limited set of single mutants is needed. Additional experimental determination of the activities of at least some of the predicted mutants in Set L would enhance the impact of the paper.

Reviewer #3: This is an excellent paper, nicely demonstrating the utility of a practical computational modelling approach to the prediction of enzyme activity. The method uses semiempirical quantum chemical methods to model amidase reactivity in a lipase. This is thorough work of high quality. The paper is well written and the results are analysed and presented in appropriate detail. The results will be of wide interest. This is a demonstration of a method that will find real industrial application, as well as in other contexts. The paper is suitable for publication essentially as is. A minor point is that some of the references might be lacking information (e.g. refs 27 and 28 do not look complete as written). Also, the authors might want to comment a little more on how amidase activity is achieved. There is wide debate about amidase versus esterase activity, and, while this is not the focus here, comparison with enzymes such as fatty acid amide hydrolase, which has been the subject of much modelling work, could be useful. This could assist in future rational design (as well as in testing the quantum chemical methods by comparisons on related reactions.


2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: N/A

Reviewer #3: Yes


Please explain (optional).

Reviewer #1: -) there's no error analysis

Reviewer #2: (No Response)

Reviewer #3: (No Response)


3. Does the manuscript adhere to standards in this field for data availability?

Authors must follow field-specific standards for data deposition in publicly available resources and should include accession numbers in the manuscript when relevant. The manuscript should explain what steps have been taken to make data available, particularly in cases where the data cannot be publicly deposited.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes


Please explain (optional).

Reviewer #1: (No Response)

Reviewer #2: (No Response)

Reviewer #3: (No Response)


4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors below.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes


Please explain (optional).

Reviewer #1: (No Response)

Reviewer #2: (No Response)

Reviewer #3: (No Response)


5. Additional Comments to the Author (optional)

Please offer any additional comments here, including concerns about dual publication or research or publication ethics.

Reviewer #1: -) The introduction is incredibly short (1 paragraph), and clearly not sufficient to summarize current efforts in the field (notably high level QM and QM/MM approaches). The authors could also use this space to contrast their approach (published in [1]) with other methods.

Reviewer #2: Sorry, since there are no changes after resubmission, the same comments apply.

Reviewer #3: (No Response)


6. If you would like your identity to be revealed to the authors, please include your name here (optional).

Your name and review will not be published with the manuscript.

Reviewer #1: (No Response)

Reviewer #2: (No Response)

Reviewer #3: (No Response)



[NOTE: If reviewer comments were submitted as an attachment file, they will be accessible only via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

Wednesday, March 20, 2013

New Manuscript: Interface of the polarizable continuum model of solvation with semi-empirical methods in the GAMESS program

Our recent endeavour to do 'science in the open' prompted Luca De Vico to provide some comments on the manuscript. Some of his ideas were incorporated, so thanks for that, Luca.

The paper is now submitted to PLoS ONE and you can find that version on arxiv.org. The code was submitted to (and accepted into) GAMESS, so in the near future you can use that code for your own projects too. We sure will!

Thursday, March 14, 2013

Paper in progress: Interface of the polarizable continuum model of solvation with semi-empirical methods in the GAMESS program

Here's a paper by Casper Steinmann, Anders Steen Christensen, etc. that we're getting ready to submit. It's not quite done yet, so any suggestions that you leave in the comments actually have a chance to be incorporated (you may even watch it happen in real time!).

Here's the abstract:
An interface between semiempirical methods and the polarized continuum model (PCM) of solvation has been successfully implemented into GAMESS following the approach by Chudinov et al. (Chem. Phys. 1992, 160, 41). The interface includes energy gradients and is parallelized. For large molecules such as ubiquitin a reasonable speedup (up to a factor of six) is observed for up to 16 cores. The SCF convergence is greatly improved by PCM for proteins compared to the gas phase.

PhD Position in Theoretical/Computational Chemistry at the University of Southern Denmark

A fully financed three-year PhD position in theoretical/computational chemistry is available for a highly motivated applicant at the Department of Physics, Chemistry and Pharmacy at the University of Southern Denmark starting from 1 June 2013. The position is financed through the Sapere Aude programme under the Danish Council for Independent Research.

The research project will consist, in roughly equal parts, of theory/algorithm development and computational modelling within molecular biophysics. The main focus will be on studying electronic processes in complex and biological environments, and the developed methods will rely extensively on quantum mechanical formulations. Of special interest will be models which effectively combine quantum mechanics and molecular mechanics, aiming for a realistic description of very large bio-molecules including all physically relevant interactions.

The ideal candidate holds a Master’s degree or equivalent in theoretical (bio)physics, modelling, chemistry or computational science. The candidate should have knowledge in one of the following programming languages: C/C++, Python or Fortran. Experience in computational chemistry/physics is a prerequisite.

More information here.

Friday, March 8, 2013

New Manuscript, Molecule Calculator (MolCalc)


The Molecule Calculator (MolCalc) is a small web/server application that Jan Jensen and I built for teaching purposes. It is also the subject of an article we just submitted to the Journal of Chemical Education. If you haven't already, please check out the Molecule Calculator at:

dgu.ki.ku.dk/molcalc

MolCalc is a web interface that allows anyone to build molecules and calculate molecular properties easily. MolCalc is designed for teaching as opposed to research, specifically for assignments in which students build their own molecules and estimate their own molecular properties. In the newest version we have switched to JSmol, which makes the system more stable and means it actually works on most tablets and phones.
Source code, bug reports and feature requests can be found on github.com/jensengroup/molcalc.

Abstract:
A new web-server called The Molecule Calculator (MolCalc) is presented.  The entry page is a molecular editor (JSmol) for interactive molecule building.  The resulting structure can then be used to estimate molecular properties such as heats of formation and other thermodynamic properties, vibrational frequencies and vibrational modes, and molecular orbitals and orbital energies.  These properties are computed using the GAMESS program at either the RHF/STO-3G (orbitals and orbital energies) or PM3 level of theory (all other properties) in a matter of seconds or minutes depending on the size of the molecule.  The results, though approximate, can help students develop a “chemical intuition” about how molecular structure affects molecular properties, without performing the underlying calculations by hand - a near impossible task for all but the simplest chemical systems.

The article can be found on arXiv: http://arxiv.org/abs/1303.1679

Also watch the introduction video for MolCalc 1.1.




More on the Molecule Calculator:
http://molecularmodelingbasics.blogspot.dk/2012/08/the-molecule-calculator.html
http://molecularmodelingbasics.blogspot.dk/2013/02/the-molecule-calculator-v11.html


Thursday, March 7, 2013

RMSD and MUE versus Correlation Coefficient: a simple illustration of the difference

Here is a simple illustration of what the root-mean-square deviation (RMSD), mean-unsigned error (MUE) and correlation coefficient ($r$) can tell you about your data. Imagine that the $x$-axis is experimental data and the $y$-axis is computed data in some arbitrary units (you can access the data here).

The blue dots represent a perfect correlation $(y=x)$ for which $r$ = 1 and RMSD = MUE = 0.

The red dots represent the function $y=2x-5$. The RMSD = 2.9 and MUE = 2.5, and both would seem to indicate a pretty crappy model. However, $r$ = 1.0, indicating that there is a systematic error that can be fixed completely by a linear fit. In this case, the MUE is an indicator of the part of this systematic error that can be fixed by an offset $[x=\frac{1}{2}y+2.5]$.

The orange dots do not represent a linear function and clearly represent a worse model than the red dots. However, the RMSD = 2.6 and MUE = 2.1 are both slightly better than for the red model. But $r$ = 0.7, indicating that only part of the discrepancy can be fixed by a linear fit.

Indeed, a linear fit to the orange data $(x=1.2y-2.75)$ can only reduce the RMSD and MUE to 2.1 and 1.8, respectively.

The relationship between $r$ and the RMSD after a linear fit is

$$RMSD_{fit}=\sigma_x\sqrt{1-r^2}$$

where $\sigma_x$ is the standard deviation of the experimental data, which in this case is 2.9. So knowing $r$ and $\sigma_x$ tells one immediately what the lowest possible RMSD value for a model is using a linear fit.

Also, you can think of $\sigma_x$ as the RMSD for the very simple model $y=\langle x \rangle$, i.e. the model simply returns the average value of the experimental data.  This is the maximum RMSD value for a linear fit (where $r$ = 0).
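The quantities discussed in this post can be checked with a few lines of plain Python. This is a minimal sketch, not the script used to make the plot. The $x$ values 1 to 10 are an assumption, chosen because they reproduce the RMSD, MUE and $\sigma_x$ quoted for the red series. The noisy series is made up for illustration and is not the orange data. Note that $\sigma_x$ here is the population standard deviation (dividing by the number of points).

```python
import math


def mean(values):
    return sum(values) / len(values)


def rmsd(x, y):
    """Root-mean-square deviation between experiment x and model y."""
    return math.sqrt(mean([(yi - xi) ** 2 for xi, yi in zip(x, y)]))


def mue(x, y):
    """Mean unsigned error between experiment x and model y."""
    return mean([abs(yi - xi) for xi, yi in zip(x, y)])


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def pop_std(values):
    """Population standard deviation (divides by N, not N - 1)."""
    m = mean(values)
    return math.sqrt(mean([(v - m) ** 2 for v in values]))


def rmsd_after_fit(x, y):
    """RMSD between x and the least-squares line x_fit = a*y + b."""
    mx, my = mean(x), mean(y)
    a = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((yi - my) ** 2 for yi in y))
    b = mx - a * my
    return rmsd(x, [a * yi + b for yi in y])


def rmsd_fit_from_r(x, y):
    """The same number obtained from sigma_x * sqrt(1 - r^2)."""
    r = pearson_r(x, y)
    return pop_std(x) * math.sqrt(max(0.0, 1.0 - r * r))


x = [float(i) for i in range(1, 11)]   # assumed "experimental" values
y_red = [2 * xi - 5 for xi in x]       # the red function y = 2x - 5
y_noisy = [3.0, 1.0, 6.0, 2.0, 7.0, 4.0, 9.0, 5.0, 8.0, 12.0]  # made up

for label, y in [("red", y_red), ("noisy", y_noisy)]:
    print(f"{label}: RMSD={rmsd(x, y):.1f} MUE={mue(x, y):.1f} "
          f"r={pearson_r(x, y):.2f} "
          f"RMSD_fit={rmsd_after_fit(x, y):.2f} "
          f"sigma_x*sqrt(1-r^2)={rmsd_fit_from_r(x, y):.2f}")
```

The last two numbers printed for each series agree, because the explicit least-squares fit and the formula are two routes to the same residual. The fit is done as $x = ay + b$, i.e. the computed values are mapped onto the experimental ones, which is why it is $\sigma_x$ and not $\sigma_y$ that appears in the formula.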

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License