Molecular shape descriptors for virtual screening of ligands?



Jarek Duda
I was thinking about designing molecular shape descriptors for virtual screening, such that two molecules have similar shapes if and only if their descriptors are similar.
They could be used on their own, or to complement e.g. some pharmacophore descriptors.

They should be optimized for ligands, which are usually elongated and flat.
Hence I thought of using the following approach:
- normalize rotation (using principal component analysis),
- describe bending - usually one coefficient is sufficient,
- describe evolution of cross-section, for example as evolving ellipse
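
A minimal Python sketch of the three steps above, for illustration only. It is not the Mathematica implementation mentioned in this post. The function name `shape_descriptor`, the parabola fit used for bending, and the quadratic fits of squared offsets used for the cross-section are all assumptions chosen to end up with 8 coefficients.

```python
import numpy as np

def shape_descriptor(coords):
    """Return 8 illustrative shape coefficients for an (N, 3) array of atom positions."""
    xyz = np.asarray(coords, dtype=float)
    xyz = xyz - xyz.mean(axis=0)            # move the centroid to the origin

    # Step 1: normalize rotation with PCA. Rows of vt are the principal axes,
    # ordered by decreasing variance, so the first axis runs along the molecule.
    _, _, vt = np.linalg.svd(xyz, full_matrices=False)
    p = xyz @ vt.T
    x, y, z = p[:, 0], p[:, 1], p[:, 2]

    length = x.max() - x.min()              # extent along the main axis

    # Step 2: bending as the curvature of a parabola fitted to y(x)
    # (assumed form, one coefficient).
    a, b, c = np.polyfit(x, y, 2)
    y_res = y - np.polyval([a, b, c], x)    # offsets from the bent centerline

    # Step 3: evolution of the cross-section. Squared offsets in the two
    # transverse directions are fitted as quadratics in x, giving 3 + 3
    # coefficients (assumed parametrization of an "evolving ellipse").
    width = np.polyfit(x, y_res ** 2, 2)
    thickness = np.polyfit(x, z ** 2, 2)

    return np.concatenate(([length, a], width, thickness))
```

PCA fixes the axes only up to sign, so the sign of the bending coefficient and of the odd-order cross-section coefficients depends on an arbitrary choice. A real implementation would need a sign convention before descriptors are compared.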

In the end, the shape below is described by 8 real coefficients: length (1), bending (1) and 6 for the evolution of the ellipse in the cross-section. They express the bending and the fact that this molecule is approximately circular on the left and flat on the right:

Mathematica implementation:

Have you come across something like this? Is it a reasonable approach?
I am comparing it with USR (ultrafast shape recognition) and (rotationally invariant) spherical harmonics. Have you seen other approaches of this type?
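
For readers unfamiliar with USR, here is a minimal Python sketch of the descriptor as it is commonly described: 12 numbers, namely three moments of the atomic distances to each of four reference points. The function names are made up for this example, and the exact moment definitions (here mean, standard deviation and cube root of the third central moment) vary between implementations, so treat this as an assumption.

```python
import numpy as np

def _moments(d):
    """Mean, standard deviation and cube root of the third central moment."""
    mean = d.mean()
    return [mean, d.std(), np.cbrt(((d - mean) ** 3).mean())]

def usr_descriptor(coords):
    """12-number USR-style descriptor for an (N, 3) array of atom positions."""
    xyz = np.asarray(coords, dtype=float)
    ctd = xyz.mean(axis=0)                          # molecular centroid
    d_ctd = np.linalg.norm(xyz - ctd, axis=1)
    cst = xyz[d_ctd.argmin()]                       # atom closest to the centroid
    fct = xyz[d_ctd.argmax()]                       # atom farthest from the centroid
    d_fct = np.linalg.norm(xyz - fct, axis=1)
    ftf = xyz[d_fct.argmax()]                       # atom farthest from fct
    out = []
    for ref in (ctd, cst, fct, ftf):
        out.extend(_moments(np.linalg.norm(xyz - ref, axis=1)))
    return np.array(out)

def usr_similarity(a, b):
    """Similarity in (0, 1]: inverse of one plus the mean absolute difference."""
    return 1.0 / (1.0 + np.abs(a - b).mean())
```

Because only interatomic distances enter, the result does not change under rotation or translation, so no alignment step is needed.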


chiralSPO
Very interesting.

I am a synthetic chemist, and only have a rudimentary understanding of computational chemistry, so I may have missed some of the finer points.

I have a few questions:

- Is this intended as a method to compare specific conformers of molecules, or does it take into account many (or all possible) conformers?

- Is this intended as a predictive algorithm (in which you specify a shape, and it recommends molecules that fit), or does it only calculate shapes for input molecules?

- I'm not sure how useful it is to consider ligands as typically elongated or flat... (I assume you mean ligands that bind proteins, not ligands that bind metals...) Even if a large percentage of target molecules do fit into those categories, there are many extraordinary molecules that owe their activity to their 3D structures (either highly geometrical or blobs), like the small molecules azithromycin, saxitoxin and tetrodotoxin.

Does your calculation still work for these oddly shaped polycyclic molecules?


Jarek Duda
Hi chiralSPO,

1) Indeed, varying conformations are a big problem for this kind of descriptor.
It is one of the reasons I wanted to make it cheap to calculate (sums over atoms instead of e.g. integrating over electron density): thanks to that, we can generate descriptors for various conformations and treat them independently.

It would be great to include these conformational changes in the descriptor itself, which is very difficult as we would like to use a fixed-length vector descriptor.
However, something could be done: for example, generate an ensemble of all conformations (you need to discretize the angles, e.g. every 15 deg), then store in the descriptor both the expected values of the coefficients over this ensemble and their variances.
For example, both the expected length and how much it can change.
There are lots of opportunities for potential improvements ...
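
A minimal Python sketch of this ensemble idea, assuming the per-conformer descriptors have already been computed and stacked row by row (the function name `ensemble_descriptor` is made up for this example). The result is still a fixed-length vector, twice as long as a single-conformer descriptor.

```python
import numpy as np

def ensemble_descriptor(per_conformer):
    """Combine per-conformer descriptors into [means..., variances...].

    per_conformer: array-like of shape (n_conformers, n_coefficients).
    """
    d = np.asarray(per_conformer, dtype=float)
    # Mean and (population) variance of each coefficient over the ensemble.
    return np.concatenate([d.mean(axis=0), d.var(axis=0)])

# Two hypothetical conformers described by (length, bending):
combined = ensemble_descriptor([[10.0, 0.1], [12.0, 0.3]])
```

Here the first half of `combined` holds the expected length and bending, and the second half holds how much each of them varies across conformers.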

2) The general approach of virtual screening is that you know which ligands activate the protein and which don't, and you would like to use some clustering to find more substances that activate it (potential drugs).

Sure, we could also use a complementary approach: find such a descriptor for a protein cavity and then look for molecules matching this shape.

3) Sure, there are various types of molecules. One of my conclusions is that you should design a dedicated descriptor for each family of interest.
I was thinking mainly about the ones activating a protein, as they are the most interesting potential drugs.
But I have to admit that I'm not a chemist and this is my first attempt at chemoinformatics. I imagined this as a kind of lock-and-key problem: the ligand has to slide into a protein, so it should be elongated and flat.
But sure, it is much more complicated than that. This is only a tool for fast initial screening; more accurate and costly screening phases should be applied afterwards.