Hi chiralSPO,

1) Indeed, the varying conformations are a big problem for this kind of descriptor.

That is one of the reasons I wanted to make it cheap to calculate (sums over atoms instead of e.g. integrating over electron density) - thanks to that, we can generate descriptors for various conformations and treat them independently.

It would be great to include these conformational changes in the descriptor itself, which is very difficult as we would like to use a fixed-length vector descriptor.

However, something could be done - for example, generate an ensemble of all conformations (you need to discretize the angles, e.g. every 15 deg), then store in the descriptor the expected value of each coefficient over this ensemble, and also the variance of each coefficient.

For example, both the expected length and how much it can change.
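To make the ensemble idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the toy 4-atom chain with a single rotatable bond, the function names, and the stand-in descriptor (end-to-end length and radius of gyration) are hypothetical and are not the actual descriptor discussed here. It also weights all discretized conformations equally, which is a simplification; a more realistic ensemble would probably weight them by energy.

```python
import math
from statistics import fmean, pvariance


def toy_chain(phi_deg):
    """Hypothetical 4-atom chain; only the torsion around the middle bond varies."""
    phi = math.radians(phi_deg)
    return [
        (-0.5, 1.4, 0.0),
        (0.0, 0.0, 0.0),
        (1.5, 0.0, 0.0),
        (2.0, 1.4 * math.cos(phi), 1.4 * math.sin(phi)),
    ]


def toy_descriptor(atoms):
    """Stand-in descriptor (an assumption): (end-to-end length, radius of gyration)."""
    length = math.dist(atoms[0], atoms[-1])
    n = len(atoms)
    center = tuple(sum(a[k] for a in atoms) / n for k in range(3))
    rg = math.sqrt(sum(math.dist(a, center) ** 2 for a in atoms) / n)
    return (length, rg)


def ensemble_descriptor(step_deg=15):
    """Fixed-length vector: mean of each coefficient, then variance of each coefficient."""
    samples = [toy_descriptor(toy_chain(phi)) for phi in range(0, 360, step_deg)]
    columns = list(zip(*samples))  # one column per coefficient
    means = [fmean(c) for c in columns]
    variances = [pvariance(c) for c in columns]  # equal weights assumed
    return means + variances


vector = ensemble_descriptor()
```

The resulting vector has the same length regardless of how many conformations were sampled: the first half holds the expected values, the second half says how much each coefficient can change.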

There are lots of opportunities for potential improvements ...

2) So the general approach of virtual screening is that you know which ligands activate the protein and which don't - and you would like to use some clustering to find more substances which activate it (potential drugs).
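As a rough illustration of that screening step, here is a small Python sketch. It does not do real clustering: as a simpler stand-in, it ranks candidates by the distance of their descriptor vector to the nearest known active ligand, and it ignores the known inactive ligands entirely. The function name, the molecule names, and all the numbers are made up for the example.

```python
import math


def rank_candidates(actives, candidates):
    """Sort candidate names by distance to the nearest known active (closest first)."""
    def nearest_active_distance(vec):
        return min(math.dist(vec, a) for a in actives)

    return sorted(candidates, key=lambda name: nearest_active_distance(candidates[name]))


# Made-up 2-coefficient descriptors, purely for illustration.
actives = [(3.7, 1.4), (3.5, 1.3)]
candidates = {
    "cand_a": (3.6, 1.35),
    "cand_b": (1.0, 0.9),
    "cand_c": (3.0, 1.2),
}

ranking = rank_candidates(actives, candidates)
```

Candidates at the top of the ranking would be the ones passed on to the more expensive screening phases.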

Sure, we could also use a complementary approach: find such a descriptor for a protein cavity and then look for molecules matching this shape.

3) Sure, there are various types of molecules - one of my conclusions is that you should design a dedicated descriptor for a given family of interest.

I was thinking mainly about the ones activating a protein as the most interesting potential drugs.

But I have to admit that I'm not a chemist and this is my first attempt at chemoinformatics - I imagined that this is a kind of lock-and-key problem: the ligand has to slide into a protein, so it should be elongated and flat.

But sure, this is much more complicated - this is only a tool for fast initial screening, after which more accurate and costly screening phases should be applied.