Wednesday, 16 November 2011

structural biology - How many human proteins have a solved 3D structure?

The PDB (Protein Data Bank) is a good resource for answering such questions, since it lets you filter results by many additional parameters. To count and extract the 3D structures of human proteins:



  1. Open the Advanced Search tab of the PDB website.

  2. Select Biology -> Source organism from the menu.

  3. Type Homo sapiens (human).

  4. You can reduce redundancy by checking Remove Similar Sequences at n% identity below.

  5. Submit query.

To add further filters, click Refine Query with Advanced Search. There, by checking Add Search Criteria, you can extract structures by deposition date, quality (e.g. resolution or R-factors for structures solved by X-ray diffraction), ligands, enzyme classification, etc.
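If you prefer to apply the same kind of quality filter to a downloaded list rather than in the web form, the logic might look roughly like the sketch below. It is an illustration only: the `Entry` record, its field names, and the sample entries (including their IDs) are made up for this example and do not reflect the actual layout of a PDB report.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    # Hypothetical record layout; a real PDB report has different columns.
    entry_id: str
    organism: str
    method: str
    resolution: Optional[float]  # in angstroms; None when the method has no resolution (e.g. NMR)


def good_human_xray(entries: List[Entry], max_resolution: float = 2.5) -> List[Entry]:
    """Keep human X-ray structures with resolution strictly below max_resolution."""
    return [
        e for e in entries
        if e.organism == "Homo sapiens"
        and e.method == "X-RAY DIFFRACTION"
        and e.resolution is not None
        and e.resolution < max_resolution
    ]


# Made-up sample data, for illustration only.
sample = [
    Entry("ent1", "Homo sapiens", "X-RAY DIFFRACTION", 1.8),
    Entry("ent2", "Homo sapiens", "X-RAY DIFFRACTION", 3.1),
    Entry("ent3", "Homo sapiens", "SOLUTION NMR", None),
    Entry("ent4", "Mus musculus", "X-RAY DIFFRACTION", 1.5),
]

selected = good_human_xray(sample)
```

Note that this only reproduces the organism, method, and resolution criteria; redundancy removal by sequence identity is a separate step.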



A search for human proteins with homologues removed at a 90% identity cutoff returns 7117 structures. The number of good-quality X-ray protein structures (resolution < 2.5 Å) is currently 3964 (with the same identity cutoff).



You can then download the fetched list or create custom reports (menus below).



A good tool (also used by the PDB) for generating non-redundant protein datasets is CD-HIT.
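To give a feel for what redundancy removal at an identity cutoff means, here is a toy sketch of greedy incremental clustering, the general idea behind tools of this kind: process sequences from longest to shortest, and let each one either join the first existing representative it matches or become a new representative. The `identity` function is a crude stand-in assumed for this example (position-by-position comparison without alignment); real tools use proper alignment and fast pre-filters, so treat this as a sketch rather than a substitute.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Crude identity: fraction of matching positions over the shorter sequence.

    No alignment is performed; this is a simplification for illustration.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.9) -> Dict[str, List[str]]:
    """Map each representative name to the names of the sequences in its cluster."""
    clusters: Dict[str, List[str]] = {}
    # Longest sequences first, so each representative is the longest member of its cluster.
    for name, seq in sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True):
        for rep in clusters:
            if identity(seq, seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters
```

The keys of the returned dictionary form the non-redundant set; with `threshold=0.9` this corresponds to the 90% identity cutoff used above.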
