Other structural databases

Post on 15-Jan-2017


KFC/STBI Structural Bioinformatics

04_databases

Karel Berka

http://www.rcsb.org/pdb/

Databases: there are quite a few…

Primary structural databases

• PDBe: Protein Data Bank in Europe
– supplements the PDB with data from BMRB (NMR) and EMDB (EM)

• PDBsum
– gathers additional information about each structure

• PDBwiki: A community annotated knowledge base of biological molecular structures
– a wiki about PDB structures

• NDB: Nucleic Acid Database
– database of nucleic acid structures

• CSD: Cambridge Structural Database
– database of small-molecule crystal structures; paid

• MODBASE: Database of Comparative Protein Structure Models
– database of protein models

Secondary databases

• SCOP: Structural Classification of Proteins
– search for structural families of proteins

• CATH
– search for structural families of proteins

• GENE3D
– structural genomics

• 3Dee
– Database of Protein Domain Definitions

• FSSP
– Based on exhaustive all-against-all 3D structure comparison of protein structures currently in the Protein Data Bank (PDB)

• DALI
– Fold Classification based on Structure-Structure Assignments

PDBe

http://www.ebi.ac.uk/pdbe/

• Comprehensive relational database of macromolecular structures

Example of an Atlas page, in this case for PDB entry 1E9F.

Velankar S et al. Nucl. Acids Res. 2010;38:D308-D317. © The Author(s) 2009. Published by Oxford University Press.

PDBe

Navigation menu

sequence annotated from other databases

Uniprot

CATH

Pfam

SCOP

Schematic overview of the process by which SIFTS files are generated (see text for details).

Velankar S et al. Nucl. Acids Res. 2010;38:D308-D317. © The Author(s) 2009. Published by Oxford University Press.

SIFTS format

Structure Integration with Function, Taxonomy and Sequence

PDBe – services

• PDBeMine: http://www.ebi.ac.uk/pdbe-srv/msdmine
– Supports ad-hoc queries and data analysis based on the relational PDBe database

• OLDERADO: http://www.ebi.ac.uk/pdbe/olderado/
– Clustering information for NMR entries in the PDB

• PDBeAnalysis: http://www.ebi.ac.uk/pdbe-as/PDBeValidate
– Validation and analysis of PDBe data

• PDBeTemplate: http://www.ebi.ac.uk/pdbe-as/PDBeTemplate/
– Search of local residue interactions in the PDB

• PDBeFold: http://www.ebi.ac.uk/msd-srv/ssm/
– Secondary Structure Matching (SSM) service for comparing protein structures in 3D

• PDBePISA: http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html
– Search and analysis of Protein Interfaces, Surfaces and Assemblies

• PDBeMotif: http://www.ebi.ac.uk/pdbe-site/PDBeMotif/
– Query and analysis of structure, sequence motifs and interactions

• PDBeChem: http://www.ebi.ac.uk/msd-srv/chempdb
– Ligand search using the PDB reference dictionary

• EMsearch: http://www.ebi.ac.uk/pdbe-srv/emsearch
– Search system for the EM Database

• PDBeLite: http://www.ebi.ac.uk/pdbe-srv/pdbelite
– Search system based on the relational PDBe database

• PDBeView: http://www.ebi.ac.uk/pdbe-srv/view
– Text-based and advanced PDB search tool

• PDBeMapQuick: http://www.ebi.ac.uk/pdbe-as/PDBeMapQuick/
– Quick access to cross-reference information to external databases based on PDB ID

• PDBeStatus: http://www.ebi.ac.uk/pdbe-as/pdbStatus
– Search system to query the status of PDB entries

• BIObar: http://www.ebi.ac.uk/pdbe/docs/biobar.html
– Search system implemented as a toolbar application for Mozilla browsers

A toolbar search application for Mozilla/Netscape or Firefox browsers

http://biobar.mozdev.org/

Simple and quick retrieval of data from PDBe and 45 other Databases

Biobar

PDBeChem

• "Ligands" in the PDB

• Bound molecules (e.g. sugars, lipids, inhibitors, coenzymes and cofactors)

• Unique three-letter code
– atom, element type, connectivity, bond orders, stereochemical configuration

• Search
– By ligand code
– By ligand name
– By formula
– By non-stereo SMILES
– By stereo SMILES
– By exact stereo structure
– By fingerprint similarity
– By fragment expression
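
The "fingerprint similarity" search mode can be illustrated with a small sketch. The slides do not say which fingerprint or similarity measure PDBeChem uses, so the Tanimoto coefficient below is an assumption (it is a common choice for this kind of search), and the ligand codes and bit positions are invented for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient of two fingerprints given as sets of 'on' bits."""
    if not fp_a and not fp_b:
        return 1.0  # two empty fingerprints are treated as identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query, library, threshold=0.5):
    """Return (code, score) pairs with score >= threshold, best first."""
    hits = [(code, tanimoto(query, fp)) for code, fp in library.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical three-letter codes with made-up fingerprint bits.
library = {
    "LG1": {1, 2, 3, 4},
    "LG2": {1, 2, 3, 9},
    "LG3": {7, 8},
}
query = {1, 2, 3, 4}

for code, score in rank_by_similarity(query, library):
    print(code, round(score, 2))
```

A real search would first derive the fingerprint from the chemical structure; that step is left out here because it depends on the chosen cheminformatics toolkit.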

Example of a graphically defined query that can be submitted to PDBeMotif.

Velankar S et al. Nucl. Acids Res. 2010;38:D308-D317. © The Author(s) 2009. Published by Oxford University Press.

PDBeMotif

• Search by
a) Ligands and their 3D environment
b) protein families (SCOP, CATH, UNIPROT, EC-number)
c) protein secondary structures and different 3D motifs (PROSITE, beta turn, catalytic sites etc.)
d) protein Φ/Ψ angle sequences

• Results:
a) Sequence multiple alignment
b) 3D multiple alignment of fragments, motifs and protein chains
c) Interaction statistics
d) Motif characteristics and property distribution charts
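
Searching by Φ/Ψ angle sequences presupposes that backbone torsion angles have been computed from atomic coordinates. A minimal sketch of that computation follows; it is the standard four-point torsion formula, not PDBeMotif's own code. For residue i, Φ uses the atoms C(i-1), N(i), CA(i), C(i) and Ψ uses N(i), CA(i), C(i), N(i+1). The coordinates in the example are toy points, not real atoms.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (between -180 and 180) about the p1-p2 bond."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: the first three points are fixed, only the fourth one moves.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
print(round(cis, 1), round(abs(trans), 1))
```

Applied along a chain, the function yields one (Φ, Ψ) pair per residue, which is the kind of angle sequence such a query would be matched against.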

• Define search by ligand

• Define search by sequence motif (pattern)

• Define search by metal site geometry

• Define search by environment

• has same environment

• has similar environment
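
A search "by sequence motif (pattern)" can be pictured as translating a PROSITE-style pattern into a regular expression. The sketch below is an illustration under assumptions, not PDBeMotif's implementation: it handles `x`, `[...]`, `{...}` and repeat counts such as `(3)` or `(2,4)`, and it ignores terminal anchors. The example sequence is invented.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern into a Python regular expression string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat: (n) or (n,m).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # '[ABC]' groups and single letters are already valid regex syntax.
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(core + repeat)
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return (1-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


# N-glycosylation style pattern searched in a made-up sequence.
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_motif("N-{P}-[ST]-{P}", "MKNGSAANPSA"))
```

Note that `finditer` reports non-overlapping matches only; a motif search that needs overlapping hits would have to restart the scan after each match position.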

PDBe-site page

• Compare ligand environments.

• Analyze interactions between ligand and protein.

• Compare binding environment.

• Look for ligands within a certain environment.

• Superpose binding sites and ligands.

• Predict what could bind that empty pocket in your structure

What assembly can my structure have?

PDBePISA

• PQS – protein quaternary structure

• very difficult to obtain by prediction; comes from crystallography and EM

The new EMViewer 3D visualization Java applet is available on the EMDB Atlas pages and allows interactive generation of isosurface representations.

Velankar S et al. Nucl. Acids Res. 2010;38:D308-D317. © The Author(s) 2009. Published by Oxford University Press.

EMviewer

PDBsum

Schematic diagrams from the PDBsum ‘Protein page’ for entry 1a5z: lactate dehydrogenase from Thermotoga maritima (16).

Laskowski R A Nucl. Acids Res. 2009;37:D355-D359. © 2008 The Author(s)

PDBsum

• An effort to keep all information in one place

• Additional analyses
– secondary structure diagrams
– Ligplot

Extracts from the protein–protein interaction diagrams in PDBsum for PDB entry 1mmo, a non-haem iron hydroxylase from Methylococcus capsulatus (17).

Laskowski R A Nucl. Acids Res. 2009;37:D355-D359. © 2008 The Author(s)

PDBsum interfaces

NDB

NDB

• DNA
• RNA

NDB

3D structure, 2D structure

RNAview

CSD

• The Cambridge Structural Database

• www.ccdc.cam.ac.uk

• small molecules

• paid, plus an open set of 500 compounds for teaching purposes

DB        What                        Total (2009)   Per year
PDB       Proteins                           50730       6000
NDB       Nucleic Acids                       3555        500
CSD       Organics, Metal-Organics          488057      40000
ICSD      Inorganics & Minerals             100200       9000
CRYSTMET  Metals, alloys, inorganics        119600       9000

CSD - components

WebCSD

Mercury

• Mercury visualiser
– Crystal structure visualisation program by CCDC

• Free

• Teaching subset embedded

And back to proteins...

Classification of protein structures

Class: similar content of secondary structures

Architecture (Fold): structural similarity

Superclass (Topology): probably the same ancestor

• SCOP, CATH, FSSP, 3Dee

SCOP

• Structural Classification of Proteins

• manual classification of protein structural domains based on similarities of their amino acid sequences and three-dimensional structures

• SCOP utilizes four levels of hierarchic structural classification:
– class: general "structural architecture" of the domain
– fold: similar arrangement of regular secondary structures but without evidence of evolutionary relatedness
– superfamily: sufficient structural and functional similarity to infer a divergent evolutionary relationship but not necessarily detectable sequence homology
– family: some sequence similarity can be detected

Murzin A. G., Brenner S. E., Hubbard T., Chothia C. (1995). SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536-540.
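
The four SCOP levels map directly onto SCOP's dotted concise classification strings (sccs) of the form class.fold.superfamily.family. A minimal sketch of reading such a string and finding the deepest level two domains share is given below. The helper names are hypothetical and the identifiers are used only as examples of the format.

```python
from typing import NamedTuple


class ScopPath(NamedTuple):
    scop_class: str
    fold: str
    superfamily: str
    family: str


def parse_sccs(sccs):
    """Split an sccs string such as 'c.37.1.8' into its four cumulative levels."""
    parts = sccs.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected class.fold.superfamily.family, got {sccs!r}")
    # Each level is identified by the full path down to it, e.g. fold = 'c.37'.
    return ScopPath(*(".".join(parts[:depth]) for depth in range(1, 5)))


def deepest_shared_level(sccs_a, sccs_b):
    """Name of the deepest level shared by two domains, or None if classes differ."""
    shared = None
    for level, a, b in zip(ScopPath._fields, parse_sccs(sccs_a), parse_sccs(sccs_b)):
        if a != b:
            break
        shared = level
    return shared


print(parse_sccs("c.37.1.8"))
print(deepest_shared_level("c.37.1.8", "c.37.1.20"))
```

In terms of the definitions above, two domains that share a superfamily but not a family are taken to be evolutionarily related even though sequence similarity may not be detectable.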

CATH

• manually-curated hierarchical classification of protein domain structures

• more automated than SCOP

• Class
– secondary structure content
– (mainly-alpha, mainly-beta, mixed alpha/beta or 'few secondary structures')

• Architecture
– general arrangement of the secondary structures irrespective of connectivity between them
– (e.g. alpha/beta sandwich)

• Topology (Fold)
– connectivity of secondary structures in the chain

• Homologous Superfamily
– domains that are believed to be related by a common ancestor

• S-levels
– automated clustering based on sequence identity
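
The C, A, T and H levels are encoded in the dotted CATH identifier of a superfamily, one number per level. The sketch below splits such an identifier into the four levels. The class numbering (1 to 4 in the order listed above) is an assumption of this example, and the helper names are hypothetical.

```python
# Assumed numbering of the CATH classes, in the order the slide lists them.
CATH_CLASSES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "mixed alpha/beta",
    4: "few secondary structures",
}

LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def parse_cath_id(cath_id):
    """Map each CATH level name to the identifier prefix that defines it."""
    numbers = [int(part) for part in cath_id.split(".")]
    if len(numbers) != len(LEVELS):
        raise ValueError(f"expected four dot-separated numbers, got {cath_id!r}")
    return {
        level: ".".join(str(n) for n in numbers[:depth])
        for depth, level in enumerate(LEVELS, start=1)
    }


levels = parse_cath_id("3.40.50.300")
for name, node in levels.items():
    print(f"{name}: {node}")
print("Class description:", CATH_CLASSES[int(levels["Class"])])
```

The S-levels sit below the homologous superfamily and come from sequence clustering, so they are deliberately left out of this four-number sketch.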

CATH

GENE3D

• Gene3D – large collection of CATH protein domain assignments for ENSEMBL genomes and Uniprot sequences
– functional information, as well as taxonomic distributions, multi-domain architectures and protein-protein interaction (PPI) data

FSSP - fold classification

www2.embl-ebi.ac.uk/dali/fssp/

structurally superimposed proteins by DALI ("Distance-matrix ALIgnment")

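The idea behind a distance-matrix comparison is that the matrix of intramolecular Cα-Cα distances does not change when a structure is rotated or translated, so two structures can be compared without superimposing them first. The sketch below shows only that representation. It is not the DALI scoring function or its alignment search, it assumes the residue correspondence is already known, and the coordinates are invented.

```python
import math


def distance_matrix(coords):
    """All-against-all distances between points (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def mean_abs_difference(dm_a, dm_b):
    """Average absolute difference over the upper triangles of two matrices."""
    n = len(dm_a)
    if n != len(dm_b) or n < 2:
        raise ValueError("need two matrices of equal size with at least 2 residues")
    total = sum(abs(dm_a[i][j] - dm_b[i][j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


def rotate_z_and_shift(coords, angle_deg, shift):
    """Rigid-body move: rotate about the z axis, then translate."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]


# Made-up four-residue fragment (coordinates in angstroms).
fragment = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.7, 4.2, 1.0)]
moved = rotate_z_and_shift(fragment, 40.0, (10.0, -5.0, 2.0))
stretched = [(2 * x, y, z) for x, y, z in fragment]

reference = distance_matrix(fragment)
print(mean_abs_difference(reference, distance_matrix(moved)))      # close to zero
print(mean_abs_difference(reference, distance_matrix(stretched)))  # clearly non-zero
```

A rigidly moved copy gives a difference of essentially zero, while a distorted copy does not, which is why distance matrices are a convenient starting point for structure comparison.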
3Dee – domains

http://www.compbio.dundee.ac.uk/3Dee/

Hierarchy of individual domains

clustering by structural similarity

Dengler, U., Siddiqui, A. S. & Barton, G. J. (2001). Protein structural domains: Analysis of the 3Dee domains database. Proteins 42, 332-344.

Siddiqui, A. S., Dengler, U. & Barton, G. J. (2001). 3Dee: A database of protein structural domains. Bioinformatics 17, 200-201.

Databases we did not get to...

• Relibase
– protein-ligand interactions

• Modbase, SWISS-MODEL Repository, MMDB
– databases of models

• MolMovDB
– Macromolecular Motions database

• And many more, mostly specific to a particular problem
– e.g. only for cytochromes P450
• CYPED, SuperCyp, Cytochrome P450 Homepage, Fungal CYP database, CYPalleles, Arabidopsis Cytochrome P450s, Cytochrome P450 Drug Interactions Table, and others.

• Then there is nothing left but to use Google. :o)