1 PBBD – Protein Building Blocks Database
Super-secondary structure motifs consistent with closed-loop properties were collected manually according to the following criteria: 1) amino acid sequence length between 10 and 40 residues; 2) Cα-Cα distance between the sequence extremities < 10 Å; 3) at least three apolar and/or hydrophobic amino acids contiguous with the extremities; 4) structural compatibility with one of the more common folding motifs recurrent in proteins, such as type I’..IV’ αβ-barrel, β-hairpin, β-corner, Greek key motif, Rossmann fold, βαβ-motif, zinc finger, leucine zipper, helix-turn-helix, α-α corner, and α-hairpin. Elementary building blocks of different lengths, with internal conformations kept invariant and possessing HFUs at the N-terminal and C-terminal ends, were also included in the database. We also collected a broad set of intrinsically disordered regions, which lack a defined three-dimensional shape and cannot be assigned to any of the previously reported categories.
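The first three criteria are numerical and can be checked automatically, although the collection described here was manual. The following is a minimal sketch and not the procedure actually used: the function names, the hydrophobic residue set, and the reading of criterion 3 as "at least three hydrophobic residues within a short window at the two ends" (with a hypothetical window of 5 residues) are all assumptions made for illustration. Criterion 4 requires structural inspection and is not covered.

```python
import math

# ASSUMPTION: one-letter codes treated as apolar/hydrophobic.
# The text does not list the residues it counts in this class.
HYDROPHOBIC = set("AVILMFWC")


def end_to_end_distance(ca_coords):
    """Distance (in Å) between the first and last Cα atoms."""
    return math.dist(ca_coords[0], ca_coords[-1])


def hydrophobic_near_ends(sequence, window=5):
    """Count hydrophobic residues in the first and last `window` positions.

    ASSUMPTION: this is one possible reading of "contiguous with
    extremities"; the window size is hypothetical.
    """
    ends = sequence[:window] + sequence[-window:]
    return sum(1 for aa in ends if aa in HYDROPHOBIC)


def is_closed_loop_candidate(sequence, ca_coords, min_len=10, max_len=40,
                             max_dist=10.0, min_hydrophobic=3, window=5):
    """Apply criteria 1-3 to a fragment given its sequence and Cα coordinates."""
    if len(sequence) != len(ca_coords):
        raise ValueError("one Cα coordinate is required per residue")
    # Criterion 1: length between 10 and 40 residues.
    if not (min_len <= len(sequence) <= max_len):
        return False
    # Criterion 2: Cα-Cα distance between the extremities below 10 Å.
    if end_to_end_distance(ca_coords) >= max_dist:
        return False
    # Criterion 3 (assumed reading): hydrophobic residues near the ends.
    return hydrophobic_near_ends(sequence, window) >= min_hydrophobic
```

A fragment would be passed as a one-letter sequence together with a list of (x, y, z) tuples, one per residue, for example as extracted from the CA records of a PDB file.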
1.2 Database accuracy
Multiple alignments of the amino acid sequences showed a satisfactory correspondence between closed-loop structure and sequence: with the Gonnet matrix, αβ-barrel sequences separated into three distinct strings, each corresponding quite well to one of the structural elements of these super-secondary motifs. Surprisingly, alignment of sequences corresponding to βαβ-motifs revealed a reasonably clear separation between the secondary structures present. The substitution matrix evidently discriminated between strings characterised mainly by hydrophobic and/or non-polar residues, corresponding preferentially to β-strands, and strings characterised by polar, acidic and/or basic residues, corresponding to α-helix type structures. Strings in which residues such as Gly, Pro and Asp prevail, on the other hand, often correspond to the hairpins of αβ-barrels, to helix-turn-helix hairpins and to the flexible coils connecting the secondary structures of βαβ-motifs. Statistical analysis of residue frequencies within β-strands showed a preference for amino acids such as Val, Ile and Leu. The dipeptides Val-Ile, Val-Leu and Leu-Ile were among those recurring with high frequency and are therefore a good, though not unambiguous, fingerprint for identifying β-strands occurring inside closed loops. Among the tripeptides observed with high frequency in β-strands, such as Ile-Val-Val, Leu-Val-Ile and Ala-Leu-Val, some belong to “prototype” motifs of prokaryotic closed loops already characterised in previous studies, while the frequencies of binary patterns revealed a clear prevalence of non-polar residues. Analysis of residues recurring in α-helices showed a clear predominance of dipeptides such as Val-Ala, Ala-Leu, Leu-Lys, Arg-Glu and Ala-Ala, and of tripeptides X-Ala-Ala (X = Arg, Leu, Lys, Glu), Glu-Glu-Ala, Ala-Leu-Ala and Leu-Lys-Ala.
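Dipeptide and tripeptide frequencies of this kind can be obtained by counting overlapping k-mers across the sequences assigned to one secondary structure class. The sketch below is only an illustration of that counting step under our own assumptions: the function names are hypothetical, the two example sequences are invented, and frequencies are taken relative to the total number of k-mers counted, which may differ from the normalisation used in the analysis.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count overlapping k-mers (k=2 dipeptides, k=3 tripeptides)."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_frequencies(sequences, k):
    """Relative frequency of each k-mer over all k-mers counted."""
    counts = kmer_counts(sequences, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Invented β-strand-like fragments, for illustration only.
strands = ["VIVLI", "LVIAV"]
dipeptides = kmer_counts(strands, 2)
print(dipeptides.most_common(3))
```

Running the same functions separately on β-strand, α-helix and coil segments would give per-class tables that can then be compared.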
Binary patterns identified in α-helices, by contrast, often showed the regular alternation of polar and non-polar residues typical of amphipathic helices, whereas in other cases we observed short sequences of polar amino acids followed by a contiguous series of hydrophobic and/or non-polar amino acids. Finally, the hairpin folds typical of αβ-barrels and the coils in βαβ and α-hairpin motifs revealed, as expected, a clear predominance of residues that interrupt secondary structures (Pro, Gly, Asp) and of motifs such as Gly-Arg, Gly-Ala-X (X = Ser, Ala, Asp, Arg, Gly, Glu, Lys) and Ala-Pro-X (X = His, Ser, Gly, Glu). The study of hydrophobic clusters revealed many patterns peculiar to the different types of secondary structure analysed: the trend of the minimum and maximum values of the different populations corresponded quite well to the trend of their mean values, confirming the hypothesis formulated for data collection.
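A binary pattern can be derived by reducing each residue to one of two classes. The following minimal sketch assumes a two-letter encoding (N for non-polar, P for polar) and a particular non-polar residue set; both are illustrative choices of ours, and the classification actually used for the database may differ.

```python
# ASSUMPTION: residues treated as non-polar; every other letter is
# encoded as polar, including any non-standard symbol.
NONPOLAR = set("AVILMFWC")


def binary_pattern(sequence):
    """Encode a sequence as 'N' (non-polar) / 'P' (polar) per residue."""
    return "".join("N" if aa in NONPOLAR else "P" for aa in sequence)


def nonpolar_fraction(sequence):
    """Fraction of residues encoded as non-polar."""
    if not sequence:
        return 0.0
    return binary_pattern(sequence).count("N") / len(sequence)


# Invented helix-like and strand-like fragments, for illustration only.
print(binary_pattern("LKALEELAKK"))
print(binary_pattern("VIVLI"))
```

In this encoding, a recurring mix of N and P along the string is what an amphipathic helix would be expected to produce, while an uninterrupted run of N reflects the prevalence of non-polar residues reported for β-strands.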