Sequence input
There are two ways to input a protein sequence:
I - If the protein is deposited in the UniProt database (either in SwissProt or TrEMBL), you can specify the accession
code or the ID of the protein in the "Enter SWISS-PROT/TrEMBL identifier or accession number" field. The ANCHOR server
is always linked to the newest version of UniProt. The header of the UniProt entry will be displayed as the title on the results page.
II - Type or cut and paste your sequence into the "paste the amino acid sequence" field. The amino acid sequence must be in
the standard single-letter code format. Spaces and other non-standard characters within the pasted sequence are permitted, but they
will be removed, and the remaining sequence is treated as a single continuous chain. If the first line starts with the ">" character
(e.g. a FASTA sequence header), it will be used as the title on the results page. The minimum sequence length is 6 residues.
The recommended sequence format is:
>Name of the sequence
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIE
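The clean-up rules described above can be sketched in a few lines of Python. This is only an illustration, not the server's actual code: the function name `parse_pasted_sequence`, the restriction to the 20 standard amino acid letters, and the conversion to upper case are assumptions made for this example.

```python
# Assumption: "standard single-letter code" means the 20 standard amino acids.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH = 6  # minimum sequence length stated in the text


def parse_pasted_sequence(text):
    """Return (title, sequence) from pasted input (hypothetical helper)."""
    lines = text.strip().splitlines()
    title = None
    # A first line starting with ">" is used as the title.
    if lines and lines[0].startswith(">"):
        title = lines[0][1:].strip()
        lines = lines[1:]
    # Drop spaces and other non-standard characters, then treat the
    # remainder as one continuous chain. Upper-casing is an assumption.
    joined = "".join(lines).upper()
    sequence = "".join(ch for ch in joined if ch in STANDARD_RESIDUES)
    if len(sequence) < MIN_LENGTH:
        raise ValueError("sequence must contain at least 6 residues")
    return title, sequence


title, sequence = parse_pasted_sequence(
    ">Name of the sequence\nMEEPQ SDPSV\nEPPLS 123"
)
print(title)
print(sequence)
```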
Multiple sequences
It is possible to input more than one sequence using the multiple sequence version of ANCHOR.
Sequences can be input in either multiple FASTA format or by supplying a list of UniProt IDs/ACs. It is also possible to
upload a file containing the sequences or a list of UniProt IDs/ACs. Motif searches are also supported, and motifs can be uploaded
in a file as well. The output is provided in text-only form and is available via a temporary link that is shown on the
results page and also sent to the email address provided.
Motif search
The prediction of disordered binding regions can be complemented by motif searches. The motifs are specified by a
standard regular expression (read more about regular expression syntax here). The format is
motif [name]
where name is optional. There should be only one motif per line. For example:
F...W..[LIV] MDM2
[RK].L.{0,1}[FYLIVMP] CYCLIN_1
[PA][^P][^FYWIL]S[^P] USP7_1
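As an illustration of the "motif [name]" line format, the sketch below splits each line into a pattern and an optional name and checks that the pattern compiles as a regular expression. The function name `parse_motif_line` is hypothetical, and the sketch assumes that patterns contain no whitespace; how the server itself parses these lines is not documented here.

```python
import re


def parse_motif_line(line):
    """Split one 'motif [name]' line into (pattern, name); name may be None."""
    parts = line.split(None, 1)  # split on the first run of whitespace
    if not parts:
        return None  # blank line
    pattern = parts[0]
    name = parts[1].strip() if len(parts) > 1 else None
    re.compile(pattern)  # raises re.error if the pattern is not valid
    return pattern, name


motif_lines = [
    "F...W..[LIV] MDM2",
    "[RK].L.{0,1}[FYLIVMP] CYCLIN_1",
    "[PA][^P][^FYWIL]S[^P] USP7_1",
]
for motif_line in motif_lines:
    print(parse_motif_line(motif_line))
```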
A complete list of current ELM motifs from the ELM database
can be found here and a list of calmodulin binding motifs from the
Calmodulin Target Database can be found here converted into this format.
You can copy and paste it into the appropriate field, or, when using ELM motifs, it is possible to specify just
the name of the motif and omit the pattern itself. Hence, instead of:
[RKY]..P..P LIG_SH3_1
P..P.[KR] LIG_SH3_2
...[PV]..P LIG_SH3_3
KP..[QK]... LIG_SH3_4
P..DY LIG_SH3_5
it is possible to write just:
LIG_SH3_1
LIG_SH3_2
LIG_SH3_3
LIG_SH3_4
LIG_SH3_5
Other motifs can also be specified. For example, a motif to find proline-rich regions could be:
P+.?P{2,}.?P+ Poly-Proline
The server returns the starting and ending position of each hit of every motif searched together with the matched sequence. If the found motif
is a known true positive instance of an ELM then the UniProt ID of the protein containing that true positive hit is also returned.
If the graphical output mode is selected (see "Output type" section below), the results of the motif search are shown with colored boxes.
Known true positive hits are indicated by red boxes and the remaining hits by orange boxes (see e.g. p53 in the
Examples section).
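The kind of result described above (start position, end position and matched sequence for each hit) can be reproduced locally with Python's `re` module. This is a minimal sketch under stated assumptions, not the server's implementation: the function name `find_motif_hits` is hypothetical, positions are reported as 1-based and inclusive, and `re.finditer` only reports non-overlapping matches, which may differ from what the server does. The lookup of known true positive ELM instances is not covered.

```python
import re


def find_motif_hits(sequence, motifs):
    """Return (name, start, end, matched_sequence) for each motif hit.

    Positions are 1-based and inclusive (an assumption of this sketch).
    """
    hits = []
    for pattern, name in motifs:
        for match in re.finditer(pattern, sequence):
            hits.append((name, match.start() + 1, match.end(), match.group()))
    return hits


# The example sequence from the "Sequence input" section.
sequence = "MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIE"
motifs = [
    ("F...W..[LIV]", "MDM2"),
    ("P+.?P{2,}.?P+", "Poly-Proline"),
]
for hit in find_motif_hits(sequence, motifs):
    print(hit)
```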
Output type
Generate plot:
The graphical image (a PNG file) is generated using the JpGraph software. Long sequences
are split into smaller fragments, but the user can change the window size of this plot. The server generates a plot with the
profiles calculated by IUPred, a general disorder prediction method (in red), and ANCHOR, a prediction of disordered binding regions (in blue).
Underneath the profile, predicted binding regions are indicated by horizontal bars. The bar is shaded according to the prediction score. Regions
that are filtered out are marked by empty bars. If motifs were specified, the matching motifs are also indicated with colored boxes.
The text output is also appended.
Raw data only:
This option offers a simple text output composed of several parts. The first part lists the predicted binding regions. If some regions
are filtered out, these are listed separately. The hits of the specified motifs are provided next. Finally, the prediction profile is returned.
For each residue, it specifies the sequential number, the residue type, and the score for being in a disordered binding region. This score is between 0
and 1. An additional column marks residues in predicted binding regions with 1 and all other residues with 0. This column takes the results of filtering into account.
Filtering
Currently there are two filtering criteria. Regions shorter than 6 residues and regions with an average IUPred score below 0.1 are filtered out
(see the predictions for hemoglobin and glycophorin in the "Examples" section
for a demonstration of the effect of the two filtering criteria).
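The two criteria can be expressed as a short Python sketch. It is an illustration only: the function name `filter_regions`, the representation of regions as 1-based inclusive (start, end) pairs, and the per-residue list of IUPred scores are assumptions of this example, and the server's own implementation may differ in detail.

```python
def filter_regions(regions, iupred_scores, min_length=6, min_avg_iupred=0.1):
    """Split predicted regions into kept and filtered-out lists.

    regions: list of (start, end) pairs, 1-based and inclusive (assumed).
    iupred_scores: one IUPred score per residue of the sequence.
    """
    kept, filtered = [], []
    for start, end in regions:
        length = end - start + 1
        average = sum(iupred_scores[start - 1:end]) / length
        # A region is dropped if it is too short or its average IUPred
        # score is too low.
        if length < min_length or average < min_avg_iupred:
            filtered.append((start, end))
        else:
            kept.append((start, end))
    return kept, filtered


# Made-up scores for a 30-residue sequence, for illustration only.
scores = [0.5] * 10 + [0.05] * 10 + [0.5] * 10
kept, filtered = filter_regions([(1, 8), (11, 20), (21, 24)], scores)
print(kept)
print(filtered)
```

In this made-up input, the second region fails the average score criterion and the third fails the length criterion, so only the first region is kept.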
Examples
Six sample runs of ANCHOR are provided here to demonstrate the use of the server.