BOLD Systems: FAQ

What kind of sequences does BOLD accept?

BOLD accepts sequences from more than 150 genetic markers, including the four main DNA barcoding markers (COI-5P, ITS, matK, and rbcL) and dozens of others. For a full list of accepted markers, or to inquire about adding new markers to the BOLD database, contact [email protected].

Can I take advantage of the tools on BOLD without creating an account?

Many tools, including the BOLD Taxonomy Browser, the BOLD ID Engine, the BIN Database, and the BOLD Public Data Portal, are publicly available without an account. The BOLD Taxonomy Browser allows users to view information about the progress of DNA barcoding for any taxon on BOLD, while the Public Data Portal and BIN Database allow users to view the actual records assembled in BOLD and used in scientific publications. Finally, the BOLD ID Engine allows users to compare their sequences against the DNA barcode library to infer specimen identifications.

Can I create an account and put my sequence on BOLD even if I am not part of the iBOL project?

Being a member of iBOL is not a prerequisite for using BOLD Systems. BOLD Systems is an open-access DNA barcoding workbench, and anyone is welcome to use it to store and analyze data. All BOLD users have access to the same tools and resources whether they are part of iBOL or not.

Can I request access to private data?

BOLD adheres to stringent security policies to ensure the privacy of its users. However, one of the goals of BOLD is to promote data sharing and collaboration. As such, you may request access to private data from BOLD by sending a message to [email protected], and our support staff will contact the owner on your behalf. It is beneficial to include a description of what the data will be used for, as it will help data owners in their decision-making process.

What is the source of the identification in the BOLD ID Engine?

BOLD is a community-based website, so all identifications provided by the BOLD ID Engine are based on data submitted by other users of the system. In order to minimize uncertainty, the BOLD ID Engine offers four separate libraries for users to search from. These vary both in the number of records and type of data they possess. Read the descriptions offered for each library to make sure you select the most appropriate option for your search.

BOLD also offers historical database searches, so you can search as far back as 2009 and see the historical results for your search. This tool is especially useful for users trying to replicate results from previous years.

When a FASTA file is pasted into the ID Engine, BOLD compares it to all other records in the system, finding matching sequences to generate an identification. The accuracy of the identification depends heavily on the quality of the underlying data. As such, we encourage project managers to ensure the validity of the taxonomic identifications on their records.

I noticed an issue with one of the records in the BOLD Public Data Portal. Who should I contact about that?

If you notice an issue with one of the records on the BOLD Public Data Portal, you can log into BOLD to add a comment or a tag to that particular record. The owner of the record will be notified of your tag and comment so they know to take action. If you do not have a BOLD account, we recommend that you register for one. Otherwise, please contact [email protected] so our support staff can contact the record owner.

For Registered Users

What is a BOLD Process ID?

BOLD Process IDs are unique codes automatically generated for each new record added to a project. They serve to connect specimen information, such as taxonomy, collection data and images, to the DNA barcode sequence for that specimen.

BOLD Process IDs follow a standard format: the project code and a sequential number, followed by the year the record was added to the database. For example, the first record uploaded to project PROJ in 2012 would be assigned BOLD Process ID PROJ001-12. This format ensures BOLD Process IDs are always unique in the system, and it identifies both the year the record was uploaded and the project it was originally uploaded to.
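
As a rough illustration of this format, the Python sketch below splits a Process ID into its parts. The function name parse_process_id and the regular expression are assumptions made for this example and are not part of BOLD. In particular, the sketch assumes that project codes contain uppercase letters only and that the year is written with two digits, as in PROJ001-12.

```python
import re

# Assumed pattern: uppercase project code, sequential number, hyphen, two-digit year.
PROCESS_ID_PATTERN = re.compile(r"^([A-Z]+)(\d+)-(\d{2})$")


def parse_process_id(process_id):
    """Split a Process ID such as 'PROJ001-12' into (project code, number, year)."""
    match = PROCESS_ID_PATTERN.match(process_id.strip())
    if match is None:
        raise ValueError(f"Not in the expected Process ID format: {process_id!r}")
    project_code, number, year = match.groups()
    return project_code, int(number), int(year)


print(parse_process_id("PROJ001-12"))  # ('PROJ', 1, 12)
```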

I noticed my records have been flagged. What does it mean, and how can I remove the flag after I fix the issue?

There are several reasons a record may be flagged, including the detection of a contaminated sequence or a misidentified species. Flags serve two purposes: they act as alerts to inform project managers that an issue has been detected in their records, and they prevent a record from being included in the BOLD ID Engine and Taxonomy Browser. In some cases, changing the taxonomy of the sample or re-editing the sequence can resolve the flags. Once the issue has been resolved, project managers can contact [email protected] to have the flag removed from their record(s).

Why doesn't my sequence have a BIN assignment?

BIN assignments are based on sequence divergence, so a BIN may take between two and four days to be assigned to a record once a sequence has been uploaded. Not all sequences will receive a BIN assignment. Currently, BINs are only assigned to records with COI sequences longer than 500 bp that contain less than 1% ambiguous bases.
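
The criteria above can be checked before upload with a short script. The Python sketch below is only an approximation under stated assumptions: the function names are hypothetical, any base other than A, C, G, or T is counted as ambiguous, gap characters are ignored, and any marker label starting with "COI" is treated as COI. BOLD's own checks may differ in detail.

```python
UNAMBIGUOUS_BASES = set("ACGT")


def ambiguous_fraction(sequence):
    """Fraction of bases that are not A, C, G, or T (gap characters are ignored)."""
    bases = [base for base in sequence.upper() if base != "-"]
    if not bases:
        return 0.0
    ambiguous = sum(1 for base in bases if base not in UNAMBIGUOUS_BASES)
    return ambiguous / len(bases)


def meets_bin_criteria(marker, sequence):
    """Apply the three stated criteria: COI marker, > 500 bp, < 1% ambiguous bases."""
    length = len(sequence.replace("-", ""))
    return (
        marker.upper().startswith("COI")  # assumption: labels such as "COI-5P"
        and length > 500
        and ambiguous_fraction(sequence) < 0.01
    )
```

For example, a 600 bp COI-5P sequence with 10 N bases (about 1.7% ambiguous) would fail this check, while the same sequence with no ambiguous bases would pass.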

Why might the BOLD ID Engine return no matches for my sequence?

If the BOLD ID Engine is not returning any identification matches for your sequence, there are a few factors worth investigating. First, the genetic marker used must be supported by the database. BOLD currently supports COI for animal identifications, matK and rbcL for plant identifications, and ITS for fungal identifications. Second, the sequenced region of the gene should match the marker used. For example, the barcode region for COI is located at the 5' end. Although other gene regions may return results, most of the database is composed of sequences within the barcode region. Finally, you should ensure the length of your sequence is 180 bp or longer. Short sequences and/or those containing a large number of ambiguous bases should be run in the full length database only.

If the above factors have been examined and no identifications are returned, please contact [email protected] and our support staff will be happy to assist you further.
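
Two of the factors listed above, the marker and the sequence length, are easy to check before submitting a query. The Python sketch below is a minimal illustration; the function name id_engine_precheck and the marker labels used as dictionary keys are assumptions for this example, and the sketch does not check whether the sequence falls within the barcode region.

```python
# Markers and taxonomic groups as listed in the answer above.
SUPPORTED_MARKERS = {"COI": "animals", "matK": "plants", "rbcL": "plants", "ITS": "fungi"}
MIN_LENGTH_BP = 180


def id_engine_precheck(marker, sequence):
    """Return a list of likely reasons a query might return no matches."""
    problems = []
    if marker not in SUPPORTED_MARKERS:
        problems.append(f"marker {marker!r} is not one of {sorted(SUPPORTED_MARKERS)}")
    if len(sequence) < MIN_LENGTH_BP:
        problems.append(
            f"sequence is {len(sequence)} bp; {MIN_LENGTH_BP} bp or longer is recommended"
        )
    return problems
```

An empty list means neither of these two checks found a problem; it does not guarantee that the ID Engine will return a match.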

How do I interpret the results of the Taxon ID Tree?

BOLD uses neighbour-joining trees which group sequences together by the number of amino acid or nucleotide differences. The arrangement of the specimens in the tree is based on sequence similarities, with the sequences that are most similar placed closer together on the tree, and with the branch length indicating the degree of similarity.

The percentage of similarity between sequences can be measured against the legend (usually 2%): the longer the branch, the greater the difference between the sequences. Specimens of the same species are generally expected to have more similar sequences, and to cluster closer together, than specimens from different species.
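
To make the idea of a percentage difference concrete, the Python sketch below computes the simplest such measure for two aligned nucleotide sequences: the share of compared positions at which they differ (an uncorrected pairwise distance). This is an illustration only. The function name is hypothetical, positions with a gap or an N in either sequence are skipped by assumption, and the distance model actually used when building a tree on BOLD may differ.

```python
def percent_difference(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only positions where neither sequence has a gap or an N.
    compared = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a not in "-N" and b not in "-N"
    ]
    if not compared:
        raise ValueError("no comparable positions")
    differences = sum(1 for a, b in compared if a != b)
    return 100.0 * differences / len(compared)
```

Two sequences that differ at 1 of 10 compared positions have a difference of 10%, which would appear as a branch five times the length of a 2% legend bar.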

Unexpected outcomes can reveal interesting findings, which could be associated with biologically relevant patterns, or they can reveal errors such as misidentification or contamination of a sample. For more information on how to build a Taxon ID Tree, and the parameters you can select to tailor your tree, please refer to the BOLD Handbook.

(Note: The BOLD Taxon ID Tree does not infer phylogenetic relationships. There may be many ways to interpret a tree, and BOLD encourages you to use your own discretion when drawing conclusions from the results.)

Does the length of my sequence influence the shape of my Taxon ID Tree?

Short sequences may influence the shape of the Taxon ID Tree based on the alignment algorithm selected while building the tree. The BOLD Aligner is amino acid based, so instead of comparing the nucleotides between the sequences, it compares the amino acid translations. Using this alignment algorithm, short sequences are less likely to align correctly with longer sequences.

When building Taxon ID Trees containing short sequences (anything shorter than 200 bp), it is recommended to use the MUSCLE or Kalign algorithms. Independent of the alignment algorithm selected, whenever short sequences are included in an analysis, the results should be interpreted with caution.

What is a dataset and how is it different from a project?

A dataset is a virtual representation of records stored on BOLD. Records from multiple projects can be added to datasets allowing users to access the data while keeping the records in their original projects. Using datasets, records from multiple projects can be concatenated, analyzed, and even published without ever having to be moved from their original projects.

For example, if you are performing a three-year biodiversity study, you may wish to store the records on BOLD in projects based on the year they were collected. If you want to look at all of the Hymenoptera collected over the three years, you can add the appropriate records to a dataset. The records will stay organized in their year-based projects but you can access them all at once and even publish them to GenBank from the dataset. To simplify the publication process, you may also request a DOI (Digital Object Identifier) from BOLD for public datasets. The DOI can be incorporated into the publication so readers will have quick and easy access to the data.

When is the best time to submit my sequences to GenBank?

Records submitted through BOLD to GenBank will remain private on GenBank for one year to allow time for submitters to publish their findings before the records are publicly accessible. BOLD recommends all users submit their sequences to GenBank while preparing their manuscript for publication. Once the manuscript is published, users are encouraged to make their data publicly available in BOLD. This can be accomplished by visiting the Modify Project Properties page in the Project Console.

Once a record has been submitted to GenBank, can I modify it? Will updating a record on BOLD automatically update the record on GenBank?

BOLD regularly submits record updates to GenBank, and GenBank will incorporate these changes on a periodic basis. If an update is required within a specific time period, please contact [email protected] and our support staff will help facilitate the synchronization of BOLD records with GenBank records.

How long will my data remain private before they are included in the BOLD Public Data Portal? Is there any way I can make my data public sooner?

BOLD Systems serves as a workbench for users to organize, analyze, and publish data, and its design and policies take into account that specimen identification can be a time-consuming process. As such, there is no hard timeline for records to be made public.

Data records will not be released into the BOLD Public Data Portal unless project managers request to make them public via the project or dataset properties. Please visit the BOLD Handbook for instructions on how to do so.

Exceptions to these protocols occur in the iBOL project for sequences generated at the Canadian Centre for DNA Barcoding (CCDB). In this case, data records are partially released to the BOLD Public Data Portal within 3 months of sequence upload, in compliance with the iBOL data release policy. Early release records have their taxonomy obscured (exposing only order) and their locality information obscured (exposing only GPS and country). All data elements are exposed after 18 months of early release. See the iBOL Data Release Policy for more information. If you would like to have your data released sooner, you may do so at any time by making your project or dataset public and by submitting your data to GenBank.

How do I make a dataset or project available in the BOLD Public Data Portal?

Making a project or dataset public is a straightforward process, and project managers can choose to do so at any time. Click Modify Project Properties from within a project or dataset and select the check box labelled “Make this project publicly visible”. See the BOLD Handbook for more information. After you click Save, your project will be publicly accessible through the workbench, and the records will be available on the BOLD Public Data Portal after it is updated.

Can I delete images from my records?

You may request to have your images deleted from BOLD at any time by sending an email with your request to [email protected].

My trace file has a failed status. Can I upload new copies of the traces with the same name?

No, the trace files must be given a new name. Once a trace file has been uploaded to BOLD, whether it succeeded or failed, another trace file with an identical name cannot be uploaded. However, only a small change is required, such as adding a letter or number to the end of the naming scheme of the trace file.
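
If many failed traces need to be re-uploaded, the renaming can be scripted. The Python sketch below appends a letter to the end of the file name, before the extension. The function name renamed_trace, the default suffix "b", and the example file name are all hypothetical; any small change that yields a name not previously uploaded should work equally well.

```python
from pathlib import Path


def renamed_trace(filename, suffix="b"):
    """Return the file name with a suffix added before the extension."""
    path = Path(filename)
    return str(path.with_name(f"{path.stem}{suffix}{path.suffix}"))


print(renamed_trace("sample01_F.ab1"))  # sample01_Fb.ab1
```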

When should I use a container project?

Data can be stored on BOLD using both data projects and container projects. Container projects do not store data records; they store the projects that contain the data records. These containers can be very useful when organizing your data on BOLD, especially if you are working with a large number of records. For example, if you are trying to barcode the freshwater fishes of Canada, you may decide to make a data project for each collection site and to place those projects in a container project for each province or territory. BOLD Systems also offers the ability to merge multiple projects for analysis.
