What is tbl2asn?

Tbl2asn is a command-line program that automates the creation of sequence records for submission to GenBank. It uses many of the same functions as Sequin but is driven generally by data files. Tbl2asn generates .sqn files for submission to GenBank. Additional manual editing is not required before submission.

Tbl2asn is available by anonymous FTP. Copy the right version for your platform, then uncompress the file, rename it to "tbl2asn", and set the permissions, as necessary for the platform.

Additional details are provided in the GenBank Submission Handbook.

6 types of input data files

REQUIRED

  1. Template file containing a text ASN.1 Submit-block object (suffix .sbt).
  2. Nucleotide sequence data in FASTA format (suffix .fsa).
  3. Feature Table (suffix .tbl). [Required only if including annotation]

OPTIONAL

  1. Quality Scores (suffix .qvl).
  2. Protein sequence (suffix .pep). (These are rarely needed.)
  3. Source Table (suffix .src).

Generating the .sqn file for submission

  • The minimum requirements to generate a Sequin file using tbl2asn are one .sbt file and one or more .fsa files.
  • The files are placed in a source directory and a series of command-line arguments are used to generate the .sqn files.
  • Tbl2asn will generate a .sqn for every .fsa file in the directory, plus any of the corresponding optional files that may be present. The other files must have the same file name prefix as their corresponding .fsa. (for example helicase.fsa and helicase.tbl).
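
The prefix-matching rule above can be checked before running tbl2asn. The following is a minimal Python sketch, not part of tbl2asn itself; the function name companions_by_prefix and the COMPANION_SUFFIXES tuple are hypothetical names chosen for illustration. For each .fsa file in a directory, it reports which optional or annotation files share its prefix.

```python
from pathlib import Path

# Suffixes of the per-sequence companion files described in this document.
COMPANION_SUFFIXES = (".tbl", ".qvl", ".pep", ".src")


def companions_by_prefix(source_dir):
    """Map each .fsa file name prefix to the companion suffixes found beside it."""
    source = Path(source_dir)
    found = {}
    for fsa in sorted(source.glob("*.fsa")):
        # helicase.fsa is paired with helicase.tbl, helicase.src, and so on.
        found[fsa.stem] = [
            suffix
            for suffix in COMPANION_SUFFIXES
            if fsa.with_suffix(suffix).is_file()
        ]
    return found
```

A .tbl file whose prefix matches no .fsa file does not appear in the result, which makes misnamed files easy to spot.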

Command Line Arguments

Typing "tbl2asn -" will give the full list of command line arguments. Here is a partial list of commonly used arguments:

tbl2asn command line arguments
-p  Path to the directory. If the files are in the current directory, -p . should be used.
-r  Path for the resulting .sqn file(s). If the -r argument is not used, the .sqn files will be saved in the source directory.
-t  Specifies the template file (.sbt). If the .sbt file is in a different directory, the full path must be specified.
-i  Creates a single submission from the indicated .fsa file in a directory of multiple .fsa files.
-a  Specifies the file type.
    r10k :Runs of 10+ Ns are gaps, 100 Ns are known length
    r10u :Runs of 10+ Ns are gaps, 100 Ns are unknown length
    s :FASTA Set (s Batch, s1 Pop, s2 Phy, s3 Mut, s4 Eco)
    l :FASTA+Gap Alignment
    z :FASTA with Gap Lines
    e :PHRAP/ACE
    d :FASTA Delta, di FASTA Delta with Implicit Gaps
    a :Any (default)
Sample command line: -a s
-j  Allows the addition of source qualifiers that will be the same for each submission. Example: -j "[organism=Saccharomyces cerevisiae] [strain=S288C]".
-V  Verification (combine any of the following letters):
    v :Validates the data records. The output is saved to files with a .val suffix. A summary file named errorlog.val is also created with the number, severity and type of errors found in all the .val files.
    b :Generates GenBank flatfiles with a .gbf suffix.
    r :Validates without Country Check
Sample command line: -V vb
-k  CDS Flags (combine any of the following letters):
    c :Instructs tbl2asn to annotate the longest open reading frame (ORF) if a .tbl file is not provided. The product name will be 'unknown' unless a product name is included in the FASTA definition, [product=xyz].
    m :Allows alternative start codons to be used in ORF searches.
    r :Allows Runon ORFs
Sample command line: -k c
-c  Cleanup (combine any of the following letters):
    f :Fix product names in specific categories of the Discrepancy Report. The output of changed product names is saved to files with a .fixedproducts suffix.
    x :Extend partial ends of features by one or two nucleotides to abut gaps or sequence ends.
    D :Correct Collection Dates (assume day first)
    d :Correct Collection Dates (assume month first)
Sample command line: -c fx
-y  Adds a COMMENT to each submission. Example: -y "Contigs larger than 2kb have been annotated, representing approx. 87% of the total genome".
-Y  Like -y, but adds a COMMENT to each submission from a file.
-Z  Runs the Discrepancy Report. Must supply an output file name. Recommended only for annotated genome submissions, complete or WGS. See the Discrepancy Report page for information about its output.
-M  Master Genome Flags (combine any of the following letters):
    n :Normal. Combines flags for genome submissions (replaces -a s -V v -c f; invokes FATAL calls when -Z discrep is included).
    b :Big. Combines flags for genome submissions with >20,000 contigs (like 'n' but uses the 'big' version for -Z discrep).
    p :Power users. Combines flags for genome submissions (like 'n' but invokes the power-user FATAL calls for -Z discrep).
    t :TSA. Combines flags for TSA submissions (replaces -a s -V v -c f; invokes TSA-specific validations).
Sample command line: -M n

Example Command Lines

  • Single non-genome submission: a particular .fsa file, and only 1 sequence in the .fsa file:
    • tbl2asn -t template.sbt -i x.fsa -V v
  • Batch non-genome submission: a directory that contains .fsa files, and multiple sequences per file:
    • tbl2asn -t template.sbt -p path_to_files -a s -V v
  • Genome submission: a directory that contains multiple .fsa files of a single genome, and one or more sequences per file:
    • tbl2asn -t template.sbt -p path_to_files -M n -Z discrep
  • Genome submission for the most common gapped situation (= runs of 10 or more Ns represent a gap, and there are no gaps of completely unknown size, and the evidence for linkage across the gaps is "paired-ends"):
    • tbl2asn -t template -p path_to_files -M n -Z discrep -a r10k -l paired-ends
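
When tbl2asn is driven from a script, the genome command lines above can be assembled as an argument list. This is a minimal Python sketch under stated assumptions: the function name genome_command and its parameter names are hypothetical, and only arguments that appear in the examples above are used. The sketch builds and prints the command; it does not run tbl2asn.

```python
import shlex


def genome_command(template, source_dir, discrepancy_report="discrep",
                   gap_type=None, linkage_evidence=None):
    """Build the tbl2asn argument list for a genome submission (-M n)."""
    argv = ["tbl2asn", "-t", template, "-p", source_dir,
            "-M", "n", "-Z", discrepancy_report]
    if gap_type is not None:
        # For example "r10k": runs of 10+ Ns are gaps.
        argv += ["-a", gap_type]
    if linkage_evidence is not None:
        # For example "paired-ends", as in the gapped example above.
        argv += ["-l", linkage_evidence]
    return argv


argv = genome_command("template.sbt", "path_to_files",
                      gap_type="r10k", linkage_evidence="paired-ends")
print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run(argv, check=True) once tbl2asn is installed and on the PATH.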

Before submitting your .sqn files to GenBank, review the .val files and correct any error-level errors. Taxonomy-related errors about missing lineages can generally be ignored. However, if there is annotation and the genetic code is not the standard code, then include the correct code in the .fsa definition line (for example, [gcode=4]), or with the -j argument in the command line, to avoid errors.

Creating the template file (.sbt)

Nucleotide sequence and FASTA defline formats (.fsa)

  • No size limit on nucleotide sequence, generally.
  • Each sequence in the FASTA file should consist of a single definition line beginning with a '>', followed by the sequence itself.
  • Minimum requirements for the FASTA defline are:
    • SeqID (sequence identifier) which is the text between the '>' and the first space. The SeqIDs limits are:
      • Must be <50 characters
      • Can only include letters, digits, hyphens (-), underscores (_), periods (.), colons (:), asterisks (*), and number signs (#).
    • Organism and related information (unless organism information is included with -j at the command line or in a .src file)
    • Optional defline information is in this list of source modifiers and includes:

Here is the list of source modifiers. See the Taxonomy pages for the genetic code values.

Biological

  • strain [strain=S288C]
  • isolate [isolate=CWS1]
  • chromosome [chromosome=XVI]

Other elements

  • topology [topology=circular]
  • location [location=mitochondrion]
  • molecule [moltype=mRNA] (DNA is the default)
  • technique [tech=wgs]
  • protein name [protein=helicase] (if using -k c)
  • genetic code [gcode=4]
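
The SeqID limits listed earlier in this section (fewer than 50 characters, restricted character set) are easy to check before running tbl2asn. The following is a minimal Python sketch; the function name is_valid_seqid is hypothetical, and the check covers only the two limits stated in this document.

```python
import re

# Letters, digits, hyphens, underscores, periods, colons, asterisks, number signs.
_SEQID_PATTERN = re.compile(r"[A-Za-z0-9\-_.:*#]+")


def is_valid_seqid(seqid):
    """Return True if seqid is under 50 characters and uses only allowed characters."""
    return len(seqid) < 50 and _SEQID_PATTERN.fullmatch(seqid) is not None
```

For example, is_valid_seqid("Sc_16") is accepted, while a SeqID containing a space is rejected, because the SeqID ends at the first space in the defline.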


Example FASTA

>Sc_16 [organism=Saccharomyces cerevisiae]
tataggcgaatcgagtatattattttttctcaacatatgtatatgaacatgagaatatatttataggaatgtataaaattgtgacctctcctgctattttagttactgattttatgtatgtagggggaataggggctgcctttcttaatgcagttttaattttttcttttaattttttcttagtaaaattatttaaagtaaagattaatggaataaccattgcgcttttttttacagtttttggtttttcattttttggaaaaaatattttaaatattttacctttttatttagggggtattttatatagtatctatacttcaacagatttttctgaacatatagttcctattgctttttcaagtgcattagccccttttgtaagcagtgttgctttttatggagaaatatcctatgaaacatcatatataaatgcaattttaattggtattttaattggttttatagtggttcctttgtctaaaagtctttatgactttcatgagggatatgatttatataatttaggttttacagcaggtt
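
A record like this one can also be produced programmatically. The sketch below is a minimal Python example, assuming the [name=value] defline syntax shown in this section; the function name fasta_record is hypothetical, and the 70-character line width is a common FASTA convention, not a requirement stated in this document.

```python
def fasta_record(seqid, sequence, modifiers, width=70):
    """Format one FASTA record with [name=value] source modifiers on the defline."""
    tags = " ".join(f"[{name}={value}]" for name, value in modifiers.items())
    # rstrip() drops the trailing space when no modifiers are given.
    defline = f">{seqid} {tags}".rstrip()
    # Wrap the sequence into fixed-width lines (width is an assumption).
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([defline, *lines]) + "\n"
```

For example, fasta_record("Sc_16", "tataggcgaa", {"organism": "Saccharomyces cerevisiae"}) yields a defline of the same form as the example above, followed by the sequence.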

Feature table format (.tbl)

tbl2asn reads features from a five-column tab-delimited table called a Feature table. The feature table specifies the location and type of each feature. tbl2asn will process the feature intervals and translate any CDSs into proteins. The first line of the table should contain the following information:

>Feature SeqID table_name

The SeqID must match the nucleotide sequence SeqID in the corresponding .fsa file.

Example Feature Table

>Feature Sc_16 Table1
69	543	gene
			gene	sde3p
69	543	CDS
			product	SDE3P
			protein_id	WS1030
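
The five-column layout can be generated from structured data. The following is a minimal Python sketch, assuming the usual arrangement in which a feature line fills columns 1 to 3 (start, stop, feature key) and each qualifier line fills columns 4 and 5 (qualifier name, value). The function name feature_table and the shape of its input are hypothetical.

```python
def feature_table(seqid, features, table_name="Table1"):
    """Render features as a five-column, tab-delimited feature table.

    Each feature is (start, stop, key, qualifiers), where qualifiers is a
    list of (name, value) pairs.
    """
    lines = [f">Feature {seqid} {table_name}"]
    for start, stop, key, qualifiers in features:
        # Columns 1-3: interval and feature key.
        lines.append(f"{start}\t{stop}\t{key}")
        for name, value in qualifiers:
            # Columns 4-5: qualifier name and value, after three empty columns.
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines) + "\n"


table = feature_table("Sc_16", [
    (69, 543, "gene", [("gene", "sde3p")]),
    (69, 543, "CDS", [("product", "SDE3P"), ("protein_id", "WS1030")]),
])
```

Writing the returned text to a file with the same prefix as the .fsa file (for example Sc_16.tbl next to Sc_16.fsa) lets tbl2asn pick it up.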

Quality scores table format (.qvl)

  • Provides Phrap/Consed quality scores.
  • Has a defline with the corresponding SeqID from the .fsa file.
  • Generates Seq-graph data that will be included with the nucleotide sequence of the .fsa file in the final .sqn file.
  • The quality scores appear below the sequence in the .sqn file, and are shown in the Quality format option when the .sqn file is viewed in Sequin.
    >Sc_16
    51 63 70 82 82 82 90 90 90 90 86 86
    86 86 86 86 90 90 90 90 90 86 86 78...
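
To confirm that a .qvl file lines up with its .fsa file, the scores can be read back and counted. This is a minimal Python sketch, assuming the layout shown above: a defline starting with '>' and the SeqID, followed by whitespace-separated integer scores. The function name read_quality_scores is hypothetical.

```python
def read_quality_scores(text):
    """Parse .qvl-style text into a dict of SeqID -> list of integer scores."""
    scores = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The SeqID is the text between '>' and the first space.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            scores[current] = []
        elif current is not None:
            scores[current].extend(int(value) for value in line.split())
    return scores
```

Comparing len(scores[seqid]) with the length of the matching nucleotide sequence is a quick sanity check, on the assumption that there is one score per base.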

Protein sequence format (.pep)

  • This file is not usually needed because GenBank generally presents only the conceptual translation of the nucleotide sequence, which will be automatically generated by tbl2asn.
  • This file will substitute the automatically translated products of the CDS features with the provided protein sequences, so it is only needed in unusual cases.
  • It is a FASTA file of the protein sequence, where the SeqID must match the protein_id in the .tbl file.

Example FASTA

>WS1030 [gene=sde3p] [protein=SDE3P]
MYKIVTSPAILVTDFMYVGGIGAAFLNAVLIFSFNFFLVKLFKVKINGITIAAFFTVFGFSFFGKNILNILPFYLGGILYSIYTSTDFSEHIVPIAFSSALAPFVSSVAFYGEISYETSYINAILIGILIGFIVVPLSKSLYDFHEGYDLYNLGFTAG

Source table format (.src)

For sets of sequences, especially those with different sources, a tab-delimited source modifier table file can be created, with a name that has a .src extension. The first column in the file must be the SeqIDs of the sequences. The first row gives the names of the source qualifiers being added, separated by tabs. Any additional rows list the SeqID and source qualifiers for each sequence in the corresponding .fsa file.

SeqID	organism	strain	isolate
Sc_16	Zea mays	A69Y	JH90.6-2x12
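
A source table with this layout can be generated from per-sequence metadata. The following is a minimal Python sketch; the function name source_table and its input shape are hypothetical. The header label "SeqID" follows the example above, and the sketch assumes qualifier values contain no tab characters.

```python
def source_table(qualifier_names, rows):
    """Render a tab-delimited source table.

    rows maps each SeqID to a dict of qualifier name -> value; missing
    qualifiers are written as empty cells.
    """
    # First row: column names, starting with the SeqID column.
    lines = ["\t".join(["SeqID", *qualifier_names])]
    for seqid, qualifiers in rows.items():
        values = [qualifiers.get(name, "") for name in qualifier_names]
        lines.append("\t".join([seqid, *values]))
    return "\n".join(lines) + "\n"


src_text = source_table(
    ["organism", "strain", "isolate"],
    {"Sc_16": {"organism": "Zea mays", "strain": "A69Y", "isolate": "JH90.6-2x12"}},
)
```

The text would then be saved with a .src extension and the same prefix as the corresponding .fsa file.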

Tbl2asn Update Notification

To receive email notification about updates to tbl2asn, as well as a description of what is included in each update, follow these directions.


Last updated: 2016-11-13T22:53:44Z