Class Seq

Seq provides objects that represent biological sequences with alphabets.

Constructors

constructor

  • new Seq(sequence: string, alphabet?: ALPHABET, ignoredCharacters?: string[]): Seq
  • Create a new Seq object.

    throws

    An error if the alphabet is declared but does not match the sequence.

    Parameters

    • sequence: string

      A representation of the sequence as a string.

    • Optional alphabet: ALPHABET

      The type of sequence.

    • Default value ignoredCharacters: string[] = ['-']

      Characters to ignore in the sequence, such as the gap character "-".

    Returns Seq

Properties

alphabet

alphabet: ALPHABET | undefined

ignoredCharacters

ignoredCharacters: string[]

sequence

sequence: string

Methods

back_transcribe

  • back_transcribe(): Seq
  • Returns the DNA sequence corresponding to this RNA sequence as a new Seq object.

    Returns Seq

binaryRepresentation

  • binaryRepresentation(additionalAcceptedCharacters?: string[]): number[]
  • Returns a binary representation of this sequence.

    By default, any character that is neither in the alphabet (the 20 single-letter amino acids for proteins, or the 4 nucleotide characters for oligos) nor a dash is ignored, which means the array for that position is all zeros. Other characters are encoded if they are added to the "additionalAcceptedCharacters" array parameter.

    Parameters

    • Default value additionalAcceptedCharacters: string[] = ['-']

      Characters that should be considered valid and included in the returned array.

    Returns number[]

    A binary array that is the concatenation of each position's individual binary array:

    • For proteins, each position is represented by a binary array of length 20 plus the number of additional characters (parameter). For an accepted character, exactly one index is one and the rest are zero.
    • For oligos, each position is represented by a binary array of length 4 plus the number of additional characters (parameter). For an accepted character, exactly one index is one and the rest are zero.

    NOTE: the index that is set to 1 is arbitrary but consistent across function calls. It is determined by indexing the strings in IUPACData: protein_letters, unambiguous_dna_letters and unambiguous_rna_letters.
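The encoding described above can be illustrated with a minimal standalone sketch. It does not call the library: `oneHotEncode` is a hypothetical helper, and the two letter strings are assumed stand-ins for the IUPACData values, so the actual index order in the library may differ.

```typescript
// Assumed stand-ins for IUPACData.protein_letters and unambiguous_dna_letters.
const PROTEIN_LETTERS = "ACDEFGHIKLMNPQRSTVWY";
const DNA_LETTERS = "GATC";

// Hypothetical helper: one-hot encode each position against the letter string
// followed by any additional accepted characters.
function oneHotEncode(
  sequence: string,
  letters: string,
  additionalAcceptedCharacters: string[] = ["-"]
): number[] {
  const symbols = letters.split("").concat(additionalAcceptedCharacters);
  const encoded: number[] = [];
  for (const ch of sequence.toUpperCase().split("")) {
    const idx = symbols.indexOf(ch);
    // Unknown characters leave idx at -1, so the whole position stays zero.
    for (let i = 0; i < symbols.length; i++) {
      encoded.push(i === idx ? 1 : 0);
    }
  }
  return encoded;
}

// Four positions, each of width 4 + 1 (the dash), so 20 numbers in total.
const dnaEncoding = oneHotEncode("GA-N", DNA_LETTERS);
console.log(dnaEncoding.length);
```

In this sketch the unknown character "N" produces an all-zero block, while the dash gets its own index because it is in the default additional characters.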

complement

  • complement(reverse?: boolean): Seq
  • Returns the complement of this sequence as a new Seq object.

    throws

    An error if this sequence is not valid DNA or RNA.

    Parameters

    • Default value reverse: boolean = false

      If true, the returned sequence is also reversed.

    Returns Seq

    A new sequence that is the complement of this sequence.
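As a rough illustration of complementing with an optional reversal, here is a standalone sketch for unambiguous DNA only. `complementDna` and `DNA_COMPLEMENT` are hypothetical names, and RNA, ambiguous letters and gap handling are deliberately left out, so this is not the library's implementation.

```typescript
// Watson-Crick pairs for unambiguous DNA (hypothetical lookup table).
const DNA_COMPLEMENT: { [base: string]: string } = { A: "T", T: "A", G: "C", C: "G" };

// Hypothetical helper: complement each base, optionally reversing the result.
function complementDna(sequence: string, reverse: boolean = false): string {
  const bases = sequence.toUpperCase().split("").map((base) => {
    const paired = DNA_COMPLEMENT[base];
    if (paired === undefined) {
      throw new Error(`Not a valid DNA character: ${base}`);
    }
    return paired;
  });
  if (reverse) {
    bases.reverse();
  }
  return bases.join("");
}

console.log(complementDna("ATGC"));        // complement only
console.log(complementDna("ATGC", true));  // reverse complement
```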

determineAlphabet

  • determineAlphabet(ignoredCharacters?: string[]): ALPHABET | undefined
  • Determines the alphabet of the sequence. If this.alphabet is already set, it is returned directly; otherwise the method attempts to predict the alphabet. The prediction is not perfect: for example, a sequence that contains only 'agc' is reported as DNA even though it would also be valid RNA or protein. The preference order is DNA, RNA, then protein. Only canonical (unambiguous) nucleotide and protein letters are evaluated.

    Parameters

    • Default value ignoredCharacters: string[] = ['-']

    Returns ALPHABET | undefined
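The preference order described above can be sketched as three membership checks tried in sequence. This is an illustration only: `AlphabetGuess`, `containsOnly` and `guessAlphabet` are hypothetical names, the real ALPHABET values are not shown in this page, and the letter strings are assumed canonical sets.

```typescript
// Hypothetical stand-in for the library's ALPHABET type.
type AlphabetGuess = "DNA" | "RNA" | "PROTEIN";

// True if every character is either ignored or present in validLetters.
function containsOnly(sequence: string, validLetters: string, ignored: string[]): boolean {
  return sequence
    .toUpperCase()
    .split("")
    .every((ch) => ignored.indexOf(ch) >= 0 || validLetters.indexOf(ch) >= 0);
}

// Try DNA first, then RNA, then protein; give up if none matches.
function guessAlphabet(sequence: string, ignored: string[] = ["-"]): AlphabetGuess | undefined {
  if (containsOnly(sequence, "GATC", ignored)) return "DNA";
  if (containsOnly(sequence, "GAUC", ignored)) return "RNA";
  if (containsOnly(sequence, "ACDEFGHIKLMNPQRSTVWY", ignored)) return "PROTEIN";
  return undefined;
}

console.log(guessAlphabet("agc")); // DNA wins because it is checked first
```

Because DNA is tested first, a short peptide made only of A, G, C and T residues would be labelled DNA in this sketch, which mirrors the caveat in the description.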

equal

  • equal(seqToCompare: Seq): boolean
  • Tests two sequences for equality, ignoring case. this.ignoredCharacters are not evaluated in the equality comparison.

    Parameters

    • seqToCompare: Seq

      the sequence to compare this object to

    Returns boolean

    true if the sequences and their alphabets are equal, false otherwise.
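A standalone sketch of such a comparison follows. It assumes one reading of the description, namely that characters listed in this.ignoredCharacters are stripped from both sequences before comparing; that reading, the `SeqLike` shape and the helper names are all assumptions rather than the library's code.

```typescript
// Hypothetical minimal shape mirroring the documented properties.
interface SeqLike {
  sequence: string;
  alphabet?: string;
  ignoredCharacters: string[];
}

// Upper-case the sequence and drop every ignored character.
function stripIgnored(sequence: string, ignored: string[]): string {
  return sequence
    .toUpperCase()
    .split("")
    .filter((ch) => ignored.indexOf(ch) < 0)
    .join("");
}

// Assumed semantics: alphabets must match, and sequences must match after
// case folding and removal of the first sequence's ignored characters.
function seqEqual(a: SeqLike, b: SeqLike): boolean {
  return (
    a.alphabet === b.alphabet &&
    stripIgnored(a.sequence, a.ignoredCharacters) === stripIgnored(b.sequence, a.ignoredCharacters)
  );
}

const gapped: SeqLike = { sequence: "AT-GC", alphabet: "DNA", ignoredCharacters: ["-"] };
const plain: SeqLike = { sequence: "atgc", alphabet: "DNA", ignoredCharacters: ["-"] };
console.log(seqEqual(gapped, plain));
```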

integerRepresentation

  • integerRepresentation(additionalAcceptedCharacters?: string[]): number[]
  • Returns an integer representation of this sequence. Each index in the array represents a single position. NOTE: the values have no meaning other than to check for equality in a custom Hamming distance function.

    Characters not found are represented as -1 in the returned array.

    Parameters

    • Default value additionalAcceptedCharacters: string[] = ['-']

      Characters to consider valid and include in the returned array.

    Returns number[]
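To show how an integer encoding of this kind could feed a Hamming distance, here is a minimal sketch. `integerEncode` and `hammingDistance` are hypothetical helpers, and the letter string "GATC" is an assumed ordering; only the convention of -1 for unknown characters is taken from the description.

```typescript
// Hypothetical helper: map each character to its index in the letter string
// (followed by the additional accepted characters), or -1 if it is not found.
function integerEncode(
  sequence: string,
  letters: string,
  additionalAcceptedCharacters: string[] = ["-"]
): number[] {
  const symbols = letters.split("").concat(additionalAcceptedCharacters);
  return sequence.toUpperCase().split("").map((ch) => symbols.indexOf(ch));
}

// Count the positions at which two equal-length encodings differ.
function hammingDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Sequences must have equal length");
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

const first = integerEncode("GATTACA", "GATC");
const second = integerEncode("GACTATA", "GATC");
console.log(hammingDistance(first, second));
```

Note that in this sketch two different unknown characters both map to -1 and therefore compare as equal, which a custom distance function may or may not want.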

isValidDNA

  • isValidDNA(ignoredCharacters?: string[]): boolean
  • Parameters

    • Default value ignoredCharacters: string[] = this.ignoredCharacters

    Returns boolean

isValidProtein

  • isValidProtein(ignoredCharacters?: string[]): boolean
  • Parameters

    • Default value ignoredCharacters: string[] = this.ignoredCharacters

    Returns boolean

isValidRNA

  • isValidRNA(ignoredCharacters?: string[]): boolean
  • Parameters

    • Default value ignoredCharacters: string[] = this.ignoredCharacters

    Returns boolean

lower

reverse_complement

  • reverse_complement(): Seq
  • Returns the reverse complement of this sequence as a new Seq object.

    Returns Seq

subSequence

  • subSequence(startIdx?: number, endIdx?: undefined | number): Seq
  • Parameters

    • Default value startIdx: number = 0
    • Optional endIdx: undefined | number

    Returns Seq

toString

  • toString(): string

transcribe

  • transcribe(): Seq
  • Returns the RNA sequence corresponding to this DNA sequence as a new Seq object.

    Returns Seq
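A common convention for transcription on sequence strings treats the DNA as the coding strand, so transcribing is a T to U substitution and back-transcribing is the reverse. The sketch below assumes that convention; `transcribeDna` and `backTranscribeRna` are hypothetical helpers that work on plain strings rather than Seq objects.

```typescript
// Hypothetical helper: coding-strand DNA to RNA, preserving case.
function transcribeDna(dna: string): string {
  return dna.replace(/T/g, "U").replace(/t/g, "u");
}

// Hypothetical helper: RNA back to coding-strand DNA, preserving case.
function backTranscribeRna(rna: string): string {
  return rna.replace(/U/g, "T").replace(/u/g, "t");
}

const rna = transcribeDna("ATGCTT");
console.log(rna);
console.log(backTranscribeRna(rna));
```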

translate

  • translate(stopSymbol?: string, toStop?: boolean, cds?: boolean): Seq
  • Translates a nucleotide sequence into a protein sequence, returned as a new Seq object. TODO: implement the "gap" parameter.

    Parameters

    • Default value stopSymbol: string = "*"
    • Default value toStop: boolean = false
    • Default value cds: boolean = false

    Returns Seq
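The effect of stopSymbol and toStop can be illustrated with a small standalone translator that uses the standard genetic code. This is a sketch under several assumptions: `translateCodons` is a hypothetical helper, the cds checks are omitted, only unambiguous codons are supported, and a trailing partial codon is silently dropped.

```typescript
// Standard genetic code, indexed by 16 * first + 4 * second + third base,
// with bases ordered T, C, A, G.
const BASES = "TCAG";
const AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

// Hypothetical helper: translate DNA or RNA codon by codon.
function translateCodons(
  nucleotides: string,
  stopSymbol: string = "*",
  toStop: boolean = false
): string {
  // Treat RNA like DNA by mapping U to T before the table lookup.
  const seq = nucleotides.toUpperCase().replace(/U/g, "T");
  let protein = "";
  for (let i = 0; i + 3 <= seq.length; i += 3) {
    const i1 = BASES.indexOf(seq.charAt(i));
    const i2 = BASES.indexOf(seq.charAt(i + 1));
    const i3 = BASES.indexOf(seq.charAt(i + 2));
    if (i1 < 0 || i2 < 0 || i3 < 0) {
      throw new Error(`Unsupported codon: ${seq.substring(i, i + 3)}`);
    }
    const aminoAcid = AMINO_ACIDS.charAt(16 * i1 + 4 * i2 + i3);
    if (aminoAcid === "*") {
      if (toStop) break;      // stop translating at the first stop codon
      protein += stopSymbol;  // otherwise emit the chosen stop symbol
    } else {
      protein += aminoAcid;
    }
  }
  return protein;
}

console.log(translateCodons("ATGGCCTAA"));
console.log(translateCodons("ATGGCCTAAGGG", "*", true));
```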

upper

Static Private checkSequenceValidity

  • checkSequenceValidity(sequence: string, validSequenceString: string, ignoredCharacters: string[]): boolean
  • This convenience method checks whether a sequence contains only the characters in a particular string. It is useful for asking whether a sequence contains only valid amino acids or nucleotides. Case insensitive.

    Parameters

    • sequence: string

      the sequence to evaluate.

    • validSequenceString: string

      a string that contains all the valid letters for this sequence e.g., IUPACData protein_letters.

    • ignoredCharacters: string[]

      a list of characters to ignore in determining whether the sequence is valid e.g., a dash that represents a gap.

    Returns boolean

Static fromSeqOpts
