Class: Bio::GFF::GFF3

Inherits:
Bio::GFF
Includes:
Escape
Defined in:
lib/bio/db/gff.rb

Overview

DESCRIPTION

Represents version 3 of the GFF specification. For more information on GFF3, see song.sourceforge.net/gff3.shtml (obsolete URL: flybase.bio.indiana.edu/annot/gff3.html).

Defined Under Namespace

Modules: Escape

Classes: Record, RecordBoundary, SequenceRegion

Constant Summary

VERSION = 3

MetaData = GFF2::MetaData

Stores GFF3 metadata.

Constants included from Escape

Escape::UNSAFE, Escape::UNSAFE_ATTRIBUTE, Escape::UNSAFE_SEQID

Instance Attribute Summary

Attributes inherited from Bio::GFF

#records

Instance Method Summary

Constructor Details

- (GFF3) initialize(str = nil)

Creates a Bio::GFF::GFF3 object by building a collection of Bio::GFF::GFF3::Record (and metadata) objects.


Arguments:

  • str: string in GFF format

Returns

Bio::GFF::GFF3 object



# File 'lib/bio/db/gff.rb', line 874

def initialize(str = nil)
  @gff_version = nil
  @records = []
  @sequence_regions = []
  @metadata = []
  @sequences = []
  @in_fasta = false
  parse(str) if str
end

Instance Attribute Details

- (Object) gff_version (readonly)

GFF3 version string (String or nil). nil means “3”.



# File 'lib/bio/db/gff.rb', line 885

def gff_version
  @gff_version
end

- (Object) metadata

Metadata (except “##sequence-region”, “##gff-version”, “###”). Must be an array of Bio::GFF::GFF3::MetaData objects.



# File 'lib/bio/db/gff.rb', line 893

def metadata
  @metadata
end

- (Object) sequence_regions

Metadata of “##sequence-region”. Must be an array of Bio::GFF::GFF3::SequenceRegion objects.



# File 'lib/bio/db/gff.rb', line 889

def sequence_regions
  @sequence_regions
end

- (Object) sequences

Sequences bundled within GFF3. Must be an array of Bio::Sequence objects.



# File 'lib/bio/db/gff.rb', line 897

def sequences
  @sequences
end

Instance Method Details

- (Object) parse(str)

Parses GFF3 entries and appends the parsed data to this object.

Note that once a “##FASTA” line has been encountered, only FASTA-formatted text is accepted, including in later calls to parse.


Arguments:

  • str: string in GFF format

Returns

self
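
The note about “##FASTA” can be illustrated with a minimal, stand-alone sketch. The class StickyFastaSketch and its attributes are hypothetical names invented for this illustration; it is not the BioRuby implementation, and it simply skips metadata and comment lines instead of parsing them. It only shows the idea that, once FASTA mode is entered, every later chunk is treated as sequence text, and that returning self allows calls to be chained.

```ruby
# Hypothetical stand-in for illustration only; this is NOT Bio::GFF::GFF3.
class StickyFastaSketch
  attr_reader :record_lines, :fasta_text

  def initialize
    @record_lines = []
    @fasta_text = +''
    @in_fasta = false
  end

  # Accepts one chunk of text and returns self so calls can be chained.
  def parse(str)
    str.each_line do |line|
      if @in_fasta
        @fasta_text << line        # after the switch, everything is FASTA
      elsif line.start_with?('##FASTA')
        @in_fasta = true           # the directive line itself is not stored
      elsif line.start_with?('>')
        @in_fasta = true           # a bare ">" header also starts FASTA
        @fasta_text << line
      elsif !line.start_with?('#')
        @record_lines << line      # ordinary feature line
      end
      # Assumption: other "#" lines are ignored here; the real class parses them.
    end
    self
  end
end

sketch = StickyFastaSketch.new
sketch.parse("ctg1\t.\tgene\t1\t9\t.\t+\t.\tID=g1\n")
      .parse("##FASTA\n>ctg1\nACGT\n")
      .parse("TTGA\n")             # still treated as FASTA text
```

In this sketch the third call contributes to fasta_text rather than to record_lines, which is the behaviour the note describes.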



# File 'lib/bio/db/gff.rb', line 908

def parse(str)
  # if already after the ##FASTA line, parses fasta format and return
  if @in_fasta then
    parse_fasta(str)
    return self
  end

  if str.respond_to?(:gets) then
    # str is a IO-like object
    fst = nil
  else
    # str is a String
    gff, sep, fst = str.split(/^(\>|##FASTA.*)/n, 2)
    fst = sep + fst if sep == '>' and fst
    str = gff
  end

  # parses GFF lines
  str.each_line do |line|
    if /^\#\#([^\s]+)/ =~ line then
      parse_metadata($1, line)
      parse_fasta(str) if @in_fasta
    elsif /^\>/ =~ line then
      @in_fasta = true
      parse_fasta(str, line)
    else
      @records << GFF3::Record.new(line)
    end
  end

  # parses fasta format when str is a String and fasta data exists
  if fst then
    @in_fasta = true
    parse_fasta(fst)
  end

  self
end
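
The String branch of the listing above relies on String#split with a capturing group and a limit of 2, which can be hard to read at a glance. The following stand-alone sketch isolates that step. The helper name split_gff_and_fasta is an assumption made for this illustration, and the /n regexp flag from the listing is omitted for simplicity.

```ruby
# Hypothetical helper isolating the String branch of parse; not part of BioRuby.
# Returns [gff_part, fasta_part]; fasta_part is nil when no FASTA section exists.
def split_gff_and_fasta(str)
  # The capture group makes split return the separator as the middle element.
  gff, sep, fst = str.split(/^(\>|##FASTA.*)/, 2)
  # A bare ">" separator is part of the FASTA header, so it is glued back on.
  fst = sep + fst if sep == '>' and fst
  [gff, fst]
end

with_directive = "ctg1\t.\tgene\n##FASTA\n>ctg1\nACGT\n"
gff, fasta = split_gff_and_fasta(with_directive)
# gff   holds the feature lines before "##FASTA"
# fasta holds everything after the "##FASTA" directive line

bare_header = "ctg1\t.\tgene\n>ctg1\nACGT\n"
gff2, fasta2 = split_gff_and_fasta(bare_header)
# fasta2 starts with ">ctg1" because the ">" was restored
```

When the separator is the “##FASTA” directive, the directive text itself is dropped and the remainder begins with the newline that followed it.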

- (Object) to_s

Returns the string representation of the whole entry.



# File 'lib/bio/db/gff.rb', line 963

def to_s
  ver = @gff_version || VERSION.to_s
  if @sequences.size > 0 then
    seqs = "##FASTA\n" +
      @sequences.collect { |s| s.to_fasta(s.entry_id, 70) }.join('')
  else
    seqs = ''
  end

  ([ "##gff-version #{escape(ver)}\n" ] +
   @metadata.collect { |m| m.to_s } +
   @sequence_regions.collect { |m| m.to_s } +
   @records.collect{ |r| r.to_s }).join('') + seqs
end
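
The output order used by to_s (version line, metadata, sequence regions, records, then an optional “##FASTA” section) can be sketched without the library. The function assemble_gff3 and the constant DEFAULT_VERSION are hypothetical names for this illustration. It takes already rendered strings, whereas the real method calls to_s and to_fasta on its objects and escapes the version string.

```ruby
# Hypothetical stand-alone sketch of the assembly order; not the BioRuby method.
DEFAULT_VERSION = 3

# All array arguments hold pre-rendered, newline-terminated strings.
def assemble_gff3(gff_version, metadata, sequence_regions, records, fasta_entries)
  # A nil version falls back to the default, mirroring "nil means 3".
  ver = gff_version || DEFAULT_VERSION.to_s
  # The FASTA section is emitted only when there is at least one sequence.
  seqs = fasta_entries.empty? ? '' : "##FASTA\n" + fasta_entries.join('')
  (["##gff-version #{ver}\n"] + metadata + sequence_regions + records).join('') + seqs
end

text = assemble_gff3(
  nil,
  [],
  ["##sequence-region ctg1 1 100\n"],
  ["ctg1\t.\tgene\t1\t100\t.\t+\t.\tID=g1\n"],
  [">ctg1\nACGT\n"]
)
puts text
```

Note that the other metadata lines are written before the “##sequence-region” lines, regardless of the order in which they appeared in the input.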