Module: Mspire::Fasta

Defined in:
lib/mspire/fasta.rb

Overview

A convenience module for working with FASTA-formatted sequence databases. The file that defines this module also mixes Enumerable into Bio::FlatFile, so you can do things like this:

accessions = Mspire::Fasta.open("file.fasta") do |fasta| 
  fasta.map(&:accession)
end

A few aliases are added to Bio::FastaFormat:

entry.header == entry.definition
entry.sequence == entry.seq

Mspire::Fasta.new accepts either an IO object or a String (the FASTA-formatted data itself, not a filename):

# taking an io object:
File.open("file.fasta") do |io| 
  fasta = Mspire::Fasta.new(io)
  # ... do something with it
end
# taking a string
string = ">id1 a simple header\nAAASDDEEEDDD\n>id2 header again\nPPPPPPWWWWWWTTTTYY\n"
fasta = Mspire::Fasta.new(string)
(simple, not_simple) = fasta.partition {|entry| entry.header =~ /simple/ }

Class Method Details

+ (Object) foreach(file, &block)

Yields each Bio::FastaFormat object in the file in turn. Returns an Enumerator if no block is given.



# File 'lib/mspire/fasta.rb', line 48

def self.foreach(file, &block)
  block or return enum_for(__method__, file)
  Bio::FlatFile.open(Bio::FastaFormat, file) do |fasta|
    fasta.each(&block)
  end
end
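The `block or return enum_for(__method__, file)` line above is what makes the block optional. Below is a minimal stdlib-only sketch of that idiom. `FastaSketch` and `each_header` are hypothetical names used for illustration; they are not part of Mspire or BioRuby, and the sketch only yields header lines rather than Bio::FastaFormat objects.

```ruby
require "stringio"

# Hypothetical illustration module (not part of Mspire or BioRuby).
module FastaSketch
  # Yields the text of each ">" header line, without the leading ">".
  def self.each_header(io, &block)
    # With no block, return an Enumerator that will call this method again.
    block or return enum_for(__method__, io)
    io.each_line do |line|
      block.call(line.chomp[1..-1]) if line.start_with?(">")
    end
  end
end

data = ">id1 a simple header\nAAASDDEEEDDD\n>id2 header again\nPPPPPPWWWWWWTTTTYY\n"

# Block form: headers are yielded one at a time.
FastaSketch.each_header(StringIO.new(data)) { |header| puts header }

# Blockless form: an Enumerator, so Enumerable methods can be chained.
simple = FastaSketch.each_header(StringIO.new(data)).select { |h| h =~ /simple/ }
```

Under the same idiom, calling `Mspire::Fasta.foreach(file)` without a block should return an Enumerator over the entries.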

+ (Object) new(io)

Takes an IO object or a String containing the FASTA data itself.



# File 'lib/mspire/fasta.rb', line 56

def self.new(io)
  io = StringIO.new(io) if io.is_a?(String)
  Bio::FlatFile.new(Bio::FastaFormat, io)
end
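The String handling above relies on StringIO to make in-memory data look like an IO. A minimal stdlib-only sketch of that normalization step follows. `FastaInput.as_io` is a hypothetical helper name, not an Mspire method.

```ruby
require "stringio"

# Hypothetical illustration module (not part of Mspire or BioRuby).
module FastaInput
  # Wraps a String in a StringIO; passes any other object through unchanged.
  def self.as_io(source)
    source.is_a?(String) ? StringIO.new(source) : source
  end
end

# A String is treated as FASTA data, so it is read back line by line.
from_string = FastaInput.as_io(">id1 a simple header\nAAASDDEEEDDD\n")
first_line = from_string.gets

# An existing IO-like object is returned as-is.
io = StringIO.new(">id2 header again\nPPPPPPWWWWWWTTTTYY\n")
same_io = FastaInput.as_io(io)
```

Because a String is always interpreted as data, passing a filename to `Mspire::Fasta.new` would parse the filename text rather than open the file; use `Mspire::Fasta.open` for paths.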

+ (Object) open(file, &block)

Opens the flat file and yields a Bio::FlatFile object.



# File 'lib/mspire/fasta.rb', line 43

def self.open(file, &block)
  Bio::FlatFile.open(Bio::FastaFormat, file, &block)
end

+ (Object) uniprot_id(header)

Takes the header string and returns the UniProt ID (the second pipe-delimited field):

Mspire::Fasta.uniprot_id('sp|Q04917|1433F_HUMAN') #=> 'Q04917'

This can also be found with Bio::FastaFormat#accession (but it may be much slower).



# File 'lib/mspire/fasta.rb', line 66

def self.uniprot_id(header)
  header[/^[^\|]+\|([^\|]+)\|/, 1]
end
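To show how the pattern behaves on different headers, here is a small standalone sketch that reuses the same regular expression. `UniprotSketch` is a hypothetical module name chosen for this example. It needs neither Mspire nor BioRuby.

```ruby
# Hypothetical illustration module (not part of Mspire or BioRuby).
module UniprotSketch
  # Skip the first pipe-delimited field, then capture the second one.
  ID_PATTERN = /^[^\|]+\|([^\|]+)\|/

  def self.id(header)
    # String#[] with a regexp and a group index returns that capture, or nil.
    header[ID_PATTERN, 1]
  end
end

UniprotSketch.id("sp|Q04917|1433F_HUMAN")   # => "Q04917"
UniprotSketch.id("id1 a simple header")     # => nil (no pipe-delimited fields)
UniprotSketch.id("sp|Q04917")               # => nil (no closing pipe after the ID)
```

Since the result is nil for headers that do not follow the `db|ID|name` layout, callers handling mixed databases may want to check for nil before using the value.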