Class: Bio::FlatFile

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/bio/io/flatfile.rb,
lib/bio/io/flatfile/buffer.rb,
lib/bio/io/flatfile/splitter.rb,
lib/bio/io/flatfile/autodetection.rb

Overview

Bio::FlatFile is a helper and wrapper class for reading biological data files. It acts like an IO object. It can automatically detect the data format, so users do not need to tell the class what kind of data it is reading.

Defined Under Namespace

Modules: Splitter

Classes: AutoDetect, BufferedInputStream, UnknownDataFormatError

Constructor Details

- (FlatFile) initialize(dbclass, stream)

Same as FlatFile.open, except that 'stream' must be an already opened stream object (IO, File, or another object that has the 'gets' method).

  • Example 1

    Bio::FlatFile.new(Bio::GenBank, ARGF)
  • Example 2

    Bio::FlatFile.new(Bio::GenBank, IO.popen("gzip -dc nc1101.flat.gz"))

Compatibility Note: You can no longer specify “:raw => true” or “:raw => false”. The styles below are DEPRECATED.

  • Example 3 (deprecated)

    # Bio::FlatFile.new(nil, $stdin, :raw=>true) # => ERROR
    # Please rewrite as below.
    ff = Bio::FlatFile.new(nil, $stdin)
    ff.raw = true
  • Example 3 in old style (deprecated)

    # Bio::FlatFile.new(nil, $stdin, true) # => ERROR
    # Please rewrite as below.
    ff = Bio::FlatFile.new(nil, $stdin)
    ff.raw = true


# File 'lib/bio/io/flatfile.rb', line 225

def initialize(dbclass, stream)
  # 2nd arg: IO object
  if stream.kind_of?(BufferedInputStream)
    @stream = stream
  else
    @stream = BufferedInputStream.for_io(stream)
  end
  # 1st arg: database class (or file format autodetection)
  if dbclass then
    self.dbclass = dbclass
  else
    autodetect
  end
  #
  @skip_leader_mode = :firsttime
  @firsttime_flag = true
  # default raw mode is false
  self.raw = false
end

Instance Attribute Details

- (Object) dbclass

Returns the database class that was automatically detected or given to FlatFile#initialize.



# File 'lib/bio/io/flatfile.rb', line 421

def dbclass
  @dbclass
end

- (Object) entry (readonly)

Returns the value of attribute entry



# File 'lib/bio/io/flatfile.rb', line 299

def entry
  @entry
end

- (Object) raw

If true, the object is in raw mode: next_entry returns each entry as it comes from the splitter, without parsing it.



# File 'lib/bio/io/flatfile.rb', line 391

def raw
  @raw
end

- (Object) skip_leader_mode

The mode that controls when the leader of the data is skipped.

  • :firsttime: (DEFAULT) skip only at the head of the file (the first time an entry is read)
  • :everytime: skip every time an entry is read
  • nil: never skip
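
As a rough illustration of the three modes, the following standalone sketch reproduces the condition that FlatFile#next_entry uses to decide whether to skip the leader. The helper name skip_leader? is hypothetical and is not part of Bio::FlatFile; the sketch does not load BioRuby.

```ruby
# Hypothetical helper (not part of Bio::FlatFile): decides whether the
# leader should be skipped before reading an entry, mirroring the
# condition used in FlatFile#next_entry.
def skip_leader?(mode, first_time)
  return false unless mode
  (first_time && mode == :firsttime) || mode == :everytime
end

[:firsttime, :everytime, nil].each do |mode|
  first = skip_leader?(mode, true)    # first read of the stream
  later = skip_leader?(mode, false)   # any later read
  puts "#{mode.inspect}: first read=#{first}, later reads=#{later}"
end
```

With :firsttime only the first read skips the leader, with :everytime every read does, and with nil none does.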



# File 'lib/bio/io/flatfile.rb', line 249

def skip_leader_mode
  @skip_leader_mode
end

Class Method Details

+ (Object) auto(*arg, &block)

Same as Bio::FlatFile.open(nil, filename_or_stream, mode, perm, options).

  • Example 1

    Bio::FlatFile.auto(ARGF)
  • Example 2

    Bio::FlatFile.auto("embl/est_hum17.dat")
  • Example 3

    Bio::FlatFile.auto(IO.popen("gzip -dc nc1101.flat.gz"))


# File 'lib/bio/io/flatfile.rb', line 122

def self.auto(*arg, &block)
  self.open(nil, *arg, &block)
end

+ (Object) autodetect(text)

Detects the database class (== file format) of the given string. If detection fails, returns false or nil.



# File 'lib/bio/io/flatfile.rb', line 460

def self.autodetect(text)
  AutoDetect.default.autodetect(text)
end

+ (Object) autodetect_file(filename)

Detects the database class (== file format) of the given file. If detection fails, returns nil.



# File 'lib/bio/io/flatfile.rb', line 440

def self.autodetect_file(filename)
  self.open_file(filename).dbclass
end

+ (Object) autodetect_io(io)

Detects the database class (== file format) of the given input stream. If detection fails, returns nil. Caution: the method reads some data from the input stream, and that data will be lost.
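
The caution about lost data follows from how any detector has to work: it must consume input in order to inspect it. The standalone sketch below shows the effect with a plain StringIO. The peek_format helper and its single FASTA-like rule are hypothetical and unrelated to BioRuby's real detection rules, and how much data BioRuby actually consumes is not stated here.

```ruby
require 'stringio'

# Hypothetical stand-in for a detector: it reads one line to inspect it.
def peek_format(io)
  first_line = io.gets
  first_line.to_s.start_with?('>') ? :fasta : nil
end

io = StringIO.new(">seq1\nACGT\n")
detected  = peek_format(io)
remaining = io.read     # the header line has already been consumed

io.rewind               # possible for seekable streams, not for pipes
restored = io.read

puts detected.inspect, remaining.inspect, restored.inspect
```

For a seekable input such as a regular file, rewinding recovers the data; for a pipe or socket it cannot be recovered.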



# File 'lib/bio/io/flatfile.rb', line 448

def self.autodetect_io(io)
  self.new(nil, io).dbclass
end

+ (Object) autodetect_stream(io)

This method is OBSOLETE. Please use autodetect_io(io) instead.



# File 'lib/bio/io/flatfile.rb', line 453

def self.autodetect_stream(io)
  $stderr.print "Bio::FlatFile.autodetect_stream will be deprecated." if $VERBOSE
  self.autodetect_io(io)
end

+ (Object) foreach(*arg)

Executes the block for every entry in the stream. Same as FlatFile.open(*arg) { |ff| ff.each { |entry| … }}.

  • Example

    Bio::FlatFile.foreach('test.fst') { |e| puts e.definition }


# File 'lib/bio/io/flatfile.rb', line 194

def self.foreach(*arg)
  self.open(*arg) do |flatfileobj|
    flatfileobj.each do |entry|
      yield entry
    end
  end
end

+ (Object) open(*arg, &block)

Bio::FlatFile.open(file, *arg)

Bio::FlatFile.open(dbclass, file, *arg)

Creates a new Bio::FlatFile object to read a file or a stream which contains dbclass data.

dbclass should be a class (or module) or nil, e.g. Bio::GenBank or Bio::FastaFormat.

If file is a filename (that is, an object that does not have a gets method), the method opens the local file with that name using File.open(filename, *arg).

When dbclass is omitted or nil, the method tries to determine the database class (file format) automatically. If detection fails, dbclass is set to nil and FlatFile#next_entry will fail. You can still set dbclass afterwards with the FlatFile#dbclass= method.

  • Example 1

    Bio::FlatFile.open(Bio::GenBank, "genbank/gbest40.seq")
  • Example 2

    Bio::FlatFile.open(nil, "embl/est_hum17.dat")
  • Example 3

    Bio::FlatFile.open("genbank/gbest40.seq")
  • Example 4

    Bio::FlatFile.open(Bio::GenBank, $stdin)

If called with a block, the block is executed with a new Bio::FlatFile object. If a filename is given, the file is automatically closed when leaving the block.

  • Example 5

    Bio::FlatFile.open(nil, 'test4.fst') do |ff|
        ff.each { |e| print e.definition, "\n" }
    end
  • Example 6

    Bio::FlatFile.open('test4.fst') do |ff|
        ff.each { |e| print e.definition, "\n" }
    end

Compatibility Note: *arg is passed unchanged to File.open, so you cannot specify “:raw => true” or “:raw => false”.



# File 'lib/bio/io/flatfile.rb', line 80

def self.open(*arg, &block)
  # FlatFile.open(dbclass, file, mode, perm)
  # FlatFile.open(file, mode, perm)
  if arg.size <= 0
    raise ArgumentError, 'wrong number of arguments (0 for 1)'
  end
  x = arg.shift
  if x.is_a?(Module) then
    # FlatFile.open(dbclass, filename_or_io, ...)
    dbclass = x
  elsif x.nil? then
    # FlatFile.open(nil, filename_or_io, ...)
    dbclass = nil
  else
    # FlatFile.open(filename, ...)
    dbclass = nil
    arg.unshift(x)
  end
  if arg.size <= 0
    raise ArgumentError, 'wrong number of arguments (1 for 2)'
  end
  file = arg.shift
  # check if file is filename or IO object
  unless file.respond_to?(:gets)
    # 'file' is a filename
    _open_file(dbclass, file, *arg, &block)
  else
    # 'file' is a IO object
    ff = self.new(dbclass, file)
    block_given? ? (yield ff) : ff
  end
end
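
The argument handling in FlatFile.open tells the call forms apart by the type of the first argument. The standalone sketch below mirrors that dispatch without loading BioRuby, so the outcome of each form can be inspected. resolve_open_args and FakeFormat are hypothetical names introduced only for this illustration.

```ruby
require 'stringio'

# Hypothetical stand-in for a database class such as Bio::GenBank.
class FakeFormat; end

# Hypothetical helper (not part of Bio::FlatFile): resolves open-style
# arguments into [dbclass, file, kind, remaining_args], following the
# same rules as the FlatFile.open listing.
def resolve_open_args(*arg)
  raise ArgumentError, 'wrong number of arguments (0 for 1)' if arg.empty?
  first = arg.shift
  dbclass = nil
  if first.is_a?(Module)
    dbclass = first        # open(dbclass, file, ...)
  elsif !first.nil?
    arg.unshift(first)     # open(file, ...): the first argument is the file
  end
  raise ArgumentError, 'wrong number of arguments (1 for 2)' if arg.empty?
  file = arg.shift
  # Objects that respond to gets are treated as streams, others as filenames.
  kind = file.respond_to?(:gets) ? :io : :filename
  [dbclass, file, kind, arg]
end

p resolve_open_args(FakeFormat, 'genbank/gbest40.seq')
p resolve_open_args(nil, 'embl/est_hum17.dat', 'r')
p resolve_open_args('test4.fst')
p resolve_open_args(StringIO.new(">seq\nACGT\n")).values_at(0, 2)
```

A String counts as a filename here because Kernel#gets is private, so respond_to?(:gets) is false for it.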

+ (Object) open_file(filename, *arg)

Same as FlatFile.auto(filename, *arg), except that it accepts only a filename and does not accept an IO object. The file format is determined automatically.

It can accept a block. If a block is given, it returns the block's return value. Otherwise, it returns a new FlatFile object.



# File 'lib/bio/io/flatfile.rb', line 144

def self.open_file(filename, *arg)
  _open_file(nil, filename, *arg)
end

+ (Object) open_uri(uri, *arg)

Opens the URI specified as uri. uri must be a String or URI object. *arg is passed to OpenURI.open_uri or URI#open.

Like FlatFile.open, it can accept a block.

Note that you MUST explicitly require 'open-uri'. Because open-uri.rb modifies existing classes, it is not required by default.



# File 'lib/bio/io/flatfile.rb', line 177

def self.open_uri(uri, *arg)
  if block_given? then
    BufferedInputStream.open_uri(uri, *arg) do |stream|
      yield self.new(nil, stream)
    end
  else
    stream = BufferedInputStream.open_uri(uri, *arg)
    self.new(nil, stream)
  end
end

+ (Object) to_a(*arg)

Same as FlatFile.auto(filename_or_stream, *arg).to_a

(This method might become OBSOLETE in the future.)



# File 'lib/bio/io/flatfile.rb', line 129

def self.to_a(*arg)
  self.auto(*arg) do |ff|
    raise 'cannot determine file format' unless ff.dbclass
    ff.to_a
  end
end

Instance Method Details

- (Object) autodetect(lines = 31, ad = AutoDetect.default)

Determines the database class (file format). Pre-reads the given number of lines (31 by default) for format determination. If detection fails, returns nil or false. Otherwise, returns the database class.

The method can be called at any time, although this is not recommended. It might be useful if the input file is a mixture of data in multiple formats.
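
The idea of pre-reading a bounded number of lines and matching them against format rules can be sketched as follows. This is a simplified standalone illustration, not BioRuby's implementation: the RULES table and guess_format are hypothetical, the two patterns are toy examples, and unlike FlatFile#autodetect this sketch does not put the pre-read lines back into a buffer.

```ruby
require 'stringio'

# Hypothetical rules for illustration only; BioRuby's AutoDetect uses
# its own, much richer rule set.
RULES = {
  fasta:   /^>/,
  genbank: /^LOCUS\s/
}.freeze

# Reads at most `lines` lines from io and returns the name of the first
# rule whose pattern matches the collected text, or nil if none matches.
def guess_format(io, lines = 31)
  text = +''
  lines.times do
    line = io.gets
    break unless line
    text << line
  end
  found = RULES.find { |_name, pattern| pattern.match?(text) }
  found && found.first
end

p guess_format(StringIO.new(">seq1 test\nACGT\n"))
p guess_format(StringIO.new("LOCUS       AB000001\n"))
p guess_format(StringIO.new("no recognizable header\n"))
```

Limiting the pre-read keeps detection cheap on large inputs, at the cost of missing a format marker that appears only after the inspected lines.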



# File 'lib/bio/io/flatfile.rb', line 429

def autodetect(lines = 31, ad = AutoDetect.default)
  if r = ad.autodetect_flatfile(self, lines)
    self.dbclass = r
  else
    self.dbclass = nil unless self.dbclass
  end
  r
end

- (Object) close

Closes input stream. (similar to IO#close)



# File 'lib/bio/io/flatfile.rb', line 351

def close
  @stream.close
end

- (Object) each_entry Also known as: each

Iterates over each entry in the flatfile.

  • Example

    include Bio
    ff = FlatFile.open(GenBank, "genbank/gbhtg14.seq")
    ff.each_entry do |x|
      puts x.definition
    end
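
Because the class includes Enumerable and each is an alias of each_entry, methods such as map and select become available on top of the next_entry loop. The standalone sketch below shows that pattern with a toy reader. ToyReader is hypothetical and treats every line as one entry; it is not how Bio::FlatFile splits entries.

```ruby
require 'stringio'

# Hypothetical toy reader (not Bio::FlatFile): each line is one entry.
class ToyReader
  include Enumerable

  def initialize(io)
    @io = io
  end

  # Returns the next entry, or nil when the stream is exhausted.
  def next_entry
    line = @io.gets
    line && line.chomp
  end

  # Yields entries until next_entry returns nil.
  def each_entry
    while (e = next_entry)
      yield e
    end
  end
  alias each each_entry
end

reader = ToyReader.new(StringIO.new("alpha\nbeta\ngamma\n"))
p reader.map(&:upcase)
```

Note that iteration consumes the underlying stream, so a second pass needs a rewind first.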


# File 'lib/bio/io/flatfile.rb', line 334

def each_entry
  while e = self.next_entry
    yield e
  end
end

- (Object) entry_ended_pos

Returns the end position of the last entry plus 1.



# File 'lib/bio/io/flatfile.rb', line 322

def entry_ended_pos
  @splitter.entry_ended_pos
end

- (Object) entry_pos_flag

Returns the flag that controls whether entry start and end positions are recorded.



# File 'lib/bio/io/flatfile.rb', line 307

def entry_pos_flag
  @splitter.entry_pos_flag
end

- (Object) entry_pos_flag=(x)

Sets the flag that controls whether entry start and end positions are recorded.



# File 'lib/bio/io/flatfile.rb', line 312

def entry_pos_flag=(x)
  @splitter.entry_pos_flag = x
end

- (Object) entry_raw

Returns the last raw entry as a string.



# File 'lib/bio/io/flatfile.rb', line 302

def entry_raw
  @splitter.entry
end

- (Object) entry_start_pos

Returns the start position of the last entry.



# File 'lib/bio/io/flatfile.rb', line 317

def entry_start_pos
  @splitter.entry_start_pos
end

- (Boolean) eof?

Returns true if the input stream is at end-of-file. Otherwise, returns false. (Similar to IO#eof?, but the result may differ from io.eof? because FlatFile has its own internal buffer.)

Returns:

  • (Boolean)


# File 'lib/bio/io/flatfile.rb', line 380

def eof?
  @stream.eof?
end

- (Object) gets(*arg)

Similar to IO#gets. Internal use only. Users should not call it directly.



# File 'lib/bio/io/flatfile.rb', line 395

def gets(*arg)
  @stream.gets(*arg)
end

- (Object) io

(DEPRECATED) IO object in the flatfile object.

Compatibility Note: Bio::FlatFile#io is deprecated. Please use Bio::FlatFile#to_io instead.



# File 'lib/bio/io/flatfile.rb', line 255

def io
  warn "Bio::FlatFile#io is deprecated."
  @stream.to_io
end

- (Object) next_entry

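Gets the next entry. Returns nil when no further entry can be read. Raises UnknownDataFormatError if dbclass has not been set (for example, because auto-detection failed).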



# File 'lib/bio/io/flatfile.rb', line 277

def next_entry
  raise UnknownDataFormatError,
        'file format auto-detection failed?' unless @dbclass
  if @skip_leader_mode and
      ((@firsttime_flag and @skip_leader_mode == :firsttime) or
         @skip_leader_mode == :everytime)
    @splitter.skip_leader
  end
  if raw then
    r = @splitter.get_entry
  else
    r = @splitter.get_parsed_entry
  end
  @firsttime_flag = false
  return nil unless r
  if raw then
    r
  else
    @entry = r
    @entry
  end
end

- (Object) path

Returns the pathname, filename, or URI of the input (or nil).



# File 'lib/bio/io/flatfile.rb', line 268

def path
  @stream.path
end

- (Object) pos

Returns the current position of the input stream. If the input stream is not a normal file, the result is not guaranteed. It is similar to IO#pos. Note that it will not be equal to io.pos, because FlatFile has its own internal buffer.
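
Why a buffered reader reports a different position from its underlying IO can be seen in the following standalone sketch. ReadAhead is a hypothetical, minimal read-ahead wrapper written for this illustration; it is not BioRuby's BufferedInputStream and makes no claim about how that class computes positions.

```ruby
require 'stringio'

# Hypothetical minimal read-ahead wrapper (not BufferedInputStream).
class ReadAhead
  def initialize(io, chunk = 16)
    @io = io
    @chunk = chunk
    @buffer = +''
  end

  # Returns the next line, refilling the buffer in fixed-size chunks.
  def gets
    until @buffer.include?("\n")
      data = @io.read(@chunk)
      break unless data
      @buffer << data
    end
    return nil if @buffer.empty?
    idx = @buffer.index("\n")
    idx ? @buffer.slice!(0..idx) : @buffer.slice!(0..-1)
  end

  # Logical position: bytes already handed to the caller.
  def pos
    @io.pos - @buffer.bytesize
  end
end

io = StringIO.new("AAAA\nCCCC\nGGGG\nTTTT\n")
reader = ReadAhead.new(io)
reader.gets
puts "wrapper pos: #{reader.pos}, underlying io.pos: #{io.pos}"
```

After one line has been returned, the underlying IO is already a whole chunk ahead, so the two positions disagree until the buffer drains.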



# File 'lib/bio/io/flatfile.rb', line 361

def pos
  @stream.pos
end

- (Object) pos=(p)

(Not recommended.) Sets the position of the input stream. If the input stream is not a normal file, the result is not guaranteed. It is similar to IO#pos=. Note that the effect is not the same as io.pos=, because FlatFile has its own internal buffer.



# File 'lib/bio/io/flatfile.rb', line 372

def pos=(p)
  @stream.pos=(p)
end

- (Object) rewind

Resets file pointer to the start of the flatfile. (similar to IO#rewind)



# File 'lib/bio/io/flatfile.rb', line 343

def rewind
  r = (@splitter || @stream).rewind
  @firsttime_flag = true
  r
end

- (Object) to_io

Returns the IO object in the flatfile object.

Compatibility Note: Bio::FlatFile#io is deprecated.



# File 'lib/bio/io/flatfile.rb', line 263

def to_io
  @stream.to_io
end